🔬 Extract and structure Thai documents to Google Sheets using Typhoon OCR and Llama 3.1

⚡ 3,545 views · 🔬 Document Extraction & Analysis

Description

⚠️ Note: This template requires a community node and works only on self-hosted n8n installations. It uses the Typhoon OCR Python package and custom command execution. Make sure to install the required dependencies locally.


Who is this for?

This template is for developers, operations teams, and automation builders in Thailand (or any Thai-speaking environment) who regularly process PDFs or scanned documents in Thai and want to extract structured text into a Google Sheet.

It is ideal for:


What problem does this solve?

Typhoon OCR is one of the most accurate OCR tools for Thai text. However, integrating it into an end-to-end workflow usually requires manual scripting and data wrangling.

This template solves that by:


What this workflow does

  1. Trigger: Run manually or from any automation source
  2. Read Files: Load local PDF files from a doc/ folder
  3. Execute Command: Run Typhoon OCR on each file using a Python command
  4. LLM Extraction: Send the OCR markdown to an LLM (e.g., GPT-4, or Llama 3.1 via the OpenRouter Chat Model node) to extract fields
  5. Code Node: Parse the LLM output as JSON
  6. Google Sheets: Append structured data into a spreadsheet
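
Step 5 tends to be the fragile point, since models often wrap their JSON in a code fence or surround it with a sentence of prose. The following is a minimal Python sketch of that parsing step under those assumptions. It is not the template's actual Code node (n8n Code nodes usually run JavaScript), and the function name `parse_llm_json` is hypothetical.

```python
import json
import re

# Three backticks, built at runtime so this listing stays easy to embed.
FENCE = "`" * 3


def parse_llm_json(raw):
    """Return the JSON object found in an LLM reply (hypothetical helper)."""
    # If the reply contains a fenced block, look only inside it.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, raw, flags=re.DOTALL)
    candidate = fenced.group(1) if fenced else raw
    # Keep the span from the first "{" to the last "}" to drop stray prose.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in LLM output")
    return json.loads(candidate[start:end + 1])
```

Raising on a missing object, rather than returning an empty dict, keeps a failed extraction from silently appending a blank row to the sheet.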

Setup

1. Install Requirements

2. Create folders

3. Google Sheet

Create a Google Sheet with the following column headers:

book_id | date | subject | detail | signed_by | signed_by2 | contact | download_url

You can use this example Google Sheet as a reference.
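
Because rows are appended by column, the parsed record has to line up with the header order. As a rough illustration, assuming the eight headers are `book_id`, `date`, `subject`, `detail`, `signed_by`, `signed_by2`, `contact`, and `download_url`, a record can be flattened into a row as shown here. The helper `to_row` is hypothetical and not part of the template.

```python
# Column order assumed to match the sheet's header row.
COLUMNS = [
    "book_id", "date", "subject", "detail",
    "signed_by", "signed_by2", "contact", "download_url",
]


def to_row(record):
    """Turn a parsed record into a list of cell values in header order."""
    row = []
    for column in COLUMNS:
        value = record.get(column)
        # Missing or null fields become empty cells; unknown keys are ignored.
        row.append("" if value is None else str(value))
    return row
```

Filling absent fields with empty strings keeps every row the same width, so a document without a second signatory still lands in the right columns.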

4. API Key

Export TYPHOON_OCR_API_KEY and OPENAI_API_KEY in your environment (or set them inside the command string in the Execute Command node).
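
A quick check before the first run can save a confusing failure inside the Execute Command node. This is a small sketch, assuming both keys are read from the process environment. The helper `missing_keys` is hypothetical.

```python
import os

# Environment variable names taken from the setup step above.
REQUIRED_KEYS = ("TYPHOON_OCR_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None):
    """Return the required key names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```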


How to customize this workflow


About Typhoon OCR

Typhoon is a multilingual LLM and toolkit optimized for Thai NLP. It includes typhoon-ocr, a Python OCR library designed for Thai-centric documents. It is open-source, highly accurate, and works well in automation pipelines, which makes it a good fit for government paperwork, PDF reports, and multilingual documents in Southeast Asia.


🔗 Nodes Used

Google Sheets, Basic LLM Chain, Read/Write Files from Disk, OpenRouter Chat Model

📥 Import

Download workflow.json and import it into n8n: Workflow menu → Import from File

📖 Importing guide · 🔑 Credential setup