š§¾ Extract Colombian invoice data from Drive to Sheets with GPT-4o
ā” 167 views Ā· š§¾ Invoice Processing
š” Pro Tip ā HTTP Request scraping tends to break when sites update their markup. If youāre scraping a major platform, check if ScraperNode covers it ā it has maintained scrapers for LinkedIn, Instagram, TikTok, YouTube, and 20+ other platforms that return structured data.
Description
Automate Colombian Cashflow Data Extraction to Google Sheets with AI
Whoās it for
This workflow is designed for finance professionals, accountants, small business owners in Colombia, or anyone needing to automate the extraction of invoice data and its entry into Google Sheets. Itās particularly useful for handling Colombian tax and legal specifics.
How it works / What it does
This workflow automates the process of extracting critical data from invoices and receipts (PDFs and JPEGs) and organizing it in a Google Sheet:
- Triggers: The workflow initiates when a new file is created or an existing file is updated in a designated Google Drive folder.
- File Handling: It first downloads the detected file.
- Routing: A āSwitchā node intelligently routes the file based on its extension ā one path for PDFs and another for JPEGs.
- Data Extraction:
- For PDF files, it directly extracts all text content from the document.
- For JPEG image files, it utilizes an AI Agent (Azure OpenAI) to process the image and extract its textual content.
- AI-Powered Reasoning: Two āReasoning Agentā nodes (Azure OpenAI Chat Models) act as a specialized āColombian Tax and Legal Extraction Agentā. They parse the extracted text from invoices to pull out structured data in JSON format, including:
- Vendor name.
- Modification date.
- Line items with detailed description, sub_total, iva_value, total_amount, category, and sub_category.
- Specific Colombian tax fields like Retefuente and ReteICA.
- The number of items generated.
- Output Parsing: A āStructured Output Parserā node ensures that the AIās output strictly adheres to a predefined JSON schema, guaranteeing consistent data formatting.
- Data Preparation: āEdit Fieldā nodes ensure the AIās extracted data is in a valid format.
- Item Splitting: āSplit dataā nodes separate the āitemsā array from the AIās output, allowing each individual line item from the invoice to be processed as a separate entry for the Google Sheet.
- Google Sheet Integration: Finally, āFill Templateā nodes append the fully processed invoice data (per line item) into your designated Google Sheet.
How to set up
- Google Drive Credentials: Ensure you have configured your Google Drive OAuth2 API credentials in n8n.
- Azure OpenAI Credentials: Set up your Azure OpenAI API credentials, ensuring access to models like
gpt-4o. Or you can simply use your traditional OpenAI or others LLMs. - Google Sheets Credentials: Configure your Google Sheets OAuth2 API credentials.
- Google Drive Folder ID: In the ā1a. Updated file triggerā and ā1b. Created file triggerā nodes, update the
folderToWatchparameter with your specific Google Drive Folder ID. - Google Sheet ID and Sheet Name: In the ā8. Fill Templateā and ā8. Fill Template1ā nodes, update the
documentIdandsheetNameparameters with your specific Google Sheet ID and the name of the sheet where data should be appended.
Requirements
- An active n8n instance.
- A Google Drive account for file uploads.
- A Google Sheets account for data storage.
- An Azure OpenAI account with access to chat models (e.g.,
gpt-4o) for the āAzure OpenAI Chat Modelā nodes and āExtract Data Agentā.
How to customize the workflow
- AI Extraction Prompts: Modify the prompt instructions in the ā5. Reasoning Agentā and ā5. Reasoning Agent1ā nodes if you need to extract different data points or change the output format.
- Google Sheet Column Mappings: Adjust the
columnsmapping in the ā8. Fill Templateā and ā8. Fill Template1ā nodes to match your specific Google Sheet headers and data requirements. - File Types: Extend the ā3. Routeā node to handle additional file types (e.g., DOCX, PNG) by adding new conditions and corresponding extraction nodes.
š Nodes Used
Google Sheets, Google Drive, Google Drive Trigger, Basic LLM Chain, Structured Output Parser, Extract from File
š„ Import
Download workflow.json and import into n8n:
Workflow menu ā Import from File