🧾 Extract invoice data from PDFs to JSON with Gemini AI and XML transformation

⚡ 1,302 views · 🧾 Invoice Processing

Description

This n8n workflow converts PDF invoices into structured, ready-to-use JSON using AI and XML transformation, without writing any code.

🚀 How it works

  1. Upload form → The user uploads a PDF file.

  2. Text extraction → The PDF content is extracted as plain text.

  3. XML schema definition → A standard invoice structure is defined with fields such as:

    • Invoice number
    • Customer and issuer details
    • Items with description, quantity, and price
    • Totals and taxes
    • Bank account details
  4. AI (Gemini) → The model rewrites the PDF text into valid XML that follows the predefined schema.

  5. XML cleanup → Extra tags, line breaks, and unnecessary formatting are removed.

  6. JSON conversion → The XML is transformed into a clean, structured JSON object, ready for integrations, APIs, or storage.
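
For readers who want to see what steps 3, 5, and 6 amount to outside n8n, here is a minimal Python sketch. It is not the workflow's actual implementation. The tag names, the sample values, and the assumption that the model wraps its reply in a Markdown code fence are all hypothetical, and the helper names `clean_xml`, `element_to_obj`, and `xml_to_json` are invented for illustration.

```python
import json
import re
import xml.etree.ElementTree as ET

FENCE = "`" * 3  # Markdown code fence, built here to keep this example readable

# Hypothetical model reply: XML following an illustrative invoice schema.
# Tag names and values are made up; the workflow's real schema may differ.
SAMPLE_REPLY = FENCE + """xml
<invoice>
  <invoice_number>INV-001</invoice_number>
  <customer><name>Example Customer</name></customer>
  <issuer><name>Example Issuer</name></issuer>
  <items>
    <item><description>Widget</description><quantity>2</quantity><price>10.00</price></item>
    <item><description>Gadget</description><quantity>1</quantity><price>5.50</price></item>
  </items>
  <totals><subtotal>25.50</subtotal><tax>5.36</tax><total>30.86</total></totals>
  <bank_account>XX00 0000 0000 0000</bank_account>
</invoice>
""" + FENCE


def clean_xml(reply: str) -> str:
    """Strip a surrounding code fence and the whitespace between tags."""
    text = reply.strip()
    text = re.sub(r"^`{3}[a-zA-Z]*\s*", "", text)  # leading fence, e.g. with "xml"
    text = re.sub(r"\s*`{3}$", "", text)           # trailing fence
    return re.sub(r">\s+<", "><", text).strip()    # line breaks and indentation


def element_to_obj(elem):
    """Convert an element to a string (leaf) or a dict (element with children)."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_obj(child)
        if child.tag in result:
            # Repeated tag, such as several <item> elements: collect into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_json(reply: str) -> str:
    """Clean the reply, parse it as XML, and serialize it as JSON."""
    root = ET.fromstring(clean_xml(reply))
    return json.dumps({root.tag: element_to_obj(root)}, indent=2)


invoice = json.loads(xml_to_json(SAMPLE_REPLY))
print(invoice["invoice"]["invoice_number"])
```

In this sketch every value stays a string, and a tag that appears only once becomes a single object rather than a one-element list, so downstream code would need to handle both shapes or normalize them.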

✨ Benefits

🛠️ Use cases

🔗 Nodes Used

n8n Form Trigger, Extract from File, Google Gemini

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File
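
Before importing, it can help to check which nodes a workflow file contains. The Python sketch below assumes the export is a JSON object with a top-level `nodes` array whose entries carry `name` and `type` fields. The `list_nodes` helper is hypothetical, and the inline sample with its placeholder type strings stands in for the real file.

```python
import json


def list_nodes(workflow: dict) -> list[tuple[str, str]]:
    """Return (name, type) pairs for each node in a workflow export."""
    return [(n.get("name", ""), n.get("type", "")) for n in workflow.get("nodes", [])]


# Stand-in for the contents of workflow.json; type strings are placeholders,
# not the identifiers n8n actually uses.
sample = json.loads("""
{
  "name": "Example workflow",
  "nodes": [
    {"name": "Upload form", "type": "placeholder.formTrigger"},
    {"name": "Extract text", "type": "placeholder.extractFromFile"},
    {"name": "Gemini", "type": "placeholder.googleGemini"}
  ]
}
""")

for name, node_type in list_nodes(sample):
    print(f"{name}: {node_type}")
```

To run it against the downloaded file, load the file with `json.load` and pass the result to `list_nodes` in place of `sample`.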
