Automate document ingestion & RAG system with Google Drive, Sheets & OpenAI

AI RAG & Knowledge Retrieval

Description

1. Overview

The IngestionDocs workflow is a fully automated document ingestion and knowledge management system built with n8n. Its purpose is to continuously ingest organizational documents from Google Drive, transform them into vector embeddings using OpenAI, store them in Pinecone, and make them searchable and retrievable through an AI-powered Q&A interface.

Because ingestion is triggered by changes in Google Drive, employees have access to an up-to-date knowledge base without anyone re-indexing documents by hand.


2. Key Objectives


3. Workflow Breakdown

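A. Document Monitoring and Retrieval

A Google Drive Trigger node watches the configured Drive location for new or updated files. When a change is detected, a Google Drive node downloads the file and the workflow computes a hash of its content, which the Record Manager uses to detect modifications.
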

B. Record Management (Google Sheets Integration)

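To keep track of ingestion state, the workflow uses a Google Sheets-based Record Manager. For each downloaded file, it compares the file's hash with the stored record and decides whether to skip the file (unchanged), insert a new record (new file), or update the existing record (modified file).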

This guarantees that only new or modified content is processed, avoiding duplication.
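The skip/insert/update decision can be sketched in Python as below. This is a minimal illustration, not the workflow's actual node logic: it assumes SHA-256 as the hash (the description does not name the algorithm) and uses an in-memory dictionary as a hypothetical stand-in for the Google Sheet, mapping each file ID to the hash recorded at its last ingestion.

```python
import hashlib
from typing import Dict

# Hypothetical stand-in for the Google Sheet: Drive file ID -> content hash
# recorded the last time that file was ingested.
Records = Dict[str, str]


def content_hash(data: bytes) -> str:
    """Hash the raw file bytes (SHA-256 is an assumption, not from the workflow)."""
    return hashlib.sha256(data).hexdigest()


def decide(records: Records, file_id: str, data: bytes) -> str:
    """Return "insert", "skip", or "update" and store the new hash when needed."""
    new_hash = content_hash(data)
    old_hash = records.get(file_id)
    if old_hash is None:
        records[file_id] = new_hash  # first time this file is seen
        return "insert"
    if old_hash == new_hash:
        return "skip"  # content unchanged: no re-embedding needed
    records[file_id] = new_hash  # content changed: overwrite the stored hash
    return "update"


if __name__ == "__main__":
    sheet: Records = {}
    print(decide(sheet, "file-123", b"v1"))  # insert
    print(decide(sheet, "file-123", b"v1"))  # skip
    print(decide(sheet, "file-123", b"v2"))  # update
```

In this sketch, keying on the file ID rather than the file name means a renamed file is not re-ingested unless its content also changes.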


C. Document Processing and Vectorization

Once a document is marked as new or updated:

  1. Default Data Loader extracts its content (binary files supported).
  2. Recursive Character Text Splitter divides the content into smaller, slightly overlapping chunks so that context is preserved across chunk boundaries.
  3. OpenAI Embeddings (text-embedding-3-large) converts each text chunk into a semantic vector.
  4. Pinecone Vector Store stores these vectors in the configured index.

This process builds a scalable and queryable knowledge base.
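The overlap idea in step 2 can be illustrated with a plain fixed-window splitter. This is a simplified stand-in, not the node's algorithm: a recursive splitter additionally tries to break on larger separators such as paragraph and line breaks before cutting mid-word. The function name and the chunk_size and overlap values are hypothetical and illustrative, not the workflow's settings.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters, which gives each
    chunk some context from its neighbour.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(split_with_overlap("abcdefghij", chunk_size=4, overlap=1))
    # ['abcd', 'defg', 'ghij']
```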


D. Knowledge Base Q&A Interface

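The workflow also provides a form-based user interface. An employee submits a question through a web form; an AI Agent retrieves the most relevant document chunks from Pinecone and passes them to the GPT-4.1-mini chat model, and the resulting answer is returned to the employee as styled HTML.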

This creates a self-service knowledge base assistant that employees can query in natural language.
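Conceptually, the retrieval step ranks stored chunk vectors by similarity to the embedded question and hands the best matches to the chat model as context. The sketch below shows that ranking with cosine similarity over a tiny in-memory index. In the workflow this search runs inside Pinecone, whose similarity metric is configured per index, and text-embedding-3-large vectors have 3,072 dimensions by default rather than the two used here. The question must be embedded with the same model as the documents, otherwise the vectors are not comparable. The function names and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank stored chunk vectors by similarity to the query and keep the best k."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real embeddings.
    toy_index = {"chunk-a": [1.0, 0.0], "chunk-b": [0.0, 1.0], "chunk-c": [0.7, 0.7]}
    for chunk_id, score in top_k([1.0, 0.1], toy_index, k=2):
        print(chunk_id, round(score, 3))
```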


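4. Technologies Used

The workflow combines n8n for orchestration, Google Drive as the document source, Google Sheets as the Record Manager, OpenAI for embeddings (text-embedding-3-large) and answer generation (GPT-4.1-mini), and Pinecone as the vector store.
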

5. End-to-End Data Flow

  1. Employee uploads or updates a document → Google Drive Trigger detects the change.
  2. Workflow downloads and hashes the file → The hash identifies the file's content and reveals modifications.
  3. Record Manager (Google Sheets) → Decides whether to skip, insert, or update the record.
  4. Document processing → Splitting, embedding, and storing in Pinecone.
  5. Knowledge base updated → The latest version of each document is indexed.
  6. Employee asks a question via the web form.
  7. AI Agent retrieves the most relevant chunks from Pinecone and passes them to GPT-4.1-mini → Generates a contextual answer.
  8. Answer rendered as styled HTML → Delivered back to the employee through the form interface.
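Step 8 can be sketched as a small rendering function. The description does not show the workflow's actual HTML template, so the markup, inline styles, and the render_answer_html name are hypothetical. The point illustrated is escaping the model's text before embedding it in HTML, so that stray markup in an answer is displayed rather than interpreted by the response page.

```python
import html


def render_answer_html(question: str, answer: str) -> str:
    """Return a styled HTML fragment showing a question and its answer.

    Hypothetical helper: the markup and styles are illustrative only.
    """
    # Escape user- and model-supplied text so it is displayed, not interpreted.
    title = html.escape(question)
    # Treat blank lines in the answer as paragraph breaks.
    paragraphs = "".join(
        f"<p>{html.escape(block.strip())}</p>"
        for block in answer.split("\n\n")
        if block.strip()
    )
    return (
        '<div style="font-family: sans-serif; max-width: 40rem; line-height: 1.5;">'
        f"<h3>{title}</h3>"
        f"{paragraphs}"
        "</div>"
    )


if __name__ == "__main__":
    print(render_answer_html("How do I reset my password?", "Open settings.\n\nSee <manual> section 2."))
```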

6. Benefits


✅ In summary, IngestionDocs is an AI-driven document ingestion and retrieval system that integrates Google Drive, Google Sheets, OpenAI, and Pinecone within n8n. It keeps a knowledge base of manuals up to date as documents change and gives employees a natural-language Q&A assistant for retrieving that knowledge.

🔗 Nodes Used

Google Sheets, Google Drive, Google Drive Trigger, AI Agent, Embeddings OpenAI, OpenAI Chat Model

📥 Import

Download workflow.json and import it into n8n: Workflow menu → Import from File

📖 Importing guide · 🔑 Credential setup