Process OCR documents from Google Drive into searchable knowledge base with OpenAI & Pinecone
258 views · Document Extraction & Analysis
Description
How it works
This workflow automates a full RAG (retrieval-augmented generation) ingestion pipeline. When a new OCR JSON file is added to a Google Drive folder, the workflow extracts lesson metadata, parses and cleans the Arabic text, splits it into chunks, generates an OpenAI embedding for each chunk, and stores the vectors in a Pinecone index. After processing, the file is moved to an archive folder automatically so that it is not ingested a second time.
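The cleaning and chunking stages can be sketched outside n8n as follows. This is a minimal illustration, not the workflow's actual node code: the function names `clean_arabic` and `split_recursive`, the cleaning rules (stripping diacritics and tatweel, collapsing whitespace), the separator order, and the chunk size are all assumptions. Unlike the Recursive Character Text Splitter node, the sketch does not implement chunk overlap.

```python
import re

# Assumed cleaning rules: remove Arabic diacritics (U+064B-U+0652) and
# tatweel (U+0640). The workflow's real cleaning step may differ.
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
_SPACES = re.compile(r"[ \t]+")


def clean_arabic(text: str) -> str:
    """Strip diacritics/tatweel and collapse spaces, keeping line breaks."""
    text = _DIACRITICS.sub("", text)
    lines = [_SPACES.sub(" ", line).strip() for line in text.splitlines()]
    return "\n".join(lines).strip()


def split_recursive(text, chunk_size=200, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first and falls back to finer ones only
    for pieces that are still too long (no overlap between chunks).
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece alone is too long: split it with the finer separators.
            chunks.extend(split_recursive(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    raw = "aaaa bbbb\ncccc dddd\n\neeee"
    for chunk in split_recursive(clean_arabic(raw), chunk_size=9):
        print(repr(chunk))
```

Each resulting chunk would then be embedded and written to the vector index together with the lesson metadata; in the workflow itself those steps are handled by the Embeddings OpenAI and Pinecone Vector Store nodes.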
Set up steps
Follow the sticky notes inside the workflow for detailed instructions.
- Connect your Google Drive credentials.
- Replace the input folder ID and archive folder ID with your own.
- Connect your OpenAI account for embeddings.
- Connect your Pinecone API key and select your index.
The workflow is ready to run once credentials and folder paths are configured.
Nodes Used
Google Drive, Google Drive Trigger, Embeddings OpenAI, Recursive Character Text Splitter, Pinecone Vector Store, Default Data Loader
Import
Download workflow.json and import into n8n:
Workflow menu → Import from File