🔍 Process documents & build semantic search with OpenAI, Gemini & Qdrant

Description

🎯 Overview

This n8n workflow ingests documents from two sources, Google Drive and a web form, into a Qdrant vector database for semantic search. It handles batch processing, document analysis, embedding generation, and vector storage, with error handling and execution tracking throughout.
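To make "semantic search" concrete, here is a minimal sketch of the ranking step a vector database performs: stored texts are ordered by cosine similarity between their embedding and the query embedding. The two-dimensional vectors and the `search` helper are illustrative assumptions; real embeddings have hundreds or thousands of dimensions, and Qdrant uses an approximate index rather than this linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat a zero-length vector as having no similarity to anything.
    return dot / norm if norm else 0.0


def search(query_vector, stored, top_k=3):
    """Rank (text, vector) pairs by similarity to the query vector.

    `stored` is a hypothetical in-memory stand-in for a Qdrant collection.
    """
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy vectors chosen by hand; not real embeddings.
    stored = [
        ("lease agreement", [1.0, 0.0]),
        ("supplier invoice", [0.0, 1.0]),
        ("rental contract", [0.9, 0.1]),
    ]
    for score, text in search([1.0, 0.0], stored, top_k=2):
        print(f"{score:.3f}  {text}")
```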

🚀 Key Features

📋 Use Cases

🔧 Workflow Components

Input Methods

  1. Google Drive Integration

    • Monitors a specific folder for new files
    • Processes existing files in batch mode
    • Supports automatic file conversion to PDF
  2. Web Form Upload

    • Public-facing form for document submission
    • Accepts PDF, DOCX, DOC, and CSV files
    • Processes multiple file uploads in a single submission
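The file-type rule for the web form can be sketched as a simple extension check. This is an illustration only: the `partition_uploads` helper is hypothetical, and in the workflow itself the restriction would be configured on the form rather than written as code.

```python
from pathlib import PurePath

# Extensions the form accepts, per the list above.
ACCEPTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".csv"}


def partition_uploads(filenames):
    """Split one submission into accepted and rejected file names."""
    accepted, rejected = [], []
    for name in filenames:
        # Compare case-insensitively so "REPORT.PDF" is accepted too.
        suffix = PurePath(name).suffix.lower()
        (accepted if suffix in ACCEPTED_EXTENSIONS else rejected).append(name)
    return accepted, rejected


if __name__ == "__main__":
    ok, bad = partition_uploads(["contract.pdf", "notes.DOCX", "photo.png"])
    print(ok)   # ['contract.pdf', 'notes.DOCX']
    print(bad)  # ['photo.png']
```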

Processing Pipeline

  1. File Splitting: Separates multiple uploads into individual items
  2. Document Analysis: Google Gemini analyzes each document's content
  3. Text Extraction: Converts documents to plain text
  4. Embedding Generation: Creates vector embeddings via OpenAI
  5. Vector Storage: Inserts documents with embeddings into Qdrant
  6. Loop Control: Manages batch processing with proper state handling
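Steps 3 to 5 above can be sketched as follows, under several assumptions. The workflow uses a Recursive Character Text Splitter node; this sketch substitutes simpler fixed-size windows with overlap. The `embed` argument is a stand-in for the OpenAI embeddings call, and the payload field names (`file_name`, `chunk_index`, `text`) are hypothetical. The returned dictionary follows the `{"points": [...]}` shape of Qdrant's REST upsert body, where each point has an `id`, a `vector`, and a `payload`.

```python
import uuid


def chunk_text(text, chunk_size=1000, overlap=200):
    """Cut text into fixed-size character windows that overlap.

    A simplification of recursive character splitting: it ignores
    paragraph and sentence boundaries.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def build_points(file_name, chunks, embed):
    """Pair each chunk with its vector in a Qdrant-style upsert body.

    `embed` takes a list of strings and returns one vector per string,
    in the same order (assumed to wrap the embeddings API).
    """
    vectors = embed(chunks)
    points = []
    for index, (chunk, vector) in enumerate(zip(chunks, vectors)):
        points.append({
            # Deterministic UUID, so re-ingesting a file overwrites its points.
            "id": str(uuid.uuid5(uuid.NAMESPACE_URL, f"{file_name}#{index}")),
            "vector": vector,
            "payload": {"file_name": file_name, "chunk_index": index, "text": chunk},
        })
    return {"points": points}


if __name__ == "__main__":
    # Fake embedding for demonstration only: one number per chunk.
    fake_embed = lambda texts: [[float(len(t))] for t in texts]
    body = build_points("agreement.txt", chunk_text("abcdefghij", 4, 1), fake_embed)
    print(len(body["points"]))  # 3
```

Deriving the point ID from the file name and chunk index is a design choice in this sketch, not something the workflow is known to do; it keeps repeated runs from creating duplicate points.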

Key Nodes

🛠️ Technical Implementation

Batch Processing Logic

The workflow uses a clever looping mechanism:

Error Handling

Data Flow

Form Upload → Split Files → Batch Loop → Analyze → Insert → Loop Back
Google Drive → List Files → Batch Loop → Download → Analyze → Insert → Delete → Loop Back
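The Google Drive path above can be sketched as a plain loop. The step functions (`download`, `analyze`, `insert`, `delete`) are hypothetical stand-ins for the corresponding nodes, and the failure policy shown here (record the error, skip the delete, continue with the next file) is an assumption rather than the workflow's documented behavior.

```python
def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_drive_path(files, download, analyze, insert, delete, batch_size=1):
    """List Files -> Batch Loop -> Download -> Analyze -> Insert -> Delete."""
    processed, failed = [], []
    for batch in batches(files, batch_size):
        for file_id in batch:
            try:
                content = download(file_id)
                insert(file_id, analyze(content))
            except Exception as error:
                # Keep the source file so it can be retried later.
                failed.append((file_id, str(error)))
                continue
            # Delete only after the insert has succeeded.
            delete(file_id)
            processed.append(file_id)
    return processed, failed


if __name__ == "__main__":
    store, removed = {}, []
    done, errors = run_drive_path(
        ["doc-1", "doc-2"],
        download=lambda file_id: f"contents of {file_id}",
        analyze=lambda content: content.upper(),
        insert=store.__setitem__,
        delete=removed.append,
        batch_size=2,
    )
    print(done, errors)
```

The form path is the same loop without the download and delete steps.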

📊 Performance Considerations

🔐 Required Credentials

  1. Google Drive OAuth2: For file access and management
  2. OpenAI API: For embedding generation
  3. Qdrant API: For vector database operations
  4. Google Gemini API: For document analysis

💡 Implementation Tips

  1. Start Small: Test with a few files before processing large batches
  2. Monitor Costs: Track OpenAI API usage for embedding generation
  3. Backup First: Consider archiving instead of deleting processed files
  4. Check Collections: Ensure Qdrant collection exists before running
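For tip 4, a collection can be created ahead of time through Qdrant's REST API. The sketch below only builds the request and does not send it. The base URL, collection name, and vector size are assumptions: the size must match the embedding model in use (1536 is the default output size of OpenAI's `text-embedding-3-small`, which this workflow may or may not use).

```python
import json
import urllib.request


def create_collection_request(base_url, collection, vector_size, api_key=None):
    """Build (but do not send) a PUT /collections/{name} request for Qdrant."""
    body = {"vectors": {"size": vector_size, "distance": "Cosine"}}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Qdrant reads the key from the "api-key" header.
        headers["api-key"] = api_key
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/collections/{collection}",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical local instance and collection name.
    request = create_collection_request("http://localhost:6333", "documents", 1536)
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```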

🎨 Customization Options

📈 Real-World Application

This workflow was developed to process business documents and legal agreements, making them searchable through semantic queries. It’s particularly useful for organizations that handle large volumes of regulatory documentation and need it to be quickly accessible and searchable.

Chat Interface Testing

The integrated chatbot interface allows users to:

🌟 Benefits

👨‍💻 About the Creator

Jeremy Dawes is the CEO of Jezweb, specializing in AI and automation deployment solutions. This workflow represents practical, production-ready automation that solves real business challenges while maintaining simplicity and reliability.

📝 Notes

🔗 Resources


This workflow demonstrates practical automation that bridges document management with modern AI capabilities, creating intelligent document processing systems that scale with your needs.

🔗 Nodes Used

Google Drive, Google Drive Trigger, AI Agent, Embeddings OpenAI, Simple Memory, Recursive Character Text Splitter

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File
