📖 Auto-update knowledge base with Drive, LlamaIndex & Azure OpenAI embeddings

⚡ 429 views · 📖 Internal Wiki & Knowledge Base

Description

This workflow auto-ingests Google Drive documents, parses them with LlamaIndex, and stores Azure OpenAI embeddings in an in-memory vector store, cutting manual update time from ~30 minutes to under 2 minutes per doc.

Why Use This Workflow?

Cost Reduction: Eliminates the monthly fee for a cloud service used only to store knowledge.

Ideal For

How It Works

  1. Trigger: Google Drive Trigger watches a specific document or folder for updates.
  2. Data Collection: The updated file is downloaded from Google Drive.
  3. Processing: The file is uploaded to LlamaIndex Cloud via an HTTP Request node to create a parsing job.
  4. Intelligence Layer: The workflow polls the LlamaIndex job status (Wait + Monitor loop). When the parsing status equals SUCCESS, the result is retrieved as markdown.
  5. Output & Delivery: Parsed markdown is loaded into LangChain’s Default Data Loader, passed to Azure OpenAI embeddings (deployment “3small”), then inserted into an in-memory vector store.
  6. Storage & Logging: Vector store holds embeddings in memory (good for prototyping). Optionally persist to an external vector DB for production.
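
The Wait + Monitor loop in step 4 can be sketched outside n8n as a plain polling function. This is a minimal illustration rather than the workflow's actual implementation: the names wait_for_parse, get_status, poll_interval_s and max_polls are hypothetical, and the status source is stubbed so no LlamaIndex call is made.

```python
import time
from typing import Callable


def wait_for_parse(
    get_status: Callable[[], str],
    poll_interval_s: float = 5.0,
    max_polls: int = 60,  # assumed safety cap; not part of the original workflow
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the job reports SUCCESS; return False if max_polls is exhausted."""
    for attempt in range(max_polls):
        # Equivalent of the Monitor + If nodes: fetch the status and compare it.
        if get_status() == "SUCCESS":
            return True
        # Equivalent of the Wait node: pause before the next status check.
        if attempt < max_polls - 1:
            sleep(poll_interval_s)
    return False


# Usage with a stubbed status source (no network calls are made here).
statuses = iter(["PENDING", "PENDING", "SUCCESS"])
waits = []
done = wait_for_parse(lambda: next(statuses), poll_interval_s=2.0, sleep=waits.append)
print(done, waits)
```

Passing the status check and the sleep function in as arguments keeps the loop testable; in a real script get_status would wrap the GET request to the job status endpoint.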

Setup Guide

Prerequisites

Requirement | Type | Purpose
n8n instance | Essential | Import and execute the workflow
Google Drive OAuth2 | Essential | Watch and download documents from Google Drive
LlamaIndex Cloud API | Essential | Parse and convert documents to structured markdown
Azure OpenAI Account | Essential | Generate embeddings (deployment named “3small”)
Persistent Vector DB (e.g., Pinecone) | Optional | Persist embeddings for production-scale search

Installation Steps

  1. Import the workflow JSON into your n8n instance (Workflow menu → Import from File).
  2. Configure credentials:
    • Azure OpenAI: Provide Endpoint, API Key and set deployment name.
    • LlamaIndex API: Create an HTTP Header Auth credential in n8n. Header Name: Authorization. Header Value: Bearer YOUR_API_KEY.
    • Google Drive OAuth2: Create OAuth 2.0 credentials in Google Cloud Console, enable Drive API, and configure the Google Drive OAuth2 credential in n8n.
  3. Update environment-specific values:
    • Replace the workflow’s Google Drive fileId with the ID of the file or folder you want to watch (do not commit real IDs to public repositories).
  4. Customize settings:
    • Polling interval (Wait node): adjust for faster or slower job status checks.
    • Target file or folder: selected on the Google Drive Trigger node.
    • Embedding model: change Azure OpenAI deployment if needed.
  5. Test execution:
    • Save changes and trigger a sample file update on Drive. Verify each node runs and the vector store receives embeddings.
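
For testing the LlamaIndex credential outside n8n, the header from step 2 can be built in a few lines. This is a sketch under assumptions: the function name and the LLAMAINDEX_API_KEY variable name are hypothetical, and only the "Authorization: Bearer YOUR_API_KEY" shape comes from the steps above.

```python
import os
from typing import Mapping, Optional


def llamaindex_auth_header(
    env: Optional[Mapping[str, str]] = None,
    var_name: str = "LLAMAINDEX_API_KEY",  # hypothetical variable name
) -> dict:
    """Return the Authorization header, reading the key from the environment."""
    source = os.environ if env is None else env
    api_key = source.get(var_name, "").strip()
    # Reject a missing key or the unreplaced placeholder from the setup guide.
    if not api_key or api_key == "YOUR_API_KEY":
        raise ValueError(f"{var_name} is unset or still the placeholder value")
    return {"Authorization": f"Bearer {api_key}"}


# Usage with an explicit mapping so the example does not depend on the real environment.
header = llamaindex_auth_header(env={"LLAMAINDEX_API_KEY": "llx-example"})
print(header)
```

Reading the key from the environment mirrors the advice to keep secrets in n8n Credentials rather than in node parameters.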

Technical Details

Core Nodes

Node | Purpose | Key Configuration
Knowledge Base Updated Trigger (Google Drive Trigger) | Triggers on file/folder changes | Set trigger type to specific file or folder; configure OAuth2 credential
Download Knowledge Document (Google Drive) | Downloads file binary | Operation: download; ensure OAuth2 credential is selected
Parse Document via LlamaIndex (HTTP Request) | Uploads file to LlamaIndex parsing endpoint | POST multipart/form-data to /parsing/upload; use HTTP Header Auth credential
Monitor Document Processing (HTTP Request) | Polls parsing job status | GET /parsing/job/{{jobId}}; check status field
Check Parsing Completion (If) | Branches on job status | Condition: {{$json.status}} equals SUCCESS
Retrieve Parsed Content (HTTP Request) | Fetches parsed markdown result | GET /parsing/job/{{jobId}}/result/markdown
Default Data Loader (LangChain) | Loads parsed markdown into document format | Use as document source for embeddings
Embeddings Azure OpenAI | Generates embeddings for documents | Credentials: Azure OpenAI; Model/Deployment: 3small
Insert Data to Store (vectorStoreInMemory) | Stores documents + embeddings | Use memory store for prototyping; switch to DB for persistence

Workflow Logic

Customization Options

Basic Adjustments:

Advanced Enhancements:

Scaling option:

Performance & Optimization

Metric | Expected Performance | Optimization Tips
Execution time (per doc) | ~10s–2min (depends on file size & LlamaIndex processing) | Chunk large docs; run embeddings in batches
API calls (per doc) | 3–8 (upload, poll(s), retrieve, embedding calls) | Increase poll interval; consolidate requests
Error handling | Retries via Wait loop and If checks | Add exponential backoff, failure notifications, and retry limits
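
The "chunk large docs; run embeddings in batches" tip could look roughly like the sketch below. The function names, the character-based chunk size, and the overlap are assumptions chosen for illustration; the workflow itself relies on the Default Data Loader's settings, which are not shown in this description.

```python
from typing import Iterator


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be non-negative and smaller than max_chars")
    step = max_chars - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        # Stop once the current window already reaches the end of the text.
        if start + max_chars >= len(text):
            break
        start += step
    return chunks


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive groups of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


chunks = chunk_text("abcdefghij", max_chars=4, overlap=1)
batches = list(batched(chunks, batch_size=2))
print(chunks, batches)
```

Each batch would then go to the embeddings endpoint in a single request, which reduces the number of API calls per document.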

Troubleshooting

Problem | Cause | Solution
Authentication errors | Invalid/missing credentials | Reconfigure n8n Credentials; do not paste API keys directly into nodes
File not found | Incorrect fileId or permissions | Verify Drive fileId and OAuth scopes; share file with the service account if needed
Parsing stuck in PENDING | LlamaIndex processing delay or rate limit | Increase Wait node interval, monitor LlamaIndex dashboard, add retry limits
Embedding failures | Model/deployment mismatch or quota limits | Confirm Azure deployment name (3small) and subscription quotas
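
For the "stuck in PENDING" case, one way to add retry limits is to precompute a capped exponential backoff schedule and stop when it runs out. The sketch below is an assumption-laden illustration: the function name and the default values (2 s base, doubling, 60 s cap, 6 retries) are hypothetical and should be tuned to your LlamaIndex rate limits.

```python
def backoff_delays(
    base_s: float = 2.0,
    factor: float = 2.0,
    cap_s: float = 60.0,
    max_retries: int = 6,
) -> list[float]:
    """Return one wait time per retry, growing geometrically and capped at cap_s."""
    return [min(cap_s, base_s * factor ** attempt) for attempt in range(max_retries)]


delays = backoff_delays()
print(delays)        # wait before each successive status check
print(sum(delays))   # worst-case total wait before giving up
```

Once the schedule is exhausted, the workflow should branch to a failure notification instead of looping back into the Wait node.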

Created by: khmuhtadin
Category: Knowledge Management
Tags: google-drive, llamaindex, azure-openai, embeddings, knowledge-base, vector-store


🔗 Nodes Used

HTTP Request, Google Drive, Google Drive Trigger, Simple Vector Store, Default Data Loader, Embeddings Azure OpenAI

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File
