πŸ“– Interactive knowledge base chat with Supabase RAG using AI πŸ“šπŸ’¬

⚑ 5,388 views Β· πŸ“– Internal Wiki & Knowledge Base

Description

Google Drive File Ingestion to Supabase for Knowledge Base πŸ“‚πŸ’Ύ

Overview 🌟

This n8n workflow automates the process of ingesting files from Google Drive into a Supabase database, preparing them for a knowledge base system. It supports text-based files (PDF, DOCX, TXT, etc.) and tabular data (XLSX, CSV, Google Sheets), extracting content, generating embeddings, and storing data in structured tables. This is a foundational workflow for building a company knowledge base that can be queried via a chat interface (e.g., using a RAG workflow). πŸš€

Problem Solved 🎯

Manually managing a knowledge base with files from Google Drive is time-consuming and error-prone. This workflow solves that by:

Target Audience:

Workflow Description πŸ”

This workflow listens for new or updated files in Google Drive, processes them based on their type, and stores the extracted data in Supabase tables for later retrieval. Here’s how it works:

  1. File Detection: Triggers when a file is created or updated in Google Drive.
  2. File Processing: Loops through each file, extracts metadata, and validates the file type.
  3. Duplicate Check: Ensures the file hasn’t been processed before.
  4. Content Extraction:
    • Text-based Files: Downloads the file, extracts text, splits it into chunks, generates embeddings, and stores the chunks in Supabase.
    • Tabular Files: Extracts data from spreadsheets and stores it as rows in Supabase.
  5. Metadata Storage: Stores file metadata and basic info in Supabase tables.
  6. Error Handling: Logs errors for unsupported formats or duplicates.

Nodes Breakdown πŸ› οΈ

1. Detect New File πŸ””

2. Detect Updated File πŸ””

3. Process Each File πŸ”„

4. Extract File Metadata πŸ†”

5. Check File Type βœ…

6. Find Duplicates πŸ”

7. Handle Duplicates πŸ”„

8. Remove Old Text Data πŸ—‘οΈ

9. Remove Old Data πŸ—‘οΈ

10. Route by File Type πŸ”€

11. Download File Content πŸ“₯

12. Extract PDF Text πŸ“œ

13. Extract DOCX Text πŸ“œ

14. Extract XLSX Data πŸ“Š

15. Split Text into Chunks βœ‚οΈ

16. Generate Text Embeddings 🌐

17. Store Text in Supabase πŸ’Ύ

18. Store Tabular Data πŸ’Ύ

19. Store File Metadata πŸ“‹

20. Record in Knowledge Base πŸ“š

21. Log File Errors ⚠️

22. Log Duplicate Errors ⚠️

Interactive Knowledge Base Chat with Supabase RAG using GPT-4o-mini πŸ“šπŸ’¬

Introduction 🌟

This n8n workflow creates an interactive chat interface that allows users to query a company knowledge base using Retrieval-Augmented Generation (RAG). It retrieves relevant information from text documents and tabular data stored in Supabase, then generates natural language responses using OpenAI’s GPT-4o-mini model. Designed for teams managing internal knowledge, this workflow enables users to ask questions like β€œWhat’s the remote work policy?” or β€œShow me the latest budget data” and receive accurate, context-aware responses in a conversational format. πŸš€

Problem Statement 🎯

Managing a company knowledge base can be a daunting taskβ€”employees often struggle to find specific information buried in documents or spreadsheets, leading to wasted time and inefficiencies. Traditional search methods may not understand natural language queries or provide contextually relevant results. This workflow solves these issues by:

Target Audience πŸ‘₯

This workflow is ideal for:

Workflow Description πŸ”

This workflow consists of a chat interface powered by n8n’s Chat Trigger node, an AI Agent node for RAG, and several tools to retrieve data from Supabase. Here’s how it works step-by-step:

  1. User Initiates a Chat: The user interacts with a chat interface, sending queries like β€œSummarize our remote work policy” or β€œShow budget data for Q1 2025.”
  2. Query Processing with RAG: The AI Agent processes the query using RAG, retrieving relevant data from Supabase tables and generating a response with OpenAI’s GPT-4o-mini model.
  3. Data Retrieval and Response Generation: The workflow uses multiple tools to fetch data:
    • Retrieves text chunks from the documents table using vector search.
    • Fetches tabular data from the document_rows table based on file IDs.
    • Extracts full document text or lists available files as needed.
    • Generates a natural language response combining the retrieved data.
  4. Conversation History Management: Stores the conversation history in Supabase to maintain context for follow-up questions.
  5. Response Delivery: Formats and sends the response back to the chat interface for the user to view.

Nodes Breakdown πŸ› οΈ

1. Start Chat Interface πŸ’¬

2. Process Query with RAG 🧠

3. Retrieve Text Chunks πŸ“„

4. Fetch Tabular Data πŸ“Š

5. Extract Full Document Text πŸ“œ

6. List Available Files πŸ“‹

7. Manage Chat History πŸ’Ύ

8. Format and Send Response πŸ“€

Setup Instructions πŸ› οΈ

Prerequisites πŸ“‹

  1. n8n Setup:
    • Ensure you’re using n8n version 1.0 or higher.
    • Enable the AI features in n8n settings.
  2. Supabase:
    • Create a Supabase project and set up the following tables:
      • documents: id (uuid), content_text (text), embedding (vector(1536)), metadata (jsonb)
      • document_rows: id (uuid), dataset_id (varchar), row_data (jsonb)
      • document_metadata: file_id (varchar), file_name (varchar), file_type (varchar), file_url (text)
      • knowledge_base: id (serial), file_id (varchar), file_name (varchar), file_type (varchar), file_url (text), upload_date (timestamp)
      • n8n_chat_history: id (serial), session_id (varchar), message (text), sender (varchar), timestamp (timestamp)
    • Add the match_documents function to Supabase to enable vector search:
      CREATE OR REPLACE FUNCTION match_documents (
        query_embedding vector(1536),
        match_count int DEFAULT 5,
        filter jsonb DEFAULT '{}'
      ) RETURNS TABLE (
        id uuid,
        content_text text,
        metadata jsonb,
        similarity float
      ) LANGUAGE plpgsql AS $$  
      BEGIN
        RETURN QUERY
        SELECT
          documents.id,
          documents.content_text,
          documents.metadata,
          1 - (documents.embedding <=> query_embedding) as similarity
        FROM documents
        WHERE documents.metadata @> filter
        ORDER BY similarity DESC
        LIMIT match_count;
      END;
        $$;

πŸ”— Nodes Used

Postgres, Google Drive, Gmail, Google Drive Trigger, Supabase, AI Agent

πŸ“₯ Import

Download workflow.json and import into n8n: Workflow menu β†’ Import from File

πŸ“– Importing guide Β· πŸ”‘ Credential setup