🔬 Build a PDF search system with Mistral OCR and Weaviate DB


Description

Build a PDF to Vector RAG System: Mistral OCR, Weaviate Database and MCP Server

A RAG (Retrieval-Augmented Generation) workflow for n8n that extracts text from uploaded PDF documents with Mistral OCR, splits and embeds it with Cohere, stores the resulting vectors in Weaviate, and exposes search over them through an MCP server.

🚀 Features

🛠️ Technologies Used

📋 Prerequisites

Before using this template, you’ll need to set up the following credentials:

  1. Mistral Cloud API: For PDF text extraction
  2. Weaviate API: For vector database operations
  3. Cohere API: For embeddings and reranking
  4. HTTP Header Auth: For MCP server authentication

🔧 Setup Instructions

  1. Import the template into your n8n instance
  2. Configure credentials for all required services
  3. Set up a Weaviate collection named “KnowledgeDocuments”
  4. Configure webhook paths for the MCP server and form trigger
  5. Test the workflow by uploading a PDF document
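
Step 3 can be done from the Weaviate console or through Weaviate's REST schema endpoint. The following is a minimal sketch that only builds the request rather than sending it. It assumes the collection needs no built-in vectorizer because the embeddings come from the Cohere node in the workflow; the base URL, the API key, and the function name are placeholders, not part of the template.

```python
import json
import urllib.request

COLLECTION_NAME = "KnowledgeDocuments"  # name the template expects


def build_create_collection_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/schema request for the collection."""
    payload = {
        "class": COLLECTION_NAME,
        # Assumption: vectors are supplied by the workflow's Cohere embeddings
        # node, so Weaviate's own vectorizer is switched off.
        "vectorizer": "none",
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL and key; replace with your own cluster details.
    req = build_create_collection_request("https://example.weaviate.cloud", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually create the collection: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the payload before anything touches the cluster.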

📊 Workflow Overview

PDF Upload → Text Extraction → Document Processing → Vector Storage → AI Search
     ↓              ↓                ↓                ↓              ↓
  Form Trigger → Mistral OCR → Prepare Metadata → Weaviate DB → MCP Server
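
The Document Processing stage splits the extracted text into chunks before embedding. The sketch below illustrates the general idea of recursive character splitting: try coarse separators first and fall back to finer ones only for pieces that are still too long. It is a simplified stand-in, not the code of the n8n node; the function name and separator list are assumptions, and chunk overlap is left out.

```python
from typing import List, Optional

# Coarse-to-fine separators; the empty string means "hard cut by length".
DEFAULT_SEPARATORS = ["\n\n", "\n", " ", ""]


def split_recursive(text: str, chunk_size: int,
                    separators: Optional[List[str]] = None) -> List[str]:
    """Split text into chunks of at most chunk_size characters."""
    if separators is None:
        separators = DEFAULT_SEPARATORS
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators or separators[0] == "":
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too long: retry with the finer separators.
            chunks.extend(split_recursive(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


if __name__ == "__main__":
    sample = "alpha beta\n\ngamma delta epsilon\n\nzeta"
    for chunk in split_recursive(sample, chunk_size=12):
        print(repr(chunk))
```

Preferring paragraph and line breaks over hard cuts keeps related sentences in the same chunk, which tends to help retrieval quality.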

🎯 Use Cases

⚠️ Important Notes

📝 License

This template is provided as-is for educational and commercial use.

🔗 Nodes Used

Embeddings Cohere, Recursive Character Text Splitter, n8n Form Trigger, Default Data Loader, MCP Server Trigger, Reranker Cohere
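
The embeddings and reranker nodes suggest a two-stage search: a fast vector similarity lookup followed by a more careful reordering of the top candidates. The sketch below shows that pattern in plain Python under loud assumptions: the in-memory `store`, the toy `word_overlap` scorer, and all function names are hypothetical stand-ins for Weaviate and the Cohere reranker, not their APIs.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def word_overlap(query: str, text: str) -> float:
    """Toy relevance score: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


def retrieve_then_rerank(
    query_text: str,
    query_vec: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    rerank_score: Callable[[str, str], float],
    top_k: int = 3,
    top_n: int = 1,
) -> List[str]:
    """Stage 1: take top_k by vector similarity. Stage 2: reorder, keep top_n."""
    candidates = sorted(
        store, key=lambda item: cosine(query_vec, item[1]), reverse=True
    )[:top_k]
    reranked = sorted(
        candidates, key=lambda item: rerank_score(query_text, item[0]), reverse=True
    )
    return [text for text, _ in reranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical chunks with made-up two-dimensional embeddings.
    store = [
        ("invoice total is 40 EUR", [1.0, 0.0]),
        ("invoice date", [0.9, 0.1]),
        ("cat pictures", [0.0, 1.0]),
    ]
    print(retrieve_then_rerank("invoice date", [1.0, 0.0], store, word_overlap, top_k=2))
```

The reranker only sees the few candidates that survive the vector search, so a slower but more precise scorer stays affordable.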

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File
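
Before importing, it can be useful to check which node types the file contains so you know which credentials to prepare. This is a small sketch that assumes the usual n8n export layout, a top-level `nodes` array whose entries carry a `type` field; the helper names and the sample data are illustrative only.

```python
import json
from typing import Any, Dict, List


def load_workflow(path: str) -> Dict[str, Any]:
    """Read an exported workflow file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def list_node_types(workflow: Dict[str, Any]) -> List[str]:
    """Return the distinct node types in the workflow, sorted alphabetically."""
    return sorted({node.get("type", "unknown") for node in workflow.get("nodes", [])})


if __name__ == "__main__":
    # Illustrative sample; in practice use load_workflow("workflow.json").
    sample = {
        "nodes": [
            {"name": "Upload form", "type": "n8n-nodes-base.formTrigger"},
            {"name": "Set fields", "type": "n8n-nodes-base.set"},
            {"name": "Set more fields", "type": "n8n-nodes-base.set"},
        ]
    }
    for node_type in list_node_types(sample):
        print(node_type)
```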
