🔬 Build document RAG system with Kimi-K2, Gemini embeddings and Qdrant

⚡ 2,035 views · 🔬 Document Extraction & Analysis

Description

screenshot

Generating contextual summaries is an token-intensive approach for RAG embeddings which can quickly rack up costs if your inference provider charges by token usage.

Featherless.ai is an inference provider with a different pricing model - they charge a flat subscription fee (starting from $10) and allows for unlimited token usage instead. If you’re typically spending over $10 - $25 a month, you may find Featherless to be a cheaper and more manageable option for your projects or team.

For this template, Featherless’s unlimited token usage is well suited for generating contextual summaries at high volumes for a majority of RAG workloads.

LLM: moonshotai/Kimi-K2-Instruct Embeddings: models/gemini-embedding-001

How it works

  1. A large document is imported into the workflow using the HTTP node and its text extracted via the Extract from file node. For this demonstration, the UK highway code is used an an example.
  2. Each page is processed individually and a contextual summary is generated for it. The contextual summary generation involves taking the current page, preceding and following pages together and summarising the contents of the current page.
  3. This summary is then converted to embeddings using Gemini-embedding-001 model. Note, we’re using a http request to use the Gemini embedding API as at time of writing, n8n does not support the new API’s schema.
  4. These embeddings are then stored in a Qdrant collection which can then be retrieved via an agent/MCP server or another workflow.

How to use

Requirements

Customising this workflow

🔗 Nodes Used

HTTP Request, Execute Sub-workflow, Execute Workflow Trigger, AI Agent, Call n8n Workflow Tool, Extract from File

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File

📖 Importing guide · 🔑 Credential setup