🔬 Create RAG-ready knowledge bases from websites using Apify, Gemini & Supabase

⚡ 368 views · 🔬 Document Extraction & Analysis

💡 Pro Tip: HTTP Request scraping tends to break when sites update their markup. If you’re scraping a major platform, check whether ScraperNode covers it; it has maintained scrapers for LinkedIn, Instagram, TikTok, YouTube, and 20+ other platforms that return structured data.


Description

Convert any website into a searchable vector database for AI chatbots. Submit a URL, choose a scraping scope, and this workflow handles the rest: scraping, cleaning, chunking, embedding, and storing in Supabase.
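
As a rough illustration of how a chosen scraping scope might translate into a crawler run, the sketch below builds (but does not send) a request to the Apify REST API. The actor ID, the three scope names, and the page and depth limits are assumptions made for illustration; the workflow's own "Run Apify Scraper" nodes may use a different actor or input.

```python
import json
from urllib.request import Request

# Assumed actor and scope presets; the workflow's real configuration may differ.
ACTOR_ID = "apify~website-content-crawler"
SCOPES = {
    "single_page": {"maxCrawlPages": 1, "maxCrawlDepth": 0},
    "limited": {"maxCrawlPages": 10, "maxCrawlDepth": 2},
    "full_site": {"maxCrawlPages": 1000, "maxCrawlDepth": 20},
}


def build_run_request(url: str, scope: str, token: str) -> Request:
    """Build (without sending) a POST request that would start an actor run."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    # Actor input: the start URL plus the limits for the chosen scope.
    payload = {"startUrls": [{"url": url}], **SCOPES[scope]}
    return Request(
        f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would start the run. Keeping the build step separate makes it easy to inspect the payload for a small scope before spending Apify credits on a full-site crawl.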

What it does

Requirements

Setup

  1. Create a Supabase documents table with an embedding column of type vector(768). Run the setup SQL query in your Supabase project to enable the vector store
  2. Add your Apify API token to all three “Run Apify Scraper” nodes
  3. Add your Supabase and Gemini credentials
  4. Test with a small site (5-10 pages) or a single page/URL first
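
The SQL query referenced in step 1 is not reproduced on this page. As a hedged sketch, the helper below renders a setup script in the style of the widely used LangChain/Supabase pgvector template, with the embedding dimension set to 768 to match step 1. The table name documents comes from the step itself; the column layout and the match_documents function are assumptions taken from that template, so compare the output with the query the workflow links to before running it.

```python
def render_setup_sql(table: str = "documents", dim: int = 768) -> str:
    """Return pgvector setup SQL for a LangChain-style vector store table."""
    if not table.isidentifier():
        raise ValueError("table must be a plain SQL identifier")
    if dim <= 0:
        raise ValueError("dim must be positive")
    return f"""
create extension if not exists vector;

create table if not exists {table} (
  id bigserial primary key,
  content text,            -- chunk text
  metadata jsonb,          -- source URL, title, etc.
  embedding vector({dim})  -- must match the embedding model's output size
);

create or replace function match_{table} (
  query_embedding vector({dim}),
  match_count int default null,
  filter jsonb default '{{}}'
) returns table (id bigint, content text, metadata jsonb, similarity float)
language plpgsql
as $$
#variable_conflict use_column
begin
  return query
  select id, content, metadata,
         1 - ({table}.embedding <=> query_embedding) as similarity
  from {table}
  where metadata @> filter
  order by {table}.embedding <=> query_embedding
  limit match_count;
end;
$$;
""".strip()
```

Printing render_setup_sql() gives a script that can be pasted into the Supabase SQL editor. If the embedding model is ever swapped for one with a different output size, the dim argument has to change with it, because pgvector rejects vectors whose length does not match the column.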

Next steps

Connect your vector store to an AI chatbot for RAG-powered Q&A, or build semantic search features into your apps.
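
To make the retrieval step concrete, here is a minimal pure-Python sketch of what semantic search does conceptually: rank stored chunks by cosine similarity to a query embedding. In practice Supabase performs this inside the database with pgvector; the toy 3-value vectors and the top_k helper are illustrative assumptions, not part of the workflow.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], rows: list[dict], k: int = 3) -> list[dict]:
    """Return the k rows whose 'embedding' is most similar to the query."""
    scored = [{**row, "similarity": cosine(query, row["embedding"])} for row in rows]
    return sorted(scored, key=lambda r: r["similarity"], reverse=True)[:k]


# Toy rows standing in for the documents table (real embeddings have 768 values).
rows = [
    {"content": "Pricing plans", "embedding": [1.0, 0.0, 0.0]},
    {"content": "Refund policy", "embedding": [0.0, 1.0, 0.0]},
    {"content": "Plan comparison", "embedding": [0.9, 0.1, 0.0]},
]
best = top_k([1.0, 0.0, 0.0], rows, k=2)
print([r["content"] for r in best])
```

A chatbot would embed the user's question with the same Gemini embedding model used at ingestion time, retrieve the top matches this way, and pass their content to the language model as context.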

Tip: Start with page limits to test content quality before full-site scraping. Review chunks in Supabase and adjust Apify filters if needed for better vector embeddings.
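
One hedged way to review chunk quality after a limited test run is to export the content column and summarise it, flagging chunks that are very short or that repeat (often navigation or cookie-banner text). The function name and the 50-character threshold below are arbitrary illustrations.

```python
from collections import Counter
from statistics import mean


def review_chunks(chunks: list[str], min_chars: int = 50) -> dict:
    """Summarise chunk lengths and flag likely low-quality chunks."""
    if not chunks:
        return {"count": 0, "mean_chars": 0.0, "too_short": [], "duplicates": []}
    # Collapse runs of whitespace so near-identical chunks compare equal.
    normalized = [" ".join(c.split()) for c in chunks]
    counts = Counter(normalized)
    return {
        "count": len(chunks),
        "mean_chars": mean(len(c) for c in normalized),
        "too_short": [c for c in normalized if len(c) < min_chars],
        "duplicates": [c for c, n in counts.items() if n > 1],
    }
```

Many short or duplicated chunks usually point to page chrome leaking into the scrape, which is the cue to tighten the Apify filters before a full-site run.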


Sample Outputs

Apify actor “runs” in the Apify Dashboard from this workflow

Supabase documents table with scraped website content ingested in chunks with vector embeddings

🔗 Nodes Used

HTTP Request, Recursive Character Text Splitter, n8n Form Trigger, Supabase Vector Store, Default Data Loader, Embeddings Google Gemini
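
For readers unfamiliar with the Recursive Character Text Splitter node, the sketch below shows the general idea in simplified form: split on the coarsest separator first (paragraphs), and fall back to finer separators only for pieces that are still too long. It is not the node's actual implementation; the separator list, the default chunk size, and the absence of chunk overlap are simplifying assumptions.

```python
def split_recursive(text: str, chunk_size: int = 200,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ")) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first and falls back to finer ones
    only for pieces that are still too long. No chunk overlap is applied.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    text = text.strip()
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: hard-cut into fixed-size slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        piece = piece.strip()
        if not piece:
            continue
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too long: retry with the finer separators.
            chunks.extend(split_recursive(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph and line boundaries before resorting to word boundaries tends to keep related sentences in the same chunk, which generally helps embedding quality.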

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File

📖 Importing guide · 🔑 Credential setup