🔬 Create AI-ready vector datasets from web content with Claude, Ollama & Qdrant

1,458 views · 🔬 Document Extraction & Analysis

💡 Pro Tip — HTTP Request scraping tends to break when sites update their markup. If you’re scraping a major platform, check if ScraperNode covers it — it has maintained scrapers for LinkedIn, Instagram, TikTok, YouTube, and 20+ other platforms that return structured data.

View All Scrapers

Description

AI-Powered Web Data Pipeline with n8n

f70ca7d4da7544ce92333c57c5e62954.png

How It Works

This n8n workflow builds an AI-powered web data pipeline that automates the entire process of:

It integrates multiple advanced tools to transform messy web pages into clean, searchable vector databases.

Integrated Tools

From messy webpages to structured vector data — this pipeline is perfect for building intelligent agents, knowledge bases, or research automation tools.


Setup Steps

1. Install n8n

> Requires Node.js v18 / v20 / v22

npm install -g n8n
n8n

After installation, access the n8n interface via:

URL: http://localhost:5678


2. Set Up Scrapeless

  1. Register at: Scrapeless
  2. Copy your API token
  3. Paste the token into the HTTP Request node labeled “Scrapeless Web Request”

3. Set Up Claude API (Anthropic)

  1. Sign up at Anthropic Console
  2. Generate your Claude API key
  3. Add the API key to the following nodes:
    • Claude Extractor
    • AI Data Checker
    • Claude AI Agent

4. Install and Run Ollama

macOS

brew install ollama

Linux

curl -fsSL https://ollama.com/install.sh | sh

Windows Download the installer from: https://ollama.com

Start Ollama Server

ollama serve

Pull Embedding Model

ollama pull all-minilm

5. Install Qdrant (via Docker)

docker pull qdrant/qdrant

docker run -d \
  --name qdrant-server \
  -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant

Test if Qdrant is running:

curl http://localhost:6333/healthz

6. Configure the n8n Workflow

Ideal Use Cases

🔗 Nodes Used

HTTP Request

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File

📖 Importing guide · 🔑 Credential setup