🔬 Extract Clean Web Content with Anti-Bot Fallback for AI Agents & Workflows

⚡ 638 views · 🔬 Document Extraction & Analysis

Description

This workflow contains community nodes that are only compatible with the self-hosted version of n8n.

Clean Web Content Extraction with Anti-Bot Fallback

Extract clean and structured text from any webpage with optional fallback to an anti-bot scraping service. Ideal for AI tools and content workflows.

🧠 How it Works

This sub-workflow enables reliable and clean scraping of any public webpage by simply passing a url parameter. It is designed to be embedded into other workflows or used as a tool for AI agents.

It supports two output modes:

💡 If the site is protected by an anti-bot system (such as Cloudflare), the workflow automatically falls back to Scrape.do, a scraping API with a generous free plan.
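The fallback behaviour described above can be sketched in Python. This is a minimal illustration, not the workflow's actual node logic: the `looks_blocked` heuristic, the status codes, the challenge markers, and the function names are all assumptions made for the example. The fetchers are passed in as plain callables so the decision logic can be read (and tried out) without any network access.

```python
from typing import Callable, Tuple

# Hypothetical signals of an anti-bot block (assumptions, not taken from the workflow).
BLOCKED_STATUSES = {403, 429, 503}
CHALLENGE_MARKERS = ("just a moment", "attention required", "cf-chl")

# A fetcher takes a URL and returns (HTTP status code, response body).
Fetcher = Callable[[str], Tuple[int, str]]


def looks_blocked(status: int, body: str) -> bool:
    """Heuristic: a typical block status or challenge text counts as blocked."""
    if status in BLOCKED_STATUSES:
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


def fetch_with_fallback(url: str, direct: Fetcher, fallback: Fetcher) -> Tuple[str, str]:
    """Try the direct fetcher first; use the fallback only if that fails or looks blocked.

    Returns (source, body), where source is "direct" or "fallback".
    """
    try:
        status, body = direct(url)
        if not looks_blocked(status, body):
            return "direct", body
    except OSError:
        pass  # network-level failure: fall through to the fallback

    status, body = fallback(url)
    if looks_blocked(status, body):
        # Roughly the role a Stop and Error node would play in the workflow.
        raise RuntimeError(f"both fetchers failed for {url}")
    return "fallback", body
```

Because the fallback is called only after the direct attempt fails, a page that loads normally never consumes a scraping-API credit.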

🧩 This template requires the n8n-nodes-webpage-content-extractor community node, so it only works in self-hosted n8n environments.

🚀 Use Cases

Well suited to chatbots, summarization workflows, and RSS/feed enrichment. It lets your AI Agent browse websites and extract readable content automatically.

🔖 Parameters

⚙️ Setup

The Scrape.do API is only used as a fallback when conventional scraping fails, helping you preserve your API credits.
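As a rough sketch of what the fallback request might look like, the snippet below builds a Scrape.do request URL for a target page. The endpoint and the `token` and `url` query parameter names are assumptions here and should be checked against the Scrape.do documentation; `build_fallback_url` is a hypothetical helper, not part of the workflow.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names; confirm against the Scrape.do documentation.
SCRAPE_DO_ENDPOINT = "https://api.scrape.do/"


def build_fallback_url(target_url: str, token: str) -> str:
    """Return the proxy request URL with the target URL percent-encoded."""
    if not token:
        raise ValueError("an API token is required for the fallback request")
    query = urlencode({"token": token, "url": target_url})
    return f"{SCRAPE_DO_ENDPOINT}?{query}"
```

Percent-encoding the target URL matters because it may carry its own query string, which would otherwise be parsed as parameters of the proxy request.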

🔗 Nodes Used

HTTP Request, Stop and Error, Execute Workflow Trigger

📥 Import

Download workflow.json and import it into n8n: Workflow menu → Import from File
