Extract Clean Web Content with Anti-Bot Fallback for AI Agents & Workflows
638 views · Document Extraction & Analysis
Pro Tip: HTTP Request scraping tends to break when sites update their markup. If you're scraping a major platform, check whether ScraperNode covers it; it has maintained scrapers for LinkedIn, Instagram, TikTok, YouTube, and 20+ other platforms that return structured data.
Description
This workflow contains community nodes that are only compatible with the self-hosted version of n8n.
Clean Web Content Extraction with Anti-Bot Fallback
Extract clean and structured text from any webpage with optional fallback to an anti-bot scraping service. Ideal for AI tools and content workflows.
How it Works
This sub-workflow enables reliable and clean scraping of any public webpage by simply passing a url parameter. It is designed to be embedded into other workflows or used as a tool for AI agents.
It supports two output modes:
- fulltext: true → returns { title, text } with full page content
- fulltext: false → returns { title, url, content } with a short excerpt
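As a rough illustration of the two output shapes, the sketch below builds either result from already-extracted fields. The function name `build_output` and the excerpt length are hypothetical; the workflow itself does not document how long the excerpt is.

```python
from typing import Dict

# Hypothetical value: the source does not state the excerpt size.
EXCERPT_LENGTH = 200


def build_output(title: str, url: str, text: str, fulltext: bool) -> Dict[str, str]:
    """Return the payload for the requested output mode."""
    if fulltext:
        # fulltext: true -> { title, text } with the full page content
        return {"title": title, "text": text}
    # fulltext: false -> { title, url, content } with a short excerpt
    return {"title": title, "url": url, "content": text[:EXCERPT_LENGTH]}
```

The full-text mode drops `url` and the excerpt mode drops `text`, so a caller should branch on the mode it requested rather than probing for keys.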
If the site is protected by anti-bot systems (such as Cloudflare), the workflow automatically falls back to Scrape.do, a scraping API with a generous free plan.
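The fallback idea can be sketched outside n8n as follows. This is a minimal sketch, not the workflow's actual node logic: the `looks_blocked` heuristic, the function names, and the injected `fetch` callable are all hypothetical, and the Scrape.do endpoint shape (`token` and `url` query parameters) is an assumption that should be checked against the Scrape.do documentation.

```python
from typing import Callable, Tuple
from urllib.parse import urlencode

# Assumed endpoint shape; verify against the Scrape.do documentation.
SCRAPE_DO_ENDPOINT = "https://api.scrape.do/"

# A fetcher takes a URL and returns (status_code, body).
Fetcher = Callable[[str], Tuple[int, str]]


def build_scrape_do_url(token: str, target_url: str) -> str:
    """Build the fallback request URL with the target URL percent-encoded."""
    return SCRAPE_DO_ENDPOINT + "?" + urlencode({"token": token, "url": target_url})


def looks_blocked(status: int) -> bool:
    """Hypothetical heuristic: treat common anti-bot statuses as a block."""
    return status in (403, 429, 503)


def fetch_with_fallback(url: str, token: str, fetch: Fetcher) -> str:
    """Try a direct request first; use Scrape.do only if it looks blocked."""
    status, body = fetch(url)
    if not looks_blocked(status):
        return body
    status, body = fetch(build_scrape_do_url(token, url))
    if status >= 400:
        # Plays the role of the Stop and Error node in the workflow.
        raise RuntimeError(f"Scraping failed for {url} (status {status})")
    return body
```

Passing the fetcher in as an argument keeps the sketch independent of any HTTP library, and trying the direct request first means the paid API is only called when needed.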
This template requires the n8n-nodes-webpage-content-extractor community node, so it only works in self-hosted n8n environments.
Use Cases
- As a reusable sub-workflow, via Execute Sub-workflow node.
- As a tool for an AI Agent, compatible with Call n8n Workflow Tool.
Perfect for chatbots, summarization workflows, or RSS/feed enrichment. Empowers your AI Agent with the ability to browse and extract readable content from websites automatically.
Parameters
- url (string): the webpage URL to scrape
- fulltext (boolean): set true for full page content, false for summarized output
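A caller might validate these two parameters before invoking the sub-workflow. The sketch below is illustrative only: `parse_params` is a hypothetical helper, and defaulting `fulltext` to false when it is omitted is an assumption, since the template does not state a default.

```python
from typing import Any, Dict, Tuple
from urllib.parse import urlparse


def parse_params(payload: Dict[str, Any]) -> Tuple[str, bool]:
    """Validate the sub-workflow input and return (url, fulltext)."""
    url = payload.get("url")
    if not isinstance(url, str) or urlparse(url).scheme not in ("http", "https"):
        raise ValueError("url must be an http(s) URL string")
    # Assumption: treat a missing fulltext flag as false (excerpt mode).
    fulltext = payload.get("fulltext", False)
    if not isinstance(fulltext, bool):
        raise ValueError("fulltext must be a boolean")
    return url, fulltext
```

For example, `parse_params({"url": "https://example.com", "fulltext": True})` passes both values through unchanged, while a string such as `"true"` for `fulltext` is rejected.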
Setup
- Install the community node n8n-nodes-webpage-content-extractor in your self-hosted n8n instance.
- Create a free account at Scrape.do and obtain your API Token.
- In the workflow, locate the Scrape.do HTTP Request node and configure the credentials using your API Token.
- Detailed step-by-step instructions are available in the workflow notes.
The Scrape.do API is only used as a fallback when conventional scraping fails, helping you preserve your API credits.
Nodes Used
HTTP Request, Stop and Error, Execute Workflow Trigger
Import
Download workflow.json and import into n8n:
Workflow menu → Import from File