Convert URL HTML to markdown format and get page links
4,710 views · Document Extraction & Analysis
Pro Tip: HTTP Request scraping tends to break when sites update their markup. If you're scraping a major platform, check whether ScraperNode covers it: it has maintained scrapers for LinkedIn, Instagram, TikTok, YouTube, and 20+ other platforms that return structured data.
Description
Use Case
Transform web pages into AI-friendly markdown format:
- You need to process webpage content for LLM analysis
- You want to extract both content and links from web pages
- You need clean, formatted text without HTML markup
- You want to respect API rate limits while crawling pages
What this Workflow Does
The workflow uses the Firecrawl.dev API to process webpages:
- Converts HTML content to markdown format
- Extracts all links from each webpage
- Handles API rate limiting automatically
- Processes URLs in batches from your database
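The batching and rate-limit handling described above can be sketched outside n8n as a small loop that processes a fixed number of URLs, then waits before the next group. This is a minimal sketch, not the workflow's actual node configuration: `batch_size` and `pause_seconds` are assumed values, and `fetch` is a hypothetical placeholder for whatever call sends one URL to the API.

```python
import time
from typing import Callable, Iterator


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(
    urls: list[str],
    fetch: Callable[[str], dict],
    batch_size: int = 10,          # assumed value, not taken from the workflow
    pause_seconds: float = 60.0,   # assumed value, not taken from the workflow
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Call `fetch` for every URL, pausing between batches to stay under a rate limit."""
    results: list[dict] = []
    batches = list(batched(urls, batch_size))
    for index, batch in enumerate(batches):
        for url in batch:
            results.append(fetch(url))
        # No need to wait after the final batch.
        if index < len(batches) - 1:
            sleep(pause_seconds)
    return results
```

Passing `sleep` as a parameter keeps the pause easy to replace in tests or to swap for a different waiting strategy.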
Setup
- Create a Firecrawl.dev account and get your API key
- Add your Firecrawl API key to the HTTP Request node's Authorization header
- Connect your URL database to the input node (column name must be "Page") or edit the array in Example fields from data source
- Configure your preferred output database connection
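For readers who want to see what the HTTP Request node is sending, the setup steps roughly correspond to the request built below. This is a sketch under stated assumptions: the endpoint URL, the `Bearer` scheme, and the `formats` payload reflect Firecrawl's v1 scrape route as commonly documented, not anything stated in this template, so check them against the current API reference. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: Firecrawl's v1 scrape route. Verify against the current API docs.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def urls_from_rows(rows: list[dict]) -> list[str]:
    """Collect non-empty values from the "Page" column of the input rows."""
    return [row["Page"].strip() for row in rows if row.get("Page", "").strip()]


def build_scrape_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for markdown and links."""
    # Assumed payload shape: one URL plus the two output formats the workflow uses.
    payload = {"url": page_url, "formats": ["markdown", "links"]}
    return urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and decoding the JSON body.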
How to Adjust it to Your Needs
- Modify the input source to pull URLs from different databases
- Adjust rate limiting parameters if needed
- Customize output format for your specific use case
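As one illustration of customizing the output, the sketch below flattens an API response into a single row suitable for a spreadsheet or database. It assumes the response nests `markdown` and `links` under a `data` key. The function name and the output column names other than "Page" are hypothetical.

```python
from typing import Any


def to_output_row(page_url: str, response: dict[str, Any]) -> dict[str, Any]:
    """Flatten one scrape response into a row for the output database."""
    # Assumed response shape: {"data": {"markdown": "...", "links": [...]}}
    data = response.get("data") or {}
    links = data.get("links") or []
    return {
        "Page": page_url,
        "Markdown": data.get("markdown", ""),
        "Links": "\n".join(links),  # one link per line in a single cell
        "LinkCount": len(links),
    }
```

Joining the links into one newline-separated string keeps each page on a single row. A target that supports array columns could store the list directly instead.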
More templates and n8n workflows >>> @simonscrapes
Nodes Used
HTTP Request
Import
Download workflow.json and import into n8n:
Workflow menu → Import from File