Firecrawl website content extractor
713 views · Market Research & Insights
Pro Tip: HTTP Request scraping tends to break when sites update their markup. If you're scraping a major platform, check if ScraperNode covers it; it has maintained scrapers for LinkedIn, Instagram, TikTok, YouTube, and 20+ other platforms that return structured data.
Description
Firecrawl Website Content Extractor (n8n Workflow)
This n8n automation workflow uses the Firecrawl API to extract structured data (e.g., quotes and authors) from web pages, such as Quotes to Scrape, and handles retries in case of delayed extraction.
Workflow Overview
Purpose:
- Crawl and extract structured web data using Firecrawl
- Wait for asynchronous scraping to complete
- Retrieve and validate results
- Support retries if content is not ready
Step-by-Step Node Breakdown
1. Manual Trigger
- Node: `When clicking "Test workflow"`
- Used to manually test or execute the workflow during setup or debugging.
2. Firecrawl Extract API Request
- Node: `Extract`
- Sends a `POST` request to `https://api.firecrawl.dev/v1/extract`
- Payload includes:
  - `urls`: List of pages to crawl (`https://quotes.toscrape.com/*`)
  - `prompt`: "Extract all quotes and their corresponding authors from the website."
  - `schema`: JSON schema defining the expected structure (`quotes[]`, each with `text` and `author`)
> Uses an HTTP Header Auth credential for the Firecrawl API
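As a rough sketch of what the `Extract` node sends, the Python snippet below builds the same request without sending it. The `Bearer` authorization header and the exact JSON schema body are assumptions; the workflow only states that an HTTP Header Auth credential is used and that the schema describes `quotes[]` with `text` and `author`. `build_extract_request` is a hypothetical helper name.

```python
import json
import urllib.request

EXTRACT_URL = "https://api.firecrawl.dev/v1/extract"  # endpoint named in the workflow


def build_extract_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request like the one the Extract node issues."""
    payload = {
        "urls": ["https://quotes.toscrape.com/*"],
        "prompt": "Extract all quotes and their corresponding authors from the website.",
        # Assumed schema layout: an object holding a list of {text, author} items.
        "schema": {
            "type": "object",
            "properties": {
                "quotes": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "text": {"type": "string"},
                            "author": {"type": "string"},
                        },
                        "required": ["text", "author"],
                    },
                }
            },
            "required": ["quotes"],
        },
    }
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the Header Auth credential carries a Bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would actually submit the job; keeping construction separate makes the payload easy to inspect first.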
3. Wait for 30 Seconds
- Node: `30 Secs`
- Gives Firecrawl time to finish processing in the background
- Prevents hitting the API before results are ready
4. Get Results
- Node: `Get Results`
- Performs a `GET` request to the status URL using `{{ $('Extract').item.json.id }}` to retrieve extraction results.
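The status URL itself is not spelled out in this description. A minimal Python sketch, assuming it is the extract endpoint followed by the job `id` returned by the `Extract` step (`status_url` is a hypothetical helper):

```python
from urllib.parse import quote

# Assumption: results are fetched from <extract endpoint>/<job id>.
API_BASE = "https://api.firecrawl.dev/v1/extract"


def status_url(extract_response: dict) -> str:
    """Return the URL to poll for the job id found in the Extract response."""
    job_id = extract_response.get("id")
    if not job_id:
        raise ValueError("Extract response has no 'id'; cannot poll for results")
    # Percent-encode the id so unexpected characters cannot alter the path.
    return f"{API_BASE}/{quote(str(job_id), safe='')}"
```

Failing early on a missing `id` keeps the later polling step from requesting a malformed URL.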
5. Condition Check
- Node: `If`
- Checks if the `data` array is empty (i.e., no results yet)
- If data is empty:
  - Waits 10 more seconds and retries
- If data is available:
  - Passes data to the next step (e.g., processing or storage)
6. Retry Delay
- Node: `10 Seconds`
- Waits briefly before sending another `GET` request to Firecrawl
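The wait, check, and retry steps (nodes 3 to 6) amount to a polling loop. The sketch below expresses that loop in Python with an injected `fetch` callable standing in for the `Get Results` request. The `max_attempts` cap is an assumption added here so the loop always terminates; the workflow description does not mention one.

```python
import time
from typing import Any, Callable, Optional


def poll_for_data(
    fetch: Callable[[], dict],
    initial_wait: float = 30.0,   # mirrors the "30 Secs" node
    retry_wait: float = 10.0,     # mirrors the "10 Seconds" node
    max_attempts: int = 10,       # assumption: not part of the described workflow
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Wait, fetch results, and retry while the 'data' field is empty."""
    sleep(initial_wait)
    for attempt in range(max_attempts):
        data = fetch().get("data")
        if data:  # non-empty: hand the data to the next step
            return data
        if attempt < max_attempts - 1:
            sleep(retry_wait)  # empty: wait briefly, then ask again
    return None  # gave up after max_attempts empty responses
```

Injecting `sleep` and `fetch` lets the loop be exercised without real delays or network calls, for example with `sleep=lambda s: None`.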
7. Edit Fields (Optional Output Formatting)
- Node: `Edit Fields`
- Placeholder to structure or format the extracted results (quotes and authors)
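Since `Edit Fields` is only a placeholder, the formatting is up to you. One possible shape, sketched in Python: flatten the extracted quotes into one row per quote. The output keys `quote` and `author`, the `"Unknown"` fallback, and the helper name `format_quotes` are all illustrative assumptions.

```python
from typing import Dict, List


def format_quotes(data: dict) -> List[Dict[str, str]]:
    """Turn {'quotes': [{'text', 'author'}, ...]} into clean one-per-quote rows."""
    rows: List[Dict[str, str]] = []
    for item in data.get("quotes", []):
        text = str(item.get("text", "")).strip()
        author = str(item.get("author", "")).strip()
        if text:  # skip entries with no quote text
            rows.append({"quote": text, "author": author or "Unknown"})
    return rows
```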
Sticky Note: Firecrawl Setup Guide
Included as an embedded reference:
- 10% Firecrawl Discount
- Instructions to:
  - Add Firecrawl API credentials in n8n
  - Use Firecrawl Community Node for self-hosted instances
  - Set up the schema and prompt for targeted data extraction
Key Features
- API-based crawling with schema-structured output
- Smart waiting + retry mechanism
- AI prompt integration for intelligent data parsing
- Flexible for different URLs, prompts, and schemas
Sample Output Schema

    {
      "quotes": [
        {
          "text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
          "author": "Albert Einstein"
        },
        {
          "text": "It is our choices, Harry, that show what we truly are, far more than our abilities.",
          "author": "J.K. Rowling"
        }
      ]
    }
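To cover the "retrieve and validate results" goal, a result can be checked against this shape before it is passed on. The Python check below is a hand-rolled sketch, not a full JSON Schema validator, and `is_valid_quotes_payload` is a hypothetical name.

```python
def is_valid_quotes_payload(payload: object) -> bool:
    """Return True if payload looks like {'quotes': [{'text': str, 'author': str}, ...]}."""
    if not isinstance(payload, dict):
        return False
    quotes = payload.get("quotes")
    if not isinstance(quotes, list):
        return False
    # Every entry must be an object with string 'text' and 'author' fields.
    return all(
        isinstance(q, dict)
        and isinstance(q.get("text"), str)
        and isinstance(q.get("author"), str)
        for q in quotes
    )
```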
## Nodes Used
HTTP Request
## Import
Download [`workflow.json`](workflow.json) and import into n8n:
**Workflow menu → Import from File**
[Importing guide](../../../docs/importing-templates.md) · [Credential setup](../../../docs/credential-setup.md)