⚒️ Extract and analyze web data with Bright Data & Google Gemini

⚒️ Engineering


Description

This workflow performs structured data extraction and data mining from a web page by combining the capabilities of Bright Data and Google Gemini.

How it Works

This workflow extracts structured data from a web page using Bright Data’s Web Unlocker product. It then uses n8n’s AI nodes, backed by the Google Gemini Flash Exp model, for information extraction and custom sentiment analysis. The results are sent to webhooks and saved as local files.
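The data flow described above can be pictured as a small fetch, analyze, fan-out pipeline. The Python sketch below illustrates that ordering only and does not reproduce how n8n executes nodes. `run_pipeline`, `fetch_markdown`, the analyzer callables, `notify`, and `save` are all hypothetical stand-ins for the Bright Data request node, the Gemini-backed AI nodes, the webhook HTTP Request nodes, and the Read/Write Files from Disk nodes.

```python
from typing import Any, Callable, Dict

# Hypothetical callable types standing in for n8n nodes (not part of the workflow itself).
Fetch = Callable[[str, str], str]             # (url, zone) -> page markdown
Analyzer = Callable[[str], Dict[str, Any]]    # markdown -> structured result
Sink = Callable[[str, Dict[str, Any]], None]  # (result name, result) -> None


def run_pipeline(url: str, zone: str, fetch_markdown: Fetch,
                 analyzers: Dict[str, Analyzer], notify: Sink,
                 save: Sink) -> Dict[str, Dict[str, Any]]:
    """Fetch the page once, run each analyzer on the same markdown, then fan each result out."""
    markdown = fetch_markdown(url, zone)     # Bright Data Web Unlocker step
    results: Dict[str, Dict[str, Any]] = {}
    for name, analyze in analyzers.items():  # Gemini-backed AI nodes
        result = analyze(markdown)
        notify(name, result)                 # webhook notification step
        save(name, result)                   # write-to-disk step
        results[name] = result
    return results


if __name__ == "__main__":
    # Stub implementations so the sketch runs without network access or credentials.
    output = run_pipeline(
        "https://example.com", "my_unlocker_zone",  # hypothetical zone name
        fetch_markdown=lambda url, zone: f"# Page from {url}",
        analyzers={"topics": lambda md: {"topics": []},
                   "trends": lambda md: {"trends": []}},
        notify=lambda name, r: print("notify", name),
        save=lambda name, r: print("save", name),
    )
    print(output)
```

The page is fetched once and every analyzer reuses the same markdown. This mirrors how a single web request node feeds several AI nodes in the workflow.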

Use Cases

Setup Instructions

  1. Bright Data Credentials: You need a Bright Data account with a Web Unlocker zone. Update the Header Auth account credentials in the Perform Bright Data Web Request node.

  2. Google Gemini Credentials: Provide your Google Gemini(PaLM) Api account credentials for the AI-related nodes.

  3. Configure URL and Zone: In the Set URL and Bright Data Zone node, set the web URL you want to scrape and your Bright Data zone.

  4. Update Webhook: Update the Webhook Notification URL in the relevant HTTP Request nodes.
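For reference, the request sent by the Perform Bright Data Web Request node can be sketched with Python's standard library. This sketch makes several assumptions that you should check against your Bright Data dashboard and the imported node. It assumes Bright Data's Web Unlocker REST endpoint is `https://api.brightdata.com/request`. It assumes a Bearer API token in the `Authorization` header, which is what the Header Auth credential would normally supply. It also assumes a JSON body with `zone`, `url`, `format`, and `data_format` fields. `build_unlocker_request` is a hypothetical helper that only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed Web Unlocker endpoint; confirm it against Bright Data's documentation.
BRIGHT_DATA_ENDPOINT = "https://api.brightdata.com/request"


def build_unlocker_request(url: str, zone: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a Web Unlocker request that asks for markdown output."""
    body = {
        "zone": zone,               # value set in "Set URL and Bright Data Zone"
        "url": url,                 # the page to scrape
        "format": "raw",            # assumed: return the page body as-is
        "data_format": "markdown",  # assumed: ask Bright Data to convert the page to markdown
    }
    return urllib.request.Request(
        BRIGHT_DATA_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",  # what the Header Auth credential holds
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # "my_unlocker_zone" and the token are placeholders, not real values.
    req = build_unlocker_request("https://example.com", "my_unlocker_zone", "YOUR_API_TOKEN")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req).read().decode("utf-8")
```

Keeping the zone and URL as parameters matches the workflow's design. The Set node holds both values, so switching targets does not require touching the request node.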

Workflow Logic

  1. Trigger: The workflow is triggered manually.

  2. Set Parameters: It sets the target URL and the Bright Data zone.

  3. Web Request: The workflow requests the specified URL through Bright Data’s Web Unlocker and receives the page content formatted as markdown.

  4. Data Extraction & Analysis: The markdown content is then processed by multiple AI nodes to:

    • Extract textual data from the markdown.

    • Perform topic analysis with a structured response.

    • Analyze trends by location and category with a structured response.

  5. Output: The extracted data and analysis are sent to webhooks and saved as JSON files on disk.
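Steps 4 and 5 depend on the AI nodes returning JSON in a predictable shape. The sketch below shows one way to check a topic-analysis result before it is posted to a webhook or written to disk. The field names (`topics`, `topic`, `score`, `summary`) and the helper `parse_topics` are hypothetical. The real shape is whatever schema or example you configure in the Topic Extractor node.

```python
import json
from typing import Any, Dict, List

# Hypothetical shape for the topic extractor's output; adjust to the schema you configure.
REQUIRED_TOPIC_FIELDS = {"topic": str, "score": (int, float), "summary": str}


def parse_topics(raw: str) -> List[Dict[str, Any]]:
    """Parse the model's JSON text and check every topic entry has the expected fields and types."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("topics"), list):
        raise ValueError("expected an object with a 'topics' list")
    topics = data["topics"]
    for i, item in enumerate(topics):
        if not isinstance(item, dict):
            raise ValueError(f"topic {i} is not an object")
        for field, expected in REQUIRED_TOPIC_FIELDS.items():
            if field not in item:
                raise ValueError(f"topic {i} is missing '{field}'")
            value = item[field]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                raise ValueError(f"topic {i} field '{field}' has type {type(value).__name__}")
    return topics
```

Failing loudly on a malformed response keeps a bad model output from being written to the JSON files or forwarded to the webhook.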

Node Descriptions

Node Name | Description
--- | ---
When clicking ‘Test workflow’ | A manual trigger node that starts the workflow.
Set URL and Bright Data Zone | A Set node that defines the URL to scrape and the Bright Data zone to use.
Perform Bright Data Web Request | An httpRequest node that sends the request to Bright Data’s API to retrieve the page content.
Markdown to Textual Data Extractor | An AI node that uses Google Gemini to convert markdown content into plain text.
Google Gemini Chat Model | The Google Gemini model node used for data extraction.
Topic Extractor with the structured response | An AI node that performs topic analysis and outputs the results as structured JSON.
Trends by location and category with the structured response | An AI node that analyzes and clusters emerging trends by location and category and outputs structured JSON.
Initiate a Webhook Notification… | Nodes that send the output of the AI analysis to a webhook.
Create a binary file… | Function nodes that convert the JSON output into binary format so it can be written to a file.
Write the topics/trends file to disk | readWriteFile nodes that save the binary data to local files (d:\topics.json and d:\trends.json).
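The Create a binary file… Function nodes exist because the file-writing nodes consume binary data, not JSON. The Python sketch below mirrors that two-step hand-off. It assumes that an n8n item stores binary content as a base64 string under a binary property (here called `data`), with `mimeType` and `fileName` alongside. The original Function nodes are JavaScript, and `to_binary_item` and `write_binary_item` are hypothetical names used only for illustration.

```python
import base64
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def to_binary_item(result: Dict[str, Any], file_name: str) -> Dict[str, Any]:
    """Wrap a JSON result as base64 binary data, as the 'Create a binary file' step does."""
    text = json.dumps(result, indent=2)
    return {
        "json": result,
        "binary": {
            "data": {  # assumed binary property name
                "data": base64.b64encode(text.encode("utf-8")).decode("ascii"),
                "mimeType": "application/json",
                "fileName": file_name,
            }
        },
    }


def write_binary_item(item: Dict[str, Any], directory: Path) -> Path:
    """Decode the base64 payload and write it to disk, like the readWriteFile nodes."""
    binary = item["binary"]["data"]
    path = directory / binary["fileName"]
    path.write_bytes(base64.b64decode(binary["data"]))
    return path


if __name__ == "__main__":
    # Writes to a temporary directory instead of d:\ so the sketch runs on any OS.
    with tempfile.TemporaryDirectory() as tmp:
        item = to_binary_item({"topics": []}, "topics.json")
        print(write_binary_item(item, Path(tmp)).read_text(encoding="utf-8"))
```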

Customization Tips

Suggested Sticky Notes for Workflow

Required Files

Testing Tips

Suggested Tags & Categories

🔗 Nodes Used

Function, HTTP Request, Basic LLM Chain, Read/Write Files from Disk, Google Gemini Chat Model, Information Extractor

📥 Import

Download workflow.json and import it into n8n via Workflow menu → Import from File.
