📊 Extract Amazon product data with Scrape.do, GPT-4 & Google Sheets

📊 Market Research & Insights

Description

Amazon Product Scraper with Scrape.do & AI Enrichment

> This workflow automates Amazon product data extraction. It reads product URLs from a Google Sheet, fetches each product page’s HTML through Scrape.do to reduce the chance of being blocked, and then applies an AI-powered extraction step to capture the product name, price, rating, review count, and description. The structured results are appended to a Google Sheet for easy access and analysis.

This template is designed to produce consistent output across many URLs, which makes it suitable for marketers, analysts, and e-commerce professionals who need clean product data at scale.
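
The fetch step described above can be pictured as a single GET request that wraps the product URL. The following is a minimal sketch, assuming Scrape.do's query-string API takes the form `https://api.scrape.do/?token=...&url=...` (verify this against the current Scrape.do documentation). The helper name `build_scrapedo_url` is hypothetical and is not part of the workflow itself.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the Scrape.do documentation before relying on it.
SCRAPEDO_ENDPOINT = "https://api.scrape.do/"


def build_scrapedo_url(token: str, product_url: str) -> str:
    """Return the request URL that asks Scrape.do to fetch product_url."""
    if not token:
        raise ValueError("SCRAPEDO_TOKEN is empty")
    # urlencode percent-encodes the target URL so that its own query string
    # is not confused with Scrape.do's parameters.
    query = urlencode({"token": token, "url": product_url})
    return f"{SCRAPEDO_ENDPOINT}?{query}"


if __name__ == "__main__":
    # Placeholder token and product URL, for illustration only.
    print(build_scrapedo_url("YOUR_TOKEN", "https://www.amazon.com/dp/B000000000?th=1"))
```

In n8n, the equivalent would be an HTTP Request node whose URL is built from the SCRAPEDO_TOKEN variable and the URL read from the sheet.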


🚀 What does this workflow do?

🎯 Who is this for?

✨ Benefits

⚙️ How it Works

  1. Manual or Scheduled Trigger: Start the workflow manually or via a cron schedule.
  2. Input Source: Fetch URLs from a Google Sheet (TRACK_SHEET_GID).
  3. Scrape with Scrape.do: Retrieve full HTML from each Amazon product page using your SCRAPEDO_TOKEN.
  4. Clean & Pre-Extract: Strip irrelevant code and use regex to pre-extract key fields.
  5. AI Extraction & Verification: A GPT-4 chat model, called through n8n’s LangChain nodes (Basic LLM Chain with a Structured Output Parser), refines and validates the product name, description, price, rating, and review count.
  6. Save Results: Append enriched product data to the results sheet (RESULTS_SHEET_GID).
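
Step 4 (Clean & Pre-Extract) can be sketched roughly as follows. The regex patterns and element names here (`productTitle`, `a-offscreen`, and so on) are assumptions made for illustration. Amazon's markup changes often, and the patterns in the actual workflow may differ. Fields that the regexes miss are left as `None` so that the AI step can fill them in.

```python
import re
from html import unescape

# Hypothetical patterns for illustration only; not taken from the workflow.
PATTERNS = {
    "name": re.compile(r'<span[^>]*id="productTitle"[^>]*>(.*?)</span>', re.S),
    "price": re.compile(r'<span[^>]*class="a-offscreen"[^>]*>\s*([^<]+?)\s*</span>'),
    "rating": re.compile(r"(\d(?:\.\d)?) out of 5 stars"),
    "reviews": re.compile(r"([\d,]+)\s+ratings"),
}


def clean_html(html: str) -> str:
    """Drop script/style blocks and HTML comments before matching."""
    html = re.sub(r"<(script|style)\b.*?</\1\s*>", "", html, flags=re.S | re.I)
    return re.sub(r"<!--.*?-->", "", html, flags=re.S)


def pre_extract(html: str) -> dict:
    """Return the first match per field, or None when a pattern misses."""
    cleaned = clean_html(html)
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(cleaned)
        # Decode entities such as &amp; and trim surrounding whitespace.
        result[field] = unescape(match.group(1)).strip() if match else None
    return result
```

Running the regex pass before the model keeps the prompt small and gives the AI step candidate values to confirm or correct, rather than asking it to read the full page.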

📋 n8n Nodes Used

🔑 Prerequisites

🛠️ Setup

  1. Import the Workflow into your n8n instance.
  2. Set Workflow Variables:
    • SCRAPEDO_TOKEN – your Scrape.do API key.
    • WEB_SHEET_ID – Google Sheet ID.
    • TRACK_SHEET_GID – sheet/tab name for input URLs.
    • RESULTS_SHEET_GID – sheet/tab name for results.
  3. Configure Credentials for Google Sheets and OpenRouter.
  4. Map Columns in the “add results” node to match your Google Sheet (e.g., name, price, rating, reviews, description).
  5. Run or Schedule: Start manually or configure a schedule for continuous data extraction.
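
The column mapping in step 4 amounts to putting each extracted field into a fixed column order. Below is a minimal sketch, assuming the results sheet uses the five example columns named above and that ratings fall on a 0 to 5 scale. The function `to_row` and the `COLUMNS` constant are hypothetical names and do not come from the workflow.

```python
from typing import Any

# Assumed column order; it should match the header row of the results sheet.
COLUMNS = ["name", "price", "rating", "reviews", "description"]


def to_row(extracted: dict[str, Any]) -> list[str]:
    """Validate one extraction result and return its cells in COLUMNS order."""
    missing = [c for c in COLUMNS if c not in extracted]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    rating = extracted["rating"]
    # Assumption: ratings are on a 0-5 scale; None means "not found".
    if rating is not None and not 0 <= float(rating) <= 5:
        raise ValueError(f"rating out of range: {rating}")
    # Unknown values become empty cells so the columns stay aligned.
    return ["" if extracted[c] is None else str(extracted[c]) for c in COLUMNS]
```

A check like this before the append step makes a misaligned or incomplete extraction fail loudly instead of writing shifted columns to the sheet.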

This Amazon Product Scraper produces AI-enriched product data on demand or on a schedule, which helps keep e-commerce analytics, pricing strategies, and market research up to date without manual data collection.

🔗 Nodes Used

Google Sheets, HTTP Request, Basic LLM Chain, OpenAI Chat Model, Structured Output Parser

📥 Import

Download workflow.json and import it into n8n: Workflow menu → Import from File
