📊 Extract product info from webpage screenshots using Dumpling AI and GPT-4o

⚡ 551 views · 📊 Market Research & Insights


Description

Who is this for?

This workflow is perfect for eCommerce teams, market researchers, and product analysts who want to track or extract product information from websites that restrict scraping tools. It’s also useful for virtual assistants handling product comparison tasks.


What problem is this workflow solving?

Many eCommerce and retail sites use dynamic content or anti-bot protections that make traditional scraping methods unreliable. This workflow bypasses those issues by taking a screenshot of the full page, using OCR to extract visible text, and summarizing product information with GPT-4o, all fully automated.


What this workflow does

This workflow monitors a Google Sheet for new URLs. Once a new link is added, it performs the following steps:

  1. Trigger on New URL in Sheet – Watches for new rows added to a Google Sheet.
  2. Screenshot URL via Dumpling AI – Sends the URL to Dumpling AI’s screenshot endpoint to capture a full-page image of the product webpage.
  3. Save Screenshot to Drive Folder – Uploads the screenshot to a specific Google Drive folder for reference or logging.
  4. Extract Text from Screenshot with Dumpling AI – Uses Dumpling AI’s image-to-text endpoint to pull all visible content from the screenshot.
  5. Extract Product Info from Screenshot Text with GPT-4o – Sends the extracted raw text to GPT-4o, prompting it to identify structured product information such as product name, price, ratings, deals, and purchase options.
  6. Split Each Product Entry – Splits the GPT response (an array of product objects) so each product becomes an individual item for saving.
  7. Save Products info to Google Sheet – Appends each product’s structured details to a separate sheet in the same spreadsheet.
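
As a rough illustration of step 2, the sketch below builds the POST request that an HTTP Request node would send to the screenshot endpoint named in the Setup section. The request body shape (a JSON object with a `url` field) and the Bearer-token header are assumptions, not confirmed by this page, so check them against the Dumpling AI API documentation. The helper name `build_screenshot_request` is hypothetical.

```python
import json
import urllib.request

# Endpoint as given in this workflow's setup notes.
SCREENSHOT_ENDPOINT = "https://app.dumplingai.com/api/v1/screenshot"


def build_screenshot_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one product page."""
    # Assumption: the body is JSON with a "url" field.
    body = json.dumps({"url": page_url}).encode("utf-8")
    # Assumption: the API key is passed as a Bearer token.
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        SCREENSHOT_ENDPOINT, data=body, headers=headers, method="POST"
    )


if __name__ == "__main__":
    # Placeholder URL and key, for illustration only.
    req = build_screenshot_request("https://example.com/product/123", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the payload before spending API credits. Passing the result to `urllib.request.urlopen` would perform the actual call.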

Setup

  1. Google Sheet

    • Create a Google Sheet with at least two sheets:
      • Sheet1 should contain a header row with a column labeled URL.
      • Sheet2 should contain headers: Product Name, price, purchased, ratings, deal, buyingOptions.
    • Connect your Google account in both the trigger and final write-back node.
  2. Dumpling AI

    • Sign up at Dumpling AI
    • Create an API key and use it for both HTTP modules:
      • Screenshot URL via Dumpling AI
      • Extract Text from Screenshot with Dumpling AI
    • The screenshot endpoint used is https://app.dumplingai.com/api/v1/screenshot.
  3. Google Drive

    • Create a folder for storing screenshots.
    • In the Save Screenshot to Drive Folder node, select the correct folder or provide the folder ID.
    • Make sure permissions allow uploading from n8n.
  4. OpenAI

    • Provide an API key for GPT-4o in the Extract Product Info from Screenshot Text with GPT-4o node.
    • The prompt instructs the model to return product listings as structured JSON.
  5. Split & Save

    • Split Each Product Entry takes the array of product objects from GPT and makes each one a separate item.
    • Save Products info to Google Sheet writes structured fields into Sheet2 under:
      Product Name, price, purchased, ratings, deal, buyingOptions.
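
The Split & Save step can be sketched outside n8n as follows. This is a minimal sketch, assuming the GPT-4o reply is a bare JSON array whose keys match the Sheet2 headers exactly. The real key names depend on the prompt in the workflow, and the helper names `split_products` and `to_sheet_row` are hypothetical.

```python
import json

# Column order of Sheet2, as listed in the setup section.
SHEET2_HEADERS = ["Product Name", "price", "purchased", "ratings", "deal", "buyingOptions"]


def split_products(gpt_text: str) -> list:
    """Parse the model's reply and return one dict per product."""
    parsed = json.loads(gpt_text)
    # Tolerate a single object by wrapping it in a list.
    if isinstance(parsed, dict):
        parsed = [parsed]
    # Skip anything in the array that is not a JSON object.
    return [item for item in parsed if isinstance(item, dict)]


def to_sheet_row(product: dict) -> list:
    """Order one product's fields to match the Sheet2 header row."""
    # Missing fields become empty cells rather than raising an error.
    return [str(product.get(header, "")) for header in SHEET2_HEADERS]


if __name__ == "__main__":
    # Made-up sample reply, for illustration only.
    reply = '[{"Product Name": "Desk Lamp", "price": "$19.99", "ratings": "4.5"}]'
    for product in split_products(reply):
        print(to_sheet_row(product))
```

Filling missing fields with empty strings keeps every row the same width as the header row, so a partially extracted product still lands in the correct columns.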

How to customize this workflow


Notes

🔗 Nodes Used

Google Sheets, HTTP Request, Google Drive, Google Sheets Trigger, OpenAI

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File
