📊 Scrape books from URL with Dumpling AI, clean HTML, save to Sheets, email as CSV

⚡ 3,236 views · 📊 Market Research & Insights

💡 Pro Tip: HTTP Request scraping tends to break when sites update their markup. If you’re scraping a major platform, check whether ScraperNode covers it; it has maintained scrapers for LinkedIn, Instagram, TikTok, YouTube, and 20+ other platforms that return structured data.


Description

👥 Who is this for?

This workflow is ideal for virtual assistants, researchers, developers, automation specialists, and data analysts who need to regularly extract and organize structured product information (like books) from a website. It’s especially useful for those working with catalog-based websites who want to automate extraction and delivery of clean, sorted data.


🧩 What problem is this solving?

Manually copying product listings like book titles and prices from a website into a spreadsheet is slow and repetitive. This automation solves that problem by scraping content using Dumpling AI, extracting the right data using CSS selectors, and formatting it into a clean CSV file that is sent to your email, all triggered automatically when a new URL is added to Google Sheets.


βš™οΈ What this workflow does

This template automates an entire content scraping and delivery process:


πŸ› οΈ Setup

  1. Google Sheets

    • Create a sheet titled something like URLs
    • Add your product listing URLs (e.g., http://books.toscrape.com)
    • Connect the Google Sheets trigger node to your sheet
    • Ensure you have proper credentials connected
  2. Dumpling AI

    • Create an account at Dumpling AI

    • Generate your API key

    • Set the HTTP Method to POST and pass the URL dynamically from the Google Sheet

    • Use Header Auth to include your API key in the request header

    • Make sure "cleaned": "True" is included in the body for optimized HTML output

  3. HTML Node

    • The first HTML node extracts the main book container blocks using:
      .row > li
    • The second HTML node parses out the individual fields:
      • title: h3 > a (via the title attribute)
      • price: .price_color
  4. Sort Node

    • Sorts books by price in descending order
    • Note: the price is extracted as a string. Make sure it can be parsed as a number if you plan to use numeric filtering later
  5. Convert to CSV

    • The JSON data is passed into a Convert node and transformed into a CSV file
  6. Gmail

    • Sends the CSV as an attachment to a designated email
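
The two HTML node steps above can be approximated outside n8n with a minimal Python sketch. It is not the workflow's own implementation: it uses only the standard library's `html.parser`, treats every `li` as a candidate book block (a loose stand-in for `.row > li`), and reads the `title` attribute of `h3 > a` plus the text of `.price_color`. The `BookParser` class, the `extract_books` helper, and the `SAMPLE` markup are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser


class BookParser(HTMLParser):
    """Collects title/price records from list items (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.books = []          # finished records
        self._current = None     # record for the <li> currently open
        self._in_h3 = False      # True while inside an <h3>
        self._price_tag = None   # tag name that carried class "price_color"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            # Each list item is treated as one book container.
            self._current = {"title": None, "price": ""}
            return
        if self._current is None:
            return
        if tag == "h3":
            self._in_h3 = True
        elif tag == "a" and self._in_h3:
            # Rough equivalent of "h3 > a", reading the title attribute.
            self._current["title"] = attrs.get("title")
        elif "price_color" in (attrs.get("class") or "").split():
            self._price_tag = tag

    def handle_data(self, data):
        # Accumulate text only while inside the price element.
        if self._price_tag and self._current is not None:
            self._current["price"] += data

    def handle_endtag(self, tag):
        if tag == self._price_tag:
            self._price_tag = None
        elif tag == "h3":
            self._in_h3 = False
        elif tag == "li" and self._current is not None:
            record = {
                "title": self._current["title"],
                "price": self._current["price"].strip(),
            }
            # Keep only items that had both fields (skips menu items).
            if record["title"] and record["price"]:
                self.books.append(record)
            self._current = None


def extract_books(html):
    parser = BookParser()
    parser.feed(html)
    parser.close()
    return parser.books


# Made-up sample shaped like a catalog listing page.
SAMPLE = """
<ol class="row">
  <li><article class="product_pod">
    <h3><a href="a.html" title="A Light in the Attic">A Light in the ...</a></h3>
    <p class="price_color">£51.77</p>
  </article></li>
  <li><article class="product_pod">
    <h3><a href="b.html" title="Tipping the Velvet">Tipping the Velvet</a></h3>
    <p class="price_color">£53.74</p>
  </article></li>
</ol>
"""

books = extract_books(SAMPLE)
```

Reading the `title` attribute rather than the link text matters when the visible title is truncated, as in the first sample item.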

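The Sort and Convert to CSV steps can be sketched in Python as follows, assuming each record has `title` and `price` keys and that prices look like `£51.77`. Because the price is a string, the sketch converts it to a number before sorting; sorting the raw strings would place `£9.99` above `£51.77`. The `parse_price` and `books_to_csv` helpers and the sample records are hypothetical and are not part of the workflow itself.

```python
import csv
import io
import re


def parse_price(text):
    """Turn a price string such as '£51.77' into a float."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    if match is None:
        raise ValueError(f"no numeric price in {text!r}")
    return float(match.group())


def books_to_csv(books):
    """Sort by numeric price, highest first, and return CSV text."""
    ordered = sorted(books, key=lambda b: parse_price(b["price"]), reverse=True)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["title", "price"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(ordered)
    return buffer.getvalue()


# Made-up records for illustration.
books = [
    {"title": "Cheap", "price": "£9.99"},
    {"title": "Expensive", "price": "£51.77"},
    {"title": "Middle", "price": "£10.00"},
]
csv_text = books_to_csv(books)
```

The original price strings are written to the CSV unchanged; only the sort key is numeric.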
🔄 How to customize this workflow


⚠️ Dependencies and Notes

🔗 Nodes Used

HTTP Request, Gmail, Google Sheets Trigger, Convert to File

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File
