Crawl websites & answer questions with GPT-5 nano and Google Sheets
1,950 views · Market Research & Insights
Pro Tip: HTTP Request scraping tends to break when sites update their markup. If you're scraping a major platform, check whether ScraperNode covers it: it maintains scrapers for LinkedIn, Instagram, TikTok, YouTube, and 20+ other platforms, and they return structured data.
Description
Web Consultation & Crawling Chatbot with Google Sheets Memory
Who is this workflow for?
This workflow is designed for SEO analysts, content creators, marketing agencies, and developers who need to index a website and then query its content through a chatbot.
Note: on sites with many pages, AI token consumption can become costly, especially during the initial crawling and analysis phase.
1. Initial Mode (first use with a URL)
When the user enters a URL for the first time:
- URL validation using AI (gpt-5-nano).
- Automatic sitemap discovery via robots.txt.
- Relevant sitemap selection (pages, posts, categories, or tags) using GPT-4o according to configured options. (Includes "OPTIONS" node to precisely choose which types of URLs to process.)
- Crawling of all selected pages:
  - Downloads HTML of each page.
  - Converts HTML to Markdown.
  - AI analysis to extract:
    - Detected language.
    - Heading hierarchy (H1, H2, etc.).
    - Internal and external links.
    - Content summary.
- Structured storage in Google Sheets:
  - Lang
  - H1 and hierarchy
  - External URLs
  - Internal URLs
  - Summary Content
  - Data schema (flag to enable agent mode)
When finished, the sheet is marked with Data schema = true, signaling that the site is indexed.
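To make the sitemap discovery step concrete, here is a minimal Python sketch of the idea, written outside n8n. It assumes the robots.txt body has already been downloaded as text. The function names `robots_url` and `extract_sitemaps` are hypothetical and are not part of the workflow, which handles this step with its own nodes.

```python
from urllib.parse import urljoin


def robots_url(site_url: str) -> str:
    """Return the conventional robots.txt location for a site URL."""
    return urljoin(site_url, "/robots.txt")


def extract_sitemaps(robots_txt: str) -> list[str]:
    """Collect the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    sitemaps = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Split only on the first colon so the URL's own "://" stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps


# Illustrative robots.txt body (made-up example.com URLs).
sample = """User-agent: *
Disallow: /wp-admin/
Sitemap: https://example.com/post-sitemap.xml
sitemap: https://example.com/page-sitemap.xml
"""

found = extract_sitemaps(sample)
print(robots_url("https://example.com/blog/post"))
print(found)
```

The field name is matched case-insensitively because robots.txt files in the wild vary in capitalization. Choosing which of the discovered sitemaps to crawl (pages, posts, categories, or tags) is left to the AI step described above.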
2. Agent Mode (subsequent queries)
If the URL has already been indexed (Data schema = true):
The chat becomes a LangChain Agent that:
- Reads the database in Google Sheets.
- Can perform real-time HTTP requests if it needs updated information.
- Responds as if it were the website, using stored and live data.
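The switch between the two modes comes down to checking the Data schema flag. A minimal Python sketch of that check follows, assuming the sheet rows arrive as a list of dictionaries keyed by column name. The helper names `is_indexed` and `choose_mode` are hypothetical, and the flag may come back from Google Sheets as either a boolean or a string, so the sketch normalizes both.

```python
def is_indexed(rows: list[dict]) -> bool:
    """True when any stored row carries Data schema = true."""
    for row in rows:
        # Normalize booleans and strings such as "TRUE" or " true ".
        flag = str(row.get("Data schema", "")).strip().lower()
        if flag == "true":
            return True
    return False


def choose_mode(rows: list[dict]) -> str:
    """Return 'agent' for an indexed site, otherwise 'initial'."""
    return "agent" if is_indexed(rows) else "initial"


# Illustrative rows (assumed shape, not taken from the workflow).
print(choose_mode([]))
print(choose_mode([{"Lang": "en", "Data schema": "TRUE"}]))
```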
In agent mode, the user can ask questions such as:
- "What's on the contact page?"
- "How many external links are there on the homepage?"
- "Give me all the H1 headings from the services pages"
- "What CTA would you suggest for my page?"
- "How would you expand X content?"
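A question like the external-links one can, in principle, be answered directly from the stored rows. The Python sketch below is a deterministic stand-in for what the agent does with the sheet, not the workflow's actual logic. It assumes a hypothetical `URL` column identifying each page and assumes that the External URLs cell holds one link per line; neither detail is confirmed by the description above.

```python
from typing import Optional


def split_cell(cell: str) -> list[str]:
    """Split a multi-line cell into non-empty, trimmed entries."""
    return [item.strip() for item in cell.splitlines() if item.strip()]


def find_row(rows: list[dict], page_url: str) -> Optional[dict]:
    """Return the stored row for a page, or None if it was not indexed."""
    for row in rows:
        if row.get("URL") == page_url:  # "URL" is an assumed column name
            return row
    return None


def count_external_links(rows: list[dict], page_url: str) -> Optional[int]:
    """Count the external links recorded for a page, or None if unknown."""
    row = find_row(rows, page_url)
    if row is None:
        return None
    return len(split_cell(row.get("External URLs", "")))


# Illustrative sheet contents (made-up data).
rows = [
    {
        "URL": "https://example.com/",
        "Lang": "en",
        "External URLs": "https://partner.example.org/\nhttps://social.example.net/acme",
        "Data schema": "true",
    }
]

print(count_external_links(rows, "https://example.com/"))
```

Returning `None` for a page that is not in the sheet keeps "not indexed" distinct from "indexed with zero external links".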
Use cases
- Build a chatbot that answers questions about a website's content.
- Index and analyze full websites for future queries.
- SEO tool to list headings, links, and content summaries.
- Assistant for quick exploration of a site's structure.
- Generate improvement recommendations and content strategies from site data.
Nodes Used
Google Sheets, HTTP Request, Stop and Error, Markdown, AI Agent, OpenAI Chat Model
Import
Download workflow.json and import into n8n:
Workflow menu → Import from File