📊 Sitemap page extractor: Discover, clean, and save website URLs to Google Sheets

⚡ 1,048 views · 📊 Market Research & Insights

Description

Automatically extracts all page URLs from website sitemaps, filters out unwanted sitemap links, and saves clean URLs to Google Sheets for SEO analysis and reporting.

How It Works:

This workflow discovers and extracts every page URL from a website's sitemap structure in seven steps:

Step 1: URL Input
The workflow starts when you submit a website URL through a simple form interface.

Step 2: Sitemap Discovery
The workflow generates and tests several candidate sitemap URLs, including /sitemap.xml, /sitemap_index.xml, /robots.txt (which can list sitemap locations), and other common variations.
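
As a rough illustration of this step, candidate URLs can be built from the submitted site's origin. This is a minimal Python sketch, not the workflow's actual node logic: the function name and the CANDIDATE_PATHS list are hypothetical, and only the three paths named above are included.

```python
from urllib.parse import urlsplit

# Hypothetical list: only the paths named in the description are included.
CANDIDATE_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/robots.txt"]


def candidate_sitemap_urls(site_url: str) -> list[str]:
    """Build candidate sitemap URLs from the site's scheme and host."""
    # Assume https when the user submits a bare domain such as "example.com".
    if "://" not in site_url:
        site_url = f"https://{site_url}"
    parts = urlsplit(site_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    # Any path on the submitted URL is ignored; candidates sit at the site root.
    return [origin + path for path in CANDIDATE_PATHS]
```

Calling candidate_sitemap_urls("example.com") would yield the three root-level candidates for https://example.com.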

Step 3: Valid Sitemap Identification
It sends an HTTP request to each candidate sitemap URL and filters out empty or invalid responses, keeping only accessible sitemaps.
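
The filtering rule could look something like the sketch below. The exact criteria used by the workflow's Filter node are not stated here, so this assumes that "valid" means an HTTP 200 status with a non-blank body; the function name and the tuple layout are hypothetical.

```python
def keep_valid(responses):
    """Keep (url, body) pairs for responses that look like usable sitemaps.

    `responses` is an iterable of (url, status_code, body) tuples.
    Assumption: valid means status 200 and a body that is not blank.
    """
    return [
        (url, body)
        for url, status, body in responses
        if status == 200 and body and body.strip()
    ]
```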

Step 4: Nested Sitemap Processing
For sitemap index files, the workflow extracts all nested sitemap URLs and processes each one individually to ensure complete coverage.
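
To make the idea of a sitemap index concrete: under the sitemaps.org protocol, an index file has a sitemapindex root element whose loc entries point at further sitemaps. The sketch below uses Python's standard XML parser and assumes a well-formed, namespaced file; the function name is hypothetical and the workflow itself may parse the file differently.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def nested_sitemaps(xml_text: str) -> list[str]:
    """Return nested sitemap URLs if the document is a sitemap index."""
    root = ET.fromstring(xml_text)
    # A regular sitemap has a <urlset> root and contains no nested sitemaps.
    if root.tag != f"{SITEMAP_NS}sitemapindex":
        return []
    return [
        loc.text.strip()
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text
    ]
```

Each returned URL would then be fetched and handled like any other sitemap.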

Step 5: Page URL Extraction
From each valid sitemap, it parses the XML content and extracts all individual page URLs using both XML <loc> tags and HTML links.
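
One way to pull URLs from both sources is with two regular expressions, which also tolerates responses that are not strictly valid XML. This is only a sketch under that assumption: the patterns and the function name are hypothetical, and CDATA-wrapped values are not handled.

```python
import html
import re

# Matches the text inside <loc>...</loc>, ignoring surrounding whitespace.
LOC_RE = re.compile(r"<loc>\s*(.*?)\s*</loc>", re.IGNORECASE | re.DOTALL)
# Matches absolute http(s) links in HTML href attributes.
HREF_RE = re.compile(r'href=["\'](https?://[^"\']+)["\']', re.IGNORECASE)


def extract_urls(body: str) -> list[str]:
    """Collect URLs from <loc> tags and HTML links, in order of appearance."""
    found = LOC_RE.findall(body) + HREF_RE.findall(body)
    # Sitemaps escape "&" as "&amp;", so unescape entities before use.
    cleaned = [html.unescape(url) for url in found]
    # dict.fromkeys removes repeats while preserving order.
    return list(dict.fromkeys(cleaned))
```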

Step 6: URL Filtering
The system removes any URLs containing “sitemap” to ensure only actual content pages (like product, service, or blog pages) are retained.
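
The filter amounts to a substring check. In the sketch below the function name is hypothetical, and matching case-insensitively is an assumption; the workflow's Filter node may compare differently.

```python
def drop_sitemap_links(urls):
    """Remove any URL whose text contains 'sitemap' (case-insensitive)."""
    return [url for url in urls if "sitemap" not in url.lower()]
```

Note that a plain substring check also drops genuine content pages whose path happens to contain the word, for example a blog post about sitemaps.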

Step 7: Google Sheets Integration
Finally, all clean page URLs are saved to a Google Sheets document, with duplicate prevention, so they are ready for analysis and reporting.
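
In n8n the Google Sheets node takes care of writing rows, and how it prevents duplicates is not described here. The sketch below only illustrates equivalent logic: given the URLs already in the sheet, it returns the rows that would still need to be appended. The function name and the one-URL-per-row layout are assumptions, and no Google Sheets API is called.

```python
def rows_to_append(existing_urls, candidate_urls):
    """Return one-column rows for URLs not already present in the sheet."""
    seen = set(existing_urls)
    rows = []
    for url in candidate_urls:
        if url not in seen:
            # Track the URL so repeats within the same batch are skipped too.
            seen.add(url)
            rows.append([url])
    return rows
```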

Setup Steps:

Estimated Setup Time: 10-15 minutes

1. Import the Workflow: Import the provided JSON file into your n8n instance.

2. Configure Google Sheets Integration:

3. Test the Workflow:

4. Customize (Optional):

Important Notes:

Need Help?

For technical support or questions about this workflow:
✉️ info@incrementors.com
or
fill out this form: Contact Us

🔗 Nodes Used

Google Sheets, HTTP Request, Filter, n8n Form Trigger

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File

📖 Importing guide · 🔑 Credential setup