📊 Track certification requirement changes with ScrapeGraphAI, GitHub and email

⚑ 11 views Β· πŸ“Š Market Research & Insights


Description

Job Posting Aggregator with Email and GitHub

⚠️ COMMUNITY TEMPLATE DISCLAIMER: This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your n8n instance before using this template.

This workflow automatically aggregates certification-related job-posting requirements from multiple industry sources, compares them against last year's data stored in GitHub, and emails a concise change log to subscribed professionals. It streamlines annual requirement checks and renewal reminders, ensuring users never miss an update.

Pre-conditions/Requirements

Prerequisites

Required Credentials

Specific Setup Requirements

Resource | Purpose | Example
GitHub Repository | Stores certification_requirements.json, versioned annually | https://github.com/<you>/cert-requirements.git
Watch List File | List of page URLs and selectors to scrape | Saved in the repo under /config/watchList.json
Email List | Semicolon-separated list of recipients | me@company.com;team@company.com
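
The Email List entry is a single semicolon-separated string, so it has to be split into individual addresses before it can be used. A minimal sketch of that step follows; `parseRecipients` is a hypothetical helper name, not something the template is known to contain, and it could be pasted into a Code node.

```javascript
// Hypothetical helper, not part of the template: turns the semicolon-separated
// Email List value into an array of trimmed, non-empty addresses.
function parseRecipients(raw) {
  return String(raw ?? "")
    .split(";")
    .map((address) => address.trim())
    .filter((address) => address.length > 0);
}

// Example using the sample value from the Email List entry.
const recipients = parseRecipients("me@company.com;team@company.com");
console.log(recipients);
```

Trimming and dropping empty segments makes the helper tolerant of stray spaces and a trailing semicolon.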

How it works

The workflow scrapes every page on the watch list, compares the results with the requirements file stored in GitHub, and, when something has changed, emails a change log to the recipients and commits the updated file.

Key Steps:

Set up steps

Setup Time: 15-25 minutes

  1. Install Community Node: From n8n UI → Settings → Community Nodes → search for and install "ScrapeGraphAI".
  2. Create/Clone GitHub Repo: Add an empty certification_requirements.json ( {} ) and a config/watchList.json with an array of objects like:
    [
      {
        "url": "https://cert-body.org/requirements",
        "selector": "#requirements"
      }
    ]
  3. Generate GitHub PAT: Grant the repo scope and store the token in n8n Credentials as "GitHub API".
  4. Add ScrapeGraphAI Credential: Paste your API key into n8n Credentials.
  5. Configure Email Credentials: E.g., SMTP with username/password or OAuth2.
  6. Open Workflow: Import the template JSON into n8n.
  7. Update Environment Variables (in the Code node or via n8n variables):
    • GITHUB_REPO (e.g., user/cert-requirements)
    • EMAIL_RECIPIENTS
  8. Test Run: Trigger manually. Verify email content and GitHub commit.
  9. Schedule: Add a Cron node (optional) for yearly or quarterly automatic runs.
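
A malformed watch list is an easy way to end up with empty scrape results, so it can help to check the file right after loading it. The sketch below assumes the file has already been fetched and parsed; `validateWatchList` is a hypothetical helper, and the checks reflect only the `url` and `selector` fields shown in step 2.

```javascript
// Hypothetical validator, not part of the template: checks that the parsed
// contents of config/watchList.json match the shape shown in step 2.
function validateWatchList(entries) {
  if (!Array.isArray(entries)) {
    throw new TypeError("watchList.json must contain a JSON array");
  }
  return entries.map((entry, index) => {
    const url = entry?.url;
    const selector = entry?.selector;
    if (typeof url !== "string" || typeof selector !== "string" || selector.trim() === "") {
      throw new Error(`watchList entry ${index} needs string "url" and "selector" fields`);
    }
    // The URL constructor throws a TypeError on malformed input.
    const parsed = new URL(url);
    if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
      throw new Error(`watchList entry ${index} must use http or https`);
    }
    return { url, selector: selector.trim() };
  });
}

// Example using the sample entry from step 2.
const watchList = validateWatchList(
  JSON.parse('[{"url": "https://cert-body.org/requirements", "selector": "#requirements"}]')
);
console.log(watchList);
```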

Node Descriptions

Core Workflow Nodes:

Data Flow:

  1. Manual Trigger → Code (Load Watch List) → SplitInBatches
  2. SplitInBatches → ScrapeGraphAI → Merge
  3. Merge → GitHub (Read File) → IF (Change Detector)
  4. IF (True) → Email Send → GitHub (Upsert File)
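
The IF (Change Detector) step has to decide whether the scraped data differs from the file read from GitHub. The template's own comparison logic is not shown here, so the following is only a sketch of one way to do it, assuming both sides are flat objects that map a certification name to a requirement string. `detectChanges` and its `status`/`from`/`to` fields are hypothetical and more structured than the free-text summaries the workflow emits.

```javascript
// Hypothetical comparison, not the template's actual logic: reports which
// certifications were added, updated, or removed between two snapshots.
function detectChanges(previous, current) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const changes = {};
  for (const [cert, requirement] of Object.entries(current)) {
    if (!has(previous, cert)) {
      changes[cert] = { status: "added", to: requirement };
    } else if (previous[cert] !== requirement) {
      changes[cert] = { status: "updated", from: previous[cert], to: requirement };
    }
  }
  for (const cert of Object.keys(previous)) {
    if (!has(current, cert)) {
      changes[cert] = { status: "removed", from: previous[cert] };
    }
  }
  return changes;
}

const changes = detectChanges(
  { "AWS-SAA": "Version 2.0", PMP: "60 PDUs every 3 years" },
  { "AWS-SAA": "Version 3.0, requires renewed proctored exam", PMP: "60 PDUs every 3 years" }
);
// The IF node would take the true branch only when at least one key changed.
const hasChanges = Object.keys(changes).length > 0;
console.log(hasChanges, changes);
```

Comparing strings exactly is deliberately strict: a whitespace-only difference in the scraped text would count as a change, so normalizing the text before comparison may be worth considering.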

Customization Examples

Adjusting Scraper Configuration

// Inside the Watch List JSON object
{
  "url": "https://new-association.com/cert-update",
  "selector": ".content article:nth-of-type(1) ul"
}

Custom Email Template

// In Email Send node → HTML Content
<div>
  <h2>📋 Certification Updates - {{ $json.date }}</h2>
  <p>The following certifications have new requirements:</p>
  <ul>
    {{ $json.diffHtml }}
  </ul>
  <p>For full details visit our GitHub repo.</p>
</div>
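
The email template places `{{ $json.diffHtml }}` inside a `<ul>`, which suggests that `diffHtml` is a string of `<li>` items. How the template actually builds that string is not shown, so the sketch below is an assumption: `buildDiffHtml` and `escapeHtml` are hypothetical helpers that turn a map of certification names to change summaries into escaped list items.

```javascript
// Hypothetical helpers, not part of the template.
// Escapes characters that would otherwise be interpreted as HTML markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds one <li> per changed certification, ready to drop into the <ul>.
function buildDiffHtml(changes) {
  return Object.entries(changes)
    .map(([cert, summary]) => `<li><strong>${escapeHtml(cert)}</strong>: ${escapeHtml(summary)}</li>`)
    .join("\n");
}

const diffHtml = buildDiffHtml({
  "AWS-SAA": "Updated to Version 3.0; exam format changed.",
});
console.log(diffHtml);
```

Escaping matters here because the summaries originate from scraped pages, and unescaped markup could break or alter the email body.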

Data Output Format

The workflow outputs structured JSON data:

{
  "timestamp": "2024-09-01T12:00:00Z",
  "source": "watchList.json",
  "current": {
    "AWS-SAA": "Version 3.0, requires renewed proctored exam",
    "PMP": "60 PDUs every 3 years"
  },
  "previous": {
    "AWS-SAA": "Version 2.0",
    "PMP": "60 PDUs every 3 years"
  },
  "changes": {
    "AWS-SAA": "Updated to Version 3.0; exam format changed."
  }
}
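
The `previous` block is whatever certification_requirements.json contained at the last commit. If that file is read through GitHub's REST contents endpoint rather than the n8n GitHub node, the body arrives base64-encoded in a `content` field alongside an `encoding` field. A minimal sketch of decoding such a response follows; `decodeRequirementsFile` is a hypothetical helper, and the response object below is simulated rather than fetched.

```javascript
// Hypothetical helper, not part of the template: decodes the JSON file body
// from a GitHub "get repository content" style response object.
function decodeRequirementsFile(apiResponse) {
  if (apiResponse.encoding !== "base64") {
    throw new Error(`Unexpected encoding: ${apiResponse.encoding}`);
  }
  // Buffer ignores the line breaks GitHub inserts into the base64 payload.
  const text = Buffer.from(apiResponse.content, "base64").toString("utf8");
  // Treat an empty file as "no previous requirements".
  return text.trim() === "" ? {} : JSON.parse(text);
}

// Simulated response for the initial empty file ( {} ) created during setup.
const simulated = {
  encoding: "base64",
  content: Buffer.from("{}\n").toString("base64"),
};
console.log(decodeRequirementsFile(simulated));
```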

Troubleshooting

Common Issues

  1. ScrapeGraphAI returns empty data: Check the CSS/XPath selectors and ensure the page is publicly accessible.
  2. GitHub authentication fails: Verify that the PAT scope includes repo and that the credential is linked in both GitHub nodes.

Performance Tips

Pro Tips:

🔗 Nodes Used

Send Email, GitHub

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File
