Automatically track certification changes with ScrapeGraphAI, GitLab and Rocket.Chat
Description
Certification Requirement Tracker with Rocket.Chat and GitLab
⚠️ COMMUNITY TEMPLATE DISCLAIMER: This is a community-contributed template that uses ScrapeGraphAI (a community node). Please ensure you have the ScrapeGraphAI community node installed in your n8n instance before using this template.
This workflow automatically scrapes certification-issuing bodies once a year, detects any changes in certification or renewal requirements, creates a GitLab issue for the responsible team, and notifies the relevant channel in Rocket.Chat. It helps professionals and compliance teams stay ahead of changing industry requirements and never miss a renewal.
Pre-conditions/Requirements
Prerequisites
- An n8n instance (self-hosted or n8n.cloud)
- ScrapeGraphAI community node installed and activated
- Rocket.Chat workspace with Incoming Webhook or user credentials
- GitLab account with at least one repository and a Personal Access Token (PAT)
- URLs of all certification bodies or industry associations you want to monitor
Required Credentials
- ScrapeGraphAI API Key: enables web scraping services
- Rocket.Chat Credentials: either
  - Webhook URL, or
  - Username & Password / Personal Access Token
- GitLab Personal Access Token: used to create issues and comments via the API
Specific Setup Requirements
| Service | Requirement | Example/Notes |
|---|---|---|
| Rocket.Chat | Incoming Webhook URL OR user credentials | https://chat.example.com/hooks/abc123… |
| GitLab | Personal Access Token with api scope | Generate at Settings → Access Tokens |
| ScrapeGraphAI | Domain whitelist (if running behind a firewall) | Allow outbound HTTPS traffic to target sites |
| Cron Schedule | Annual (default) or custom interval | 0 0 1 1 * runs at 00:00 on 1 January every year |
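To sanity-check a schedule before entering it in the Schedule Trigger, the hypothetical `cronMatches` helper below evaluates a standard five-field cron expression (minute, hour, day of month, month, day of week) against a date. It is a minimal sketch, not part of n8n: it supports only `*`, single numbers, and comma-separated lists, and it works in UTC, whereas n8n evaluates schedules in the workflow or instance timezone.

```javascript
// Hypothetical helper: checks whether a five-field cron expression
// ("minute hour day-of-month month day-of-week") fires at a given UTC time.
// Supports only "*", single numbers, and comma-separated lists.
function fieldMatches(field, value) {
  if (field === "*") return true;
  return field.split(",").some((part) => Number(part) === value);
}

function cronMatches(expression, date) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${fields.length}`);
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields;

  const timeOk =
    fieldMatches(minute, date.getUTCMinutes()) &&
    fieldMatches(hour, date.getUTCHours()) &&
    fieldMatches(month, date.getUTCMonth() + 1); // cron months are 1-12

  // Standard cron rule: if both day fields are restricted, either one may match.
  const domOk = fieldMatches(dayOfMonth, date.getUTCDate());
  const dowOk = fieldMatches(dayOfWeek, date.getUTCDay()); // 0 = Sunday
  const dayOk =
    dayOfMonth !== "*" && dayOfWeek !== "*" ? domOk || dowOk : domOk && dowOk;

  return timeOk && dayOk;
}

// The default annual schedule fires only at 00:00 UTC on 1 January.
console.log(cronMatches("0 0 1 1 *", new Date(Date.UTC(2025, 0, 1, 0, 0)))); // true
console.log(cronMatches("0 0 1 1 *", new Date(Date.UTC(2025, 0, 2, 0, 0)))); // false
```

Ranges (`1-5`) and steps (`*/15`) are deliberately left out to keep the sketch short; the Schedule Trigger itself accepts the full syntax.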
How it works
Key Steps:
- Scheduled Trigger: Fires annually (or any chosen interval) to start the check.
- Set Node (URL List): Stores an array of certification-body URLs to scrape.
- Split in Batches: Iterates over the URLs in batches, so they are scraped one batch at a time rather than all at once.
- ScrapeGraphAI: Extracts requirement text, effective dates, and renewal info.
- Code Node (Diff Checker): Compares the newly scraped data with last year's GitLab issue (if any) to detect changes.
- IF Node (Requirements Changed?): Routes the flow based on change detection.
- GitLab (Create/Update Issue): Opens a new issue or comments on an existing one with details of the change.
- Rocket.Chat (Notify Channel): Sends a message summarizing any changes and linking to the GitLab issue.
- Merge Node: Collects all branch results for a final summary report.
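The Diff Checker step can be sketched as a plain JavaScript function. This is a minimal sketch, not the template's actual Code node: it assumes last year's requirements text is available as `oldRequirements` (for example, read from the previous GitLab issue) and that the ScrapeGraphAI result has been mapped to `newRequirements`, following the field names in the Data Output Format. The function names and the extra `changed` flag are hypothetical. It compares the texts line by line, which flags added or removed requirement lines but is not a full diff algorithm (reordered or duplicated lines are not reported).

```javascript
// Hypothetical line-level diff for the Diff Checker step.
// Lines present only in the old text are reported with "- ",
// lines present only in the new text with "+ ".
function diffRequirements(oldText, newText) {
  // Trim each line and drop blank lines so whitespace changes are ignored.
  const normalize = (text) =>
    (text || "")
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0);

  const oldLines = normalize(oldText);
  const newLines = normalize(newText);
  const oldSet = new Set(oldLines);
  const newSet = new Set(newLines);

  const removed = oldLines.filter((line) => !newSet.has(line));
  const added = newLines.filter((line) => !oldSet.has(line));

  return {
    changed: removed.length > 0 || added.length > 0,
    diff: [
      ...removed.map((line) => `- ${line}`),
      ...added.map((line) => `+ ${line}`),
    ].join("\n"),
  };
}

// Builds one output item in the shape of the Data Output Format,
// plus a hypothetical `changed` flag for the IF node to test.
function buildDiffItem(domain, oldRequirements, newRequirements, scrapeDate) {
  const { changed, diff } = diffRequirements(oldRequirements, newRequirements);
  return { domain, scrapeDate, oldRequirements, newRequirements, diff, changed };
}
```

In a Code node set to "Run Once for All Items", you might call `buildDiffItem` for each entry of `$input.all()` and return the results as `{ json: ... }` objects; the IF node could then test `{{ $json.changed }}`. On the very first run there is no old text, so every line is reported as added and `changed` is true.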
Set up steps
Setup Time: 15-25 minutes
- Install Community Node: In n8n, navigate to Settings → Community Nodes and install "ScrapeGraphAI".
- Add Credentials:
  a. In Credentials, create "ScrapeGraphAI API".
  b. Add your Rocket.Chat Webhook or PAT.
  c. Add your GitLab PAT with the api scope.
- Import Workflow: Copy the JSON template into n8n (Workflows → Import).
- Configure URL List: Open the Set (URL List) node and replace the sample array with real certification URLs.
- Adjust Cron Expression: Double-click the Schedule Trigger node and set your desired frequency.
- Customize Rocket.Chat Channel: In the Rocket.Chat (Notify Channel) node, set the channel or use an incoming webhook.
- Run Once for Testing: Execute the workflow manually to ensure issues and notifications are created as expected.
- Activate Workflow: Toggle Activate so the schedule starts running automatically.
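The URL list is easier to maintain if each entry is validated and given the `domain` (and optional `industry`) field that the GitLab and Rocket.Chat customization examples reference. This is a hypothetical sketch: the `{ url, industry }` entry shape, the `"general"` fallback, and the `buildUrlItems` name are assumptions rather than the template's actual Set node schema.

```javascript
// Hypothetical normalizer for the URL list: validates each entry,
// derives the hostname used as `domain`, and drops duplicate URLs.
function buildUrlItems(entries) {
  const seen = new Set();
  const items = [];
  for (const entry of entries) {
    let parsed;
    try {
      parsed = new URL(entry.url);
    } catch {
      console.warn(`Skipping invalid URL: ${entry.url}`);
      continue;
    }
    // Only web pages can be scraped.
    if (parsed.protocol !== "https:" && parsed.protocol !== "http:") continue;
    parsed.hash = ""; // fragments do not change the fetched page
    const url = parsed.toString();
    if (seen.has(url)) continue;
    seen.add(url);
    items.push({
      url,
      domain: parsed.hostname.replace(/^www\./, ""),
      industry: entry.industry || "general", // hypothetical default
    });
  }
  return items;
}
```

Deduplicating after stripping the fragment means `.../renewal#fees` and `.../renewal` are scraped only once, which also saves ScrapeGraphAI credits.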
Node Descriptions
Core Workflow Nodes:
- Sticky Note (Workflow Notes): Contains a high-level diagram and documentation inside the editor.
- Schedule Trigger: Initiates the yearly check.
- Set (URL List): Holds certification body URLs and meta info.
- SplitInBatches: Iterates through each URL in manageable chunks.
- ScrapeGraphAI: Scrapes each certification page and returns structured JSON.
- Code (Diff Checker): Compares the current scrape with historical data.
- If (Requirements Changed?): Switches path based on the diff result.
- GitLab: Creates or updates issues, attaches the JSON diff, and sets labels (certification, renewal).
- Rocket.Chat: Posts a summary message with links to the GitLab issue(s).
- Merge: Consolidates batch results for final logging.
- Set (Success): Formats a concise success payload.
Data Flow:
- Schedule Trigger → Set (URL List) → SplitInBatches → ScrapeGraphAI → Code (Diff Checker) → If → GitLab / Rocket.Chat → Merge
Customization Examples
Add Additional Metadata to GitLab Issue
// Inside the GitLab "Create Issue" node (each field set as an n8n expression)
{
  "title": "Certification Update: {{ $json.domain }}",
  "description": "**What's Changed?**\n{{ $json.diff }}\n\n_Last checked: {{ $now.toISO() }}_",
  "labels": "certification,compliance,{{ $json.industry }}"
}
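If you need the same issue outside the GitLab node (for example, from a Code node or an external script), the GitLab REST API v4 provides `POST /projects/:id/issues` for new issues and `POST /projects/:id/issues/:issue_iid/notes` for comments, authenticated with a `PRIVATE-TOKEN` header. The sketch below only builds the request; `buildGitLabRequest`, its `existingIssueIid` parameter, and the item fields (taken from the Data Output Format) are illustrative assumptions.

```javascript
// Hypothetical helper: builds a GitLab REST API v4 request that either opens
// a new issue or comments on an existing one. It does not send anything.
function buildGitLabRequest({ baseUrl, projectId, token, item, existingIssueIid }) {
  // Project IDs may be numeric or a namespaced path such as "org/compliance".
  const project = encodeURIComponent(String(projectId));
  const api = `${baseUrl.replace(/\/+$/, "")}/api/v4/projects/${project}`;
  const headers = { "PRIVATE-TOKEN": token, "Content-Type": "application/json" };
  const summary = `**What's Changed?**\n${item.diff}\n\n_Last checked: ${item.scrapeDate}_`;

  if (existingIssueIid) {
    // Comment on last year's issue instead of opening a duplicate.
    return {
      url: `${api}/issues/${existingIssueIid}/notes`,
      options: { method: "POST", headers, body: JSON.stringify({ body: summary }) },
    };
  }

  // GitLab accepts labels as a comma-separated string.
  const labels = ["certification", "compliance", item.industry].filter(Boolean).join(",");
  return {
    url: `${api}/issues`,
    options: {
      method: "POST",
      headers,
      body: JSON.stringify({
        title: `Certification Update: ${item.domain}`,
        description: summary,
        labels,
      }),
    },
  };
}
```

Usage, assuming Node 18+ with a global `fetch`: `const { url, options } = buildGitLabRequest({ ... }); const response = await fetch(url, options);`. A `201` status indicates the issue or comment was created.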
Customize Rocket.Chat Message Formatting
// Rocket.Chat node: JSON parameters (n8n expression syntax)
{
  "text": ":bell: *Certification Update Detected*\n>*{{ $json.domain }}*\n>See the GitLab issue: {{ $json.issueUrl }}"
}
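When posting through a Rocket.Chat incoming webhook instead of the Rocket.Chat node, the message is sent as a JSON body with a `text` field. This sketch folds all changed certification bodies into one message so the channel receives a single notification per run; `buildRocketChatPayload` is a hypothetical name, and it assumes items shaped like the Data Output Format.

```javascript
// Hypothetical helper: turns changed items into one Rocket.Chat webhook payload.
// Returns null when nothing changed, so the caller can skip the POST entirely.
function buildRocketChatPayload(items) {
  const changed = items.filter((item) => item.diff && item.diff.trim() !== "");
  if (changed.length === 0) return null;

  const lines = changed.map(
    (item) => `>*${item.domain}*: ${item.issueUrl || "issue link unavailable"}`
  );
  const plural = changed.length === 1 ? "" : "s";
  return {
    text: [
      `:bell: *Certification Update Detected* (${changed.length} source${plural})`,
      ...lines,
    ].join("\n"),
  };
}
```

The payload can then be sent with `fetch(webhookUrl, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(payload) })`, where `webhookUrl` is the incoming webhook URL from the setup table.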
Data Output Format
The workflow outputs structured JSON data:
{
  "domain": "example-cert-body.org",
  "scrapeDate": "2024-01-01T00:00:00Z",
  "oldRequirements": "Original text …",
  "newRequirements": "Updated text …",
  "diff": "- Continuous education hours increased from 20 to 24\n- Fee changed to $200",
  "issueUrl": "https://gitlab.com/org/compliance/-/issues/42",
  "notification": "sent"
}
Troubleshooting
Common Issues
- No data returned from ScrapeGraphAI: Confirm that the target site is publicly accessible and does not block bots. Whitelist the domain or add appropriate headers via the ScrapeGraphAI options.
- GitLab issue not created: Check that the PAT has the api scope and that the project ID in the GitLab node is correct.
- Rocket.Chat message fails: Verify the webhook URL or credentials and make sure the channel exists.
Performance Tips
- Limit the batch size in SplitInBatches to avoid API rate limits.
- Schedule the workflow during off-peak hours to minimize load.
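The effect of the batch size on rate limits can be reasoned about with a small sketch: with 20 URLs and a batch size of 5 there are 4 batches and 3 pauses between them. The code below imitates that pattern outside n8n; `scrapeOne` stands in for the ScrapeGraphAI call and, like `scrapeInBatches` and the default values, is an assumption rather than part of the template.

```javascript
// Hypothetical sketch of SplitInBatches-style throttling: split the URL list
// into fixed-size batches and pause between batches to stay under a rate limit.
function chunk(array, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < array.length; i += batchSize) {
    batches.push(array.slice(i, i + batchSize));
  }
  return batches;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrapeInBatches(urls, scrapeOne, { batchSize = 5, delayMs = 1000 } = {}) {
  const results = [];
  const batches = chunk(urls, batchSize);
  for (let b = 0; b < batches.length; b++) {
    for (const url of batches[b]) {
      results.push(await scrapeOne(url)); // one request at a time
    }
    if (b < batches.length - 1) await sleep(delayMs); // pause between batches only
  }
  return results;
}
```

Smaller batches with a longer pause spread the requests out further; the total run time grows roughly with the number of pauses multiplied by `delayMs`.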
Pro Tips:
- Store each year's scrapes in a dedicated GitLab repository to build a complete change-log history.
- Use n8n's built-in execution history pruning to keep the database slim.
- Add an Error Trigger workflow to notify you if any step fails.
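To keep each year's scrape in a GitLab repository, the result can be committed as a JSON file through GitLab's Repository Files API (`POST /projects/:id/repository/files/:file_path` creates a file; `PUT` on the same path updates it). The sketch only builds the request; the `scrapes/<year>/<domain>.json` layout, the `main` branch default, and the `buildArchiveRequest` name are assumptions.

```javascript
// Hypothetical helper: builds a GitLab Repository Files API request that
// archives one year's scrape as scrapes/<year>/<domain>.json.
function buildArchiveRequest({ baseUrl, projectId, token, item, branch = "main", update = false }) {
  const year = new Date(item.scrapeDate).getUTCFullYear();
  const filePath = `scrapes/${year}/${item.domain}.json`;
  // File paths must be URL-encoded; dots are encoded too, as in GitLab's docs examples.
  const encodedPath = encodeURIComponent(filePath).replace(/\./g, "%2E");
  const project = encodeURIComponent(String(projectId));
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/v4/projects/${project}/repository/files/${encodedPath}`,
    options: {
      // POST creates the file; PUT updates an existing one.
      method: update ? "PUT" : "POST",
      headers: { "PRIVATE-TOKEN": token, "Content-Type": "application/json" },
      body: JSON.stringify({
        branch,
        content: JSON.stringify(item, null, 2),
        commit_message: `Archive ${item.domain} certification scrape (${year})`,
      }),
    },
  };
}
```

Because each year gets its own path, a plain `POST` is normally enough; set `update: true` only when re-running the workflow in the same year, since creating a file at an existing path fails.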
Nodes Used
Slack, GitLab, Schedule Trigger
Import
Download workflow.json and import into n8n:
Workflow menu → Import from File