🎣 Indeed job scraper with AI filtering & company research using Apify and Tavily

🎣 Lead Generation & Enrichment

Description

This workflow contains community nodes that are only compatible with the self-hosted version of n8n.

This workflow scrapes job listings on Indeed via Apify, retrieves the resulting dataset automatically, extracts information about each listing, filters jobs by relevance, finds a decision maker at the company, and updates a Google Sheets database with that information for outreach. All you need to do is run the Apify actor; the database then updates with the processed data.
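The "retrieves the resulting dataset" step can be pictured as follows. This is only a sketch, not the workflow's actual node configuration: it assumes the webhook receives Apify's default payload, in which the finished run sits under `resource` with a `defaultDatasetId`, and it uses Apify's public dataset items endpoint. The function name and the sample payload are hypothetical.

```python
from urllib.parse import urlencode

APIFY_BASE = "https://api.apify.com/v2"


def dataset_items_url(webhook_payload: dict, token: str) -> str:
    """Build the URL that downloads the items of the run's default dataset."""
    # Assumption: default Apify webhook payload, finished run under "resource".
    dataset_id = webhook_payload["resource"]["defaultDatasetId"]
    query = urlencode({"token": token, "format": "json"})
    return f"{APIFY_BASE}/datasets/{dataset_id}/items?{query}"


# Hypothetical payload, trimmed to the fields used here.
payload = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"defaultDatasetId": "abc123"},
}
url = dataset_items_url(payload, token="MY_TOKEN")
print(url)
```

In n8n the same request would typically be made by an HTTP Request or Apify node placed after the Webhook node; the sketch only shows which value from the payload drives the download.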

Benefits:

  - Complete Job Search Automation: A webhook listens for the Apify actor, whose integration sends a request that starts the process
  - AI-Powered Filter: Uses ChatGPT to analyze content and context, identify company goals, and filter listings based on the job description
  - Smart Duplicate Prevention: Automatically tracks processed job listings in a database to avoid redundancy
  - Multi-Platform Intelligence: Combines Indeed scraping with web research via Tavily to enrich each listing
  - Niche Focus: Processes listings from multiple niches (6 are currently hardcoded); to target other niches, edit the prompt in the “job filter” node
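To make the duplicate-prevention idea concrete, here is a minimal sketch of the comparison it implies: drop any scraped listing whose key already appears in the sheet. It is an illustration under assumptions, not the workflow's merge node. The function name is hypothetical, and the key defaults to `jobUrl` only because the sheet setup describes that column as the unique identifier; pass a different key if your copy of the workflow merges on another column.

```python
def new_listings(scraped: list[dict], existing_rows: list[dict], key: str = "jobUrl") -> list[dict]:
    """Return scraped listings whose key is not yet present in the sheet."""
    # Keys already stored in the database (blank cells are ignored).
    seen = {row[key] for row in existing_rows if row.get(key)}
    fresh = []
    for item in scraped:
        value = item.get(key)
        if not value or value in seen:
            continue  # skip listings without a key and ones already processed
        seen.add(value)  # also guards against duplicates within one scrape
        fresh.append(item)
    return fresh


sheet = [{"jobUrl": "https://example.com/job/1"}]
scrape = [
    {"jobUrl": "https://example.com/job/1", "title": "Already stored"},
    {"jobUrl": "https://example.com/job/2", "title": "New listing"},
    {"jobUrl": "https://example.com/job/2", "title": "Repeated in same run"},
]
fresh = new_listings(scrape, sheet)
print([item["title"] for item in fresh])
```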

How It Works:

  1. Indeed Job Discovery:
  2. Incoming Data Processing:
  3. Job Analysis & Filter:
  4. Enrich & Update Database:

Required Google Sheets Database Setup:

Before running this workflow, create a Google Sheets database with these exact column headers.

Essential Columns:

  - jobUrl: Unique identifier for job listings
  - title: Position title
  - descriptionText: Description of the job listing
  - hiringDemand/isHighVolumeHiring: Are they hiring at high volume?
  - hiringDemand/isUrgentHire: Are they hiring with high urgency?
  - isRemote: Is this job remote?
  - jobType/0: Job type (In person, Remote, Part-time, etc.)
  - companyCeo/name: CEO name collected from Tavily’s search
  - icebreaker: Column for holding a custom icebreaker for each job listing (not populated by this workflow; I will upload another workflow that does this, called “Personalized IJSFE”)
  - scrapedCeo: CEO name collected from the Apify scraper
  - email: Email listed on the job listing
  - companyName: Name of the company that posted the job
  - companyDescription: Description of the company that posted the job
  - companyLinks/corporateWebsite: Website of the company that posted the job
  - companyNumEmployees: Number of employees the company lists
  - location/country: Country where the job is located
  - salary/salaryText: Salary on the job listing
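The slash-separated headers (for example hiringDemand/isUrgentHire and jobType/0) read like paths into nested scraper output, with numbers standing for list positions. Assuming that is how they arise, the sketch below shows how one nested listing could be flattened into a row whose values line up with those headers. The helper names and the sample item are hypothetical, and only a subset of the columns is used to keep it short.

```python
COLUMNS = ["jobUrl", "title", "jobType/0", "hiringDemand/isUrgentHire", "companyName"]


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into {"a/b/0": leaf} style paths."""
    if isinstance(value, dict):
        pairs = list(value.items())
    elif isinstance(value, list):
        pairs = list(enumerate(value))  # list positions become path segments
    else:
        return {prefix: value}
    flat = {}
    for key, child in pairs:
        path = f"{prefix}/{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


def to_row(item, columns=COLUMNS):
    """Order the flattened values by column header; missing fields become ''."""
    flat = flatten(item)
    return [flat.get(col, "") for col in columns]


# Hypothetical listing shaped like nested scraper output.
sample = {
    "jobUrl": "https://example.com/job/1",
    "title": "Account Executive",
    "jobType": ["Full-time"],
    "hiringDemand": {"isUrgentHire": True},
}
row = to_row(sample)
print(row)
```

Because the lookup is by exact header text, a header that differs from the flattened path (a typo or different capitalization) would simply produce an empty cell, which is one reason exact column names matter.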

Setup Instructions:

  1. Create a new Google Sheet with these column headers in the first row
  2. Name the sheet whatever you please
  3. Connect your Google Sheets OAuth credentials in n8n
  4. Update the document ID in the workflow nodes

The merge logic relies on the id column to prevent duplicate processing, so this structure is essential for the workflow to function correctly. Feel free to reach out for additional help or clarification by email at terflix45@gmail.com, and I’ll get back to you as soon as I can.

Set Up Steps:

  1. Configure Apify Integration:
  2. Set Up AI Services:
  3. Database Configuration:
  4. Content Filtering Setup:
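For the content filtering step, the niches live in the prompt of the “job filter” node. The sketch below illustrates one way such a prompt could be assembled and its answer interpreted. It is not the prompt shipped with the workflow: the niche names are placeholders, the function names are hypothetical, and the one-word answer format is an assumption chosen to keep parsing simple. No model call is made here.

```python
# Placeholder niches; the workflow's six hardcoded niches are not listed in this description.
NICHES = ["marketing", "sales"]


def build_filter_prompt(title: str, description: str, niches: list[str]) -> str:
    """Assemble a relevance-screening prompt for one job listing."""
    niche_list = ", ".join(niches)
    return (
        "You are screening job listings for outreach.\n"
        f"Target niches: {niche_list}.\n"
        f"Job title: {title}\n"
        f"Job description: {description}\n"
        "Answer with exactly one word: RELEVANT or IRRELEVANT."
    )


def is_relevant(model_reply: str) -> bool:
    """Interpret the model's one-word answer, ignoring case and whitespace."""
    return model_reply.strip().upper().startswith("RELEVANT")


prompt = build_filter_prompt("Growth Marketer", "Own paid acquisition.", NICHES)
print(prompt)
print(is_relevant(" relevant "), is_relevant("IRRELEVANT"))
```

Switching to other niches then amounts to editing the list that feeds the prompt, which matches the advice to adjust the “job filter” node's prompt.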

🔗 Nodes Used

Google Sheets, Webhook, Filter, OpenAI

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File

📖 Importing guide · 🔑 Credential setup