🎣 Job post to sales lead pipeline with Scrape.do, Apollo.io & OpenAI

Description

Lead Sourcing by Job Posts for Outreach with Scrape.do API, OpenAI & Google Sheets

Overview

This n8n workflow automates the complete lead generation process by scraping job postings from Indeed, enriching company data via Apollo.io, identifying decision-makers, and generating personalized LinkedIn outreach messages using OpenAI. It integrates with Scrape.do for reliable web scraping, Apollo.io for B2B data enrichment, OpenAI for AI-powered personalization, and Google Sheets for centralized data storage.

Perfect for: Sales teams, recruiters, business development professionals, and marketing agencies looking to automate their outbound prospecting pipeline.


Workflow Components

1. ⏰ Schedule Trigger

| Property | Value |
| --- | --- |
| Type | Schedule Trigger |
| Purpose | Automatically initiates workflow on a recurring schedule |
| Frequency | Weekly (Every Monday) |
| Time | 00:00 UTC |

Function: Ensures consistent, hands-off lead generation by running the pipeline automatically without manual intervention.


2. 🔍 Scrape.do Indeed API

| Property | Value |
| --- | --- |
| Type | HTTP Request (GET) |
| Purpose | Scrapes job listings from Indeed via Scrape.do proxy API |
| Endpoint | https://api.scrape.do |
| Output Format | Markdown |

Request Parameters:

| Parameter | Value | Description |
| --- | --- | --- |
| token | API Token | Scrape.do authentication |
| url | Indeed Search URL | Target job search page |
| super | true | Uses residential proxies |
| geoCode | us | US-based content |
| render | true | JavaScript rendering enabled |
| device | mobile | Mobile viewport for cleaner HTML |
| output | markdown | Lightweight text output |

Function: Fetches Indeed job listings with anti-bot bypass, returning clean markdown for easy parsing.
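
As a rough illustration of how this request is assembled, the sketch below builds the Scrape.do URL from the parameters listed above. The Indeed base URL (`https://www.indeed.com/jobs`), the helper name `buildScrapeDoUrl`, and the token placeholder are assumptions for illustration, not values taken from the workflow JSON.

```javascript
// Sketch only: assembles the GET URL the "Scrape.do Indeed API" node would call.
// Assumption: the Indeed search page lives at https://www.indeed.com/jobs.
function buildScrapeDoUrl(token, search) {
  // Target Indeed search URL (q = search term, l = location, fromage = max age in days).
  const target = new URL('https://www.indeed.com/jobs');
  target.searchParams.set('q', search.query);
  target.searchParams.set('l', search.location);
  target.searchParams.set('fromage', String(search.maxAgeDays));

  // Scrape.do request: the target URL is passed as a URL-encoded query parameter.
  const request = new URL('https://api.scrape.do');
  request.searchParams.set('token', token);
  request.searchParams.set('url', target.toString());
  request.searchParams.set('super', 'true');     // residential proxies
  request.searchParams.set('geoCode', 'us');     // US-based content
  request.searchParams.set('render', 'true');    // JavaScript rendering
  request.searchParams.set('device', 'mobile');  // mobile viewport
  request.searchParams.set('output', 'markdown');
  return request.toString();
}

const scrapeUrl = buildScrapeDoUrl('YOUR_SCRAPE_DO_TOKEN', {
  query: 'data engineer',
  location: 'Remote',
  maxAgeDays: 7,
});
console.log(scrapeUrl);
```

Building the URL with `URL` and `searchParams` rather than string concatenation keeps the nested Indeed URL correctly encoded inside the `url` parameter.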


3. 📋 Parse Indeed Jobs

| Property | Value |
| --- | --- |
| Type | Code Node (JavaScript) |
| Purpose | Extracts structured job data from markdown |
| Mode | Run once for all items |

Extracted Fields:

| Field | Description | Example |
| --- | --- | --- |
| jobTitle | Position title | “Senior Data Engineer” |
| jobUrl | Indeed job link | https://indeed.com/viewjob?jk=abc123 |
| jobId | Indeed job identifier | “abc123” |
| companyName | Hiring company | “Acme Corporation” |
| location | City, State | “San Francisco, CA” |
| salary | Pay range | “$120,000 - $150,000” |
| jobType | Employment type | “Full-time” |
| source | Data source | “Indeed” |
| dateFound | Scrape date | “2025-01-15” |

Function: Parses markdown using regex patterns, filters invalid entries, and deduplicates by company name.
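
The parse, filter, and deduplicate steps can be sketched as follows. The markdown layout is an assumption (a job link line followed by a company line and a location line); the real Scrape.do output may differ, so the regex and the function name `parseIndeedJobs` are hypothetical and would need adjusting against the raw output.

```javascript
// Sketch only. Assumed (hypothetical) markdown layout per job:
//   [Job Title](https://www.indeed.com/viewjob?jk=<id>)
//   Company Name
//   City, ST
function parseIndeedJobs(markdown, dateFound) {
  const lines = markdown.split('\n').map((line) => line.trim()).filter(Boolean);
  const linkPattern = /\[([^\]]+)\]\((https?:\/\/[^\s)]*viewjob\?jk=([A-Za-z0-9]+)[^\s)]*)\)/;
  const seenCompanies = new Set();
  const jobs = [];

  lines.forEach((line, index) => {
    const match = line.match(linkPattern);
    if (!match) return;

    const companyName = lines[index + 1] || '';
    const location = lines[index + 2] || '';
    // Filter invalid entries: the company line must exist and must not be another link.
    if (!companyName || linkPattern.test(companyName)) return;

    // Deduplicate by company name (case-insensitive).
    const key = companyName.toLowerCase();
    if (seenCompanies.has(key)) return;
    seenCompanies.add(key);

    jobs.push({
      jobTitle: match[1],
      jobUrl: match[2],
      jobId: match[3],
      companyName,
      location,
      source: 'Indeed',
      dateFound,
    });
  });
  return jobs;
}

const sampleMarkdown = [
  '[Senior Data Engineer](https://www.indeed.com/viewjob?jk=abc123)',
  'Acme Corporation',
  'San Francisco, CA',
  '',
  '[Data Engineer II](https://www.indeed.com/viewjob?jk=def456)',
  'acme corporation',
  'Remote',
].join('\n');

const jobs = parseIndeedJobs(sampleMarkdown, '2025-01-15');
console.log(jobs);
```

In an n8n Code node the result would typically be returned as `jobs.map((job) => ({ json: job }))` so that each job becomes one item.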


4. 📊 Add New Company (Google Sheets)

| Property | Value |
| --- | --- |
| Type | Google Sheets Node |
| Purpose | Stores parsed job postings for tracking |
| Operation | Append rows |
| Target Sheet | “Add New Company” |

Function: Creates a historical record of all discovered job postings and companies for pipeline tracking.


5. 🏢 Apollo Org Search

| Property | Value |
| --- | --- |
| Type | HTTP Request (POST) |
| Purpose | Enriches company data via Apollo.io API |
| Endpoint | https://api.apollo.io/v1/organizations/search |
| Authentication | HTTP Header Auth (x-api-key) |

Request Body:

{
  "q_organization_name": "Company Name",
  "page": 1,
  "per_page": 1
}

Response Fields:

| Field | Description |
| --- | --- |
| id | Apollo organization ID |
| name | Official company name |
| website_url | Company website |
| linkedin_url | LinkedIn company page |
| industry | Business sector |
| estimated_num_employees | Company size |
| founded_year | Year established |
| city, state, country | Location details |
| short_description | Company overview |

Function: Retrieves comprehensive company intelligence including LinkedIn profiles, industry classification, and employee count.
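
A minimal sketch of the organization search request, described as a plain object rather than sent over the network. The helper name `buildOrgSearchRequest` and the API key placeholder are assumptions; the endpoint, header name, and body fields come from the tables above.

```javascript
// Sketch only: describes the HTTP request without sending it.
function buildOrgSearchRequest(companyName, apiKey) {
  return {
    method: 'POST',
    url: 'https://api.apollo.io/v1/organizations/search',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': apiKey, // supplied by the HTTP Header Auth credential in n8n
    },
    // JSON.stringify escapes quotes, brackets, and other special characters
    // in company names, which a hand-assembled JSON string would not.
    body: JSON.stringify({
      q_organization_name: companyName.trim(),
      page: 1,
      per_page: 1,
    }),
  };
}

const orgRequest = buildOrgSearchRequest(' Acme "Labs" [EU] ', 'YOUR_APOLLO_API_KEY');
console.log(orgRequest.body);
```

Serializing the body with `JSON.stringify` is one way to avoid malformed-JSON errors when a scraped company name contains quotes or brackets.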


6. 📤 Extract Apollo Org Data

| Property | Value |
| --- | --- |
| Type | Code Node (JavaScript) |
| Purpose | Parses Apollo response and merges with original data |
| Mode | Run once for each item |

Function: Extracts relevant fields from Apollo API response and combines with job posting data for downstream processing.
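
The extract-and-merge step might look like the sketch below. It assumes the Apollo response wraps results in an `organizations` array, which should be checked against a real response; the output field names (`apolloOrgId`, `orgFound`, and so on) and the function name are hypothetical.

```javascript
// Sketch only. Assumption: the response body has the shape { organizations: [ {...} ] }.
function extractOrgData(apolloResponse, job) {
  const org = (apolloResponse.organizations || [])[0];

  // No match: keep the job data and flag the item so a later IF node can skip it.
  if (!org) {
    return { ...job, orgFound: false };
  }

  return {
    ...job,
    orgFound: true,
    apolloOrgId: org.id,
    officialName: org.name,
    websiteUrl: org.website_url,
    companyLinkedinUrl: org.linkedin_url,
    industry: org.industry,
    employeeCount: org.estimated_num_employees,
    country: org.country,
  };
}

const sampleJob = { companyName: 'Acme Corporation', jobTitle: 'Senior Data Engineer' };
const enriched = extractOrgData(
  { organizations: [{ id: 'org_1', name: 'Acme Corp', industry: 'software', country: 'United States' }] },
  sampleJob,
);
const unmatched = extractOrgData({ organizations: [] }, sampleJob);
console.log(enriched, unmatched);
```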


7. 👥 Apollo People Search

| Property | Value |
| --- | --- |
| Type | HTTP Request (POST) |
| Purpose | Finds decision-makers at target companies |
| Endpoint | https://api.apollo.io/v1/mixed_people/search |
| Authentication | HTTP Header Auth (x-api-key) |

Request Body:

{
  "organization_ids": ["apollo_org_id"],
  "person_titles": [
    "CTO",
    "Chief Technology Officer",
    "VP Engineering",
    "Head of Engineering",
    "Engineering Manager",
    "Technical Director",
    "CEO",
    "Founder"
  ],
  "page": 1,
  "per_page": 3
}

Response Fields:

| Field | Description |
| --- | --- |
| first_name | Contact first name |
| last_name | Contact last name |
| title | Job title |
| email | Email address |
| linkedin_url | LinkedIn profile URL |
| phone_number | Direct phone |

Function: Identifies key stakeholders and decision-makers based on configurable title filters.
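
To show what "configurable title filters" could mean in practice, here is a small sketch that builds the request body with the title list as a parameter. The function name `buildPeopleSearchBody` and the constant `DEFAULT_TITLES` are illustrative; the body fields and default values mirror the request body shown above.

```javascript
// Default titles copied from the request body above.
const DEFAULT_TITLES = [
  'CTO',
  'Chief Technology Officer',
  'VP Engineering',
  'Head of Engineering',
  'Engineering Manager',
  'Technical Director',
  'CEO',
  'Founder',
];

// Sketch only: builds the JSON body for the people search request.
function buildPeopleSearchBody(apolloOrgId, titles = DEFAULT_TITLES, perPage = 3) {
  return {
    organization_ids: [apolloOrgId],
    person_titles: titles,
    page: 1,
    per_page: perPage,
  };
}

// Example: target recruiting contacts instead of engineering leadership.
const recruitingBody = buildPeopleSearchBody('org_1', ['Head of Talent', 'Recruiting Manager']);
console.log(JSON.stringify(recruitingBody, null, 2));
```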


8. 📝 Format Leads

| Property | Value |
| --- | --- |
| Type | Code Node (JavaScript) |
| Purpose | Structures lead data for outreach |
| Mode | Run once for all items |

Function: Combines person data with company context, creating comprehensive lead profiles ready for personalization.
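
A possible shape for this step is sketched below. It assumes the people search response exposes a `people` array and that the company context carries `companyName`, `industry`, `country`, and `jobTitle`; these names and the function `formatLeads` are assumptions to verify against the actual node output.

```javascript
// Sketch only. Assumption: the response body has the shape { people: [ {...} ] }.
function formatLeads(peopleResponse, company) {
  return (peopleResponse.people || []).map((person) => ({
    firstName: person.first_name || '',
    lastName: person.last_name || '',
    // fullName feeds the "Name" variable of the message prompt.
    fullName: [person.first_name, person.last_name].filter(Boolean).join(' '),
    title: person.title || '',
    email: person.email || '',
    linkedinUrl: person.linkedin_url || '',
    // Company context carried over from the earlier enrichment steps.
    companyName: company.companyName,
    industry: company.industry,
    country: company.country,
    jobTitle: company.jobTitle,
  }));
}

const leads = formatLeads(
  { people: [{ first_name: 'Dana', last_name: 'Lee', title: 'CTO', linkedin_url: 'https://www.linkedin.com/in/example' }] },
  { companyName: 'Acme Corporation', industry: 'software', country: 'United States', jobTitle: 'Senior Data Engineer' },
);
console.log(leads);
```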


9. 🤖 Generate Personalized Message (OpenAI)

| Property | Value |
| --- | --- |
| Type | OpenAI Node |
| Purpose | Creates custom LinkedIn connection messages |
| Model | gpt-4o-mini |
| Max Tokens | 150 |
| Temperature | 0.7 |

System Prompt:

You are a professional outreach specialist. Write personalized LinkedIn connection request messages. Keep messages under 300 characters. Be friendly, professional, and mention a specific reason for connecting based on their role and company.

User Prompt Variables:

| Variable | Source |
| --- | --- |
| Name | $json.fullName |
| Title | $json.title |
| Company | $json.companyName |
| Industry | $json.industry |
| Job Context | $json.jobTitle |

Function: Generates unique, contextual outreach messages that reference specific hiring activity and company details.
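
For reference, the settings and variables above map onto a chat completions payload roughly as follows. The system prompt, model, token limit, and temperature come from this section; the exact wording of the user prompt and the function name `buildMessageRequest` are assumptions, since the workflow's actual prompt template is not shown here.

```javascript
const SYSTEM_PROMPT =
  'You are a professional outreach specialist. Write personalized LinkedIn connection ' +
  'request messages. Keep messages under 300 characters. Be friendly, professional, and ' +
  'mention a specific reason for connecting based on their role and company.';

// Sketch only: builds the request payload; the user prompt layout is hypothetical.
function buildMessageRequest(lead) {
  const userPrompt = [
    `Name: ${lead.fullName}`,
    `Title: ${lead.title}`,
    `Company: ${lead.companyName}`,
    `Industry: ${lead.industry}`,
    `Job Context: the company is hiring for ${lead.jobTitle}`,
  ].join('\n');

  return {
    model: 'gpt-4o-mini',
    max_tokens: 150,
    temperature: 0.7,
    messages: [
      { role: 'system', content: SYSTEM_PROMPT },
      { role: 'user', content: userPrompt },
    ],
  };
}

const messageRequest = buildMessageRequest({
  fullName: 'Dana Lee',
  title: 'CTO',
  companyName: 'Acme Corporation',
  industry: 'software',
  jobTitle: 'Senior Data Engineer',
});
console.log(messageRequest.messages[1].content);
```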


10. 🔗 Merge Lead + Message

| Property | Value |
| --- | --- |
| Type | Code Node (JavaScript) |
| Purpose | Combines lead data with generated message |
| Mode | Run once for each item |

Function: Merges OpenAI response with lead profile, creating the final enriched record.
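
A sketch of the merge, assuming the raw chat completions response shape (`choices[0].message.content`); the n8n OpenAI node may present its output differently, so the path should be checked in the node's output panel. The `withinLimit` flag and the function name are illustrative additions that make the 300-character target easy to audit.

```javascript
// Sketch only. Assumption: `completion` follows the raw chat completions response shape.
function mergeLeadAndMessage(lead, completion) {
  const choice = (completion.choices || [])[0];
  const message =
    choice && choice.message && choice.message.content
      ? choice.message.content.trim()
      : '';

  return {
    ...lead,
    personalizedMessage: message,
    // The prompt asks for under 300 characters; the model may not always comply.
    withinLimit: message.length > 0 && message.length <= 300,
  };
}

const mergedLead = mergeLeadAndMessage(
  { fullName: 'Dana Lee', companyName: 'Acme Corporation' },
  { choices: [{ message: { role: 'assistant', content: '  Hi Dana, I saw Acme is growing its data team.  ' } }] },
);
console.log(mergedLead);
```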


11. 💾 Save Leads to Sheet

| Property | Value |
| --- | --- |
| Type | Google Sheets Node |
| Purpose | Stores final lead data with personalized messages |
| Operation | Append rows |
| Target Sheet | “Leads” |

Data Mapping:

| Column | Data |
| --- | --- |
| First Name | Lead’s first name |
| Last Name | Lead’s last name |
| Title | Job title |
| Company | Company name |
| LinkedIn URL | Profile link |
| Country | Location |
| Industry | Business sector |
| Date Added | Timestamp |
| Source | “Indeed + Apollo” |
| Personalized Message | AI-generated outreach text |

Function: Creates actionable lead database ready for outreach campaigns.
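
The column mapping above can be expressed as a small function that turns a lead object into a row keyed by column name. The input field names (`firstName`, `linkedinUrl`, and so on), the ISO timestamp format, and the function name `toLeadRow` are assumptions for illustration.

```javascript
// Sketch only: maps a lead object to the "Leads" sheet columns.
function toLeadRow(lead, addedAt = new Date()) {
  return {
    'First Name': lead.firstName,
    'Last Name': lead.lastName,
    'Title': lead.title,
    'Company': lead.companyName,
    'LinkedIn URL': lead.linkedinUrl,
    'Country': lead.country,
    'Industry': lead.industry,
    'Date Added': addedAt.toISOString(), // assumed timestamp format
    'Source': 'Indeed + Apollo',
    'Personalized Message': lead.personalizedMessage,
  };
}

const leadRow = toLeadRow(
  {
    firstName: 'Dana',
    lastName: 'Lee',
    title: 'CTO',
    companyName: 'Acme Corporation',
    linkedinUrl: 'https://www.linkedin.com/in/example',
    country: 'United States',
    industry: 'software',
    personalizedMessage: 'Hi Dana, I saw Acme is growing its data team.',
  },
  new Date('2025-01-15T00:00:00Z'),
);
console.log(leadRow);
```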


Workflow Flow

⏰ Schedule Trigger
↓
🔍 Scrape.do Indeed API ──► Fetches job listings with JS rendering
↓
📋 Parse Indeed Jobs ──► Extracts company names, job details
↓
📊 Add New Company ──► Saves to Google Sheets (Companies)
↓
🏢 Apollo Org Search ──► Enriches company data
↓
📤 Extract Apollo Org Data ──► Parses API response
↓
👥 Apollo People Search ──► Finds decision-makers
↓
📝 Format Leads ──► Structures lead profiles
↓
🤖 Generate Personalized Message ──► AI creates custom outreach
↓
🔗 Merge Lead + Message ──► Combines all data
↓
💾 Save Leads to Sheet ──► Final storage (Leads)

Configuration Requirements

API Keys & Credentials

| Credential | Purpose | Where to Get |
| --- | --- | --- |
| Scrape.do API Token | Web scraping with anti-bot bypass | scrape.do/dashboard |
| Apollo.io API Key | B2B data enrichment | apollo.io/settings/integrations |
| OpenAI API Key | AI message generation | platform.openai.com |
| Google Sheets OAuth2 | Data storage | n8n Credentials Setup |

n8n Credential Setup

| Credential Type | Configuration |
| --- | --- |
| HTTP Header Auth (Apollo) | Header: x-api-key, Value: Your Apollo API key |
| OpenAI API | API Key: Your OpenAI API key |
| Google Sheets OAuth2 | Complete OAuth flow with Google |

Key Features

🔍 Intelligent Job Scraping

🏢 B2B Data Enrichment

🤖 AI-Powered Personalization

📊 Automated Data Management


Use Cases

🎯 Sales Prospecting

👥 Recruiting & Talent Acquisition

📈 Market Intelligence

🤝 Partnership Development


Technical Notes

| Specification | Value |
| --- | --- |
| Processing Time | 2-5 minutes per run (depending on job count) |
| Jobs per Run | ~25 unique companies |
| API Calls per Run | 1 Scrape.do + ~25 Apollo Org + ~25 Apollo People + ~75 OpenAI |
| Data Accuracy | 90%+ for company matching |
| Success Rate | 99%+ with proper error handling |

Rate Limits to Consider

| Service | Free Tier Limit | Recommendation |
| --- | --- | --- |
| Scrape.do | 1,000 credits/month | ~40 runs/month |
| Apollo.io | 100 requests/day | Add Wait nodes if needed |
| OpenAI | Based on usage | Monitor costs (~$0.01-0.05/run) |
| Google Sheets | 300 requests/minute | No issues expected |
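
The figures in the two tables above can be cross-checked with a little arithmetic. The sketch below uses only numbers stated in this section (25 companies per run, up to 3 leads per company, and the listed limits); the function name `estimateUsage` is illustrative, and the per-run Scrape.do credit cost is inferred from "~40 runs/month" rather than taken from Scrape.do pricing.

```javascript
// Sketch only: per-run request counts derived from the figures above.
function estimateUsage(companiesPerRun, leadsPerCompany) {
  return {
    scrapeDoRequests: 1,
    apolloRequests: companiesPerRun * 2, // one org search + one people search per company
    openAiRequests: companiesPerRun * leadsPerCompany,
  };
}

const usage = estimateUsage(25, 3);

// 1,000 credits spread over ~40 runs implies roughly 25 credits per run.
const scrapeCreditsPerRun = 1000 / 40;

// Full runs that fit inside Apollo's 100 requests/day.
const apolloRunsPerDay = Math.floor(100 / usage.apolloRequests);

console.log(usage, scrapeCreditsPerRun, apolloRunsPerDay);
```

On these numbers a weekly schedule stays well inside the Apollo daily limit, while more than two full runs on the same day would exceed it.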

Setup Instructions

Step 1: Import Workflow

  1. Copy the JSON workflow configuration
  2. In n8n: Workflows → Import from JSON
  3. Paste configuration and save

Step 2: Configure Scrape.do

  1. Sign up at scrape.do
  2. Navigate to Dashboard → API Token
  3. Copy your token
  4. Paste the token into the `token` query parameter of the “Scrape.do Indeed API” node (the parameter itself is already configured in the URL)

To customize search:

Change the `url` parameter in "Scrape.do Indeed API" node:
- q=data+engineer (search term)
- l=Remote (location)
- fromage=7 (last 7 days)

Step 3: Configure Apollo.io

  1. Sign up at apollo.io
  2. Go to Settings → Integrations → API Keys
  3. Create new API key
  4. In n8n: Credentials → Add Credential → Header Auth
    • Name: x-api-key
    • Value: Your Apollo API key
  5. Select this credential in both Apollo HTTP nodes

Step 4: Configure OpenAI

  1. Go to platform.openai.com
  2. Create new API key
  3. In n8n: Credentials → Add Credential → OpenAI
  4. Paste API key
  5. Select credential in “Generate Personalized Message” node

Step 5: Configure Google Sheets

  1. Create new Google Spreadsheet
  2. Create two sheets:
    • Sheet 1: “Add New Company”
      • Columns: companyName | jobTitle | jobUrl | location | salary | source | postedDate
    • Sheet 2: “Leads”
      • Columns: First Name | Last Name | Title | Company | LinkedIn URL | Country | Industry | Date Added | Source | Personalized Message
  3. Copy Sheet ID from URL
  4. In n8n: Credentials → Add Credential → Google Sheets OAuth2
  5. Update both Google Sheets nodes with your Sheet ID

Step 6: Test and Activate

  1. Manual Test: Click “Execute Workflow” button
  2. Verify Each Node: Check outputs step by step
  3. Review Data: Confirm data appears in Google Sheets
  4. Activate: Toggle workflow to “Active”

Error Handling

Common Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| “Invalid character: [” | Empty/malformed company name | Check Parse Indeed Jobs output |
| “Node does not have credentials” | Credential not linked | Open node → Select credential |
| Empty Parse Results | Indeed HTML structure changed | Check Scrape.do raw output |
| Apollo Rate Limit (429) | Too many requests | Add 5-10s Wait node between calls |
| OpenAI Timeout | Too many tokens | Reduce batch size or max_tokens |
| “Your request is invalid” | Malformed JSON body | Verify expression syntax in HTTP nodes |

Troubleshooting Steps

  1. Verify Credentials: Test each credential individually
  2. Check Node Outputs: Use “Execute Node” for debugging
  3. Monitor API Usage: Check Apollo and OpenAI dashboards
  4. Review Logs: Check n8n execution history for details
  5. Test with Sample: Use known company name to verify Apollo

For production use, consider adding:

- IF node after Apollo Org Search to handle empty results
- Error Workflow trigger for notifications
- Wait nodes between API calls for rate limiting
- Retry logic for transient failures
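
As one way to approach the retry and rate-limiting items above, the sketch below retries a request on HTTP 429 or 5xx responses with exponential backoff. It is a generic pattern, not part of the workflow: `requestFn` is a hypothetical async function that resolves to an object with a numeric `status`, and the 5-second base delay simply echoes the 5-10s Wait suggested in the table of common issues.

```javascript
// Delay before the next try after failed attempt `attempt` (1-based):
// base, 2x base, 4x base, ... capped at maxMs.
function backoffDelayMs(attempt, baseMs = 5000, maxMs = 60000) {
  return Math.min(baseMs * 2 ** (attempt - 1), maxMs);
}

// Rate limits and server errors are worth retrying; other statuses are not.
function isRetryable(status) {
  return status === 429 || (status >= 500 && status < 600);
}

// Sketch only: `requestFn` is a hypothetical async function returning { status, ... }.
async function withRetry(
  requestFn,
  maxAttempts = 4,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
) {
  for (let attempt = 1; ; attempt += 1) {
    const response = await requestFn();
    // Return on success, on a non-retryable status, or when attempts run out.
    if (!isRetryable(response.status) || attempt >= maxAttempts) {
      return response;
    }
    await sleep(backoffDelayMs(attempt));
  }
}

console.log([1, 2, 3, 4, 5].map((attempt) => backoffDelayMs(attempt)));
```

The `sleep` parameter is injectable so the helper can be exercised in tests without real waiting.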

Performance Specifications

| Metric | Value |
| --- | --- |
| Execution Time | 2-5 minutes per scheduled run |
| Jobs Discovered | ~25 per Indeed page |
| Leads Generated | 1-3 per company (based on title matches) |
| Message Quality | Professional, contextual, <300 chars |
| Data Freshness | Real-time from Indeed + Apollo |
| Storage Format | Google Sheets (unlimited rows) |

API Reference

Scrape.do API

| Endpoint | Method | Purpose |
| --- | --- | --- |
| https://api.scrape.do | GET | Direct URL scraping |

Documentation: scrape.do/documentation

Apollo.io API

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /v1/organizations/search | POST | Company lookup |
| /v1/mixed_people/search | POST | People search |

Documentation: apolloio.github.io/apollo-api-docs

OpenAI API

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /v1/chat/completions | POST | Message generation |

Documentation: platform.openai.com

🔗 Nodes Used

Google Sheets, HTTP Request, OpenAI, n8n Form Trigger

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File
