Airline web check-in data extraction with Ollama AI, Google Sheets & Postgres Vector DB
1,287 views · Document Extraction & Analysis
Pro Tip: HTTP Request scraping tends to break when sites update their markup. If you're scraping a major platform, check if ScraperNode covers it; it has maintained scrapers for LinkedIn, Instagram, TikTok, YouTube, and 20+ other platforms that return structured data.
Description
Overview
This workflow retrieves airline web check-in URLs from Google Sheets, scrapes their content, employs an LLM to generate structured JSON data, refreshes the sheet, creates embeddings, and saves them in a Postgres vector DB for future semantic searches or question-answering.
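To make the "semantic search" step concrete, here is a minimal sketch of ranking stored text chunks against a query by cosine similarity. Cosine similarity is one common choice and is not confirmed as the metric this workflow's vector store uses. The function names and the toy two-dimensional vectors are illustrative assumptions; real embeddings come from the embedding model, and the comparison would normally run inside Postgres.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[tuple[str, list[float]]], k: int = 3):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data: (chunk text, embedding). Real embeddings have hundreds of dimensions.
rows = [
    ("Baggage allowance details", [0.0, 1.0]),
    ("Check-in opens 48 hours before departure", [1.0, 0.0]),
]
best = top_k([1.0, 0.1], rows, k=1)
```

Here `best[0][1]` is the check-in chunk, because its vector points in nearly the same direction as the query.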
Quick Notes
- Verify that Google Sheets has accurate URLs for scraping.
- Ensure the Postgres vector DB is set up correctly for embedding storage.
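As a rough illustration of the second note, the sketch below builds the SQL that a pgvector-backed table might need, assuming the database uses the pgvector extension. The table name `airline_checkin_embeddings`, the column names, and the dimension of 768 are hypothetical; the dimension has to match the embedding model configured in the workflow, and the n8n vector store node may create or expect a different schema.

```python
def build_vector_table_ddl(
    table: str = "airline_checkin_embeddings",  # hypothetical table name
    dim: int = 768,  # assumption: must equal the embedding model's output size
) -> list[str]:
    """Return SQL statements that prepare a pgvector table for embeddings."""
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    if dim <= 0:
        raise ValueError("dim must be a positive integer")
    return [
        # pgvector provides the `vector` column type.
        "CREATE EXTENSION IF NOT EXISTS vector;",
        (
            f"CREATE TABLE IF NOT EXISTS {table} ("
            "id BIGSERIAL PRIMARY KEY, "
            "content TEXT NOT NULL, "
            "metadata JSONB, "
            f"embedding vector({dim})"
            ");"
        ),
    ]


for statement in build_vector_table_ddl():
    print(statement)
```

Running the printed statements in a SQL client before the first workflow run is one way to confirm that the extension is installed and that the dimension matches.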
Process Flow
- Start the workflow with the Chat Trigger - Start node.
- Retrieve airline check-in URLs using the Fetch Airline URLs node.
- Scrape webpage data with the Scrape Airline Webpage node.
- Extract JSON data using the Extract info with LLM node with a Chat Model.
- Pause for a response with the Wait for Response node.
- Update Google Sheets with the Store Extracted Data node.
- Create embeddings with the Generate Embeddings node and store in Postgres vector DB with the Save to Vector DB node.
- Break down long text with the Split Long Text node and delay the next batch with the Wait Before Next Batch node.
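The Split Long Text step can be pictured as cutting a page into overlapping chunks before embedding. The sketch below is a simplified stand-in: it counts whitespace-separated words where a real token splitter counts model tokens, and the `chunk_size` and `overlap` values are assumptions, not settings taken from the workflow.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most chunk_size words, sharing `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = "a b c d e f g h i j"
print(split_text(sample, chunk_size=4, overlap=1))
# prints ['a b c d', 'd e f g', 'g h i j']
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing a few words twice.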
Getting Started
- Import the workflow into n8n and set up Google Sheets and Postgres vector DB credentials.
- Run a test with a sample URL to confirm scraping and embedding storage.
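When testing with a sample URL, it can help to look at the visible text a page yields, since that is roughly what a model has to work with. The sketch below uses only the Python standard library and runs on an inline HTML string so it works offline. It assumes the page is reduced to visible text before extraction, which the workflow description does not confirm, and the helper name `html_to_text` is hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document as one line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


sample_html = (
    "<html><head><style>p{}</style><script>var x=1;</script></head>"
    "<body><h1>Web check-in</h1><p>Opens 48 hours before departure.</p></body></html>"
)
print(html_to_text(sample_html))
# prints: Web check-in Opens 48 hours before departure.
```

If the text comes back empty for a real page, the site may render its content with JavaScript, in which case a plain HTTP request will not see it.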
Tailored Adjustments
Tweak the Extract info with LLM node to adjust the JSON output, or modify the Fetch Airline URLs node to pull from different sheet fields.
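When changing the JSON the model is asked to produce, a small validation step makes it easier to notice a reply that drifts from the expected shape. The sketch below pulls the first JSON object out of a model reply and checks for required keys. The field names in `REQUIRED_FIELDS` are hypothetical placeholders, not the fields this workflow actually extracts.

```python
import json

# Hypothetical field names; replace with the keys your prompt requests.
REQUIRED_FIELDS = ("airline", "checkin_opens", "checkin_closes")


def parse_llm_json(raw: str, required: tuple[str, ...] = REQUIRED_FIELDS) -> dict:
    """Extract a JSON object from model output and verify required keys exist."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    # json.JSONDecodeError is a ValueError subclass, so callers can catch one type.
    data = json.loads(raw[start:end + 1])
    missing = [field for field in required if field not in data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return data


reply = 'Sure, here it is: {"airline": "Example Air", "checkin_opens": "48h", "checkin_closes": "1h"} Done.'
record = parse_llm_json(reply)
print(record["airline"])  # prints Example Air
```

Slicing from the first `{` to the last `}` tolerates chatty text around the object, but it does not handle replies that contain several separate objects.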
Nodes Used
Google Sheets, HTTP Request, Basic LLM Chain, Ollama Chat Model, Token Splitter, Default Data Loader
Import
Download workflow.json and import into n8n:
Workflow menu → Import from File