🔬 Extract links and URLs from PDF documents using PDF.co



📝 Description

This workflow extracts every link (URL) contained in a PDF file by converting the PDF to HTML with PDF.co and then pulling the URLs out of the resulting HTML.

Unlike the built-in Read PDF node, which returns only the visible link text, this flow returns the full target URLs, which makes further processing and analysis easier.
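To illustrate the difference between link text and link target, here is a minimal sketch using Python's standard-library `html.parser`. It assumes the converted HTML represents each PDF link as an `<a href="...">` element; the names `LinkCollector` and `extract_links` are hypothetical and not part of the template.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (visible text, href) pairs for every <a href="..."> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None  # href of the <a> we are currently inside
        self._text: list[str] = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only record text while inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html_text: str) -> list[tuple[str, str]]:
    """Return (visible text, target URL) for each link in the HTML."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

For example, `extract_links('<p>See <a href="https://example.com/report?id=7">the report</a>.</p>')` returns `[("the report", "https://example.com/report?id=7")]`: the first element is roughly what a text-only reader sees, the second is the URL this workflow is after.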


📌 Use Cases


🔗 Workflow Overview

  1. User uploads a PDF file via a web form.
  2. The PDF is uploaded to PDF.co.
  3. The PDF is converted to HTML (preserving links).
  4. The converted HTML is downloaded.
  5. URLs are extracted from the HTML using a custom code node.
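Steps 3 and 4 can be sketched as two HTTP calls. This is a hedged example, not the template's actual node configuration: the endpoint `https://api.pdf.co/v1/pdf/convert/to/html`, the `x-api-key` header, the `url` and `async` body fields, and the `url` field in the JSON response are assumptions to check against the PDF.co API documentation, and the function names are hypothetical. It also assumes step 2 has already produced a URL for the uploaded PDF.

```python
import json
import urllib.request

# Assumption: PDF.co v1 conversion endpoint with API-key header auth; verify in the PDF.co docs.
CONVERT_ENDPOINT = "https://api.pdf.co/v1/pdf/convert/to/html"


def build_convert_request(file_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for step 3, PDF to HTML."""
    payload = {
        "url": file_url,  # URL of the PDF returned by the upload step (assumed field name)
        "async": False,   # ask for a synchronous conversion (assumed field name)
    }
    return urllib.request.Request(
        CONVERT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def convert_and_download(file_url: str, api_key: str) -> str:
    """Steps 3 and 4: run the conversion, then download the resulting HTML."""
    with urllib.request.urlopen(build_convert_request(file_url, api_key)) as resp:
        result = json.load(resp)
    if result.get("error"):
        raise RuntimeError(result.get("message", "PDF.co conversion failed"))
    # Assumption: the response carries a temporary link to the HTML in "url".
    with urllib.request.urlopen(result["url"]) as resp:
        return resp.read().decode("utf-8")
```

`build_convert_request` only constructs the request, so it can be inspected without network access; `convert_and_download` performs both calls and needs a valid API key.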

⚙️ Node Breakdown

1. Load PDF (formTrigger)

2. Upload (PDF.co API)

3. PDF to HTML (PDF.co API)

4. Get HTML (HTTP Request)

5. Code1 (Function / Code)
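The Code1 step could look roughly like the following. The template's own code is not reproduced here, so this is a hypothetical Python sketch: it scans the whole HTML with a simple regular expression for `http`/`https` URLs, decodes HTML entities such as `&amp;`, removes duplicates, and wraps each result in the `{"json": {...}}` item shape that n8n nodes pass to each other. The pattern is a heuristic and will not catch relative links or schemes such as `mailto:`.

```python
import html
import re

# Heuristic: an http(s) scheme followed by characters that cannot end an attribute value or tag.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(html_text: str) -> list[str]:
    """Return unique http(s) URLs found anywhere in the HTML, in first-seen order."""
    found = (html.unescape(m) for m in URL_PATTERN.findall(html_text))
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(found))


def to_n8n_items(urls: list[str]) -> list[dict]:
    """Wrap each URL in the {"json": {...}} shape used by n8n items."""
    return [{"json": {"url": u}} for u in urls]
```

Unlike reading only `<a href>` attributes, the regular expression also picks up URLs that appear as plain text in the PDF, at the cost of occasionally capturing trailing punctuation such as a sentence-ending period.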


📎 Requirements


🛠️ Suggested Next Steps


📤 Importing the Template

Import this workflow into n8n with the Import workflow option and paste in the provided JSON.





🔗 Nodes Used

HTTP Request, n8n Form Trigger

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File
