🔬 Automated PDF report downloader & organizer with Google Drive & Sheets

🔬 Document Extraction & Analysis


Description

An automated PDF download and management system that reads a list of PDF URLs, downloads each file, uploads it to Google Drive, extracts its file metadata, and maintains a searchable library in Google Sheets, with per-file error handling and status tracking.

What Makes This Different:

Key Benefits of Automated PDF Management:


Who’s it for

This template is designed for researchers, academic institutions, market research teams, legal professionals, compliance officers, and anyone who needs to systematically collect and organize PDF documents from multiple sources. It’s perfect for organizations that need to build research libraries, archive regulatory documents, collect industry reports, maintain compliance documentation, or aggregate academic papers without manually downloading and organizing each file.

How it works / What it does

This workflow creates a PDF collection and management system that reads PDF URLs from Google Sheets, downloads the files, uploads them to Google Drive, extracts metadata, and maintains a searchable library. The system:

  1. Reads Pending PDF URLs - Fetches PDF URLs from Google Sheets “PDF URLs” sheet, processing entries that need to be downloaded
  2. Loops Through PDFs - Processes PDFs one at a time using Split in Batches, ensuring proper error isolation and preventing batch failures
  3. Prepares Download Info - Extracts filename from URL, decodes URL-encoded characters, validates PDF URL format, and generates fallback filenames with timestamps if needed
  4. Validates URL - Checks whether the URL is valid before attempting a download and skips invalid entries immediately
  5. Downloads PDF - Makes an HTTP request with browser-like headers, downloads the PDF as a binary file with a 60-second timeout, and handles download errors gracefully
  6. Verifies Download - Checks whether binary data was actually received and routes the item to error handling if the download failed
  7. Uploads to Google Drive - Uploads the PDF to the specified Google Drive folder, keeping the original filename or using the generated one
  8. Extracts File Metadata - Extracts file ID, name, MIME type, file size, Drive view link, and download link from Google Drive API response
  9. Saves to PDF Library - Appends file metadata to Google Sheets “PDF Library” sheet with title, source, file links, and download timestamp
  10. Updates Source Status - Marks processed URLs as “Downloaded”, “Failed”, or “Invalid” in source sheet for tracking
  11. Logs Errors - Records failed downloads and invalid URLs in “Error Log” sheet with error messages for troubleshooting
  12. Tracks Completion - Generates completion summary with processing statistics and timestamp
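Steps 3 and 4 (filename preparation and URL validation) can be sketched in plain Python as below. This is not the template's actual Code-node logic, which isn't shown here. The validation rule (absolute `http`/`https` URL with a host) and the `document_YYYYMMDD_HHMMSS.pdf` fallback pattern are assumptions, and `is_valid_pdf_url` and `prepare_filename` are hypothetical helper names.

```python
import posixpath
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import unquote, urlparse


def is_valid_pdf_url(url: str) -> bool:
    """Assumed rule: accept only absolute http(s) URLs that include a host."""
    if not isinstance(url, str):
        return False
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def prepare_filename(url: str, now: Optional[datetime] = None) -> str:
    """Take the last path segment, URL-decode it, else fall back to a timestamped name."""
    last_segment = posixpath.basename(urlparse(url.strip()).path)
    # Decode %20 and similar; replace any decoded "/" so the name stays one segment.
    name = unquote(last_segment).replace("/", "_")
    if name.lower().endswith(".pdf") and len(name) > len(".pdf"):
        return name
    # Hypothetical fallback pattern for URLs like ".../download?id=42".
    now = now or datetime.now(timezone.utc)
    return f"document_{now.strftime('%Y%m%d_%H%M%S')}.pdf"


if __name__ == "__main__":
    print(prepare_filename("https://example.com/files/Annual%20Report.pdf"))
    print(prepare_filename("https://example.com/download?id=42"))
```

For example, `prepare_filename("https://example.com/files/Annual%20Report.pdf")` returns `"Annual Report.pdf"`, while a query-string URL with no `.pdf` segment gets a timestamped fallback name.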

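Steps 5 through 9 (download, verify, then map the Drive upload response to a library row) might look roughly like the following sketch. Several details are assumptions. The header values are only illustrative. The `%PDF-` magic-byte check is stricter than the template's "binary data received" test. The column names are hypothetical. The mapping assumes the Google Drive API v3 file fields `id`, `name`, `mimeType`, `size` (returned as a string), `webViewLink`, and `webContentLink` are present in the upload response, which depends on which fields are requested.

```python
import urllib.request
from typing import Optional

# Illustrative browser-like headers; the template's exact header set is not shown.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept": "application/pdf,*/*;q=0.8",
}


def download_pdf(url: str, timeout: float = 60.0) -> bytes:
    """Fetch the URL as raw bytes; urlopen raises on HTTP errors and timeouts."""
    request = urllib.request.Request(url, headers=BROWSER_HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def looks_like_pdf(data: bytes) -> bool:
    """Non-empty and starting with the PDF header (an added, assumed check)."""
    return bool(data) and data[:5] == b"%PDF-"


def library_row(drive_file: dict, title: str, source_url: str, downloaded_at: str) -> dict:
    """Map a Drive v3 file resource to a hypothetical 'PDF Library' row."""
    size: Optional[str] = drive_file.get("size")  # Drive returns size as a string
    return {
        "Title": title,
        "Source URL": source_url,
        "File ID": drive_file.get("id", ""),
        "File Name": drive_file.get("name", ""),
        "MIME Type": drive_file.get("mimeType", ""),
        "Size (bytes)": int(size) if size is not None else None,
        "View Link": drive_file.get("webViewLink", ""),
        "Download Link": drive_file.get("webContentLink", ""),
        "Downloaded At": downloaded_at,
    }
```

Separating the network call (`download_pdf`) from the checks and the mapping keeps the latter testable without network access. In n8n, the same split corresponds to the HTTP Request node followed by IF and Set/Code nodes.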
Key Innovation: Error-Resilient Processing - Unlike simple download scripts that stop at the first error, this workflow isolates each failure, continues processing the remaining PDFs, and records every failed or invalid URL in the Error Log. A single bad URL doesn't halt the run, and the log shows exactly which entries need attention.
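The failure-isolation pattern in steps 10 to 12 can be sketched as a loop that catches errors per URL, records a status ("Downloaded", "Failed", or "Invalid"), appends error-log rows, and builds a completion summary. This sketch omits the Drive upload and Sheets writes. The downloader is passed in as `fetch` (a hypothetical parameter) so the loop can be exercised with a stub, and `process_urls` and `RunResult` are illustrative names, not part of the template.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List
from urllib.parse import urlparse


@dataclass
class RunResult:
    rows: List[dict] = field(default_factory=list)       # one status row per source URL
    error_log: List[dict] = field(default_factory=list)  # rows for the "Error Log" sheet
    summary: dict = field(default_factory=dict)          # completion statistics


def process_urls(urls: List[str], fetch: Callable[[str], bytes]) -> RunResult:
    result = RunResult()
    for url in urls:
        parsed = urlparse(url.strip())
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            # Skip invalid entries immediately instead of attempting a download.
            result.rows.append({"url": url, "status": "Invalid"})
            result.error_log.append({"url": url, "error": "Invalid URL"})
            continue
        try:
            data = fetch(url)
            if not data:
                raise ValueError("No binary data received")
            result.rows.append({"url": url, "status": "Downloaded"})
        except Exception as exc:  # isolate this failure and move on to the next URL
            result.rows.append({"url": url, "status": "Failed"})
            result.error_log.append({"url": url, "error": str(exc)})

    statuses = [row["status"] for row in result.rows]
    result.summary = {
        "total": len(urls),
        "downloaded": statuses.count("Downloaded"),
        "failed": statuses.count("Failed"),
        "invalid": statuses.count("Invalid"),
        "finished_at": datetime.now(timezone.utc).isoformat(),
    }
    return result
```

Because each URL's exception is caught inside the loop, a timeout on one file only marks that row as "Failed". The remaining URLs are still processed, which mirrors processing items one at a time with Split in Batches.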

How to set up

1. Prepare Google Sheets

2. Configure Google Sheets Nodes

3. Set Up Google Drive Folder

4. Configure Download Settings

5. Set Up Scheduling & Test

Requirements

🔗 Nodes Used

Google Sheets, HTTP Request, Google Drive, Execute Workflow Trigger, Schedule Trigger

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File
