🔬 Process WhatsApp PDFs with AWS Textract OCR via S3

⚡ 19 views · 🔬 Document Extraction & Analysis

Description

This n8n template demonstrates how to automatically extract text content from PDF documents received via WhatsApp messages using OCR.

It is designed for use cases where users submit documents through WhatsApp and the document content needs to be digitized for further processing, such as document analysis, AI-powered workflows, compliance checks, or data ingestion.

Good to know

How it works

  1. The workflow is triggered when an incoming WhatsApp message containing a PDF document is received.
  2. The PDF file is downloaded from WhatsApp’s media endpoint using an HTTP Request node.
  3. The downloaded PDF is uploaded to an AWS S3 bucket to make it accessible for OCR processing.
  4. AWS Textract is invoked to analyze the PDF stored in S3 and extract all readable text content.
  5. The Textract response is parsed and consolidated into a clean, ordered text output representing the PDF’s content.
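
The consolidation in step 5 can be sketched in a few lines. The template's own parsing logic is not shown on this page, so the following Python is only an illustration under stated assumptions: the response follows Textract's documented shape (a `Blocks` list whose entries carry `BlockType`, `Text`, and `Page`), only `LINE` blocks are kept, and the order in which Textract returns lines within a page is preserved. The function name `consolidate_textract_lines` is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def consolidate_textract_lines(response: dict[str, Any]) -> str:
    """Join Textract LINE blocks into plain text, grouped by page.

    Hypothetical helper, not the template's actual code.
    """
    pages: dict[int, list[str]] = defaultdict(list)
    for block in response.get("Blocks", []):
        # WORD blocks repeat the text of their parent LINE, so skip them.
        if block.get("BlockType") == "LINE" and block.get("Text"):
            # Assumption: a block without a Page value belongs to page 1.
            pages[block.get("Page", 1)].append(block["Text"])
    # Lines within a page are separated by a newline, pages by a blank line.
    return "\n\n".join("\n".join(pages[n]) for n in sorted(pages))


# Made-up sample response for illustration only.
sample = {
    "Blocks": [
        {"BlockType": "PAGE", "Page": 1},
        {"BlockType": "LINE", "Page": 2, "Text": "Second page"},
        {"BlockType": "LINE", "Page": 1, "Text": "Invoice 123"},
        {"BlockType": "WORD", "Page": 1, "Text": "Invoice"},
        {"BlockType": "LINE", "Page": 1, "Text": "Total: 40 EUR"},
    ]
}
print(consolidate_textract_lines(sample))
```

Grouping by page before joining keeps the output stable even if blocks from different pages arrive interleaved. Inside n8n the same logic would more likely live in a Code node, so treat this as a description of the transformation rather than a drop-in node.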

How to use

Requirements

Customising this workflow

🔗 Nodes Used

HTTP Request, AWS S3, WhatsApp Trigger

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File

📖 Importing guide · 🔑 Credential setup