🔬 Convert PDF, DOC, and images to Markdown using the Datalab.to API

🔬 Document Extraction & Analysis

Description

This n8n workflow converts .pdf, .doc, .png, .jpg, and .webp files to clean Markdown text using the datalab.to API. The output is well suited to AI agents, LLM processing, and preparing RAG (Retrieval-Augmented Generation) data for vector databases.

Workflow Description

Input

A .pdf, .doc, .png, .jpg, or .webp file uploaded through the n8n Form Trigger.

Processing Steps

  1. File Validation: Check file type and size constraints
  2. HTTP Request Node:
    • Method: POST to https://api.datalab.to/v1/marker
    • Headers: X-API-Key with your datalab.to API key
    • Body: Multipart form data with the file
  3. Response Processing: Extract the converted markdown text
  4. Output Formatting: Clean and structure the markdown for downstream use
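Steps 1, 3, and 4 above can be sketched as plain functions, for example as logic you might port into an n8n Code node. This is a minimal sketch under stated assumptions: the size limit `MAX_SIZE_BYTES` and the `"markdown"` response field are hypothetical placeholders, not confirmed datalab.to behavior, and `clean_markdown` is a deliberately naive cleanup that only normalizes line endings and collapses repeated blank lines (including any inside fenced code).

```python
from pathlib import Path

# Extensions listed in this workflow's description.
ALLOWED_EXTENSIONS = {".pdf", ".doc", ".png", ".jpg", ".webp"}
# Hypothetical limit; check datalab.to's documentation for the real one.
MAX_SIZE_BYTES = 200 * 1024 * 1024


def validate_file(filename: str, size_bytes: int) -> None:
    """Step 1: reject unsupported types and empty or oversized files."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size_bytes <= 0 or size_bytes > MAX_SIZE_BYTES:
        raise ValueError(f"file size out of range: {size_bytes} bytes")


def extract_markdown(response: dict) -> str:
    """Step 3: pull the Markdown text out of the parsed JSON response.

    Assumption: the converted text sits under a "markdown" key.
    Adjust this to the schema the API actually returns.
    """
    markdown = response.get("markdown")
    if not isinstance(markdown, str):
        raise KeyError("response has no 'markdown' string field")
    return markdown


def clean_markdown(markdown: str) -> str:
    """Step 4: normalize line endings and collapse runs of blank lines.

    Trailing spaces on non-blank lines are kept, since two trailing
    spaces are a Markdown hard line break.
    """
    lines = markdown.replace("\r\n", "\n").split("\n")
    cleaned: list[str] = []
    for line in lines:
        # Treat whitespace-only lines as blank.
        line = line if line.strip() else ""
        # Keep at most one consecutive blank line.
        if line == "" and cleaned and cleaned[-1] == "":
            continue
        cleaned.append(line)
    # Drop leading/trailing blank lines and end with a single newline.
    return "\n".join(cleaned).strip("\n") + "\n"
```

Raising exceptions keeps the validation step simple: in a workflow, a failed check can stop the run before any API credit is spent.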

Output

Clean Markdown text converted from the uploaded file, ready for downstream use.

Setup Instructions

  1. Get API Access: Sign up at datalab.to to obtain your API key
  2. Configure Credentials:
    • Create a new credential in n8n
    • Add Generic Header: X-API-Key with your API key as the value
  3. Import Workflow: Import the workflow and select the new credential in the HTTP Request node; it is then ready to process files
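With the credential in place, the request from step 2 (a multipart POST carrying the X-API-Key header) looks roughly like the following. This is a standard-library Python sketch that builds the request without sending it; the helper name `build_marker_request`, the form field name `file`, and the `DATALAB_API_KEY` environment variable are illustrative assumptions, and the URL is the one given in this workflow, so verify it against datalab.to's API documentation.

```python
import os
import urllib.request
import uuid

# Endpoint as given in this workflow's description.
MARKER_URL = "https://api.datalab.to/v1/marker"


def build_marker_request(filename: str, data: bytes, api_key: str,
                         content_type: str = "application/octet-stream"
                         ) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST with the file attached."""
    boundary = uuid.uuid4().hex
    # Assumption: the API expects the upload in a form field named "file".
    # The filename is not escaped, so it must not contain double quotes.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        MARKER_URL,
        data=head + data + tail,
        method="POST",
        headers={
            # Same header the n8n credential adds.
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


if __name__ == "__main__":
    # Hypothetical env var; in n8n the key lives in the header credential instead.
    key = os.environ.get("DATALAB_API_KEY", "test-key")
    req = build_marker_request("sample.pdf", b"%PDF-1.4 ...", key, "application/pdf")
    print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would send it. The API may process documents asynchronously (for example, by returning a URL to poll for the result), so check datalab.to's documentation before relying on a single request/response round trip.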

Use Cases

The workflow hides the differences between input file formats and returns consistent, AI-ready Markdown for your automations.

🔗 Nodes Used

HTTP Request, n8n Form Trigger

📥 Import

Download workflow.json and import it into n8n: Workflow menu → Import from File

📖 Importing guide · 🔑 Credential setup