🔬 Transcribing Telegram voice messages using Whisper and Gemini with a fallback mechanism

341 views · 🔬 Document Extraction & Analysis

Description

🎙️ n8n Workflow: Voice Message Transcription with Access Control

This n8n workflow enables automated transcription of voice messages in Telegram groups with built-in access control and intelligent fallback mechanisms. It’s designed for teams that need to convert audio messages to text while maintaining security and handling various audio formats.


📌 Section 1: Trigger & Access Control

⚡ Receive Message (Telegram Trigger)

Purpose: Captures incoming messages from users in your Telegram group.

How it works: When a user sends a message (voice, audio, or text), the workflow is triggered and the sender’s information is captured.

Benefit: Serves as the entry point for the entire transcription pipeline.

🔐 Sender Verification

Purpose: Validates whether the sender has permission to use the transcription service.

Logic: Check sender against authorized users list If authorized → Proceed to next step If not authorized → Send “Access denied” message and stop workflow

Benefit: Prevents unauthorized users from consuming AI credits and accessing the service.


📌 Section 2: Message Type Detection

🎵 Audio/Voice Recognition

Purpose: Identifies the type of incoming message and audio format.

Why it’s needed: Telegram handles different audio types with different statuses:

Process:

  1. Check if message contains audio/voice content
  2. If no audio file detected → Send “No audio file found” message
  3. If audio detected → Assign file ID and proceed to format detection

🧩 File Type Determination (IF Node)

Purpose: Identifies the specific audio format for proper processing.

Supported formats:

Logic:

If format recognized → Proceed to transcription If format not recognized → Send “File format not recognized” message

Benefit: Ensures compatibility with transcription services by validating file types upfront.


📌 Section 3: Primary Transcription (OpenAI)

📥 File Download

Purpose: Downloads the audio file from Telegram for processing.

🤖 OpenAI Transcription

Purpose: Transcribes audio to text using OpenAI’s Whisper API.

Why OpenAI: High-quality transcription with cost-effective pricing.

Process:

  1. Send downloaded file to OpenAI transcription API
  2. Simultaneously send notification: “Transcription started”
  3. If successful → Assign transcribed text to variable and proceed
  4. If error occurs → Trigger fallback mechanism

Benefit: Fast, accurate transcription with multi-language support.


📌 Section 4: Fallback Transcription (Gemini)

🛟 Gemini Backup Transcription

Purpose: Provides a safety net if OpenAI transcription fails.

Process:

  1. Receives file only if OpenAI node returns an error
  2. Downloads and processes the same audio file
  3. Sends to Google Gemini for transcription
  4. Assigns transcribed text to the same text variable

Benefit: Ensures high reliability—if one service fails, the other takes over automatically.


📌 Section 5: Message Length Handling

📏 Text Length Check (IF Node)

Purpose: Determines if the transcribed text exceeds Telegram’s character limit.

Logic:

If text ≤ 4000 characters → Send directly to Telegram If text > 4000 characters → Split into chunks

Why: Telegram has a 4,000-character limit per message.

✂️ Text Splitting (Code Node)

Purpose: Breaks long transcriptions into 4,000-character segments.

Process:

  1. Receives text longer than 4,000 characters
  2. Splits text into chunks of ≤4,000 characters
  3. Maintains readability by avoiding mid-word breaks
  4. Outputs array of text chunks

📌 Section 6: Response Delivery

💬 Send Transcription (Telegram Node)

Purpose: Delivers the transcribed text back to the Telegram group.

Behavior:

Benefit: Users receive complete transcriptions regardless of length, ensuring no content is lost.


📊 Workflow Overview Table

SectionNode NamePurpose
1. TriggerReceive MessageCaptures incoming Telegram messages
2. Access ControlSender VerificationValidates user permissions
3. DetectionAudio/Voice RecognitionIdentifies message type and audio format
4. ValidationFile Type CheckVerifies supported audio formats
5. DownloadFile DownloadRetrieves audio file from Telegram
6. Primary AIOpenAI TranscriptionMain transcription service
7. Fallback AIGemini TranscriptionBackup transcription service
8. ProcessingText Length CheckDetermines if splitting is needed
9. SplittingCode NodeBreaks long text into chunks
10. ResponseSend to TelegramDelivers transcribed text

🎯 Key Benefits


🚀 Use Cases

🔗 Nodes Used

Telegram, Telegram Trigger, OpenAI, Google Gemini

📥 Import

Download workflow.json and import into n8n: Workflow menu → Import from File

📖 Importing guide · 🔑 Credential setup