🔬 Analyze images, videos, documents & audio with Gemini Tools and Qwen LLM Agent

⚑ 4,379 views · 🔬 Document Extraction & Analysis

Description

📝 Analyze uploaded images, videos, audio, and documents with specialized tools, powered by a lightweight language-only agent.


🧭 What It Does

This workflow enables multimodal file analysis using Google Gemini tools connected to a text-only LLM agent. Users can upload images, videos, audio files, or documents via a chat interface. The workflow will:


🚀 Use Cases


🎯 Why This Architecture Matters

Unlike end-to-end multimodal LLMs (like Gemini 1.5 or GPT-4o), this template:

✅ Advantages

| Feature | Benefit |
|---|---|
| 🧩 Modular | The LLM and the tools are decoupled, so each can be updated independently |
| 💸 Cost-Efficient | No need to pay for a full multimodal model; the tools are called only when needed |
| 🔧 Tool-based Reasoning | The agent invokes tools on demand, similar to Toolformer-style tool use |
| ⚑ Fast | Models served on Groq respond with very low latency |
| 📚 Memory | A context buffer keeps the last 15 messages for multi-turn chats |
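The Memory row describes a sliding-window buffer: only the most recent messages are passed back to the model. A minimal Python sketch of the idea, assuming the window counts individual messages (15, as stated above); `WindowMemory` and its methods are hypothetical names, not n8n's memoryBufferWindow internals:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "user" or "assistant"
    content: str


class WindowMemory:
    """Hypothetical sliding-window chat memory: keeps only the last `size` messages."""

    def __init__(self, size: int = 15) -> None:
        # A deque with maxlen silently drops the oldest item once it is full
        self._buffer: deque[Message] = deque(maxlen=size)

    def add(self, role: str, content: str) -> None:
        self._buffer.append(Message(role, content))

    def context(self) -> list[Message]:
        # Oldest-to-newest history that would accompany the next LLM call
        return list(self._buffer)


if __name__ == "__main__":
    memory = WindowMemory(size=15)
    for i in range(20):
        memory.add("user", f"message {i}")
    history = memory.context()
    print(len(history), history[0].content)  # 15 message 5
```

A fixed window bounds prompt size (and cost) at the price of forgetting older turns.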

🧪 How It Works

🔹 Input via Chat

🔹 File Handling

🔹 Prompt Construction

🔹 Agent Reasoning

The agent autonomously decides whether and how to use tools, then responds with concise output.
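A minimal Python sketch of that decide-then-act loop. `fake_llm` is a hypothetical stand-in for the Qwen model on Groq, and the `TOOLS` entries are placeholders for the Gemini tools; this only illustrates the control flow and is not how the n8n agent node is implemented internally:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str
    argument: str


@dataclass
class Decision:
    tool_call: Optional[ToolCall] = None  # set when the model wants a tool
    answer: Optional[str] = None          # set when the model is done


# Hypothetical placeholders for the Gemini analysis tools
TOOLS: dict[str, Callable[[str], str]] = {
    "analyze_document": lambda url: f"[text extracted from {url}]",
    "analyze_image": lambda url: f"[description of image at {url}]",
}


def fake_llm(prompt: str, observations: list[str]) -> Decision:
    """Stand-in for the text-only LLM: calls a tool once if a file is attached, then answers."""
    if "file:" in prompt and not observations:
        url = prompt.split("file:", 1)[1].strip()
        return Decision(tool_call=ToolCall("analyze_document", url))
    summary = observations[-1] if observations else "no file provided"
    return Decision(answer=f"Summary: {summary}")


def run_agent(prompt: str, llm=fake_llm, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        decision = llm(prompt, observations)
        if decision.answer is not None:
            return decision.answer
        call = decision.tool_call
        if call is None:
            return "Stopped: model returned nothing"
        # Run the requested tool and feed its result back to the model
        tool = TOOLS.get(call.name)
        observations.append(tool(call.argument) if tool else f"Unknown tool: {call.name}")
    return "Stopped: step limit reached"
```

For example, `run_agent("What does this PDF say? file: https://example.com/doc.pdf")` returns `"Summary: [text extracted from https://example.com/doc.pdf]"`, while a plain `run_agent("Hello")` answers without calling any tool. The `max_steps` cap guards against a model that keeps requesting tools.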


🧱 Nodes & Services

| Category | Node / Tool | Purpose |
|---|---|---|
| Chat Input | chatTrigger | Chat interface with file-upload support |
| File Processing | splitOut, splitInBatches | Splits the upload into items and processes each file |
| Upload | googleGemini | Uploads each file to Gemini and returns its URL |
| Metadata | set, aggregate | Builds structured file info |
| AI Agent | LangChain Agent | Receives the chat context and file data |
| Tools | googleGeminiTool | Analyzes media with Gemini |
| LLM | lmChatGroq (Qwen 32B) | Fast text-only reasoning |
| Memory | memoryBufferWindow | Maintains session context |
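To illustrate the Upload → Metadata → AI Agent hand-off, here is a hypothetical Python sketch that turns per-file upload results into structured records (the role played by set) and folds them into one prompt (the role played by aggregate). The field names (`mimeType`, `url`) and the prompt layout are assumptions, not the template's actual expressions:

```python
from dataclasses import dataclass


@dataclass
class UploadedFile:
    file_name: str
    mime_type: str
    file_uri: str  # URL returned after uploading to Gemini (assumed field)


def build_file_info(files: list[UploadedFile]) -> list[dict[str, str]]:
    """One metadata record per file, roughly what a set node would emit per item."""
    return [
        {"index": str(i + 1), "name": f.file_name, "mimeType": f.mime_type, "url": f.file_uri}
        for i, f in enumerate(files)
    ]


def build_prompt(user_message: str, files: list[UploadedFile]) -> str:
    """Aggregate all per-file records into a single prompt for the agent."""
    lines = [f"User message: {user_message}"]
    info = build_file_info(files)
    if info:
        lines.append("Attached files:")
        for item in info:
            lines.append(f"{item['index']}. {item['name']} ({item['mimeType']}): {item['url']}")
    else:
        lines.append("Attached files: none")
    return "\n".join(lines)


if __name__ == "__main__":
    files = [UploadedFile("report.pdf", "application/pdf", "https://example.com/files/abc")]
    print(build_prompt("What does this PDF say?", files))
```

Passing the MIME type alongside each URL gives the text-only agent enough information to pick a suitable Gemini tool without seeing the file itself.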

⚙️ Setup Instructions

1. 🔑 Required Credentials

2. 🧩 Nodes That Need Setup

3. ⚠️ File Size & Format Considerations


🛠️ Optional Improvements


🧪 Example Use Case

> “Hola, ¿qué dice este PDF?” (Spanish for “Hi, what does this PDF say?”)

The user uploads a document → the agent routes it to the Gemini DOCUMENT tool → the tool returns the extracted content → the LLM summarizes it in Spanish.
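In the template the agent itself chooses the tool, but the routing step in this example can be approximated with a deterministic MIME-type mapping. A minimal Python sketch; the category names IMAGE, VIDEO, AUDIO and DOCUMENT are assumed labels modeled on the Gemini DOCUMENT tool mentioned above, and the set of document MIME types is an assumption:

```python
# Assumed set of non-text MIME types treated as documents (illustrative only)
DOCUMENT_TYPES = {"application/pdf"}


def route_tool(mime_type: str) -> str:
    """Map a file's MIME type to a hypothetical Gemini tool category."""
    # Drop parameters such as "; charset=utf-8" and normalize case
    base = mime_type.split(";", 1)[0].strip().lower()
    major = base.split("/", 1)[0]
    if major in ("image", "video", "audio"):
        return major.upper()
    if base in DOCUMENT_TYPES or major == "text":
        return "DOCUMENT"
    return "UNSUPPORTED"


if __name__ == "__main__":
    for mime in ["application/pdf", "image/jpeg", "audio/mpeg", "application/zip"]:
        print(mime, "->", route_tool(mime))
```

A fixed mapping like this could also serve as a hint in the prompt, leaving the final tool choice to the LLM.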


🧰 Tags

multimodal, agent, langchain, groq, gemini, image analysis, audio analysis, document parsing, video analysis, file uploader, chat assistant, LLM tools, memory, AI tools

📂 Files

🔗 Nodes Used

AI Agent, Simple Memory, Chat Trigger, Groq Chat Model, Google Gemini

📥 Import

Download workflow.json and import it into n8n via Workflow menu → Import from File.
