🎬 Narrating over a video using multimodal AI

⚑ 9,818 views · 🎬 Content Creation & Video

Description

This n8n template takes a video and extracts frames from it which are used with a multimodal LLM to generate a script. The script is then passed to the same multimodal LLM to generate a voiceover clip.

This template was inspired by Processing and narrating a video with GPT’s visual capabilities and the TTS API

How it works

Sample the finished product here: https://drive.google.com/file/d/1-XCoii0leGB2MffBMPpCZoxboVyeyeIX/view?usp=sharing

Requirements

Customising this workflow

πŸ”— Nodes Used

Edit Image, HTTP Request, Google Drive, Basic LLM Chain, OpenAI Chat Model, Convert to File

πŸ“₯ Import

Download workflow.json and import into n8n: Workflow menu β†’ Import from File

πŸ“– Importing guide Β· πŸ”‘ Credential setup