Create Video Narration with n8n, OpenAI & Python

This workflow automates video narration by extracting frames using Python, generating scripts with OpenAI’s GPT-4o, and converting them into voiceovers. Save hours by transforming videos into narrated audio clips effortlessly.
Learn how to Build this Workflow with AI:
Workflow Identifier: 1345
NODES in Use: Manual Trigger, HTTP Request, Code, Split Out, Split In Batches, Convert To Binary, Edit Image, Aggregate, Chain LLM, OpenAI, Wait, Google Drive

Opening Problem Statement

Meet Anna, a documentary filmmaker who spends countless hours manually scripting voiceovers for her nature videos. For her latest project, she has a captivating 3-minute video of Indian street scenes. But writing a narration that truly matches the visual story requires carefully watching the footage multiple times and crafting the script line by line. This process is time-consuming, error-prone, and often delayed because she juggles many projects simultaneously.

Anna estimates she spends over 6 hours per video just on scripting, with frequent rewrites adding frustration. What if she could automate this entire narration process and get a ready-to-use script and voiceover audio with minimal effort? This is exactly the scenario our n8n workflow tackles: turning any video into a narrated audio clip rapidly and with precision.

What This Automation Does

When you run this workflow, several clearly defined processes occur that transform a raw video into a narrated audio file ready for use or sharing:

  • Download a video file automatically from a public URL (in the demo case, a sample from Pixabay).
  • Extract up to 90 evenly distributed video frames using Python and OpenCV to capture the essence of the visual story.
  • Batch these frames in groups of 15 to manage token limits for language model processing.
  • Generate a continuous narration script in the style of David Attenborough using OpenAI’s GPT-4o multimodal large language model (LLM) that understands images.
  • Convert the full script into an MP3 voiceover using OpenAI’s text-to-speech (TTS) capabilities.
  • Upload the resulting audio file to Google Drive for easy access and sharing.

This process saves hours of manual scripting and recording, allowing creators like Anna to focus on editing and creative direction.

Prerequisites ⚙️

  • n8n account — set up and ready to build workflows.
  • OpenAI API credentials — to use GPT-4o for script generation and TTS for audio creation.
  • Google Drive account with API credentials — for uploading the final audio clip.
  • Python environment in n8n Code node — for running OpenCV frame extraction.
  • Ability to download videos via HTTP Request node.

Step-by-Step Guide

Step 1: Trigger the Workflow Manually

Open n8n and add the Manual Trigger node labeled “When clicking ‘Test workflow’”. This node lets you kick off the process on demand.

After configuring, test the trigger by clicking “Execute Workflow”. You should see it run successfully in the n8n editor.

Common mistake: Forgetting to connect this node to the next step will halt execution.

Step 2: Download the Source Video

Add the HTTP Request node named “Download Video”. Configure the URL to https://cdn.pixabay.com/video/2016/05/12/3175-166339863_small.mp4. Leave the method as GET.

This downloads the video to be processed. Run this node alone to verify it downloads without error.

Common mistake: Using an unsupported video format or URL will cause downstream errors in frame extraction.
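Since an unsupported download only fails later in the frame-extraction step, you can sanity-check the downloaded bytes up front: MP4 containers carry an `ftyp` marker near the start of the file. A minimal sketch (the helper name is illustrative, not part of the workflow):

```python
def looks_like_mp4(data: bytes) -> bool:
    # MP4/QuickTime containers begin with a box whose type field,
    # at byte offset 4, reads 'ftyp'
    return len(data) >= 12 and data[4:8] == b"ftyp"

# Example: the first bytes of a typical MP4 download vs. an HTML error page
sample_header = b"\x00\x00\x00\x20ftypisom\x00\x00\x02\x00"
print(looks_like_mp4(sample_header))         # True
print(looks_like_mp4(b"<html>not a video"))  # False
```

If the check fails, the URL likely returned an error page or a format OpenCV cannot decode.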

Step 3: Extract Evenly Distributed Frames from Video

Insert a Code node (Python) named “Capture Frames” connected to the “Download Video” node. Here, we run a Python script using OpenCV to:

  • Decode the base64 video data
  • Load the video into OpenCV
  • Calculate frame count to take up to 90 frames evenly spaced
  • Convert each selected frame to a base64-encoded JPEG image string
  • Output a list of these base64 frames

Copy/Paste the Python code provided in the workflow:

import cv2
import base64

def extract_evenly_distributed_frames_from_base64(base64_string, max_frames=90):
    # Decode the base64 payload and write it to a temp file so OpenCV can open it
    video_bytes = base64.b64decode(base64_string)
    video_path = '/tmp/temp_video.mp4'
    with open(video_path, 'wb') as video_file:
        video_file.write(video_bytes)

    video_capture = cv2.VideoCapture(video_path)
    total_frames = int(video_capture.get(cv2.CAP_PROP_FRAME_COUNT))

    # Step through the video so the selected frames are evenly spaced
    step_size = max(1, total_frames // (max_frames - 1))
    selected_frames_base64 = []
    for i in range(0, total_frames, step_size):
        video_capture.set(cv2.CAP_PROP_POS_FRAMES, i)
        ret, frame = video_capture.read()
        if ret:
            frame_base64 = convert_frame_to_base64(frame)
            if frame_base64 is not None:
                selected_frames_base64.append(frame_base64)
        if len(selected_frames_base64) >= max_frames:
            break
    video_capture.release()
    return selected_frames_base64

def convert_frame_to_base64(frame):
    # Encode a single frame as JPEG, then as a base64 string
    ret, buffer = cv2.imencode('.jpg', frame)
    if not ret:
        return None
    return base64.b64encode(buffer).decode('utf-8')

# n8n exposes the incoming item's binary video data here
base64_video = _input.item.binary.data.data
frames_base64 = extract_evenly_distributed_frames_from_base64(base64_video, max_frames=90)

return {"output": frames_base64}

Execute this node. It may take 1-2 minutes for a 3MB video.

Common mistake: Not having OpenCV installed or not setting “mode” to runOnceForEachItem could cause errors.

Step 4: Split Extracted Frames into Individual Items

Add the Split Out node named “Split Out Frames” to separate the array of base64 frames into individual outputs for batch processing.

Run this node and check that the output contains one item per frame.

Step 5: Batch Frames for LLM Processing

Add a Split In Batches node “For Every 15 Frames” and set the batch size to 15. This allows sending chunks of images to OpenAI to adhere to token limits.

Common mistake: Setting the batch size too large may exceed the model’s token limits; setting it too small increases API calls and cost.
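Conceptually, the Split In Batches node just slices the frame list into fixed-size chunks. A quick sketch of what happens to 90 frames at batch size 15:

```python
def chunk(items, size=15):
    # Yield consecutive slices of at most `size` items
    return [items[i:i + size] for i in range(0, len(items), size)]

frames = [f"frame_{i}" for i in range(90)]
batches = chunk(frames, 15)
print(len(batches))     # 6 batches
print(len(batches[0]))  # 15 frames in each
```

With 90 frames and a batch size of 15, the LLM step therefore runs six times.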

Step 6: Convert Base64 Frames to Binary Images

Use Convert To Binary node “Convert to Binary” to transform base64 strings into binary image files for resizing and LLM input.

Step 7: Resize Frames for Optimal Input

Add Edit Image node “Resize Frame”. Set width and height to 768px, format JPEG. This optimizes image size for OpenAI models.
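Outside n8n, the same resizing could be done with Pillow. A sketch assuming Pillow is installed (in the workflow itself, the Edit Image node handles this for you):

```python
from io import BytesIO
from PIL import Image

# Simulate a full-resolution frame
frame = Image.new("RGB", (1920, 1080), color=(30, 120, 60))

# Resize to 768x768 and re-encode as JPEG, mirroring the Edit Image node settings
resized = frame.resize((768, 768))
buffer = BytesIO()
resized.save(buffer, format="JPEG")

print(resized.size)  # (768, 768)
```

Smaller JPEG frames keep the per-image payload sent to the model modest.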

Step 8: Aggregate Resized Frames

Connect an Aggregate node “Aggregate Frames” to combine resized frame binaries into one payload for the OpenAI model.

Step 9: Generate Narration Script Using Multimodal LLM

Configure the Chain LLM node “Generate Narration Script” with the prompt to create a short voiceover script in the style of David Attenborough. It uses the binary image batch inputs and remembers previous parts to produce a cohesive script.

Important prompt snippet:
These are frames of a video. Create a short voiceover script in the style of David Attenborough. Only include the narration.

This node loops through batches, generating partial scripts, maintaining context to build a single continuous narration.
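Under the hood, a multimodal request pairs the prompt text with the images as data URLs in a single user message. A sketch of how such a payload is typically shaped for GPT-4o (field names follow OpenAI's chat format; no request is actually sent here, and the helper name is illustrative):

```python
def build_vision_message(prompt: str, frames_base64: list) -> dict:
    # One user message: the text prompt followed by each frame as a data URL
    content = [{"type": "text", "text": prompt}]
    for frame in frames_base64:
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{frame}"},
        })
    return {"role": "user", "content": content}

message = build_vision_message(
    "These are frames of a video. Create a short voiceover script "
    "in the style of David Attenborough. Only include the narration.",
    ["aGVsbG8=", "d29ybGQ="],  # placeholder base64 strings
)
print(len(message["content"]))  # 3: one text part plus two images
```

Every image in a batch adds to the token count, which is why the batch size above matters.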

Step 10: Wait Node to Manage Rate Limits

Include a Wait node “Stay Within Service Limits” to prevent hitting API rate limits. Adjust or remove based on your OpenAI plan.

Step 11: Combine Text Scripts into Full Narration

Use an Aggregate node “Combine Script” to merge all partial scripts into one complete text output.

Step 12: Generate Voice Over Audio from Script

Add the OpenAI node “Use Text-to-Speech” configured to use the text-to-speech resource. Input is the combined text, output format is MP3.

Example input:
{{ $json.data.map(item => item.text).join('\n') }}
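In Python terms, that expression simply concatenates each batch's generated text with newline separators:

```python
# Partial scripts as produced by the batched LLM runs (sample data)
partial_scripts = [
    {"text": "The morning sun rises over the bustling street."},
    {"text": "Vendors prepare their stalls for the day ahead."},
]

# Join every partial script with a newline, as the n8n expression does
full_script = "\n".join(item["text"] for item in partial_scripts)
print(full_script)
```

The joined text is what gets handed to the text-to-speech step as a single narration.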

Step 13: Upload Audio File to Google Drive

Finally, add the Google Drive node “Upload to GDrive” to save the MP3 audio clip. Use dynamic naming like “narrating-video-using-vision-ai-20240725123000.mp3” and specify the target folder ID.
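The timestamped filename can be produced with an n8n expression or, equivalently, in Python (the prefix below is just the demo's naming convention):

```python
from datetime import datetime

def make_audio_filename(prefix="narrating-video-using-vision-ai"):
    # Append a yyyymmddHHMMSS timestamp so repeated runs never collide
    timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
    return f"{prefix}-{timestamp}.mp3"

name = make_audio_filename()
print(name)  # e.g. narrating-video-using-vision-ai-20240725123000.mp3
```

Timestamped names let you rerun the workflow without overwriting earlier voiceovers in the Drive folder.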

You can then access or share the audio file easily.

Customizations ✏️

  • Change Narration Style: In the “Generate Narration Script” node, modify the prompt text from “David Attenborough” to any other narrator or style for different voiceover tone.
  • Adjust Frame Count: In the “Capture Frames” Python node, change max_frames=90 to a higher or lower number to affect detail vs. processing time.
  • Batch Size Tuning: In “For Every 15 Frames”, modify batchSize for different token management. Larger batches reduce API calls but can hit limits.
  • Change Upload Destination: Alter the “Upload to GDrive” node folderId to save audio in a different Google Drive folder or use an alternate cloud storage integration.
  • Skip Frame Resizing: Remove or adjust the “Resize Frame” node if you want full resolution frames processed by the model, noting increased payload size.

Troubleshooting 🔧

Problem: “Video frame extraction fails with OpenCV error.”
Cause: Video format unsupported or missing OpenCV Python dependencies.
Solution: Ensure the video URL links to an MP4 or supported format. Verify Python environment has OpenCV installed.

Problem: “OpenAI API rate limit exceeded.”
Cause: Too many requests sent in quick succession.
Solution: Use the “Stay Within Service Limits” Wait node or increase your OpenAI plan.

Problem: “Google Drive upload fails with authorization error.”
Cause: Expired or wrong Google Drive OAuth credentials.
Solution: Re-authenticate Google Drive credentials in n8n and check folder permissions.

Pre-Production Checklist ✅

  • Test video download URL for accessibility and format compatibility.
  • Run Python code in Capture Frames node standalone to confirm frame extraction.
  • Validate OpenAI API credentials and quota.
  • Check Google Drive connection and folder ID correctness.
  • Execute the workflow with a small video sample first to verify timing and outputs.
  • Backup existing generated files before batch runs.

Deployment Guide

Once tested, activate the workflow in n8n. Use manual trigger or schedule as needed for batch processing projects.

Monitor the execution logs in n8n’s editor for any API errors or performance issues. Adjust the Wait node timing for rate compliance.

You can self-host n8n using platforms like Hostinger for more control and scalability.

FAQs

Can I use a different video source?
Yes, any accessible MP4 URL (or another format OpenCV supports) should work fine.

Does this consume a lot of OpenAI credits?
The frame batching and script generation use tokens proportionally to video length and batch size, so monitor usage accordingly.

Is the audio file secure?
Files are uploaded to your Google Drive account under your control, ensuring privacy and security.

Can I increase the number of frames for better narration?
Yes, but be mindful of processing time and memory usage. The default of 90 frames balances detail and performance.

Conclusion

By following this detailed tutorial, you’ve created an automated video narration pipeline that downloads a video, extracts key frames, leverages OpenAI’s multimodal GPT-4o to write an engaging script, and converts it to a professional voiceover audio clip — all deployed with n8n workflow automation.

This saves filmmakers, marketers, and content creators hours of tedious scripting and recording, accelerating project turnaround and creative focus.

Next steps? Try customizing narration style, adding subtitles from the script, or incorporating different AI voices for varied effects. Automation with n8n and OpenAI opens many creative doors!

Related Workflows

Automate Viral UGC Video Creation Using n8n + Degaus (Beginner-Friendly Guide)

Learn how to automate viral UGC video creation using n8n, AI prompts, and Degaus. This beginner-friendly guide shows how to import, configure, and run the workflow without technical complexity.

AI SEO Blog Writer Automation in n8n (Beginner Guide)

A complete beginner guide to building an AI-powered SEO blog writer automation using n8n.

Automate CrowdStrike Alerts with VirusTotal, Jira & Slack

This workflow automates processing of CrowdStrike detections by enriching threat data via VirusTotal, creating Jira tickets for incident tracking, and notifying teams on Slack for quick response. Save hours daily by transforming complex threat data into actionable alerts effortlessly.

Automate Telegram Invoices to Notion with AI Summaries & Reports

Save hours on financial tracking by automating invoice extraction from Telegram photos to Notion using Google Gemini AI. This workflow extracts data, records transactions, and generates detailed spending reports with charts sent on schedule via Telegram.

Automate Email Replies with n8n and AI-Powered Summarization

Save hours managing your inbox with this n8n workflow that uses IMAP email triggers, AI summarization, and vector search to draft concise replies requiring minimal review. Automate business email processing efficiently with AI guidance and Gmail integration.

Automate Email Campaigns Using n8n with Gmail & Google Sheets

This n8n workflow automates personalized email outreach campaigns by integrating Gmail and Google Sheets, saving hours of manual follow-up work and reducing errors in email sequences. It ensures timely follow-ups based on previous email interactions, optimizing communication efficiency.