Automate Image Captioning with Google Gemini and n8n

This workflow replaces manual image captioning. It uses Google’s Gemini AI model within n8n to generate an accurate, creative caption for an image and overlay it on that image automatically, saving hours of manual work.
Workflow Identifier: 1207
Nodes in Use: Manual Trigger, HTTP Request, Edit Image, Langchain Chain LLM, Google Gemini Chat Model, Structured Output Parser, Code, Merge, Sticky Note

Opening Problem Statement

Meet Sarah, a content creator for a digital magazine. Every week, Sarah manually finds images and crafts captions that fit both the visual content and editorial tone. Although skilled, she spends a frustrating 3-4 hours each week just captioning images for her posts. On busy days, errors slip in—captions that don’t quite match the image context or awkwardly worded titles leading to viewer confusion. This manual process not only wastes valuable time but also delays her publishing schedule and reduces overall creative output.

What if Sarah could automate writing descriptive, witty captions and placing them directly on her images, without compromising quality or style? She would reclaim that time, keep her captions consistent, and give her posts a more professional look. This is the challenge the n8n workflow with Google Gemini addresses: it turns raw images into captioned visuals automatically, reducing hours of tedious work to seconds.

What This Automation Does

When this workflow runs, it streamlines Sarah’s captioning task through a series of specific, automated steps:

  • Automatically downloads an image from a given URL — no manual saving needed.
  • Resizes the image to optimize it for AI processing, ensuring better caption generation.
  • Leverages Google’s Gemini AI model to analyze the image and generate a creative caption with a punny title, crafted to match the image content.
  • Calculates the precise placement and formatting for the caption text to be overlaid on the image aesthetically.
  • Uses the Edit Image node to overlay the AI-generated caption onto the original image, creating a polished final graphic.
  • Outputs a fully captioned image ready for publishing, eliminating manual editing and reducing errors.

This automation can save Sarah up to 4 hours weekly and consistently produce professional-quality captions that engage her audience and streamline editorial workflows.

Prerequisites ⚙️

  • n8n Account: You need an active n8n instance to build and run this workflow. Self-hosting is an option for advanced users.
  • Google Gemini (PaLM) API Credentials 🔑: A Google Gemini API key, stored in n8n as a Google Gemini (PaLM) API credential, so the workflow can call the model to generate image captions.
  • HTTP Request Node Access 🔌: To fetch images via URLs.
  • Edit Image Node 📁: For resizing images and overlaying text captions.
  • Code Node ⚙️: To calculate text placement dynamically based on image size and text length.

Step-by-Step Guide

1. Trigger the Workflow Manually

Navigate to Triggers and add a Manual Trigger node named “When clicking ‘Test workflow’”. This node allows you to start the workflow on demand from n8n’s editor interface.

Expected Outcome: You can run the workflow on demand from the editor while building and testing it.

Common Mistake: Forgetting to set up a trigger node prevents the workflow from executing.

2. Download the Image with HTTP Request Node

After the trigger, add an HTTP Request node called “Get Image”. Configure the URL field with the image source, e.g., “https://images.pexels.com/photos/1267338/pexels-photo-1267338.jpeg?auto=compress&cs=tinysrgb&w=600”.

Expected Outcome: The node downloads the image binary to be used later in the workflow.

Visual Tip: You will see the image data in the node’s output under binary data.

Common Mistake: Using an invalid or non-direct image URL will cause the node to fail fetching the image.
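
A non-direct URL typically returns an HTML page instead of image bytes. One way to catch this early is to check the MIME type of what was downloaded, for example the Content-Type response header or the MIME type n8n records alongside the binary data. The sketch below is a minimal check under that assumption; isImageContentType is a hypothetical helper name, not part of n8n.

```javascript
// Hypothetical helper: decides whether a Content-Type / MIME type value describes an image.
function isImageContentType(contentType) {
  if (typeof contentType !== "string") return false;
  // Drop parameters such as "; charset=utf-8" and normalise case.
  const mimeType = contentType.split(";")[0].trim().toLowerCase();
  return mimeType.startsWith("image/");
}

console.log(isImageContentType("image/jpeg"));               // true
console.log(isImageContentType("text/html; charset=utf-8")); // false
```

A check like this could sit in a Code or IF node right after “Get Image” so the workflow stops before the image is sent to the AI model.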

3. Resize the Image for AI Processing

Add an Edit Image node named “Resize For AI”. Set the operation to “resize” and the dimensions to 512×512 pixels. A smaller image reduces the payload sent to the AI model.

Expected Outcome: The image is resized, which reduces upload and processing time.

Common Mistake: Shrinking the image too far, or distorting its aspect ratio, can remove detail the model needs and lead to poor captions.
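
To see what a 512×512 limit means for a non-square image, the arithmetic can be sketched as follows. This assumes the resize keeps the aspect ratio and never upscales; whether the Edit Image node behaves this way depends on the resize option you select. fitWithin is a hypothetical helper name.

```javascript
// Hypothetical helper: scale (width, height) to fit inside a maxSide x maxSide box
// without distorting the aspect ratio and without upscaling small images.
function fitWithin(width, height, maxSide = 512) {
  const scale = Math.min(maxSide / width, maxSide / height, 1);
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

console.log(fitWithin(600, 400)); // { width: 512, height: 341 }
console.log(fitWithin(300, 200)); // { width: 300, height: 200 } (already small enough)
```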

4. Extract Image Info for Caption Positioning

Add another Edit Image node named “Get Info” with the operation set to “information”. This node extracts image dimensions needed for further positioning calculations.

Expected Outcome: Image size metadata is available for the code node calculations.

Common Mistake: Omitting image info extraction breaks the positioning calculations downstream.

5. Generate Caption Using Google Gemini AI

Insert the “Image Captioning Agent” node, a Langchain LLM Chain node that uses the Google Gemini Chat Model. It takes the resized image binary as input and prompts the model to generate a caption title and caption text that reflect the image’s context.

AI Prompt Example: “Generate a caption for this image. Provide a punny title describing who, when, where, and context.”

Expected Outcome: The model outputs a structured caption JSON with “caption_title” and “caption_text” fields.

Common Mistake: Incorrect API credentials or model selection will prevent caption generation.

6. Parse and Merge Caption Output

This workflow includes a Structured Output Parser node configured to parse the caption JSON from the AI model’s response. Then the output is merged back with the image info using two Merge nodes to combine the necessary data for caption overlay.

Expected Outcome: The caption text and image size data are unified for further processing.

Common Mistake: Parsing errors occur if the AI output deviates from the expected schema.
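
If you want an explicit guard against schema drift, a small validation step can confirm that both fields are present before the caption is used. The sketch below assumes only the two field names given in step 5; validateCaption is a hypothetical helper, and the sample captions are made up.

```javascript
// Hypothetical validator for the parsed caption; field names come from step 5.
function validateCaption(output) {
  const errors = [];
  for (const field of ["caption_title", "caption_text"]) {
    const value = output?.[field];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`missing or empty "${field}"`);
    }
  }
  return { valid: errors.length === 0, errors };
}

// Made-up sample values for illustration.
console.log(validateCaption({
  caption_title: "Paws for Thought",
  caption_text: "A dog rests on a sunny porch.",
}).valid); // true
console.log(validateCaption({ caption_title: "Only a title" }).errors);
// [ 'missing or empty "caption_text"' ]
```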

7. Calculate Caption Positioning Dynamically

Add a Code node named “Calculate Positioning” set to run once for each item. Use this JavaScript code snippet:

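// Input item: image dimensions (size) and the parsed caption (output).
const { size, output } = $input.item.json;

// Despite its name, lineHeight acts as a divisor: the font size is 1/35 of the image height.
const lineHeight = 35;
const fontSize = Math.round(size.height / lineHeight);
// Rough estimate of how many characters fit on one line at this font size.
const maxLineLength = Math.round(size.width / fontSize) * 2;
const text = `"${output.caption_title}". ${output.caption_text}`;
// Estimated number of lines the caption occupies once wrapped.
const numLinesOccupied = Math.round(text.length / maxLineLength);

// Padding is 2% of the corresponding image dimension.
const verticalPadding = size.height * 0.02;
const horizontalPadding = size.width * 0.02;
// The background rectangle starts at the left edge, high enough to hold every line.
const rectPosX = 0;
const rectPosY = size.height - (verticalPadding * 2.5) - (numLinesOccupied * fontSize);
// The text is inset from the left edge and sits inside the rectangle.
const textPosX = horizontalPadding;
const textPosY = size.height - (numLinesOccupied * fontSize) - (verticalPadding / 2);

return {
  caption: {
    fontSize,
    maxLineLength,
    numLinesOccupied,
    rectPosX,
    rectPosY,
    textPosX,
    textPosY,
    verticalPadding,
    horizontalPadding,
  }
}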

Expected Outcome: This node calculates where and how the caption rectangle and text should be positioned on the image dynamically.

Common Mistake: Leaving the node in “Run Once for All Items” mode, or renaming the size and output fields the code reads, can break the calculation or misplace the caption.
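
To check the arithmetic outside n8n, the same formulas can be wrapped in a plain function and run on sample values. The sketch below is a worked example, not a replacement for the node; calculatePositioning is a hypothetical name, and the 600×400 image and 250-character caption are made-up inputs.

```javascript
// Same formulas as the Code node above, wrapped in a plain function.
// calculatePositioning is a hypothetical name used only for this sketch.
function calculatePositioning(size, text) {
  const lineHeight = 35;
  const fontSize = Math.round(size.height / lineHeight);
  const maxLineLength = Math.round(size.width / fontSize) * 2;
  const numLinesOccupied = Math.round(text.length / maxLineLength);
  const verticalPadding = size.height * 0.02;
  const horizontalPadding = size.width * 0.02;
  return {
    fontSize,
    maxLineLength,
    numLinesOccupied,
    rectPosX: 0,
    rectPosY: size.height - verticalPadding * 2.5 - numLinesOccupied * fontSize,
    textPosX: horizontalPadding,
    textPosY: size.height - numLinesOccupied * fontSize - verticalPadding / 2,
  };
}

// Made-up input: a 600x400 image and a 250-character caption.
const sample = calculatePositioning({ width: 600, height: 400 }, "x".repeat(250));
console.log(sample.fontSize);         // 11  (400 / 35, rounded)
console.log(sample.maxLineLength);    // 110 (round(600 / 11) * 2)
console.log(sample.numLinesOccupied); // 2   (round(250 / 110))
console.log(sample.rectPosY);         // 358 (400 - 20 - 22)
console.log(sample.textPosY);         // 374 (400 - 22 - 4)
```

Because the line count uses Math.round, a caption about 1.4 line-lengths long is counted as one line. If long captions overflow the rectangle, switching that call to Math.ceil is one option to try.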

8. Overlay the Caption Text on the Image

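Use another Edit Image node titled “Apply Caption to Image” with the multiStep operation, which applies several edits in sequence:

  • Draw a semi-transparent black rectangle across the bottom of the image, starting at the rectPosX and rectPosY values from step 7.
  • Overlay the caption title and text in white Arial, positioned at textPosX and textPosY and using the calculated fontSize and maxLineLength.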

Expected Outcome: The final image includes a visually appealing caption positioned at the bottom.

Common Mistake: Incorrect font paths or colors can make the caption unreadable.
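
The relationship between the step 7 values and the two drawing steps can be sketched as plain data. The object shape, property names, and colour values below are illustrative assumptions only and are not the Edit Image node’s actual parameter schema; buildOverlaySteps is a hypothetical helper.

```javascript
// Illustrative only: this object shape is NOT the Edit Image node's schema.
function buildOverlaySteps(size, caption, text) {
  return [
    {
      kind: "rectangle",
      from: { x: caption.rectPosX, y: caption.rectPosY },
      to: { x: size.width, y: size.height }, // extend to the bottom-right corner
      color: "#00000080",                    // assumed: black at roughly 50% opacity
    },
    {
      kind: "text",
      text,
      at: { x: caption.textPosX, y: caption.textPosY },
      fontSize: caption.fontSize,
      lineLength: caption.maxLineLength,     // wrap width in characters
      color: "#FFFFFF",
    },
  ];
}

// Made-up sample values for a 600x400 image.
const overlaySteps = buildOverlaySteps(
  { width: 600, height: 400 },
  { rectPosX: 0, rectPosY: 358, textPosX: 12, textPosY: 374, fontSize: 11, maxLineLength: 110 },
  '"Sample Title". Sample caption text.'
);
console.log(overlaySteps.map((step) => step.kind)); // [ 'rectangle', 'text' ]
```

The rectangle is drawn first so the text lands on top of it; reversing the order would hide the caption behind the background.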

Customizations ✏️

  • Change Caption Style: In the “Apply Caption to Image” node, modify font color, font size, or background rectangle opacity to match your brand.
  • Use Different AI Model: Swap out the “Google Gemini Chat Model” with another supported LLM in Langchain to experiment with caption styles or languages.
  • Use Dynamic Image Sources: Replace the fixed URL in the “Get Image” HTTP Request node with a value taken from webhook input, enabling captions for user-submitted images.
  • Adjust Caption Position: Modify the “Calculate Positioning” code logic to place the caption at different parts of the image like top or center.
  • Add Watermarks: Extend the Edit Image node operations to include logos or copyright marks besides the caption for branding.
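
As an example of the “Adjust Caption Position” customization, the bottom-anchored formulas from step 7 can be mirrored to place the caption band at the top. This is a sketch under the assumption that the band keeps the same height and the text keeps the same offset from the band’s upper edge as in the original code; topPlacement and rectEndY are hypothetical names.

```javascript
// Hypothetical variant: anchor the caption band to the top edge instead of the bottom.
function topPlacement(size, numLinesOccupied, fontSize) {
  const verticalPadding = size.height * 0.02;
  const horizontalPadding = size.width * 0.02;
  // Same band height the bottom-anchored code implies: 2.5 paddings plus the text lines.
  const bandHeight = verticalPadding * 2.5 + numLinesOccupied * fontSize;
  return {
    rectPosX: 0,
    rectPosY: 0,
    rectEndY: bandHeight,              // the rectangle must now end here, not at size.height
    textPosX: horizontalPadding,
    textPosY: verticalPadding * 2,     // same text offset from the band top as in step 7
  };
}

// Made-up input: a 600x400 image, 2 lines of text at font size 11.
console.log(topPlacement({ width: 600, height: 400 }, 2, 11));
// { rectPosX: 0, rectPosY: 0, rectEndY: 42, textPosX: 12, textPosY: 16 }
```

With this variant, the rectangle in “Apply Caption to Image” also needs its end position changed from the image height to rectEndY.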

Troubleshooting 🔧

Problem: “Google Gemini API authentication failed.”
Cause: Incorrect API key or missing credentials.
Solution: Open the credentials manager in n8n, verify the API key for Google Gemini, and ensure the key has the necessary permissions.

Problem: “Edit Image node does not output image.”
Cause: Input image binary missing or node misconfigured.
Solution: Confirm the previous node outputs binary and the Edit Image node is set to operate on the correct input.

Problem: “Caption parsing errors in Structured Output Parser.”
Cause: AI response format changed or schema mismatch.
Solution: Update the JSON schema example in the Structured Output Parser node to match the current AI output format.

Pre-Production Checklist ✅

  • Confirm Google Gemini API credentials are active and correctly configured.
  • Test the HTTP Request node with an accessible image URL.
  • Verify that the Edit Image node operations (resize, information, and multiStep) function as expected.
  • Run the workflow manually and check that the caption appears on the output image.
  • Back up the workflow before deploying to avoid losing customizations.

Deployment Guide

The Manual Trigger only runs the workflow from the editor, so for unattended captioning replace it with a Webhook trigger, or with a Schedule trigger if you want to process images in periodic batches. Then activate the workflow in n8n by switching it from draft to active. Monitor the execution logs in n8n to confirm that runs complete without errors.

Conclusion

By following this guide, you’ve built an efficient, AI-powered image captioning automation using n8n and Google Gemini. You’ve cut down hours spent creating captions manually and enhanced your content’s professionalism with dynamic, context-aware captions overlaid on images.

Next, consider extending this workflow to support batch image processing, incorporate multi-language captions, or integrate social media publishing nodes to automatically post captioned images.

This automation not only saves time but consistently produces engaging visuals that elevate your digital content strategy.
