Google Gemini & n8n: Auto Caption Your Images with AI

This workflow uses Google Gemini’s vision model within n8n to generate catchy captions for images automatically. It overlays captions on images, saving time and enhancing visual content for blogs, marketing, and social media.
Workflow Identifier: 1735
Nodes in use: Manual Trigger, HTTP Request, Edit Image, Chain LLM, Code, Merge, Sticky Note, LangChain Google Gemini, Structured Output Parser
Auto caption images with Google Gemini in n8n


What This Automation Does

This workflow takes an image URL, resizes the image, creates a caption using Google Gemini AI, calculates where the caption fits best, and then adds the caption directly on the image.
It saves time by automating both writing the caption and placing it on the image.

The output is a ready-to-use image with a clear, creative caption layered on it.


Who Should Use This Workflow

If you spend a lot of time writing captions for photos on social media or marketing, this workflow is for you.
It suits anyone who wants reliable, well-styled captions added to their images automatically.

Anyone managing brands, blogs, or social accounts with images can benefit.


Tools and Services Used

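The workflow combines n8n's Manual Trigger, HTTP Request, Edit Image, Code, and Merge nodes with the LangChain Google Gemini Chat Model and Structured Output Parser to download, process, caption, and style images automatically.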


Inputs, Processing Steps, and Output

Inputs

  • Image URL (example: a free Pexels stock image).
  • A Google PaLM API key, used to access the Google Gemini Chat Model.

Processing Steps

  • Download the image from the URL with the Get Image node.
  • Resize the image to 512 by 512 pixels with the Resize For AI node.
  • Read the image's dimensions and other details with the Get Info node.
  • Send the image to Google Gemini through the Image Captioning Agent node, which generates a caption with a title and a description.
  • Parse the AI reply into a separate caption title and caption text with the Structured Output Parser node.
  • Combine the image data and the AI output with the Merge Image & Caption node.
  • Calculate the caption's font size and position with JavaScript in the Calculate Positioning code node.
  • Combine all of this data with the Merge Caption & Positions node.
  • Draw a background rectangle and the caption text onto the image with the Apply Caption to Image node (an Edit Image node).
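The Structured Output Parser step turns the model's reply into fields that later nodes can read; the positioning code expects them as caption_title and caption_text. The parser node handles this inside n8n, so the following is only a minimal plain-JavaScript sketch of that kind of parse-and-check, assuming the reply is JSON that may arrive wrapped in a Markdown code fence. The function name parseCaptionReply and the sample caption are hypothetical.

```javascript
// Hypothetical illustration of the parse step; this is not the n8n parser's own code.
function parseCaptionReply(reply) {
  // Chat models sometimes wrap JSON in a fenced block; keep only the inner text.
  const fenced = reply.match(/`{3}(?:json)?\s*([\s\S]*?)\s*`{3}/);
  const raw = fenced ? fenced[1] : reply.trim();

  const data = JSON.parse(raw); // throws a SyntaxError on malformed JSON

  // The Calculate Positioning code reads output.caption_title and output.caption_text.
  for (const key of ["caption_title", "caption_text"]) {
    if (typeof data[key] !== "string" || data[key].trim() === "") {
      throw new Error(`Missing or empty field: ${key}`);
    }
  }
  return { caption_title: data.caption_title, caption_text: data.caption_text };
}

// Example: a fenced reply (the fence is built with repeat() to keep this listing readable).
const fence = "`".repeat(3);
const reply = `${fence}json\n{"caption_title": "Sunny Side Up", "caption_text": "A dog naps in the sun."}\n${fence}`;
console.log(parseCaptionReply(reply).caption_title); // Sunny Side Up
```

Failing loudly on a missing field is deliberate: an empty title or text would otherwise reach the Edit Image node and produce a blank or malformed caption.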

Output

Final image with a white, readable caption placed near the bottom on a semi-transparent background.


Beginner Step-by-Step: How to Use This Workflow in Production

Import the Workflow

  1. Click the Download button on this page to save the workflow JSON file.
  2. Open the n8n editor.
  3. Use the “Import from File” menu option and select the downloaded file.

Configure Credentials and Parameters

  1. Add or update the Google PaLM API credentials in the n8n credentials manager.
  2. In the Get Image node, replace the example image URL with any public image URL you want to caption.
  3. If needed, adjust other node parameters to match your use case.
  4. Make sure the font path and styles in the Apply Caption to Image node match your branding or preferences.

Run and Test

  1. Click “Execute Workflow” to run manually and observe each node’s output.
  2. Check the final image output for correct caption placement and styling.
  3. Fix any errors found during testing by checking node parameters and code.

Activate for Production

  1. Replace the manual trigger with a trigger that fits your use case, such as a webhook or a schedule.
  2. Activate the workflow so it runs automatically.
  3. Monitor executions on the n8n dashboard for problems or API usage limits.

Importing the template this way spares you from building the workflow from scratch and gets you a working solution quickly.
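If the Manual Trigger is replaced with a webhook, as suggested under Activate for Production, the image URL arrives in each request instead of being fixed in the Get Image node, so it is worth rejecting bad input before the download. Here is a minimal sketch in plain JavaScript, assuming the caller sends a JSON body with a field named imageUrl; both that field name and the function name validateImageUrl are hypothetical.

```javascript
// Hypothetical validator for a webhook payload such as { "imageUrl": "https://..." }.
// The field name imageUrl is an assumption; match it to whatever your caller sends.
function validateImageUrl(body) {
  const value = body && body.imageUrl;
  if (typeof value !== "string") {
    throw new Error("Request body must include an imageUrl string");
  }
  let url;
  try {
    url = new URL(value); // throws a TypeError on malformed URLs
  } catch {
    throw new Error(`Invalid imageUrl: ${value}`);
  }
  // The image is downloaded over HTTP, so only allow http(s) schemes.
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    throw new Error(`Unsupported URL scheme: ${url.protocol}`);
  }
  return url.href;
}

console.log(validateImageUrl({ imageUrl: "https://example.com/photo.jpg" })); // https://example.com/photo.jpg
```

In an n8n Code node you might call this on the incoming webhook body (the Webhook node usually exposes the parsed JSON under body) and point the Get Image node's URL at the returned value with an expression; adjust the field paths to match your own trigger.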


Code to Calculate Caption Positioning

The JavaScript code in the Calculate Positioning node computes the caption's font size and position from the image dimensions.
It estimates how many lines the caption will wrap to, so the background rectangle is tall enough to keep the text readable.

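// Image dimensions (from Get Info) and the parsed AI caption (from the Structured Output Parser)
const { size, output } = $input.item.json;

// Split the image height into about 35 text lines so the font scales with the image.
const lineHeight = 35;
const fontSize = Math.round(size.height / lineHeight);
// Characters per line, assuming each character is roughly half the font size wide.
const maxLineLength = Math.round(size.width / fontSize) * 2;
const text = `"${output.caption_title}". ${output.caption_text}`;
// Round up so a partly filled last line still gets space (Math.round can drop it).
const numLinesOccupied = Math.ceil(text.length / maxLineLength);

// Padding is 2% of each image dimension.
const verticalPadding = size.height * 0.02;
const horizontalPadding = size.width * 0.02;

// The background rectangle starts at the left edge, high enough to fit every text line plus padding.
const rectPosX = 0;
const rectPosY = size.height - (verticalPadding * 2.5) - (numLinesOccupied * fontSize);

// The text is inset from the left edge and placed inside the rectangle.
const textPosX = horizontalPadding;
const textPosY = size.height - (numLinesOccupied * fontSize) - (verticalPadding / 2);

return {
  caption: {
    fontSize,
    maxLineLength,
    numLinesOccupied,
    rectPosX,
    rectPosY,
    textPosX,
    textPosY,
    verticalPadding,
    horizontalPadding,
  },
};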
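To check the arithmetic outside n8n, the same calculation can be wrapped in a plain function. This sketch mirrors the Calculate Positioning code but rounds the line count up with Math.ceil; the function name calculatePositioning, the 1000 by 700 image size, and the sample caption are hypothetical inputs chosen for illustration.

```javascript
// Standalone sketch of the Calculate Positioning arithmetic (hypothetical function name).
function calculatePositioning(size, output) {
  const lineHeight = 35; // the image height is divided into roughly 35 text lines
  const fontSize = Math.round(size.height / lineHeight);
  // Assumes an average character is about half as wide as the font size.
  const maxLineLength = Math.round(size.width / fontSize) * 2;
  const text = `"${output.caption_title}". ${output.caption_text}`;
  // Round up so a partly filled last line still gets room.
  const numLinesOccupied = Math.ceil(text.length / maxLineLength);

  const verticalPadding = size.height * 0.02;
  const horizontalPadding = size.width * 0.02;
  return {
    fontSize,
    maxLineLength,
    numLinesOccupied,
    rectPosX: 0,
    rectPosY: size.height - verticalPadding * 2.5 - numLinesOccupied * fontSize,
    textPosX: horizontalPadding,
    textPosY: size.height - numLinesOccupied * fontSize - verticalPadding / 2,
    verticalPadding,
    horizontalPadding,
  };
}

const pos = calculatePositioning(
  { width: 1000, height: 700 },
  {
    caption_title: "Sunny Side Up",
    caption_text: "A golden retriever stretches out on a sunlit porch, soaking up the warm afternoon rays.",
  }
);
console.log(pos.fontSize, pos.maxLineLength, pos.numLinesOccupied); // 20 100 2
console.log(pos.rectPosY, pos.textPosY); // 625 653
```

Here the combined caption is 104 characters against an estimated 100 characters per line. Math.round(1.04) would reserve only one line, while Math.ceil reserves two, so the rectangle and text start one line (20 pixels) higher and the second line has room.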

Customization Ideas

  • Change the image URL to any publicly accessible image in the Get Image node.
  • Use a different AI model for captions by replacing the Image Captioning Agent node with another AI node, such as an OpenAI node, that accepts image input.
  • Edit the caption style in the Apply Caption to Image node by changing font, colors, or background transparency.
  • Modify the prompt in the AI node to change caption tone or detail.

Troubleshooting Common Problems

  • API authentication error in Google Gemini Chat Model:
    Check that the Google PaLM API key is correct and still valid, then refresh the credentials in n8n.
  • Image does not show or caption is missing:
    Verify the connections between the Merge nodes and the Edit Image node. Confirm the font paths and drawing steps.
  • Caption text cutoff or overlapping:
    Adjust font size and positions by editing JavaScript in the Calculate Positioning node.
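If long captions push the background rectangle too far up the image, one option is to shrink the font until the estimated text block fits within a chosen share of the image height. This is a hypothetical extension, not part of the original workflow: the function name fitFontSize, the 30% maximum height, and the 8 px minimum are assumptions, and it reuses the node's estimate that a character is about half the font size wide.

```javascript
// Hypothetical shrink-to-fit helper; not part of the original workflow.
// Lowers the font size until the estimated caption block fits in maxHeightShare of the image.
function fitFontSize(size, text, { maxHeightShare = 0.3, minFontSize = 8 } = {}) {
  const maxBlockHeight = size.height * maxHeightShare;
  let fontSize = Math.round(size.height / 35); // same starting size as Calculate Positioning

  while (fontSize > minFontSize) {
    // Same width estimate as the node: each character is about half the font size wide.
    const maxLineLength = Math.round(size.width / fontSize) * 2;
    const lines = Math.ceil(text.length / maxLineLength);
    if (lines * fontSize <= maxBlockHeight) break; // fits: stop shrinking
    fontSize -= 1;
  }
  return fontSize; // never below minFontSize unless the starting size already is
}

const size = { width: 1000, height: 700 };
console.log(fitFontSize(size, "A short caption.")); // 20
console.log(fitFontSize(size, "a".repeat(5000))); // 9
```

You could run a helper like this before computing the positions and use its result in place of fontSize in the Calculate Positioning code, so the rectangle and text offsets are derived from the reduced size.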

Pre-Production Checklist

  • Confirm that the Google PaLM API key is valid and has been tested.
  • Make sure the image URL is publicly accessible.
  • Run the workflow manually before activating.
  • Back up workflow JSON before changes.
  • Review output images for caption quality.

Deployment Guide

After successful tests, replace the manual trigger node with another trigger, such as a webhook or a schedule, as needed.
Activate the workflow to run automatically.

Watch workflow runs in the n8n dashboard for errors.
Request higher API quotas in Google Cloud if you process many images.

Consider self-hosting n8n for full control over long-running or heavy workflows.
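When processing many images, another option besides raising quotas is to space out requests. Here is a minimal sketch that runs jobs one at a time with a fixed pause between them; the worker function stands in for whatever sends an image through the workflow (for example, a call to a hypothetical webhook URL), and the 60-requests-per-minute figure is an illustrative assumption, not a documented Gemini limit. Inside n8n itself, the Loop Over Items and Wait nodes can achieve a similar effect.

```javascript
// Hypothetical throttle: run async jobs sequentially with a pause between them.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Milliseconds to wait between calls to stay under a requests-per-minute quota.
function delayForQuota(requestsPerMinute) {
  return Math.ceil(60000 / requestsPerMinute);
}

async function processSequentially(items, worker, delayMs) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    if (i > 0) await sleep(delayMs); // pause before every call except the first
    results.push(await worker(items[i]));
  }
  return results;
}

// Usage with a stand-in worker; replace it with your real request.
const demoWorker = async (url) => `captioned ${url}`;
processSequentially(["a.jpg", "b.jpg"], demoWorker, delayForQuota(60)).then((results) => {
  console.log(results); // [ 'captioned a.jpg', 'captioned b.jpg' ]
});
```

Sequential calls are slower than parallel ones, but they keep the request rate predictable, which is usually what a per-minute quota needs.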


Summary

→ Saves hours spent on writing image captions.
→ Automatically downloads, resizes, captions, and styles images.
→ Uses Google Gemini AI for creative, pun-filled captions.
→ Adds captions on images so they are ready for social media.
→ Easy to set up in n8n by importing and adding your API keys.



Frequently Asked Questions

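Can I use an AI model other than Google Gemini?
Yes, replace the Image Captioning Agent node with AI nodes that support image input and text caption output.
Does each run use API credits?
Yes, each Google Gemini call counts against the usage quota or plan of your Google PaLM API key.
What should I do if the caption does not appear on the image?
Check the node connections, font paths, and drawing steps in the Apply Caption to Image node.
Is my image data secure?
Images are processed within n8n and sent to Google Gemini over encrypted API calls, but sensitive images should still be handled carefully.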

Promoted by BULDRR AI
