Google Gemini & n8n: Auto Caption Your Images with AI

This workflow uses Google Gemini’s vision model within n8n to generate catchy captions for images automatically. It overlays captions on images, saving time and enhancing visual content for blogs, marketing, and social media.
Workflow Identifier: 1735
Nodes in Use: Manual Trigger, HTTP Request, Edit Image, Chain LLM, Code, Merge, Sticky Note, LangChain Google Gemini, Structured Output Parser

Opening Problem Statement

Meet Claire, a social media manager who spends hours every week captioning images for her brand’s posts. She often struggles to find words that capture the essence of an image and engage her audience. The task is repetitive, takes more than 5 hours a week, and produces captions of inconsistent quality. She wants to automate it so she can focus on strategy instead of manual caption writing.

This n8n workflow solves exactly that problem: it combines Google’s Gemini model with n8n’s image editing to generate captions and overlay them on any input image automatically.

What This Automation Does

When you run this workflow, it:

  • Downloads an image automatically from a URL (in this example, a Pexels stock photo).
  • Resizes the image to 512×512 pixels so the AI model always receives a small, consistent input.
  • Uses the Google Gemini Chat Model to generate a creative, pun-filled caption title and descriptive text for the image.
  • Calculates optimal caption placement on the image with precise position and font size using a custom JavaScript code node.
  • Overlays the generated caption onto the image using the built-in Edit Image node in n8n.
  • Outputs a captioned image ready for publishing or further use.

Altogether, this automation can save Claire (and you) hours each week by fully automating image caption creation and styling, ensuring consistent, high-quality captions every time.

Prerequisites ⚙️

  • n8n account with access to workflow editing. (Self-hosting, for example on Hostinger, is also possible.)
  • Google Gemini API key stored as a credential in n8n (the credential type is listed as Google Gemini(PaLM) Api).
  • Basic familiarity with n8n nodes such as HTTP Request, Edit Image, Code, and LangChain AI nodes.

Step-by-Step Guide

1. Trigger the Workflow Manually

Start by clicking the “Test workflow” button inside n8n. This uses the Manual Trigger node to initiate the automation. You’ll see the workflow activate, and the following nodes execute in sequence.

2. Download the Image Using HTTP Request Node

Navigate to the node named Get Image. This node uses an HTTP GET request to fetch the image from https://images.pexels.com/photos/1267338/pexels-photo-1267338.jpeg?auto=compress&cs=tinysrgb&w=600.
You can replace this URL with any publicly accessible image URL. After executing, you will have the image binary data loaded into the workflow.
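
Before swapping in your own URL, it can help to check that it parses and uses http(s). The helper below is a hypothetical sketch in plain Node.js and is not part of the workflow; it also lists the query parameters, such as the `w=600` in the example URL.

```javascript
// Hypothetical pre-check for an image URL; not an n8n node or API.
function describeImageUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return { ok: false, reason: "not a valid URL" };
  }
  // The HTTP Request node fetches over HTTP, so reject other schemes.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return { ok: false, reason: "only http(s) URLs can be fetched" };
  }
  return {
    ok: true,
    host: url.hostname,
    path: url.pathname,
    query: Object.fromEntries(url.searchParams),
  };
}

const info = describeImageUrl(
  "https://images.pexels.com/photos/1267338/pexels-photo-1267338.jpeg?auto=compress&cs=tinysrgb&w=600"
);
console.log(info.host, info.query);
```

A URL that passes this check can still be private or return a non-image response, so run the Get Image node once to confirm.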

3. Resize the Image for AI Processing

Next, the Resize For AI node resizes the downloaded image to 512×512 pixels. A fixed, small size keeps the input to the AI model consistent from run to run. Open the node parameters to confirm that the width and height fields are both set to 512.
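
Whether the resize keeps the aspect ratio depends on the resize option chosen in the node, which this guide does not state. As a rough sketch of what an aspect-preserving fit into a 512×512 box would produce, here is a hypothetical helper (the name `fitWithin` is made up for illustration):

```javascript
// Hypothetical helper: scale (width, height) to fit inside a square box
// while keeping the aspect ratio. Not the Edit Image node's actual code.
function fitWithin(width, height, maxSide = 512) {
  const scale = Math.min(maxSide / width, maxSide / height);
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

console.log(fitWithin(600, 400)); // landscape input
console.log(fitWithin(400, 600)); // portrait input
```

Note that this sketch also scales smaller images up to the box; add a guard if you only want to shrink.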

4. Gather Image Metadata

The Get Info node extracts image metadata such as the dimensions. Later steps use this information to position the caption on the image.

5. Generate a Caption Using Google Gemini Chat Model

This is the core AI functionality. The workflow sends the resized image binary to the Image Captioning Agent node, which is a Chain LLM (LangChain Large Language Model) node configured to request a caption.

The node’s prompt instructs the model to write a caption with a punny title that covers the who, when, where, and context of the image. The Google Gemini Chat Model then produces structured caption data.
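
How n8n hands the binary to Gemini is internal to the LangChain nodes and is not shown in this guide. As a rough illustration of the general idea, multimodal chat APIs commonly accept an image as base64 text, often wrapped in a data URL. The helper name below is hypothetical:

```javascript
// Hypothetical helper: encode raw image bytes as a base64 data URL.
// This illustrates the general technique, not n8n's internal code.
function toDataUrl(imageBytes, mimeType) {
  const base64 = Buffer.from(imageBytes).toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

// Two placeholder bytes stand in for a real JPEG here.
console.log(toDataUrl(Buffer.from("hi"), "image/jpeg"));
```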

6. Parse AI Output for Structured Caption

Though not explicitly connected in this workflow snippet, the node Structured Output Parser is set up to parse the AI-generated JSON caption into clear fields: caption_title and caption_text.
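
If you want to see what that parsing step amounts to, the sketch below validates a JSON string against the two expected fields. It is a stand-alone approximation written for this guide, not the Structured Output Parser's implementation, and the function name is hypothetical.

```javascript
// Hypothetical stand-in for the parser: accept a JSON string and make sure
// both caption fields are present and non-empty strings.
function parseCaption(rawModelOutput) {
  const data = JSON.parse(rawModelOutput);
  for (const field of ["caption_title", "caption_text"]) {
    if (typeof data[field] !== "string" || data[field].trim() === "") {
      throw new Error(`Missing or empty field: ${field}`);
    }
  }
  return { caption_title: data.caption_title, caption_text: data.caption_text };
}

// Sample model output, invented for illustration.
const parsed = parseCaption(
  '{"caption_title": "Paws for Thought", "caption_text": "A dog rests on a sunny porch."}'
);
console.log(parsed.caption_title);
```

Failing loudly on a missing field makes a bad model response visible at this step instead of as a blank caption later.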

7. Combine Image and Caption Data

The Merge Image & Caption node joins the original image metadata and the generated caption text into a single data set for further processing.
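
Assuming the Merge node is set to combine items by position (the guide does not state the mode), the result is roughly what the sketch below produces: item 1 of one input is joined with item 1 of the other, and so on. The function is a hypothetical illustration in plain JavaScript.

```javascript
// Hypothetical illustration of combining two item lists by position.
function mergeByPosition(left, right) {
  const length = Math.min(left.length, right.length);
  const merged = [];
  for (let i = 0; i < length; i++) {
    // Fields from the right item win if both sides share a key.
    merged.push({ ...left[i], ...right[i] });
  }
  return merged;
}

const imageInfo = [{ size: { width: 512, height: 512 } }];
const captions = [{ output: { caption_title: "T", caption_text: "X" } }];
console.log(mergeByPosition(imageInfo, captions));
```

The merged item carries both `size` and `output`, which are the two fields the Calculate Positioning code reads.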

8. Use Code Node to Calculate Caption Position

The Calculate Positioning node runs JavaScript to:

  • Determine font size based on image height.
  • Calculate how many lines the caption will occupy.
  • Determine x/y coordinates for caption placement near the bottom of the image.

This makes sure the caption fits nicely without covering important parts of the image.

The JavaScript snippet to use (copy/paste if customizing) is:

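// Image dimensions (from Get Info) and the parsed caption (from the LLM chain).
const { size, output } = $input.item.json;

// Despite its name, lineHeight is a divisor: font size is 1/35 of the image height.
const lineHeight = 35;
const fontSize = Math.round(size.height / lineHeight);
// Rough estimate of how many characters fit on one line.
const maxLineLength = Math.round(size.width / fontSize) * 2;
const text = `"${output.caption_title}". ${output.caption_text}`;
// Estimated number of lines the caption occupies.
const numLinesOccupied = Math.round(text.length / maxLineLength);

// Padding is 2% of the image height and width.
const verticalPadding = size.height * 0.02;
const horizontalPadding = size.width * 0.02;
// Background rectangle: spans from the left edge, anchored near the bottom.
const rectPosX = 0;
const rectPosY = size.height - (verticalPadding * 2.5) - (numLinesOccupied * fontSize);
// Text origin: inset from the left edge and placed over the rectangle.
const textPosX = horizontalPadding;
const textPosY = size.height - (numLinesOccupied * fontSize) - (verticalPadding / 2);

return {
  caption: {
    fontSize,
    maxLineLength,
    numLinesOccupied,
    rectPosX,
    rectPosY,
    textPosX,
    textPosY,
    verticalPadding,
    horizontalPadding,
  },
};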
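
To see what these formulas yield, the same arithmetic can be wrapped in a plain function and run outside n8n. The function name and the sample caption below are hypothetical; the calculations mirror the snippet above.

```javascript
// Same arithmetic as the Code node, wrapped so it runs in plain Node.js.
function calculateCaptionLayout(size, output) {
  const lineHeight = 35;
  const fontSize = Math.round(size.height / lineHeight);
  const maxLineLength = Math.round(size.width / fontSize) * 2;
  const text = `"${output.caption_title}". ${output.caption_text}`;
  const numLinesOccupied = Math.round(text.length / maxLineLength);
  const verticalPadding = size.height * 0.02;
  const horizontalPadding = size.width * 0.02;
  return {
    fontSize,
    maxLineLength,
    numLinesOccupied,
    rectPosX: 0,
    rectPosY: size.height - verticalPadding * 2.5 - numLinesOccupied * fontSize,
    textPosX: horizontalPadding,
    textPosY: size.height - numLinesOccupied * fontSize - verticalPadding / 2,
    verticalPadding,
    horizontalPadding,
  };
}

// Sample input for a 512×512 image; the caption is invented for illustration.
const layout = calculateCaptionLayout(
  { width: 512, height: 512 },
  { caption_title: "Paws for Thought", caption_text: "A dog rests on a sunny porch." }
);
console.log(layout.fontSize);         // 15
console.log(layout.maxLineLength);    // 68
console.log(layout.numLinesOccupied); // 1
```

For a 512×512 image the font size is always 15 and the line length 68; only the number of lines, and therefore the vertical positions, change with the caption length.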

9. Combine Caption Position Info with Image Data

The Merge Caption & Positions node combines the calculated caption positioning data with the image and caption details, readying everything for drawing.

10. Overlay Caption on Image

Finally, the Apply Caption to Image node uses the Edit Image node’s multi-step drawing abilities to:

  • Draw a semi-transparent rectangle at the bottom of the image to improve caption readability.
  • Render the caption title and text over the rectangle in white Arial font sized appropriately.

Check the parameters for font path, font size, and colors to customize the look.
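
How the Edit Image node breaks long text into lines is not described in this guide. If you want to preview how many lines a caption needs for a given line length, a greedy word wrap like the hypothetical sketch below gives a reasonable approximation:

```javascript
// Hypothetical preview helper: greedy word wrap at a maximum line length.
// A single word longer than the limit is kept whole on its own line.
function wrapText(text, maxLineLength) {
  const lines = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (current === "" || candidate.length <= maxLineLength) {
      current = candidate;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current) lines.push(current);
  return lines;
}

console.log(wrapText("the quick brown fox jumps", 10));
// [ 'the quick', 'brown fox', 'jumps' ]
```

Comparing the number of lines returned here with the estimate from the Calculate Positioning node shows whether the background rectangle is likely to be tall enough.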

Customizations ✏️

  • Change Image Source URL: In the Get Image node, update the URL parameter to any image you want automatically captioned.
  • Use Different AI Model: Swap out the Google Gemini Chat Model node for another chat model node that accepts image input, such as OpenAI with a vision-capable model, and adjust the caption prompt if needed.
  • Adjust Caption Style: Modify fonts, colors, or the rectangle’s transparency in Apply Caption to Image node for branding consistency.
  • Alter Caption Template: Edit the Chain LLM prompt in Image Captioning Agent node to change caption style or detail level.

Troubleshooting 🔧

  • Problem: “API authentication error in Google Gemini Chat Model.”

    Cause: Invalid or expired Google Gemini(PaLM) API key.

    Solution: Open Credentials in n8n, re-enter or regenerate the Google Gemini API key, and retest the workflow.
  • Problem: “Image not appearing or caption overlay missing.”

    Cause: Incorrect merging of nodes or drawing parameters.

    Solution: Verify connections between Merge Image & Caption and Merge Caption & Positions nodes. Check font paths and drawing steps in Apply Caption to Image node.
  • Problem: “Caption text is cut off or overlaps image content.”

    Cause: Incorrect font size or positioning calculations.

    Solution: Adjust the JavaScript code in the Calculate Positioning node for font size or padding.
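
One adjustment worth trying for cut-off captions: the line estimate in the Calculate Positioning code uses Math.round, which rounds a short caption down to 0 lines and can undercount longer ones. The sketch below is a suggested variant, not the workflow's original code; it rounds up and never returns less than 1.

```javascript
// Suggested variant (not the original workflow code): round the line
// estimate up and reserve at least one line.
function estimateLines(text, maxLineLength) {
  return Math.max(1, Math.ceil(text.length / maxLineLength));
}

const shortCaption = "x".repeat(20); // 20 characters
const longCaption = "x".repeat(90);  // 90 characters

console.log(Math.round(shortCaption.length / 68), estimateLines(shortCaption, 68)); // 0 1
console.log(Math.round(longCaption.length / 68), estimateLines(longCaption, 68));   // 1 2
```

Rounding up makes the background rectangle slightly taller in some cases, which is usually preferable to text spilling outside it.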

Pre-Production Checklist ✅

  • Confirm the Google Gemini(PaLM) API credential is active and tested.
  • Verify the image URL is valid and publicly accessible.
  • Test the workflow step-by-step using manual trigger before deployment.
  • Back up your workflow JSON before making modifications.
  • Check the final image output for caption visibility and quality.

Deployment Guide

Once tested, replace the Manual Trigger with a trigger that suits your use case (for example, a webhook or a scheduled trigger), then activate the workflow in n8n.

Monitor your workflow executions in n8n for errors or exceptions. If you process many images, check the rate limits and quota for your Gemini API key in Google Cloud and adjust them if needed.
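
If a burst of images does hit a rate limit, a common approach is to retry with pauses that double each time. The helper below is hypothetical and only computes the wait times; n8n nodes also have their own retry settings, which may be enough on their own.

```javascript
// Hypothetical helper: exponential backoff schedule in milliseconds,
// doubling from baseMs and capped at capMs.
function backoffDelays(maxRetries, baseMs = 1000, capMs = 30000) {
  return Array.from({ length: maxRetries }, (_, attempt) =>
    Math.min(capMs, baseMs * 2 ** attempt)
  );
}

console.log(backoffDelays(4)); // [ 1000, 2000, 4000, 8000 ]
```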

FAQs

  • Q: Can I use another AI model instead of Google Gemini?

    A: Yes, replace the LangChain Google Gemini node with other AI nodes compatible with image input and caption output.
  • Q: Does generating captions consume API credits?

    A: Yes, each AI call counts against the quota or billing plan of your Google Gemini API key.
  • Q: Is the image data secure?

    A: The data flows within n8n and API calls are encrypted, but treat sensitive images cautiously.

Conclusion

By following this tutorial, you have built an image captioning automation with n8n and Google Gemini. A workflow like this can save Claire hours each week by applying consistent, creative captions to images for marketing, blogs, or social media content.

Next, consider enhancing this by integrating batch image processing, adding language translation to captions, or automatically publishing images to social channels. Remember, automation frees time for creativity.

Enjoy your newfound productivity! 👏
