1. Opening Problem Statement
Meet Jenny, the manager of a petting zoo who maintains a website with photos of the animals and events. Jenny spends hours manually tagging images to highlight important subjects—like rabbits—in her photos for promotional materials. Every time she updates the site, she repeats this tedious task, wasting valuable time and risking errors in object labeling. Manually drawing bounding boxes over multiple images leads to inconsistencies and slows down her marketing efforts, resulting in lost opportunities and increased frustration.
This is where automation can save Jenny a considerable amount of effort by letting AI do the heavy lifting. Specifically, using the latest capabilities of Google’s Gemini 2.0 multimodal model, she can detect objects in images simply by describing what to look for in a prompt, and automatically visualize those detections on the photos.
2. What This Automation Does
This n8n workflow automates image object detection using Google Gemini 2.0’s prompt-based bounding box feature. Here’s what happens when you run it:
- Downloads a test image of a petting zoo with multiple rabbits.
- Uses the Gemini 2.0 API to prompt detection of all rabbits in the image.
- Retrieves bounding box coordinates normalized to a 0-1000 scale from the AI response.
- Scales the normalized coordinates to fit the actual image dimensions.
- Draws colored bounding boxes over the detected rabbits in the original image.
- Outputs the image with visible bounding boxes, enabling quick validation and further use.
This eliminates hours of manual image annotation, drastically reduces human error, and empowers users like Jenny to handle various complex detection tasks just by adjusting the prompt.
3. Prerequisites ⚙️
- n8n Account: You will need access to an n8n automation platform, either via n8n Cloud or a self-hosted option such as Hostinger.
- Google Gemini (PaLM) API account 🔑: API credentials for Google Gemini 2.0 to use the object detection model.
- Image URL source 🔌: A publicly accessible image URL to test the workflow; here we use a petting zoo photo.
4. Step-by-Step Guide
Step 1: Trigger the Workflow Manually
In n8n, look for the Manual Trigger node (When clicking ‘Test workflow’). Click “Execute Workflow” to start the process on-demand. This is useful for testing and debugging your automation. You should see the node execute successfully.
Step 2: Download the Test Image
Next, the workflow uses the HTTP Request node (Get Test Image) to fetch an image from a specified URL. In this case, the URL is:
https://www.stonhambarns.co.uk/wp-content/uploads/jennys-ark-petting-zoo-for-website-6.jpg
Ensure the URL returns a valid image. After this step runs, the image data will be available in binary format to subsequent nodes.
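Optionally, you can fail fast on bad downloads with a small Code node right after Get Test Image. This is a hypothetical guard, not part of the original workflow; it assumes the HTTP Request node stored its result under the default binary property name data:

```javascript
// Optional guard: stop early if the download did not return an image.
const mime = $input.first().binary?.data?.mimeType ?? "";
if (!mime.startsWith("image/")) {
  throw new Error(`Expected an image, got "${mime || "no binary data"}"`);
}
// Pass all items through unchanged.
return $input.all();
```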
Step 3: Extract Image Information
The Edit Image node (Get Image Info) retrieves metadata about the downloaded image, especially the width and height in pixels. This information is crucial for later scaling of bounding box coordinates.
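For orientation, the relevant part of the Get Image Info output looks roughly like this; field names can vary between n8n versions, so confirm the actual shape in the editor before wiring up expressions:

```json
{
  "format": "JPEG",
  "size": { "width": 1200, "height": 800 }
}
```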
Step 4: Use Google Gemini 2.0 for Object Detection
The workflow calls the HTTP Request node (Gemini 2.0 Object Detection) to send the image to the Google Gemini API. It posts a JSON request with a prompt: “I want to see all bounding boxes of rabbits in this image.”
The request includes the image data as an inline, base64-encoded payload. The API responds with the detected objects, including bounding box coordinates normalized to a 0-1000 scale.
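For reference, the JSON body posted to Gemini’s generateContent endpoint looks roughly like this. This is a sketch based on Google’s REST API format; the exact expression that supplies the base64 string depends on how your HTTP Request node references the binary data from Get Test Image, and the generationConfig line (which asks Gemini to reply in plain JSON) is an optional convenience:

```json
{
  "contents": [
    {
      "parts": [
        { "text": "I want to see all bounding boxes of rabbits in this image." },
        {
          "inline_data": {
            "mime_type": "image/jpeg",
            "data": "<base64-encoded image bytes>"
          }
        }
      ]
    }
  ],
  "generationConfig": { "responseMimeType": "application/json" }
}
```

The endpoint is typically https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent, authenticated via your Google Gemini (PaLM) API credential in n8n.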
Step 5: Set Image Dimensions and Coordinates Variables
The Set node (Get Variables) assigns variables for the coordinates array, as well as the image width and height, to be used in the next calculations.
Here, we parse the JSON response’s bounding box data and keep the width and height from the image info.
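As a concrete sketch of those assignments (assumptions: the model’s JSON arrives in the usual candidates[0].content.parts[0].text field, Get Image Info reports size.width and size.height, and your n8n version provides the parseJson() string helper), the three Set node expressions might look like:

```
coords  = {{ $json.candidates[0].content.parts[0].text.parseJson() }}
width   = {{ $('Get Image Info').item.json.size.width }}
height  = {{ $('Get Image Info').item.json.size.height }}
```

If Gemini wraps its answer in a Markdown code fence, strip the fence markers from the text before parsing.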
Step 6: Scale Normalized Coordinates to Actual Pixels
Using the Code node (Scale Normalised Coords), a JavaScript snippet rescales the bounding box coordinates to the original image dimensions:
```javascript
// Gemini returns each detection as box_2d = [ymin, xmin, ymax, xmax],
// normalized to a 0-1000 scale.
const { coords, width, height } = $input.first().json;
const scale = 1000;

// Convert a normalized coordinate into a pixel value.
const scaleCoordX = (val) => (val * width) / scale;
const scaleCoordY = (val) => (val * height) / scale;

const normalisedOutput = coords
  // Keep only well-formed boxes with all four coordinates.
  .filter((coord) => coord.box_2d.length === 4)
  .map((coord) => ({
    // Note the index order: box_2d is [ymin, xmin, ymax, xmax].
    // A value of 0 scales to 0, so no special-casing is needed.
    xmin: scaleCoordX(coord.box_2d[1]),
    xmax: scaleCoordX(coord.box_2d[3]),
    ymin: scaleCoordY(coord.box_2d[0]),
    ymax: scaleCoordY(coord.box_2d[2]),
  }));

return {
  json: {
    coords: normalisedOutput,
  },
  // Carry the original image binary forward so the draw node can use it.
  binary: $('Get Test Image').first().binary,
};
```

This step converts the AI’s relative coordinates into pixel values matching the actual photo size.
Step 7: Draw Bounding Boxes on the Image
The final Edit Image node (Draw Bounding Boxes) receives the scaled coordinates and draws colorful bounding boxes around each detected rabbit. The node is configured with multiple draw operations specifying start and end X/Y pixels and color code #ff00f277. The output is an image with visible highlights around target objects.
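If rounding occasionally pushes a box edge a pixel or two outside the canvas, you can clamp the coordinates before drawing. This is a hypothetical helper, not part of the original workflow; it would slot into the Scale Normalised Coords code from Step 6:

```javascript
// Round each box to whole pixels and clamp it to the image bounds so no
// rectangle edge falls outside the canvas.
const clampBox = (box, width, height) => ({
  xmin: Math.max(0, Math.round(box.xmin)),
  ymin: Math.max(0, Math.round(box.ymin)),
  xmax: Math.min(width, Math.round(box.xmax)),
  ymax: Math.min(height, Math.round(box.ymax)),
});

// Usage inside Step 6:
// normalisedOutput.map((box) => clampBox(box, width, height))
```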
Step 8: Review the Output
You can add further nodes to save or share the resulting image. In this demo, the workflow ends after drawing the bounding boxes, but you can easily extend it to upload the image to cloud storage or send via email.
5. Customizations ✏️
Customize the Object Detection Prompt
In the Gemini 2.0 Object Detection HTTP node, change the prompt text inside the JSON body from “all bounding boxes of rabbits” to any other subject you want to detect, such as “cars,” “dogs,” or “people with umbrellas.” This allows flexible context-based image detection.
Adjust the Image Source URL
Update the HTTP Request node (Get Test Image) to fetch a different image by modifying the URL parameter. This is useful for testing different photos or your own data.
Modify Bounding Box Appearance
Within the Draw Bounding Boxes node, you can change the color or add more draw operations to highlight additional objects. Adjust the stroke thickness or corner radius, if your n8n version supports them, for better visuals.
Extend to Save or Send Results
Add a cloud storage node (e.g., Google Drive, Dropbox) or email node (e.g., Gmail) after the drawing step to automatically archive or share processed images.
Increase Detection Accuracy with Different Prompts
Experiment with different prompts, or make multiple API calls, to detect complex or overlapping objects more robustly.
6. Troubleshooting 🔧
Problem: API returns no bounding boxes
Cause: The prompt is unclear or the image content does not match requested objects.
Solution: Refine your prompt into a more specific request (e.g., “all rabbits” vs. “all animals”) and verify the image actually contains the objects you are asking for.
Problem: Coordinates do not align with image
Cause: Image dimensions used for scaling are incorrect or not updated.
Solution: Confirm the Edit Image node (Get Image Info) correctly extracts width and height, and the scaling logic in the Code node (Scale Normalised Coords) matches those values.
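To diagnose misaligned boxes quickly, you can drop in a temporary Code node after Scale Normalised Coords that flags boxes falling outside the image. This is a debugging sketch, assuming Get Variables exposes width and height at the top level as described in Step 5:

```javascript
// Temporary debugging node: count scaled boxes that fall outside the image.
const { coords } = $input.first().json;
const { width, height } = $('Get Variables').first().json;

const outOfBounds = coords.filter(
  (c) => c.xmin < 0 || c.ymin < 0 || c.xmax > width || c.ymax > height
);
console.log(`${outOfBounds.length} of ${coords.length} boxes fall outside ${width}x${height}`);

// Pass items through unchanged.
return $input.all();
```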
Problem: Workflow fails at API call step
Cause: Invalid or expired Google Gemini API credentials.
Solution: Reconfigure your Google Gemini (PaLM) API credentials in n8n under the Gemini 2.0 Object Detection HTTP Request node settings.
7. Pre-Production Checklist ✅
- Verify your Google Gemini API credentials are active and permissions granted.
- Test the image URL to confirm it returns a valid image accessible without authentication.
- Run the workflow manually and watch each node output in n8n editor to confirm data flow.
- Validate the scaling code outputs reasonable bounding box coordinates compared to the image size.
- Prepare backup workflow versions before deploying with custom images or prompts.
8. Deployment Guide
After testing, activate your workflow with whatever trigger suits your needs. The included manual trigger is fine for simple on-demand runs.
For ongoing use, integrate other trigger nodes like Webhooks or Schedules to automate detection for new images uploaded to your systems.
Monitor recent workflow executions in n8n to catch any API changes or errors.
9. FAQs
Can I use images stored locally instead of URLs?
Yes. Either upload the images to a web-accessible location, or use n8n’s binary data handling to pass them directly into the HTTP Request node that calls the Gemini API.
Does this consume my Google Gemini API credits?
Yes, each API call to Gemini 2.0 for object detection uses your quota according to Google’s pricing and limits.
Is my data secure when using this API?
Google Gemini API uses secure HTTPS connections. Always safeguard your API keys and avoid exposing sensitive images unnecessarily.
Can this handle detection in complex images with many objects?
While Gemini 2.0 is advanced, very crowded scenes might require multiple passes or more specific prompts for best results.
10. Conclusion
By following this guide, you’ve set up an advanced image object detection workflow using Google Gemini 2.0 inside n8n. You automated tagging of rabbit objects from images, cutting down hours of manual labor and boosting accuracy.
This approach scales to many detection scenarios just by changing prompts in the HTTP node, empowering you to create responsive AI-assisted image processing pipelines.
Next, consider adding automated storage and sharing of annotated images or linking this with real-time content updates on websites or marketing platforms.
Keep experimenting with different object detection prompts and image sources to unlock new creative automation possibilities! Happy automating!