Vision-Based AI Scraper with Python, Gemini & Google Sheets

This workflow automates product data extraction using vision-based AI agents powered by Google Gemini, with ScrapingBee capturing the pages. It replaces time-consuming manual scraping by converting page screenshots into structured product data stored in Google Sheets, improving accuracy and efficiency.
Workflow Identifier: 1997
Nodes in use: manualTrigger, googleSheets, set, httpRequest, agent, lmChatGoogleGemini, toolWorkflow, outputParserStructured, splitOut, markdown, executeWorkflowTrigger, stickyNote
Automate data extraction with n8n and Google Gemini



What this workflow does

This workflow automatically collects product details from many e-commerce pages. It removes the need to copy prices and brand information by hand, along with the mistakes that manual copying introduces. AI reads page screenshots and, when needed, the page HTML to extract product names, prices, brands, and promotions. The result is clean, organized data saved in Google Sheets, ready for analysis.

This saves hours of manual work and improves the quality of competitor data.


Who should use this workflow

It suits anyone who needs to track product information across many websites, especially marketing teams, price watchers, and analysts who want fast, accurate data without copying it by hand.

You do not need deep technical skills, but you should know basic n8n workflow operation.


Tools and services used in this workflow

  • Google Sheets: Stores product URLs and saves scraped product data.
  • ScrapingBee API: Captures full-page screenshots and raw HTML of product pages.
  • Google PaLM API (Google Gemini): Analyzes screenshots with vision AI to extract product details.
  • n8n platform: Runs the automation workflow connecting services and processing data.

Inputs, processing steps, and outputs

Inputs

  • A list of product URLs from a Google Sheet named “List of URLs”.
  • API credentials for ScrapingBee and Google PaLM.

Processing Steps

  • Read product page URLs from Google Sheets.
  • Send URLs to ScrapingBee to get full-page screenshots.
  • Use Google Gemini AI model to read screenshots and extract product information.
  • If the screenshot data is incomplete, fall back to fetching the page HTML via ScrapingBee and parsing it with AI.
  • Parse AI output into structured JSON with product fields.
  • Split the JSON array into individual product items.
  • Append parsed product details into Google Sheets “Results” tab.
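The processing steps above can be sketched as one loop. This is a simplified illustration under stated assumptions, not the workflow's own logic: the node list suggests the real workflow lets the AI agent decide when to call its HTML tool (the toolWorkflow node), whereas here a fixed completeness rule stands in for that decision. The field names in `REQUIRED_FIELDS` and all function names are hypothetical, and the fetch and extraction steps are passed in as callables so the control flow can be read (and tested) on its own.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical field names; use whatever your Structured Output Parser schema defines.
REQUIRED_FIELDS = ("product_title", "product_price", "product_brand")

Product = Dict[str, Optional[str]]


def is_complete(products: List[Product]) -> bool:
    """True if at least one product was found and none lacks a required field."""
    return bool(products) and all(p.get(f) for p in products for f in REQUIRED_FIELDS)


def scrape_urls(
    urls: List[str],
    take_screenshot: Callable[[str], bytes],
    fetch_html: Callable[[str], str],
    extract_from_image: Callable[[bytes], List[Product]],
    extract_from_html: Callable[[str], List[Product]],
) -> List[Product]:
    """Screenshot-first extraction with an HTML fallback, returned as one flat list."""
    results: List[Product] = []
    for url in urls:
        products = extract_from_image(take_screenshot(url))
        if not is_complete(products):
            # Fallback: run the extraction again on the page HTML.
            products = extract_from_html(fetch_html(url))
        # "Split out": each product becomes its own item, tagged with its source URL.
        results.extend({**p, "url": url} for p in products)
    return results
```

Keeping the fetchers and extractors as parameters mirrors how the workflow separates ScrapingBee calls from the Gemini agent, and makes the fallback rule easy to swap.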

Outputs

  • Clean, structured product data, including title, price, brand, and promotions, saved in Google Sheets.
  • Data ready for analysis and reporting with minimal manual work.
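Before rows are appended, each product object has to be laid out in the same column order as the "Results" tab. A minimal sketch, assuming hypothetical column names in `RESULT_COLUMNS`; missing fields become empty cells and unknown fields are dropped, so a partially extracted product never shifts the other columns.

```python
from typing import Dict, List, Optional, Sequence

# Assumed column order of the "Results" tab; adjust to match your sheet.
RESULT_COLUMNS = ("url", "product_title", "product_price", "product_brand", "promo")


def to_rows(products: List[Dict[str, Optional[object]]],
            columns: Sequence[str] = RESULT_COLUMNS) -> List[List[str]]:
    """One row per product, one cell per column; None or absent values become ''."""
    return [["" if p.get(col) is None else str(p[col]) for col in columns]
            for p in products]


if __name__ == "__main__":
    print(to_rows([{"url": "https://shop.example/mug", "product_title": "Mug", "product_price": 12.5}]))
```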

Beginner step-by-step: How to use this workflow in n8n for production

Step 1: Download and import the workflow

  1. Download the workflow file using the Download button on this page.
  2. Open the n8n editor where you want to use this workflow.
  3. Use the Import from File feature in n8n to upload this workflow JSON file.

Step 2: Add credentials and configure nodes

  1. Add ScrapingBee API key under credentials in n8n.
  2. Add Google PaLM API credentials for the Google Gemini model.
  3. Provide Google Sheets service account credentials with access to the correct spreadsheet.
  4. Update the Google Sheets node with the correct document ID for the URLs sheet and the results tab if needed.
  5. If required, update any email addresses or channels in notifications or sub-workflows.

Step 3: Test the workflow

  1. Click ‘Test workflow’ to run the workflow once from its Manual Trigger node (When clicking ‘Test workflow’).
  2. Check the workflow logs and Google Sheets results to make sure data is fetched and saved as expected.

Step 4: Activate for production use

  1. After confirming the test run works, replace the manual trigger with a schedule trigger to run the workflow daily or weekly (n8n cannot activate a workflow whose only trigger is the manual one).
  2. Activate the workflow in n8n.
  3. Monitor execution and errors regularly to ensure consistent data flow.

For more control and reliability, consider self-hosting n8n on your own server.


Edge cases and failure handling

The workflow uses a fallback method if the AI cannot extract data from screenshots. It fetches the HTML version of the page and retries AI parsing there. This helps catch missing or unclear info.
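To show how the same extraction request can carry either a screenshot or page text, here is a sketch that builds a Gemini `generateContent` request body directly. This is a swapped-in illustration: the workflow itself talks to Gemini through n8n's Google Gemini chat model node, not through this REST call. The model name in `GEMINI_URL` and the wording of `PROMPT` are assumptions, and the body would be POSTed with the API key in an `x-goog-api-key` header. The text path could receive raw HTML or the Markdown that the workflow's Markdown node appears to produce from it.

```python
import base64
import json
from typing import Optional

# Assumed model; any Gemini model that accepts image input should fit the screenshot path.
GEMINI_URL = ("https://generativelanguage.googleapis.com/v1beta/models/"
              "gemini-1.5-flash:generateContent")

# Hypothetical prompt; field names must match your output schema.
PROMPT = ("Extract every product on this page as a JSON array of objects with "
          "product_title, product_price, product_brand and promo fields.")


def gemini_body(screenshot_png: Optional[bytes] = None,
                page_text: Optional[str] = None) -> str:
    """Request body for a screenshot, or for page text when falling back to HTML."""
    parts = [{"text": PROMPT}]
    if screenshot_png is not None:
        # Images travel inline as base64 with their MIME type.
        parts.append({"inline_data": {
            "mime_type": "image/png",
            "data": base64.b64encode(screenshot_png).decode("ascii"),
        }})
    elif page_text is not None:
        parts.append({"text": page_text})
    else:
        raise ValueError("need a screenshot or page text")
    return json.dumps({"contents": [{"parts": parts}]})
```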

If API keys are wrong or expired, the workflow will stop. Check credentials regularly.

Formatting mistakes in Google Sheets, such as mismatched column headers, can cause data to be saved in the wrong columns. Make sure the sheet columns exactly match the expected fields.
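A quick way to catch the column problem before a run is to compare the sheet's header row with the fields the workflow writes. A minimal sketch, assuming hypothetical field names in `EXPECTED_COLUMNS`; the comparison is exact apart from trimming surrounding spaces, so a header like "Product Title" is reported rather than silently matched.

```python
from typing import List, Sequence, Tuple

# Assumed field names the workflow writes; replace with your schema's fields.
EXPECTED_COLUMNS = ("url", "product_title", "product_price", "product_brand", "promo")


def header_mismatch(header_row: Sequence[str],
                    expected: Sequence[str] = EXPECTED_COLUMNS) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) column names after stripping whitespace."""
    actual = [h.strip() for h in header_row]
    missing = [c for c in expected if c not in actual]
    unexpected = [h for h in actual if h and h not in expected]
    return missing, unexpected


if __name__ == "__main__":
    print(header_mismatch(["url", "Product Title", "product_price", "product_brand", "promo"]))
```

The header row would typically be the first row of the "Results" tab; an empty result on both sides means the columns line up.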


Customization ideas

  • Change the extracted fields by updating the JSON schema in the Structured Output Parser. For example, add product ratings or stock availability.
  • Swap Google Gemini for another LangChain-compatible chat model if you want different AI behavior; for the screenshot step, the model must accept image input.
  • Set up automated triggers to run scraping regularly without manual intervention.
  • Add filtering nodes to only scrape certain domains or categories based on URL patterns.
  • Capture only relevant screenshot areas for focused AI reading and reduced API usage.
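The domain and category filtering idea comes down to a small URL check that a Filter or IF node would apply before any ScrapingBee call. A minimal sketch, assuming you filter by host name and, optionally, by a path prefix standing in for a category; the function name is hypothetical. Subdomains of an allowed domain are kept, while look-alike hosts such as `evilshop.example` are not.

```python
from typing import Iterable, List, Optional, Sequence
from urllib.parse import urlparse


def filter_urls(urls: Iterable[str], domains: Sequence[str],
                path_prefix: Optional[str] = None) -> List[str]:
    """Keep URLs on an allowed domain (or its subdomains) and, optionally, under a path prefix."""
    allowed = [d.lower() for d in domains]
    kept = []
    for url in urls:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        # Match the exact host or a true subdomain, never a mere suffix.
        on_domain = any(host == d or host.endswith("." + d) for d in allowed)
        if on_domain and (path_prefix is None or parsed.path.startswith(path_prefix)):
            kept.append(url)
    return kept
```

Filtering before the screenshot step also reduces API usage, since skipped URLs never reach ScrapingBee or Gemini.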

Summary of benefits and results

✓ Saves many hours of manual scraping work
✓ Improves the accuracy and consistency of competitor product data
✓ Produces structured, detailed product info for easy analysis
✓ Uses AI vision with an HTML fallback for more complete data collection
✓ Integrates with Google Sheets for convenient data storage and reporting

→ Data is ready sooner for pricing strategy, competitive analysis, and market insights
→ Automation reduces errors and manual effort in e-commerce data gathering



Frequently Asked Questions

The workflow sends full-page screenshots to the Google Gemini AI model via Google PaLM API. The AI reads the image to find product titles, prices, brands, and promotions and returns structured data.
If screenshot data is incomplete, the workflow fetches the page HTML from ScrapingBee and sends that text to the AI as a fallback. This helps extract missing details using HTML parsing.
Scraped product information is parsed into JSON and then appended as rows in a ‘Results’ tab in Google Sheets. This makes the data easy to access for reports or analysis.
Download the workflow file from the page, import it into n8n using ‘Import from File,’ add all required API keys and credentials, update Google Sheets IDs if necessary, run a manual test, then activate it for production use.

Promoted by BULDRR AI
