Automate Etsy Data Mining with Bright Data & Google Gemini

This workflow scrapes Etsy product data through Bright Data’s Web Unlocker and uses Google Gemini AI to extract structured product details across paginated search results. It suits sellers and analysts who need accurate, current Etsy data.
Workflow Identifier: 1923
Nodes in Use: Manual Trigger, Sticky Note, Set, HTTP Request, Google Gemini Chat Model, Split Out, Split In Batches, Information Extractor, ReadWrite File, Function, OpenAI Chat Model

What This Automation Does

This workflow takes a search URL for products on Etsy and scrapes multiple pages of results automatically using Bright Data’s Web Unlocker API.
Then it uses Google Gemini AI to pull out product info like images, names, links, brands, and prices into clean data.
Finally, it sends summary info to a webhook and saves detailed JSON files locally.
This can cut Etsy research time from many hours to minutes and avoids the copy errors that come with manual collection.


Tools and Services Used

  • n8n automation platform: For building and running the workflow.
  • Bright Data Web Unlocker API: To bypass Etsy’s scraping blocks and get raw page content.
  • Google Gemini AI (set up in n8n through the Google Gemini (PaLM) API credential): To extract structured product information from raw HTML or markdown.
  • Optional – OpenAI GPT-4 API: Alternative AI model for data extraction if preferred.


Inputs, Processing Steps, and Outputs

Inputs

  • Etsy search URL with query and pagination (e.g., “wall art for mum” sorted by newest).
  • Bright Data Web Unlocker API credentials.
  • Google Gemini API credentials.
  • Optional OpenAI API credentials for AI extraction.
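A minimal sketch of how the search URL input could be assembled. The parameter names `q`, `order`, and `page`, and the `date_desc` value for "newest", are assumptions based on how Etsy search URLs commonly look; copy a URL from your browser to confirm them before relying on this.

```python
from urllib.parse import urlencode

ETSY_SEARCH_BASE = "https://www.etsy.com/search"


def build_search_url(query: str, page: int = 1, order: str = "date_desc") -> str:
    """Return an Etsy search URL for the given query, sort order, and page.

    The parameter names (q, order, page) are assumptions, not confirmed
    by the workflow itself.
    """
    if page < 1:
        raise ValueError("page must be 1 or greater")
    params = {"q": query, "order": order, "page": page}
    # urlencode escapes spaces as "+" and keeps the dictionary order.
    return f"{ETSY_SEARCH_BASE}?{urlencode(params)}"
```

The resulting string is what you would paste into the Set node as the starting URL.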

Processing Steps

  • POST the Etsy search URL and related parameters to the Bright Data API to retrieve the raw page content.
  • Use Google Gemini AI to find all pagination links on the first page.
  • Loop through each pagination URL and scrape its content with the Bright Data API.
  • Use AI to extract structured product details from each page’s content.
  • Send summaries to a webhook for external notification.
  • Encode and save all detailed data to local disk as JSON files.
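The first step above, posting the search URL to Bright Data, might look roughly like this outside n8n. The endpoint, the body fields (`zone`, `url`, `format`), and the zone name `web_unlocker1` are assumptions; check them against your own Bright Data account before use. The sketch only builds the request so it can be inspected without network access.

```python
import json
import urllib.request

# Assumed endpoint; verify it in your Bright Data dashboard.
BRIGHT_DATA_ENDPOINT = "https://api.brightdata.com/request"


def build_unlocker_request(
    target_url: str, api_key: str, zone: str = "web_unlocker1"
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for a page's raw content."""
    # Field names and the default zone are hypothetical placeholders.
    body = {"zone": zone, "url": target_url, "format": "raw"}
    return urllib.request.Request(
        BRIGHT_DATA_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and read the response body. In n8n the same header and body values go into the HTTP Request node.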

Outputs

  • Structured JSON files of Etsy product data stored locally for offline use.
  • Real-time webhook notifications carrying product summary data.
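As an illustration of the two outputs, the sketch below writes one JSON file per scraped page and builds a small summary dictionary that could be posted to a webhook. The file naming pattern and the summary fields are hypothetical; the workflow's actual file names and webhook payload are not specified here.

```python
import json
from pathlib import Path


def save_page(products: list[dict], out_dir: Path, page: int) -> Path:
    """Write one page of product records to out_dir as a JSON file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one file per result page.
    path = out_dir / f"etsy-page-{page}.json"
    path.write_text(
        json.dumps(products, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return path


def build_summary(products: list[dict], page: int) -> dict:
    """Build a small summary payload; the fields are illustrative only."""
    return {"page": page, "product_count": len(products)}
```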


Beginner Step-by-Step: How to Use This Workflow in n8n

Importing the Workflow

  1. Download the provided workflow JSON file using the Download button on this page.
  2. Open the n8n editor where you want to run the workflow.
  3. Click “Import from File” and select the downloaded workflow file.

Configuring Credentials

  1. Open each node that requires external access, such as the HTTP Request nodes, and add or update the Bright Data API credentials with your valid API key.
  2. Enter Google Gemini API keys where the Google Gemini nodes ask for credentials.
  3. If using OpenAI alternative nodes, set the OpenAI API Key in those nodes.
  4. Update any placeholder URLs, such as the webhook URL, with your own target endpoint.

Testing and Activation

  1. Run the workflow once manually by triggering the Manual Trigger node to check for errors.
  2. Watch the output of each node to confirm data passes correctly.
  3. Fix any errors related to credentials or file paths that appear.
  4. When tests succeed, add a schedule-based trigger alongside or in place of the Manual Trigger, then activate the workflow so it runs automatically.
  5. Ensure your n8n setup has permission to write to the local disk if saving JSON files.
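One way to check the write permission from the last step before running the workflow is a short script run as the same user as the n8n process. This is a sketch, not part of the workflow; it tests the directory by actually creating a temporary file, which is more reliable than inspecting permission bits.

```python
import tempfile
from pathlib import Path


def can_write_to(directory: Path) -> bool:
    """Return True if a file can actually be created in the directory."""
    if not directory.is_dir():
        return False
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=str(directory)):
            pass
    except OSError:
        return False
    return True
```

Call it with the folder you plan to use in the ReadWrite File node, for example `can_write_to(Path("/path/to/output"))`, where the path is a placeholder for your own.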

You can set up scheduled runs using a Cron node (called Schedule Trigger in current n8n versions) if regular updates are needed.

To run this on your own server or VPS, consider self-hosting n8n.


Why This Workflow Exists

Manually copying Etsy product data from pages is slow and error-prone.
This workflow automates that task so the user spends less time collecting and more time creating or selling.
It is more dependable than a hand-built scraper because the Bright Data API handles Etsy’s blocking and the AI step cleans up the messy page content.
The user gets fresh product info faster and can make better choices about their shop.


How The Workflow Works

The first input is a search URL capturing the product keyword and page.
The workflow sends this URL to Bright Data’s API to get the page’s raw content even if Etsy tries to block the scraper.
Then, Google Gemini reads that raw content to find all pagination links on the page.
The system loops over all pagination URLs, sending each one again to Bright Data’s API for content retrieval.
For each page’s content, the AI model extracts the product information and returns it as structured data.
This structured data can then be pushed out via webhooks or saved locally as JSON files the user can open later.
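The control flow described above can be sketched as a plain loop. The three callables stand in for the Bright Data request, the Gemini pagination step, and the Gemini extraction step; their names and signatures are hypothetical. Reusing the first page's content instead of fetching it a second time is a choice made in this sketch, not something the workflow is confirmed to do.

```python
from typing import Callable, Iterable


def scrape_all_pages(
    first_url: str,
    fetch: Callable[[str], str],
    find_page_links: Callable[[str], Iterable[str]],
    extract_products: Callable[[str], list[dict]],
) -> dict[str, list[dict]]:
    """Fetch the first page, discover pagination links, and extract every page."""
    first_content = fetch(first_url)
    # dict.fromkeys drops duplicate links while preserving their order.
    page_urls = list(dict.fromkeys([first_url, *find_page_links(first_content)]))

    results: dict[str, list[dict]] = {}
    for url in page_urls:
        # Avoid a second request for the page we already have.
        content = first_content if url == first_url else fetch(url)
        results[url] = extract_products(content)
    return results
```

Passing the steps in as functions keeps the loop testable with fakes and mirrors how n8n separates the HTTP Request, Information Extractor, and Split In Batches nodes.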


Common Issues and How To Fix Them

  • HTTP Request Fails with 401 Unauthorized: Check that the Bright Data API key in the HTTP Request node is correct and has not expired.
  • AI Extraction Returns Empty Data: Make sure the full raw HTML is passed to the model and that the JSON schemas in the Information Extractor nodes match the expected output fields exactly.
  • File Write Permission Errors: Confirm the ReadWrite File node targets a writable folder and that the n8n process has write permission.
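When extraction comes back empty or incomplete, a quick check of the records against the fields you expect can narrow down the cause. The field names below are hypothetical, chosen to mirror the product details this workflow targets (image, name, link, brand, price); replace them with the keys from your own Information Extractor schema.

```python
# Hypothetical field names; use the keys from your own extraction schema.
REQUIRED_FIELDS = ("image", "name", "url", "brand", "price")


def find_schema_problems(
    items: list[dict], required: tuple[str, ...] = REQUIRED_FIELDS
) -> list[str]:
    """Return human-readable problems found in extracted product records."""
    problems: list[str] = []
    if not items:
        problems.append("no items extracted")
    for index, item in enumerate(items):
        # A field counts as missing when it is absent, None, or an empty string.
        missing = [name for name in required if item.get(name) in (None, "")]
        if missing:
            problems.append(f"item {index} is missing: {', '.join(missing)}")
    return problems
```

An empty result list usually points to truncated input, while records with the same field missing every time suggest a schema mismatch.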


Customization Ideas

  • Change the search keywords in the Set node by editing the Etsy URL to track other products.
  • Replace Google Gemini AI nodes with OpenAI GPT-4 nodes for different extraction results.
  • Save CSV or XML instead of JSON by changing the encoding step that feeds the ReadWrite File node and updating the output file name to match.
  • Update webhook URLs in the HTTP Request notification node to call any desired service or messaging app.
  • Extend pagination scraping by increasing loop limits or batch sizes to gather more pages.
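For the CSV idea above, the conversion itself is small. This sketch assumes each product is a flat dictionary and that you pass in the column names yourself; the names used in the usage note are illustrative, not the workflow's confirmed fields.

```python
import csv
import io


def products_to_csv(products: list[dict], fieldnames: list[str]) -> str:
    """Convert flat product records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # silently skip keys that are not columns
        lineterminator="\n",
    )
    writer.writeheader()
    for product in products:
        # Fill absent fields with an empty string so every row has all columns.
        writer.writerow({name: product.get(name, "") for name in fieldnames})
    return buffer.getvalue()
```

For example, `products_to_csv(records, ["name", "price", "url"])` returns text you can write to a `.csv` file. The `csv` module quotes values that contain commas, which matters for product titles.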


Summary and Results

✓ The workflow quickly collects Etsy product data from many pages automatically.
✓ It turns pages that would otherwise block scrapers into structured JSON data.
✓ Webhook notifications keep the user updated in real time.
→ Research time drops from hours to minutes.
→ Data accuracy improves over manual copying.
→ The user can better watch competitors and update their own shop.


Frequently Asked Questions

Why does the HTTP Request node return a 401 error?
HTTP 401 errors happen when the Bright Data API credentials in the HTTP Request node are missing, wrong, or expired.

Why is the AI output empty or wrong?
This happens when the full raw HTML is not passed to the model or the JSON schema in the Information Extractor nodes does not match the expected output.

Why does saving the file fail?
File write errors occur if the file path is not writable or the n8n process lacks permission; update the path and check system rights.

Can this workflow run on a private server?
Yes, the workflow can run on a private server. Self-hosting n8n helps you set up secure and stable hosting.
