Opening Problem Statement
Meet Sarah, an Etsy shop owner specializing in unique wall art designs. Sarah wants to keep up with market trends by mining competitor product data on Etsy. However, manually browsing pages, copying product details, and analyzing offerings costs her at least 5-6 hours weekly — time she could better spend creating new art. Worse yet, manual collection often misses pages or yields inconsistent data, undermining her strategic decisions.
Sarah needs a solution to automate this tedious and error-prone Etsy data mining task. She wants real-time, structured insights with minimal manual effort — a smart workflow that extracts paginated product data accurately across search queries and summarizes the key product info swiftly.
What This Automation Does
This workflow automates Etsy product data extraction through a powerful combination of Bright Data’s Web Unlocker API and Google Gemini’s large language model for intelligent data parsing. When triggered, it:
- Sets a specific Etsy search query URL to target products (e.g., “wall art for mum” sorted by newest).
- Uses Bright Data’s API to bypass restrictions and scrape Etsy’s paginated search results reliably.
- Extracts and parses multiple pages of product listings by looping through detected pagination URLs.
- Invokes Google Gemini’s AI to intelligently extract structured product info like images, names, URLs, brands, and pricing from the raw scraped HTML or markdown data.
- Triggers webhook notifications with summarized data or saves detailed JSON files of the scraped content locally.
- Allows flexibility with AI models—optionally using OpenAI GPT-4 for alternative extraction logic.
By automating these steps, Sarah cuts down her Etsy market research time from hours to minutes, improving her responsiveness and competitive edge.
Prerequisites ⚙️
- n8n automation platform account (cloud or self-hosted) 🔌
- Bright Data Web Unlocker API access and valid credentials 🔑
- Google Gemini API credentials via Google PaLM API 🔑
- Optional: OpenAI API credentials if choosing GPT-4 AI extraction 🔑
- Basic knowledge of n8n workflows and credentials management ⏱️
Step-by-Step Guide
1. Starting the Workflow with Manual Trigger
Navigate to your n8n editor and add the Manual Trigger node. This node allows you to run the workflow manually for testing or scheduled runs.
Expected: When you click “Execute”, the workflow begins processing.
Common mistake: Forgetting to connect this trigger to the next node will prevent the workflow from running.
2. Define the Etsy Search Query URL
Add a Set node named “Set Etsy Search Query.” Here, assign two string fields: `url` and `zone`.
Example: `https://www.etsy.com/search?q=wall+art+for+mum&order=date_desc&page=1&ref=pagination` for `url` and `web_unlocker1` for `zone`.
Expected: This defines the target search result URL and Bright Data zone for scraping.
Common mistake: Not encoding spaces as plus signs (+) in search terms may break the URL.
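To make the encoding rule concrete, here is a small sketch of how the node's `url` value can be assembled. The `buildEtsySearchUrl` helper is hypothetical, not part of the workflow; it simply mirrors the URL format shown in the example above.

```javascript
// Hypothetical helper mirroring the "Set Etsy Search Query" node's output.
// Spaces in the search term become plus signs, as Etsy's search URL expects.
function buildEtsySearchUrl(query, page = 1) {
  const q = query.trim().split(/\s+/).join('+');
  return `https://www.etsy.com/search?q=${q}&order=date_desc&page=${page}&ref=pagination`;
}
```

For example, `buildEtsySearchUrl('wall art for mum')` produces the exact URL used in the step above, with `page=1` by default.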
3. Scrape Etsy Search Results via Bright Data API
Add the HTTP Request node named “Perform Etsy Web Request.” Configure it to POST to https://api.brightdata.com/request using header authentication credentials.
Set body parameters:
- `zone` — from the previous node's `zone` field
- `url` — with `?product=unlocker&method=api` appended
- `format` = `raw`
- `data_format` = `markdown`
Expected: This requests HTML content bypassing web scraping blocks.
Common mistake: Missing or invalid authentication headers will cause request failures.
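As a sketch, the POST body assembled in this step can be modeled like this. The `buildBrightDataBody` helper is illustrative only; field names follow the node configuration described above, and authentication headers are handled separately by the node's credentials.

```javascript
// Illustrative sketch of the Bright Data request body from step 3.
// Field names follow the node configuration described in this step.
function buildBrightDataBody(zone, targetUrl) {
  return {
    zone, // e.g. 'web_unlocker1', taken from the previous node
    // Appended as described in the step; if your URL already has a query
    // string, switch the leading '?' to '&'.
    url: `${targetUrl}?product=unlocker&method=api`,
    format: 'raw',
    data_format: 'markdown',
  };
}
```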
4. Extract Pagination Links Using Google Gemini AI
Add the Google Gemini Chat Model node with model models/gemini-2.0-flash-exp to process the raw HTML data.
Use an Information Extractor node named “Extract Paginated Resultset” with a JSON schema to parse pagination URLs and page numbers.
Expected: Detect multiple pagination links for following pages.
Common mistake: Incorrect JSON schema mapping may prevent pagination extraction.
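A JSON schema for this extractor might look like the sketch below. The property names (`pagination`, `page`, `url`) are assumptions for illustration; match them to whatever your extraction prompt asks the model to return.

```javascript
// Illustrative JSON schema for the "Extract Paginated Resultset" node.
// Property names here are assumptions; align them with your prompt.
const paginationSchema = {
  type: 'object',
  properties: {
    pagination: {
      type: 'array',
      description: 'Pagination links detected in the scraped page',
      items: {
        type: 'object',
        properties: {
          page: { type: 'number', description: 'Page number' },
          url: { type: 'string', description: 'Absolute URL of that results page' },
        },
        required: ['url'],
      },
    },
  },
};
```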
5. Split Pagination URLs for Looping
Use the Split Out node to separate each pagination URL for iterative processing.
Expected: Each item contains one page URL for sequential scraping.
Common mistake: Splitting on wrong fields causes empty or malformed batches.
6. Loop Over Each Pagination URL
Add a Split In Batches node named “Loop Over Items” to control looping through pagination URLs.
Expected: Each batch triggers a new HTTP request to scrape that page.
Common mistake: Without proper batch size settings, nodes may overload or time out.
7. Re-scrape Each Page in Loop using Bright Data API
Add another HTTP Request node “Perform Etsy web request over the loop” similar to Step 3 but dynamically using the pagination URLs.
Expected: Scraper visits each page URL with web unlocker protection.
Common mistake: Not forwarding the dynamic pagination URL correctly in the body parameter.
8. Extract Product Listing Info via AI
Use the Information Extractor node “Extract Item List with the Product Info” to parse product details including image, name, URL, brand, and pricing from the scraped page content.
Expected: Get structured JSON of product listings per page.
Common mistake: Schema mismatch or improper text input can cause extraction failures.
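The product-level schema can be sketched similarly. Property names mirror the fields listed in this step (image, name, URL, brand, pricing) but are assumptions; adapt them to your actual extractor configuration.

```javascript
// Illustrative schema for the "Extract Item List with the Product Info" node.
// Field names mirror the step above; treat the exact names as assumptions.
const productListSchema = {
  type: 'object',
  properties: {
    products: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          image: { type: 'string', description: 'Product image URL' },
          name: { type: 'string', description: 'Product title' },
          url: { type: 'string', description: 'Product listing URL' },
          brand: { type: 'string', description: 'Shop or brand name' },
          price: { type: 'string', description: 'Displayed price, e.g. "$24.99"' },
        },
        required: ['name', 'url'],
      },
    },
  },
};
```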
9. Notify Webhook with Extracted Data
Configure the HTTP Request node “Initiate a Webhook Notification for the extracted data” to POST JSON summaries to a webhook URL like https://webhook.site/....
Expected: External systems receive timely notifications.
Common mistake: Forgetting to set the webhook URL or incorrect POST payload formatting.
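One possible shape for the summary payload is sketched below. The `buildWebhookPayload` helper and its field names are hypothetical; the workflow only requires that you POST valid JSON to your configured endpoint.

```javascript
// Hypothetical shape of the summary JSON POSTed to the webhook.
function buildWebhookPayload(pageUrl, products) {
  return {
    source: pageUrl,              // the Etsy page the products came from
    count: products.length,       // quick sanity check for the receiver
    products,                     // structured items from the AI extractor
    scrapedAt: new Date().toISOString(),
  };
}
```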
10. Save Scraped Data to Local Disk
Use a Function node “Create a binary data” to encode JSON data in base64 for file writing.
Follow with a ReadWrite File node “Write the scraped content to disk” specifying dynamic file names by page number (e.g., `d:\Esty-Scraped-Content-1.json`).
Expected: JSON files are saved locally for offline review.
Common mistake: Incorrect file path permissions can cause write errors.
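The base64 step in the Function node can be sketched as follows (n8n Function nodes run JavaScript, so `Buffer` is available; the actual node code in your workflow may differ in item handling):

```javascript
// Sketch of the "Create a binary data" step: serialize the extracted JSON
// and base64-encode it so the ReadWrite File node can write it to disk.
function toBinaryItem(json) {
  const text = JSON.stringify(json, null, 2);
  return {
    json,
    binary: {
      data: {
        data: Buffer.from(text, 'utf8').toString('base64'),
        mimeType: 'application/json',
      },
    },
  };
}
```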
11. Optional AI Model Swap
You can replace Google Gemini AI nodes with OpenAI Chat Model nodes for extraction flexibility, requiring OpenAI API credentials.
Expected: Seamless substitution if preferred or necessary for your use case.
Customizations ✏️
- Change Etsy search terms: Modify the `url` in the “Set Etsy Search Query” node to target different product categories or keywords.
- Switch AI extraction models: Toggle between Google Gemini and OpenAI nodes to experiment with data extraction accuracy or cost.
- Save output format: Adjust the `ReadWrite File` node to save in CSV or XML instead of JSON by changing the data transformation step.
- Webhook URL customization: Update the “Initiate a Webhook Notification” node to notify your preferred endpoint or integrate with messaging apps.
- Paginate deeper: Adjust loop batch size or alter extraction nodes to scrape more pages for more extensive data coverage.
Troubleshooting 🔧
Problem: HTTP Request Fails with 401 Unauthorized
Cause: Incorrect or expired Bright Data API credentials.
Solution: Go to the “Perform Etsy Web Request” node → Credentials tab → Re-enter valid Header Auth credentials.
Problem: AI Extraction Node Returns Empty or Malformed Data
Cause: Input text formatting or schema mismatch for the Information Extractor node.
Solution: Verify raw HTML content is correctly passed. Confirm JSON schema correctness under “Extract Paginated Resultset” and “Extract Item List with the Product Info” nodes.
Problem: File Write Fails with Permission Error
Cause: Insufficient file system permissions or invalid file path.
Solution: Check and update the “Write the scraped content to disk” node’s file path. Ensure n8n has write permissions for the target folder.
Pre-Production Checklist ✅
- Verify Bright Data header auth credentials are active and accurate.
- Confirm Google Gemini API credentials are set and authorized.
- Test the manual trigger and each connected node output sequentially.
- Validate that pagination links are correctly extracted and looped over.
- Confirm webhook URL is correctly set and accessible.
- Ensure file path in the write node exists and is writable.
Deployment Guide
Activate the workflow by enabling it in n8n. To run scraping automatically at intervals, replace the Manual Trigger with a Schedule (Cron) Trigger node.
Monitor workflow executions from the n8n dashboard to catch any errors early.
Optionally integrate webhook receivers or local storage for ongoing data analysis.
FAQs
Can I use other proxies or scraping services instead of Bright Data?
Yes, but you’ll need to configure their API endpoints accordingly in the HTTP Request nodes.
Does this workflow consume many API credits?
Bright Data and Google Gemini usage depends on your subscription plans; monitor usage to avoid overages.
Is my Etsy data safe during scraping?
The workflow uses authenticated, reputable APIs, but compliance is your responsibility: review Etsy’s terms of service and relevant data policies regularly before scraping.
Can I scale this to thousands of pages?
Yes, but consider API limits and workflow execution times in n8n. Use batch controls wisely.
Conclusion
By building this Etsy data mining automation with Bright Data’s Web Unlocker API and Google Gemini AI, you’ve turned a time-consuming manual task into a fast, reliable process. You now gain structured insights from paginated Etsy search results with minimal effort, improving your market intelligence and helping you steer your Etsy shop strategy confidently.
You’ve saved hours weekly and reduced error risks in data collection. Next steps could include integrating price tracking alerts, competitive sentiment analysis, or auto-updating your product databases.
With n8n and these AI-enhanced scraping nodes, your Etsy research just got smarter and faster.