Opening Problem Statement
Meet Sarah, an e-commerce analyst at a growing tech gadgets startup. Every week, she manually tracks Amazon’s best-selling electronics to monitor trends, competitor pricing, and customer ratings. This painstaking process involves copying product details, images, and offer information from multiple Amazon pages, then consolidating everything into spreadsheets for analysis. Sarah spends upwards of 5 hours weekly on this repetitive, error-prone task, often working from already-stale data, which slows her team’s response to market changes.
Sound familiar? If you rely on manual scraping or inconsistent data sources, the lost productivity and inaccurate data can directly impact your sales strategy and decision-making. Fortunately, with automation, Sarah can reclaim her time and significantly improve data accuracy.
What This Automation Does
This automation eliminates the tedious manual scraping by programmatically extracting Amazon’s Best Seller electronics data using powerful tools. Here’s exactly what happens when you run this workflow:
- Fetches live Amazon Best Seller electronics page data via a reliable web scraping API (Bright Data), bypassing manual copy-paste.
- Processes the raw HTML response and uses Google Gemini’s advanced language model to extract well-structured product information.
- Extracts key product details such as rank, title, image, star rating, total ratings, offers, and product URLs automatically.
- Transforms the extracted information into a clear, structured JSON object matching a defined schema for easy downstream use.
- Notifies your team or systems by sending the structured data automatically to any webhook or API endpoint for further actions.
By converting hours of manual compilation into minutes of automated processing, this workflow saves you valuable time and reduces human error, enabling real-time competitive analysis.
Prerequisites ⚙️
- n8n account (self-hosting supported if preferred for full control) 🔑
- Bright Data API account for web scraping (paid service) 🔐
- Google Gemini (PaLM) API credentials for AI-powered text extraction 💬
- Access to an HTTP endpoint to receive webhook notifications (e.g., webhook.site or your own server) 🔌
Step-by-Step Guide
Step 1: Manual Trigger Setup
In n8n, create a Manual Trigger node to start the workflow on demand: click Add Node → Manual Trigger. This is ideal for initial testing, since you can run the extraction anytime with a single click; for recurring runs, you can later swap it for a Schedule Trigger node.
Step 2: Set Amazon URL and Bright Data Zone
Add a Set node and configure it to assign two variables: url, set to the Amazon Best Seller electronics page link (e.g., “https://www.amazon.in/gp/bestsellers/electronics/1389432031?product=unlocker&method=api”), and zone, set to your Bright Data scraping zone (e.g., “web_unlocker1”). This prepares the inputs for the HTTP Request node.
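In the node’s JSON view, the two assignments look roughly like this (the URL and zone values are the examples from above; substitute your own):

```json
{
  "url": "https://www.amazon.in/gp/bestsellers/electronics/1389432031?product=unlocker&method=api",
  "zone": "web_unlocker1"
}
```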
Step 3: Fetch Amazon Best Seller Products with Bright Data
Add an HTTP Request node. Configure it as follows:
- Method: POST
- URL: https://api.brightdata.com/request
- Body Parameters: zone, url, and format = raw, sent as JSON
- Authentication: Header Auth, with your Bright Data API key in the headers
This node sends the scrape request and receives raw HTML content of the Amazon Best Seller page.
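The request body the node sends looks like this (zone and url would normally be pulled from the Set node via expressions; shown here with the literal example values):

```json
{
  "zone": "web_unlocker1",
  "url": "https://www.amazon.in/gp/bestsellers/electronics/1389432031?product=unlocker&method=api",
  "format": "raw"
}
```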
Step 4: Extract Structured Data Using Google Gemini Chat Model
Add the Google Gemini Chat Model node from n8n’s LangChain nodes collection. Configure it to use the “models/gemini-2.0-flash-exp” model with your Google PaLM API credentials. This node acts as the language model that processes the raw HTML returned by Bright Data; attach it to the model input of the Information Extractor node you’ll add in the next step.
Step 5: Use LangChain Information Extractor Node for Schema-based Extraction
Connect the output of the HTTP Request node to the Information Extractor node (LangChain extension). Here, paste the custom JSON schema that defines the Amazon Best Seller page data structure:
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Amazon Bestsellers - Smartphones & Basic Mobiles",
  "type": "object",
  "properties": {
    "category": {"type": "string"},
    "description": {"type": "string"},
    "page": {"type": "string"},
    "bestsellers": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "rank": {"type": "integer"},
          "title": {"type": "string"},
          "image": {"type": "string", "format": "uri"},
          "rating": {
            "type": "object",
            "properties": {
              "stars": {"type": "number"},
              "total_ratings": {"type": "integer"}
            },
            "required": ["stars", "total_ratings"]
          },
          "offer": {"type": "string"},
          "product_url": {"type": "string", "format": "uri"}
        },
        "required": ["rank", "title", "image", "rating", "offer", "product_url"]
      }
    }
  },
  "required": ["category", "description", "page", "bestsellers"]
}
```

This turns unstructured HTML into clean JSON with all product attributes automatically extracted.
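For reference, a successful extraction produces output shaped like this (values are illustrative, not real listing data):

```json
{
  "category": "Electronics",
  "description": "Our most popular products based on sales, updated hourly.",
  "page": "1",
  "bestsellers": [
    {
      "rank": 1,
      "title": "Example Wireless Earbuds with Noise Cancellation",
      "image": "https://m.media-amazon.com/images/I/example.jpg",
      "rating": {
        "stars": 4.3,
        "total_ratings": 412345
      },
      "offer": "₹1,299",
      "product_url": "https://www.amazon.in/dp/EXAMPLE123"
    }
  ]
}
```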
Step 6: Send Extracted Data to Webhook for Notifications
Add an HTTP Request node configured to POST your JSON summary to any webhook or API endpoint. In the example, a webhook.site URL is used. This enables downstream systems or team notifications whenever new data is extracted.
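A minimal sketch of this node’s parameters in JSON view, assuming the Information Extractor returns its result under an output key (field names follow recent n8n HTTP Request node versions; adjust both to your setup):

```json
{
  "method": "POST",
  "url": "https://webhook.site/your-unique-id",
  "sendBody": true,
  "specifyBody": "json",
  "jsonBody": "={{ JSON.stringify($json.output) }}"
}
```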
Step 7: Connect All Nodes Sequentially
Ensure the workflow connects as follows:
- Manual Trigger → Set node for Amazon URL & zone
- Set node → HTTP Request to Bright Data
- HTTP Request → Information Extractor
- Information Extractor → Webhook Notifier HTTP Request
The final output is the structured, analysis-ready data delivered to your webhook endpoint.
Customizations ✏️
- Update Amazon URL and Zone: Change values in the Set node to target any Amazon category or product list by replacing the URL. Adjust the Bright Data zone to match your scraping subscription.
- Enhance Extraction Schema: Modify the JSON schema in the Information Extractor node to capture additional attributes like product brand, price history, or shipping info for deeper analysis (see the sketch after this list).
- Change Notification Endpoint: Replace webhook.site URL with your own webhook or integration endpoint (Slack, Teams, or database API) to automate alerts or data storage.
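For instance, to also capture a brand and price per product (hypothetical fields; the model can only fill them if they actually appear on the scraped page), add these entries inside the items’ properties object, and to the required array only if every listing includes them:

```json
"brand": {"type": "string"},
"price": {"type": "string"}
```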
Troubleshooting 🔧
Problem: “HTTP Request to Bright Data fails with 403 Forbidden.”
Cause: Invalid API key or incorrect zone parameter.
Solution: Verify your Bright Data API token and confirm the zone value in the Set node matches an active zone in your Bright Data dashboard.
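If you’re unsure the credential is wired correctly, note that Bright Data’s API expects a Bearer token, so the Header Auth credential should resolve to a header like this (the token value is a placeholder):

```json
{
  "Authorization": "Bearer YOUR_BRIGHT_DATA_API_TOKEN"
}
```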
Problem: “Information Extractor returns empty or partial data.”
Cause: Schema misalignment or raw data format changed.
Solution: Review and update your JSON schema in the Information Extractor node to match the latest Amazon page structure.
Pre-Production Checklist ✅
- Verify Bright Data API credentials and subscription status.
- Confirm Google Gemini API keys are active and valid.
- Update and test the Amazon URL for current bestseller listings.
- Test webhook endpoint for data reception.
- Run manual trigger to ensure end-to-end data extraction and delivery.
Deployment Guide
Activate your workflow in n8n by toggling it from inactive to active. Depending on your use case, run it manually or replace the Manual Trigger with a Schedule Trigger for recurring runs. Monitor executions via n8n’s UI, reviewing run logs for errors, and tune the delivery frequency to keep competitive data fresh.
FAQs
Q: Can I use another web scraping API instead of Bright Data?
A: Yes, but you must adjust the HTTP Request node parameters and authentication accordingly.
Q: Does this workflow consume many API credits?
A: Frequency and volume of requests to Bright Data and Google Gemini affect cost. Optimize runs to control spending.
Q: Is data secure with this workflow?
A: Credentials are stored securely in n8n. Use HTTPS endpoints for webhook delivery to maintain security.
Q: Can the workflow handle multiple Amazon categories?
A: Yes, by updating the URL in the Set node, you can target different product categories or pages.
Conclusion
By following this guide, you transformed manual, error-prone Amazon Best Seller data collection into a streamlined, automated process using Bright Data and Google Gemini in n8n. You gained precise, structured product insights with just a click, saving hours of labor weekly and enabling quicker strategy pivots based on up-to-date marketplace intelligence.
Next steps? Consider integrating this data into Google Sheets for collaborative tracking, adding email alerts for team updates, or expanding scraping to other e-commerce sites using the same principles.
Happy automating!