Automated Web Scraping with n8n and OpenRouter GPT-4

This n8n workflow scrapes product data from URLs listed in Google Sheets using the BrightData API, then uses OpenRouter’s GPT-4.1 to extract structured product details from the cleaned HTML. It replaces manual copying and reduces the errors that come with it.
Workflow Identifier: 1586
Nodes in Use: Manual Trigger, Google Sheets, Split In Batches, HTTP Request, Code, OpenRouter Chat Model, Chain LLM, Structured Output Parser, Split Out

What This Automation Does

This workflow collects product information from a list of URLs stored in Google Sheets.
It replaces manually copying product details from many competitor websites.
The workflow scrapes each page, cleans the HTML, extracts product data with GPT-4.1, and writes the results back to Google Sheets.
You save time, avoid missed fields, and get fresh, structured data.

The inputs are URLs from a sheet.
Processing consists of scraping with the BrightData API, stripping unwanted HTML, and running a language model to extract the product name, description, rating, review count, and price.
The outputs are rows appended to a results sheet with clean product data.


Tools and Services Used

  • Google Sheets: Stores URLs and results.
  • BrightData Web Scraping API: Retrieves the raw HTML from product pages.
  • OpenRouter GPT-4.1 Model: Processes cleaned HTML to extract product data.
  • n8n Automation Platform: Runs workflow nodes and manages data flow.


Beginner Step-by-Step: How to Use This Workflow in n8n

Importing and Setup

  1. Download the workflow file from the Download button on this page.
  2. Open your n8n editor and log in.
  3. Click “Import from File” and select the downloaded workflow file.
  4. Once imported, add your Google Sheets OAuth2 credentials in the Google Sheets nodes.
  5. Enter your BrightData API Key in the “scrap url” HTTP Request node headers.
  6. Check and update the Google Sheets document ID and sheet names if your sheet names or IDs differ.
  7. Optionally, review the JavaScript in the “clean html” Code node. Keep the provided snippet exactly as it is unless you need different cleaning rules.
  8. Verify the OpenRouter Chat Model node is set to use GPT-4.1 and your OpenRouter API Key is active.
  9. Test the workflow by running it from the Manual Trigger node and inspecting each node's output step by step.
  10. A Manual Trigger only fires when you start it in the editor. To run the workflow unattended, add a Schedule Trigger or another trigger node.
  11. Once a trigger is in place and the tests pass, activate the workflow with the toggle at the top right.

Tips for Easy Configuration

  1. Use environment variables for all API Keys and tokens to keep credentials safe.
  2. Keep your Google Sheets tidy and avoid empty rows in the URLs sheet.
  3. Check the execution log after each run to catch errors early.
  4. To run on your own server, consider self-hosting n8n.
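
As an illustration of tip 1, here is a minimal Python sketch of reading keys from the environment, for example in a helper script that checks credentials outside n8n. The variable names BRIGHTDATA_API_KEY and OPENROUTER_API_KEY are assumptions made for this example; the workflow itself does not define them.

```python
import os

# Hypothetical variable names; use whatever names your deployment defines.
REQUIRED_KEYS = ("BRIGHTDATA_API_KEY", "OPENROUTER_API_KEY")


def load_api_keys(env=None):
    """Return the required keys, failing fast if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

Failing fast with a named variable makes a missing key obvious before any scraping request is sent.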


Inputs, Processing Steps, Outputs

Inputs

  • A list of product URLs stored in a Google Sheet.

Processing Steps

  1. Read URLs: The workflow reads URLs from the input sheet using the Google Sheets node.
  2. Batch URLs: The Split In Batches node feeds the URLs into the loop one batch at a time.
  3. Scrape HTML: The “scrap url” HTTP Request node sends each URL to the BrightData API and receives the raw HTML.
  4. Clean HTML: A Code node runs JavaScript that removes scripts, styles, comments, head tags, and class attributes.
  5. Extract Data: The cleaned HTML is passed to the OpenRouter GPT-4.1 model. The OpenRouter Chat Model, Chain LLM, and Structured Output Parser nodes work together to return product data as JSON that conforms to a fixed schema.
  6. Split Data: The extracted product objects are split into individual records for sheet insertion.
  7. Append to Sheet: Each product entry is appended to the results sheet in Google Sheets.
  8. Loop: The workflow loops back to process every batch until all URLs are done.
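
The cleaning in step 4 can be sketched as follows. This is a Python illustration of the idea only, not the JavaScript that ships in the “clean html” node, and it assumes that "head tags" means the whole head element and that "classes" means class attributes. Regex-based cleaning is rough, but it is usually enough to shrink a page before sending it to a language model.

```python
import re


def clean_html(html: str) -> str:
    """Strip markup that carries no product information."""
    # Remove <head>, <script>, and <style> elements together with their contents.
    for tag in ("head", "script", "style"):
        html = re.sub(
            rf"<{tag}\b[^>]*>.*?</{tag}\s*>",
            "",
            html,
            flags=re.DOTALL | re.IGNORECASE,
        )
    # Remove HTML comments.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Remove quoted class attributes; other attributes are left alone.
    html = re.sub(r"""\sclass\s*=\s*(?:"[^"]*"|'[^']*')""", "", html, flags=re.IGNORECASE)
    return html
```

Removing this markup reduces the number of tokens sent to the model, which lowers cost and leaves less noise around the product fields.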

Outputs

  • Structured rows in a Google Sheet containing product name, description, rating, reviews count, and price.
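
One way to picture a single output row is as a typed record. The sketch below is in Python, and the key names (name, description, rating, reviews, price) are assumptions chosen for illustration; the actual names come from the JSON schema configured in the Structured Output Parser node.

```python
from dataclasses import dataclass, astuple
from typing import Optional


@dataclass
class Product:
    name: str
    description: str
    rating: Optional[float]
    reviews: Optional[int]
    price: str  # kept as text so currency symbols survive


def to_product(raw: dict) -> Product:
    """Coerce one parsed model-output object into a typed record."""
    if not raw.get("name"):
        raise ValueError("product name is required")
    rating = raw.get("rating")
    reviews = raw.get("reviews")
    return Product(
        name=str(raw["name"]).strip(),
        description=str(raw.get("description") or "").strip(),
        rating=float(rating) if rating not in (None, "") else None,
        reviews=int(reviews) if reviews not in (None, "") else None,
        price=str(raw.get("price") or "").strip(),
    )


def to_row(product: Product) -> tuple:
    """Flatten the record into one sheet row, in field order."""
    return astuple(product)
```

Rejecting records without a name before they reach the sheet keeps half-empty rows out of the results.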


Edge Cases and Troubleshooting

401 Unauthorized on HTTP Request Node

The “scrap url” node fails with this error if the BrightData API Key is wrong or expired.
Fix it by updating the API Key in the node headers.
If possible, test the key outside n8n with a separate API client.
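
A key check outside n8n could look like the Python sketch below. Everything specific to the provider is an assumption here: the endpoint is passed in as a parameter, and the Bearer header and the zone/url body fields should be copied from what the “scrap url” node actually sends rather than taken from this example.

```python
import json
import urllib.error
import urllib.request


def build_scrape_request(endpoint, api_key, zone, target_url):
    """Build (but do not send) a POST request carrying the key in a header."""
    # Assumed body shape; mirror the body configured in the "scrap url" node.
    body = json.dumps({"zone": zone, "url": target_url}).encode("utf-8")
    headers = {
        # Assumed Bearer scheme; copy the exact header from the node.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def key_is_accepted(request, timeout=30):
    """Send the request; return False only when the server answers 401."""
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return False
        raise  # other HTTP errors are not about the key
```

If key_is_accepted returns False with the same key that n8n uses, the problem is the key itself and not the workflow configuration.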

Malformed or Empty JSON from OpenRouter GPT-4

If the extracted data is empty or malformed, first check the output of the “clean html” node.
Review the prompt and JSON schema in the Language Model nodes for errors.
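
When debugging, it can help to paste the raw model reply into a small script and see whether it parses at all. The Python sketch below is a hypothetical helper, not part of the workflow. It assumes the most common failure, a reply wrapped in a Markdown code fence, and returns None for anything that is not valid JSON.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code-fence marker


def parse_model_json(reply: str):
    """Return the parsed JSON value from a model reply, or None if it cannot be parsed."""
    text = reply.strip()
    # Unwrap a surrounding code fence, with or without a "json" language tag.
    fenced = re.match(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

A None result for a reply that looks reasonable usually points at the prompt or schema, while an empty list or object points at the cleaned HTML missing the product content.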

Google Sheets Append Errors

Appends can fail if field mappings are wrong or the OAuth token has expired.
Check the column mappings carefully and re-authenticate the Google Sheets credentials.


Customization Ideas

  • Change the BrightData “zone” parameter to try other proxy zones for better success on tough sites.
  • Adjust the batch size in the Split In Batches node to balance speed and API limits.
  • Add more product attributes in the GPT-4 prompt, like availability or shipping info.
  • Swap OpenRouter GPT-4 for other language models like OpenAI GPT-4 or Anthropic Claude nodes.
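
To see what the batch size changes, here is a conceptual Python sketch of batching. It is not n8n's implementation of Split In Batches; the function name and the default of 1 are assumptions for illustration.

```python
from typing import Iterator, List, Sequence


def split_in_batches(items: Sequence[str], batch_size: int = 1) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

A larger batch means fewer loop iterations but more scraping requests issued close together, so the right size depends on the rate limits of the scraping API.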


Summary

✓ Saves hours weekly by automating product data collection.
✓ Reduces errors by standardizing data extraction.
✓ Feeds fresh and structured product data directly into Google Sheets.
✓ Scales to handle large lists with batching and loops.
✓ Uses familiar tools like Google Sheets and easy setup in n8n.


Frequently Asked Questions

  • Using another scraping API: change the HTTP Request node URL and parameters to the other service and update the authentication accordingly.
  • Credit usage: this depends on page size and the number of URLs; batching helps keep it manageable.
  • Data security: data passes only through authorized APIs and your Google Sheets account, so protect both with sheet permissions and secret API Keys.
  • Large URL lists: batching and looping let the workflow process large lists while staying within API limits.
