Automated Web Scraping with n8n and OpenRouter GPT-4

This n8n workflow scrapes product data from URLs listed in Google Sheets using the BrightData API, cleans the returned HTML, and extracts structured product details with GPT-4.1 through OpenRouter. It replaces manual copying and the errors that come with it.
Workflow Identifier: 1586
Nodes in Use: Manual Trigger, Google Sheets, Split In Batches, HTTP Request, Code, OpenRouter Chat Model, Chain LLM, Structured Output Parser, Split Out

What This Automation Does

This workflow gets product info from a list of URLs in Google Sheets.
It solves the problem of manually copying product details from many competitor websites.
The workflow scrapes web pages, cleans the HTML, extracts product data using GPT-4, and puts the data back into Google Sheets.
You save time, avoid missing info, and get fresh, structured data to use.

The inputs are URLs from a sheet.
Processing includes scraping with the BrightData API, removing unwanted HTML, and running a language model to extract the product name, description, rating, review count, and price.
Outputs are rows added into a result sheet with clean product data.


Tools and Services Used

  • Google Sheets: Stores URLs and results.
  • BrightData Web Scraping API: Retrieves the raw HTML from product pages.
  • OpenRouter GPT-4.1 Model: Processes cleaned HTML to extract product data.
  • n8n Automation Platform: Runs workflow nodes and manages data flow.


Beginner Step-by-Step: How to Use This Workflow in n8n

Importing and Setup

  1. Download the workflow file from the Download button on this page.
  2. Open the n8n editor and make sure you are logged in.
  3. Click “Import from File” and select the downloaded workflow file.
  4. Once imported, add your Google Sheets OAuth2 credentials in the Google Sheets nodes.
  5. Enter your BrightData API Key in the “scrap url” HTTP Request node headers.
  6. Check and update the Google Sheets document ID and sheet names if your sheet names or IDs differ.
  7. Optionally, review the JavaScript in the “clean html” node; the snippet shipped with the workflow can be used as-is.
  8. Verify the OpenRouter Chat Model node is set to use GPT-4.1 and your OpenRouter API Key is active.
  9. Test the workflow by running it from the Manual Trigger node and inspecting each node's output step by step.
  10. After tests pass, add a Schedule Trigger or another trigger node if the workflow should run unattended; a Manual Trigger alone only runs when you start it.
  11. Once a trigger is in place, activate the workflow with the toggle at the top right.

Tips for Easy Configuration

  1. Use environment variables for all API Keys and tokens to keep credentials safe.
  2. Keep your Google Sheets tidy and avoid empty rows in the URLs sheet.
  3. Check the execution log on each run to catch errors early.
  4. To run on your own server, consider self-hosting n8n.
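
The first tip can be sketched as a small guard that fails fast when a secret is missing. This is a minimal sketch, not part of the workflow itself: `requireEnv` is a hypothetical helper and `BRIGHTDATA_API_KEY` is an illustrative variable name.

```javascript
// Hypothetical helper: read a required secret from an environment map.
// Defaults to process.env, but accepts any plain object for testing.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value.trim();
}

// Example with an explicit map so the sketch runs without real secrets.
// BRIGHTDATA_API_KEY is an assumed name, not one defined by the workflow.
const demoEnv = { BRIGHTDATA_API_KEY: " demo-key " };
console.log(requireEnv("BRIGHTDATA_API_KEY", demoEnv)); // prints "demo-key"
```

Failing at startup with a named variable is usually easier to diagnose than a 401 several nodes later.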


Inputs, Processing Steps, Outputs

Inputs

  • A list of product URLs stored in a Google Sheet.

Processing Steps

  1. Read URLs: The workflow reads URLs from the input sheet using the Google Sheets node.
  2. Batch URLs: The Split In Batches node feeds the URLs into the loop one batch at a time.
  3. Scrape HTML: The scrap url HTTP Request node sends each URL to BrightData API to get raw HTML.
  4. Clean HTML: A Code node runs JavaScript to remove scripts, styles, comments, the head section, and class attributes.
  5. Extract Data: The cleaned HTML is passed to the GPT-4.1 model through the OpenRouter Chat Model node. The Chain LLM and Structured Output Parser nodes constrain the response to strict JSON product data.
  6. Split Data: The extracted product objects are split into individual records for sheet insertion.
  7. Append to Sheet: Each product entry is appended to the results sheet in Google Sheets.
  8. Loop: The workflow loops back to process every batch until all URLs are done.
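
The Clean HTML step above might look roughly like the following. This is a regex-based approximation written for illustration, not the snippet shipped in the “clean html” node; `cleanHtml` is a hypothetical name, and a real page may need a proper HTML parser.

```javascript
// Hypothetical helper approximating the "clean html" step with regexes.
function cleanHtml(html) {
  return html
    .replace(/<!--[\s\S]*?-->/g, "")                          // HTML comments
    .replace(/<head\b[^>]*>[\s\S]*?<\/head>/gi, "")           // whole <head> section
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "")       // inline and external scripts
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, "")         // style blocks
    .replace(/\sclass\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi, "") // class attributes
    .replace(/\s+/g, " ")                                     // collapse whitespace
    .trim();
}

const sample =
  '<html><head><title>t</title></head><body><!-- note -->' +
  '<script>var a = 1;</script><style>.x{color:red}</style>' +
  '<div class="price box">$19.99</div></body></html>';

const cleaned = cleanHtml(sample);
console.log(cleaned); // prints <html><body><div>$19.99</div></body></html>
```

Stripping this markup before the model call shrinks the prompt, which tends to lower token cost and leaves less noise for the extraction step.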

Outputs

  • Structured rows in a Google Sheet containing product name, description, rating, reviews count, and price.


Edge Cases and Troubleshooting

401 Unauthorized on HTTP Request Node

The scrap url node fails if the BrightData API Key is wrong or expired.
Fix this by updating the API Key in the node headers.
If possible, test the key with a separate API client to confirm it is valid.
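
To test the key outside n8n, the request can be rebuilt by hand. The sketch below only constructs the request; the endpoint, the `Bearer` scheme, and the `zone`/`url`/`format` body fields are assumptions about what the “scrap url” node sends, so confirm them against the node's settings and your BrightData account before relying on them.

```javascript
// Assumed endpoint; check the URL configured in the "scrap url" node.
const BRIGHTDATA_ENDPOINT = "https://api.brightdata.com/request";

// Hypothetical helper: build the request without sending it.
function buildScrapeRequest(apiKey, zone, targetUrl) {
  if (typeof apiKey !== "string" || apiKey.trim() === "") {
    throw new Error("BrightData API key is empty");
  }
  return {
    url: BRIGHTDATA_ENDPOINT,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey.trim()}`, // assumed auth scheme
      "Content-Type": "application/json",
    },
    // Assumed body fields: proxy zone, page to fetch, raw HTML response.
    body: JSON.stringify({ zone, url: targetUrl, format: "raw" }),
  };
}

const req = buildScrapeRequest(
  "YOUR_API_KEY",
  "your_zone_name",
  "https://example.com/product/123"
);
console.log(req.headers.Authorization); // prints "Bearer YOUR_API_KEY"
```

Passing the result to `fetch(req.url, req)` and checking `response.status` shows whether the key is accepted; a 401 there points at the key rather than at the workflow.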

Malformed or Empty JSON from OpenRouter GPT-4

If data extraction is empty or broken, verify the cleaned HTML output.
Review the prompt and JSON schema in the Language Model nodes for errors.
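
A quick check of the model output can narrow down where extraction breaks. The sketch below is a hypothetical validator; the field names (`name`, `description`, `rating`, `reviews`, `price`) are assumed from the attributes listed on this page and may differ from the schema in your Structured Output Parser node.

```javascript
// Assumed field names; align them with the schema in the parser node.
const REQUIRED_FIELDS = ["name", "description", "rating", "reviews", "price"];

// Hypothetical helper: return a list of problems found in the raw model output.
function findProductProblems(rawText) {
  let parsed;
  try {
    parsed = JSON.parse(rawText);
  } catch (err) {
    return ["not valid JSON: " + err.message];
  }
  const products = Array.isArray(parsed) ? parsed : [parsed];
  if (products.length === 0) return ["no products extracted"];

  const problems = [];
  products.forEach((product, index) => {
    if (product === null || typeof product !== "object") {
      problems.push(`product ${index}: not an object`);
      return;
    }
    for (const field of REQUIRED_FIELDS) {
      const value = product[field];
      if (value === undefined || value === null || value === "") {
        problems.push(`product ${index}: missing ${field}`);
      }
    }
  });
  return problems;
}

const good = '[{"name":"Lamp","description":"Desk lamp","rating":4.5,"reviews":120,"price":"$19.99"}]';
console.log(findProductProblems(good)); // prints []
console.log(findProductProblems("[]")); // prints [ 'no products extracted' ]
```

An empty array usually points at the cleaned HTML or the prompt, while missing fields point at the schema.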

Google Sheets Append Errors

Appending can fail if field mappings are wrong or the OAuth token has expired.
Check mappings carefully and re-authenticate Google Sheets credentials.
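
One way to check mappings is to make them explicit and flag anything that does not line up. This is a minimal sketch: the column headers and field names in `COLUMN_MAP` are hypothetical, so replace them with the headers in your results sheet.

```javascript
// Hypothetical mapping from extracted field names to sheet column headers.
const COLUMN_MAP = {
  name: "Name",
  description: "Description",
  rating: "Rating",
  reviews: "Reviews",
  price: "Price",
};

// Build a row keyed by column header and report fields with no column.
function toSheetRow(product) {
  const isMapped = (key) => Object.prototype.hasOwnProperty.call(COLUMN_MAP, key);
  const unmapped = Object.keys(product).filter((key) => !isMapped(key));

  const row = {};
  for (const [field, column] of Object.entries(COLUMN_MAP)) {
    row[column] = product[field] ?? ""; // empty cell when the field is absent
  }
  return { row, unmapped };
}

const result = toSheetRow({ name: "Lamp", price: "$19.99", color: "red" });
console.log(result.row.Price); // prints "$19.99"
console.log(result.unmapped);  // prints [ 'color' ]
```

A non-empty `unmapped` list suggests the prompt returns a field the sheet has no column for.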


Customization Ideas

  • Change the BrightData “zone” parameter to try other proxy zones for better success on tough sites.
  • Adjust the batch size in the Split In Batches node to balance speed and API limits.
  • Add more product attributes in the GPT-4 prompt, like availability or shipping info.
  • Swap the OpenRouter Chat Model node for another language model node, such as OpenAI or Anthropic Claude.
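
The effect of the batch size setting can be pictured with a plain chunking function. This is an illustration of the idea only, not how the Split In Batches node is implemented; `toBatches` is a hypothetical name.

```javascript
// Hypothetical helper: split a list into consecutive batches of a given size.
function toBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

const urls = ["u1", "u2", "u3", "u4", "u5"];
console.log(toBatches(urls, 2)); // prints [ [ 'u1', 'u2' ], [ 'u3', 'u4' ], [ 'u5' ] ]
```

Smaller batches mean more loop iterations but fewer requests in flight at once, which matters when the scraping API or the model provider enforces rate limits.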


Summary

✓ Saves hours weekly by automating product data collection.
✓ Reduces errors by standardizing data extraction.
✓ Feeds fresh and structured product data directly into Google Sheets.
✓ Scales to handle large lists with batching and loops.
✓ Uses familiar tools like Google Sheets and is easy to set up in n8n.



Frequently Asked Questions

The scraping API can be swapped: change the HTTP Request node URL and parameters to another scraping API and update authentication accordingly.
API credit use depends on page size and the number of URLs; processing in batches helps keep it under control.
Data passes only through authorized APIs and your Google Sheets account, so protect it with sheet permissions and secret API keys.
Large URL lists are supported: batching and looping process them within API limits.
