Opening Problem Statement
Meet Lisa, a business analyst at a B2B marketing firm. Every week, Lisa receives a spreadsheet filled with hundreds of company names and their website URLs. Her job is to enrich this list with critical business details like the company’s core activities, main products, value propositions, and customer profiles. Doing this manually means Lisa wastes countless hours scouring company websites, copying data, and guessing business models. On top of that, mistakes creep in, some insights are missed, and the firm loses out on targeting opportunities and strategic decisions.
Lisa’s challenge is specific: How can she automatically enrich her company list with reliable business insights without spending hours or risking errors? This is exactly the gap the n8n workflow “Enrich Company Data from Google Sheet with OpenAI Agent and Scraper Tool” fills.
What This Automation Does
This workflow runs a seamless process that takes a list of companies from Google Sheets, scrapes their websites, uses an AI agent to analyze the content, and then updates the sheet with structured enriched data. Here’s what happens each time it runs:
- Retrieves company names and websites from your Google Sheet.
- Loops through each company to fetch the homepage content via ScrapingBee (a web scraper).
- Passes the scraped HTML, converted to Markdown, to the OpenAI GPT-4o-mini model.
- Extracts specific business information: Business Area, Products/Offers, Value Proposition, Business Model, and Ideal Customer Profile (ICP).
- Handles edge cases such as insufficient data or mismatched website content with clear diagnostics.
- Automatically updates the original Google Sheet rows with the newly enriched data.
The benefits are concrete: save hours, sometimes days, on each weekly update; reduce errors; and get richer, ready-to-use business insights directly in your spreadsheet.
Prerequisites ⚙️
- Google Sheets account with your list of companies (must include at least “Company” and “Website” columns).
- ScrapingBee API key for reliable website scraping.
- OpenAI API key with GPT-4o-mini model access.
- n8n account with workflow creation and credentials configured.
- Basic familiarity with API credentials management in n8n.
- Optional: Hosting option if self-hosting n8n workflow (such as on Hostinger via buldrr.com).
Step-by-Step Guide
Step 1: Set up Google Sheets with your Company List
Open Google Sheets and prepare a sheet with these columns: Company, Website. Include additional columns like Business Area, Offer, Value Proposition, Business Model, ICP, and Additional Information for enrichment outputs.
Example entry:
Company: ACME Corp
Website: https://www.acmecorp.com
You should see your list ready to be read by the workflow.
Common mistake: Missing or incorrect URLs will cause scraping failures later.
Step 2: Configure the “Get rows from Google Sheet” node
In n8n, add a Google Sheets node and select operation: read rows, connect your Google API credentials, and enter the documentId for your Sheet. Specify the sheet name (e.g., “Sheet1” or “gid=0”).
This node retrieves the full company list to start the process.
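As a rough sketch, the node's parameters look something like the following (exact field names vary across n8n and Google Sheets node versions, and YOUR_SPREADSHEET_ID is a placeholder for your own document ID):

```json
{
  "name": "Get rows from Google Sheet",
  "type": "n8n-nodes-base.googleSheets",
  "parameters": {
    "operation": "read",
    "documentId": "YOUR_SPREADSHEET_ID",
    "sheetName": "Sheet1"
  }
}
```

Each row comes out as one item whose JSON keys match your column headers (Company, Website, row_number, and so on).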
Step 3: Add a “Loop Over Items” node
Drag the SplitInBatches node (named “Loop Over Items”) to handle one company at a time. Connect the Google Sheets node’s output here.
This ensures each company’s website is processed independently.
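A minimal sketch of the node configuration, assuming a batch size of 1 so each company is scraped and analyzed on its own:

```json
{
  "name": "Loop Over Items",
  "type": "n8n-nodes-base.splitInBatches",
  "parameters": {
    "batchSize": 1
  }
}
```

You can raise batchSize later if you want to trade isolation for speed, but one-at-a-time keeps errors scoped to a single row.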
Step 4: Set the URL for Scraping
Add a Set node (named “Set company url”) to assign the company website URL to a field named url. Use an expression such as {{ $json.Website }} to pull the URL dynamically on each iteration; the key must match your sheet’s column header exactly, including capitalization.
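An illustrative Set node configuration (assuming your sheet's column header is “Website”; adjust the expression key if yours differs):

```json
{
  "name": "Set company url",
  "type": "n8n-nodes-base.set",
  "parameters": {
    "values": {
      "string": [
        {
          "name": "url",
          "value": "={{ $json.Website }}"
        }
      ]
    }
  }
}
```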
Step 5: Configure the ScrapingBee HTTP Request node
Add an HTTP Request node (“ScrapingBee : Scrape company’s homepage data”) and select GET method.
Set the URL to https://app.scrapingbee.com/api/v1 and add query parameters: api_key with your ScrapingBee API key and url with the scraped company URL from the previous node: {{$json.url}}.
This will fetch the raw HTML content of the homepage.
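A sketch of the HTTP Request node's parameters, assuming the placeholder YOUR_SCRAPINGBEE_API_KEY is replaced with your real key (ideally stored as an n8n credential rather than hardcoded):

```json
{
  "name": "ScrapingBee : Scrape company's homepage data",
  "type": "n8n-nodes-base.httpRequest",
  "parameters": {
    "method": "GET",
    "url": "https://app.scrapingbee.com/api/v1",
    "queryParameters": {
      "parameters": [
        { "name": "api_key", "value": "YOUR_SCRAPINGBEE_API_KEY" },
        { "name": "url", "value": "={{ $json.url }}" }
      ]
    }
  }
}
```

The response body (the page's raw HTML) is what gets passed on to the next node.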
Step 6: Convert HTML to Markdown
Add the Markdown node “HTML to Markdown” to convert the HTML data (from ScrapingBee) into Markdown format. Use {{$json.data}} as input to reduce token usage for OpenAI.
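A minimal sketch of the Markdown node's settings (field names may differ slightly between n8n versions):

```json
{
  "name": "HTML to Markdown",
  "type": "n8n-nodes-base.markdown",
  "parameters": {
    "mode": "htmlToMarkdown",
    "html": "={{ $json.data }}"
  }
}
```

Markdown strips tags, scripts, and styling, so the same page content typically costs far fewer OpenAI tokens than raw HTML.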
Step 7: Configure the OpenAI Chat Model node
Add the “OpenAI Chat Model” (LangChain GPT model) node, select GPT-4o-mini, and connect your OpenAI API credentials.
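A rough sketch of the model node's parameters (the node type string and option names are assumptions that may vary by n8n version; a low temperature keeps extraction output consistent):

```json
{
  "name": "OpenAI Chat Model",
  "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
  "parameters": {
    "model": "gpt-4o-mini",
    "options": {
      "temperature": 0
    }
  }
}
```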
Step 8: Set up Structured Output Parser
Add the “Structured Output Parser” LangChain node with a manual schema specifying fields: Business Area, Offers or Product, Value Proposition, Business Model, Ideal Customer Profile, and Additional Information with sub-properties.
This parser organizes the AI response into a structured JSON for easy Google Sheets update.
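An illustrative manual schema for the parser; the top-level field names follow the article, while the sub-properties under Additional Information (data_sufficient, notes) are assumptions you should adapt to your own diagnostics:

```json
{
  "type": "object",
  "properties": {
    "Business Area": { "type": "string" },
    "Offers or Product": { "type": "string" },
    "Value Proposition": { "type": "string" },
    "Business Model": { "type": "string" },
    "Ideal Customer Profile": { "type": "string" },
    "Additional Information": {
      "type": "object",
      "properties": {
        "data_sufficient": { "type": "boolean" },
        "notes": { "type": "string" }
      }
    }
  }
}
```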
Step 9: Build AI Agent with Subnodes
Create an “AI Agent” node configured to receive scraped data and extract the specified company details using the OpenAI Chat Model, the Scraping Workflow Call, and the Structured Output Parser. The prompt instructs the AI to give English output and handle edge cases.
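An illustrative prompt configuration (not the workflow's exact wording) showing how to instruct the model on both the extraction fields and the edge cases:

```json
{
  "promptType": "define",
  "text": "You are a business analyst. Using only the website content below, extract: Business Area, Offers or Product, Value Proposition, Business Model, and Ideal Customer Profile. Answer in English. If the content is insufficient, or does not match the company name, say so in Additional Information instead of guessing.\n\nWebsite content:\n{{ $json.data }}"
}
```

Telling the model explicitly what to do when data is missing is what produces the clear diagnostics mentioned earlier, rather than fabricated answers.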
Step 10: Update the Company Row in Google Sheets
Add another Google Sheets node “Update Company’s Row on Google Sheet” to write the parsed AI output back into the correct row number using the matching column “row_number”.
Map the AI output fields to Google Sheets columns: ICP, Offer, Business area, Business Model, Value proposition, and Additional information.
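A sketch of the update node's mapping, assuming the parsed agent result is available under $json.output (adjust the path to match your agent's actual output shape, and YOUR_SPREADSHEET_ID to your document):

```json
{
  "name": "Update Company's Row on Google Sheet",
  "type": "n8n-nodes-base.googleSheets",
  "parameters": {
    "operation": "update",
    "documentId": "YOUR_SPREADSHEET_ID",
    "sheetName": "Sheet1",
    "columns": {
      "matchingColumns": ["row_number"],
      "value": {
        "row_number": "={{ $json.row_number }}",
        "Business area": "={{ $json.output['Business Area'] }}",
        "Offer": "={{ $json.output['Offers or Product'] }}",
        "Value proposition": "={{ $json.output['Value Proposition'] }}",
        "Business Model": "={{ $json.output['Business Model'] }}",
        "ICP": "={{ $json.output['Ideal Customer Profile'] }}",
        "Additional information": "={{ $json.output['Additional Information'] }}"
      }
    }
  }
}
```

Matching on row_number rather than Company name avoids misrouted updates when two companies share similar names.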
Customizations ✏️
- Extend Scraper to Additional Pages: Change the ScrapingBee URL query in the HTTP Request node to also scrape “About Us” or “Pricing” pages by appending paths. This will give richer data but use extra tokens.
- Enhanced Data Validation: Add error handling to verify if scraped data is sufficient, then skip or flag rows with insufficient info using the “Additional Information” fields.
- Integrate a CRM: Replace Google Sheets nodes with a database or CRM integration node to directly enrich your client records.
- Multi-language Support: Update the AI Agent prompt to accommodate multiple languages and perform translations as needed.
- Trigger from Form Submissions: Add a Webhook or Form trigger that takes real-time leads and enriches their company data live.
Troubleshooting 🔧
- Problem: “No data returned from ScrapingBee”
Cause: Incorrect API key or missing url parameter.
Solution: Verify your ScrapingBee API key in the HTTP Request node. Double-check that the url field uses {{ $json.url }} and that the company website URLs are valid.
- Problem: “OpenAI API error: max tokens exceeded”
Cause: The HTML sent to the model was too large or complex.
Solution: Use the “HTML to Markdown” node to reduce token usage, and consider limiting scraping to the homepage only.
- Problem: “Mismatch in row update or no update made”
Cause: Incorrect matching columns or a missing row_number.
Solution: Make sure your Google Sheet has a populated row_number column and that the update node uses this column to match rows exactly.
Pre-Production Checklist ✅
- Confirm Google Sheets document ID and sheet name match your environment.
- Ensure all API keys (OpenAI, ScrapingBee, Google) are valid and active.
- Test scraping on a few sample URLs to verify response content is retrieved correctly.
- Run AI Agent manually on test data to verify correct structured outputs.
- Backup your Google Sheet or use a copy before running full automation.
Deployment Guide
Activate your workflow in n8n after final configuration of credentials and test runs. Trigger it manually or set a webhook to start the process. Monitor executions through the n8n UI execution logs to confirm success or diagnose failures.
FAQs
- Can I use other scraping tools? Yes, but ScrapingBee is recommended for its reliability and API features integrated with this workflow.
- Does this consume many OpenAI tokens? Token usage depends on the website content size; converting HTML to Markdown reduces token consumption.
- Is my data secure? Yes, with proper API key management and n8n hosting security, your data remains private.
- Can it handle hundreds of companies? Yes, but consider running the workflow in batches with the SplitInBatches node to avoid rate limits.
Conclusion
By following this guide, you’ve built a sophisticated automation that enriches your company data effortlessly. You reclaim hours, even days, of manual research each week, improve data accuracy, and gain deeper marketing and sales insights without lifting a finger.
Next steps? Consider automating lead enrichment directly into your CRM, expanding scraping to additional company pages, or integrating notifications to your sales team via Slack or Teams whenever new insights are added.
This workflow transforms tedious research into reliable, automated enrichment — empowering smarter business decisions every day.