Opening Problem Statement
Meet Sarah, a business analyst at a fast-growing marketing agency. Week after week, Sarah spends hours manually researching potential clients’ company data: looking up domains, LinkedIn profiles, pricing plans, integrations, and whether they offer free trials or enterprise plans. The process easily takes her 4+ hours a week, and it is prone to errors and outdated information. Worse, Sarah’s team often misses important opportunities because of incomplete or stale data.
This tedious, error-prone manual research slows down strategic decisions and business growth. Sarah wants a streamlined way to gather and enrich company profiles automatically and accurately, freeing her time for high-impact work.
What This Automation Does
This n8n workflow automates Sarah’s company data enrichment process using AI agents, Google Sheets, and web scraping tools to research and update company info from just a name or domain. Here’s what happens when it runs:
- Retrieves the companies that have not been researched yet from a Google Sheet and processes them one row at a time.
- Uses AI agents powered by OpenAI’s GPT-4 to research each company, extracting info like domain, LinkedIn URL, market type (B2B or B2C), cheapest pricing plan, availability of enterprise plans, APIs, free trials, integrations, and case study links.
- Augments the AI research with Google search results from SerpAPI (or ScrapingBee as an alternative) and fetches website content through a sub-workflow so URLs can be analyzed directly.
- Parses the AI output into structured data fields so it can be written back to the sheet reliably.
- Updates the original Google Sheet with the enriched data and marks each row as completed, all automatically.
- Runs on demand via a manual trigger or on a schedule, so the data stays fresh without manual effort.
By automating company research, Sarah saves 4+ hours a week and cuts down on manual errors, enabling faster, better-informed sales outreach and competitive analysis. This workflow is a game-changer for anyone who needs reliable, up-to-date business intelligence.
Prerequisites ⚙️
- 📊 Google Sheets account with a prepared spreadsheet for storing company data
- 🔑 OpenAI API key with access to a GPT-4 model for the AI research steps
- 🔑 SerpAPI or ScrapingBee API key for Google search scraping (SerpAPI is default, ScrapingBee is an alternative)
- ⚙️ n8n account (cloud or self-hosted) to orchestrate workflow automation
Step-by-Step Guide
1. Prepare Your Google Sheet
Start with a Google Sheet structured to hold your company names and enrichment results. Use the template linked in the Sticky Note node or create columns: company_input, domain, linkedinUrl, market, cheapest_plan, has_free_trial, has_enterprise_plan, has_API, integrations, last_case_study_link, and enrichment_status.
Set the correct document ID and sheet in every Google Sheets node. Both IDs appear in the spreadsheet URL: the document ID follows /spreadsheets/d/, and the sheet ID is the gid value.
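To double-check the document and sheet IDs outside n8n, the short Python sketch below pulls both values out of a sheet URL. It assumes the standard `https://docs.google.com/spreadsheets/d/<documentId>/edit#gid=<sheetId>` URL shape; the function name `sheet_ids_from_url` is hypothetical and not part of n8n.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlparse

# Assumed URL shape: https://docs.google.com/spreadsheets/d/<documentId>/edit#gid=<sheetId>
DOC_ID_RE = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")
GID_RE = re.compile(r"(?:^|&)gid=(\d+)")


def sheet_ids_from_url(url: str) -> Tuple[str, Optional[str]]:
    """Return (document_id, sheet_gid) from a Google Sheets URL.

    sheet_gid is None when the URL carries no gid (e.g. a bare /edit link).
    """
    doc_match = DOC_ID_RE.search(url)
    if doc_match is None:
        raise ValueError(f"not a Google Sheets URL: {url!r}")
    parsed = urlparse(url)
    # The gid usually sits in the fragment (#gid=0) but can also appear in the query string.
    gid_match = GID_RE.search(parsed.fragment) or GID_RE.search(parsed.query)
    sheet_gid = gid_match.group(1) if gid_match else None
    return doc_match.group(1), sheet_gid


if __name__ == "__main__":
    print(sheet_ids_from_url("https://docs.google.com/spreadsheets/d/1AbC-dEf_123/edit#gid=0"))
    # ('1AbC-dEf_123', '0')
```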
Common mistake: Not matching the column names exactly will cause update failures.
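A quick way to catch mismatched column names before a run is to compare the sheet’s header row with the column list from this step. The helper below is a hypothetical sketch, not an n8n feature; how you obtain `header_row` (for example, by copying the first row of the sheet) is up to you, and the comparison is deliberately exact and case-sensitive.

```python
from typing import List, Sequence, Tuple

# Column names from step 1; matching must be exact (case- and whitespace-sensitive).
EXPECTED_COLUMNS = [
    "company_input", "domain", "linkedinUrl", "market", "cheapest_plan",
    "has_free_trial", "has_enterprise_plan", "has_API", "integrations",
    "last_case_study_link", "enrichment_status",
]


def check_header(header_row: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) column names for a sheet header row."""
    present = set(header_row)
    expected = set(EXPECTED_COLUMNS)
    missing = [col for col in EXPECTED_COLUMNS if col not in present]
    unexpected = [col for col in header_row if col not in expected]
    return missing, unexpected


if __name__ == "__main__":
    # A header with a capitalisation slip and a stray trailing space.
    header = [c.replace("linkedinUrl", "LinkedinUrl") for c in EXPECTED_COLUMNS]
    header[3] = "market "
    print(check_header(header))
    # (['linkedinUrl', 'market'], ['LinkedinUrl', 'market '])
```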
2. Trigger the Workflow Manually or Schedule
Use the Manual Trigger node named “When clicking ‘Test workflow’” to run the workflow on demand from n8n. For automated runs, configure the Schedule Trigger node to execute every 2 hours or as needed.
Visual: When the workflow is triggered, a new execution should appear in n8n’s execution log.
Common mistake: Forgetting to activate the workflow will prevent the Schedule Trigger from starting automated runs.
3. Fetch Rows to Enrich from Google Sheets
The Google Sheets – Get rows to enrich node is configured to pull all rows whose enrichment_status is unset or not “done”. This ensures only companies that have not been enriched yet are researched.
Set the filter in this node under ‘filtersUI’ so it only returns rows that still need enrichment.
Expected outcome: The node outputs one item per matching row, each containing the company name and its row number.
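The selection rule behind this filter fits in a few lines of Python, which is handy for spot-checking which rows a run should pick up. This sketches the rule only, not how n8n evaluates ‘filtersUI’; treating surrounding whitespace and letter case as insignificant is an assumption of this example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def needs_enrichment(row: Row) -> bool:
    """True when enrichment_status is missing, empty, or anything other than "done"."""
    status = row.get("enrichment_status") or ""
    # Assumption: " Done " should count as done, so normalise before comparing.
    return str(status).strip().lower() != "done"


def rows_to_enrich(rows: List[Row]) -> List[Row]:
    """Keep only the rows a run should research, preserving sheet order."""
    return [row for row in rows if needs_enrichment(row)]


if __name__ == "__main__":
    sheet = [
        {"row_number": 2, "company_input": "Acme", "enrichment_status": "done"},
        {"row_number": 3, "company_input": "Globex", "enrichment_status": ""},
        {"row_number": 4, "company_input": "Initech"},
    ]
    print([row["row_number"] for row in rows_to_enrich(sheet)])  # [3, 4]
```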
4. Iterate Rows with SplitInBatches
The Loop Over Items node uses SplitInBatches to process companies one at a time, preventing API overload and improving data integrity.
You will see variables like company_input and row_number prepared for the research steps.
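Conceptually, processing one company at a time is fixed-size batching with a batch size of 1. The generator below is a hypothetical Python equivalent, useful for reasoning about what each loop iteration receives; it is not n8n’s implementation of the Loop Over Items node.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_in_batches(items: Iterable[T], batch_size: int = 1) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size items, in input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


if __name__ == "__main__":
    rows = [{"company_input": "Globex", "row_number": 3},
            {"company_input": "Initech", "row_number": 4}]
    for batch in split_in_batches(rows):  # batch_size=1: one company per iteration
        print(batch[0]["company_input"], batch[0]["row_number"])
```

A batch size of 1 keeps each API call tied to exactly one row, which is what makes the later row_number-based update straightforward.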
5. Set Company Input Data
The Input node, a Set node, sets and formats the company name and row number for downstream nodes.
Ensure the company name is correctly passed, as this is the core input for AI research.
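The exact expressions inside the Input node are not shown here, so the following only sketches what “sets and formats” plausibly involves: trimming the company name, rejecting an empty one, and keeping row_number as an integer for the final update. The helper name `build_input` and the empty-name check are assumptions.

```python
from typing import Any, Dict


def build_input(row: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical equivalent of the Input (Set) node: keep only what the agent needs."""
    company = str(row.get("company_input") or "").strip()
    if not company:
        # An empty name would send a meaningless prompt to the agent.
        raise ValueError(f"row {row.get('row_number')} has an empty company_input")
    return {"company_input": company, "row_number": int(row["row_number"])}


if __name__ == "__main__":
    print(build_input({"company_input": "  Acme Inc ", "row_number": "5"}))
    # {'company_input': 'Acme Inc', 'row_number': 5}
```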
6. Run the AI Company Researcher Agent
The heart of the workflow is the AI company researcher node (LangChain Agent). It sends a structured prompt to OpenAI’s GPT-4 model requesting details about the company such as domain, LinkedIn URL, pricing, API availability, and integrations.
Prompt excerpt:
=This is the company I want you to research info about:
{{ $json.company_input }}
Return me:
- the linkedin URL of the company
- the domain of the company. in this format ([domain].[tld])
- market: if they are B2B or B2C. Only reply by "B2B" or "B2C"
- the lowest paid plan ...
This agent also integrates outputs from two specialized agent tools:
- SerpAPI – Search Google, which scrapes Google search results relevant to pricing and case studies.
- Get website content, a sub-workflow that fetches the raw HTML content of a company website for deeper analysis.
Common mistake: Missing or incorrect OpenAI or SerpAPI credentials will cause this step to fail.
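To get a feel for what the SerpAPI tool does on the agent’s behalf, here is a sketch of a plain SerpAPI Google search request in Python. It uses SerpAPI’s `search.json` endpoint with the `engine`, `q`, and `api_key` parameters and reads links from `organic_results`; the query wording, the `SERPAPI_API_KEY` environment variable name, and the helper names are assumptions, and in the workflow the agent chooses its own queries.

```python
import json
import os
from typing import Any, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def serpapi_search_url(query: str, api_key: str) -> str:
    """Build a SerpAPI Google search URL (no network access)."""
    params = {"engine": "google", "q": query, "api_key": api_key}
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


def research_queries(company: str) -> List[str]:
    # Hypothetical query wording; the agent decides its own searches at run time.
    return [f"{company} pricing", f"{company} case study"]


def top_links(results: Dict[str, Any], limit: int = 3) -> List[str]:
    """Pull result URLs from the organic_results list of a SerpAPI response."""
    return [r["link"] for r in results.get("organic_results", [])[:limit] if "link" in r]


def search(query: str) -> Dict[str, Any]:
    """Perform the request; needs network access and SERPAPI_API_KEY (assumed name) set."""
    with urlopen(serpapi_search_url(query, os.environ["SERPAPI_API_KEY"]), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for q in research_queries("Acme"):
        print(serpapi_search_url(q, "YOUR_KEY"))
```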
7. Parse Structured Output from AI
The Structured Output Parser node ensures the AI response matches the expected JSON schema with keys like domain, linkedinUrl, market, cheapest_plan, etc. This validation step prevents malformed data.
Expected result: A clean JSON object containing all requested company details.
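The checks such a parser enforces can be written as plain Python to make the rules explicit. In this sketch the key list is assumed to match the sheet columns, and the market and domain rules come from the prompt (“B2B” or “B2C”, and a bare [domain].[tld]); the parser node’s actual JSON schema may define more or different constraints.

```python
import re
from typing import Any, Dict, List

# Assumed to mirror the sheet columns; the real schema lives in the parser node.
REQUIRED_KEYS = (
    "domain", "linkedinUrl", "market", "cheapest_plan", "has_free_trial",
    "has_enterprise_plan", "has_API", "integrations", "last_case_study_link",
)
# Bare "[domain].[tld]" as the prompt asks: no scheme, no path.
DOMAIN_RE = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,}$", re.IGNORECASE)


def validate_output(output: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the object passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in output]
    market = output.get("market")
    if market is not None and market not in ("B2B", "B2C"):
        problems.append(f"market must be 'B2B' or 'B2C', got {market!r}")
    domain = output.get("domain")
    if isinstance(domain, str) and not DOMAIN_RE.match(domain):
        problems.append(f"domain is not a bare [domain].[tld]: {domain!r}")
    return problems


if __name__ == "__main__":
    sample = {key: None for key in REQUIRED_KEYS}
    sample.update(domain="https://acme.com/", market="Both")
    for problem in validate_output(sample):
        print(problem)
```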
8. Format Data for Sheet Update
The AI Researcher Output Data node uses a Set node to map AI output fields to variables that correspond to Google Sheet columns.
Example mapping:
{
  "domain": "={{ $json.output.domain }}",
  "linkedinUrl": "={{ $json.output.linkedinUrl }}",
  "market": "={{ $json.output.market }}",
  ...
}
9. Merge with Input Data
The Merge data node combines original input data with the AI-enriched output data to prepare a full row update.
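Steps 8 and 9 together amount to flattening the agent’s output object into column-named fields and combining the result with the original input item. The sketch below pairs items by position, which is an assumption about how the Merge data node is configured (it fits a loop that handles one company at a time); the column list is also assumed to match the sheet.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]

# Assumed to match the enrichment columns in the sheet.
OUTPUT_COLUMNS = (
    "domain", "linkedinUrl", "market", "cheapest_plan", "has_free_trial",
    "has_enterprise_plan", "has_API", "integrations", "last_case_study_link",
)


def map_ai_output(ai_item: Item) -> Item:
    """Step 8: copy output.<field> to a top-level field; absent fields become None."""
    output = ai_item.get("output") or {}
    return {col: output.get(col) for col in OUTPUT_COLUMNS}


def merge_by_position(input_items: List[Item], ai_items: List[Item]) -> List[Item]:
    """Step 9: combine the i-th input item with the i-th mapped AI item."""
    if len(input_items) != len(ai_items):
        # Mismatched counts would write one company's data into another company's row.
        raise ValueError("input and AI item counts differ")
    return [{**inp, **map_ai_output(ai)} for inp, ai in zip(input_items, ai_items)]


if __name__ == "__main__":
    merged = merge_by_position(
        [{"company_input": "Acme", "row_number": 2}],
        [{"output": {"domain": "acme.com", "market": "B2B"}}],
    )
    print(merged[0]["row_number"], merged[0]["domain"], merged[0]["linkedinUrl"])
    # 2 acme.com None
```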
10. Update Company Row in Google Sheets
The final Google Sheets – Update Row with data node updates the corresponding row by row_number, writing all enriched company info back to the sheet and marking enrichment_status as “done”.
Outcome: You will see your Google Sheet automatically populated with rich company profiles.
Common mistake: Incorrect sheet or document ID configurations will cause update failures.
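The row matching behind this update can be modelled with an in-memory list of rows, which also shows where a “row not found” failure comes from. This is a hypothetical stand-in, not the Google Sheets API; joining list values such as integrations into one comma-separated cell is an assumption about how you want them stored.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def build_update(merged_item: Row) -> Row:
    """Prepare the values to write: flatten lists and mark the row as done."""
    update: Row = {}
    for key, value in merged_item.items():
        # Assumption: list values (e.g. integrations) are stored as "a, b, c" in one cell.
        update[key] = ", ".join(map(str, value)) if isinstance(value, list) else value
    update["enrichment_status"] = "done"
    return update


def apply_update(sheet_rows: List[Row], update: Row) -> Row:
    """Overwrite the row whose row_number matches; fail loudly if there is none."""
    for row in sheet_rows:
        if row["row_number"] == update["row_number"]:
            row.update(update)
            return row
    raise LookupError(f"row not found: row_number={update['row_number']}")


if __name__ == "__main__":
    sheet = [{"row_number": 2, "company_input": "Acme", "enrichment_status": ""}]
    apply_update(sheet, build_update(
        {"row_number": 2, "domain": "acme.com", "integrations": ["Slack", "Zapier"]}))
    print(sheet[0])
```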
Customizations ✏️
- Add additional company info: Modify the AI researcher node prompt to request more company data fields like CEO name, revenue, or competitor info.
- Use ScrapingBee instead of SerpAPI: Replace the SerpAPI – Search Google node with the Search Google with ScrapingBee custom workflow for a cost-efficient alternative. Don’t forget to update your credentials.
- Run workflow only on specific companies: Adjust the Google Sheets node filter to enrich companies based on custom criteria like market sector or geographic region by adding filter conditions.
- Expand integrations: Broaden the AI prompt to gather and parse additional integration tools used by companies, then extend the sheet columns accordingly.
- Schedule frequency: Change the Schedule Trigger node to run more or less frequently based on your update needs (e.g., every 24 hours or daily at midnight).
Troubleshooting 🔧
Problem: “Invalid API key” error from OpenAI node
Cause: Incorrect or expired OpenAI credential in n8n settings.
Solution: Go to Credentials → OpenAI API → Re-enter or update your API key. Test connection before running.
Problem: Google Sheets update fails with “row not found”
Cause: Mismatch between row_number from data and actual Google Sheets row index.
Solution: Verify the sheet configuration and ensure the filter and batch processing correctly track row indexes. Refresh schema if needed.
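One frequent source of this mismatch is using an item’s position in the filtered list instead of the row_number reported for it. The hypothetical helpers below make the difference concrete, assuming the header occupies sheet row 1 so the first data row is row 2.

```python
from typing import Any, Dict, List


def sheet_row_number(data_index: int, header_row: int = 1) -> int:
    """Convert a 0-based index into the unfiltered data rows to a 1-based sheet row."""
    return header_row + 1 + data_index


def positional_row_numbers(items: List[Dict[str, Any]], header_row: int = 1) -> List[int]:
    """Row numbers you would get by (wrongly) trusting list position after filtering."""
    return [sheet_row_number(i, header_row) for i in range(len(items))]


if __name__ == "__main__":
    # Row 2 is already done, so the filtered list starts at sheet row 3.
    to_enrich = [{"company_input": "Globex", "row_number": 3},
                 {"company_input": "Initech", "row_number": 4}]
    print(positional_row_numbers(to_enrich))           # [2, 3] (wrong rows)
    print([item["row_number"] for item in to_enrich])  # [3, 4] (correct rows)
```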
Problem: AI company researcher returns null or incomplete data
Cause: OpenAI rate limits being hit, or an ambiguous prompt.
Solution: Reduce batch size or run frequency, clarify the prompt instructions, or check your API usage quotas. For structured extraction, a lower (not higher) model temperature tends to give more consistent output.
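When rate limits are the cause, retrying with exponential backoff usually helps more than re-running the whole workflow. n8n’s per-node Retry On Fail setting covers the simple case; the Python sketch below shows the underlying idea with full jitter. `RateLimitError` is a hypothetical placeholder for whatever exception your client raises on HTTP 429, and the delay values are illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a client's 'HTTP 429 Too Many Requests' error."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call()`, retrying on RateLimitError with exponentially growing, jittered waits."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Full jitter: wait a random time in [0, base_delay * 2**attempt] seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise RateLimitError("429")
        return "ok"

    print(with_backoff(flaky, base_delay=0.1))  # ok (after two retries)
```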
Pre-Production Checklist ✅
- Verify Google Sheets document and sheet IDs match your actual spreadsheet.
- Test API connections for OpenAI, SerpAPI, and ScrapingBee in n8n credentials.
- Run workflow manually on a few sample companies to check data enrichment accuracy.
- Confirm output matches expected JSON schema in the structured output parser node.
- Backup your Google Sheet before running large updates to prevent accidental data loss.
Deployment Guide
Activate the workflow in n8n after verification. Set the schedule trigger node to your desired frequency, or use the manual trigger for on-demand updates.
Monitor runs through n8n’s execution logs to catch any errors. Adjust API quotas or prompt parameters as your company list grows.
FAQs
Q: Can I replace Google Sheets with another database?
A: Yes, but you’ll need to replace Google Sheets nodes with the appropriate database nodes and adjust data mapping accordingly.
Q: Does using AI models consume a lot of API credits?
A: It can. Costs depend on your OpenAI plan and token usage, and regularly scheduled batches add up, so adjust the run frequency accordingly.
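A back-of-the-envelope estimate helps when picking a schedule. Because finished rows are skipped, model cost in this workflow should scale mainly with the number of companies enriched rather than with how often the trigger fires. The token counts and per-1,000-token prices below are placeholders, not real OpenAI prices; substitute figures from your own usage and current pricing. SerpAPI or ScrapingBee calls are billed separately and are not included.

```python
def estimate_cost(companies: int, input_tokens_per_company: int,
                  output_tokens_per_company: int, price_in_per_1k: float,
                  price_out_per_1k: float) -> float:
    """Estimated model cost for enriching `companies` rows.

    Token counts should cover every model call the agent makes per company
    (it may call the model several times while using its tools).
    """
    per_company = (input_tokens_per_company / 1000 * price_in_per_1k
                   + output_tokens_per_company / 1000 * price_out_per_1k)
    return companies * per_company


if __name__ == "__main__":
    # Placeholder figures only: 100 companies, 5,000 input and 500 output tokens each.
    print(round(estimate_cost(100, 5000, 500, 0.01, 0.03), 2))
```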
Q: Is my data secure in this workflow?
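A: n8n encrypts stored credentials, and execution data is kept in your n8n environment. However, company names and fetched content are sent to the external services the workflow calls (OpenAI, SerpAPI or ScrapingBee, and Google Sheets). Avoid sharing API keys publicly and use TLS when possible.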
Conclusion
By following this guide, you have turned manual, error-prone company research into an automated, scalable process using n8n and AI agents. Sarah can now enrich hundreds of company profiles with detailed, structured data directly in Google Sheets.
This saves her and her team hours of manual work every week, improves data quality, and keeps competitive intelligence up to date. Next, consider expanding this with automated lead scoring, CRM integrations, or competitor trend analysis, all possible with n8n’s extensible workflow automation.
Ready to start automating your research? Let’s build smarter business intelligence together!