1. Opening Problem Statement
Meet Lucas, a researcher at a B2B marketing agency, tasked with compiling detailed company profiles from a large list of domains. Manually visiting each website to capture its core value proposition, target audience, industry classification, and market type is tedious and error-prone. Lucas loses hours each week copying text, guessing company focus areas, and entering data into Google Sheets. This delays campaign launches and frustrates client teams awaiting accurate intel.
This workflow addresses Lucas’s pain directly by extracting, summarizing, and classifying company website data automatically. It turns drudgery into a streamlined, repeatable process with minimal manual oversight.
2. What This Automation Does
When you run this n8n workflow, here is what happens:
- It reads a list of company domain URLs from a Google Sheets spreadsheet.
- For each domain, it makes HTTP requests to fetch website HTML content.
- Extracts the full HTML content from each page.
- Cleans up extracted HTML to readable text focused on the website body.
- Sends a prompt to OpenAI to generate key company insights: value proposition, industry, target audience, and whether they operate B2B or B2C.
- Parses the structured JSON response from OpenAI to extract data fields.
- Merges generated data back with the original domain information.
- Updates the original Google Sheet with the new insights in corresponding columns.
- Waits before processing the next batch to avoid rate limits or overload.
This saves Lucas countless hours of manual web research and data entry, improving accuracy and allowing focus on strategic analysis instead.
3. Prerequisites
- n8n account (cloud or self-hosted) to run workflows.
- Google Sheets account with OAuth2 credentials connected in n8n, and access to the sheet containing company domains.
- OpenAI API key configured as credentials inside n8n.
- Basic familiarity with n8n nodes like HTTP Request and Code, though this post guides you through each step.
4. Step-by-Step Guide
Step 1: Configure the Manual Trigger
Navigate to the When clicking “Execute Workflow” node. This node allows you to start the workflow manually for testing or batch runs.
Click on the node and verify settings are default (no extra parameters). This triggers the full process when you hit the “Execute Workflow” button.
Expected outcome: Ready to start processing when activated.
Step 2: Read Company Domains from Google Sheets
Locate the Read Google Sheets node. Configure it to point to your spreadsheet URL containing the Domain column with website URLs.
Set Sheet Name to the relevant sheet (usually Sheet1). Connect your Google Sheets OAuth2 credentials.
Expected outcome: Pulls list of domains to process.
Common mistake: Incorrect sheet name or missing OAuth credentials causes failure to fetch data.
Step 3: Split the List Into Batches
Open the Split In Batches node, which processes your domain list in chunks so the downstream HTTP and OpenAI calls are not overloaded.
The node takes input from the previous Google Sheets node and, with a batch size of 1, outputs one domain per batch.
Expected outcome: Domains processed sequentially in manageable chunks.
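To make the batching idea concrete, here is a minimal sketch in plain JavaScript of splitting a list into chunks. The function name chunk and the sample domains are illustrative assumptions; the Split In Batches node does this for you inside n8n.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const domains = ["a.com", "b.com", "c.com"]; // sample data
console.log(chunk(domains, 2)); // [ [ 'a.com', 'b.com' ], [ 'c.com' ] ]
```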
Step 4: Fetch Website HTML Content
Select the HTTP Request node. Set the URL property to dynamically use the domain from the current batch item: https://www.{{ $node["Split In Batches"].json["Domain"] }}.
Enable follow redirects in the node options to get the final page content.
Expected outcome: HTML content of the homepage returned.
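Common mistake: A wrong URL template, or Domain values that do not match it. Because the template above prepends https://www., cells that already contain a protocol or www. prefix produce malformed URLs.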
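If your sheet mixes bare domains with full URLs, one option is to normalize each value before it reaches the HTTP Request node. The sketch below is an assumption about how you might do that in a Code node; buildUrl is a hypothetical helper, and it keeps the https://www. prefix used by the template in this step.

```javascript
// Hypothetical helper: normalize a Domain cell into the URL the template builds.
function buildUrl(domain) {
  const bare = String(domain)
    .trim()
    .toLowerCase()
    .replace(/^https?:\/\//, "") // drop an existing protocol
    .replace(/^www\./, "")       // drop an existing www. prefix
    .replace(/\/.*$/, "");       // drop any path after the host
  if (!bare) {
    throw new Error("empty domain");
  }
  return `https://www.${bare}`;
}

console.log(buildUrl("example.com"));                   // https://www.example.com
console.log(buildUrl(" https://www.Example.com/about ")); // https://www.example.com
```

Sites that do not serve a www. host will still fail with this template, so test a few known domains first.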
Step 5: Extract HTML Content Using CSS Selector
Open the HTML Extract node, which is configured with the CSS selector html so that it captures the content of the page’s root html element.
This passes the whole page, rather than one section of it, on for further processing.
Expected outcome: Full HTML of the page stored in data for cleaning.
Step 6: Clean and Reduce Content with Code Node
Go to the Clean Content node, a JavaScript code node that trims whitespace and removes line breaks and excessive spaces from the extracted HTML content.
It also truncates the content to the first 10,000 characters to keep prompts manageable.
Code snippet:
if ($input.item.json.body) {
  // Collapse every run of whitespace (including line breaks) into one space, then trim both ends
  $input.item.json.content = $input.item.json.body.replace(/\s+/g, ' ').trim()
  // Keep only the first 10,000 characters so the prompt stays manageable
  $input.item.json.contentShort = $input.item.json.content.slice(0, 10000)
}
return $input.item
Expected outcome: Clean text ready for OpenAI analysis.
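To try the cleaning logic outside n8n, the same idea can be written as a standalone function. This is a minimal sketch; cleanContent and the sample string are illustrative names, not part of the workflow.

```javascript
// Collapse whitespace, trim, and truncate, mirroring the Clean Content step.
function cleanContent(body, maxLength = 10000) {
  const content = String(body).replace(/\s+/g, " ").trim();
  return { content, contentShort: content.slice(0, maxLength) };
}

const sample = "  Acme\r\n\r\n   builds   rockets.\n";
console.log(cleanContent(sample).content);          // Acme builds rockets.
console.log(cleanContent(sample, 4).contentShort);  // Acme
```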
Step 7: Generate Business Insights Using OpenAI Node
In the OpenAI node, use the prompt that feeds the cleaned website content to OpenAI. The prompt instructs the AI to summarize the company’s value proposition in fewer than 25 words, identify the industry (choosing from a predefined list), guess the target audience, and determine if the business is B2B or B2C.
Set the max tokens, temperature, and top P values deliberately: a lower temperature generally gives more consistent outputs, and max tokens must leave enough room for the full JSON reply.
Expected outcome: AI returns structured JSON with four fields about the company.
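The exact prompt wording is not reproduced here, so the following is only a sketch of how such a prompt could be assembled. The function buildPrompt, the INDUSTRIES list, and the phrasing are all assumptions; the four key names match the fields parsed in the next step.

```javascript
// Hypothetical industry list; replace it with the one used in your prompt.
const INDUSTRIES = ["Software", "Healthcare", "Finance", "Retail", "Other"];

// Build a prompt that asks for a JSON object with the four expected keys.
function buildPrompt(content, industries = INDUSTRIES) {
  return [
    "This is the content of a company website:",
    content,
    "",
    "Reply with a JSON object only, using exactly these keys:",
    '- "value_proposition": what the company offers, in fewer than 25 words',
    `- "industry": one of ${industries.join(", ")}`,
    '- "target_audience": your best guess at who the company sells to',
    '- "market": "B2B" or "B2C"',
  ].join("\n");
}

console.log(buildPrompt("Acme builds rockets."));
```

Asking for a JSON object only, with fixed key names, is what makes the later parsing step reliable.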
Step 8: Parse JSON Response Into Usable Fields
Use the Parse JSON code node to extract the properties value_proposition, industry, target_audience, and market from the raw JSON text returned by OpenAI.
Code snippet:
// Parse the JSON text returned by OpenAI once, then copy the four fields onto the item
const parsed = JSON.parse($input.item.json.text)
$input.item.json.value_proposition = parsed.value_proposition
$input.item.json.industry = parsed.industry
$input.item.json.market = parsed.market
$input.item.json.target_audience = parsed.target_audience
return $input.item;
Expected outcome: Extracted values are now separate fields within the workflow data.
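JSON.parse throws when the model returns anything other than valid JSON, which stops the item. A more defensive variant, sketched below under the assumption that you would rather record the error and continue, returns null fields instead. The names parseInsights and parse_error are hypothetical.

```javascript
const FIELDS = ["value_proposition", "industry", "target_audience", "market"];

// Parse the model reply; on failure, return null fields plus an error message.
function parseInsights(text) {
  const result = { parse_error: null };
  let parsed = null;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    result.parse_error = err.message;
  }
  const usable = parsed !== null && typeof parsed === "object";
  for (const field of FIELDS) {
    result[field] = usable ? (parsed[field] ?? null) : null;
  }
  return result;
}

console.log(parseInsights('{"market":"B2B","industry":"Software"}').market); // B2B
console.log(parseInsights("Sorry, I cannot help.").market);                  // null
```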
Step 9: Merge Original and AI Data
The Merge node combines the original domain data and the AI-generated insights into a single item for updating the spreadsheet.
Verify the merge mode is set to merge by position.
Expected outcome: A complete dataset ready to save.
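Merging by position pairs the first item of one input with the first item of the other, the second with the second, and so on. The sketch below illustrates the idea in plain JavaScript; it assumes unpaired items are dropped and that fields from the second input win on a name clash, which may differ from how your Merge node is configured.

```javascript
// Pair items by index and combine their fields into one object per pair.
function mergeByPosition(first, second) {
  const length = Math.min(first.length, second.length);
  const merged = [];
  for (let i = 0; i < length; i++) {
    merged.push({ ...first[i], ...second[i] });
  }
  return merged;
}

const original = [{ Domain: "example.com" }];                 // sample data
const insights = [{ industry: "Software", market: "B2B" }];   // sample data
console.log(mergeByPosition(original, insights));
// [ { Domain: 'example.com', industry: 'Software', market: 'B2B' } ]
```

Because pairing depends only on order, both inputs must arrive in the same sequence for the right insights to land on the right domain.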
Step 10: Update Google Sheets With New Company Data
Configure the Update Google Sheets node to match rows by the Domain column and populate columns Value Proposition, Industry, Target Audience, and Market with the AI data.
Make sure OAuth credentials are properly connected and spreadsheet access is granted.
Expected outcome: Your sheet updates with fresh business insights per domain.
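Conceptually, the update step looks up each sheet row by its Domain value and fills in the four insight columns. The following sketch models that on plain arrays; updateRowsByDomain is a hypothetical function, and the real work is done by the Google Sheets node.

```javascript
// Fill the insight columns on every row whose Domain has a matching insight.
function updateRowsByDomain(rows, insights) {
  const byDomain = new Map(insights.map((item) => [item.Domain, item]));
  return rows.map((row) => {
    const match = byDomain.get(row.Domain);
    if (!match) {
      return row; // no insight for this domain: leave the row untouched
    }
    return {
      ...row,
      "Value Proposition": match.value_proposition,
      Industry: match.industry,
      "Target Audience": match.target_audience,
      Market: match.market,
    };
  });
}

const rows = [{ Domain: "example.com" }, { Domain: "example.org" }]; // sample data
const insights = [{ Domain: "example.com", value_proposition: "Sample", industry: "Software", target_audience: "Developers", market: "B2B" }];
console.log(updateRowsByDomain(rows, insights));
```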
Step 11: Wait Before Next Batch
The Wait node pauses processing for a configurable amount of time (in seconds) between batches to avoid API rate limits or overloads.
Expected outcome: Smooth, error-free batch processing.
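When choosing a wait time, it helps to estimate how much total pause a run adds. The arithmetic below is a rough sketch that assumes the Wait node runs once after every batch; the function name and the sample numbers are illustrative.

```javascript
// Estimate the number of batches and the total time spent waiting.
function estimateWaitSeconds(domainCount, batchSize, waitSeconds) {
  const batches = Math.ceil(domainCount / batchSize);
  return { batches, totalWaitSeconds: batches * waitSeconds };
}

console.log(estimateWaitSeconds(250, 10, 5)); // { batches: 25, totalWaitSeconds: 125 }
```

This counts only the pauses, so the real run time is longer once HTTP and OpenAI response times are included.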
5. Customizations
- Change Industry List in OpenAI Prompt: Modify the industry set inside the OpenAI prompt to better fit your target sectors (e.g., to add “Technology” or “Nonprofit”).
- Adjust Wait Time: In the Wait node, increase or decrease pause duration between batches for faster or more compliant runs.
- Expand Extraction CSS Selector: Tweak the HTML Extract node’s CSS selector from html to a more specific container (e.g., body or #main-content) for cleaner text extraction depending on site structure.
- Increase Content Length: In the Clean Content code node, increase the slice limit from 10,000 characters if you want more content sent to OpenAI for deeper understanding.
- Batch Size: Configure the Split In Batches node batch size to balance performance and API cost.
6. Troubleshooting
Problem: “HTTP Request node fails with 404 or timeout”
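Cause: Some Domain values might be incomplete, might not match the URL template (which prepends https://www.), or might redirect unexpectedly.
Solution: Keep the sheet values and the HTTP Request URL expression consistent: either store bare domains and let the template add the protocol, or store full URLs and remove the prefix from the expression. Test with known working URLs first.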
Problem: “OpenAI node returns invalid JSON or no response”
Cause: Prompt formatting issues or API quota exceeded.
Solution: Review the prompt for syntax errors. Check your OpenAI API rate limits and billing. Enable continue on fail to prevent total workflow failure.
Problem: “Google Sheets Update does not reflect changes”
Cause: Incorrect matching column or insufficient permissions.
Solution: Verify the “valueToMatchOn” and “columnToMatchOn” settings exactly match sheet headings. Confirm your OAuth token has write access.
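A quick way to rule out a heading mismatch is to compare the sheet's header row with the column names the workflow expects. This sketch uses the column names from Step 10; missingColumns is a hypothetical helper, and the comparison is deliberately exact so that stray spaces or different capitalization show up as missing.

```javascript
const REQUIRED = ["Domain", "Value Proposition", "Industry", "Target Audience", "Market"];

// Return the expected column names that are not present, exactly, in the header row.
function missingColumns(headers, required = REQUIRED) {
  return required.filter((name) => !headers.includes(name));
}

console.log(missingColumns(["Domain", "industry", "Market "]));
// [ 'Value Proposition', 'Industry', 'Target Audience', 'Market' ]
```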
7. Pre-Production Checklist
- Verify all Google Sheets credentials are authorized and spreadsheet URLs are correct.
- Test the HTTP Request node with sample domains to ensure fetch success.
- Run OpenAI node in test mode with a sample website content to check output format.
- Confirm merge outputs combined data correctly before updating sheets.
- Run workflow manually with a small batch before full production.
- Backup your Google Sheet to prevent accidental data loss.
8. Deployment Guide
Once tested, run the workflow on demand with the manual trigger. To run it automatically, add a trigger node such as a schedule and then activate the workflow in n8n by clicking the “Activate” button.
Monitor execution via n8n’s execution logs to catch any failed nodes quickly.
This workflow can be self-hosted using platforms like Hostinger (https://buldrr.com/hostinger) if you prefer full control over API credentials and execution.
9. FAQs
Can I use a different NLP provider instead of OpenAI?
Yes, you can replace the OpenAI node with other NLP or AI services that accept textual prompts and return JSON. Just adjust the prompt format accordingly.
Does this workflow consume many OpenAI API credits?
Each domain processed triggers one OpenAI call, so costs scale linearly with volume. Batching controls pacing rather than the number of calls; to reduce credits consumed, shorten the content sent per domain and tighten the prompt.
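For a ballpark figure, input size can be estimated from the character limit. The sketch below assumes roughly 4 characters per token for English text, which is only a rule of thumb, and it ignores the prompt instructions and the model's reply; estimateInputTokens is an illustrative name.

```javascript
// Rough heuristic (an assumption): about 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

// Estimate calls and input tokens for a run, given the per-domain character limit.
function estimateInputTokens(domainCount, charsPerDomain = 10000) {
  const tokensPerCall = Math.ceil(charsPerDomain / CHARS_PER_TOKEN);
  return { calls: domainCount, tokensPerCall, totalTokens: domainCount * tokensPerCall };
}

console.log(estimateInputTokens(200)); // { calls: 200, tokensPerCall: 2500, totalTokens: 500000 }
```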
Is my company data safe in this workflow?
Data travels to n8n and OpenAI over secure connections, but it is only as safe as the way you manage your API keys and Google access.
Can I process hundreds of domains at once?
Yes, but adjust batch sizes and wait times to avoid timeouts and rate limits.
10. Conclusion
By following this guide, you’ve automated extracting valuable company profiles from just domain names using n8n, Google Sheets, and OpenAI. You save hours of manual research weekly, gain consistent insights, and update your CRM or marketing databases faster.
Next, consider automations to analyze social media sentiment for these companies or integrate with email campaign tools to target prospect segments intelligently.
Keep experimenting and refining: automation is about making your work smarter, not harder!