Automate Company Profile Extraction with n8n and OpenAI

This workflow automates extracting business value propositions and classifications directly from company websites using n8n and OpenAI, saving hours of manual research and data entry.
Workflow Identifier: 1457
Nodes in use: Manual Trigger, Google Sheets, Split In Batches, HTTP Request, HTML Extract, Code, OpenAI, Merge, Wait

What this workflow does

This workflow reads a list of website domains from Google Sheets and fetches the HTML of each site. It cleans the content and sends it to OpenAI, which returns key company details: value proposition, industry, target audience, and market type. The workflow then writes these details back to the Google Sheet, saving time on manual web research and data entry.


Who should use this workflow

This workflow helps marketing teams and researchers who collect company profiles from many websites and want business information quickly without browsing each site manually. It works best for users with basic n8n skills and access to Google Sheets and OpenAI services.


Tools and services used

  • Google Sheets: Stores input domain list and output company data.
  • HTTP Request node: Fetches website HTML content.
  • HTML Extract node: Extracts the full HTML body.
  • Code node: Cleans HTML content to plain text.
  • OpenAI node: Generates company business insights.
  • Merge node: Combines original and AI data.
  • Wait node: Pauses between batches to avoid rate limits.

Inputs, processing steps, and outputs

Inputs

  • List of company domains from a Google Sheet column.

Processing steps

  • Split domains into batches to handle them one by one.
  • Send HTTP requests to fetch homepage HTML.
  • Extract HTML with CSS selector “html”.
  • Clean HTML content by removing extra spaces and truncating to 10,000 characters.
  • Send cleaned text to OpenAI with a prompt to get value proposition, industry, target audience, and market type.
  • Parse OpenAI’s JSON reply into separate fields.
  • Merge AI data with original domain info.
  • Update the Google Sheet with new insights.
  • Wait a few seconds before processing the next batch.
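
The cleaning step in the list above can be sketched as a small standalone function. This is an illustration in Python, not the contents of the workflow's Code node: the function name `clean_html` and the tag-stripping regular expressions are assumptions, and only the whitespace cleanup and the 10,000-character limit come from the steps above.

```python
import re
from html import unescape

MAX_CHARS = 10_000  # truncation limit named in the processing steps


def clean_html(raw_html: str, max_chars: int = MAX_CHARS) -> str:
    """Reduce raw HTML to compact plain text and cap its length.

    Hypothetical helper; the real Code node may clean the content differently.
    """
    # Drop script and style blocks entirely; their contents are not page copy.
    text = re.sub(r"(?is)<(script|style)\b.*?>.*?</\1\s*>", " ", raw_html)
    # Replace every remaining tag with a space so adjacent words do not fuse.
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    # Decode entities such as &amp; and collapse runs of whitespace.
    text = re.sub(r"\s+", " ", unescape(text)).strip()
    # Truncate so the prompt sent to the model stays a bounded size.
    return text[:max_chars]
```

Capping the text keeps the request size predictable; a homepage with heavy markup can otherwise be far larger than the model needs to classify the business.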

Outputs

  • Updated Google Sheet rows with new columns: Value Proposition, Industry, Target Audience, Market.
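
As a rough sketch of how the parsed AI fields could be combined with the original row before the sheet update, the Python below maps JSON keys to the four output columns. The key names (`value_proposition`, `target_audience`, and so on), the `Domain` column, and the function name `merge_row` are assumptions for illustration; the actual keys depend on the prompt inside the OpenAI node.

```python
# Assumed JSON keys mapped to the sheet columns listed under Outputs.
FIELDS = {
    "value_proposition": "Value Proposition",
    "industry": "Industry",
    "target_audience": "Target Audience",
    "market": "Market",
}


def merge_row(original: dict, insights: dict) -> dict:
    """Return a copy of the original row with the four insight columns added.

    Missing keys become empty strings so every row has the same columns.
    """
    row = dict(original)  # copy, so the input row is left unchanged
    for key, column in FIELDS.items():
        row[column] = insights.get(key, "")
    return row


# Hypothetical usage with a partial reply:
row = merge_row({"Domain": "example.com"}, {"industry": "Software", "market": "B2B"})
```

Filling missing fields with empty strings is one possible design choice; it keeps partial replies from shifting columns in the sheet.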

Beginner step-by-step: How to build this in n8n

1. Import the workflow

  1. Download the workflow file by clicking the Download button on this page.
  2. Go to the n8n editor and click “Import from File”.
  3. Select the downloaded workflow and import it.

2. Configure credentials and settings

  1. Add Google Sheets OAuth2 credentials to allow reading and writing.
  2. Add OpenAI API Key credentials.
  3. Check and update the Google Sheet ID and Sheet Name if different.
  4. In the HTTP Request node, verify that the URL is built from the domain with the correct “https://” prefix.
  5. Review the prompt inside the OpenAI node and adjust the industry list if your target sectors differ.

3. Test the workflow

  1. Run the workflow manually by clicking Execute.
  2. Verify the Google Sheet updates with extracted company data.

4. Activate for production

  1. After confirming the test works, turn on the workflow by clicking “Activate”.
  2. Optionally add triggers to schedule runs or integrate into other systems.

For users who want full control over API keys and data, self-hosting n8n on a VPS is an option.


Customization ideas

  • Change the industry list inside the OpenAI prompt to better fit target sectors.
  • Adjust the Wait node time to speed up or slow down batch processing.
  • Modify the CSS selector in the HTML Extract node for cleaner or different sections of the page.
  • Increase the slice length in the Clean Content code node to send more text to OpenAI.
  • Change batch size in the Split in Batches node based on API limits and workflow speed.
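
To see how batch size and wait time interact, the following Python sketch mimics the Split In Batches and Wait pattern outside n8n. The function names and the injectable `sleep` parameter are assumptions made for illustration; in the workflow itself these values are node settings, not code.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(
    domains: Sequence[str],
    handle: Callable[[str], str],
    batch_size: int = 1,
    wait_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run `handle` on each domain, pausing between batches but not after the last."""
    results: List[str] = []
    chunks = list(batches(domains, batch_size))
    for index, chunk in enumerate(chunks):
        results.extend(handle(domain) for domain in chunk)
        if index < len(chunks) - 1:
            sleep(wait_seconds)
    return results
```

With N domains there are ceil(N / batch_size) batches and one pause between each pair, so the total added delay is roughly (batches - 1) × wait_seconds. A larger batch size shortens the run but sends more requests between pauses.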

Edge cases and common errors

HTTP Request fails with 404 or timeout

Cause: Some domains may lack the “https://” prefix or redirect unexpectedly.

Solution: Make sure all domains have a protocol prefix or edit the HTTP Request URL to add it.
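
One way to apply this fix before the request is to normalize each value from the sheet. The Python sketch below is an assumption about how that could look; `normalize_url` is a hypothetical helper, and in n8n the same logic would typically live in an expression or a Code node.

```python
def normalize_url(domain: str) -> str:
    """Return the domain with a protocol prefix, defaulting to https://."""
    value = domain.strip()
    if not value:
        raise ValueError("empty domain")
    # Leave values that already carry a protocol untouched.
    if value.lower().startswith(("http://", "https://")):
        return value
    # Strip leading slashes from protocol-relative values such as //example.com.
    return "https://" + value.lstrip("/")


# Hypothetical usage:
url = normalize_url(" example.com ")
```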

OpenAI node returns invalid JSON or no response

Cause: Prompt formatting problems or API quota exceeded.

Solution: Check the prompt syntax and your OpenAI quota. Enable “Continue on Fail” so that one bad response does not stop the entire workflow.
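
A tolerant parser can also reduce failures when the model wraps its JSON in extra text. The Python below is a minimal sketch under that assumption; `parse_reply` is a hypothetical helper and is not part of the OpenAI node.

```python
import json
import re
from typing import Optional


def parse_reply(reply: str) -> Optional[dict]:
    """Return the JSON object found in a model reply, or None if there is none."""
    # Take the span from the first "{" to the last "}" so that surrounding
    # prose or code fences around the JSON are ignored.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # Only a JSON object can be split into named fields.
    return parsed if isinstance(parsed, dict) else None
```

Returning `None` instead of raising lets the caller skip or flag the row and move on, which mirrors what “Continue on Fail” does at the node level.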

Google Sheets update does not show changes

Cause: Wrong match column or missing write access.

Solution: Confirm the column names in the Google Sheets node and check that the OAuth permissions allow updates.


Summary

✓ The workflow automatically reads domains and gets company insights.

✓ It cleans and processes website content for OpenAI.

✓ AI returns structured business details added back to Google Sheets.

→ Reduces manual effort and errors in researching company data.

→ Helps marketing and research teams update databases fast.


Frequently Asked Questions

It fetches HTML from each domain, cleans the content, and sends it to OpenAI to generate business insights.
A column with company domain URLs, including the protocol such as “https://” if required.
Yes, the prompt inside the OpenAI node can be edited to add or modify the list of industries.
Import the workflow, add credentials, test it, then activate. Use the Wait node to avoid API limits.
