Automate Company Profile Extraction with n8n and OpenAI

This workflow uses n8n and OpenAI to extract each company's value proposition and business classification (industry, target audience, and market type) directly from its website, cutting down on manual research and data entry.
Workflow Identifier: 1457
Nodes in use: Manual Trigger, Google Sheets, Split In Batches, HTTP Request, HTML Extract, Code, OpenAI, Merge, Wait

What this workflow does

The workflow reads a list of website domains from a Google Sheet, fetches the homepage HTML of each site, cleans it to plain text, and sends it to OpenAI. OpenAI returns key company details (value proposition, industry, target audience, and market type), which the workflow writes back into the same sheet. This replaces most of the manual web research and data entry.


Who should use this workflow

It suits marketing teams and researchers who collect company profiles from many websites and want business information without browsing each site by hand. It works best for users with basic n8n skills, access to the Google Sheet, and an OpenAI API key.


Tools and services used

  • Google Sheets: Stores input domain list and output company data.
  • HTTP Request node: Fetches website HTML content.
  • HTML Extract node: Extracts the page HTML with a CSS selector.
  • Code node: Cleans HTML content to plain text.
  • OpenAI node: Generates company business insights.
  • Merge node: Combines original and AI data.
  • Wait node: Pauses between batches to avoid rate limits.

Inputs, processing steps, and outputs

Inputs

  • List of company domains from a Google Sheet column.

Processing steps

  • Split the domain list into batches with the Split In Batches node.
  • Send an HTTP request to fetch each homepage's HTML.
  • Extract the page content with the CSS selector “html”.
  • Clean the content by collapsing extra whitespace and truncating it to 10,000 characters.
  • Send the cleaned text to OpenAI with a prompt that asks for the value proposition, industry, target audience, and market type.
  • Parse OpenAI’s JSON reply into separate fields.
  • Merge the AI data with the original domain row.
  • Update the Google Sheet with the new insights.
  • Wait a few seconds before processing the next batch.
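The cleaning step runs inside an n8n Code node whose source is not reproduced on this page. As an illustration only, the Python sketch below performs comparable cleaning outside n8n: it strips tags, decodes HTML entities, collapses whitespace, and truncates to 10,000 characters. Dropping <script> and <style> blocks first is an extra assumption, added because their contents are not readable page text.

```python
import html
import re

MAX_CHARS = 10_000  # truncation length used by the workflow's cleaning step


def clean_html(raw_html: str, max_chars: int = MAX_CHARS) -> str:
    """Turn raw page HTML into a single line of plain text for the prompt."""
    # Assumption: remove <script> and <style> blocks, whose contents are not page text
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", raw_html)
    # Replace every remaining tag with a space so adjacent words don't merge
    text = re.sub(r"<[^>]+>", " ", text)
    # Decode entities such as &amp; and &nbsp;
    text = html.unescape(text)
    # Collapse runs of whitespace (including non-breaking spaces) into one space
    text = re.sub(r"\s+", " ", text).strip()
    return text[:max_chars]


sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Acme &amp; Co</h1>\n\n<p>We   build   widgets.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(clean_html(sample))  # Acme & Co We build widgets.
```

Regex tag stripping is a rough heuristic, adequate for feeding text to a language model but not a full HTML parser. Raising `max_chars` corresponds to increasing the slice length in the Clean Content node, at the cost of more tokens per request.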

Outputs

  • Updated Google Sheet rows with new columns: Value Proposition, Industry, Target Audience, Market.
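The Merge step joins the AI output with the original row, so each sheet row keeps its domain and gains the four output columns. A minimal Python sketch of that mapping follows; the snake_case JSON keys (`value_proposition` and so on) are hypothetical and must match whatever keys your prompt asks OpenAI to return.

```python
# Hypothetical JSON keys in the OpenAI reply, mapped to the sheet's output columns.
KEY_TO_COLUMN = {
    "value_proposition": "Value Proposition",
    "industry": "Industry",
    "target_audience": "Target Audience",
    "market": "Market",
}


def merge_row(original: dict, insights: dict) -> dict:
    """Return a copy of the original row with the AI fields added as sheet columns."""
    merged = dict(original)  # keep the domain and any existing columns untouched
    for key, column in KEY_TO_COLUMN.items():
        # A missing field becomes an empty cell instead of raising KeyError
        merged[column] = insights.get(key, "")
    return merged


row = merge_row(
    {"Domain": "https://example.com"},
    {"value_proposition": "Faster invoicing", "industry": "SaaS",
     "target_audience": "Small businesses", "market": "B2B"},
)
print(row)
```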

Beginner step-by-step: How to build this in n8n

1. Import the workflow

  1. Download the workflow file by clicking the Download button on this page.
  2. Open the n8n editor and choose “Import from File”.
  3. Select the downloaded workflow and import it.

2. Configure credentials and settings

  1. Add Google Sheets OAuth2 credentials to allow reading and writing.
  2. Add OpenAI API Key credentials.
  3. Check and update the Google Sheet ID and Sheet Name if different.
  4. In the HTTP Request node, verify that the URL is built from the domain with the correct “https://” prefix.
  5. If you need a different set of industries, edit the prompt inside the OpenAI node.

3. Test the workflow

  1. Run the workflow manually by clicking Execute.
  2. Verify the Google Sheet updates with extracted company data.

4. Activate for production

  1. Once the test run succeeds, add a trigger such as a Schedule Trigger, because a workflow that has only a Manual Trigger runs only when you execute it by hand.
  2. Activate the workflow. Optionally, integrate it into other systems.

For full control over API keys and data, self-hosting n8n on a VPS is an option.


Customization ideas

  • Change the industry list inside the OpenAI prompt to better fit target sectors.
  • Adjust the Wait node time to speed up or slow down batch processing.
  • Modify the CSS selector in the HTML Extract node to capture a cleaner or different section of the page.
  • Increase the slice length in the Clean Content code node to send more text to OpenAI.
  • Change the batch size in the Split In Batches node to balance API limits against processing speed.
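Batch size and Wait time together act as a simple throttle: process a batch, pause, repeat. The Python sketch below mirrors that loop outside n8n to show the trade-off; `handle` is a placeholder for the fetch, clean, and OpenAI steps, not part of the workflow, and `sleep` is injectable only so the loop can be tested without real pauses.

```python
import time
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batches(items: List[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items (like Split In Batches)."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(
    items: List[T],
    size: int,
    handle: Callable[[T], R],
    pause_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Handle every item, pausing between batches (like the Wait node)."""
    results: List[R] = []
    chunks = list(batches(items, size))
    for index, chunk in enumerate(chunks):
        results.extend(handle(item) for item in chunk)
        if index < len(chunks) - 1:  # no pause needed after the last batch
            sleep(pause_seconds)
    return results


domains = ["a.com", "b.com", "c.com"]
print(process_in_batches(domains, size=2, handle=str.upper, pause_seconds=0.1))
# ['A.COM', 'B.COM', 'C.COM']
```

For example, 100 domains with a batch size of 5 give 20 batches and 19 pauses; at 10 seconds per pause, the waits alone add 190 seconds to the run.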

Edge cases and common errors

HTTP Request fails with 404 or timeout

Cause: Some domains lack the “https://” prefix, or the site redirects unexpectedly.

Solution: Make sure every domain includes a protocol prefix, or edit the HTTP Request URL so it adds one.
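If the sheet mixes bare domains and full URLs, normalizing them before the request avoids the missing-protocol failure. A minimal Python sketch, assuming HTTPS is the right default for bare domains:

```python
def to_url(domain: str) -> str:
    """Prefix a bare domain with https://; leave existing http(s) URLs alone."""
    value = domain.strip()
    if not value:
        raise ValueError("empty domain")
    # Case-insensitive check, so "HTTP://..." also counts as having a protocol
    if value.lower().startswith(("http://", "https://")):
        return value
    # Drop leading slashes from protocol-relative values such as "//cdn.example.org"
    return "https://" + value.lstrip("/")


for d in ["example.com", " https://acme.io ", "//cdn.example.org"]:
    print(to_url(d))
```

Inside n8n, the same check can live in the HTTP Request URL expression, for instance `{{ $json.domain.startsWith('http') ? $json.domain : 'https://' + $json.domain }}`, assuming the sheet column is named `domain` (a hypothetical name; use your own).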

OpenAI node returns invalid JSON or no response

Cause: Prompt formatting problems or API quota exceeded.

Solution: Check the prompt syntax and your OpenAI quota. Enable “Continue on Fail” so that one failed item does not stop the entire workflow.
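Models sometimes wrap JSON in Markdown code fences or reply with plain prose, which breaks a strict parse. The tolerant parse below is a Python illustration, not the workflow's own parsing code: it strips an optional fence and returns None instead of raising, so the row can be flagged rather than halting the run.

```python
import json
import re
from typing import Optional

# Optional leading fence (three backticks, optionally followed by "json")
# and optional trailing fence; `{3} means exactly three backticks.
FENCE = re.compile(r"^`{3}(?:json)?\s*|\s*`{3}$", re.IGNORECASE)


def parse_reply(reply: str) -> Optional[dict]:
    """Return the reply as a dict, or None if it is not a JSON object."""
    text = FENCE.sub("", reply.strip())
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # caller can skip or flag this row instead of failing the run
    # A JSON array or bare string is valid JSON but not the expected object
    return data if isinstance(data, dict) else None


print(parse_reply('{"industry": "SaaS", "market": "B2B"}'))
print(parse_reply("Sorry, I cannot read that page."))
```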

Google Sheets update does not show changes

Cause: Wrong match column or missing write access.

Solution: Confirm that the column names in the Google Sheets node match the sheet's headers, and that the OAuth credentials have permission to edit the sheet.
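A quick way to rule out a header mismatch is to compare the sheet's header row with the columns the workflow writes. In the Python sketch below, the match column name "Domain" is a hypothetical placeholder for whichever column your Google Sheets node matches on, and the case-sensitive comparison is a deliberately strict assumption.

```python
from typing import Iterable, List

# Output columns named in this guide; "Domain" below is a hypothetical match column.
OUTPUT_COLUMNS = ["Value Proposition", "Industry", "Target Audience", "Market"]


def missing_columns(header_row: Iterable[str], match_column: str = "Domain") -> List[str]:
    """List required headers absent from the sheet's first row.

    Surrounding whitespace is ignored, but the comparison is otherwise exact
    and case-sensitive, so "industry" does not satisfy "Industry".
    """
    present = {str(cell).strip() for cell in header_row}
    required = [match_column] + OUTPUT_COLUMNS
    return [name for name in required if name not in present]


print(missing_columns(["Domain", "Industry", "Market "]))
# ['Value Proposition', 'Target Audience']
```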


Summary

✓ The workflow automatically reads domains and gets company insights.

✓ It cleans and processes website content for OpenAI.

✓ AI returns structured business details added back to Google Sheets.

→ Reduces manual effort and errors when researching company data.

→ Helps marketing and research teams keep company databases up to date quickly.


Frequently Asked Questions

What does this workflow do?
It fetches the HTML from each domain, cleans the content, and sends it to OpenAI to generate business insights.

What does the Google Sheet need to contain?
A column of company domains, including the protocol (such as “https://”) if the HTTP Request node does not add it.

Can I change the list of industries?
Yes. Edit the prompt inside the OpenAI node to add or modify industries.

How do I get started?
Import the workflow, add credentials, test it, then activate it. Use the Wait node to stay within API rate limits.