Extract Social Media Links with n8n and OpenAI AI Crawler

This powerful n8n workflow automates the extraction of social media profile links from company websites using AI-driven crawling and URL scraping. It solves the tedious manual task of gathering social media data, saving hours and improving accuracy.
Workflow Identifier: 2165
NODES in Use: toolWorkflow, lmChatOpenAi, outputParserStructured, set, manualTrigger, supabase, httpRequest, html, splitOut, removeDuplicates, filter, aggregate, markdown, merge, stickyNote

What this workflow does

This workflow fetches company info from a database, crawls company websites using AI, extracts social media links, and saves the results back to the database.

It cuts manual work by gathering social media profiles automatically for many companies at once.

The main goal is to replace hours of clicking and copying URLs with an accurate list of social media links ready for analysis.


How the workflow works: Inputs, Process, and Output

Inputs

The workflow starts by getting company names and website URLs from a Supabase table called companies_input.

Processing steps

  • Get company data: The Supabase Get All node fetches all rows from the input table.
  • Focus fields: A Set node keeps only name and website to make processing clearer.
  • AI crawl: The LangChain AI agent (Crawl website) uses GPT-4o to read the company’s website.
  • Get page text: The Text tool workflow requests the website HTML and converts it to markdown for easy text processing by AI.
  • Extract links: The URLs tool workflow collects the href of every <a> tag, then removes duplicates and invalid URLs.
  • Parse AI output: The AI returns a JSON listing social media platform names and URLs; the LangChain JSON Parser checks this format and outputs an array.
  • Combine data: The extracted social media array is merged with original company info in a Merge node.
  • Save result: A Supabase Insert node writes the social media profiles into the companies_output table for each company.
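
The link-extraction step above can be sketched outside n8n as follows. This is a minimal illustration using only the Python standard library, not the workflow's actual node configuration; the names LinkCollector and extract_links are hypothetical, and the rule for what counts as a "bad" URL (anything that is not an absolute http or https link) is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return unique, absolute http(s) links in first-seen order."""
    collector = LinkCollector()
    collector.feed(html)
    seen = set()
    links = []
    for href in collector.hrefs:
        href = href.strip()
        # Skip empty values and in-page anchors such as "#top".
        if not href or href.startswith("#"):
            continue
        # Resolve relative paths like "/about" against the page URL.
        absolute = urljoin(base_url, href)
        parsed = urlparse(absolute)
        # Drop mailto:, tel:, and other non-web schemes.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

Calling extract_links(page_html, "https://example.com") returns a de-duplicated list that could then be handed to the AI agent alongside the page text.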

Output

The output is saved rows in the companies_output Supabase table showing company names alongside found social media profile URLs.


Who should use this workflow

This workflow is good for anyone needing social media data from many company websites quickly.

It helps marketing analysts, researchers, or anyone tired of clicking links and copying from every webpage manually.

No deep coding needed; if you can use n8n to run workflows and set API keys, it can save you many hours.


Tools and services used

  • n8n: Automates tasks in visual workflows.
  • Supabase: Stores input company data and output results.
  • OpenAI GPT-4o API: Powers the AI web crawler agent.
  • HTTP Request nodes: Fetch website content and HTML.
  • HTML Extraction and Markdown nodes: Get links and clean text before AI processing.
  • LangChain JSON Parser: Ensures AI output matches expected JSON format.

Beginner step-by-step: How to use this workflow in n8n production

1. Import the workflow

  1. Download the workflow file using the Download button on this page.
  2. In the n8n editor, choose Import from File and select the downloaded workflow.

2. Configure credentials

  1. Set your OpenAI API key in the credential used by the OpenAI chat model node.
  2. Configure Supabase API Key and URL credentials to connect your database.
  3. If needed, update table names or database schema field names to match your setup.

3. Check prompts and URLs

  1. Review the LangChain AI agent (Crawl website) node prompt for social media extraction.
  2. Adjust the prompt text if needed. Use the copy block below to update easily:

Extract social media profile URLs like Facebook, Twitter, LinkedIn, Instagram from this website content and links. Return a JSON array listing platform names and URLs only.
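
To see what a reply matching this prompt should look like, the following sketch checks a model reply before it is stored. It is an illustration only: the field names "platform" and "url" are assumptions (the parser schema in the workflow may use different keys), and parse_social_links is a hypothetical helper, not part of n8n or LangChain.

```python
import json


def parse_social_links(raw):
    """Parse the model's reply and return a list of {platform, url} dicts.

    Raises ValueError when the reply is not the expected JSON array.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array at the top level")
    cleaned = []
    for entry in data:
        if not isinstance(entry, dict):
            raise ValueError("each entry must be a JSON object")
        # Assumed keys; adjust to match the schema you configure.
        platform = entry.get("platform")
        url = entry.get("url")
        if not isinstance(platform, str) or not isinstance(url, str):
            raise ValueError("each entry needs string 'platform' and 'url' fields")
        cleaned.append({"platform": platform.strip(), "url": url.strip()})
    return cleaned
```

A reply such as [{"platform": "LinkedIn", "url": "https://www.linkedin.com/company/acme"}] passes, while free-form text or a bare object raises ValueError, which mirrors the kind of mismatch the JSON parser node is meant to catch.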

4. Test the workflow

  1. Manually trigger the workflow using the Manual Trigger node.
  2. Check the Supabase companies_output table to confirm that the social media links were saved.

5. Activate for production

  1. After confirming that the tests succeed, toggle the workflow to active.
  2. Add a Schedule Trigger or Webhook node if you want periodic or event-driven runs; the Manual Trigger only runs from the editor.

If you self-host n8n, refer to the n8n self-hosting documentation for best practices.


Common mistakes and edge cases

  • Forgetting the URL protocol (http/https) may cause failed HTTP requests.
  • Wrong or missing API keys cause errors in Supabase or OpenAI nodes.
  • The AI agent might respond with invalid JSON if the prompt or JSON schema does not match the output.
  • Websites that block robots or scrapers return HTTP 403 or time out. Configure a proxy or set a User-Agent header on the HTTP Request nodes to work around this.
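
Two of these issues, the missing protocol and the missing User-Agent header, can be illustrated with a short sketch. It uses only the Python standard library and builds the request without sending it; normalize_url, build_request, and the example User-Agent string are hypothetical, and defaulting to https is an assumption. In the workflow itself the equivalent settings live on the HTTP Request node.

```python
from urllib.request import Request


def normalize_url(website):
    """Prefix https:// when the stored website value has no http(s) scheme."""
    website = website.strip()
    if not website.lower().startswith(("http://", "https://")):
        # Assumption: default to https when the protocol is missing.
        website = "https://" + website
    return website


def build_request(website, user_agent="Mozilla/5.0 (compatible; example-crawler)"):
    """Build (but do not send) a GET request with an explicit User-Agent."""
    return Request(normalize_url(website), headers={"User-Agent": user_agent})
```

A value such as "acme.com" becomes "https://acme.com", and the resulting request carries a User-Agent header, which some sites require before they respond to automated clients.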

Customization ideas

  • Change AI prompt to extract emails, phone numbers, or company descriptions instead of social media links.
  • Replace the Supabase nodes with Airtable, Google Sheets, or MySQL if you prefer another data store.
  • Enable proxy support in HTTP Request nodes to bypass website restrictions.
  • Make the crawler follow multiple pages inside the same domain for more thorough data.
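
For the multi-page idea, a crawler first needs to decide which discovered links stay on the company's own site. The sketch below shows one possible rule, assuming that a leading "www." is ignored and that other subdomains count as different sites; same_domain_links is a hypothetical helper, not an n8n feature.

```python
from urllib.parse import urlparse


def _host(url):
    """Lower-cased hostname with a leading 'www.' removed."""
    hostname = (urlparse(url).hostname or "").lower()
    return hostname[4:] if hostname.startswith("www.") else hostname


def same_domain_links(links, start_url):
    """Keep only links whose host matches the start URL's host."""
    target = _host(start_url)
    return [link for link in links if _host(link) == target]
```

Links that pass this filter could be queued for another crawl pass, while external links (including the social media profiles themselves) are left for extraction only.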

Summary and outcome

✓ Quickly get social media profiles from many company websites without manual clicking.

✓ Save complete and clean data back to your database automatically.

✓ Save hours of tedious manual work each week.

→ Have accurate social media datasets ready for marketing or analysis.

→ Easily build on this workflow for other web data extraction needs.


Frequently Asked Questions

To set up the workflow, download the workflow file and open the n8n editor. Use Import from File to load it, then add your OpenAI and Supabase API keys and update any database table names if needed.
If the AI returns invalid JSON, check that the JSON schema in the LangChain JSON Parser matches the AI output format exactly. Simplify or correct the AI prompt to get well-formed JSON.
403 errors happen when websites block automated requests. Adding User-Agent headers or using proxy settings in the HTTP Request nodes usually fixes this.
With proper API rate limits and database handling, this workflow can batch process large company lists efficiently.
