Extract Social Media Links with n8n and OpenAI AI Crawler

This n8n workflow automates the extraction of social media profile links from company websites using an AI agent that reads page text and scrapes URLs. It replaces the manual task of collecting these links site by site.
Workflow Identifier: 2165
NODES in Use: toolWorkflow, lmChatOpenAi, outputParserStructured, set, manualTrigger, supabase, httpRequest, html, splitOut, removeDuplicates, filter, aggregate, markdown, merge, stickyNote


What this workflow does

This workflow fetches company info from a database, crawls company websites using AI, extracts social media links, and saves the results back to the database.

It cuts down manual work by collecting social media profiles automatically for many companies in one run.

The main goal is to replace hours of clicking and copying URLs with a list of social media links that is ready for analysis.


How the workflow works: Inputs, Process, and Output

Inputs

The workflow starts by getting company names and website URLs from a Supabase table called companies_input.

Processing steps

  • Get company data: The Supabase Get All node fetches all rows from the input table.
  • Focus fields: A Set node keeps only name and website to make processing clearer.
  • AI crawl: The LangChain AI agent (Crawl website) uses GPT-4o to read the company’s website.
  • Get page text: The Text tool workflow requests the website HTML and converts it to markdown for easy text processing by AI.
  • Extract links: The URLs tool workflow collects the href of every <a> tag, then removes duplicates and invalid URLs.
  • Parse AI output: The AI returns a JSON listing social media platform names and URLs; the LangChain JSON Parser checks this format and outputs an array.
  • Combine data: The extracted social media array is merged with original company info in a Merge node.
  • Save result: A Supabase Insert node writes the social media profiles into the companies_output table for each company.
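
The "Extract links" step above can be sketched outside n8n as follows. This is a minimal illustration in Python, not the workflow's own implementation: the names LinkCollector and extract_links are hypothetical, and "invalid" is assumed here to mean anything that does not resolve to an absolute http or https URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return unique absolute http(s) links in first-seen order."""
    parser = LinkCollector()
    parser.feed(html)
    seen = set()
    links = []
    for href in parser.hrefs:
        # Resolve relative links such as "/about" against the page URL.
        absolute = urljoin(base_url, href.strip())
        parsed = urlparse(absolute)
        # Assumption: mailto:, tel:, javascript: and similar count as bad URLs.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

Calling extract_links(page_html, "https://acme.example") on a fetched page would give the agent a short, de-duplicated list of links to look through instead of the raw HTML.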

Output

The workflow writes one row per company to the companies_output Supabase table, listing the company name alongside the social media profile URLs it found.
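
As a rough picture of what one saved row might contain, here is a small Python sketch that combines a company record with its extracted links. The column names company_name, company_website and social_media, and the platform and url keys, are assumptions made for illustration; check the actual companies_output schema in your Supabase project.

```python
def build_output_row(company, social_links):
    """Combine one company record with its extracted social media links.

    All field names below are hypothetical, not the workflow's real schema.
    """
    return {
        "company_name": company["name"],
        "company_website": company["website"],
        "social_media": [
            {"platform": link["platform"], "url": link["url"]}
            for link in social_links
        ],
    }
```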


Who should use this workflow

This workflow is good for anyone needing social media data from many company websites quickly.

It helps marketing analysts, researchers, or anyone tired of clicking links and copying from every webpage manually.

No deep coding needed; if you can use n8n to run workflows and set API keys, it can save you many hours.


Tools and services used

  • n8n: Automates tasks in visual workflows.
  • Supabase: Stores input company data and output results.
  • OpenAI GPT-4o API: Powers the AI web crawler agent.
  • HTTP Request nodes: Fetch website content and HTML.
  • HTML Extraction and Markdown nodes: Get links and clean text before AI processing.
  • LangChain JSON Parser: Ensures AI output matches expected JSON format.

Beginner step-by-step: How to use this workflow in n8n production

1. Import the workflow

  1. Download the workflow file using the Download button on this page.
  2. In n8n editor, choose Import from File and select the downloaded workflow.

2. Configure credentials

  1. Add your OpenAI API key as an OpenAI credential and select it in the OpenAI chat model node.
  2. Configure Supabase API Key and URL credentials to connect your database.
  3. If needed, update table names or database schema field names to match your setup.

3. Check prompts and URLs

  1. Review the LangChain AI agent (Crawl website) node prompt for social media extraction.
  2. Adjust the prompt text if needed. Use the copy block below to update easily:

Extract social media profile URLs like Facebook, Twitter, LinkedIn, Instagram from this website content and links. Return a JSON array listing platform names and URLs only.
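
If you want to check replies to this prompt outside n8n, the shape it asks for can be validated with a few lines of Python. This is a sketch under assumptions: the function name parse_social_links is hypothetical, and the keys platform and url are a guess at the field names; use whatever your JSON schema in the parser node actually defines.

```python
import json


def parse_social_links(raw):
    """Parse a model reply into a list of {"platform", "url"} dicts.

    Raises ValueError if the reply is not a JSON array of such objects.
    The key names are assumed, not taken from the workflow.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    cleaned = []
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("each entry must be a JSON object")
        platform = item.get("platform")
        url = item.get("url")
        if not isinstance(platform, str) or not isinstance(url, str):
            raise ValueError("each entry needs string 'platform' and 'url' fields")
        cleaned.append({"platform": platform.strip(), "url": url.strip()})
    return cleaned
```

A reply that wraps the array in extra prose or a code fence would fail this check, which is a common reason to tighten the prompt wording.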

4. Test the workflow

  1. Manually trigger the workflow using the Manual Trigger node.
  2. Check Supabase companies_output table to see if social media links got saved.

5. Activate for production

  1. After you confirm that the tests succeed, toggle the workflow to active.
  2. Set a schedule trigger or API trigger if you want periodic or event-driven runs.

If you are self-hosting n8n, refer to the self-host n8n guide for best practices.


Common mistakes and edge cases

  • Forgetting the URL protocol (http/https) may cause failed HTTP requests.
  • Wrong or missing API keys cause errors in Supabase or OpenAI nodes.
  • The AI agent might respond with invalid JSON if the prompt or JSON schema does not match the output.
  • Websites that block robots or scrapers return HTTP 403 or time out. Setting a user-agent header or a proxy in the HTTP Request nodes can help in these cases.
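
Two of the issues above, a missing protocol and a missing user-agent header, can be handled before the request is sent. The Python sketch below shows the idea; ensure_scheme and build_request are hypothetical helper names, the user-agent string is a placeholder, and defaulting to https is an assumption that will not suit every site.

```python
from urllib.parse import urlparse
from urllib.request import Request

# Placeholder value; pick a user-agent that honestly identifies your crawler.
USER_AGENT = "Mozilla/5.0 (compatible; example-crawler/0.1)"


def ensure_scheme(website, default_scheme="https"):
    """Return the website URL with an http(s) scheme, adding one if missing."""
    website = website.strip()
    if not website:
        raise ValueError("empty website value")
    if "://" not in website:
        # Also covers protocol-relative values such as "//acme.example".
        website = f"{default_scheme}://{website.lstrip('/')}"
    parsed = urlparse(website)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {website!r}")
    return website


def build_request(website):
    """Build (but do not send) a request with a user-agent header set."""
    return Request(ensure_scheme(website), headers={"User-Agent": USER_AGENT})
```

In n8n the same effect would come from cleaning the website field before the HTTP Request node and adding the header in that node's options.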

Customization ideas

  • Change AI prompt to extract emails, phone numbers, or company descriptions instead of social media links.
  • Replace the Supabase nodes with Airtable, Google Sheets, or MySQL if you prefer one of those services.
  • Enable proxy support in HTTP Request nodes to bypass website restrictions.
  • Make the crawler follow multiple pages inside the same domain for more thorough data.
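
For the multi-page idea, the crawler needs a way to decide which extracted links belong to the same site. One possible rule is sketched below in Python: compare hostnames and ignore a leading "www.". The function name internal_pages and the limit of 10 pages are assumptions for illustration, and the rule does not treat other subdomains as the same site.

```python
from urllib.parse import urlparse


def _host(url):
    """Lower-cased hostname without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def internal_pages(links, start_url, limit=10):
    """Return up to `limit` unique links on the same host as start_url."""
    start_host = _host(start_url)
    pages = []
    for link in links:
        if len(pages) >= limit:
            break
        if _host(link) == start_host and link != start_url and link not in pages:
            pages.append(link)
    return pages
```

Capping the number of pages keeps the OpenAI token usage and the run time per company predictable.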

Summary and outcome

✓ Quickly get social media profiles from many company websites without manual clicking.

✓ Save complete and clean data back to your database automatically.

✓ Save hours of tedious manual work each week.

→ Have accurate social media datasets ready for marketing or analysis.

→ Easily build on this workflow for other web data extraction needs.

Frequently Asked Questions

Download the workflow file and open n8n editor. Use Import from File to load the workflow. Then add OpenAI and Supabase API Keys and update any database table names if needed.
Check that the JSON schema in LangChain JSON Parser matches the AI output format exactly. Simplify or correct the AI prompt to get well-formed JSON.
403 errors happen when websites block automated requests. Adding user-agent headers or using proxy settings in HTTP Request nodes usually fixes this.
Yes, with proper API rate limits and database handling, this workflow can batch process large company lists efficiently.
