Autonomous AI Social Media Crawler with n8n and LangChain

Discover how to automatically crawl company websites and extract social media profile links using n8n, a LangChain AI agent, and Supabase. This workflow saves hours of manual research and organizes data systematically.
Workflow Identifier: 2064
Nodes in Use: manualTrigger, supabase, set, agent, lmChatOpenAi, outputParserStructured, httpRequest, html, splitOut, removeDuplicates, aggregate, markdown, merge

What This Workflow Does

This workflow takes a list of company websites.

It finds all social media links from those websites automatically.

The output is a clean list of social profile URLs per company.

This saves a lot of time compared to searching websites manually.

It helps marketing teams get accurate social data fast.

The workflow starts by fetching companies from a Supabase database.

Then an AI agent crawls every site to extract text and links.

The AI picks out social media URLs from all collected links.

Results are organized in JSON format.

The workflow stores the data back into a Supabase table.

This process runs with little human input.


Tools and Services Used

  • n8n Automation Platform: Runs the workflow and nodes.
  • Supabase: Stores company input data and output social links.
  • OpenAI GPT-4 API: Provides the language model that drives the agent's crawling and link-extraction decisions.
  • LangChain AI Agent: Runs the autonomous crawler using retrieval tools.

Inputs, Processing Steps, and Outputs

Inputs

  • Company names and website URLs from the Supabase companies_input table.

Processing Steps

  • Retrieve all company records with their websites.
  • Use a Set node to keep only the company name and website.
  • Run the LangChain AI agent to crawl each website.
  • The AI uses a text retrieval tool to get readable content.
  • The AI uses a URL retrieval tool to find and clean all links on the page.
  • The AI identifies which URLs are social media profiles.
  • Parse the AI's JSON output into structured data via the Structured Output Parser node (outputParserStructured).
  • Combine company info and extracted social URLs into one object.
  • Insert the structured data back into Supabase companies_output table.
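
The link-handling steps above can be sketched outside n8n. This is a minimal illustration, not the workflow's actual node logic: it collects every link on a page, makes it absolute, removes duplicates, and keeps links whose host is on a small allow-list. The SOCIAL_DOMAINS set and both function names are assumptions made for this example; in the workflow itself the AI agent decides which links are social profiles.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical allow-list for illustration; the workflow lets the AI decide instead.
SOCIAL_DOMAINS = {
    "facebook.com", "instagram.com", "linkedin.com",
    "twitter.com", "x.com", "youtube.com", "tiktok.com",
}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute, de-duplicated http(s) links in page order."""
    collector = LinkCollector()
    collector.feed(html)
    seen, links = set(), []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href.strip())
        # Skip mailto:, tel:, javascript: and similar non-web links.
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def social_links(links):
    """Group links by social platform domain (exact host match, ignoring 'www.')."""
    grouped = {}
    for link in links:
        host = (urlparse(link).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host in SOCIAL_DOMAINS:
            grouped.setdefault(host, []).append(link)
    return grouped
```

The exact host match is deliberately strict, so subdomains such as mobile variants would be missed. That gap is one reason an AI-based classifier can be the more flexible choice.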

Outputs

  • Stored structured JSON with company names, websites, and their social media profile URLs.
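
As a rough picture of what one stored record might look like, the sketch below combines company info with grouped social links. The field names (company_name, company_website, social_media, platform, urls) are assumptions for illustration; check the schema in the Structured Output Parser node and your companies_output table for the real ones.

```python
import json


def build_record(company_name, website, grouped_links):
    """Combine company info and social URLs into one JSON-serializable object.

    grouped_links maps a platform name to a list of profile URLs.
    All field names here are hypothetical.
    """
    social_media = [
        {"platform": platform, "urls": sorted(set(urls))}
        for platform, urls in sorted(grouped_links.items())
    ]
    return {
        "company_name": company_name,
        "company_website": website,
        "social_media": social_media,
    }


if __name__ == "__main__":
    record = build_record(
        "Acme",
        "https://acme.test",
        {"linkedin": ["https://www.linkedin.com/company/acme"]},
    )
    print(json.dumps(record, indent=2))
```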

Beginner Step-by-Step: How to Use This Workflow in n8n Production

Step 1: Download and Import the Workflow

  1. Click the Download button on this page to save the workflow file.
  2. Open the n8n editor.
  3. Use Import from File to upload the downloaded workflow.

Step 2: Configure Credentials and Settings

  1. Add your OpenAI API Key in n8n credentials.
  2. Set Supabase credentials with your project keys.
  3. Update table names or fields if needed to match your database schema.
  4. Check the prompt text and any URLs inside the LangChain AI Agent node and adjust them if necessary.

Step 3: Test the Workflow

  1. Run the workflow using the Manual Trigger node.
  2. Check execution logs for any errors.
  3. Verify results appear in your Supabase output table.
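
To verify the output table from outside n8n, one option is to read a few rows through Supabase's REST interface. The sketch below only assumes the table name companies_output from this guide; the project URL and key are placeholders you must supply, and both function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_output_query(project_url, api_key, table="companies_output", limit=5):
    """Build a GET request for the first rows of the output table."""
    query = urlencode({"select": "*", "limit": limit}, safe="*")
    url = f"{project_url.rstrip('/')}/rest/v1/{table}?{query}"
    headers = {
        "apikey": api_key,
        "Authorization": f"Bearer {api_key}",
    }
    return Request(url, headers=headers)


def fetch_rows(request, timeout=10):
    """Send the request and decode the JSON array of rows."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (placeholders, not real credentials):
# rows = fetch_rows(build_output_query("https://YOUR-PROJECT.supabase.co", "YOUR-KEY"))
# print(len(rows), "rows found")
```

An empty list means the table exists but the workflow has not written anything yet. An HTTP 401 error usually points to a wrong or expired key.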

Step 4: Activate Workflow for Production

  1. Change the Manual Trigger to a scheduled trigger for automatic runs.
  2. Monitor execution logs regularly.
  3. Consider self-hosting n8n for better security and scalability.

Customization Ideas

  • Change the AI prompt in the crawler node to extract emails or phone numbers instead of social media links.
  • Add proxy settings in HTTP request nodes for sites blocking direct access.
  • Increase crawl depth by modifying the embedded tool logic for deeper navigation.
  • Switch Supabase nodes to other databases like MySQL if preferred.
  • Extend the Structured Output Parser schema to extract metadata like social profile descriptions.
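
To make the crawl-depth idea concrete, here is a small depth-limited, breadth-first crawl. It is a sketch of the general technique, not the workflow's embedded tool logic: the get_links callable stands in for whatever fetches a page and returns its links, and max_depth and max_pages are hypothetical parameters.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, get_links, max_depth=1, max_pages=20):
    """Breadth-first crawl that follows same-host links up to max_depth.

    get_links(url) must return the list of absolute links found on that page.
    Returns every distinct link seen, in discovery order.
    """
    start_host = urlparse(start_url).hostname
    queue = deque([(start_url, 0)])
    visited = set()
    seen_links = set()
    found = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        for link in get_links(url):
            if link not in seen_links:
                seen_links.add(link)
                found.append(link)
            # Only follow links on the company's own host, and only while
            # the depth budget allows it.
            same_host = urlparse(link).hostname == start_host
            if same_host and depth < max_depth and link not in visited:
                queue.append((link, depth + 1))
    return found
```

With max_depth=0 only the start page is read. Raising it lets the crawler reach pages such as a contact or about page, where social links often live, while max_pages keeps the run bounded.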

Common Problems and How to Fix Them

  • Supabase returns empty data or auth errors.

    Check credentials and regenerate API keys as needed.
  • AI fails to find social URLs or returns partial output.

    Loosen the prompt constraints or add a proxy to bypass site blocks.
  • HTTP Request nodes time out or fail.

    Increase timeouts, add retries or use proxies.
  • Workflow stops unexpectedly.

    Check the execution logs to find which node failed, then fix that node's specific error.
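
The timeout and retry advice above follows a common pattern: set an explicit timeout, retry a few times, and wait longer between attempts. The sketch below shows that pattern in plain Python for testing a site outside n8n. The function names and default values are assumptions for this example and are not n8n settings.

```python
import time
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Fetch a page with an explicit timeout (in seconds)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_with_retries(fetch, url, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on network errors with exponential backoff.

    Waits base_delay, then 2x, then 4x, and so on between attempts.
    Re-raises the last error once all attempts have failed.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError as error:  # URLError and timeouts are OSError subclasses
            last_error = error
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Usage: html = fetch_with_retries(fetch_html, "https://example.com")
```

If a site still fails after several attempts with a generous timeout, it is likely blocking automated requests, which is where a proxy becomes worth trying.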

Pre-Production Checklist

  • Ensure Supabase tables companies_input and companies_output exist.
  • Confirm the OpenAI API key is active and usable.
  • Test HTTP requests independently to confirm website access.
  • Try manual prompts in OpenAI playground to verify AI prompt quality.
  • Run tests with a small sample of companies.

Summary of Benefits and Outcome

✓ Saves up to 20 hours weekly in manual website social media link collection.

✓ Consistent, accurate extraction of social media profile URLs.

✓ Easy storage and querying of unified data in Supabase.

✓ Automates tedious tasks with minimal human intervention.

✓ Flexible prompts allow extraction of other contact info as needed.


Frequently Asked Questions

The user must add company names and website URLs into the Supabase table named companies_input.
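If the AI fails to find social URLs or returns partial output, the user should loosen the AI prompt constraints or add proxy settings in HTTP requests to bypass any blocks.
The workflow can also extract other data: by editing the AI prompt and the Structured Output Parser schema, the user can extract emails, phone numbers, or other contact data.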
The user changes the manual trigger node to a scheduled trigger and activates the workflow to run automatically.
