What this workflow does
This workflow fetches company info from a database, crawls company websites using AI, extracts social media links, and saves the results back to the database.
It cuts down manual work by gathering social media profiles automatically for many companies at once.
The goal is to replace hours of clicking through sites and copying URLs with an accurate list of social media links, ready for analysis.
How the workflow works: Inputs, Process, and Output
Inputs
The workflow starts by getting company names and website URLs from a Supabase table called companies_input.
Processing steps
- Get company data: The Supabase Get All node fetches all rows from the input table.
- Focus fields: A Set node keeps only name and website to make processing clearer.
- AI crawl: The LangChain AI agent (Crawl website) uses GPT-4o to read the company’s website.
- Get page text: The Text tool workflow requests the website HTML and converts it to markdown for easy text processing by the AI.
- Extract links: The URLs tool workflow grabs all <a> tag hrefs and removes duplicates and bad URLs.
- Parse AI output: The AI returns JSON listing social media platform names and URLs; the LangChain JSON Parser validates this format and outputs an array.
- Combine data: The extracted social media array is merged with original company info in a Merge node.
- Save result: A Supabase Insert node writes the social media profiles into the companies_output table for each company.
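The extract-links step above can be sketched as a small function. This is an illustrative sketch only, not the workflow's actual node code: the real workflow uses n8n's HTML Extraction node, and the regex-based parsing here is an assumption made to keep the example self-contained.

```javascript
// Sketch of the "Extract links" step: pull <a> hrefs from raw HTML,
// then drop duplicates and non-http(s) or malformed URLs.
function extractLinks(html, baseUrl) {
  const hrefs = [...html.matchAll(/<a\s[^>]*href=["']([^"']+)["']/gi)]
    .map((m) => m[1]);
  const cleaned = new Set();
  for (const href of hrefs) {
    try {
      // Resolve relative links against the page URL; throws on bad input.
      const url = new URL(href, baseUrl);
      // Filter out mailto:, javascript:, tel:, etc.
      if (url.protocol === "http:" || url.protocol === "https:") {
        cleaned.add(url.href);
      }
    } catch {
      // Skip hrefs that cannot be parsed as URLs at all.
    }
  }
  return [...cleaned];
}
```

The Set handles deduplication, and resolving against the base URL turns relative paths like /about into absolute links the AI agent can use.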
Output
The output is saved rows in the companies_output Supabase table showing company names alongside found social media profile URLs.
Who should use this workflow
This workflow suits anyone who needs social media data from many company websites quickly.
It helps marketing analysts, researchers, or anyone tired of manually opening every company site and copying links.
No deep coding is needed: if you can run n8n workflows and set API keys, this can save you many hours.
Tools and services used
- n8n: Automates tasks in visual workflows.
- Supabase: Stores input company data and output results.
- OpenAI GPT-4o API: Powers the AI web crawler agent.
- HTTP Request nodes: Fetch website content and HTML.
- HTML Extraction and Markdown nodes: Get links and clean text before AI processing.
- LangChain JSON Parser: Ensures AI output matches expected JSON format.
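The JSON Parser's job can be illustrated with a minimal sketch. The platform and url field names below are assumptions for illustration, not the workflow's exact schema:

```javascript
// Sketch of what the JSON Parser step enforces: the AI reply must be a
// JSON array of { platform, url } objects; anything else is rejected
// or filtered out before the data reaches the database.
function parseSocialProfiles(aiReply) {
  const data = JSON.parse(aiReply); // throws if the model returned prose
  if (!Array.isArray(data)) {
    throw new Error("Expected a JSON array of profiles");
  }
  // Keep only entries with both expected string fields.
  return data.filter(
    (p) => typeof p.platform === "string" && typeof p.url === "string"
  );
}
```

This kind of validation is what prevents a chatty or malformed model response from writing garbage rows into Supabase.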
Beginner step-by-step: How to use this workflow in n8n production
1. Import the workflow
- Download the workflow file using the Download button on this page.
- In n8n editor, choose Import from File and select the downloaded workflow.
2. Configure credentials
- Set your OpenAI API Key in the appropriate credential node.
- Configure Supabase API Key and URL credentials to connect your database.
- If needed, update table names or database schema field names to match your setup.
3. Check prompts and URLs
- Review the LangChain AI agent (Crawl website) node prompt for social media extraction.
- Adjust the prompt text if needed. Use the copy block below to update easily:
Extract social media profile URLs like Facebook, Twitter, LinkedIn, Instagram from this website content and links. Return a JSON array listing platform names and URLs only.
4. Test the workflow
- Manually trigger the workflow using the Manual Trigger node.
- Check the Supabase companies_output table to confirm the social media links were saved.
5. Activate for production
- After confirming the tests succeed, toggle the workflow to active.
- Set a schedule trigger or API trigger if you want periodic or event-driven runs.
If you are self-hosting n8n, refer to the self-host n8n docs for best practices.
Common mistakes and edge cases
- Forgetting the URL protocol (http/https) may cause failed HTTP requests.
- Wrong or missing API keys cause errors in Supabase or OpenAI nodes.
- The AI agent might respond with invalid JSON if the prompt or JSON schema does not match the output.
- Websites blocking robots or scrapers can return HTTP 403 or time out; use proxy settings or custom user-agent headers in those cases.
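The missing-protocol pitfall above can be guarded against before the HTTP Request node runs. This is a hypothetical helper, not part of the shipped workflow, and it assumes https as the default:

```javascript
// Prefix bare domains (e.g. "acme.com") with "https://" so the
// HTTP Request node gets a valid absolute URL.
function normalizeWebsite(raw) {
  const site = raw.trim();
  // Leave URLs that already carry a protocol untouched.
  if (/^https?:\/\//i.test(site)) return site;
  return "https://" + site;
}
```

Dropping this into a small Set or Code node right after the Supabase fetch avoids the most common cause of failed requests.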
Customization ideas
- Change AI prompt to extract emails, phone numbers, or company descriptions instead of social media links.
- Replace the Supabase nodes with Airtable, Google Sheets, or MySQL nodes if you prefer those services.
- Enable proxy support in HTTP Request nodes to bypass website restrictions.
- Make the crawler follow multiple pages inside the same domain for more thorough data.
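The multi-page idea above needs one safeguard: only follow links on the same domain as the start URL, so the crawler stays inside one site. A hypothetical helper sketching that check (the workflow crawls a single page by default):

```javascript
// Keep only links whose hostname matches the start URL's hostname,
// so a multi-page crawl never wanders off-site.
function sameDomainLinks(links, startUrl) {
  const host = new URL(startUrl).hostname;
  return links.filter((link) => {
    try {
      return new URL(link).hostname === host;
    } catch {
      return false; // drop malformed links instead of crashing
    }
  });
}
```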
Summary and outcome
✓ Quickly get social media profiles from many company websites without manual clicking.
✓ Save complete and clean data back to your database automatically.
✓ Save hours of tedious manual work each week.
→ Have accurate social media datasets ready for marketing or analysis.
→ Easily build on this workflow for other web data extraction needs.
