Automate LinkedIn Data Scraping with Bright Data & Google Gemini

This workflow automates scraping detailed LinkedIn person and company profiles using Bright Data’s MCP Server combined with AI processing via Google Gemini. It replaces a manual, error-prone extraction process by gathering, structuring, and saving profile data automatically.
Workflow Identifier: 1719
NODES in Use: manualTrigger, stickyNote, set, mcpClient, httpRequest, code, merge, aggregate, function, readWriteFile, informationExtractor, lmChatGoogleGemini

Opening Problem Statement

Meet Sarah, a market researcher who spends hours every week gathering detailed information about potential clients and competitor companies on LinkedIn. Her manual process involves copying data from LinkedIn profiles one by one, then piecing this information together into meaningful stories for her reports. This tedious task costs her roughly 8-10 hours weekly, with frequent errors and inconsistencies creeping in due to manual copying and formatting. The delay in data processing also impacts decision-making speed at her company.

Sarah urgently needs an automated, reliable solution that can scrape both individual LinkedIn profiles and company pages, convert raw data into readable stories, and save the outputs in organized files—all without violating LinkedIn’s terms. This exact challenge is what the “LinkedIn Web Scraping with Bright Data MCP Server & Google Gemini” n8n workflow is designed to solve.

What This Automation Does

When this workflow runs, it accomplishes the following:

  • Scrapes detailed LinkedIn person profiles using Bright Data’s Model Context Protocol (MCP) client and its LinkedIn-specific scraping tool.
  • Scrapes comprehensive LinkedIn company profiles similarly, extracting structured data about companies.
  • Processes and merges the scraped company data with the help of an intelligent JSON-to-story extractor node, transforming raw data into engaging company stories or blog posts.
  • Leverages Google Gemini, an advanced AI language model, to further enhance the storytelling quality, making the profiles more insightful and readable.
  • Saves scraped data for both individuals and companies as JSON files on disk for easy access and future use.
  • Sends data to configured webhook endpoints for additional third-party handling or integration.

This automation reduces Sarah’s data collection and formatting time from hours to just minutes, eliminating errors and freeing her to focus on analysis rather than tedious data gathering.

Prerequisites ⚙️

  • n8n Automation Platform account (cloud or self-hosted)
  • Bright Data MCP Client API credentials (brightdata.com) for web scraping proxies and tools 🔐
  • Google Gemini (PaLM) API credentials for AI language processing 🔑
  • Webhook endpoint URL (for receiving scraped data)
  • Basic file system access on the machine running n8n to save JSON output files 📁

Step-by-Step Guide

1. Trigger the workflow manually

Open your n8n editor and locate the Manual Trigger node labeled “When clicking ‘Test workflow’”. This node starts the workflow for testing.

What to do: Click the “Execute Workflow” button.

Expected outcome: The workflow begins running and moves to the next nodes.

Common mistake: Forgetting to run the workflow manually will cause no data to be processed.

2. Learn about available scraping tools

The next two MCP Client nodes, “List all available tools for Bright Data” and “List all tools for Bright Data”, call the MCP API to retrieve the scraping tools available to your account.

What to do: Check the node output to see available LinkedIn scraping tools.

Expected outcome: You will find tools named web_data_linkedin_person_profile and web_data_linkedin_company_profile available for later use.

Common mistake: Not having valid MCP API credentials configured will cause failures here.
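
The exact output depends on the MCP Client node version, but the tool list typically looks something like this (a sketch; the descriptions are illustrative):

[
  { "name": "web_data_linkedin_person_profile", "description": "Scrape a LinkedIn person profile" },
  { "name": "web_data_linkedin_company_profile", "description": "Scrape a LinkedIn company profile" }
]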

3. Set URLs for LinkedIn person and company profiles

Two Set nodes named “Set the URLs” and “Set the LinkedIn Company URL” are configured with example LinkedIn profile URLs.

What to do: Replace the URLs with your target LinkedIn person and company profile URLs. Also verify the webhook URLs where scraped data will be sent.

Expected outcome: The nodes output JSON data with your URLs ready for scraping.

Common mistake: Entering invalid or private LinkedIn URLs can cause scraping errors or incomplete data.
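
For reference, the Set nodes output JSON along these lines (field names are illustrative; match them to whatever your downstream nodes reference):

{
  "url": "https://www.linkedin.com/in/some-person/",
  "webhook_url": "https://your-webhook.example.com/linkedin-person"
}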

4. Scrape LinkedIn person profile with Bright Data MCP Client

The node “Bright Data MCP Client For LinkedIn Person” uses the MCP API to scrape the individual profile page.

Configuration: Tool name is set to web_data_linkedin_person_profile, URL passed dynamically from the Set node.

Expected outcome: The node returns scraped content in Markdown format within JSON.

Common mistake: Misconfigured API credentials or incorrect tool parameters will cause failure at this stage.
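
Conceptually, the node issues a tool call like the following (a sketch; the MCP Client node assembles this from its tool name and parameter fields):

{
  "tool": "web_data_linkedin_person_profile",
  "parameters": { "url": "{{ $json.url }}" }
}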

5. Send scraped person data to webhook and save locally

The “Webhook for LinkedIn Person Web Scraper” HTTP Request node posts the scraped data to the specified webhook URL. A Function node then encodes the JSON into a binary buffer, and a Read & Write File node saves it to disk as “d:\LinkedIn-Person.json”.

Expected outcome: Person profile data is sent externally and stored locally as JSON.

Common mistake: File path issues or permission errors on saving files can disrupt this step.
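
A minimal sketch of the Function node that encodes the JSON for the Read & Write File node (property names follow n8n’s binary-data convention; adjust fileName to your own path):

// Serialize the incoming items and expose them as base64 binary data
const payload = JSON.stringify(items.map(item => item.json), null, 2);

return [{
  json: {},
  binary: {
    data: {
      data: Buffer.from(payload).toString('base64'),
      mimeType: 'application/json',
      fileName: 'LinkedIn-Person.json',
    },
  },
}];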

6. Scrape LinkedIn company profile with Bright Data MCP Client

Similarly, the “Bright Data MCP Client For LinkedIn Company” node scrapes company data using the web_data_linkedin_company_profile tool and URL from the Set node.

Expected outcome: The node returns raw company profile data as markdown text wrapped in a JSON response.

Common mistake: Incorrect parameters or network issues may cause API call failures.

7. Parse company profile content with Code node

The “Code” node parses the JSON text returned by the previous node, extracting the actual company profile content:

// Parse the JSON string embedded in the MCP Client response
const jsonContent = JSON.parse($input.first().json.result.content[0].text);
return jsonContent;

Expected outcome: Node output is structured JSON of company data.

Common mistake: Invalid JSON string or wrong input indexing leads to errors here.
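
If the API response shape is unstable, a more defensive variant of the same code (a sketch) fails with clearer error messages:

// Guard against missing fields before parsing
const text = $input.first().json?.result?.content?.[0]?.text;
if (!text) {
  throw new Error('MCP response is missing result.content[0].text');
}
try {
  return JSON.parse(text);
} catch (e) {
  throw new Error(`MCP payload is not valid JSON: ${e.message}`);
}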

8. Extract detailed company story with LinkedIn Data Extractor node

This specialized node receives the parsed JSON data and generates a rich company story, converting the raw data into a readable narrative via n8n’s LangChain Information Extractor node.

Configuration: The text field uses a prompt that writes a full company story from the JSON input, and the node is configured to return a single output attribute named company_story.

Expected outcome: Well-formed company story JSON is produced.

Common mistake: Mismatch in expected attributes or missing input JSON.
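
The prompt wired into the extractor might read something like this (wording is illustrative; keep the attribute name in sync with the node’s output schema):

Write an engaging, factual company story from the following LinkedIn
company profile data, covering history, products, and culture.

Profile data: {{ JSON.stringify($json) }}

Return the narrative in a single attribute named company_story.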

9. Combine person and company stories with Merge & Aggregate nodes

Person and company stories/outputs are merged and aggregated together for final packaging.

Expected outcome: One combined JSON object containing both individual and company info.

Common mistake: Incorrect merge configuration causing data loss or empty outputs.
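
After merging and aggregating, the combined output might take a shape like this (a sketch; the actual keys depend on your Merge and Aggregate settings):

{
  "person_profile": { "name": "…", "headline": "…", "experience": [] },
  "company_story": "A narrative generated from the company profile…"
}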

10. Post aggregated company story to webhook and save locally

The “Webhook for LinkedIn Company Web Scraper” HTTP Request node sends the aggregated company data to the webhook URL. A corresponding Function node then encodes the JSON for storage, and a Read & Write File node saves it as “d:\LinkedIn-Company.json”.

Expected outcome: Company data successfully posted and saved.

Common mistake: Incorrect webhook URL or file permissions stopping the process.

Customizations ✏️

  • Change LinkedIn URLs: In the Set the URLs and Set the LinkedIn Company URL nodes, update the url parameter to scrape any new LinkedIn person or company profiles.
  • Modify output file paths: In the Write to disk nodes, change the fileName to save JSON output anywhere on your system.
  • Switch AI model: Replace the Google Gemini Chat Model node with another AI node compatible with n8n if desired, adjusting the prompt accordingly.
  • Add more data fields: Enhance the MCP client tool parameters to scrape additional LinkedIn profile details if supported.
  • Webhook integration: Change the webhook URLs in Set nodes to send data to your own API endpoints or services for further automation.

Troubleshooting 🔧

Problem: “MCP Client API authentication failed”

Cause: Invalid or expired API credentials for Bright Data MCP Client.

Solution: Go to Credentials → MCP Client API in n8n and update it with a valid access token from the Bright Data dashboard.

Problem: “JSON.parse error in Code node”

Cause: Input JSON string malformed or unexpected structure from the MCP Client node.

Solution: Inspect the MCP node output and verify the content is the expected JSON string. If the API response format has changed, adjust the code in the “Code” node to match.

Problem: “File write permission denied”

Cause: Workflow user running n8n lacks write permissions to the specified disk folder.

Solution: Change the file path to a directory where n8n has write access, or update the OS permissions for the n8n service.

Pre-Production Checklist ✅

  • Verify MCP Client API credentials are correct and active 🔐
  • Confirm Google Gemini API key is valid and has quota available 🔑
  • Test webhook endpoints to ensure they receive HTTP POST requests properly (see the snippet after this list)
  • Run the workflow in manual mode and observe each node’s output for errors
  • Verify local disk path accessibility for file writing
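
A quick way to smoke-test a webhook endpoint is a short Node.js script (assumes Node 18+ for the global fetch; the URL is a placeholder for your own endpoint):

// test-webhook.js: send a sample POST and report the response status
const WEBHOOK_URL = 'https://your-webhook.example.com/linkedin-person';

fetch(WEBHOOK_URL, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ test: true, source: 'n8n-linkedin-workflow' }),
})
  .then(res => console.log('Webhook responded with status', res.status))
  .catch(err => console.error('Webhook test failed:', err.message));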

Deployment Guide

Once testing confirms smooth operation, activate the workflow by toggling it to Active at the top right of the n8n editor.

Schedule the workflow using n8n’s Schedule Trigger (cron) or invoke it from your own systems to run at regular intervals as needed.
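
For example, a Schedule Trigger with the cron expression 0 8 * * 1-5 would run the scrape every weekday at 08:00; pick an interval that balances data freshness against your MCP API budget.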

Enable error notifications in n8n to monitor for any failure during scrape runs, ensuring timely troubleshooting.

FAQs

  • Can I scrape LinkedIn profiles without Bright Data MCP Client? No. LinkedIn actively blocks automated scraping, so direct requests are unreliable. Bright Data’s MCP Server provides the proxy rotation and compliance tooling needed for dependable access.
  • Does this consume a lot of API credits? Yes. Each scraped page consumes MCP API calls, which can incur costs depending on your volume.
  • Is the data secure? All credentials and data are managed within your private n8n instance. Use HTTPS webhook URLs for secure transmissions.

Conclusion

After building this workflow, you’ve automated the complex task of scraping detailed LinkedIn person and company profiles, processing the data into engaging stories, and archiving it systematically.

This automation saves users like Sarah around 8-10 hours weekly, reducing errors and making data available faster for strategic decisions.

Next steps could include extending this workflow to scrape other platforms, incorporating sentiment analysis on company reviews, or automating email outreach based on the scraped data.

Give it a try and see how much manual effort you can reclaim!

