Opening Problem Statement
Meet Sarah, a market researcher who spends hours every week gathering detailed information about potential clients and competitor companies on LinkedIn. Her manual process involves copying data from LinkedIn profiles one by one, then piecing this information together into meaningful stories for her reports. This tedious task costs her roughly 8-10 hours weekly, with frequent errors and inconsistencies creeping in due to manual copying and formatting. The delay in data processing also impacts decision-making speed at her company.
Sarah urgently needs an automated, reliable solution that can scrape both individual LinkedIn profiles and company pages, convert raw data into readable stories, and save the outputs in organized files—all without violating LinkedIn’s terms. This exact challenge is what the “LinkedIn Web Scraping with Bright Data MCP Server & Google Gemini” n8n workflow is designed to solve.
What This Automation Does
When this workflow runs, it accomplishes the following:
- Scrapes detailed LinkedIn person profiles using the Bright Data MCP (Model Context Protocol) Client and its LinkedIn-specific scraping tool.
- Scrapes comprehensive LinkedIn company profiles similarly, extracting structured data about companies.
- Processes and merges the scraped company data with the help of an intelligent JSON-to-story extractor node, transforming raw data into engaging company stories or blog posts.
- Leverages Google Gemini, an advanced AI language model, to further enhance the storytelling quality, making the profiles more insightful and readable.
- Saves scraped data for both individuals and companies as JSON files on disk for easy access and future use.
- Sends data to configured webhook endpoints for additional third-party handling or integration.
This automation reduces Sarah’s data collection and formatting time from hours to just minutes, eliminating errors and freeing her to focus on analysis rather than tedious data gathering.
Prerequisites ⚙️
- n8n Automation Platform account (cloud or self-hosted)
- Bright Data MCP Client API credentials (brightdata.com) for web scraping proxies and tools 🔐
- Google Gemini (PaLM) API credentials for AI language processing 🔑
- Webhook endpoint URL (for receiving scraped data)
- Basic file system access on the machine running n8n to save JSON output files 📁
Step-by-Step Guide
1. Trigger the workflow manually
Start by opening your n8n editor and clicking the Manual Trigger node labeled “When clicking ‘Test workflow’”. This node activates the workflow for testing.
What to do: Click the “Execute Workflow” button.
Expected outcome: The workflow begins running and moves to the next nodes.
Common mistake: Forgetting to run the workflow manually will cause no data to be processed.
2. Learn about available scraping tools
The next two MCP Client nodes, “List all available tools for Bright Data” and “List all tools for Bright Data”, call the MCP API to retrieve the scraping tools available to you.
What to do: Check the node output to see available LinkedIn scraping tools.
Expected outcome: You will find tools named web_data_linkedin_person_profile and web_data_linkedin_company_profile available for later use.
Common mistake: Not having valid MCP API credentials configured will cause failures here.
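For orientation, the tool listing typically contains entries along these lines. This is an abridged, illustrative sketch: only the two LinkedIn tool names come from this workflow; the other fields may differ in your output.

```javascript
// Abridged, illustrative shape of the tool listing. Only the two LinkedIn tool
// names are taken from this workflow; other fields are assumptions.
const tools = [
  { name: 'web_data_linkedin_person_profile', description: 'Scrape a LinkedIn person profile' },
  { name: 'web_data_linkedin_company_profile', description: 'Scrape a LinkedIn company profile' },
  // ...other Bright Data tools returned by the MCP API
];
```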
3. Set URLs for LinkedIn person and company profiles
Two Set nodes named “Set the URLs” and “Set the LinkedIn Company URL” are configured with example LinkedIn profile URLs.
What to do: Replace the URLs with your target LinkedIn person and company profile URLs. Also verify the webhook URLs where scraped data will be sent.
Expected outcome: The nodes output JSON data with your URLs ready for scraping.
Common mistake: Entering invalid or private LinkedIn URLs can cause scraping errors or incomplete data.
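For reference, the item emitted by “Set the URLs” looks roughly like the sketch below. The field names (url, webhook_url) are assumptions for illustration; keep them consistent with whatever the downstream nodes reference.

```javascript
// Illustrative item produced by "Set the URLs" (field names are assumptions;
// match them to the expressions used in later nodes).
const personItem = {
  url: 'https://www.linkedin.com/in/example-person/',
  webhook_url: 'https://your-endpoint.example.com/linkedin-person',
};
```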
4. Scrape LinkedIn person profile with Bright Data MCP Client
The node “Bright Data MCP Client For LinkedIn Person” uses the MCP API to scrape the individual profile page.
Configuration: The tool name is set to web_data_linkedin_person_profile, with the URL passed dynamically from the Set node.
Expected outcome: The node returns scraped content in Markdown format within JSON.
Common mistake: Misconfigured API credentials or incorrect tool parameters will cause failure at this stage.
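Conceptually, the node boils down to a tool name plus a parameters object whose URL is injected with an n8n expression. The sketch below is illustrative; the exact parameter field names depend on the MCP Client node version you have installed.

```javascript
// Sketch of the MCP Client call (parameter names are assumptions; check the
// node's own fields in your n8n version).
const toolName = 'web_data_linkedin_person_profile';
const toolParameters = JSON.stringify({
  // The URL is pulled from the "Set the URLs" node via an n8n expression.
  url: '={{ $json.url }}',
});
```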
5. Send scraped person data to webhook and save locally
The “Webhook for LinkedIn Person Web Scraper” HTTP Request node posts scraped data to the specified webhook URL. Then a Function node encodes this JSON data into a binary buffer, followed by a Read & Write File node saving the data as “d:\LinkedIn-Person.json” on disk.
Expected outcome: Person profile data is sent externally and stored locally as JSON.
Common mistake: File path issues or permission errors on saving files can disrupt this step.
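A minimal sketch of the encoding step, written as a Code node equivalent of the workflow’s Function node, is shown below. The binary property name data and the file name are assumptions; align them with the Read & Write File node’s input binary field.

```javascript
// Sketch of the JSON-to-binary step (Code node, "Run Once for All Items").
// The binary property name "data" and the file name are assumptions; match
// them to the Read & Write File node configuration.
const item = $input.first();
const jsonString = JSON.stringify(item.json, null, 2);

return [
  {
    json: item.json,
    binary: {
      data: {
        data: Buffer.from(jsonString, 'utf8').toString('base64'),
        mimeType: 'application/json',
        fileName: 'LinkedIn-Person.json',
      },
    },
  },
];
```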
6. Scrape LinkedIn company profile with Bright Data MCP Client
Similarly, the “Bright Data MCP Client For LinkedIn Company” node scrapes company data using the web_data_linkedin_company_profile tool and URL from the Set node.
Expected outcome: You get raw company profile data back as Markdown-wrapped text inside the JSON response.
Common mistake: Incorrect parameters or network issues may cause API call failures.
7. Parse company profile content with Code node
The “Code” node parses the JSON string from the previous node’s Markdown-wrapped output, extracting the actual company profile content.
// Parse the JSON string embedded in the MCP response and return it as the node output.
const jsonContent = JSON.parse($input.first().json.result.content[0].text);
return jsonContent;
Expected outcome: Node output is structured JSON of company data.
Common mistake: Invalid JSON string or wrong input indexing leads to errors here.
8. Extract detailed company story with LinkedIn Data Extractor node
This specialized node receives the parsed JSON data and generates a rich company story, converting the raw data into a readable narrative via a LangChain Information Extractor node.
Configuration: The Text field contains a prompt that writes a full company story from the JSON input, and the node requires company_story as its output attribute.
Expected outcome: Well-formed company story JSON is produced.
Common mistake: Mismatch in expected attributes or missing input JSON.
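If you want to tune the narrative style, the prompt in the Text field can be as simple as the sketch below. The wording is illustrative rather than the workflow’s exact prompt; only the company_story attribute is dictated by the node’s output schema.

```javascript
// Illustrative prompt for the Information Extractor's Text field. The wording
// is an assumption; {{ ... }} marks an n8n expression placeholder.
const prompt = `Write a detailed, engaging company story from the following
LinkedIn company profile data. Describe what the company does, its industry,
size, and anything notable, as a readable narrative suitable for a blog post.

Company data:
{{ JSON.stringify($json) }}`;
```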
9. Combine person and company stories with Merge & Aggregate nodes
Person and company stories/outputs are merged and aggregated together for final packaging.
Expected outcome: One combined JSON object containing both individual and company info.
Common mistake: Incorrect merge configuration causing data loss or empty outputs.
10. Post aggregated company story to webhook and save locally
The “Webhook for LinkedIn Company Web Scraper” HTTP Request node sends aggregated company data to the webhook URL. Then a corresponding Function node encodes the JSON for storage and a Read & Write File node saves it as “d:\LinkedIn-Company.json”.
Expected outcome: Company data successfully posted and saved.
Common mistake: Incorrect webhook URL or file permissions stopping the process.
Customizations ✏️
- Change LinkedIn URLs: In the “Set the URLs” and “Set the LinkedIn Company URL” nodes, update the url parameter to scrape any new LinkedIn person or company profiles.
- Modify output file paths: In the Write to disk nodes, change the fileName to save JSON output anywhere on your system.
- Switch AI model: Replace the Google Gemini Chat Model node with another AI node compatible with n8n if desired, adjusting the prompt accordingly.
- Add more data fields: Enhance the MCP client tool parameters to scrape additional LinkedIn profile details if supported.
- Webhook integration: Change the webhook URLs in Set nodes to send data to your own API endpoints or services for further automation.
Troubleshooting 🔧
Problem: “MCP Client API authentication failed”
Cause: Invalid or expired API credentials for Bright Data MCP Client.
Solution: Go to Credentials → MCP Client API in n8n and update with valid access token from Bright Data dashboard.
Problem: “JSON.parse error in Code node”
Cause: Input JSON string malformed or unexpected structure from the MCP Client node.
Solution: Inspect the MCP node output and verify the content is the expected Markdown-wrapped JSON string. Adjust the code in the “Code” node if the API response structure has changed.
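If the response shape drifts, a slightly more defensive version of the Code node makes the failure obvious. This is a sketch; the result.content[0].text path mirrors the current response and may need adjusting.

```javascript
// Defensive variant of the parsing step. The result.content[0].text path is
// taken from the current MCP response; adjust it if Bright Data changes it.
const raw = $input.first().json?.result?.content?.[0]?.text;
if (!raw) {
  throw new Error('MCP response is missing result.content[0].text');
}

let jsonContent;
try {
  jsonContent = JSON.parse(raw);
} catch (err) {
  throw new Error(`MCP response is not valid JSON: ${err.message}`);
}

return jsonContent;
```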
Problem: “File write permission denied”
Cause: Workflow user running n8n lacks write permissions to the specified disk folder.
Solution: Change file path to a directory where you have access or update OS permissions for n8n service.
Pre-Production Checklist ✅
- Verify MCP Client API credentials are correct and active 🔐
- Confirm Google Gemini API key is valid and has quota available 🔑
- Test webhook endpoints to ensure they receive HTTP POST requests properly
- Run the workflow in manual mode and observe each node’s output for errors
- Verify local disk path accessibility for file writing
Deployment Guide
Once testing confirms smooth operation, activate the workflow in n8n using the Active toggle at the top right of the editor.
Schedule the workflow using n8n’s cron features or trigger it via your systems to run at regular intervals as needed.
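For example, a Schedule Trigger with a standard cron expression covers the common weekly-refresh case (the time here is arbitrary):

```javascript
// Example cron expression for n8n's Schedule Trigger: every Monday at 08:00.
const weeklyCron = '0 8 * * 1';
```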
Enable error notifications in n8n to monitor for any failure during scrape runs, ensuring timely troubleshooting.
FAQs
- Can I scrape LinkedIn profiles without the Bright Data MCP Client? Not reliably. LinkedIn actively blocks automated scraping, and the Bright Data MCP Client provides the proxy rotation and compliance tooling this workflow relies on.
- Does this consume a lot of API credits? Yes, scraping LinkedIn pages usually consumes MCP API calls which could incur costs based on volume.
- Is the data secure? All credentials and data are managed within your private n8n instance. Use HTTPS webhook URLs for secure transmissions.
Conclusion
After building this workflow, you’ve automated the complex task of scraping detailed LinkedIn person and company profiles, processing the data into engaging stories, and archiving it systematically.
This automation saves users like Sarah around 8-10 hours weekly, reducing errors and speeding data availability for strategic decisions.
Next steps could include extending this workflow to scrape other platforms, incorporate sentiment analysis on company reviews, or automate outreach via email based on scraped data.
Give it a try and see how much manual effort you can reclaim!