Opening Problem Statement
Meet Sarah, a content marketer at a busy HR tech firm. Every day she needs to publish engaging, accurate company stories on LinkedIn to support her team's recruitment marketing. Manually scraping LinkedIn data, filtering out errors, and crafting concise company stories takes hours and produces inconsistent results. Wasted time accumulates, opportunities slip by, and the marketing team's output suffers.
This is a particularly thorny problem: LinkedIn data changes constantly, scraping it involves both legal and technical hurdles, and turning raw data into engaging narratives demands AI capabilities. Sarah needs a tailored solution that automates the entire workflow, from data extraction to intelligent summary generation, without manual intervention.
What This Automation Does
This unique n8n workflow tackles Sarah’s challenge by integrating Bright Data’s web scraping API and Google Gemini AI models to craft company stories effortlessly. When run, this workflow:
- Triggers a LinkedIn company data scrape using Bright Data’s snapshot API based on a configured company URL.
- Monitors the scraping progress and waits for completion automatically, reducing manual polling.
- Downloads the scraped JSON snapshot of company data once ready, ensuring fresh, indexed data delivery.
- Uses n8n’s LangChain nodes to intelligently extract structured information from the raw LinkedIn JSON data.
- Employs Google Gemini’s advanced AI models to convert extracted data into a comprehensive company story.
- Generates a concise summary from the detailed story using advanced summarization chains, improving readability.
- Automatically sends the detailed story and summary to configured webhook endpoints for downstream applications or notifications.
By automating these steps, the workflow saves Sarah and her team several hours each week, eliminates scraping errors through automated checks, and ensures company stories are consistently high-quality and ready for use.
Prerequisites ⚙️
- n8n account (cloud or self-hosted) 🔌
- Bright Data API account with access to datasets API for LinkedIn scraping 🔑
- Google PaLM (Google Gemini) API credentials for access to Gemini chat models 🔑
- Webhook URL for receiving story and summary notifications (e.g., webhook.site) 📡
- Basic knowledge of LinkedIn company URLs to customize the scraper input 🌐
Step-by-Step Guide
1. Start with the Manual Trigger Node
In the n8n editor, open the Manual Trigger node labeled “When clicking ‘Test workflow’”. This node manually starts the workflow for testing and development. No parameters are needed here—simply click “Execute Workflow” in the editor to begin the process.
Expected outcome: Workflow starts and passes control to the next node to set the LinkedIn URL.
Common mistake: Forgetting to click ‘Execute Workflow’ means nothing starts.
2. Set the LinkedIn Company URL
Next, the Set LinkedIn URL node assigns the target LinkedIn company page URL to scrape. Open the node named Set LinkedIn URL and enter a field called url with a value like https://il.linkedin.com/company/bright-data.
This URL directly affects what company data is pulled from LinkedIn.
Expected outcome: The URL is stored in the workflow’s JSON payload for subsequent requests.
Common mistake: Using an incorrect or private LinkedIn URL causes failed scraping.
3. Trigger LinkedIn Data Scraping with Bright Data
The Perform LinkedIn Web Request node sends a POST request to Bright Data’s dataset trigger endpoint to start scraping.
- URL: https://api.brightdata.com/datasets/v3/trigger
- Method: POST
- Body: a JSON array containing the LinkedIn URL field from the previous step.
- Query parameters: dataset_id=gd_l1vikfnt1wgvvqz95w (the Bright Data dataset for LinkedIn company data) and include_errors=true.
- Authentication: Header Auth with your Bright Data API key.
Expected outcome: A snapshot ID is returned indicating the scraping job has started.
Common mistake: Incorrect dataset ID or invalid credentials causing HTTP 401/403 errors.
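Outside n8n, the trigger call above can be sketched in Python. This is a minimal illustration of how the node's pieces fit together; the helper only assembles the request (no network call is made), and YOUR_API_KEY is a placeholder:

```python
# Sketch of the HTTP call the "Perform LinkedIn Web Request" node issues.
# The endpoint and dataset_id come from the workflow; the API key is a placeholder.

TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"
DATASET_ID = "gd_l1vikfnt1wgvvqz95w"

def build_trigger_request(linkedin_url: str, api_key: str) -> dict:
    """Assemble the pieces of the dataset-trigger request."""
    return {
        "url": TRIGGER_URL,
        "params": {"dataset_id": DATASET_ID, "include_errors": "true"},
        "headers": {"Authorization": f"Bearer {api_key}"},
        # Bright Data expects a JSON array of input objects, one per page to scrape.
        "json": [{"url": linkedin_url}],
    }

req = build_trigger_request("https://il.linkedin.com/company/bright-data", "YOUR_API_KEY")
```

Passing these pieces to an HTTP client (or mirroring them in the n8n HTTP Request node's fields) is what kicks off the scraping job.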
4. Store the Snapshot ID
The Set Snapshot Id node captures the snapshot ID from the previous response and assigns it as snapshot_id for future API calls.
Expected outcome: Snapshot ID is stored in workflow context for polling.
Common mistake: Failing to map the snapshot ID correctly causes every subsequent step to fail.
5. Poll the Scraping Job Status
The Check Snapshot Status node performs GET requests on Bright Data’s progress API endpoint https://api.brightdata.com/datasets/v3/progress/{{ $json.snapshot_id }}.
If the status is not ready, the workflow loops into the Wait for 30 seconds node to pause execution before rechecking.
Expected outcome: Automatic wait/retry ensures the workflow only proceeds after data is ready.
Common mistake: Not configuring the condition to detect “ready” status leads to infinite or premature requests.
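The poll-and-wait logic of this step can be sketched as a small loop. Here get_status stands in for the GET request on the progress endpoint, so the sketch runs without any network access:

```python
import time

def wait_until_ready(get_status, snapshot_id, delay_seconds=30, max_attempts=20):
    """Poll until the snapshot reports 'ready', pausing between attempts.

    get_status is any callable returning the status string for a snapshot;
    in the workflow this is a GET on
    https://api.brightdata.com/datasets/v3/progress/<snapshot_id>.
    """
    for _ in range(max_attempts):
        status = get_status(snapshot_id)
        if status == "ready":
            return True
        if status == "failed":
            raise RuntimeError(f"Snapshot {snapshot_id} failed")
        time.sleep(delay_seconds)
    return False

# Simulated run: the job becomes ready on the third poll (delay set to 0 for the demo).
responses = iter(["running", "running", "ready"])
ready = wait_until_ready(lambda _id: next(responses), "s_abc123", delay_seconds=0)
```

Capping the attempts, as max_attempts does here, is the safeguard against the infinite-loop mistake noted above.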
6. Download the Finished Snapshot
Once the snapshot is marked as ready, the Download Snapshot HTTP Request node downloads the scraped data in JSON format for processing.
Expected outcome: Full LinkedIn company profile JSON is fetched for extraction.
Common mistake: Missing authorization headers results in failed data fetch.
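As a sketch, the download request can be assembled like this. The exact endpoint path is an assumption on my part; confirm it against the Bright Data datasets API documentation for your account:

```python
# Sketch of the snapshot download request. The /snapshot/<id> path is an
# assumption; verify it in the Bright Data datasets API docs.

API_BASE = "https://api.brightdata.com/datasets/v3"

def build_download_request(snapshot_id: str, api_key: str) -> dict:
    """Assemble the GET request that fetches the finished snapshot as JSON.

    The Authorization header is easy to forget and is the usual cause of a
    failed fetch at this step.
    """
    return {
        "url": f"{API_BASE}/snapshot/{snapshot_id}",
        "params": {"format": "json"},
        "headers": {"Authorization": f"Bearer {api_key}"},
    }

req = build_download_request("s_abc123", "YOUR_API_KEY")
```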
7. Extract Structured Company Info
The LinkedIn Data Extractor LangChain Information Extractor node receives the raw JSON and is instructed—with a system prompt—to formulate a detailed company story incorporating all metadata.
Expected outcome: Structured, human-readable company story as output.
Common mistake: Poor system prompt detail or incorrect input data disrupts extraction.
8. Generate a Concise Summary
The Concise Summary Generator LangChain Summarization Chain node takes the detailed story output to generate a neat summary.
Expected outcome: A brief, readable summary is created for quick consumption.
Common mistake: Failure to properly map input/output breaks chain flow.
9. Notify via Webhook
The workflow ends with two Webhook Notifier HTTP Request nodes sending the story and summary payloads to configured endpoints like webhook.site for live monitoring or integration.
Expected outcome: External systems or users instantly receive the generated content.
Common mistake: If you forget to configure your webhook URL, the notification output is silently lost.
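The payload the notifier nodes POST can be sketched as below. The story and summary field names are illustrative assumptions, not fixed by the workflow; shape the body to match whatever your receiving endpoint expects:

```python
import json

def build_notification(story: str, summary: str) -> str:
    """Serialize the detailed story and its summary into one JSON body to POST.

    Field names here are illustrative; adjust them for your receiver.
    """
    return json.dumps({"story": story, "summary": summary})

payload = build_notification("Full company story...", "Short summary.")
```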
Customizations ✏️
- Change LinkedIn URL dynamically: In the Set LinkedIn URL node, replace the hardcoded URL with an incoming webhook or environment variable to automate different companies.
- Adjust wait time: In the Wait for 30 seconds node, modify the delay duration to suit dataset size or API rate limits.
- Enhance story tone: Modify systemPromptTemplate in the LinkedIn Data Extractor node to make stories more formal, casual, or creative.
- Send notifications to Slack or email: Replace webhook notifiers with Slack or Gmail nodes to directly inform team members.
- Use different Google Gemini models: Experiment with other Gemini model names in the Google Gemini Chat Model nodes to leverage various AI capabilities.
Troubleshooting 🔧
Problem: “401 Unauthorized or 403 Forbidden Errors from Bright Data API”
Cause: Invalid or expired API key or incorrect header authentication setup.
Solution: Re-check Bright Data API credentials in n8n under HTTP Request node authentication. Ensure header keys are correct and active.
Problem: “Snapshot status never changes to ‘ready’”
Cause: Dataset processing delay or wrong snapshot ID mapping.
Solution: Verify correct snapshot ID mapping in Set Snapshot Id. Increase wait time in wait node. Check Bright Data API status online.
Problem: “AI model returns incomplete or irrelevant story”
Cause: Poor prompt design or incomplete input data.
Solution: Refine system prompt in LinkedIn Data Extractor. Confirm JSON input is complete and correctly mapped.
Pre-Production Checklist ✅
- Confirm Bright Data and Google Gemini API credentials are valid and active.
- Test the LinkedIn company URL is publicly accessible and correct.
- Ensure snapshot ID is extracted and passed properly.
- Verify correct conditions in If nodes for status and error handling.
- Test the entire workflow manually and watch logs for errors.
- Backup n8n workflow and credentials securely.
Deployment Guide
Once fully tested, activate the workflow using the toggle in the top-right corner of n8n’s editor. If automatic periodic refreshes are needed, swap the manual trigger for a Schedule Trigger (cron) node.
Monitor workflow executions and errors via the n8n dashboard to catch any issues early. Logs will help understand runtime behavior.
FAQs
Can I use other scraping services instead of Bright Data?
Yes, but you would need to adjust the HTTP request nodes and API credentials accordingly. The unique polling and snapshot handling may differ.
Does this workflow consume many API credits?
It depends on Bright Data and Google Gemini usage plans. Frequent scraping and AI calls add up, so optimize running frequency.
Is my data safe using this automation?
n8n ensures data security in transit using HTTPS and your API keys remain private in node credentials. However, safeguard webhook URLs and credentials carefully.
Can this workflow scale for hundreds of companies?
Yes. With a queuing mechanism and sufficient API quota, you can batch-process URLs by adapting the Set LinkedIn URL node to accept incoming data dynamically.
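A hypothetical batch driver might feed items like this into the workflow; both company URLs below are purely illustrative:

```python
# Hypothetical batch driver: the company URLs below are illustrative examples.
company_urls = [
    "https://il.linkedin.com/company/bright-data",
    "https://www.linkedin.com/company/example-co",
]

def enqueue(urls):
    """Yield one workflow input item per company URL, mirroring how the
    Set LinkedIn URL node would receive items from an upstream trigger."""
    for url in urls:
        yield {"url": url}

items = list(enqueue(company_urls))
```

Each yielded item would then flow through the same trigger, poll, download, and summarize steps described above, one company at a time.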
Conclusion
With this detailed n8n workflow, Sarah can now automatically extract LinkedIn company data, generate engaging stories using Google Gemini AI, and produce succinct summaries with zero manual effort. This saves her hours each week, reduces errors, and boosts her HR content marketing significantly.
Next steps could include automating personalized job postings or combining with social media schedulers to broadcast stories automatically.
By mastering this automation, you unlock powerful storytelling with AI integrated deeply into modern data extraction workflows.