Automate Indeed Company Info Extraction with Bright Data & Gemini AI

Boost your recruitment intelligence with automated extraction and summarization of Indeed company data using Bright Data and Google Gemini AI. This workflow streamlines data scraping, AI analysis, and webhook notifications to save hours of manual research.
Workflow Identifier: 2189
NODES in Use: ManualTrigger, Set, HTTP Request, Markdown, Sticky Note, Google Gemini Chat Model, Google Gemini Chat Model For Summarization, LangChain Markdown to Textual Data Extractor, LangChain Agent, LangChain Tool HTTP Request


Opening Problem Statement

Meet Lisa, a busy HR manager at a mid-sized tech company. She spends hours every week manually searching Indeed for company insights—details like reputation, employee reviews, and hiring trends—to better understand potential employers or partners. This manual process is repetitive, error-prone, and time-consuming, taking away valuable hours Lisa could spend recruiting top talent. Moreover, handling multiple company queries creates data chaos and inconsistent summaries, making it hard to share clear insights with her team.

This is exactly the kind of challenge our unique n8n workflow tackles: automating the extraction and summarization of company information from Indeed using advanced data scraping through Bright Data’s Web Unlocker, combined with powerful AI-driven summarization via Google Gemini. Instead of manually surfing thousands of company pages and piecing together notes, Lisa can simply run this workflow to get neat, actionable company summaries delivered instantly to a webhook or internal dashboard.

What This Automation Does

When Lisa triggers the workflow, here’s what happens:

  • Sets the company search query for Indeed (e.g., Starbucks).
  • Uses Bright Data’s Web Unlocker API to bypass scraping blocks and retrieve raw markdown data from Indeed’s company page.
  • Extracts and converts the scraped markdown into clean textual data via a custom LangChain markdown-to-text node.
  • Summarizes the extracted information using Google Gemini’s advanced AI summarization model for concise insights.
  • Triggers an AI Expert Agent specialized in Indeed to format results perfectly and push them to an external webhook for notifications or further integrations.
  • Converts markdown to HTML for readable reports and sends notifications of both summary and full data as HTML via webhook.

This automation saves Lisa an estimated 3 to 4 hours per company query, reduces data inconsistencies, and produces AI-tailored company profiles ready for decision-making.

Prerequisites ⚙️

  • n8n account (cloud or self-hosted; for self-hosting options, see Hostinger guide)
  • Bright Data Web Unlocker API account (for Indeed scraping)
  • Google Gemini (PaLM) API credentials for advanced AI summarization and chat model usage
  • Webhook URL service (like https://webhook.site) to receive notifications

Step-by-Step Guide to Build the Workflow ✏️

1. Add Manual Trigger Node

Navigate to Nodes > Triggers > Manual Trigger and add it as the workflow entry point. This lets you trigger the workflow on demand.
Expected: A simple button to manually test the workflow.

2. Set Indeed Search Query

Add a Set node to define your search parameters. Under “Assignments,” create two string fields:
search_query: The company name, e.g., “Starbucks”
zone: Your Bright Data zone like “web_unlocker1”
Expected: This sets dynamic query data usable by later nodes.
Common mistake: Forgetting to use exactly the zone string matching your Bright Data setup, causing request failures.

3. Perform Indeed Web Request via Bright Data

Add an HTTP Request node configured to POST to “https://api.brightdata.com/request”.
Body parameters include:
zone set to ={{ $json.zone }}
url set to =https://www.indeed.com/cmp/{{ encodeURI($json.search_query) }}?product=unlocker&method=api
Other params include format: raw and data_format: markdown.
Authenticate using your Bright Data Header Auth credentials.
Expected: Receive raw markdown data of Indeed company page.
Common mistake: Misconfiguring auth headers or using the wrong URL format.
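
For testing the same request outside n8n, the body described in this step can be sketched in Python. This is a minimal sketch, not the node's internal implementation: the function names are hypothetical, the Bearer-token Authorization header is an assumption about how your Header Auth credential is set up, and urllib's quote with an extended safe list is only an approximation of JavaScript's encodeURI.

```python
import json
import urllib.request
from urllib.parse import quote

BRIGHTDATA_ENDPOINT = "https://api.brightdata.com/request"


def build_indeed_url(search_query: str) -> str:
    """Build the Indeed company URL used in step 3."""
    # Approximate encodeURI: keep URI-reserved characters, escape the rest.
    encoded = quote(search_query, safe=";/?:@&=+$,#-_.!~*'()")
    return f"https://www.indeed.com/cmp/{encoded}?product=unlocker&method=api"


def build_unlocker_request(search_query: str, zone: str, api_token: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for the Web Unlocker API."""
    body = {
        "zone": zone,                      # must match your Bright Data zone exactly
        "url": build_indeed_url(search_query),
        "format": "raw",
        "data_format": "markdown",
    }
    return urllib.request.Request(
        BRIGHTDATA_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the Header Auth credential sends a Bearer token.
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_unlocker_request("Starbucks", "web_unlocker1", "YOUR_API_TOKEN")
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

Printing the prepared request before sending it makes it easy to compare the zone and URL against what the HTTP Request node shows in its input panel.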

4. Convert Markdown to Textual Data

Use the LangChain Markdown to Textual Data Extractor node.
Prompt it to “analyze the markdown and convert to textual data.”
Feed it the $json.data field from the HTTP response.
Expected: Clean textual data extracted from the markdown format.
Common mistake: Missing the exact JSON field path causing empty or malformed text.
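
The field-path mistake above is easy to catch early. The following sketch shows one way to validate the data field before handing it to the extractor; the helper names are hypothetical, and how the instruction and the markdown are joined into one prompt is an assumption, since the node handles that internally.

```python
from typing import Any, Mapping

# Instruction quoted from this step.
EXTRACTION_INSTRUCTION = "analyze the markdown and convert to textual data"


def get_markdown(item: Mapping[str, Any], field: str = "data") -> str:
    """Return the markdown string from an HTTP response item, or fail loudly."""
    value = item.get(field)
    if not isinstance(value, str) or not value.strip():
        raise ValueError(
            f"Response item has no non-empty '{field}' field; "
            f"available keys: {sorted(item)}"
        )
    return value


def build_extraction_prompt(markdown: str) -> str:
    """Join the instruction and the scraped markdown (layout is an assumption)."""
    return f"{EXTRACTION_INSTRUCTION}\n\n{markdown}"


if __name__ == "__main__":
    item = {"data": "# Starbucks\n\nCoffee company."}
    print(build_extraction_prompt(get_markdown(item)))
```

Raising an error that lists the available keys makes it clear whether the response used a different field name or came back empty.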

5. Summarize Extracted Data with Google Gemini

Add the Google Gemini Chat Model For Summarization node and connect it to the extractor output.
Use the “models/gemini-2.0-flash-exp” model.
Expected: Receive concise summary text highlighting key company insights.
Common mistake: Forgetting to attach correct credentials leads to authentication errors.
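
To check credentials and the model name independently of n8n, an equivalent direct call to the Gemini REST API can be sketched as below. This is not what the LangChain node does internally; the summarization prompt wording and the function names are hypothetical, and the availability of the experimental model named in this step may change over time.

```python
import json
import urllib.request
from typing import Any, Mapping

GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta"


def build_summary_request(
    text: str,
    api_key: str,
    model: str = "models/gemini-2.0-flash-exp",
) -> urllib.request.Request:
    """Prepare (but do not send) a generateContent request for a summary."""
    # Hypothetical prompt wording; adjust it to control summary depth.
    prompt = f"Summarize the following company information concisely:\n\n{text}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        f"{GEMINI_BASE}/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_summary(response: Mapping[str, Any]) -> str:
    """Pull the generated text out of a parsed generateContent response."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


if __name__ == "__main__":
    req = build_summary_request("Starbucks is a coffee company.", "YOUR_API_KEY")
    print(req.full_url)
```

An authentication error from this request points at the API key itself rather than at the node wiring in the workflow.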

6. Initiate Webhook Notification for Summary

Add an HTTP Request node posting to your webhook URL.
Send the summarized text in the body parameter as summary.
Expected: External notification receives the summarized company info.
Common mistake: Using wrong HTTP method or malformed body parameters.

7. Convert Markdown to HTML for Reporting

Add a Markdown node for conversion with mode set to “markdownToHtml,” sending the original markdown data.
Expected: Receive well-formatted HTML to share or embed.
Common mistake: Feeding wrong data inputs causing empty HTML output.

8. Initiate Webhook Notification for HTML Response

Add another HTTP Request node posting the HTML response to your webhook.
Use body param html_response.
Expected: Notifications receive readable web-formatted company pages.
Common mistake: Incorrect webhook URLs or missing body parameters.
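
Both webhook notifications (summary in step 6, html_response in step 8) post a single named field. A minimal sketch of that request follows, assuming the body is sent as JSON; the helper names and the example URL are hypothetical.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, field: str, value: str) -> urllib.request.Request:
    """Prepare a POST carrying one body parameter, e.g. summary or html_response."""
    if not field:
        raise ValueError("Body parameter name must not be empty")
    return urllib.request.Request(
        webhook_url,
        data=json.dumps({field: value}).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: JSON body
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical URL; replace it with your own webhook endpoint before sending.
    req = build_webhook_request("https://webhook.site/your-id", "summary", "Short summary")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Keeping the field name as a parameter lets the same helper cover the summary and HTML notifications without duplicating the request setup.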

9. Expert AI Agent Formatter

Add a LangChain Agent node (named “Indeed Expert AI Agent” in this workflow).
Feed it the summarized text from Google Gemini with a prompt:
“You are an Indeed Expert. Format the search result and push it to the Webhook via HTTP Request.”
Expected: Professionally formatted JSON output ready for downstream consumption.
Common mistake: Incorrect node connections or missing prompt context.

10. Final Webhook Request for AI Agent Output

Use the LangChain HTTP Request tool node to POST formatted JSON from the AI Agent to your webhook.
Set method POST, include body parameters such as search_summary with the agent’s response.
Expected: Webhook receives structured, AI-formatted company info.
Common mistake: Misaligning body parameters causes webhook errors.

Customizations ✏️

  • Change Indeed Search Query: Modify the search_query field in the Set Indeed Search Query node to any company name you want to extract info for.
  • Switch Bright Data Zone: Change the zone parameter in the Set Indeed Search Query node to match your Bright Data account’s zone.
  • Use Different AI Models: Swap the Google Gemini nodes for other LangChain-compatible chat model nodes, or change the modelName parameter to use a different Gemini model for summarization or formatting.
  • Webhook URL Updates: Point webhook nodes to your internal systems or Slack channels for real-time team notifications instead of webhook.site.
  • Adjust Summarization Depth: Tweak the Google Gemini summarization node’s prompt or parameters to get shorter or more detailed summaries as needed.

Troubleshooting 🔧

  • Problem: HTTP Request returns 401 Unauthorized
    Cause: Incorrect Bright Data API credentials or missing header authentication.
    Solution: Check the Header Auth credential attached to the HTTP Request node and ensure it is correctly configured and active.
  • Problem: AI model returns empty or irrelevant summary
    Cause: Missing or malformed data input, or wrong API key.
    Solution: Verify JSON input mapping into the Google Gemini node; recheck Google API credential setup.
  • Problem: Markdown conversion returns empty HTML
    Cause: Wrong data field or malformed markdown.
    Solution: Confirm data passed to Markdown node is correct and contains valid markdown text.

Pre-Production Checklist ✅

  • Verify your Bright Data API credentials have correct privileges and zone names.
  • Test the manual trigger initiates the workflow correctly.
  • Check Indeed search query returns actual markdown data in HTTP Request node.
  • Confirm AI summarization node outputs concise summary text.
  • Validate webhook URLs are reachable and accept POST requests.
  • Perform end-to-end run and record logs for debugging any unexpected failures.
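
Some of the checklist items can be checked locally before a full run. The sketch below is one possible preflight helper with hypothetical names; it only validates that values are present and well formed, and does not confirm that a webhook is reachable or that credentials are valid.

```python
from typing import List
from urllib.parse import urlparse


def preflight_problems(
    zone: str,
    search_query: str,
    webhook_url: str,
    markdown_sample: str,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    if not zone.strip():
        problems.append("zone is empty; it must match your Bright Data zone name")
    if not search_query.strip():
        problems.append("search_query is empty")
    parsed = urlparse(webhook_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("webhook_url is not a valid http(s) URL")
    if not markdown_sample.strip():
        problems.append("HTTP Request node returned no markdown data")
    return problems


if __name__ == "__main__":
    issues = preflight_problems("web_unlocker1", "Starbucks", "https://webhook.site/your-id", "# Starbucks")
    print(issues or "All local checks passed")
```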

Deployment Guide

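Once tested, make your workflow active by toggling the slider on the workflow page. Because a Manual Trigger only runs on demand, replace it with a Schedule Trigger or Webhook trigger if you want periodic runs or API-triggered runs to automate company info updates.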
Configure webhook monitoring tools to alert on failures or long runtimes.
Use n8n’s integrated execution logs for ongoing maintenance and troubleshooting.

FAQs

  • Q: Can I use other scraping services instead of Bright Data?
    A: Yes, but you’ll need to adjust the HTTP Request node URL and authentication accordingly.
  • Q: Does Google Gemini consume a lot of API credits?
    A: Usage depends on the input size and frequency; monitor Google Cloud billing for cost management.
  • Q: Is the data secure?
    A: Data is processed securely within your n8n environment and through trusted APIs; always use encrypted credentials.
  • Q: Can this handle multiple company queries?
    A: Yes, though for bulk queries you might want to batch requests or schedule runs to avoid rate limits.
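
The batching idea in the last answer can be sketched as follows. The batch size and pause length are illustrative assumptions rather than documented rate limits, and handler stands in for whatever starts one workflow run per company.

```python
import time
from typing import Callable, Iterable, Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(
    companies: Iterable[str],
    handler: Callable[[str], str],
    batch_size: int = 5,
    pause_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call handler for each company, pausing between batches (not after the last)."""
    results: List[str] = []
    for index, batch in enumerate(batched(list(companies), batch_size)):
        if index > 0:
            sleep(pause_seconds)
        results.extend(handler(name) for name in batch)
    return results


if __name__ == "__main__":
    names = ["Starbucks", "Target", "Costco"]
    # Stand-in handler; a real one would trigger the workflow for each name.
    print(run_in_batches(names, lambda n: f"queued {n}", batch_size=2, pause_seconds=0))
```

Passing sleep as a parameter keeps the helper testable without real delays.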

Conclusion

By deploying this advanced n8n workflow, Lisa and others like her can dramatically cut down the time spent on manual Indeed research. Instead of hours per company, get polished summaries in minutes, complete with formatted HTML reports and AI-powered expert insights.

Not only does this save time, but it also delivers consistent, reliable, and actionable company intelligence for HR teams and recruiters.

Next steps? Consider extending this automation to include other job boards, integrate with CRM systems, or add sentiment analysis on company reviews using AI models.
