Opening Problem Statement
Meet Lisa, a busy HR manager at a mid-sized tech company. She spends hours every week manually searching Indeed for company insights—details like reputation, employee reviews, and hiring trends—to better understand potential employers or partners. This manual process is repetitive, error-prone, and time-consuming, taking away valuable hours Lisa could spend recruiting top talent. Moreover, handling multiple company queries creates data chaos and inconsistent summaries, making it hard to share clear insights with her team.
This is exactly the kind of challenge our unique n8n workflow tackles: automating the extraction and summarization of company information from Indeed using advanced data scraping through Bright Data’s Web Unlocker, combined with powerful AI-driven summarization via Google Gemini. Instead of manually surfing thousands of company pages and piecing together notes, Lisa can simply run this workflow to get neat, actionable company summaries delivered instantly to a webhook or internal dashboard.
What This Automation Does
When Lisa triggers the workflow, here’s what happens:
- Sets the company search query for Indeed (e.g., Starbucks).
- Uses Bright Data’s Web Unlocker API to bypass scraping blocks and retrieve raw markdown data from Indeed’s company page.
- Extracts and converts the scraped markdown into clean textual data via a custom LangChain markdown-to-text node.
- Summarizes the extracted information using Google Gemini’s advanced AI summarization model for concise insights.
- Triggers an AI Expert Agent specialized in Indeed to format results perfectly and push them to an external webhook for notifications or further integrations.
- Converts markdown to HTML for readable reports and sends notifications of both summary and full data as HTML via webhook.
This automation saves Lisa at least 3-4 hours per company query, eliminates data inconsistencies, and ensures professional, AI-tailored company profiles ready for decision-making.
Prerequisites ⚙️
- n8n account (cloud or self-hosted; for self-hosting options, see Hostinger guide)
- Bright Data Web Unlocker API account (for Indeed scraping)
- Google Gemini (PaLM) API credentials for advanced AI summarization and chat model usage
- Webhook URL service (like https://webhook.site) to receive notifications
Step-by-Step Guide to Build the Workflow ✏️
1. Add Manual Trigger Node
Navigate to Nodes > Triggers > Manual Trigger and add it as the workflow entry point. This lets you trigger the workflow on demand.
Expected: A simple button to manually test the workflow.
2. Set Indeed Search Query
Add a Set node to define your search parameters. Under “Assignments,” create two string fields:
– search_query: The company name, e.g., “Starbucks”
– zone: Your Bright Data zone like “web_unlocker1”
Expected: This sets dynamic query data usable by later nodes.
Common mistake: Using a zone string that does not exactly match the zone name in your Bright Data account, which causes request failures.
3. Perform Indeed Web Request via Bright Data
Add an HTTP Request node configured to POST to “https://api.brightdata.com/request”.
Body parameters include:
– zone set to ={{ $json.zone }}
– url set to =https://www.indeed.com/cmp/{{ encodeURI($json.search_query) }}?product=unlocker&method=api
Other params include format: raw and data_format: markdown.
Authenticate using your Bright Data Header Auth credentials.
Expected: Receive raw markdown data of Indeed company page.
Common mistake: Misconfiguring auth headers or using the wrong URL format.
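For readers who want to check the request outside n8n, the call in this step can be sketched in Python. This is a minimal illustration, not the workflow's own code: the function names are hypothetical, `urllib.parse.quote` only approximates `encodeURI`, and the Bearer-token header is an assumption about how your Header Auth credential is set up.

```python
import json
import urllib.parse
import urllib.request

BRIGHT_DATA_ENDPOINT = "https://api.brightdata.com/request"


def build_unlocker_payload(search_query: str, zone: str) -> dict:
    """Build the JSON body described in step 3 for one company name."""
    # quote() percent-encodes spaces and other unsafe characters, which is
    # roughly what encodeURI does in the n8n expression.
    company = urllib.parse.quote(search_query)
    return {
        "zone": zone,
        "url": f"https://www.indeed.com/cmp/{company}?product=unlocker&method=api",
        "format": "raw",
        "data_format": "markdown",
    }


def fetch_company_markdown(search_query: str, zone: str, api_token: str) -> str:
    """POST the payload to Bright Data and return the response body as text."""
    payload = build_unlocker_payload(search_query, zone)
    request = urllib.request.Request(
        BRIGHT_DATA_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the Header Auth credential sends a Bearer token.
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

Calling `build_unlocker_payload("Starbucks", "web_unlocker1")` and comparing the result with the body shown in the n8n node is a quick way to spot a wrong zone or URL format before spending API credits.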
4. Convert Markdown to Textual Data
Use the LangChain Markdown to Textual Data Extractor node.
Prompt it to “analyze the markdown and convert to textual data.”
Feed it the $json.data field from the HTTP response.
Expected: Clean textual data extracted from the markdown format.
Common mistake: Missing the exact JSON field path causing empty or malformed text.
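The field-path mistake above is easy to catch with an explicit check. The sketch below assumes the HTTP response item is a dictionary with the markdown under `data`, as described in this step; `extract_markdown` is a hypothetical helper, not part of n8n.

```python
def extract_markdown(item: dict, field: str = "data") -> str:
    """Return the markdown string stored under `field`, or fail loudly."""
    value = item.get(field)
    # An absent key, a non-string value, or whitespace-only text would all
    # produce empty or malformed output further down the workflow.
    if not isinstance(value, str) or not value.strip():
        raise ValueError(f"Field '{field}' is missing or empty in the response item")
    return value
```

Failing early here gives a clearer error than an empty summary several nodes later.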
5. Summarize Extracted Data with Google Gemini
Add a Google Gemini Chat Model node for summarization and connect it to the extractor's output.
Use the “models/gemini-2.0-flash-exp” model.
Expected: Receive concise summary text highlighting key company insights.
Common mistake: Forgetting to attach correct credentials leads to authentication errors.
6. Initiate Webhook Notification for Summary
Add an HTTP Request node posting to your webhook URL.
Send the summarized text in the body parameter as summary.
Expected: External notification receives the summarized company info.
Common mistake: Using wrong HTTP method or malformed body parameters.
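As a rough equivalent of this webhook step, the following sketch builds a POST request with the summary in a `summary` field. It assumes the webhook accepts a JSON body (n8n can also send form-encoded parameters), and the function names are illustrative only.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, field: str, value: str) -> urllib.request.Request:
    """Create a POST request carrying one named body parameter as JSON."""
    body = json.dumps({field: value}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

The same helper could carry the `html_response` or `search_summary` parameters used in later steps by changing the `field` argument.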
7. Convert Markdown to HTML for Reporting
Add a Markdown node for conversion with mode set to “markdownToHtml,” sending the original markdown data.
Expected: Receive well-formatted HTML to share or embed.
Common mistake: Feeding wrong data inputs causing empty HTML output.
8. Initiate Webhook Notification for HTML Response
Add another HTTP Request node posting the HTML response to your webhook.
Use body param html_response.
Expected: Notifications receive readable web-formatted company pages.
Common mistake: Incorrect webhook URLs or missing body parameters.
9. Expert AI Agent Formatter
Add the LangChain Indeed Expert AI Agent node.
Feed it the summarized text from Google Gemini with a prompt:
“You are an Indeed Expert. Format the search result and push it to the Webhook via HTTP Request.”
Expected: Professionally formatted JSON output ready for downstream consumption.
Common mistake: Incorrect node connections or missing prompt context.
10. Final Webhook Request for AI Agent Output
Use the LangChain HTTP Request tool node to POST formatted JSON from the AI Agent to your webhook.
Set method POST, include body parameters such as search_summary with the agent’s response.
Expected: Webhook receives structured, AI-formatted company info.
Common mistake: Misaligning body parameters causes webhook errors.
Customizations ✏️
- Change Indeed Search Query: Modify the search_query field in the Set Indeed Search Query node to any company name you want to extract info for.
- Switch Bright Data Zone: Change the zone parameter in the Set Indeed Search Query node to match your Bright Data account’s zone.
- Use Different AI Models: Replace Google Gemini nodes with other LangChain-compatible AI models by updating the modelName parameter for customized summarization or formatting.
- Webhook URL Updates: Point webhook nodes to your internal systems or Slack channels for real-time team notifications instead of webhook.site.
- Adjust Summarization Depth: Tweak the Google Gemini summarization node’s prompt or parameters to get shorter or more detailed summaries as needed.
Troubleshooting 🔧
- Problem: HTTP Request returns 401 Unauthorized
Cause: Incorrect Bright Data API credentials or missing header authentication.
Solution: Check your HTTP Header Auth credential and ensure it is correctly configured and active.
- Problem: AI model returns empty or irrelevant summary
Cause: Missing or malformed data input, or wrong API key.
Solution: Verify JSON input mapping into the Google Gemini node; recheck Google API credential setup.
- Problem: Markdown conversion returns empty HTML
Cause: Wrong data field or malformed markdown.
Solution: Confirm data passed to Markdown node is correct and contains valid markdown text.
Pre-Production Checklist ✅
- Verify your Bright Data API credentials have correct privileges and zone names.
- Test the manual trigger initiates the workflow correctly.
- Check Indeed search query returns actual markdown data in HTTP Request node.
- Confirm AI summarization node outputs concise summary text.
- Validate webhook URLs are reachable and accept POST requests.
- Perform end-to-end run and record logs for debugging any unexpected failures.
Deployment Guide
Once tested, make your workflow active by toggling the slider on the workflow page. To schedule periodic runs or trigger via API, replace the Manual Trigger with a Schedule Trigger or Webhook node, since a Manual Trigger only runs when you start it by hand.
Configure webhook monitoring tools to alert on failures or long runtimes.
Use n8n’s integrated execution logs for ongoing maintenance and troubleshooting.
FAQs
- Q: Can I use other scraping services instead of Bright Data?
A: Yes, but you’ll need to adjust the HTTP Request node URL and authentication accordingly.
- Q: Does Google Gemini consume a lot of API credits?
A: Usage depends on the input size and frequency; monitor Google Cloud billing for cost management.
- Q: Is the data secure?
A: Data is processed securely within your n8n environment and through trusted APIs; always use encrypted credentials.
- Q: Can this handle multiple company queries?
A: Yes, though for bulk queries you might want to batch requests or schedule runs to avoid rate limits.
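For the bulk-query case, one possible approach is to process company names in small batches with a pause between them. The sketch below is only an illustration under assumed values: the batch size and pause are arbitrary, and `run_one` stands in for whatever triggers a single workflow run.

```python
import time
from typing import Callable, Iterator, List


def batched(names: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `names` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(names), size):
        yield names[start:start + size]


def run_in_batches(
    names: List[str],
    size: int,
    pause_seconds: float,
    run_one: Callable[[str], None],
) -> None:
    """Call `run_one` per company, sleeping between batches to ease rate limits."""
    batches = list(batched(names, size))
    for index, batch in enumerate(batches):
        for name in batch:
            run_one(name)
        # No pause is needed after the final batch.
        if index < len(batches) - 1:
            time.sleep(pause_seconds)
```

A call such as `run_in_batches(companies, 5, 30, trigger_workflow)` would run five companies, wait 30 seconds, then continue, where `trigger_workflow` is a function you supply.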
Conclusion
By deploying this advanced n8n workflow, Lisa and others like her can dramatically cut down the time spent on manual Indeed research. Instead of hours per company, get polished summaries in minutes, complete with formatted HTML reports and AI-powered expert insights.
Not only does this save time, but it also delivers consistent, reliable, and actionable company intelligence for HR teams and recruiters.
Next steps? Consider extending this automation to include other job boards, integrate with CRM systems, or add sentiment analysis on company reviews using AI models.