Automate Indeed Company Data Scraping & Summarization with n8n & Google Gemini

Struggling to extract and summarize company data from Indeed efficiently? This n8n workflow reads company URLs from Airtable, scrapes each page via Bright Data, summarizes the content with Google Gemini, and sends the results to a webhook, saving hours of manual research for HR and recruitment teams.
Workflow Identifier: 2341
Nodes in Use: Manual Trigger, Set, Airtable, SplitInBatches, Wait, If, HTTP Request, Chain LLM, Chain Summarization, Langchain Agent, Sticky Note, Markdown

What This Automation Does

This workflow gathers company data from Indeed and summarizes it for easy review.
It eliminates copy-paste errors and saves hours of manual work.
The workflow takes URLs from Airtable, scrapes Indeed via Bright Data, cleans the text with AI, and summarizes it using Google Gemini.
The final summary is sent to a webhook for alerts or further use.

You get clear, structured company summaries quickly, which helps support better HR decisions.


Tools and Services Used

  • n8n Automation Platform: Runs the workflow and nodes.
  • Airtable: Stores Indeed company URLs.
  • Bright Data Web Unlocker: Scrapes Indeed web data programmatically.
  • Google Gemini (PaLM API): Converts raw data into summaries.
  • Webhook Service: Receives final summaries for notifications or integration.

Workflow Inputs, Process, and Outputs

Inputs – Company URLs

Company URL records are pulled from an Airtable base named “Indeed”.
The workflow checks each record to ensure the link is not empty.
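
The empty-link check can be pictured outside n8n as a simple filter. The sketch below assumes records shaped like Airtable REST API responses (an `id` plus a `fields` object) and a field named `Link`, as referenced in the troubleshooting section; the sample records are made up for illustration.

```python
from typing import Any


def records_with_links(
    records: list[dict[str, Any]], field: str = "Link"
) -> list[dict[str, Any]]:
    """Keep only records whose link field is a non-blank string."""
    kept = []
    for record in records:
        value = record.get("fields", {}).get(field)
        if isinstance(value, str) and value.strip():
            kept.append(record)
    return kept


# Hypothetical sample data, not taken from the workflow.
sample = [
    {"id": "rec1", "fields": {"Link": "https://www.indeed.com/cmp/Example-Co"}},
    {"id": "rec2", "fields": {"Link": "   "}},  # whitespace only, dropped
    {"id": "rec3", "fields": {}},               # field missing, dropped
]
print([r["id"] for r in records_with_links(sample)])
```

In the workflow itself this role is played by the If node, so the code is only a way to reason about which records pass through.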

Processing Steps

  • Assign a Bright Data zone identifier.
  • Use batching to process URLs one at a time, avoiding overload.
  • Wait 10 seconds between each request to prevent rate limits.
  • Send a POST request to the Bright Data API with the Indeed URL to retrieve the page as raw markdown.
  • Feed this markdown to an AI-powered Chain LLM node to extract clean text.
  • Summarize this text data using Google Gemini’s large language model via the Chain Summarization node.
  • Format and refine the summary with an expert AI prompt using a Langchain Agent specialized in Indeed data.
  • Send the final structured summary JSON to a webhook URL.
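
The steps above can be sketched as a plain loop. This is a rough illustration under stated assumptions, not the workflow's actual node configuration: the endpoint `https://api.brightdata.com/request`, the `format` and `data_format` body fields, and Bearer-token authentication are assumptions about the Bright Data Web Unlocker API that should be checked against your account documentation, and the `summarize` and `deliver` callables are hypothetical stand-ins for the Gemini nodes and the webhook call.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable

# Assumed Web Unlocker endpoint; confirm it in the Bright Data documentation.
BRIGHT_DATA_ENDPOINT = "https://api.brightdata.com/request"


def build_scrape_request(url: str, zone: str, token: str) -> urllib.request.Request:
    """Build the POST request asking Bright Data to return the page as markdown."""
    body = {"zone": zone, "url": url, "format": "raw", "data_format": "markdown"}
    return urllib.request.Request(
        BRIGHT_DATA_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape_markdown(url: str, zone: str, token: str) -> str:
    """Send the request over the network and return the response body."""
    request = build_scrape_request(url, zone, token)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")


def run_pipeline(
    urls: Iterable[str],
    scrape: Callable[[str], str],
    summarize: Callable[[str], str],
    deliver: Callable[[dict], None],
    wait_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Process URLs one at a time, pausing between consecutive requests."""
    for index, url in enumerate(urls):
        if index > 0:
            sleep(wait_seconds)
        markdown = scrape(url)
        deliver({"url": url, "summary": summarize(markdown)})


# Dry run with stand-in functions, so nothing touches the network.
delivered: list[dict] = []
run_pipeline(
    ["https://www.indeed.com/cmp/Example-Co"],
    scrape=lambda u: f"# Page for {u}",
    summarize=lambda md: md[:40],
    deliver=delivered.append,
    sleep=lambda seconds: None,
)
print(delivered)
```

Passing the scrape, summarize, and deliver steps as functions keeps the loop testable; for a real run, `scrape` could be `lambda u: scrape_markdown(u, zone, token)` with your own zone and token.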

Output – Structured Company Summaries

The output is JSON data with a clear, relevant summary for each company.
This data can trigger notifications or feed into other HR tools.
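
As a hedged illustration of how a receiving service might turn that JSON into a notification: the field names `company` and `summary` below are hypothetical, since the exact output schema is not specified here, and should be adjusted to whatever your webhook actually receives.

```python
import json


def format_notification(raw_body: str, max_chars: int = 200) -> str:
    """Turn a webhook body into a one-line notification, truncating long summaries."""
    data = json.loads(raw_body)
    company = data.get("company", "Unknown company")
    # Collapse newlines and repeated spaces so the message fits on one line.
    summary = " ".join(str(data.get("summary", "")).split())
    if len(summary) > max_chars:
        summary = summary[: max_chars - 3].rstrip() + "..."
    return f"{company}: {summary}"


# Hypothetical payload for illustration only.
body = json.dumps(
    {
        "company": "Example Co",
        "summary": "Mid-size retailer.\nRated well for work-life balance.",
    }
)
print(format_notification(body))
```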


Who Should Use This Workflow

This is useful for HR analysts or recruiters who manually collect company info from Indeed and find it slow and error-prone.
Anyone needing fast, consistent company profiles without copying and pasting should use this.


Beginner Step-by-Step: How to Use This Workflow in n8n

Step 1: Import the Workflow

  1. Download the workflow file using the “Download” button on this page.
  2. Inside the n8n editor, go to “Import from File” and select the downloaded file.

Step 2: Configure Credentials and Settings

  1. Add your Airtable credentials (a personal access token, since Airtable has retired legacy API keys) and select the correct base and table for company URLs.
  2. Enter Bright Data HTTP Header Authentication keys in the HTTP request node.
  3. Attach Google Gemini (PaLM API) credentials in Langchain nodes.
  4. Update the webhook URL to the webhook service or your integration endpoint.

Step 3: Test the Workflow

  1. Select “Execute Workflow” (labelled “Test workflow” in some n8n versions) to run the workflow once from the Manual Trigger node.
  2. Check the execution logs to verify each step runs without errors.

Step 4: Activate the Workflow

  1. The Manual Trigger only runs on demand, so first add a Schedule Trigger node if you want the workflow to run regularly.
  2. Then switch the workflow to “Active” in n8n to enable the scheduled runs.

After these steps, the workflow is ready to automate company data scraping and summarization.
Consider self-hosting n8n if you want to run the workflow on your own server.


Common Issues and Fixes

Bright Data HTTP 403 Forbidden

Check that the HTTP Header Authentication keys on the HTTP Request node are correct and active.
Verify that the “zone” variable matches an existing Bright Data zone.

Google Gemini API Fails or Empty Summary

Confirm API keys for Google Gemini are valid and linked in Langchain nodes.
Watch for rate limits or quota exceeded errors.
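
If you call the model from your own code rather than from n8n, a common way to cope with rate limits is to retry with exponential backoff. The sketch below is a generic pattern, not something taken from this workflow: `RateLimitError` is a hypothetical exception your own client code would raise on an HTTP 429 or quota error, and the delays are arbitrary defaults. Inside n8n, the per-node “Retry On Fail” setting serves a similar purpose.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised by your own client on HTTP 429 or quota errors."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimitError, doubling the delay after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demonstration with a stand-in call that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_summarize() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("quota exceeded")
    return "summary text"


delays: list[float] = []
result = call_with_backoff(flaky_summarize, sleep=delays.append)
print(result, delays)
```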

Airtable Returns No Records

Verify API Key, base ID, and table name are correct.
Make sure the “Link” column contains valid Indeed URLs.
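
A quick way to spot bad values in the “Link” column is to check each one before it reaches the scraper. This is a minimal sketch that assumes valid links use http or https and live on `indeed.com` or one of its subdomains (for example a regional site); adjust the rule if your URLs follow a different pattern.

```python
from urllib.parse import urlparse


def is_indeed_url(value: str) -> bool:
    """Return True for http(s) URLs hosted on indeed.com or a subdomain of it."""
    parsed = urlparse(value.strip())
    host = (parsed.hostname or "").lower()
    on_indeed = host == "indeed.com" or host.endswith(".indeed.com")
    return parsed.scheme in ("http", "https") and on_indeed


# Hypothetical examples for illustration.
for link in [
    "https://www.indeed.com/cmp/Example-Co",
    "indeed.com/cmp/Example-Co",       # missing scheme
    "https://notindeed.com/cmp/Fake",  # different domain
]:
    print(link, is_indeed_url(link))
```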


Customization Ideas

  • Change the Bright Data zone by updating the “Set Bright Data Zone” node value.
  • Modify the wait time in the Wait node for faster or slower execution.
  • Use another Airtable base or table for different URL lists.
  • Switch Google Gemini with other AI chat models supported by Langchain nodes.
  • Update webhook URL to integrate with CRMs, Slack, or dashboards.

Summary and Results

✓ Automates Indeed company data scraping without manual steps.
✓ Saves over 4 hours of weekly manual work for HR analysts.
✓ Reduces errors from copying and pasting.
✓ Provides structured, clear company summaries.
✓ Outputs data ready for notifications or system integration.


Frequently Asked Questions

If Bright Data returns HTTP 403, verify that the HTTP Header Authentication keys are correct and active. Also check that the zone variable matches a valid Bright Data zone in the account.
Empty summaries happen if Google Gemini API credentials are missing or invalid, or if usage limits are exceeded. Check the keys and the API quota.
Airtable returns no records when the Airtable API key, base ID, or table name is wrong. Also ensure the “Link” field contains valid Indeed URLs.
The workflow can handle many URLs, but processing is done in batches with Wait nodes to avoid rate limits and bans. Adjust the batch size and wait duration accordingly.
