Structured Data Extraction & Sentiment Analysis with n8n and Bright Data

This workflow automates the extraction of structured data from web content using Bright Data’s Web Unlocker and Google Gemini AI models. It replaces manual data mining and sentiment analysis with automatically organized insights and trend detection.
Workflow Identifier: 2016
NODES in Use: Manual Trigger, Sticky Note, Set, HTTP Request, chainLlm, lmChatGoogleGemini, n8-nodes-informationExtractor, Function, Read & Write File


What This Automation Does

This workflow fetches raw web content from a target website, converts the returned markdown into readable plain text, extracts the main topics with supporting details, groups trends by location and category, and analyzes the sentiment of the text. It saves the results as JSON files on local disk and sends updates to a webhook. This removes the manual work of copying content, cleaning up text, and guessing at the key points.

The main goal is to turn complex web pages into clear, organized news or data. The workflow uses Bright Data to fetch the web content, then Google Gemini to clean the markdown and identify topics and sentiment.

  • Input: A URL is sent through the Bright Data proxy to retrieve raw markdown content.
  • Processing: The markdown is converted to clean text. Information Extractor nodes then pick out topics with confidence scores, summaries, and keywords.
  • More processing: Trends are grouped by location and category.
  • Sentiment analysis: The AI model classifies the text as positive, negative, or neutral.
  • Output: Data is sent to a webhook and saved as JSON files on disk.

The result is easy-to-use structured data from noisy web pages, helping you spot patterns and sentiment quickly.


Who Should Use This Workflow

This workflow suits people who collect data from news or web pages but find manual work slow and messy.

Typical users include marketers, analysts, and anyone who regularly needs clear topics and sentiment from many websites.

No deep technical skills are needed once the workflow is set up.


Tools and Services Used

  • Bright Data Web Unlocker: Fetches web content while bypassing common blocking measures.
  • Google PaLM API with Google Gemini: Converts markdown to text and does AI topic and sentiment analysis.
  • n8n automation platform: Runs the workflow linking all needed steps.
  • Webhook services like webhook.site: Receives live data updates.

Beginner Step-by-Step: How to Use This Workflow in n8n

Importing the Workflow

  1. Download the workflow file using the Download button on this page.
  2. In the n8n editor, open the menu, choose Import from File, and select the downloaded workflow file.

Configuring the Workflow

  1. Add your Bright Data API credentials in n8n under credentials. Use the HTTP Header Auth type with your API Key.
  2. Add your Google PaLM API Key credentials for the Google Gemini nodes.
  3. In the Set URL and Bright Data Zone node, set the URL of the website you want to scrape and the name of your Bright Data proxy zone.
  4. If webhook URLs need changing (for example, to use Slack or Discord), update the HTTP Request nodes with new URLs.
  5. If saving files locally, verify the file paths in the Read & Write File nodes are correct and writable on your machine.
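
The request that the HTTP Request node sends to Bright Data can be sketched roughly as follows. This is a minimal sketch, not the workflow's actual node configuration: it assumes the Web Unlocker endpoint `https://api.brightdata.com/request`, Bearer-token authentication, and a `data_format` of `markdown`. The helper name `buildBrightDataRequest`, the zone name `web_unlocker1`, and the example URL are hypothetical, so check all of them against your own Bright Data account.

```javascript
// Hypothetical helper: builds the options for a Bright Data Web Unlocker call.
// The endpoint and body fields are assumptions; confirm them in your Bright Data dashboard.
function buildBrightDataRequest(url, zone, apiKey) {
  if (!url || !zone || !apiKey) {
    throw new Error('url, zone, and apiKey are all required');
  }
  return {
    endpoint: 'https://api.brightdata.com/request',
    method: 'POST',
    headers: {
      // Matches an HTTP Header Auth credential: header "Authorization", value "Bearer <key>".
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    // "url" and "zone" correspond to the two values set in the Set node.
    body: JSON.stringify({ zone, url, format: 'raw', data_format: 'markdown' }),
  };
}

const request = buildBrightDataRequest(
  'https://example.com/news', // placeholder target page
  'web_unlocker1',            // placeholder zone name
  'YOUR_API_KEY'
);
console.log(request.body);
```

Keeping the URL and zone as parameters mirrors the Set node: changing the target site or zone does not require touching the request itself.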

Testing and Activating

  1. Run the Manual Trigger once to test the workflow and confirm that no errors occur.
  2. Watch the webhook or file locations for the expected outputs.
  3. When the results look right, activate the workflow in n8n by pressing Activate.
  4. Schedule or trigger as needed for your work.

For stable use, consider self-hosting n8n to keep the workflow running without interruptions.


Inputs, Processing Steps, and Outputs

Inputs

  • The website URL to scrape, e.g., a news page.
  • The Bright Data proxy zone used to route requests.
  • Your API keys for Bright Data and Google PaLM.

Processing Steps

  • Send a POST request to Bright Data’s API to retrieve the page content as markdown.
  • Use Google Gemini to clean the markdown into readable text.
  • Extract the main topics, with details such as confidence scores and keywords, using Information Extractor nodes.
  • Group trends by location and category.
  • Analyze the sentiment of the topics using the Google Gemini chat model.
  • Send results to webhook URLs for real-time alerts.
  • Save extracted JSON data locally as files.
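
To make the "group trends by location and category" step concrete, here is a deterministic sketch of the nested shape such a grouping produces. The workflow itself relies on its AI nodes for this step; the function name `groupTrends` and the field names `location`, `category`, and `trend` are assumptions chosen for illustration, not the workflow's actual schema.

```javascript
// Groups trend records by location, then by category.
// Field names (location, category, trend) are illustrative assumptions.
function groupTrends(trends) {
  const grouped = {};
  for (const { location = 'unknown', category = 'uncategorized', trend } of trends) {
    if (!grouped[location]) grouped[location] = {};
    if (!grouped[location][category]) grouped[location][category] = [];
    grouped[location][category].push(trend);
  }
  return grouped;
}

// Made-up sample records, only to show the output shape.
const sampleTrends = [
  { location: 'US', category: 'politics', trend: 'election coverage' },
  { location: 'US', category: 'tech', trend: 'AI regulation' },
  { location: 'UK', category: 'tech', trend: 'chip exports' },
  { location: 'US', category: 'tech', trend: 'chip exports' },
];

const groupedTrends = groupTrends(sampleTrends);
console.log(JSON.stringify(groupedTrends, null, 2));
```

Records without a location or category fall into the `unknown` and `uncategorized` buckets rather than being dropped, which makes gaps in the extracted data easy to spot.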

Outputs

  • Structured JSON files saved on disk with topics and trends.
  • Webhook notifications carrying text, trends, and sentiment data.
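
A webhook notification of this kind amounts to a JSON POST. The sketch below shows one plausible way to assemble it; the helper name `buildWebhookRequest`, the payload keys, and the `webhook.site` URL are hypothetical placeholders rather than the workflow's actual settings.

```javascript
// Hypothetical helper: assembles a JSON POST for a webhook endpoint.
// Payload keys (text, trends, sentiment) are illustrative assumptions.
function buildWebhookRequest(webhookUrl, { text, trends, sentiment }) {
  const payload = { text, trends, sentiment, sentAt: new Date().toISOString() };
  return {
    url: webhookUrl,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // Leaving this out is the code equivalent of disabling "Send Body" in the node.
      body: JSON.stringify(payload),
    },
  };
}

const webhookRequest = buildWebhookRequest('https://webhook.site/your-unique-id', {
  text: 'Cleaned article text',
  trends: { US: { tech: ['AI regulation'] } },
  sentiment: 'neutral',
});
console.log(webhookRequest.options.body);
```

Outside n8n, the result could be sent with `fetch(webhookRequest.url, webhookRequest.options)` on a runtime that provides `fetch`.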

Edge Cases and Failures

  • A wrong Bright Data API key or an expired token causes an authentication failure in the HTTP Request node.
  • An invalid JSON schema in the Information Extractor nodes leads to extraction errors.
  • Webhook Request nodes with “Send Body” disabled send empty payloads.
  • Insufficient write permissions or invalid file paths cause file saving errors.

Checking credentials, JSON syntax, and node options helps avoid these.
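
One way to catch missing fields early is a small check on the extracted topics before they are saved or sent. The sketch below is an illustration only: the required field names follow the ones mentioned in this guide (topic, score, summary, keywords), while the function name `findTopicProblems` and the assumption that `keywords` is an array are hypothetical.

```javascript
// Required fields as named in this guide; the array type for keywords is an assumption.
const REQUIRED_TOPIC_FIELDS = ['topic', 'score', 'summary', 'keywords'];

// Returns a list of human-readable problems; an empty list means the topics look complete.
function findTopicProblems(topics) {
  if (!Array.isArray(topics)) return ['topics must be an array'];
  const problems = [];
  topics.forEach((item, index) => {
    for (const field of REQUIRED_TOPIC_FIELDS) {
      if (item == null || item[field] === undefined) {
        problems.push(`topics[${index}] is missing "${field}"`);
      }
    }
    if (item && item.keywords !== undefined && !Array.isArray(item.keywords)) {
      problems.push(`topics[${index}].keywords must be an array`);
    }
  });
  return problems;
}

// Made-up sample: the second topic lacks a summary.
const topicProblems = findTopicProblems([
  { topic: 'AI regulation', score: 0.92, summary: 'New rules proposed.', keywords: ['AI', 'law'] },
  { topic: 'Chip exports', score: 0.81, keywords: ['chips'] },
]);
console.log(topicProblems);
```

Returning a list of problems instead of throwing on the first one makes it easier to fix a schema in a single pass.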


Customization Ideas

  • Change the target website by editing the URL in the Set node to any supported site.
  • Switch Bright Data proxy zone to target different geographic regions.
  • Use different Google Gemini model versions for deeper or lighter analysis.
  • Edit Information Extractor JSON schemas to add fields like sentiment scores or named entities.
  • Replace webhook URLs with Slack, Discord, or custom notification endpoints.
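
As an illustration of the schema customization above, the sketch below extends a topic schema with two extra fields. It assumes the Information Extractor node is configured with a JSON Schema; the base schema, its field types, and the added `sentiment_score` and `named_entities` fields are hypothetical examples, not the schema shipped with the workflow.

```javascript
// Illustrative JSON Schema; field types are assumptions, not the workflow's actual schema.
const baseTopicSchema = {
  type: 'object',
  properties: {
    topic: { type: 'string' },
    score: { type: 'number' },
    summary: { type: 'string' },
    keywords: { type: 'array', items: { type: 'string' } },
  },
  required: ['topic', 'score', 'summary', 'keywords'],
};

// Returns a copy of the schema with extra optional properties; the original is left untouched.
function addSchemaFields(schema, extraProperties) {
  return {
    ...schema,
    properties: { ...schema.properties, ...extraProperties },
  };
}

const extendedSchema = addSchemaFields(baseTopicSchema, {
  sentiment_score: { type: 'number', minimum: -1, maximum: 1 },
  named_entities: { type: 'array', items: { type: 'string' } },
});

// Paste the printed JSON into the node's schema field.
console.log(JSON.stringify(extendedSchema, null, 2));
```

The new fields are added as optional properties, so existing extractions that lack them would still satisfy the `required` list.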

Example Encoding Function for Saving JSON Files

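This Function node code serializes the extracted topics JSON and attaches it as base64-encoded binary data so the Read & Write File node can save it.

// Serialize the first item's JSON (pretty-printed with 2-space indentation)
// and store it, base64-encoded, under the binary property named "data".
items[0].binary = {
  data: {
    data: Buffer.from(JSON.stringify(items[0].json, null, 2)).toString('base64')
  }
};
// Pass the items on to the Read & Write File node.
return items;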

Use similar code blocks for trends JSON files if needed.
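
If several files need the same treatment, the encoding step can be factored into a reusable helper. This is a sketch under stated assumptions: the helper name `toJsonBinary` and the file name `trends.json` are hypothetical, and the `mimeType` and `fileName` properties are optional additions to the binary item that the original snippet does not set.

```javascript
// Hypothetical helper: wraps any JSON value as a binary property named "data".
function toJsonBinary(json, fileName) {
  return {
    data: {
      data: Buffer.from(JSON.stringify(json, null, 2)).toString('base64'),
      mimeType: 'application/json',
      fileName,
    },
  };
}

// Inside a Function node this would be used as:
//   items[0].binary = toJsonBinary(items[0].json, 'trends.json');
//   return items;
const encoded = toJsonBinary({ trends: ['AI regulation'] }, 'trends.json');

// Decode again to confirm the round trip preserves the data.
const roundTrip = JSON.parse(Buffer.from(encoded.data.data, 'base64').toString('utf8'));
console.log(roundTrip);
```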


Summary of Results

✓ Quickly get clean text from complex markdown web pages.

✓ Extract detailed topics with confidence and keywords.

✓ Group important trends by location and category.

✓ Analyze text sentiment to see positive or negative views.

✓ Save results as JSON files and send live notifications by webhook.


Frequently Asked Questions

Why does the Bright Data request fail with an authentication error?
Verify that the API key in the HTTP Header Auth credentials is correct and active. Update the key in n8n if it has expired.

Why do the Information Extractor nodes return errors?
Errors come from invalid JSON formats or missing required fields such as topic, score, summary, or keywords. Validate the schema before use.

Why does the webhook receive an empty payload?
The HTTP Request node must have the ‘Send Body’ option enabled and the correct body parameters defined.

How do I set up and run this workflow?
Download the workflow file, import it in the n8n editor, add the Bright Data and Google PaLM API credentials, update URLs or paths if needed, test by running once, then activate it for regular use.
