Automate Wikipedia Data Extraction & Summarization with n8n & Google Gemini

This workflow automates extracting and summarizing detailed Wikipedia content using Bright Data’s scraping and Google Gemini’s AI models, saving hours of manual research while delivering neat, human-readable summaries.
Workflow Identifier: 1887
NODES in Use: manualTrigger, httpRequest, set, chainLlm, chainSummarization

What This Automation Does

This workflow takes a Wikipedia article URL as input and returns a short, easy-to-read summary.
It removes the need to gather data from large web pages by hand, which saves time and reduces errors.
You get human-readable text summaries fast for research or business use.

The workflow uses Bright Data’s proxy service to get the full page HTML without being blocked.
Then it uses Google Gemini AI to clean and format that data into plain text.
Next, another Google Gemini AI step creates a short summary of the content.
Finally, the summary is sent automatically to a webhook URL for alerting or storage.


Who Should Use This Workflow

Users who need quick, reliable summaries of Wikipedia pages without manual copying.
Ideal for marketers, researchers, and knowledge workers with limited time or technical skills.

This avoids hiring experts or spending hours on data cleanup.
Anyone with access to n8n and required API accounts can run the workflow easily.


Tools and Services Used

  • Bright Data Web Unlocker Zone: Scrapes full HTML pages reliably without blocks.
  • Google Gemini (PaLM API): Used twice – once for cleaning HTML to text and once for summarizing.
  • n8n Workflow Automation Platform: Hosts and runs the steps.
  • External HTTP Webhook: Receives final summary for notifications or logging.

Inputs, Processing, and Output

Inputs

  • Wikipedia article URL, e.g., https://en.wikipedia.org/wiki/Cloud_computing?product=unlocker&method=api
  • Configured Bright Data zone name for scraping.
  • Valid API keys for Bright Data and Google Gemini services.

Processing Steps

  1. Use Bright Data API to retrieve the raw HTML of the article.
  2. Send raw HTML to Google Gemini AI to extract clean, human-readable text.
  3. Pass clean text to another Google Gemini AI process to create a concise summary.
  4. POST the summary to an external webhook URL for downstream use.
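
The four steps above can be sketched outside n8n as a small Python pipeline. This is a minimal sketch, not the workflow's actual implementation. The endpoint https://api.brightdata.com/request, the "format": "raw" field, and the Bearer header are assumptions about Bright Data's Web Unlocker API and should be checked against Bright Data's documentation. The fetch, clean, summarize, and post callables are hypothetical stand-ins for the HTTP call, the two Gemini steps, and the webhook POST.

```python
import json
from typing import Callable

# Assumed Bright Data endpoint; verify against your account's documentation.
BRIGHT_DATA_ENDPOINT = "https://api.brightdata.com/request"


def build_scrape_request(url: str, zone: str, api_token: str) -> dict:
    """Describe the HTTP request for step 1 without sending it."""
    return {
        "method": "POST",
        "url": BRIGHT_DATA_ENDPOINT,
        "headers": {
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        # zone and url mirror the fields set in the workflow's Set node.
        "body": json.dumps({"zone": zone, "url": url, "format": "raw"}),
    }


def run_pipeline(
    request: dict,
    fetch: Callable[[dict], str],       # step 1: returns raw HTML
    clean: Callable[[str], str],        # step 2: HTML -> plain text (LLM stand-in)
    summarize: Callable[[str], str],    # step 3: text -> summary (LLM stand-in)
    post: Callable[[str], None],        # step 4: deliver summary to a webhook
) -> str:
    """Run the four steps in order and return the summary."""
    raw_html = fetch(request)
    text = clean(raw_html)
    summary = summarize(text)
    post(summary)
    return summary
```

Passing the steps in as callables keeps the sketch runnable without network access: swap in real HTTP and LLM clients only once the data flow looks right.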

Output

A brief, human-friendly summary text sent to the webhook as JSON.
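
As a rough illustration of that output, the body sent to the webhook might be built like this. The field name "summary" is an assumption chosen for the example; the real key depends on how the final HTTP Request node is configured.

```python
import json


def build_webhook_body(summary: str) -> str:
    """Wrap the summary text in a JSON object for the webhook POST."""
    cleaned = summary.strip()
    if not cleaned:
        # An empty body usually means an upstream step produced nothing.
        raise ValueError("summary is empty; nothing to send")
    # "summary" is a hypothetical key name, not taken from the workflow.
    return json.dumps({"summary": cleaned}, ensure_ascii=False)
```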


Beginner Step-by-Step: How to Use This Workflow in n8n

Import Workflow

  1. Click the Download button on this page to get the workflow file, then open the n8n editor.
  2. Use the “Import from File” option in n8n to load the workflow.

Configure Credentials and Inputs

  1. Add your Bright Data HTTP Header Auth credentials in the Wikipedia Web Request node.
  2. Add your Google Gemini (PaLM API) credentials to both LLM nodes.
  3. In the Set node, update the url field to the desired Wikipedia page URL if needed.
  4. Update the zone field with your Bright Data zone name if different.
  5. In the final HTTP Request node, update the webhook URL to your chosen endpoint.

Test and Activate

  1. Click the Manual Trigger node’s “Execute Workflow” button once to test.
  2. Check that data flows smoothly and the final summary arrives at your webhook.
  3. When ready, toggle the workflow to “Active” to allow production use on demand.

Customizations

  • Change article by updating the URL in the Set node.
  • Use OpenAI or other AI providers by swapping Google Gemini credentials and models in LLM nodes.
  • Adapt the summarization prompt in the summarization node to create longer or shorter summaries.
  • Switch webhook URL to send summaries to Slack, email, or custom APIs.

Troubleshooting

  • Authentication failed in Wikipedia Web Request node.
    Cause: Bright Data API keys missing or wrong.
    Fix: Re-enter valid Bright Data HTTP Header Auth credentials.
  • LLM Data Extractor outputs empty or poor text.
    Cause: Incorrect input reference or prompt issues.
    Fix: Make sure the input uses {{$json.data}} and the prompt is clear and simple.
  • Summary webhook receives empty data.
    Cause: “Send Body” not enabled or wrong field reference.
    Fix: Enable “Send Body” and confirm body uses {{$json.response.text}}.
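
Two of the issues above come down to a field reference that does not match the data. The sketch below mimics how a dotted reference such as response.text is followed through an item, and reports the first missing key. It is a debugging aid written for this guide, not n8n's expression engine; resolve is a hypothetical helper.

```python
from typing import Any


def resolve(item: dict, path: str) -> Any:
    """Follow a dotted path such as 'response.text' through nested dicts."""
    current: Any = item
    walked = []
    for key in path.split("."):
        walked.append(key)
        if not isinstance(current, dict) or key not in current:
            location = ".".join(walked)
            raise KeyError(f"missing field at '{location}'")
        current = current[key]
    return current
```

Copy a node's output JSON from the n8n editor, load it as a dict, and call resolve with the path used in the next node to see where the reference breaks.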

Pre-Production Checklist

  • Check that the Bright Data credentials and the configured zone name are valid.
  • Confirm Google Gemini (PaLM API) keys work for both AI nodes.
  • Run a manual test to see full data flow without errors.
  • Ensure the final summary is correct and received via webhook.
  • Back up API keys and workflow settings securely.
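
Part of this checklist can be expressed as a quick sanity check on the input values. This is a minimal sketch under stated assumptions: url and zone match the Set node fields described earlier, while webhook_url is a hypothetical name for the endpoint set in the final HTTP Request node. It only inspects the values' shape and does not call any API.

```python
from urllib.parse import urlparse


def check_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for field in ("url", "zone", "webhook_url"):
        if not str(config.get(field, "")).strip():
            problems.append(f"{field} is empty")

    url = str(config.get("url", "")).strip()
    if url:
        parsed = urlparse(url)
        looks_like_article = (
            parsed.scheme == "https"
            and parsed.netloc.endswith("wikipedia.org")
            and parsed.path.startswith("/wiki/")
        )
        if not looks_like_article:
            problems.append("url does not look like a Wikipedia article")
    return problems
```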

Deployment Guide

Turn on the workflow’s toggle switch in the n8n editor to activate it.
Because the workflow starts from a Manual Trigger, it runs only when you execute it, so run it whenever you want a fresh summary.

Watch the workflow’s execution log to confirm that runs succeed.
Consider self-hosting n8n if you want to run the workflow on your own server.


Summary and Final Result

✓ This workflow saves hours of manual Wikipedia research.
✓ You get clean, human-readable summaries fast.
✓ It handles web scraping blocks using Bright Data proxies.
✓ Google Gemini AI cleans and summarizes content precisely.
✓ The result can trigger alerts or feed other tools via webhooks.


Frequently Asked Questions

OpenAI can be used instead of Google Gemini: replace the Google Gemini API keys and models in both LLM nodes with OpenAI credentials and models.
Long articles are supported: the summarization node splits big text inputs into chunks before summarizing them.
Data is processed inside your n8n instance and sent only to the services you configure. Keep API keys in encrypted credentials and protect webhook URLs.
Running the workflow has costs: both Bright Data and Google Gemini APIs have usage costs and limits based on subscription plans.
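
The chunking mentioned above can be pictured as a map-reduce pass: split the text, summarize each piece, then summarize the partial summaries. The sketch below shows the general technique only. It is not n8n's internal implementation, the chunk size and overlap are illustrative values, and summarize is a hypothetical stand-in for an LLM call.

```python
from typing import Callable, List


def split_into_chunks(text: str, size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into character chunks that overlap by `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current chunk already reaches the end of the text
        start += step
    return chunks


def summarize_long_text(
    text: str,
    summarize: Callable[[str], str],
    size: int = 1000,
    overlap: int = 100,
) -> str:
    """Summarize each chunk, then summarize the joined partial summaries."""
    chunks = split_into_chunks(text, size, overlap)
    if not chunks:
        return ""
    partials = [summarize(chunk) for chunk in chunks]
    if len(partials) == 1:
        return partials[0]
    return summarize("\n".join(partials))
```

The overlap keeps sentences that straddle a chunk boundary visible to both neighboring chunks, at the cost of summarizing a little text twice.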
