Extract Web Page Entities with Google NLP in n8n Workflow

Struggling to manually extract meaningful entities from web pages? This n8n workflow automates entity extraction using Google’s Natural Language API, returning structured data on people, organizations, and locations from any URL you provide.
Workflow Identifier: 1594
NODES in Use: webhook, httpRequest, code, respondToWebhook, stickyNote

Opening Problem Statement

Meet Sarah, a content analyst for a market research firm. Each day, she spends hours manually sifting through dozens of web pages, trying to capture key entities such as company names, people mentioned, and locations referenced in articles. This process is painstaking, error-prone, and slows down her team’s ability to generate timely reports. Sarah loses roughly 5 hours per week on this repetitive task, and critical information sometimes slips through the cracks because of human oversight.

This is where the Google Page Entity Extraction n8n workflow becomes a game changer. It automates Sarah’s entire process by programmatically fetching web page content and extracting meaningful named entities like people, organizations, and locations using Google’s powerful Natural Language Processing (NLP) API. Instead of parsing HTML and manually tagging entities, Sarah now gets accurate, structured insights instantly and can focus on higher-level analysis.

What This Automation Does

This specific n8n workflow activates when you send it a URL via a POST request to its webhook. It then goes through multiple steps to fetch and analyze the page content and returns detailed entity data. Here’s exactly what it accomplishes:

  • Receives any web page URL through a webhook endpoint.
  • Fetches the raw HTML content of the specified web page automatically.
  • Processes the page content with Google’s Natural Language API to extract entities.
  • Returns detailed entity results including entity types (PERSON, ORGANIZATION, LOCATION), salience scores, metadata, and text mentions.
  • Allows insights from unstructured web content to be structured for easier data consumption and decision making.
  • Eliminates manual copy-pasting and error-prone tagging work — saving hours weekly.

Overall, this workflow replaces an estimated 3 to 5 hours of manual work each week and reduces the oversights of manual review by relying on Google’s NLP.

Prerequisites ⚙️

  • Google Cloud account with Natural Language API enabled and API key generated 🔑.
  • n8n account (cloud or self-hosted) with webhook activation capability 🔌.
  • Basic understanding of making HTTP POST requests to trigger the workflow.

Optional: If you prefer self-hosting n8n, you can consider services like Hostinger, which support n8n instances.

Step-by-Step Guide

Step 1: Set up the Webhook to Receive URLs
Navigate to your n8n dashboard and open the workflow editor. Add a webhook node named “Get Url”.
Set HTTP Method to POST and specify a distinct path like your-custom-path.
You will use this URL endpoint to send the web page URLs you wish to analyze.

Step 2: Fetch the Web Page Content
Add an HTTP Request node named “Get URL Page Contents”.
Set it to perform a GET request on the URL received from the webhook input (map it as {{ $json.body.url }}).
This node will download the raw HTML of the page.

Step 3: Prepare HTML for Google NLP API
Insert a Code node named “Respond with detected entities”.
Paste this JavaScript code:

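// Prepare the fetched HTML for the Google Natural Language API request.
// Assumes the Code node runs in "Run Once for Each Item" mode and that the
// previous HTTP Request node stored the page body in the "data" field.
const html = String($input.item.json.data ?? "");

// Trim very large pages to the first 100,000 characters (optional).
const MAX_CHARS = 100000;
const trimmedHtml = html.length > MAX_CHARS ? html.substring(0, MAX_CHARS) : html;

return {
  json: {
    apiRequest: {
      document: {
        type: "HTML",
        content: trimmedHtml
      },
      encodingType: "UTF8"
    }
  }
};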

This formats the fetched HTML into the JSON structure expected by Google’s API.

Step 4: Call Google Natural Language API
Add an HTTP Request node called “Google Entities”.
Configure it to POST to https://language.googleapis.com/v1/documents:analyzeEntities.
Set the request body to the JSON output from the previous Code node (map {{ $json.apiRequest }}).
Add a query parameter named key and set its value to your Google API key.
Set the header Content-Type to application/json.
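
The node settings in this step amount to a single HTTP call. As a rough sketch of what gets sent, assuming a hypothetical helper named buildEntitiesRequest and a placeholder API key, the same request could be assembled outside n8n like this:

```javascript
// Hypothetical helper: mirrors the "Google Entities" node settings from Step 4.
function buildEntitiesRequest(apiKey, apiRequest) {
  // The API key travels as the `key` query parameter.
  const url = new URL("https://language.googleapis.com/v1/documents:analyzeEntities");
  url.searchParams.set("key", apiKey);

  return {
    url: url.toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(apiRequest),
    },
  };
}

// Example with a placeholder key and a tiny made-up HTML document.
const request = buildEntitiesRequest("YOUR-GOOGLE-API-KEY", {
  document: { type: "HTML", content: "<p>Ada Lovelace lived in London.</p>" },
  encodingType: "UTF8",
});
console.log(request.url);
```

Nothing is sent over the network here. Passing request.url and request.options to fetch would perform the actual call.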

Step 5: Respond Back to the Triggering Client
Add a Respond to Webhook node.
Connect it to the Google Entities node to send the full entity analysis results back to whoever triggered the webhook.

Step 6: Activate and Test Your Workflow
Save and activate the workflow.
Send a POST request to the webhook URL with this JSON body:

{
  "url": "https://example.com"
}

Check the response JSON to see extracted entities, their types, and importance.
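
If you would rather test from a script than from a tool like Postman, the request above can be sketched as follows. This assumes a runtime with a global fetch (for example Node.js 18 or later); the helper names and the webhook URL are placeholders, not part of the workflow.

```javascript
// Hypothetical helper: builds the POST request that triggers the workflow.
function buildWebhookCall(webhookUrl, pageUrl) {
  return {
    url: webhookUrl,
    options: {
      method: "POST", // the webhook node is configured for POST, not GET
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: pageUrl }),
    },
  };
}

// Sends the request and returns the parsed entity analysis.
async function extractEntities(webhookUrl, pageUrl) {
  const { url, options } = buildWebhookCall(webhookUrl, pageUrl);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Webhook responded with HTTP ${response.status}`);
  }
  return response.json();
}

// Placeholder webhook URL; replace the host and path with your own.
const call = buildWebhookCall(
  "https://your-n8n-host.example/webhook/your-custom-path",
  "https://example.com"
);
console.log(call.options.body); // {"url":"https://example.com"}
```

Calling extractEntities with your real webhook URL would return the same JSON you see in the webhook response.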

Common mistakes to avoid:
– Leaving a placeholder such as “YOUR-GOOGLE-API-KEY” in the key query parameter instead of your actual key.
– Sending GET requests instead of POST to the webhook.
– Not formatting the request body correctly (must be JSON with a “url” property).
– Exceeding the Google API request size limits (handled by trimming in code node).
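
The request-body mistake in particular can be caught early. The sketch below is one possible check, not part of the original workflow: readTargetUrl is a hypothetical helper that could run in a Code node placed right after the webhook, and the rule of allowing only http and https is an assumption.

```javascript
// Hypothetical validator for the webhook body; returns the URL or throws.
function readTargetUrl(body) {
  if (!body || typeof body.url !== "string") {
    throw new Error('Request body must be JSON with a "url" property');
  }
  let parsed;
  try {
    parsed = new URL(body.url);
  } catch {
    throw new Error(`Not a valid URL: ${body.url}`);
  }
  // Assumption: only web pages are fetched, so other schemes are rejected.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return parsed.toString();
}

console.log(readTargetUrl({ url: "https://example.com" })); // https://example.com/
```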

Customizations ✏️

1. Include Entity Sentiment Analysis
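Entity sentiment is not an option of documents:analyzeEntities. Point the Google Entities node at https://language.googleapis.com/v1/documents:analyzeEntitySentiment instead, which accepts the same request body, or call documents:annotateText and add features: {extractEntities: true, extractEntitySentiment: true} to the request JSON built in the Code node.
This allows you to understand the tone associated with entities.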
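
As a sketch of the two request shapes that can return entity sentiment, the hypothetical helper below builds either an analyzeEntitySentiment request (same body as analyzeEntities) or an annotateText request, where a features object selects the analyses. The helper name and the useAnnotateText flag are illustrative only.

```javascript
const BASE = "https://language.googleapis.com/v1/documents";

// Hypothetical helper: builds an entity-sentiment request from page HTML.
function buildEntitySentimentRequest(html, useAnnotateText = false) {
  const document = { type: "HTML", content: html };
  if (useAnnotateText) {
    // annotateText selects analyses through a `features` object.
    return {
      endpoint: `${BASE}:annotateText`,
      body: {
        document,
        features: { extractEntities: true, extractEntitySentiment: true },
        encodingType: "UTF8",
      },
    };
  }
  // analyzeEntitySentiment takes the same body shape as analyzeEntities.
  return {
    endpoint: `${BASE}:analyzeEntitySentiment`,
    body: { document, encodingType: "UTF8" },
  };
}

console.log(buildEntitySentimentRequest("<p>Great service in Paris.</p>").endpoint);
```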

2. Filter Entities by Type
Add a subsequent Code node after the Google Entities node to parse the response and filter only certain entity types, like PERSON or ORGANIZATION.
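
A minimal sketch of such a filter is shown below. The function name is hypothetical and the sample data is made up, shaped like the entities array that analyzeEntities returns.

```javascript
// Hypothetical helper: keeps only entities whose type is in `allowedTypes`.
function filterEntitiesByType(response, allowedTypes) {
  const allowed = new Set(allowedTypes);
  const entities = Array.isArray(response.entities) ? response.entities : [];
  return entities.filter((entity) => allowed.has(entity.type));
}

// Made-up sample shaped like an analyzeEntities response.
const sample = {
  entities: [
    { name: "Ada Lovelace", type: "PERSON", salience: 0.6 },
    { name: "London", type: "LOCATION", salience: 0.3 },
    { name: "Analytical Engine", type: "CONSUMER_GOOD", salience: 0.1 },
  ],
};

const people = filterEntitiesByType(sample, ["PERSON", "ORGANIZATION"]);
console.log(people.map((entity) => entity.name)); // [ 'Ada Lovelace' ]
```

Inside an n8n Code node, the response would typically come from $input.item.json and the filtered list would be returned as { json: { entities: ... } }.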

3. Expand Page Content Length
Adjust the trimming logic in the Code node if your pages are larger than 100,000 characters and your Google API limits and quota allow.
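
The original code trims by character count. If the limit you are working against is expressed in bytes (an assumption worth checking against Google's current quota documentation), a byte-based trim could be sketched like this. The helper name is hypothetical, and it assumes TextEncoder and TextDecoder are available in your runtime.

```javascript
// Hypothetical helper: trims a string to at most `maxBytes` of UTF-8
// without cutting a multi-byte character in half.
function trimToUtf8Bytes(text, maxBytes) {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return text;

  let end = maxBytes;
  // Continuation bytes look like 10xxxxxx; step back to a character start.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;
  return new TextDecoder().decode(bytes.subarray(0, end));
}

console.log(trimToUtf8Bytes("héllo", 3)); // hé
console.log(trimToUtf8Bytes("héllo", 2)); // h
```

The second call drops the "é" entirely because only one of its two bytes would fit.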

4. Save Extracted Entities to Google Sheets
Integrate a Google Sheets node after Google Entities to log entity details for reporting.
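
Before the entities reach a sheet, they usually need to be flattened into one row per entity. The sketch below shows one way to do that; the helper name and the column names (sourceUrl, name, type, salience, mentionCount) are assumptions you can rename to match your sheet.

```javascript
// Hypothetical helper: flattens an analyzeEntities-style response into rows.
function entitiesToRows(sourceUrl, response) {
  return (response.entities ?? []).map((entity) => ({
    sourceUrl,
    name: entity.name,
    type: entity.type,
    salience: entity.salience,
    mentionCount: (entity.mentions ?? []).length,
  }));
}

// Made-up sample response with a single entity.
const rows = entitiesToRows("https://example.com", {
  entities: [
    {
      name: "Ada Lovelace",
      type: "PERSON",
      salience: 0.6,
      mentions: [{ text: { content: "Ada Lovelace" }, type: "PROPER" }],
    },
  ],
});
console.log(rows);
```

In a Code node, each row would be returned as its own item ({ json: row }) so the Google Sheets node can append them one by one.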

5. Use Other NLP Features
Switch the API endpoint in the HTTP Request node to use other Google NLP methods, such as syntax analysis (documents:analyzeSyntax) or content classification (documents:classifyText), and adjust the request parameters to match.
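
To make the switch less error-prone, the endpoint can be derived from a small lookup. This is only a sketch: the keys on the left and the endpointFor helper are hypothetical, while the method names on the right are v1 document methods of the Natural Language API.

```javascript
// v1 document methods of the Natural Language API, keyed by analysis type.
const NLP_METHODS = new Map([
  ["entities", "analyzeEntities"],
  ["entitySentiment", "analyzeEntitySentiment"],
  ["sentiment", "analyzeSentiment"],
  ["syntax", "analyzeSyntax"],
  ["classification", "classifyText"],
]);

// Hypothetical helper: returns the endpoint URL for an analysis type.
function endpointFor(analysis) {
  const method = NLP_METHODS.get(analysis);
  if (!method) throw new Error(`Unknown analysis type: ${analysis}`);
  return `https://language.googleapis.com/v1/documents:${method}`;
}

console.log(endpointFor("syntax"));
```

Request bodies are not identical across methods, so check each method's reference before reusing the apiRequest object unchanged.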

Troubleshooting 🔧

Problem: “403 Forbidden” or “Invalid API Key” response from Google Entities node.
Cause: The API key is incorrect, missing, or Google NLP API is not enabled.
Solution: In the Google Cloud Console, confirm that the Natural Language API is enabled for your project and that the API key is valid, then check that the key is passed correctly in the key query parameter.
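
When the Google Entities node fails, the response body usually carries an error object with code, message, and status fields. The sketch below turns that into a short hint; describeGoogleError is a hypothetical helper, the sample error is made up, and the hints are suggestions rather than an official mapping.

```javascript
// Hypothetical helper: turns a Google-style error body into a readable hint.
function describeGoogleError(body) {
  const error = body && body.error;
  if (!error) return "No error object in response";

  let hint = "see the n8n execution log for the full response";
  if (error.code === 403) {
    hint = "check that the Natural Language API is enabled for the key's project";
  } else if (error.code === 400) {
    hint = "check the key value and the request body";
  }
  return `${error.status ?? "UNKNOWN"} (${error.code}): ${error.message}; ${hint}`;
}

// Made-up sample error for illustration.
console.log(
  describeGoogleError({
    error: { code: 403, message: "Forbidden", status: "PERMISSION_DENIED" },
  })
);
```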

Problem: Webhook returns no data or empty response.
Cause: The POST request payload was malformed or the workflow didn’t execute fully.
Solution: Confirm that you are sending a POST with correct JSON body containing the “url” field. Check n8n execution logs for errors.

Problem: Google NLP API request size exceeds limits.
Cause: Large web pages not trimmed before sending.
Solution: Edit the Code node to trim the input HTML to under 100,000 characters or adjust as needed within Google limits.

Pre-Production Checklist ✅

  • Verify Google Cloud API key and enablement of Natural Language API.
  • Test webhook POST requests locally or with tools like Postman, using a valid JSON body that contains a “url” field.
  • Confirm each node executes without errors in n8n editor.
  • Check that the fetched HTML passes correctly from the HTTP Request node to the Code node, and that the trimmed result reaches the Google Entities node.
  • Validate that entity response JSON received from Google is complete and correctly structured.
  • Save a backup of your workflow before publishing.

Deployment Guide

Once you have completed testing, activate the workflow within n8n. Copy the webhook URL and use it as the endpoint for clients or your automation apps that need entity extraction. Monitor execution via n8n’s dashboard for any failed runs or errors. The architecture is lightweight, so it scales well for moderate usage without additional infrastructure. For higher volume, consider n8n’s self-hosted options.

FAQs

Q: Can I use another NLP provider instead of Google?
A: You can substitute the Google Entities HTTP Request node with any NLP API that accepts raw HTML and returns entity data, but you will need to adapt the request/response formatting.

Q: Does this workflow consume API credits?
A: Yes, each Google NLP API call counts toward your Google Cloud usage quota and billing.

Q: Is the extracted data secure?
A: The workflow fetches whatever URL you submit and sends the page content to Google’s servers for analysis. Use it for publicly accessible web pages, and handle sensitive or private URLs cautiously.

Q: Can this workflow handle very large pages?
A: The Code node trims large HTML content by default, but you can adjust this based on your Google API limits.

Conclusion

By implementing this Google Page Entity Extraction workflow in n8n, you automate the tedious task Sarah once faced, extracting structured insights from any web page with ease. You save hours weekly and increase accuracy by leveraging Google’s NLP power. This approach transforms unstructured web content into actionable data—perfect for content analysts, marketers, or researchers.

Next, you could expand this workflow by integrating Google Sheets to log the extracted entities, adding sentiment analysis to gauge entity tone, or automating report generation based on your extracted data. Dive in, experiment, and watch your productivity soar with smart automation!
