Opening Problem Statement
Meet Sarah, a content analyst for a market research firm. Each day, she spends hours manually sifting through dozens of web pages, trying to capture key entities such as company names, people mentioned, and locations referenced in articles. This process is painstaking, error-prone, and slows down her team’s ability to generate timely reports. Sarah loses roughly 5 hours per week on this repetitive task, and critical information sometimes slips through the cracks because of human oversight.
This is where the Google Page Entity Extraction n8n workflow becomes a game changer. It automates Sarah’s entire process by programmatically fetching web page content and extracting meaningful named entities like people, organizations, and locations using Google’s powerful Natural Language Processing (NLP) API. Instead of parsing HTML and manually tagging entities, Sarah now gets accurate, structured insights instantly and can focus on higher-level analysis.
What This Automation Does
This specific n8n workflow activates when you send it a URL via a POST request to its webhook. It then goes through multiple steps to fetch and analyze the page content and returns detailed entity data. Here’s exactly what it accomplishes:
- Receives any web page URL through a secure webhook endpoint.
- Fetches the raw HTML content of the specified web page automatically.
- Processes the page content with Google’s Natural Language API to extract entities.
- Returns detailed entity results including entity types (PERSON, ORGANIZATION, LOCATION), salience scores, metadata, and text mentions.
- Turns unstructured web content into structured insights for easier data consumption and decision making.
- Eliminates manual copy-pasting and error-prone tagging work — saving hours weekly.
Overall, this workflow saves an estimated 3-5 hours of manual work weekly and drastically improves data accuracy by leveraging Google’s NLP rather than manual review.
Prerequisites ⚙️
- Google Cloud account with Natural Language API enabled and API key generated 🔑.
- n8n account (cloud or self-hosted) with webhook activation capability 🔌.
- Basic understanding of making HTTP POST requests to trigger the workflow.
Optional: If you prefer self-hosting n8n, consider a hosting provider such as Hostinger that supports n8n instances.
Step-by-Step Guide
Step 1: Set up the Webhook to Receive URLs
Navigate to your n8n dashboard and open the workflow editor. Add a webhook node named “Get Url”.
Set HTTP Method to POST and specify a distinct path like your-custom-path.
You will use this URL endpoint to send the web page URLs you wish to analyze.
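Once the workflow is active, you could trigger it from a script like this (a minimal sketch for Node.js 18+ run as an ES module; the host and path are placeholders for your own n8n instance and the path you chose):
// Sketch: trigger the production webhook; host and path are placeholders
const res = await fetch("https://your-n8n-instance/webhook/your-custom-path", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ url: "https://example.com" }),
});
console.log(await res.json());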
Step 2: Fetch the Web Page Content
Add an HTTP Request node named “Get URL Page Contents”.
Set it to perform a GET request on the URL received from the webhook input (map it as {{ $json.body.url }}).
This node downloads the raw HTML of the page. Depending on your n8n version, you may need to set the node’s Response Format option to Text so the HTML is returned in the data field used in the next step.
Step 3: Prepare HTML for Google NLP API
Insert a Code node named “Respond with detected entities”.
Paste this JavaScript code:
// Clean and prepare the fetched HTML for the Google NLP API request
// (Code node mode: "Run Once for Each Item")
const html = $input.item.json.data;

// Trim if too large (optional): keeps the request safely under Google's document-size limits
const trimmedHtml = html.length > 100000 ? html.substring(0, 100000) : html;

return {
  json: {
    apiRequest: {
      document: {
        type: "HTML",
        content: trimmedHtml
      },
      encodingType: "UTF8"
    }
  }
};
This formats the fetched HTML into the JSON structure expected by Google’s API.
Step 4: Call Google Natural Language API
Add an HTTP Request node called “Google Entities”.
Configure it to POST to https://language.googleapis.com/v1/documents:analyzeEntities.
Set the request body to the JSON output from the previous Code node (map {{ $json.apiRequest }}).
Add your Google API key as the key query parameter.
Set the header Content-Type to application/json.
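If you want to sanity-check the call outside n8n first, this is roughly the request the node performs (a sketch; YOUR-GOOGLE-API-KEY and the HTML content are placeholders):
// Sketch of the equivalent raw API call (Node.js 18+, ES module)
const apiRequest = {
  document: { type: "HTML", content: "<html>...</html>" }, // placeholder HTML
  encodingType: "UTF8",
};
const res = await fetch(
  "https://language.googleapis.com/v1/documents:analyzeEntities?key=YOUR-GOOGLE-API-KEY",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(apiRequest),
  }
);
console.log(await res.json());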
Step 5: Respond Back to the Triggering Client
Add a Respond to Webhook node.
Connect it to the Google Entities node to send the full entity analysis results back to whoever triggered the webhook.
Step 6: Activate and Test Your Workflow
Save and activate the workflow.
Send a POST request to the webhook URL with this JSON body:
{
  "url": "https://example.com"
}
Check the response JSON to see extracted entities, their types, and importance.
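For reference, an abridged analyzeEntities response looks roughly like this (all values are illustrative):
{
  "entities": [
    {
      "name": "Example Corp",
      "type": "ORGANIZATION",
      "metadata": {},
      "salience": 0.42,
      "mentions": [
        {
          "text": { "content": "Example Corp", "beginOffset": 120 },
          "type": "PROPER"
        }
      ]
    }
  ],
  "language": "en"
}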
Common mistakes to avoid:
- Forgetting to replace “YOUR-GOOGLE-API-KEY” with your actual key.
- Sending GET requests instead of POST to the webhook.
- Not formatting the request body correctly (it must be JSON with a “url” property).
- Exceeding Google’s API request size limits (handled by the trimming in the Code node).
Customizations ✏️
1. Include Entity Sentiment Analysis
The documents:analyzeEntities endpoint does not accept a features object. To get per-entity sentiment, point the HTTP Request node at https://language.googleapis.com/v1/documents:analyzeEntitySentiment instead (same request body), or call documents:annotateText with features: {extractEntitySentiment: true}.
This allows you to understand the tone associated with entities.
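A sketch of the documents:annotateText request body with entity sentiment enabled (the document object is the same one built in Step 3; the "..." content stands in for your trimmed HTML):
{
  "document": { "type": "HTML", "content": "..." },
  "features": { "extractEntitySentiment": true },
  "encodingType": "UTF8"
}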
2. Filter Entities by Type
Add a subsequent Code node after the Google Entities node to parse the response and filter only certain entity types, like PERSON or ORGANIZATION.
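A minimal sketch of such a filtering Code node (the PERSON/ORGANIZATION list is just an example; adjust to the types you care about):
// Keep only selected entity types from the Google response
const wanted = new Set(["PERSON", "ORGANIZATION"]);
const entities = $input.item.json.entities || [];
return {
  json: {
    entities: entities.filter((entity) => wanted.has(entity.type)),
  },
};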
3. Expand URL Content Length
Adjust the trimming logic in the Code node if your pages are larger than 100,000 characters and your Google API quota allows.
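For example, the trimming line from Step 3 could be generalized like this (500000 is an arbitrary example; check Google's current document-size limits before raising it):
// Raise the cap as your quota allows
const LIMIT = 500000; // characters
const trimmedHtml = html.length > LIMIT ? html.substring(0, LIMIT) : html;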
4. Save Extracted Entities to Google Sheets
Integrate a Google Sheets node after Google Entities to log entity details for reporting.
5. Use Different NLP Models
Switch the API endpoint or parameters in the HTTP Request node to use other Google NLP features, such as documents:analyzeSyntax for syntax analysis or documents:classifyText for content classification.
Troubleshooting 🔧
Problem: “403 Forbidden” or “Invalid API Key” response from Google Entities node.
Cause: The API key is incorrect, missing, or Google NLP API is not enabled.
Solution: Double-check your Google Cloud Console; ensure the API key is valid, the Natural Language API is enabled, and the key is inserted correctly in the query parameters.
Problem: Webhook returns no data or empty response.
Cause: The POST request payload was malformed or the workflow didn’t execute fully.
Solution: Confirm that you are sending a POST with correct JSON body containing the “url” field. Check n8n execution logs for errors.
Problem: Google NLP API request size exceeds limits.
Cause: Large web pages not trimmed before sending.
Solution: Edit the Code node to trim the input HTML to under 100,000 characters, or adjust the cap as needed within Google’s limits.
Pre-Production Checklist ✅
- Verify Google Cloud API key and enablement of Natural Language API.
- Test webhook POST requests locally or using tools like Postman with valid URL JSON body.
- Confirm each node executes without errors in n8n editor.
- Check that trimmed HTML data passes correctly from HTTP request node to code preparation node.
- Validate that entity response JSON received from Google is complete and correctly structured.
- Save a backup of your workflow before publishing.
Deployment Guide
Once you have completed testing, activate the workflow within n8n. Copy the webhook URL and use it as the endpoint for clients or your automation apps that need entity extraction. Monitor execution via n8n’s dashboard for any failed runs or errors. The architecture is lightweight, so it scales well for moderate usage without additional infrastructure. For higher volume, consider n8n’s self-hosted options.
FAQs
Q: Can I use another NLP provider instead of Google?
A: You can substitute the Google Entities HTTP Request node with any NLP API that accepts raw HTML and returns entity data, but you will need to adapt the request/response formatting.
Q: Does this workflow consume API credits?
A: Yes, each Google NLP API call counts toward your Google Cloud usage quota and billing.
Q: Is the extracted data secure?
A: The workflow only processes publicly accessible web pages. Sensitive or private URLs should be handled cautiously, as data is sent to Google’s servers.
Q: Can this workflow handle very large pages?
A: The Code node trims large HTML content by default, but you can adjust this based on your Google API limits.
Conclusion
By implementing this Google Page Entity Extraction workflow in n8n, you automate the tedious task Sarah once faced, extracting structured insights from any web page with ease. You save hours weekly and increase accuracy by leveraging Google’s NLP power. This approach transforms unstructured web content into actionable data—perfect for content analysts, marketers, or researchers.
Next, you could expand this workflow by integrating Google Sheets to log the extracted entities, adding sentiment analysis to gauge entity tone, or automating report generation based on your extracted data. Dive in, experiment, and watch your productivity soar with smart automation!