Automate News Extraction with n8n and OpenAI for Weekly Summaries

This n8n workflow automates extracting, summarizing, and keywording the latest news posts from a website without RSS, saving hours of manual work each week. It pulls fresh news, summarizes content with OpenAI, and stores results in NocoDB for easy access.
Learn how to Build this Workflow with AI:
Workflow Identifier: 1959
NODES in Use: html, openAi, set, merge, code, httpRequest, itemLists, scheduleTrigger, nocoDb, stickyNote


Opening Problem Statement

Meet Sarah, a market analyst at a telecommunications firm. Every week, she needs to monitor the latest news updates and technical releases from Colt, a telecom company. Unfortunately, Colt’s news site lacks an RSS feed, forcing Sarah to sift through each post’s web page manually. This tedious process consumes over four hours weekly, with high risk of missing critical updates or key technical keywords relevant to her team’s projects.

She often finds herself overwhelmed, manually copying links, dates, and lengthy articles, then drafting summaries and extracting keywords — all tasks that are repetitive and time-consuming. If she misses even one post, decision-making suffers, potentially costing her company hundreds of dollars in delayed responses to industry changes.

What This Automation Does

This n8n workflow streamlines Sarah’s entire news monitoring process by automating web scraping, AI summarization, and data storage. When run weekly via a Schedule Trigger node, it does the following:

  • Extracts links and publication dates from Colt’s news page using precise CSS selectors.
  • Filters news posts from only the last 7 days to focus on relevant updates.
  • Fetches full content of each filtered news post from individual URLs.
  • Uses OpenAI GPT-4 to generate concise news summaries capped at 70 words for quick comprehension.
  • Extracts three key technical keywords from each news post’s content for tagging and indexing.
  • Stores all collected data including title, date, link, summary, and keywords into a NocoDB SQL database for centralized access and further analysis.

This automation saves Sarah upwards of four hours weekly, eliminates human error in data collection, and keeps her team instantly updated with relevant technical news.

Prerequisites ⚙️

  • n8n account – hosting your automation workflows.
  • OpenAI account with API access – for GPT-4 driven summaries and keyword extraction.
  • NocoDB instance and API token – an SQL-compatible no-code database for storing news data.
  • Basic familiarity with CSS selectors – needed to pinpoint exact HTML elements for extraction.
  • Optional: Self-hosted n8n for greater control and scalability, which you can learn more about at Hostinger’s n8n hosting guide.

Step-by-Step Guide to Build This News Extraction Workflow ✏️

Step 1: Schedule the Automation Weekly with Schedule Trigger

Navigate to Triggers → Schedule Trigger. Set the workflow to run every Wednesday at 4:32 AM by configuring the interval to weeks and selecting day 3 at hour 4, minute 32. This ensures you collect new news weekly without manual intervention.

Expected: The workflow automatically starts every Wednesday morning.

Common mistake: Forgetting to enable the workflow after setup.

Step 2: Retrieve the News Page HTML via HTTP Request

Go to Nodes → HTTP Request. Use the URL https://www.colt.net/resources/type/news/, and set response format to text. This gets the raw HTML of the news page for analysis.

Expected: Raw HTML text is returned, visible in node output.

Common mistake: Not setting response format correctly may cause parsing errors.

Step 3: Extract News Links with the HTML Node

Insert an HTML Node named “Extract the HTML with the right css class.” Configure operation as “extractHtmlContent”. Use CSS selector div:nth-child(9) > div:nth-child(3) > a:nth-child(2) to extract href attributes of links displayed on the news page.

Expected: You get an array of news post URLs.

Common mistake: Incorrect CSS selectors result in empty or incorrect link lists.
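To see the shape of the output you should expect from this step, here is an illustrative sketch with made-up sample HTML. The real HTML node uses a proper CSS selector engine, not a regex; this only demonstrates that the result is an array of href values.

```javascript
// Illustrative only: sample HTML is hypothetical, and the regex stands in
// for the HTML node's CSS-based extraction.
const sampleHtml =
  '<div><a href="https://www.colt.net/post-a">Post A</a>' +
  '<a href="https://www.colt.net/post-b">Post B</a></div>';

// Collect every href attribute value into an array
const links = [...sampleHtml.matchAll(/href="([^"]+)"/g)].map(m => m[1]);
```

If `links` comes back empty when you run the real node, the selector is the first thing to re-check in your browser's inspector.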

Step 4: Convert the Links Array into Individual Items with ItemLists Node

Add an ItemLists Node named “Create single link items”. Set the field to split out to data, the field created in the previous step; this transforms the array into individual JSON items, one per link.

Expected: Each link becomes a separate item for further processing.

Common mistake: Forgetting to specify the correct source field for splitting.
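Conceptually, the split-out operation turns one item holding an array into many items, one per element. A minimal sketch, assuming the previous node stored the links in a field named data (sample URLs are hypothetical):

```javascript
// One input item whose "data" field holds the array of links
const input = { json: { data: ["https://www.colt.net/post-a", "https://www.colt.net/post-b"] } };

// Split out: each array element becomes its own item
const splitItems = input.json.data.map(link => ({ json: { Link: link } }));
```

Each resulting item can then be processed independently by downstream nodes.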

Step 5: Extract Post Dates with HTML Node

Add another HTML Node named “Extract date,” using CSS selector div:nth-child(9) > div:nth-child(2) > span:nth-child(1) to extract the dates corresponding to each post on the main news page.

Expected: An array of post dates matching extracted links.

Common mistake: Dates not matching links if selectors are off.

Step 6: Convert Dates Array to Individual Items Using ItemLists Node

Add ItemLists Node named “Create single date items.” Set it to split out data as in step 4.

Expected: Dates available as individual items aligned with links.

Common mistake: Mismatch in length of links and dates arrays causing errors.

Step 7: Merge Dates and Links by Position

Use a Merge Node named “Merge date & links” with mode “combine” and combinationMode “mergeByPosition” to pair each link with its date.

Expected: Each news post item now contains both link and date.

Common mistake: Using wrong merge mode causes data misalignment.
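“Merge by position” pairs items from the two inputs index by index, which is why both lists must arrive in the same order and with the same length. A sketch of the pairing logic (sample values are hypothetical):

```javascript
// Two item lists in matching order, as produced by steps 4 and 6
const linkItems = [{ Link: "https://www.colt.net/post-a" }, { Link: "https://www.colt.net/post-b" }];
const dateItems = [{ Date: "10 January 2024" }, { Date: "11 January 2024" }];

// mergeByPosition: combine item i of input 1 with item i of input 2
const merged = linkItems.map((item, i) => ({ ...item, ...dateItems[i] }));
```

If one list is longer than the other, the extra items have no partner to pair with, which is exactly the misalignment the common mistake above warns about.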

Step 8: Filter News Posts from the Last 7 Days with Code Node

Add a Code Node named “Select posts of last 7 days.” Use the following JavaScript code:

// Compute the cutoff date: seven days before now
const now = new Date();
const sevenDaysAgo = new Date(now);
sevenDaysAgo.setDate(sevenDaysAgo.getDate() - 7);

// Keep only items whose "Date" field falls within the last seven days
const filteredItems = items.filter(item => {
    const postDate = new Date(item.json["Date"]);
    return postDate >= sevenDaysAgo;
});

return filteredItems;

This filters out news older than a week, keeping only current posts.

Expected: Only news posts from the past 7 days remain.

Common mistake: Date parsing errors if date formats differ.
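If the scraped date strings are in a format JavaScript's Date constructor cannot parse, the comparison silently evaluates against NaN and the item is dropped. A defensive sketch (sample dates are hypothetical) that flags unparseable dates explicitly:

```javascript
// Sample items: one parseable date, one deliberately broken
const sampleItems = [
  { json: { Date: "2024-01-10" } },
  { json: { Date: "not a date" } },
];

// new Date(...) on an unparseable string yields Invalid Date, whose
// getTime() is NaN — check for that before filtering
const parsed = sampleItems.map(item => {
  const d = new Date(item.json.Date);
  return { ...item, valid: !Number.isNaN(d.getTime()) };
});
```

You could log or route invalid items separately instead of letting them vanish from the filter output.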

Step 9: Fetch Full News Content of Filtered Items via HTTP Request

Insert another HTTP Request Node named “HTTP Request1”. Use the URL from each item’s Link field dynamically (=$json["Link"]). This retrieves the complete HTML content of each individual news article.

Expected: Raw HTML of each news post is pulled for content extraction.

Common mistake: Forgetting to use dynamic expressions for URLs results in errors.

Step 10: Extract Title and Content from Each Post with HTML Node

Add an HTML Node named “Extract individual posts.” Configure with two CSS selectors:

  • Title: h1.fl-heading > span:nth-child(1)
  • Content: .fl-node-5c7574ae7d5c6 > div:nth-child(1)

This pulls the headline and main text of each article.

Expected: Structured title and content extracted.

Common mistake: Selectors might need adjustment if site changes.

Step 11: Merge Extracted Content with Corresponding Date and Link

Use a Merge Node named “Merge Content with Date & Link” in “combine” mode by position. This consolidates title, content, date, and links into one item.

Expected: Each item now fully represents one news post with metadata.

Common mistake: Out-of-sync merges due to oddly ordered data.

Step 12: Generate Summary of Each News Post with OpenAI GPT-4 Node

Add the OpenAI Node named “Summary.” Use the GPT-4 preview model. The prompt template is:

=Create a summary in less than 70 words {{ $json["content"] }}

This generates concise, readable abstracts for quick review.

Expected: Each post has a short AI-generated summary.

Common mistake: Missing API credentials causes failure.

Step 13: Extract Three Technical Keywords with OpenAI Node

Use another OpenAI Node named “Keywords” with the prompt:

=name the 3 most important technical keywords in {{ $json["content"] }} ? just name them without any explanations or other sentences

This tags each post with relevant technical terms.

Expected: Three keywords returned per article.

Common mistake: Incorrect prompt format yields poor keywords.

Step 14: Rename and Prepare Summary & Keywords for Merging

Add two Set Nodes called “Rename Summary” and “Rename keywords.” Use expressions:

  • Summary: =$json["message"]["content"] saved as summary
  • Keywords: =$json["message"]["content"] saved as keywords

This cleans up the JSON fields to usable keys.

Expected: Clean summary and keyword fields available.

Common mistake: Forgetting to specify output fields correctly.
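The Set nodes simply unwrap the nested OpenAI response into a flat key. A sketch of what the expression =$json["message"]["content"] resolves against (the response text itself is illustrative):

```javascript
// Shape of the OpenAI node's output that the Set node unwraps
const openAiOutput = { message: { content: "Example summary text." } };

// Equivalent of saving $json["message"]["content"] as "summary"
const renamed = { summary: openAiOutput.message.content };
```

The “Rename keywords” node does the same, saving the content as keywords instead.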

Step 15: Merge Summaries and Keywords

Use a Merge Node named “Merge” with combination by position to combine the renamed summary and keywords outputs.

Expected: Each news post now has content, summary, and keywords grouped.

Common mistake: Mismatched item counts cause errors.

Step 16: Merge ChatGPT Output with News Metadata

Use “Merge ChatGPT output with Date & Link” node to combine all information from the AI output with the previously collected date and link metadata.

Expected: A final enriched news post item with all details.

Common mistake: Wrong merge mode could misalign data.

Step 17: Save Final News Records into NocoDB Database

Add the NocoDB Node named “NocoDB news database.” Connect your NocoDB API token and select the target table. Map fields:

  • News_Source: Static value “Colt”
  • Title, Date, Link, Summary, Keywords: Values from JSON fields

Expected: Structured news data saved in your database for further use.

Common mistake: Incorrect field mappings can cause data loss.

Customizations ✏️

  • Change News Source URL: Point the “Retrieve the web page for further processing” HTTP Request node at another news site, and adjust the CSS selectors in the HTML nodes accordingly.
  • Adjust Date Range: In the “Select posts of last 7 days” Code Node, change the lookback from 7 days to a time frame that matches your schedule.
  • Use Different AI Models: Swap the GPT-4 preview model for GPT-3.5, or tune the prompts in the “Summary” and “Keywords” nodes for customized outputs.
  • Switch Database Target: Replace NocoDB with other SQL integrations available in n8n, such as MySQL or PostgreSQL nodes, to fit your infrastructure.
  • Add Email Notifications: After data is stored, add a Gmail node to notify your team of new summaries automatically.

Troubleshooting 🔧

Problem: “No data extracted from HTML Node”

Cause: CSS selectors are incorrect or the webpage structure changed.

Solution: Use browser inspect tool to verify correct CSS selectors and update the HTML nodes accordingly.

Problem: “OpenAI API Key Unauthorized”

Cause: Expired or incorrect OpenAI API credentials.

Solution: Go to n8n credentials, check your OpenAI API key, and refresh if needed.

Problem: “Merge Node Data Misalignment”

Cause: Merged nodes use wrong combination mode or inputs are out-of-sync.

Solution: Ensure all merges are set to “combine” and “mergeByPosition” to correctly align items.

Pre-Production Checklist ✅

  • Verify CSS selectors with browser inspector for links, dates, title, and content.
  • Confirm OpenAI API credentials are active and permitted for GPT-4 usage.
  • Test HTTP Requests separately to confirm pages and posts are reachable.
  • Run workflow in debug mode to ensure data flows correctly after each node.
  • Backup database before first data insertion to prevent accidental overwrites.
  • Schedule a test run close to next scheduled time to verify automation triggers properly.

Deployment Guide

After building and testing this workflow, activate it in your n8n editor by toggling the active switch. The workflow will then automatically run as scheduled, fetching weekly updates.

Monitor execution logs in n8n for errors and review the NocoDB database for stored records. Adjust CSS selectors or date filters as required for website changes or evolving needs.

FAQs

Can I use other AI services instead of OpenAI?

While alternative AI providers exist, this workflow is tailored for OpenAI’s GPT-4. Switching would require adjusting prompt structures and authentication tokens.

Does this workflow consume many OpenAI tokens?

Each summary and keyword extraction call consumes tokens based on content length, so expect moderate usage aligned with your weekly news volume.

Is my news data securely stored?

NocoDB stores your data safely on your infrastructure or cloud provider of choice. Ensure secure API credential management in n8n to keep information protected.

Conclusion

By following this detailed guide, you’ve automated the extraction, summarization, and tagging of the latest news posts from a telecom site lacking an RSS feed. This saves significant manual effort, reduces missed updates, and centralizes data for better decision-making.

Sarah now spends less than an hour weekly monitoring updates instead of hours, increasing productivity and accuracy. Next, consider automating alerts via email or Slack based on keywords or integrating sentiment analysis for deeper insights.

Related Workflows

Automate Viral UGC Video Creation Using n8n + Degaus (Beginner-Friendly Guide)

Learn how to automate viral UGC video creation using n8n, AI prompts, and Degaus. This beginner-friendly guide shows how to import, configure, and run the workflow without technical complexity.

AI SEO Blog Writer Automation in n8n (Beginner Guide)

A complete beginner guide to building an AI-powered SEO blog writer automation using n8n.

Automate CrowdStrike Alerts with VirusTotal, Jira & Slack

This workflow automates processing of CrowdStrike detections by enriching threat data via VirusTotal, creating Jira tickets for incident tracking, and notifying teams on Slack for quick response. Save hours daily by transforming complex threat data into actionable alerts effortlessly.

Automate Telegram Invoices to Notion with AI Summaries & Reports

Save hours on financial tracking by automating invoice extraction from Telegram photos to Notion using Google Gemini AI. This workflow extracts data, records transactions, and generates detailed spending reports with charts sent on schedule via Telegram.

Automate Email Replies with n8n and AI-Powered Summarization

Save hours managing your inbox with this n8n workflow that uses IMAP email triggers, AI summarization, and vector search to draft concise replies requiring minimal review. Automate business email processing efficiently with AI guidance and Gmail integration.

Automate Email Campaigns Using n8n with Gmail & Google Sheets

This n8n workflow automates personalized email outreach campaigns by integrating Gmail and Google Sheets, saving hours of manual follow-up work and reducing errors in email sequences. It ensures timely follow-ups based on previous email interactions, optimizing communication efficiency.