Automate Webpage Content Fetching & Markdown Conversion with n8n

This n8n workflow fetches webpage content via HTTP requests, cleans and simplifies the HTML, and converts it into Markdown, replacing slow manual content extraction and formatting. The cleaned output is easier to read and better suited to AI integrations.
Workflow Identifier: 2020
NODES in Use: chatTrigger, agent, lmChatOpenAi, executeWorkflowTrigger, set, httpRequest, if, markdown, toolWorkflow, stickyNote

What This Automation Does

This workflow gets web page content from a link you give it.

It cleans the page to keep only the main parts and removes scripts, ads, and other clutter.

You can choose to simplify the content even more by replacing links and images with placeholders.

After cleaning, it converts the HTML to Markdown, which is easier to read and use with AI tools.

If the final content is too long, the workflow tells you with an error message instead of sending too much data.


Who Should Use This Workflow

This helps people who want to get clean text from webpages without doing it by hand every day.

It fits well for users working with teams, preparing reports, or feeding cleaned text into AI models for summaries or analysis.

No deep technical skills are required, but a basic understanding of URLs and query strings is needed.


Tools and Services Used

  • n8n Workflow: Runs the full process automatically.
  • HTTP Request Node: Gets raw HTML content from a URL.
  • Set Nodes: Parse query strings, configure limits, and prepare data.
  • If Nodes: Check for errors and decide if simplification is needed.
  • Regex Functions: Extract webpage body and remove unwanted tags.
  • Markdown Conversion Node: Turn clean HTML into Markdown text.
  • OpenAI API (optional): For further processing or summarization.

Inputs, Processing Steps, and Outputs

Inputs

  • A chat message with a query string containing the web page URL.
  • An optional parameter indicating if the content should be simplified.
  • An optional maximum length limit to control output size.
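
As a rough illustration of how these inputs can be read from a chat message, here is a minimal Python sketch. It is not the workflow's own Set node logic. The parameter names url, method, and maxlimit come from this page. The function name parse_chat_query, the default limit of 70,000 characters, and the default method "full" are assumptions made for the example.

```python
from urllib.parse import parse_qs

# Assumed default; the workflow's actual default value is not stated on this page.
DEFAULT_MAX_LIMIT = 70_000


def parse_chat_query(message: str) -> dict:
    """Turn a chat message like '?url=...&method=full' into named parameters."""
    query = message.strip().lstrip("?")
    raw = parse_qs(query)
    # parse_qs returns a list per key; keep only the first value of each.
    params = {key: values[0] for key, values in raw.items()}
    return {
        "url": params.get("url", ""),
        "method": params.get("method", "full"),  # assumed default method
        "maxlimit": int(params.get("maxlimit", DEFAULT_MAX_LIMIT)),
    }


if __name__ == "__main__":
    print(parse_chat_query("?url=https://example.com&method=full"))
```

Note that a target URL with its own "&" characters would need to be percent-encoded, otherwise the extra parts are read as separate parameters.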

Processing Steps

  • Parse the query string into usable parameters.
  • Set a default or custom maximum content length.
  • Make an HTTP request to get full HTML content from the URL.
  • Check for errors in fetching content and handle them gracefully.
  • Extract only the body part of the HTML.
  • Remove scripts, iframes, videos, and other noisy HTML tags.
  • Optionally simplify by replacing real URLs with placeholders.
  • Convert the cleaned content to Markdown format.
  • Check the length of the Markdown content and return it if within limits.
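
The cleaning steps above can be sketched in Python as follows. This only illustrates the regex approach, not the exact expressions used inside the workflow. The tag list, the function names, and the placeholder strings "URL" and "IMAGE" are assumptions. The Markdown conversion itself is left to n8n's Markdown node and is not shown.

```python
import re

# Assumed list of noisy tags; the workflow's own list may differ.
NOISY_TAGS = ("script", "style", "iframe", "video", "noscript")


def extract_body(html: str) -> str:
    """Return the content between <body> and </body>, or the input if no body is found."""
    match = re.search(r"<body[^>]*>(.*?)</body>", html, flags=re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else html


def strip_noisy_tags(html: str) -> str:
    """Remove each noisy element together with everything inside it."""
    for tag in NOISY_TAGS:
        pattern = rf"<{tag}\b[^>]*>.*?</{tag}\s*>"
        html = re.sub(pattern, "", html, flags=re.IGNORECASE | re.DOTALL)
    return html


def simplify(html: str) -> str:
    """Replace real link and image addresses with short placeholders."""
    html = re.sub(r'href="[^"]*"', 'href="URL"', html, flags=re.IGNORECASE)
    html = re.sub(r'src="[^"]*"', 'src="IMAGE"', html, flags=re.IGNORECASE)
    return html


if __name__ == "__main__":
    page = '<html><body><p><a href="https://a.com/x">link</a></p><script>alert(1)</script></body></html>'
    print(simplify(strip_noisy_tags(extract_body(page))))
```

Regex-based cleaning is simple but fragile: this sketch only handles double-quoted attributes and well-formed closing tags, which matches the edge case about malformed HTML noted later on this page.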

Outputs

  • The Markdown text of the cleaned webpage content.
  • Error messages if content is too long or URL is invalid.
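
The choice between the two outputs can be pictured with a short Python sketch. The function name check_length, the dictionary keys, and the error wording are assumptions for illustration; the workflow's If node may phrase its message differently.

```python
def check_length(markdown: str, maxlimit: int) -> dict:
    """Return the Markdown if it fits within maxlimit, otherwise an error message."""
    if len(markdown) > maxlimit:
        return {
            "error": f"Content is {len(markdown)} characters, above the limit of {maxlimit}."
        }
    return {"content": markdown}


if __name__ == "__main__":
    print(check_length("# Title\n\nShort page.", maxlimit=100))
```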

Beginner Step-by-Step: How to Use This Workflow in n8n Production

Download and Import Workflow

  1. Find and click the “Download” button on this page to save the workflow file.
  2. Open the n8n editor where the workflow will run.
  3. Use the menu option to “Import from File” and upload the downloaded JSON workflow.

Configure Credentials and Parameters

  1. Provide the required API keys or credentials, such as the OpenAI API key in the credential settings.
  2. Check if there are any IDs, emails, channel names, or folder paths in nodes and change them to fit your environment.
  3. If needed, update URL parameters or query settings so the workflow matches your content sources.

Test and Activate Workflow

  1. Run a test by sending a chat message that contains the query string, to confirm the workflow works. For example:
    ?url=https://example.com&method=full

  2. Once tests pass, activate the workflow to let it run automatically on new chat messages.
  3. Monitor workflow runs for errors and adjust settings if necessary.

If you self-host n8n, consider reviewing n8n's self-hosting resources to optimize and secure the setup.


Common Edge Cases or Failures

  • If the input chat message does not contain a proper query string, the workflow will not run properly. Always check input format.
  • Malformed or unusual webpage HTML without clear tags can cause extraction failures.
  • Setting maxlimit too low may trigger content-length errors prematurely.
  • Invalid URLs or unreachable sites will return error messages from the HTTP Request node.
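
Some of these failures can be caught before any request is made. The Python sketch below shows one possible pre-check on the url parameter. The function name validate_url and the message texts are assumptions, and the check is not part of the workflow as described on this page.

```python
from typing import Optional
from urllib.parse import urlparse


def validate_url(url: str) -> Optional[str]:
    """Return an error message, or None when the URL looks fetchable."""
    if not url:
        return "Missing 'url' parameter in the query string."
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return "The URL must start with http:// or https://."
    if not parsed.netloc:
        return "The URL has no host name."
    return None


if __name__ == "__main__":
    for candidate in ("https://example.com", "example.com", ""):
        print(repr(candidate), "->", validate_url(candidate))
```

A check like this only confirms that the URL is well formed. An unreachable site still surfaces as an error from the HTTP Request node.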

Ideas for Customization

  • Change the default maxlimit value to control output size to suit your needs.
  • Add removal of other types of HTML elements by extending regex in the cleaning step.
  • Customize error messages to use language or tone matching your audience.
  • If you use the AI nodes, switch the OpenAI model to try different AI capabilities.
  • Add new processing methods beyond “full” and “simplify” by editing the workflow logic.

Summary

✓ This workflow automatically fetches and cleans webpage content.

✓ It outputs the cleaned content in Markdown for easy reading or AI tools.

→ It saves many hours of manual copying, pasting, and cleanup.

→ It handles errors and length limits safely to avoid failures.

Frequently Asked Questions

The workflow needs a chat message containing a query string with parameters like ‘url’ and ‘method’.
It checks the HTTP request response for errors and sends back a clear error message instead of stopping silently.
The maximum length limit can be set dynamically via the ‘maxlimit’ query parameter or by editing the configuration node.
The workflow works on any n8n environment, including self-hosted n8n servers.
