Opening Problem Statement
Meet Sarah, a content manager at a growing tech company. Every week, Sarah faces the tedious challenge of writing blog articles that consistently match her company’s brand voice and style. Despite spending hours drafting and editing, her team often struggles with maintaining content consistency across posts, leading to delays, costly rewrites, and missed marketing opportunities. With limited resources and deadlines looming, Sarah needs a solution to automate content generation while preserving her brand’s unique style.
This is where the n8n workflow described here helps. It pulls in Sarah’s existing company blog content, uses OpenAI models to analyze the writing style and brand voice, and automatically drafts new articles in that voice. By automating these steps, Sarah can cut hours each week from brainstorming and revising content, freeing her to focus on strategy and engagement.
What This Automation Does
This workflow is a content assistant that uses n8n and OpenAI models to:
- Scrape the latest published blog posts from a specified company blog website.
- Extract the main article content in HTML format and convert it to Markdown for efficient AI processing.
- Analyze aggregated article content to identify common structure, style, and layout patterns used across multiple articles.
- Use AI to extract detailed brand voice characteristics including tone, language style, and writing traits from the source content.
- Combine the analyzed style and brand voice guidelines to instruct an AI content generation agent to produce new on-brand article drafts.
- Automatically save the generated articles as drafts in a WordPress site for easy review and publishing.
In practical terms, this automation can save content teams hours per article cycle, reduce inconsistencies in voice and structure across posts, and shorten the path from idea to published post.
Prerequisites ⚙️
- n8n account with workflow editor access ⚙️
- OpenAI API key credential (for the OpenAI Chat Model and LangChain nodes) 🔑
- WordPress site with REST API access and a user credential, such as an application password (for saving drafts) 📁
- Access to the target company blog URL for scraping 📧
- Basic familiarity with n8n node configuration and running workflows ⏱️
- Optional: a self-hosted n8n instance, if you prefer to run n8n on your own server
Step-by-Step Guide
Step 1: Trigger the Workflow Manually
In the n8n editor, locate the Manual Trigger node labeled “When clicking ‘Test workflow’”. Click the “Test workflow” button on the canvas to run the entire workflow. This manual trigger lets you test or run the process on demand.
Outcome: The workflow begins scraping the content from the configured blog URL.
Common mistake: Executing only the trigger node runs that single node, so none of the downstream steps execute.
Step 2: Fetch the Blog Homepage
The HTTP Request node named “Get Blog” calls the company blog homepage URL (set here to https://blog.n8n.io). It retrieves the HTML page containing links to recent articles.
Outcome: Raw HTML data of the blog homepage is available for extraction.
Common mistake: An incorrect blog URL causes the request to fail or returns a page without article links.
Step 3: Extract Article URLs Using the HTML Node
The HTML node “Extract Article URLs” parses the homepage HTML and uses the CSS selector .item.post a.global-link to extract the href attribute of each article link. It returns an array of article URLs.
Outcome: Array of URLs pointing to individual blog posts.
Common mistake: If the blog HTML structure changes, this CSS selector must be updated, or no URLs will be extracted.
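The HTML node handles this extraction inside n8n. To check what the selector matches outside the editor, for example after a redesign, here is a minimal Python sketch using only the standard library. It assumes reasonably well-formed HTML, and the `urljoin` step is an added assumption covering blogs whose links are relative (such as `/my-post/`), which would otherwise reach the “Get Article” node without a domain.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never have a closing tag, so they are not tracked as ancestors.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleLinkParser(HTMLParser):
    """Collects hrefs matching the CSS selector `.item.post a.global-link`."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.stack: list[tuple[str, set[str]]] = []  # (tag, classes) of open elements
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = set((attributes.get("class") or "").split())
        # Descendant match: some open ancestor must carry both "item" and "post".
        inside_post = any({"item", "post"} <= cls for _, cls in self.stack)
        if tag == "a" and "global-link" in classes and inside_post:
            href = attributes.get("href")
            if href:
                # Assumption: resolve relative links such as "/my-post/" against the blog URL.
                self.urls.append(urljoin(self.base_url, href))
        if tag not in VOID_TAGS:
            self.stack.append((tag, classes))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def extract_article_urls(html: str, base_url: str) -> list[str]:
    parser = ArticleLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


sample = """
<div class="item post"><a class="global-link" href="/first-post/">First</a></div>
<div class="item post"><img src="cover.png"><a class="global-link" href="https://blog.n8n.io/second-post/">Second</a></div>
<div class="item"><a class="global-link" href="/not-a-post/">Skipped</a></div>
"""
print(extract_article_urls(sample, "https://blog.n8n.io"))
# ['https://blog.n8n.io/first-post/', 'https://blog.n8n.io/second-post/']
```

This is a deliberately small matcher for one selector, not a general CSS engine; inside n8n, keep using the HTML node and simply update its selector.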
Step 4: Split URLs into Individual Items
The Split Out node “Split Out URLs” separates the array of article URLs into individual items, enabling processing each article independently.
Outcome: Each URL is an independent data item in the workflow.
Common mistake: Omitting this node passes the whole array along as one item, which breaks downstream nodes that expect one article URL per item.
Step 5: Limit to the Most Recent 5 Articles
The Limit node “Latest Articles” keeps only the first 5 URLs. Assuming the blog homepage lists posts newest first, these are the 5 most recent articles, which keeps the data volume manageable and focuses the AI analysis on fresh content.
Outcome: Only 5 URLs continue through the workflow.
Common mistake: Setting the limit too high slows the workflow and increases the token count (and cost) of the AI analysis steps.
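In n8n terms, Split Out turns one item holding a list into one item per element, and Limit keeps the first N items (it can also be set to keep the last N). A minimal Python sketch of that item flow, with the field name `article` as a hypothetical placeholder:

```python
from typing import Any

Item = dict[str, Any]


def split_out(item: Item, field: str) -> list[Item]:
    """One output item per element of item[field], like the Split Out node."""
    return [{field: value} for value in item[field]]


def limit(items: list[Item], max_items: int, keep_first: bool = True) -> list[Item]:
    """Keep the first (or last) max_items items, like the Limit node."""
    if max_items <= 0:
        return []
    return items[:max_items] if keep_first else items[-max_items:]


# Hypothetical homepage result: eight article URLs, listed newest first.
homepage_item = {"article": [f"https://blog.n8n.io/post-{n}/" for n in range(1, 9)]}
latest = limit(split_out(homepage_item, "article"), 5)
for item in latest:
    print(item["article"])
```

Because Limit simply takes items in order, “most recent” holds only if the homepage lists posts newest first.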
Step 6: Fetch Individual Article Content
The HTTP Request node “Get Article” takes each article URL and fetches the full HTML content of the blog post page.
Outcome: Raw HTML of each individual article.
Common mistake: If URLs are incorrect or inaccessible, this node will fail.
Step 7: Extract Main Article Content
The HTML node “Extract Article Content” parses the blog post HTML and extracts the main content block, the element with the .post-section CSS class. It returns that block’s content as HTML.
Outcome: HTML of the article body alone, ready for conversion.
Common mistake: If the blog changes layout, the CSS selector must be updated.
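To see what “extract the content inside .post-section” means in practice, here is a minimal Python sketch that returns the inner HTML of the first element carrying a given class. It assumes well-formed markup (every non-void element closed), drops HTML comments, and only illustrates the selector’s effect; it is not a replacement for the HTML node.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SectionExtractor(HTMLParser):
    """Captures the inner HTML of the first element carrying target_class."""

    def __init__(self, target_class: str):
        # Keep entities such as &amp; as written so the captured HTML stays valid.
        super().__init__(convert_charrefs=False)
        self.target_class = target_class
        self.depth = 0      # > 0 while inside the target element
        self.done = False   # stop after the first match
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self.depth += 1
        elif not self.done and tag not in VOID_TAGS:
            if self.target_class in (dict(attrs).get("class") or "").split():
                self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img/> never change the nesting depth.
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self.depth or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth:
            self.parts.append(f"</{tag}>")
        else:
            self.done = True  # just closed the target element itself

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_section(html: str, target_class: str = "post-section") -> str:
    parser = SectionExtractor(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


page = """<html><body><nav>Menu</nav>
<div class="post-section"><h2>Intro</h2><p>Tom &amp; Jerry<br>line two</p><img src="a.png"/></div>
<footer>Footer</footer></body></html>"""
print(extract_section(page))
```

The navigation and footer are left out because only content nested inside the `.post-section` element is captured.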
Step 8: Convert HTML to Markdown
The Markdown node converts the extracted HTML into Markdown, which uses fewer OpenAI tokens than raw HTML while preserving structure such as headings, lists, and links.
Outcome: Articles in Markdown, a format that is easier for the model to interpret and to imitate when generating content.
Common mistake: Not using Markdown increases token count and may reduce AI efficiency.
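The Markdown node performs a full conversion. The toy converter below handles only a few tags (headings, paragraphs, bold, italics, links, and list items), but it is enough to show how much markup Markdown drops while keeping the structure the model needs. It is a sketch, not the node’s implementation.

```python
import re
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Converts a small HTML subset (h1-h6, p, strong/b, em/i, a, ul/li) to Markdown."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &amp; and friends in text
        self.out: list[str] = []
        self.hrefs: list[str] = []  # open <a> tags awaiting their closing "](url)"

    def handle_starttag(self, tag, attrs):
        if tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag in {"p", "ul"}:
            self.out.append("\n\n")
        elif tag in {"strong", "b"}:
            self.out.append("**")
        elif tag in {"em", "i"}:
            self.out.append("*")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in {"strong", "b"}:
            self.out.append("**")
        elif tag in {"em", "i"}:
            self.out.append("*")
        elif tag == "a" and self.hrefs:
            self.out.append(f"]({self.hrefs.pop()})")

    def handle_data(self, data):
        self.out.append(data)


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    # Collapse the extra blank lines produced by adjacent block elements.
    return re.sub(r"\n{3,}", "\n\n", "".join(parser.out)).strip()


html = ('<h2>Why n8n</h2><p>Automate <strong>everything</strong> with '
        '<a href="https://n8n.io">n8n</a> &amp; AI.</p><ul><li>Fast</li><li>Flexible</li></ul>')
markdown = html_to_markdown(html)
print(markdown)
print(len(html), "characters of HTML ->", len(markdown), "characters of Markdown")
```

Character count is only a rough proxy for tokens, but the same markup overhead that inflates the character count also inflates the token count.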
Step 9: Combine Articles for Aggregated Analysis
The Aggregate node “Combine Articles” merges all Markdown article bodies into one array, grouping content for style and voice analysis.
Outcome: A single item holding an array of all article bodies, ready for AI processing.
Common mistake: Skipping this step runs the analysis once per article, so the model never sees the articles side by side and loses the aggregate context.
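The Aggregate step, and one way its output could be turned into a single block of prompt text, can be sketched in Python as follows. The field name `data` and the `--- Article N ---` separator are hypothetical choices for illustration; the original workflow’s prompt may join the articles differently.

```python
from typing import Any


def aggregate(items: list[dict[str, Any]], field: str) -> dict[str, Any]:
    """Collect one field from every item into a single item, like the Aggregate node.

    Items without the field are skipped in this sketch.
    """
    return {field: [item[field] for item in items if field in item]}


def to_prompt_context(articles: list[str]) -> str:
    """Join article bodies with numbered separators so the model can tell them apart."""
    return "\n\n".join(
        f"--- Article {n} ---\n{body.strip()}" for n, body in enumerate(articles, start=1)
    )


# Hypothetical Markdown node output: one item per article under the key "data".
items = [{"data": "# First post\nBody one."}, {"data": "# Second post\nBody two."}]
combined = aggregate(items, "data")
print(to_prompt_context(combined["data"]))
```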
Step 10: Analyze Article Structure and Style Using OpenAI Chat
The Basic LLM Chain node “Capture Existing Article Structure” sends the aggregated Markdown articles to OpenAI with a prompt asking how to replicate their common structure, layout, language, and writing style. The response becomes style guidance for the content generation step.
Outcome: Guidelines about article structure and style are captured for use in future AI-driven writing.
Common mistake: Poor prompt formulation can yield unclear style output.
Step 11: Extract Brand Voice Characteristics
The Information Extractor node “Extract Voice Characteristics” sends the same aggregated article content to the model and asks it to identify specific brand voice traits, such as tone, style, and language choices, each with examples.
Outcome: Detailed brand voice characteristics structured for instructing the content generator.
Common mistake: Source articles written in inconsistent voices yield vague voice characteristics.
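The Information Extractor node can be given a JSON Schema describing the output it should return. The schema below is a hypothetical example of how brand voice characteristics could be structured (the field names `characteristics`, `trait`, `description`, and `examples` are assumptions), followed by a small Python check that rejects model output missing those fields.

```python
import json
from typing import Any

# Hypothetical JSON Schema for the extractor's output; field names are illustrative.
VOICE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "characteristics": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "trait": {"type": "string"},
                    "description": {"type": "string"},
                    "examples": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["trait", "description", "examples"],
            },
        }
    },
    "required": ["characteristics"],
}

TRAIT_FIELDS = set(VOICE_SCHEMA["properties"]["characteristics"]["items"]["required"])


def parse_voice(raw: str) -> list[dict[str, Any]]:
    """Parse the model's JSON reply and check each trait has the required fields.

    Invalid JSON raises json.JSONDecodeError, which is a subclass of ValueError.
    """
    data = json.loads(raw)
    traits = data.get("characteristics") if isinstance(data, dict) else None
    if not isinstance(traits, list) or not traits:
        raise ValueError("expected a non-empty 'characteristics' array")
    for entry in traits:
        if not isinstance(entry, dict):
            raise ValueError("each trait must be a JSON object")
        missing = TRAIT_FIELDS - set(entry)
        if missing:
            raise ValueError(f"trait is missing fields: {sorted(missing)}")
    return traits


reply = json.dumps({"characteristics": [{
    "trait": "Conversational",
    "description": "Addresses the reader directly as 'you'.",
    "examples": ["You can build this in minutes."],
}]})
for trait in parse_voice(reply):
    print(trait["trait"], "-", trait["description"])
```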
Step 12: Merge Style and Voice for Final Instructions
The Merge node “Article Style & Brand Voice” combines the output from the style analysis and voice characteristic extraction nodes, preparing a unified guideline package for content generation.
Outcome: Comprehensive, AI-readable brand and style instructions.
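Common mistake: Choosing the wrong merge mode breaks the data structure; for example, appending outputs the two results as separate items instead of combining them into one.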
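The difference between merge modes is easiest to see with plain data. In this Python sketch, `combine_by_position` pairs item i from each input into one item, while `append` simply concatenates the two lists. The keys `text` and `output` are assumed placeholder names for the two upstream nodes’ results.

```python
from typing import Any

Item = dict[str, Any]


def append(left: list[Item], right: list[Item]) -> list[Item]:
    """Append mode: every item from both inputs, one after another."""
    return left + right


def combine_by_position(left: list[Item], right: list[Item]) -> list[Item]:
    """Combine item i of each input into one item; unpaired extras are dropped here.

    On a key clash the right-hand value wins in this sketch.
    """
    return [{**a, **b} for a, b in zip(left, right)]


style = [{"text": "Open with a one-paragraph hook, then use H2 sections..."}]
voice = [{"output": {"characteristics": [{"trait": "Conversational"}]}}]

print(len(append(style, voice)))                    # 2 separate items
print(combine_by_position(style, voice)[0].keys())  # one item holding both keys
```

Only the combined form gives the generation step a single item containing both the style guidance and the voice characteristics.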
Step 13: Set Instruction for New Article Generation
The Set node “New Article Instruction” defines a clear new article request instructing AI about the desired content topic, style, and brand voice to emulate. For example: “Write a comprehensive guide on using AI for document classification and document extraction…”
Outcome: A prompt instruction specifying the new blog article to be generated.
Common mistake: Vague instructions produce unfocused content.
Step 14: Generate the On-Brand Article Draft
The Information Extractor node “Content Generation Agent” is used here as the AI content generator. It uses the combined brand voice and style guidelines, plus the user instruction, to generate an on-brand content piece in Markdown format.
Outcome: A draft blog article written to match the brand voice and style.
Common mistake: Publishing AI output without human editing risks repetitive phrasing and factual errors.
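Conceptually, the generation step combines three inputs into one prompt: the style guidance, the voice characteristics, and the new article instruction. A minimal Python sketch of that assembly follows; the section headings, wording, and trait fields are hypothetical, and the actual node builds its prompt from its own configuration.

```python
from typing import Any


def build_generation_prompt(style_guide: str, traits: list[dict[str, Any]],
                            instruction: str) -> str:
    """Assemble one prompt from the style analysis, voice traits, and article request."""
    voice_lines = "\n".join(f"- {t['trait']}: {t['description']}" for t in traits)
    sections = [
        "You are writing a new article for our company blog.",
        "## Article structure and style\n" + style_guide.strip(),
        "## Brand voice\n" + voice_lines,
        "## Task\n" + instruction.strip(),
        "Return the complete article in Markdown.",
    ]
    return "\n\n".join(sections)


prompt = build_generation_prompt(
    style_guide="Open with a short hook, then use H2 sections and end with a summary.",
    traits=[{"trait": "Conversational", "description": "Addresses the reader as 'you'."}],
    instruction="Write a comprehensive guide on using AI for document classification.",
)
print(prompt)
```

Keeping the three inputs in clearly labeled sections makes it easier to see, when a draft misses the mark, whether the style guide, the voice traits, or the instruction needs adjusting.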
Step 15: Save Draft Article to WordPress
The WordPress node “Save as Draft” takes the generated article’s title, content, and summary and creates a new draft post on your WordPress site for your team to review and edit.
Outcome: Each run creates a draft post in WordPress, ready for editorial review.
Common mistake: Incorrect WordPress credentials or API configuration causes failure to save drafts.
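Behind the WordPress node is the WordPress REST API, where a draft is created with a POST to `/wp-json/wp/v2/posts` and `status` set to `draft`. This Python sketch only builds that request (it never sends it), using HTTP Basic authentication with an application password; `example.com`, the user, and the password are placeholders. It passes HTML as `content` because WordPress renders post content as HTML; whether the original workflow converts the generated Markdown before saving is not stated here.

```python
import base64
import json
from urllib.request import Request


def build_draft_request(site: str, user: str, app_password: str,
                        title: str, content_html: str, excerpt: str) -> Request:
    """Build, but do not send, a WordPress REST API request that creates a draft post."""
    payload = {"title": title, "content": content_html, "excerpt": excerpt, "status": "draft"}
    # Application passwords are sent with HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{app_password}".encode("utf-8")).decode("ascii")
    return Request(
        url=f"{site.rstrip('/')}/wp-json/wp/v2/posts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


req = build_draft_request(
    site="https://example.com/",         # placeholder site
    user="editor",                       # placeholder user
    app_password="abcd efgh ijkl mnop",  # placeholder application password
    title="Using AI for Document Classification",
    content_html="<p>Draft body...</p>",
    excerpt="A short summary of the article.",
)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

A 401 or 403 response to this request corresponds to the credential and permission problems described under Troubleshooting.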
Customizations ✏️
- Change Source Blog URL: In the “Get Blog” HTTP Request node, update the URL parameter to your own company blog URL to scrape your own content source.
- Adjust Number of Articles: Modify the “Latest Articles” Limit node to increase or reduce how many recent posts you fetch for style and voice analysis.
- Modify New Article Instruction: In the “New Article Instruction” Set node, change the prompt text to guide AI for different article topics or tones specific to your brand or campaign goals.
- Post-Publish Automation: Extend the workflow with a second WordPress node that updates the post’s status to published once an editor has approved the draft.
- Voice Characteristic Storage: Store voice characteristic output in an external database for reuse and versioning across multiple article generations, reducing repeated AI analysis.
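One way to implement the voice-characteristic storage idea is a cache keyed by a hash of the source articles, so the extraction only reruns when the articles change. This Python sketch uses an in-memory SQLite database as a stand-in for your external database; the table name `voice_cache` and the `extract` callback are hypothetical.

```python
import hashlib
import json
import sqlite3
from typing import Any, Callable


def content_key(articles: list[str]) -> str:
    """Stable key for a set of source articles: SHA-256 of their joined text."""
    return hashlib.sha256("\x00".join(articles).encode("utf-8")).hexdigest()


def get_or_extract_voice(db: sqlite3.Connection, articles: list[str],
                         extract: Callable[[list[str]], Any]) -> Any:
    """Return cached voice characteristics, calling extract() only on a cache miss."""
    db.execute("CREATE TABLE IF NOT EXISTS voice_cache "
               "(key TEXT PRIMARY KEY, voice TEXT NOT NULL)")
    key = content_key(articles)
    row = db.execute("SELECT voice FROM voice_cache WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return json.loads(row[0])
    voice = extract(articles)  # the expensive LLM call happens only here
    db.execute("INSERT INTO voice_cache (key, voice) VALUES (?, ?)", (key, json.dumps(voice)))
    db.commit()
    return voice


calls: list[int] = []


def fake_extract(articles: list[str]) -> dict[str, Any]:
    # Hypothetical stand-in for the Information Extractor call.
    calls.append(len(articles))
    return {"characteristics": [{"trait": "Conversational"}]}


db = sqlite3.connect(":memory:")
posts = ["# Post one", "# Post two"]
first = get_or_extract_voice(db, posts, fake_extract)
second = get_or_extract_voice(db, posts, fake_extract)  # served from the cache
print(first == second, "extract calls:", len(calls))
```

Because each distinct set of source articles gets its own row, older entries stay in the table, which doubles as a simple version history.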
Troubleshooting 🔧
Problem: “No article URLs extracted” from the “Extract Article URLs” node.
Cause: The CSS selector is outdated due to website redesign.
Solution: Inspect the blog page’s HTML in your browser’s developer tools and update the CSS selector in the node settings to match the current page layout.
Problem: “WordPress API authorization error” when saving draft.
Cause: Incorrect or revoked credentials (such as an application password), or a WordPress user without permission to create posts.
Solution: Verify the WordPress credential in n8n’s Credentials section and make sure the user’s role allows creating posts.
Problem: “OpenAI API rate limits or errors” during style or voice extraction.
Cause: API quota exceeded or network issues.
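Solution: Wait and retry, or upgrade your API plan. Enabling Retry On Fail in the node’s settings lets n8n retry automatically. Also check that the API key is valid and the network connection is stable.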
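n8n’s Retry On Fail setting retries a failed node after a fixed wait. If you call the API from custom code instead, exponential backoff spaces retries further apart each time, which gives a rate limit more room to reset. A minimal Python sketch, where `RateLimitError` is a hypothetical stand-in for an HTTP 429 response:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 (rate limit) response."""


def call_with_backoff(fn: Callable[[], T], max_tries: int = 5, base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying rate-limit errors with exponentially growing waits."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(1, max_tries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_tries:
                raise  # give up and surface the last error
            delay = base_delay * 2 ** (attempt - 1)        # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, 0.1 * delay))  # small jitter spreads retries
    raise AssertionError("unreachable")


attempts: list[int] = []


def flaky_call() -> str:
    # Fails twice with a rate-limit error, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


waits: list[float] = []
print(call_with_backoff(flaky_call, sleep=waits.append), "after", len(attempts), "attempts")
```

Passing `sleep` as a parameter keeps the example fast and testable; in real use, leave it at the default `time.sleep`.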
Pre-Production Checklist ✅
- Test that the manual trigger starts the workflow successfully.
- Verify that the HTTP Request nodes fetch the blog homepage and article pages correctly.
- Check that the CSS selectors in the HTML extraction nodes match the current page structure.
- Confirm AI nodes have valid OpenAI credentials and return expected style and voice output formats.
- Test WordPress draft creation with sample content to verify credentials.
- Back up existing WordPress data before running the workflow in production.
Deployment Guide
Once testing succeeds, run the workflow manually whenever you need new drafts, or replace the Manual Trigger with a Schedule Trigger and activate the workflow to run it on a schedule. Use the execution logs to monitor for errors in API calls or data extraction. Periodically update the CSS selectors and verify API credentials so the workflow keeps running smoothly.
FAQs
Q: Can I use other content sources besides blogs?
A: Yes. Any HTML or text content that n8n can access will work, such as PDFs converted to text, social media posts, or internal documents, provided you adapt the fetching and extraction steps to that source.
Q: Does this workflow consume my OpenAI API credits?
A: Yes, each AI analysis and generation call uses credits. Optimize prompt size and reuse voice characteristics to minimize costs.
Q: Is my draft content secure?
A: Your data security depends on your n8n hosting and WordPress site security. Use HTTPS, secure API keys, and consider self-hosting for better control.
Conclusion
By building and using this n8n workflow, Sarah has equipped her content team to generate on-brand blog drafts efficiently. The automation reduces repetitive research and drafting work, saving hours each week and helping keep messaging consistent with her brand voice.
Beyond blogging, this approach can be adapted for social media content, marketing emails, or internal communications. Experiment with different content sources and AI parameters to tailor it to your needs. With n8n and OpenAI, automated, brand-savvy content generation is in your hands and ready to elevate your content game.
Happy automating!