1. Opening Problem Statement
Meet Sarah, the accounts receivable specialist at a mid-sized company. Every day, she sifts through hundreds of emails in the Outlook inbox shared by her finance team. Many of these emails contain invoices from suppliers, but others are just newsletters, replies, or unrelated documents. Sarah spends hours downloading attachments, manually checking if they are invoices, extracting key information like invoice number and amount, and then entering them into the company’s Excel records. This manual process is error-prone and steals precious time from more strategic work.
On average, Sarah wastes 4-5 hours daily on this tedious work and often makes data entry errors that cause delays in payments and vendor frustrations. What if Sarah could automate this entire invoice processing pipeline using n8n? This way, she could reclaim valuable time and increase accuracy effortlessly.
2. What This Automation Does
Once triggered on a schedule, this n8n workflow performs a full cycle of invoice processing from Microsoft Outlook inbox emails. Specifically, it:
- Fetches all emails from a specific Outlook folder received within the last hour containing attachments.
- Uses AI-powered text classification to analyze each email’s content and filter only those that are actual invoices, ignoring irrelevant emails.
- Downloads attachments from these emails and splits them into individual files.
- Uses Google Gemini’s AI vision capabilities to classify each attachment to confirm if it is indeed an invoice issued to the company.
- Extracts detailed invoice data from the confirmed invoice PDFs using Optical Character Recognition (OCR) via Google Gemini API.
- Automatically appends extracted invoice details into a Microsoft Excel workbook for easy tracking and review.
This process eliminates hours of manual effort, reduces errors from manual data entry, and provides a seamless invoice tracking system integrating Microsoft Outlook, AI classification, and Excel.
3. Prerequisites ⚙️
- Microsoft Outlook account with access to the invoice mailbox 📧
- n8n account (cloud or self-hosted) ⏱️
- Google Gemini (PaLM) API access for AI classification and OCR 🔐
- Microsoft Excel 365 account with access to the workbook where invoice data will be appended 📊
If you prefer self-hosting your n8n, you can find hosted solutions like Hostinger (https://buldrr.com/hostinger).
4. Step-by-Step Guide
Step 1: Set Up Scheduled Trigger to Check New Outlook Emails
Navigate to Settings → Nodes in n8n and add a Schedule Trigger node. Configure it to run every hour by selecting the interval field and choosing “hours”. This ensures the workflow checks for new invoice emails periodically.
Expected Outcome: The workflow will activate automatically every hour without manual initiation.
Common Mistake: Forgetting to set the time interval will make the workflow run only once or unpredictably.
Step 2: Fetch Recent Outlook Messages with Attachments
Add a Microsoft Outlook node, set the operation to getAll, and use filters to retrieve messages received in the last hour with attachments from the specific invoice mailbox folder. Make sure to connect your Microsoft Outlook OAuth2 credentials and specify the folder ID.
Visual: You should see a list of emails with fields like sender, subject, and attachments in the node’s output.
Common Mistake: Incorrect folder ID or missing authentication will prevent messages from fetching.
Step 3: Convert HTML Email Body to Markdown
Add a Markdown node to convert the raw HTML email body to markdown text. This helps prepare text for the AI classification that follows.
Expected Outcome: Email content ready in markdown format for analysis.
Step 4: Use AI Text Classifier to Identify Invoices
Insert the Message Classifier node (n8n-nodes-langchain.textClassifier) configured with Google Gemini for AI-driven classification. Feed it the formatted email text to classify emails into the “invoice” category or others.
Example Input: The classifier uses sender email, subject, and markdown body fields.
Common Mistake: Not providing sufficient input text or misconfigured categories can cause misclassifications.
Step 5: Split Messages into Batches for Processing
Add a SplitInBatches node to process individual messages batch-by-batch. This controls throughput so API limits aren’t hit and simplifies downstream processing.
Expected Outcome: One email’s processing completes fully before moving to the next.
Step 6: Download Attachments of Valid Invoice Messages
Connect a Microsoft Outlook node set to get operation with “Download Attachments” option enabled. Link it with message references to download all attachments of invoice-flagged emails.
Expected Outcome: All attachment files in binary format are available for processing.
Step 7: Split Attachments into Individual Files using Code Node
Add a Code node to iterate through each attachment binary and split them into separate items for individual classification.
Code Snippet to Copy:
let results = [];
for (const item of $input.all()) {
for (key of Object.keys(item.binary)) {
results.push({
json: {
fileName: item.binary[key].fileName
},
binary: {
data: item.binary[key],
}
});
}
}
return results;Expected Outcome: Each attachment is now a separate item in the workflow queue.
Step 8: Classify Each Attachment as Invoice Using Google Gemini AI
Use an HTTP Request node to call Google Gemini’s API for document understanding. Send the attachment’s binary data with a prompt asking if it is an invoice issued to the company.
HTTP Request Configuration:
- Method: POST
- URL:
https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent - Body (JSON): includes binary mime type and data, plus prompt text for AI interpretation.
- Authentication: Google Palm API credentials
Expected Outcome: Boolean flags indicating whether the file is an invoice and issued to the company.
Step 9: Filter Attachments To Only Process True Invoices
Add a Filter node to keep only items where the AI classification flags “is_invoice” and “is_issued_to_company” as true.
Expected Outcome: Non-invoice files are dropped from further processing.
Step 10: Extract Invoice Data Using OCR with Google Gemini
Use another HTTP Request node calling the same Gemini endpoint to extract detailed invoice data from the filtered invoice PDFs. The request includes the binary invoice data and a detailed prompt to obtain structured invoice information (invoice number, dates, amounts, supplier info, etc.).
Example Output: JSON containing invoice_number, invoice_date, invoice_amount, supplier_name, and line item services.
Expected Outcome: Structured invoice data ready for insertion into Excel.
Step 11: Parse and Structure Output Data
Add a Set node to format the JSON output, merging extracted invoice data with a reference to the original email (excluding full email body for privacy).
Expected Outcome: Single structured output object containing invoice details and linked email metadata.
Step 12: Append Extracted Invoice to Microsoft Excel
Use the Microsoft Excel 365 node configured to append rows to a specific worksheet within a workbook. Map the extracted invoice data fields to Excel columns for tracking and review.
Expected Outcome: New invoice entries appear immediately in your Excel spreadsheet without manual entry.
Step 13: Add Wait Node for Controlled Execution
Lastly, a Wait node pauses for 1 second to ensure orderly execution of processing batches and prevent API spamming.
5. Customizations ✏️
- Expand AI Invoice Categories: In the Message Classifier node, add more categories like “purchase order” or “credit note” by editing the category list JSON. This allows nuanced email sorting.
- Change Output Destination: Instead of Excel, connect a database node (e.g., MySQL or PostgreSQL) to save invoice data to your ERP system.
- Filter Duplicate Invoices: Insert a Remove Duplicates node after parsing to avoid processing the same invoice twice, especially if your inbox receives forwarded emails.
- Use Email Trigger Instead of Schedule: Replace the Schedule Trigger node with an Outlook Trigger node for near real-time processing when a new email arrives (note date filters are not supported by Outlook Trigger).
- Expand Data Extraction: Modify the Gemini extraction prompt to include custom fields your finance team requires such as tax numbers or payment terms.
6. Troubleshooting 🔧
Problem: “No emails fetched from Outlook despite emails existing in mailbox.”
Cause: Incorrect folder ID or missing permissions for the Outlook credentials.
Solution: Double-check folder IDs and ensure OAuth2 credentials have correct mailbox access by re-authenticating in the Outlook node settings.
Problem: “Attachments not downloading or empty binary data.”
Cause: The download attachments option might be off or the message ID is not correctly passed.
Solution: Verify “Download Attachments” is enabled and that the message ID expression is correct: {{$json.id}}.
Problem: “AI classification returns unexpected categories or empty results.”
Cause: Insufficient or malformed input to the classifier, or API key permission issues.
Solution: Test the input text formatting by printing in a Set node and ensure API keys are valid and quota not exceeded.
7. Pre-Production Checklist ✅
- Verify Microsoft Outlook OAuth2 credentials are properly connected and can access the targeted mailbox and folder.
- Test the Google Gemini API credentials by using simple sample requests in Postman or n8n HTTP Request nodes.
- Confirm workbook and worksheet IDs in the Microsoft Excel 365 node are correct and accessible.
- Simulate workflow run with sample invoice emails to verify the entire extraction and logging pipeline functions without errors.
- Backup your Excel workbook before deploying to prevent data loss.
8. Deployment Guide
After completing setup, activate the Schedule Trigger node to start the hourly check. Monitor the workflow execution via the n8n UI Executions tab to ensure smooth processing and no errors.
For high-volume environments, consider adding throttling or splitting batches further to respect API rate limits. Utilize n8n’s built-in logging and alerting capabilities to watch automation health.
9. FAQs
Q: Can I use another AI provider instead of Google Gemini for classification and OCR?
A: While this workflow uses Google Gemini for its powerful PDF support and generationConfig features, you can substitute other AI services if they support similar document understanding and API capabilities, but you may need to adjust node configurations.
Q: Does this automation consume many API credits?
A: Yes, frequent classification and OCR calls use API credits depending on your Google Gemini subscription plan. Optimize frequency and batch sizes to reduce usage.
Q: Is my invoice data safe?
A: The data flows securely using OAuth2 and API authentication. Sensitive files are processed transiently and not stored beyond workflow scope in n8n unless configured to save permanently.
10. Conclusion
In this tutorial, you just built an advanced n8n workflow to automate invoice processing from Microsoft Outlook emails. You set up AI-based classification to accurately identify invoices, used Google Gemini’s OCR for detailed data extraction, and seamlessly logged results into Microsoft Excel. This automation can save Sarah and her team hours every day, reduce errors, and accelerate payments.
Next, consider expanding this system to automatically reconcile invoices with purchase orders or connect it to your accounting system for end-to-end digital finance management. Happy automating!