Automate Colombian Electronic Invoice Extraction with n8n & OpenAI

This workflow automates the extraction and validation of Colombian electronic invoices received via Gmail, saving time and reducing errors. By integrating n8n nodes with OpenAI and Google services, it processes ZIP invoice attachments, extracts key data, and organizes it efficiently.
gmailTrigger
agent
googleSheets
+14
Workflow Identifier: 1963
NODES in Use: Gmail Trigger, Code, Filter, Compression, SplitInBatches, NoOp, Switch, ExtractFromFile, XML, Merge, Aggregate, LangChain Agent, OpenAI Chat Model, Calculator, Google Drive, Google Sheets, Sticky Note

Press CTRL+F5 if the workflow didn't load.

Learn how to Build this Workflow with AI:

Visit through Desktop for Best experience

What this workflow does

This workflow automatically processes electronic invoice ZIP files received by email. It extracts and validates data from XML and PDF invoices, checks totals with a calculator, and organizes files in Google Drive and Google Sheets. This saves time and reduces errors in manual invoice handling.

It targets small business owners who get zipped invoices via Gmail and must follow Colombia’s DIAN rules. The workflow downloads ZIP attachments, unzips them, separates PDFs and XMLs, extracts data using OpenAI GPT-4o-mini with LangChain, verifies amounts, and saves everything orderly.


Who should use this workflow

This is for business owners or accountants who spend hours manually processing zipped electronic invoices from emails. They need to save time and avoid errors with invoice totals, dates, and IDs. It’s especially useful for those following Colombian DIAN electronic invoicing and tax rules.

Users must have Gmail, Google Drive, Google Sheets, and OpenAI accounts set up with API access. Some basic n8n workflow import and credential linking knowledge helps but deep programming skills are not required.


Tools and services used

  • Gmail: To check email every 30 minutes and download ZIP attachments containing invoices.
  • n8n automation platform: Workflow editor and nodes for email trigger, file extraction, filtering, merging, and Google integrations.
  • OpenAI GPT-4o-mini with LangChain: Parses PDF text and XML JSON to extract key invoice details like invoice numbers, NITs, totals, and CUFE.
  • Google Drive: To save original invoice PDFs, rename them based on extracted data, and organize files.
  • Google Sheets: To append or update records of invoices in a database and avoid duplicates.

Inputs, processing steps, and outputs

Inputs

  • New emails with ZIP file attachments containing Colombian electronic invoices.

Processing Steps

  • Every 30 minutes, check Gmail inbox for emails with ZIP attachments.
  • Extract attachment filenames and MIME types using a Code node.
  • Filter only ZIP files to proceed.
  • Unzip files to separate contained PDFs and XMLs.
  • Extract filenames and MIME types again from extracted files.
  • Use a Switch node to separate PDFs from XMLs for different processing.
  • Extract text from PDF invoices.
  • Parse XML to JSON format.
  • Merge and aggregate PDF text and XML data into one batch for processing.
  • Use a LangChain Agent node with GPT-4o-mini to extract structured invoice info, including:
    • Invoice type (e.g., Factura, Nota Crédito)
    • Invoice number
    • Issue date (YYYY-MM-DD)
    • Issuer and receiver NITs (without verification digits)
    • Issuer legal name
    • Subtotal, VAT, and total amounts
    • CUFE (invoice unique code)
    • Brief purchase summary (under 20 words)

    This node also verifies total = subtotal + VAT using a calculator tool.

  • Upload original PDF to Google Drive in a chosen folder.
  • Rename the uploaded PDF file with pattern: YYYY-MM-DD-NUMERO_FACTURA.pdf.
  • Append or update invoice record in Google Sheets using a unique key NIT_Emisor + Numero_Factura to avoid duplicates.

Outputs

  • PDF invoices saved and renamed in Google Drive.
  • An updated Google Sheets database with validated invoice data and no duplicates.

Beginner step-by-step: How to use this workflow in n8n

Import the workflow

  1. Download the workflow file using the Download button on this page.
  2. Open the n8n editor where workflows are managed.
  3. Click “Import from File” and select the downloaded workflow file.

Configure credentials and IDs

  1. Add your Gmail OAuth2 credentials so the Gmail Trigger can access your inbox.
  2. Input OpenAI API key into the LangChain Agent node to enable invoice data extraction.
  3. Connect your Google Drive credentials and specify the folder ID where PDFs will save.
  4. Link Google Sheets API credentials and specify the spreadsheet and worksheet to update invoice records.
  5. Check any IDs, emails, or folder references and update them per your setup.

Test and activate

  1. Manually run the workflow with a test email carrying a ZIP invoice attachment to verify all steps work.
  2. Review outputs in Google Drive and Sheets for correctness.
  3. Activate the workflow by toggling the active switch to allow automatic runs every 30 minutes.
  4. If running on your own server, consider self-host n8n for better control and uptime.

Customization ideas

  • Change Gmail polling interval in the Gmail Trigger for faster or slower email checking.
  • Add extra validation nodes, like checking date formats strictly against DIAN rules.
  • Expand the Switch node to handle other invoice-related file types such as TXT or JSON.
  • Organize Google Drive uploads into vendor-specific or invoice-type folders by changing folder IDs dynamically.
  • Upgrade OpenAI model in the LangChain Agent node to GPT-4 if more precise data extraction is needed.

Handling errors and edge cases

  • If no attachments trigger the workflow, check the Gmail Trigger filter query and confirm attachment download is enabled.
  • Missing or incomplete extracted fields from LangChain may mean prompt errors or invalid OpenAI keys. Double-check prompt text and credentials.
  • Duplicate or missing rows in Google Sheets likely result from incorrect unique key construction or insufficient API permissions. Ensure keys combine issuer NIT and invoice number properly.

Summary of results

✓ Automatically processes ZIP invoice attachments from email.

✓ Extracts and validates key invoice data with AI and calculator checks.

✓ Saves original PDFs to Google Drive with proper names.

✓ Updates Google Sheets database to prevent duplicates.

→ Saves hours of manual work each week.

→ Reduces costly errors in tax reporting and invoice management.

Frequently Asked Questions

This workflow is designed for Gmail, but can be adapted to other email services if they support downloading attachments and provide compatible n8n nodes.
OpenAI usage depends on invoice volume. You can reduce usage by processing invoices in batches with the batching node.
It uses a unique key combining issuer NIT and invoice number to check and update existing records, preventing duplicates.
Yes. All data stays within your n8n instance and linked services with encrypted credentials. No external storage beyond connected accounts occurs.

Promoted by BULDRR AI

Related Workflows

Automate Viral UGC Video Creation Using n8n + Degaus (Beginner-Friendly Guide)

Learn how to automate viral UGC video creation using n8n, AI prompts, and Degaus. This beginner-friendly guide shows how to import, configure, and run the workflow without technical complexity.
Form Trigger
Google Sheets
Gmail
+37
Free

AI SEO Blog Writer Automation Workflows in n8n

A complete beginner guide to building an AI SEO blog writer automation using n8n.
AI Agent
Google Sheets
httpRequest
+5
Free

Automate CrowdStrike Alerts with VirusTotal, Jira & Slack

This workflow automates processing of CrowdStrike detections by enriching threat data via VirusTotal, creating Jira tickets for incident tracking, and notifying teams on Slack for quick response. Save hours daily by transforming complex threat data into actionable alerts effortlessly.
scheduleTrigger
httpRequest
jira
+5
Free

Automate Telegram Invoices to Notion with AI Summaries & Reports

Save hours on financial tracking by automating invoice extraction from Telegram photos to Notion using Google Gemini AI. This workflow extracts data, records transactions, and generates detailed spending reports with charts sent on schedule via Telegram.
lmChatGoogleGemini
telegramTrigger
notion
+9
Free

Automate Email Replies with n8n and AI-Powered Summarization

Save hours managing your inbox with this n8n workflow that uses IMAP email triggers, AI summarization, and vector search to draft concise replies requiring minimal review. Automate business email processing efficiently with AI guidance and Gmail integration.
emailReadImap
vectorStoreQdrant
emailSend
+12
Free

Automate Email Campaigns Using n8n with Gmail & Google Sheets

This n8n workflow automates personalized email outreach campaigns by integrating Gmail and Google Sheets, saving hours of manual follow-up work and reducing errors in email sequences. It ensures timely follow-ups based on previous email interactions, optimizing communication efficiency.
googleSheets
gmail
code
+5
Free