Automate Colombian Electronic Invoice Extraction with n8n & OpenAI

This workflow automates the extraction and validation of Colombian electronic invoices received via Gmail, saving time and reducing errors. By integrating n8n nodes with OpenAI and Google services, it processes ZIP invoice attachments, extracts key data, and organizes it efficiently.
gmailTrigger
agent
googleSheets
+14
Workflow Identifier: 1963
NODES in Use: Gmail Trigger, Code, Filter, Compression, SplitInBatches, NoOp, Switch, ExtractFromFile, XML, Merge, Aggregate, LangChain Agent, OpenAI Chat Model, Calculator, Google Drive, Google Sheets, Sticky Note
Automate Colombian invoices with n8n and OpenAI

Press CTRL+F5 if the workflow didn't load.

Learn how to Build this Workflow with AI:

What this workflow does

This workflow automatically processes electronic invoice ZIP files received by email. It extracts and validates data from XML and PDF invoices, checks totals with a calculator, and organizes files in Google Drive and Google Sheets. This saves time and reduces errors in manual invoice handling.

It targets small business owners who get zipped invoices via Gmail and must follow Colombia’s DIAN rules. The workflow downloads ZIP attachments, unzips them, separates PDFs and XMLs, extracts data using OpenAI GPT-4o-mini with LangChain, verifies amounts, and saves everything orderly.


Who should use this workflow

This is for business owners or accountants who spend hours manually processing zipped electronic invoices from emails. They need to save time and avoid errors with invoice totals, dates, and IDs. It’s especially useful for those following Colombian DIAN electronic invoicing and tax rules.

Users must have Gmail, Google Drive, Google Sheets, and OpenAI accounts set up with API access. Some basic n8n workflow import and credential linking knowledge helps but deep programming skills are not required.


Tools and services used

  • Gmail: To check email every 30 minutes and download ZIP attachments containing invoices.
  • n8n automation platform: Workflow editor and nodes for email trigger, file extraction, filtering, merging, and Google integrations.
  • OpenAI GPT-4o-mini with LangChain: Parses PDF text and XML JSON to extract key invoice details like invoice numbers, NITs, totals, and CUFE.
  • Google Drive: To save original invoice PDFs, rename them based on extracted data, and organize files.
  • Google Sheets: To append or update records of invoices in a database and avoid duplicates.

Inputs, processing steps, and outputs

Inputs

  • New emails with ZIP file attachments containing Colombian electronic invoices.

Processing Steps

  • Every 30 minutes, check Gmail inbox for emails with ZIP attachments.
  • Extract attachment filenames and MIME types using a Code node.
  • Filter only ZIP files to proceed.
  • Unzip files to separate contained PDFs and XMLs.
  • Extract filenames and MIME types again from extracted files.
  • Use a Switch node to separate PDFs from XMLs for different processing.
  • Extract text from PDF invoices.
  • Parse XML to JSON format.
  • Merge and aggregate PDF text and XML data into one batch for processing.
  • Use a LangChain Agent node with GPT-4o-mini to extract structured invoice info, including:
    • Invoice type (e.g., Factura, Nota Crédito)
    • Invoice number
    • Issue date (YYYY-MM-DD)
    • Issuer and receiver NITs (without verification digits)
    • Issuer legal name
    • Subtotal, VAT, and total amounts
    • CUFE (invoice unique code)
    • Brief purchase summary (under 20 words)

    This node also verifies total = subtotal + VAT using a calculator tool.

  • Upload original PDF to Google Drive in a chosen folder.
  • Rename the uploaded PDF file with pattern: YYYY-MM-DD-NUMERO_FACTURA.pdf.
  • Append or update invoice record in Google Sheets using a unique key NIT_Emisor + Numero_Factura to avoid duplicates.

Outputs

  • PDF invoices saved and renamed in Google Drive.
  • An updated Google Sheets database with validated invoice data and no duplicates.

Beginner step-by-step: How to use this workflow in n8n

Import the workflow

  1. Download the workflow file using the Download button on this page.
  2. Open the n8n editor where workflows are managed.
  3. Click “Import from File” and select the downloaded workflow file.

Configure credentials and IDs

  1. Add your Gmail OAuth2 credentials so the Gmail Trigger can access your inbox.
  2. Input OpenAI API key into the LangChain Agent node to enable invoice data extraction.
  3. Connect your Google Drive credentials and specify the folder ID where PDFs will save.
  4. Link Google Sheets API credentials and specify the spreadsheet and worksheet to update invoice records.
  5. Check any IDs, emails, or folder references and update them per your setup.

Test and activate

  1. Manually run the workflow with a test email carrying a ZIP invoice attachment to verify all steps work.
  2. Review outputs in Google Drive and Sheets for correctness.
  3. Activate the workflow by toggling the active switch to allow automatic runs every 30 minutes.
  4. If running on your own server, consider self-host n8n for better control and uptime.

Customization ideas

  • Change Gmail polling interval in the Gmail Trigger for faster or slower email checking.
  • Add extra validation nodes, like checking date formats strictly against DIAN rules.
  • Expand the Switch node to handle other invoice-related file types such as TXT or JSON.
  • Organize Google Drive uploads into vendor-specific or invoice-type folders by changing folder IDs dynamically.
  • Upgrade OpenAI model in the LangChain Agent node to GPT-4 if more precise data extraction is needed.

Handling errors and edge cases

  • If no attachments trigger the workflow, check the Gmail Trigger filter query and confirm attachment download is enabled.
  • Missing or incomplete extracted fields from LangChain may mean prompt errors or invalid OpenAI keys. Double-check prompt text and credentials.
  • Duplicate or missing rows in Google Sheets likely result from incorrect unique key construction or insufficient API permissions. Ensure keys combine issuer NIT and invoice number properly.

Summary of results

✓ Automatically processes ZIP invoice attachments from email.

✓ Extracts and validates key invoice data with AI and calculator checks.

Saves original PDFs to Google Drive with proper names.

✓ Updates Google Sheets database to prevent duplicates.

→ Saves hours of manual work each week.

→ Reduces costly errors in tax reporting and invoice management.

Automate Colombian invoices with n8n and OpenAI

Visit through Desktop to Interact with the Workflow.

Frequently Asked Questions

This workflow is designed for Gmail, but can be adapted to other email services if they support downloading attachments and provide compatible n8n nodes.
OpenAI usage depends on invoice volume. You can reduce usage by processing invoices in batches with the batching node.
It uses a unique key combining issuer NIT and invoice number to check and update existing records, preventing duplicates.
Yes. All data stays within your n8n instance and linked services with encrypted credentials. No external storage beyond connected accounts occurs.

Promoted by BULDRR AI

Related Workflows

Automate Twist Channel Creation and Messaging with n8n

This workflow automates creating and updating a channel in Twist and sending a personalized message to specific users. It eliminates manual setup errors and saves time managing Twist communications.

Automate Ideogram Image Generation with Google Sheets & Gmail

This workflow automates graphic design image generation via Ideogram AI, storing image data in Google Sheets and Google Drive, with email alerts via Gmail. It saves designers hours by automating image creation, remixing, review, and record-keeping.

Automate IT Support with Slack and OpenAI in n8n

Streamline IT support by automating Slack message handling using n8n and OpenAI. This workflow handles Slack DMs, filters bots, queries a Confluence knowledge base, and delivers AI-generated responses, improving support efficiency and response time.

Automate Crypto Analysis with CoinMarketCap & n8n AI Agent

Discover how this unique n8n workflow leverages CoinMarketCap’s multi-agent AI to deliver precise, real-time cryptocurrency insights directly via Telegram. Manage crypto data analysis efficiently with automated multi-source API integration.

Automate Gumroad to Beehiiv Subscriber Sync with n8n

Learn how to automatically add new Gumroad sales customers as Beehiiv newsletter subscribers using n8n automation. This workflow saves time by syncing sales data to Google Sheets CRM and notifying your Telegram channel instantly.

Generate On-Brand Blog Articles Using n8n and OpenAI

This workflow automates the creation of on-brand blog articles by analyzing existing company content using n8n and OpenAI. It extracts article structures and brand voice to produce consistent draft articles, saving significant content creation time.
1:1 Free Strategy Session
Your competitors are already automating. Are you still paying for it manually?

Do you want to adopt AI Automation?

Every hour your team does repetitive work, you're burning real money.
While you wait, faster businesses are cutting costs and moving quicker.
AI and automations aren't the future anymore — they're the present.

Book a live 1-on-1 session where we show you exactly which of your daily tasks can be automated — and what it’s costing you not to.