What this workflow does
This workflow automatically processes electronic invoice ZIP files received by email. It extracts and validates data from XML and PDF invoices, checks totals with a calculator, and organizes files in Google Drive and Google Sheets. This saves time and reduces errors in manual invoice handling.
It targets small business owners who get zipped invoices via Gmail and must follow Colombia’s DIAN rules. The workflow downloads ZIP attachments, unzips them, separates PDFs and XMLs, extracts data using OpenAI GPT-4o-mini with LangChain, verifies amounts, and saves everything orderly.
Who should use this workflow
This is for business owners or accountants who spend hours manually processing zipped electronic invoices from emails. They need to save time and avoid errors with invoice totals, dates, and IDs. It’s especially useful for those following Colombian DIAN electronic invoicing and tax rules.
Users must have Gmail, Google Drive, Google Sheets, and OpenAI accounts set up with API access. Some basic n8n workflow import and credential linking knowledge helps but deep programming skills are not required.
Tools and services used
- Gmail: To check email every 30 minutes and download ZIP attachments containing invoices.
- n8n automation platform: Workflow editor and nodes for email trigger, file extraction, filtering, merging, and Google integrations.
- OpenAI GPT-4o-mini with LangChain: Parses PDF text and XML JSON to extract key invoice details like invoice numbers, NITs, totals, and CUFE.
- Google Drive: To save original invoice PDFs, rename them based on extracted data, and organize files.
- Google Sheets: To append or update records of invoices in a database and avoid duplicates.
Inputs, processing steps, and outputs
Inputs
- New emails with ZIP file attachments containing Colombian electronic invoices.
Processing Steps
- Every 30 minutes, check Gmail inbox for emails with ZIP attachments.
- Extract attachment filenames and MIME types using a Code node.
- Filter only ZIP files to proceed.
- Unzip files to separate contained PDFs and XMLs.
- Extract filenames and MIME types again from extracted files.
- Use a Switch node to separate PDFs from XMLs for different processing.
- Extract text from PDF invoices.
- Parse XML to JSON format.
- Merge and aggregate PDF text and XML data into one batch for processing.
- Use a LangChain Agent node with GPT-4o-mini to extract structured invoice info, including:
- Invoice type (e.g., Factura, Nota Crédito)
- Invoice number
- Issue date (YYYY-MM-DD)
- Issuer and receiver NITs (without verification digits)
- Issuer legal name
- Subtotal, VAT, and total amounts
- CUFE (invoice unique code)
- Brief purchase summary (under 20 words)
This node also verifies total = subtotal + VAT using a calculator tool.
- Upload original PDF to Google Drive in a chosen folder.
- Rename the uploaded PDF file with pattern: YYYY-MM-DD-NUMERO_FACTURA.pdf.
- Append or update invoice record in Google Sheets using a unique key NIT_Emisor + Numero_Factura to avoid duplicates.
Outputs
- PDF invoices saved and renamed in Google Drive.
- An updated Google Sheets database with validated invoice data and no duplicates.
Beginner step-by-step: How to use this workflow in n8n
Import the workflow
- Download the workflow file using the Download button on this page.
- Open the n8n editor where workflows are managed.
- Click “Import from File” and select the downloaded workflow file.
Configure credentials and IDs
- Add your Gmail OAuth2 credentials so the Gmail Trigger can access your inbox.
- Input OpenAI API key into the LangChain Agent node to enable invoice data extraction.
- Connect your Google Drive credentials and specify the folder ID where PDFs will save.
- Link Google Sheets API credentials and specify the spreadsheet and worksheet to update invoice records.
- Check any IDs, emails, or folder references and update them per your setup.
Test and activate
- Manually run the workflow with a test email carrying a ZIP invoice attachment to verify all steps work.
- Review outputs in Google Drive and Sheets for correctness.
- Activate the workflow by toggling the active switch to allow automatic runs every 30 minutes.
- If running on your own server, consider self-host n8n for better control and uptime.
Customization ideas
- Change Gmail polling interval in the Gmail Trigger for faster or slower email checking.
- Add extra validation nodes, like checking date formats strictly against DIAN rules.
- Expand the Switch node to handle other invoice-related file types such as TXT or JSON.
- Organize Google Drive uploads into vendor-specific or invoice-type folders by changing folder IDs dynamically.
- Upgrade OpenAI model in the LangChain Agent node to GPT-4 if more precise data extraction is needed.
Handling errors and edge cases
- If no attachments trigger the workflow, check the Gmail Trigger filter query and confirm attachment download is enabled.
- Missing or incomplete extracted fields from LangChain may mean prompt errors or invalid OpenAI keys. Double-check prompt text and credentials.
- Duplicate or missing rows in Google Sheets likely result from incorrect unique key construction or insufficient API permissions. Ensure keys combine issuer NIT and invoice number properly.
Summary of results
✓ Automatically processes ZIP invoice attachments from email.
✓ Extracts and validates key invoice data with AI and calculator checks.
✓ Saves original PDFs to Google Drive with proper names.
✓ Updates Google Sheets database to prevent duplicates.
→ Saves hours of manual work each week.
→ Reduces costly errors in tax reporting and invoice management.
