Automate Invoice Data Extraction with n8n & LlamaParse

Tired of manually processing invoice PDFs from email? This workflow uses n8n with LlamaParse and OpenAI to automatically extract detailed invoice data and update Google Sheets, saving you hours and reducing errors.
Workflow Identifier: 2065
NODES in Use: gmailTrigger, splitOut, gmail, aggregate, if, httpRequest, switch, wait, lmOpenAi, outputParserStructured, chainLlm, googleSheets

What This Workflow Does

This workflow watches a Gmail inbox for new invoice emails from [email protected] that have PDF attachments.

It uploads the PDFs to the LlamaParse API to extract text and tables.

Then it checks the completion status and waits until parsing is done.

After that, the workflow sends the parsed invoice content to OpenAI GPT-3.5 Turbo to extract specific data fields like invoice number, date, supplier, line items, VAT, and totals.

Extracted data is saved into a Google Sheet for invoice reconciliation.

The original email gets labeled as “invoice synced” to avoid repeat processing.

This saves time and reduces manual errors in invoice data entry.


Who Should Use This Workflow

Anyone who receives many invoice PDFs by email and spends too long copying data by hand.

This is good for accounts payable teams or small business owners tracking payments.

No advanced technical skills are required to run it after setup.


Tools and Services Used

  • Gmail: Triggers workflow on incoming invoice emails.
  • LlamaParse API: Cloud service that extracts text and tables from PDF invoices.
  • OpenAI GPT-3.5 Turbo via LangChain: Extracts structured invoice data from parsed Markdown content.
  • Google Sheets: Stores cleaned invoice data for reconciliation.
  • n8n: Automation platform that connects all nodes and runs the workflow.

How The Workflow Works

Inputs

New emails with PDF invoices from [email protected] arrive in Gmail.

Processing Steps

  • Detect email arrival using Gmail Trigger filtering for attachments and sender.
  • Get email labels using Split Out and Gmail Get nodes to check if already processed.
  • Use an If node to continue only if the email has a PDF attachment and lacks the “invoice synced” label.
  • Upload the PDF to the LlamaParse API via an HTTP Request node, sending the file as multipart/form-data.
  • Poll LlamaParse job status repeatedly with another HTTP Request, using a Switch node to handle SUCCESS, PENDING, ERROR, or CANCELED.
  • Pause via Wait node for 1 minute if parsing is still pending, then check again.
  • Once successful, get parsed Markdown invoice content from LlamaParse API.
  • Send the Markdown text to the LangChain OpenAI Model node, configured for gpt-3.5-turbo-1106 with temperature set to 0.
  • Chain LLM node uses a prompt to extract invoice details like dates, numbers, supplier info, line items with prices, VAT, and total amounts.
  • Structured Output Parser turns the AI output into clean JSON matching Google Sheets columns.
  • Set node maps this JSON to the sheet columns.
  • Google Sheets node appends data as a new row to the invoice reconciliation sheet.
  • Lastly, Gmail node adds the “invoice synced” label to the email for tracking.
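
The polling part of these steps (status check, Switch, Wait) can be sketched outside n8n as a small loop. This is only an illustration under stated assumptions: get_status is a hypothetical callable standing in for the HTTP Request node that reads the job status, and max_checks is an added safety limit that the workflow description does not mention.

```python
import time
from typing import Callable

# Terminal failure statuses named in the workflow description.
TERMINAL_FAILURES = {"ERROR", "CANCELED"}


def poll_until_done(
    get_status: Callable[[], str],
    wait_seconds: float = 60.0,
    max_checks: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the job reports SUCCESS and return the number of checks made.

    get_status is a hypothetical stand-in for the status HTTP Request node.
    max_checks is an assumption added here so the loop cannot run forever.
    """
    for check in range(1, max_checks + 1):
        status = get_status()
        if status == "SUCCESS":
            return check
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"parsing job ended with status {status}")
        if status != "PENDING":
            raise ValueError(f"unexpected status: {status!r}")
        sleep(wait_seconds)  # Wait node: pause before checking again
    raise TimeoutError(f"job still pending after {max_checks} checks")
```

Passing sleep as a parameter keeps the loop easy to try out without real one-minute pauses.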

Output

Cleaned and structured invoice data appears in Google Sheets.

The email is labeled to avoid duplicates.


Inputs and Outputs

  • Inputs: Incoming Gmail emails from [email protected] with PDF attachments.
  • Outputs: Structured invoice rows in Google Sheets and labeled source emails.
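
To make the output side concrete, here is a minimal sketch of how one extracted invoice object could be flattened into a single sheet row. The column names and their order are hypothetical and not taken from the actual workflow; nested values such as line items are stored as a JSON string so that each invoice stays on one row.

```python
import json
from typing import Any, Dict, List

# Hypothetical column order; match it to the header row of your own sheet.
COLUMNS = ["invoice_number", "invoice_date", "supplier", "vat", "total", "line_items"]


def to_sheet_row(invoice: Dict[str, Any]) -> List[Any]:
    """Flatten one extracted invoice into a list of cell values."""
    row: List[Any] = []
    for column in COLUMNS:
        value = invoice.get(column, "")  # missing fields become empty cells
        if isinstance(value, (list, dict)):
            # Nested data (for example line items) is serialized to JSON text.
            value = json.dumps(value, ensure_ascii=False)
        row.append(value)
    return row
```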

Beginner Step-by-Step: How to Use This Workflow in n8n

1. Import Workflow

Download the workflow file from this page using the Download button.

Open your n8n editor and choose Import from File to load the workflow.

2. Add Credentials

In n8n, add the required API keys and OAuth2 credentials:

  • Gmail with read/write and label permissions.
  • LlamaIndex API Key for LlamaParse.
  • OpenAI API Key for GPT-3.5 Turbo model.
  • Google Sheets credentials with edit access.

Update node fields like spreadsheet ID and sheet name if needed.

Check the email sender filter, default set to [email protected].

3. Test Workflow

Send a test invoice email to Gmail with a PDF attachment.

Run the workflow manually inside n8n to watch each step and confirm data extraction.

4. Activate for Production

Once tested, activate the workflow so it checks the inbox automatically every minute.

Monitor logs to catch any errors or failed parsing.

If desired, consider self-hosting n8n to control costs and keep data private.


Possible Issues and Solutions

No PDF Found or Wrong MIME Type

The email may have no PDF attachment, or the attachment's MIME type may not be “application/pdf”.

Check Gmail filtering and the If node condition for attachment type.
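
When debugging this condition, it can help to write the check out explicitly. The sketch below mirrors the If node logic described earlier (a PDF attachment is present and the “invoice synced” label is absent). The mimeType key and the shape of the attachment records are assumptions for illustration, not the exact fields n8n exposes.

```python
from typing import Iterable, Mapping

SYNCED_LABEL = "invoice synced"  # label name from the workflow description


def should_process(
    attachments: Iterable[Mapping[str, str]],
    labels: Iterable[str],
) -> bool:
    """Return True when an email has a PDF attachment and is not yet labeled."""
    has_pdf = any(
        # "mimeType" is a hypothetical key; compare case-insensitively.
        a.get("mimeType", "").strip().lower() == "application/pdf"
        for a in attachments
    )
    already_synced = SYNCED_LABEL in {name.strip().lower() for name in labels}
    return has_pdf and not already_synced
```

Normalizing case and whitespace before comparing avoids false negatives from values such as "Application/PDF".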

LlamaParse API Returns Error or Canceled

The PDF might be corrupted or unsupported by LlamaParse.

Verify API keys and authorization headers in HTTP Request nodes.
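
As a reference point for that check, the sketch below builds (but does not send) a job-status request with a Bearer authorization header. The base URL and the /job/{job_id} path are assumptions about the LlamaParse REST API; confirm them against the URLs configured in your own HTTP Request nodes. The function name is hypothetical.

```python
import urllib.request

# Assumed base URL for the LlamaParse REST API; verify before relying on it.
BASE_URL = "https://api.cloud.llamaindex.ai/api/parsing"


def build_status_request(api_key: str, job_id: str) -> urllib.request.Request:
    """Build, without sending, a job-status request with a Bearer header."""
    if not api_key.strip():
        raise ValueError("API key is empty")
    return urllib.request.Request(
        url=f"{BASE_URL}/job/{job_id}",
        headers={
            # A stray space or missing "Bearer " prefix is a common cause
            # of authorization failures.
            "Authorization": f"Bearer {api_key.strip()}",
            "Accept": "application/json",
        },
        method="GET",
    )
```

Comparing the header this function produces with the one in the HTTP Request node is a quick way to spot a malformed key or prefix.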

Google Sheets Append Fails

Spreadsheet ID or sheet name might be wrong.

Confirm OAuth scopes include editing permissions.


How to Customize This Workflow

  • Change the Gmail sender filter to track invoices from other email addresses.
  • Add more fields in the Structured Output Parser such as payment terms or due date.
  • Replace the Google Sheets node with another store such as Airtable or Excel if preferred.
  • Adjust the Wait node timer to control how often the LlamaParse API is polled.
  • Adjust the OpenAI model temperature: lower values give stricter, more repeatable extraction, while higher values give more varied output.
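
Adding fields to the Structured Output Parser amounts to extending the JSON schema it validates against. The sketch below shows the idea with a hypothetical starting schema; the field names are illustrative and are not the ones used in the actual workflow.

```python
import copy
import json

# Hypothetical starting schema; field names are illustrative only.
BASE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "invoice_date": {"type": "string"},
        "supplier": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number", "invoice_date", "supplier", "total"],
}


def add_field(schema: dict, name: str, json_type: str, required: bool = False) -> dict:
    """Return a copy of schema with one extra property added."""
    if name in schema["properties"]:
        raise ValueError(f"field already defined: {name}")
    extended = copy.deepcopy(schema)  # leave the original schema untouched
    extended["properties"][name] = {"type": json_type}
    if required:
        extended["required"].append(name)
    return extended


extended = add_field(BASE_SCHEMA, "payment_terms", "string")
extended = add_field(extended, "due_date", "string")
print(json.dumps(extended, indent=2))
```

Each new field also needs a matching column in the sheet and a mention in the extraction prompt, otherwise the model has no reason to fill it.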

Summary of Workflow Results

✓ Extracts invoice data automatically from Gmail PDF attachments.

✓ Reduces manual data entry time and human error.

✓ Saves data directly into Google Sheets for quick reconciliation.

✓ Marks processed emails to prevent repeat work.


Frequently Asked Questions

Other email providers can be used instead of Gmail, but a compatible n8n node for that provider must be used and the email filtering and label handling adjusted.
Each invoice processed calls OpenAI’s API, which counts toward usage and cost.
Data passes through the LlamaParse and OpenAI APIs. For sensitive data, consider self-hosting n8n and using encrypted API keys.
Larger invoice volumes are possible, but API rate limits on LlamaParse and OpenAI apply. Adjust polling intervals and optimize as needed.
