Extract PDF & Image Text into CSV Using n8n and Vertex AI

Learn how to automatically extract text from PDFs and images using n8n with Google Vertex AI and Openrouter, converting costly manual entry into CSV files for easy data analysis. This workflow saves hours by automating text extraction and classification of transactions in bank statements.
googleDriveTrigger
switch
googleDrive
+5
Workflow Identifier: 1367
NODES in Use: Google Drive Trigger, Switch, Google Drive, ExtractFromFile, HTTP Request, ConvertToFile, chainLlm, lmChatGoogleGemini

Press CTRL+F5 if the workflow didn't load.

Learn how to Build this Workflow with AI:

Visit through Desktop for Best experience

What This Workflow Does

This workflow watches a specific Google Drive folder for new bank statement PDFs or images.

It downloads files and finds text inside PDFs or images.

Then it asks an AI model to turn the raw text into a clean CSV file that has transactions sorted by category.

The workflow saves time and reduces mistakes by making CSV files automatically from financial documents.


Who Should Use This Workflow

Anyone who needs to extract transaction data from bank statements or receipts in PDFs or images.

It is great for small businesses, accountants, or bookkeepers who want to stop doing manual copy-paste.

It fits users with Google Drive storage and who want automatic CSV exports for their accounting.


Tools and Services Used

  • Google Drive API: Detect and download new files from a drive folder.
  • n8n ExtractFromFile node: Pull text from PDFs.
  • Google Vertex AI: Extract text from images via AI.
  • Openrouter API: Use the Meta LLaMA 3.1 AI model to parse raw text and produce CSV.
  • n8n ConvertToFile node: Turn AI text output into CSV files.
  • Google Drive API: Upload concluded CSV files back to Drive.

Inputs, Processing Steps, and Output

Inputs

  • New PDFs or image files placed into a specific Google Drive folder.

Processing Steps

  1. Trigger when a new file appears in the folder via Google Drive Trigger node.
  2. Use a Switch node to send PDFs and images down different paths.
  3. Download files using Google Drive nodes.
  4. Extract text with ExtractFromFile node for PDFs, or send images to Google Vertex AI node for text extraction.
  5. Send the extracted text to Openrouter AI via an HTTP Request node calling Meta LLaMA 3.1 to parse and convert text into categorized CSV format.
  6. Convert AI response text to CSV files using ConvertToFile nodes.
  7. Upload the CSV files back to Drive using Google Drive node.

Output

The user gets categorized transaction data as CSV files stored in Google Drive ready to import into accounting software.


Beginner Step-by-Step: How to Use This Workflow in n8n Production

Step 1: Import the Workflow

  1. Download the workflow file using the Download button on this page.
  2. Open the n8n editor where you want to use the automation.
  3. Click on “Import from File” and select the downloaded workflow file.

Step 2: Configure Credentials and IDs

  1. Go to each Google Drive node and connect your Google Service Account credentials.
  2. In the Google Drive Trigger node, update the folder ID to your target bank statement or receipt folder.
  3. Add your Openrouter API Key in the HTTP Request node settings for authorization.
  4. If needed, update any folder IDs or authentication emails used for uploads.

Step 3: Test the Workflow

  1. Upload a test PDF or image file to the Google Drive folder you are monitoring.
  2. Check if the workflow triggers and you get a CSV file uploaded back to the Drive output folder.

Step 4: Activate for Production

  1. Turn the workflow toggle ON to enable automatic runs on new files.
  2. Monitor executions the first few times to catch any errors.
  3. Adjust configuration or credentials if there are permission or API issues.

Use self-host n8n if more control over the runtime is needed.


Common Issues and Fixes

  • Google Drive Trigger fires but files not processed: Make sure the folder is shared with your Google Service Account email.
  • Empty text from ExtractFromFile node: Check if PDF has selectable text, not just scanned images.
  • Vertex AI permission errors: Enable the Vertex AI API in Google Cloud and grant correct roles to your service account.
  • Openrouter API 401 error: Verify the API key and HTTP header formats in the HTTP Request node.

Customization Ideas

  • Switch AI models by changing the HTTP Request node to use GPT-4 or Google Gemini for parsing.
  • Add other image file types in the Switch node by matching additional MIME types.
  • Insert filters to remove old or small transactions before CSV conversion using a Function node.
  • Change CSV filenames dynamically by adding client or bank names with expressions in the upload node.
  • Add email notifications with a Gmail node after CSV uploads for alerts.

Summary of Benefits

✓ Saves time by automating data extraction from PDFs and images.

✓ Produces cleaner, structured CSV files with categorized expense columns.

✓ Reduces errors from manual copy-pasting and delays in reporting.

✓ Works fully integrated within Google Drive and n8n workflow.

→ Makes financial data ready for accounting software fast and easy.


Frequently Asked Questions

Yes, the Openrouter HTTP Request node can be adjusted to call OpenAI GPT-4 or Google Gemini models by changing the model parameter.
The ExtractFromFile node won’t extract text well in that case; use the image extraction path with Google Vertex AI.
The Google Drive folder must be shared with the Google Service Account email used in the workflow so it can access new files and upload results.
Yes, all data flows through authenticated trusted APIs; credentials are stored securely within n8n environment.

Promoted by BULDRR AI

Related Workflows

Automate Viral UGC Video Creation Using n8n + Degaus (Beginner-Friendly Guide)

Learn how to automate viral UGC video creation using n8n, AI prompts, and Degaus. This beginner-friendly guide shows how to import, configure, and run the workflow without technical complexity.
Form Trigger
Google Sheets
Gmail
+37
Free

AI SEO Blog Writer Automation Workflows in n8n

A complete beginner guide to building an AI SEO blog writer automation using n8n.
AI Agent
Google Sheets
httpRequest
+5
Free

Automate CrowdStrike Alerts with VirusTotal, Jira & Slack

This workflow automates processing of CrowdStrike detections by enriching threat data via VirusTotal, creating Jira tickets for incident tracking, and notifying teams on Slack for quick response. Save hours daily by transforming complex threat data into actionable alerts effortlessly.
scheduleTrigger
httpRequest
jira
+5
Free

Automate Telegram Invoices to Notion with AI Summaries & Reports

Save hours on financial tracking by automating invoice extraction from Telegram photos to Notion using Google Gemini AI. This workflow extracts data, records transactions, and generates detailed spending reports with charts sent on schedule via Telegram.
lmChatGoogleGemini
telegramTrigger
notion
+9
Free

Automate Email Replies with n8n and AI-Powered Summarization

Save hours managing your inbox with this n8n workflow that uses IMAP email triggers, AI summarization, and vector search to draft concise replies requiring minimal review. Automate business email processing efficiently with AI guidance and Gmail integration.
emailReadImap
vectorStoreQdrant
emailSend
+12
Free

Automate Email Campaigns Using n8n with Gmail & Google Sheets

This n8n workflow automates personalized email outreach campaigns by integrating Gmail and Google Sheets, saving hours of manual follow-up work and reducing errors in email sequences. It ensures timely follow-ups based on previous email interactions, optimizing communication efficiency.
googleSheets
gmail
code
+5
Free