Extract PDF & Image Text into CSV Using n8n and Vertex AI

Learn how to automatically extract text from PDFs and images using n8n with Google Vertex AI and Openrouter, converting costly manual entry into CSV files for easy data analysis. This workflow saves hours by automating text extraction and classification of transactions in bank statements.
googleDriveTrigger
switch
googleDrive
+5
Workflow Identifier: 1367
NODES in Use: Google Drive Trigger, Switch, Google Drive, ExtractFromFile, HTTP Request, ConvertToFile, chainLlm, lmChatGoogleGemini
Extract text with n8n and Vertex AI

Press CTRL+F5 if the workflow didn't load.

Learn how to Build this Workflow with AI:

What This Workflow Does

This workflow watches a specific Google Drive folder for new bank statement PDFs or images.

It downloads files and finds text inside PDFs or images.

Then it asks an AI model to turn the raw text into a clean CSV file that has transactions sorted by category.

The workflow saves time and reduces mistakes by making CSV files automatically from financial documents.


Who Should Use This Workflow

Anyone who needs to extract transaction data from bank statements or receipts in PDFs or images.

It is great for small businesses, accountants, or bookkeepers who want to stop doing manual copy-paste.

It fits users with Google Drive storage and who want automatic CSV exports for their accounting.


Tools and Services Used

  • Google Drive API: Detect and download new files from a drive folder.
  • n8n ExtractFromFile node: Pull text from PDFs.
  • Google Vertex AI: Extract text from images via AI.
  • Openrouter API: Use the Meta LLaMA 3.1 AI model to parse raw text and produce CSV.
  • n8n ConvertToFile node: Turn AI text output into CSV files.
  • Google Drive API: Upload concluded CSV files back to Drive.

Inputs, Processing Steps, and Output

Inputs

  • New PDFs or image files placed into a specific Google Drive folder.

Processing Steps

  1. Trigger when a new file appears in the folder via Google Drive Trigger node.
  2. Use a Switch node to send PDFs and images down different paths.
  3. Download files using Google Drive nodes.
  4. Extract text with ExtractFromFile node for PDFs, or send images to Google Vertex AI node for text extraction.
  5. Send the extracted text to Openrouter AI via an HTTP Request node calling Meta LLaMA 3.1 to parse and convert text into categorized CSV format.
  6. Convert AI response text to CSV files using ConvertToFile nodes.
  7. Upload the CSV files back to Drive using Google Drive node.

Output

The user gets categorized transaction data as CSV files stored in Google Drive ready to import into accounting software.


Beginner Step-by-Step: How to Use This Workflow in n8n Production

Step 1: Import the Workflow

  1. Download the workflow file using the Download button on this page.
  2. Open the n8n editor where you want to use the automation.
  3. Click on “Import from File” and select the downloaded workflow file.

Step 2: Configure Credentials and IDs

  1. Go to each Google Drive node and connect your Google Service Account credentials.
  2. In the Google Drive Trigger node, update the folder ID to your target bank statement or receipt folder.
  3. Add your Openrouter API Key in the HTTP Request node settings for authorization.
  4. If needed, update any folder IDs or authentication emails used for uploads.

Step 3: Test the Workflow

  1. Upload a test PDF or image file to the Google Drive folder you are monitoring.
  2. Check if the workflow triggers and you get a CSV file uploaded back to the Drive output folder.

Step 4: Activate for Production

  1. Turn the workflow toggle ON to enable automatic runs on new files.
  2. Monitor executions the first few times to catch any errors.
  3. Adjust configuration or credentials if there are permission or API issues.

Use self-host n8n if more control over the runtime is needed.


Common Issues and Fixes

  • Google Drive Trigger fires but files not processed: Make sure the folder is shared with your Google Service Account email.
  • Empty text from ExtractFromFile node: Check if PDF has selectable text, not just scanned images.
  • Vertex AI permission errors: Enable the Vertex AI API in Google Cloud and grant correct roles to your service account.
  • Openrouter API 401 error: Verify the API key and HTTP header formats in the HTTP Request node.

Customization Ideas

  • Switch AI models by changing the HTTP Request node to use GPT-4 or Google Gemini for parsing.
  • Add other image file types in the Switch node by matching additional MIME types.
  • Insert filters to remove old or small transactions before CSV conversion using a Function node.
  • Change CSV filenames dynamically by adding client or bank names with expressions in the upload node.
  • Add email notifications with a Gmail node after CSV uploads for alerts.

Summary of Benefits

✓ Saves time by automating data extraction from PDFs and images.

✓ Produces cleaner, structured CSV files with categorized expense columns.

✓ Reduces errors from manual copy-pasting and delays in reporting.

✓ Works fully integrated within Google Drive and n8n workflow.

→ Makes financial data ready for accounting software fast and easy.


Extract text with n8n and Vertex AI

Visit through Desktop to Interact with the Workflow.

Frequently Asked Questions

Yes, the Openrouter HTTP Request node can be adjusted to call OpenAI GPT-4 or Google Gemini models by changing the model parameter.
The ExtractFromFile node won’t extract text well in that case; use the image extraction path with Google Vertex AI.
The Google Drive folder must be shared with the Google Service Account email used in the workflow so it can access new files and upload results.
Yes, all data flows through authenticated trusted APIs; credentials are stored securely within n8n environment.

Promoted by BULDRR AI

Related Workflows

Automate Twist Channel Creation and Messaging with n8n

This workflow automates creating and updating a channel in Twist and sending a personalized message to specific users. It eliminates manual setup errors and saves time managing Twist communications.

Automate Ideogram Image Generation with Google Sheets & Gmail

This workflow automates graphic design image generation via Ideogram AI, storing image data in Google Sheets and Google Drive, with email alerts via Gmail. It saves designers hours by automating image creation, remixing, review, and record-keeping.

Automate IT Support with Slack and OpenAI in n8n

Streamline IT support by automating Slack message handling using n8n and OpenAI. This workflow handles Slack DMs, filters bots, queries a Confluence knowledge base, and delivers AI-generated responses, improving support efficiency and response time.

Automate Crypto Analysis with CoinMarketCap & n8n AI Agent

Discover how this unique n8n workflow leverages CoinMarketCap’s multi-agent AI to deliver precise, real-time cryptocurrency insights directly via Telegram. Manage crypto data analysis efficiently with automated multi-source API integration.

Automate Gumroad to Beehiiv Subscriber Sync with n8n

Learn how to automatically add new Gumroad sales customers as Beehiiv newsletter subscribers using n8n automation. This workflow saves time by syncing sales data to Google Sheets CRM and notifying your Telegram channel instantly.

Generate On-Brand Blog Articles Using n8n and OpenAI

This workflow automates the creation of on-brand blog articles by analyzing existing company content using n8n and OpenAI. It extracts article structures and brand voice to produce consistent draft articles, saving significant content creation time.
1:1 Free Strategy Session
Your competitors are already automating. Are you still paying for it manually?

Do you want to adopt AI Automation?

Every hour your team does repetitive work, you're burning real money.
While you wait, faster businesses are cutting costs and moving quicker.
AI and automations aren't the future anymore — they're the present.

Book a live 1-on-1 session where we show you exactly which of your daily tasks can be automated — and what it’s costing you not to.