Extract Data from PDFs Using Claude 3.5 Sonnet & Gemini 2.0 in n8n

This n8n workflow solves the challenge of extracting specific data from PDF files by directly sending base64-encoded PDFs to Claude 3.5 Sonnet and Gemini 2.0 Flash AI models. It eliminates the need for multi-step OCR, saving time and improving accuracy.
manualTrigger
googleDrive
httpRequest
+3
Workflow Identifier: 1749
NODES in Use: ManualTrigger, ExtractFromFile, GoogleDrive, HttpRequest, Set, StickyNote
Extract PDF data with n8n and Claude 3.5

Press CTRL+F5 if the workflow didn't load.

Learn how to Build this Workflow with AI:

What this workflow does

This workflow takes a PDF file from Google Drive and pulls out specific data like VAT numbers.
It uses two AI models at once — Claude 3.5 Sonnet and Google Gemini 2.0 Flash — to read the PDF and find the info you ask for.
The result is faster, easier, and less error-prone extraction of data from invoices.

The workflow cuts down on manual steps by sending the PDF in base64 straight to the AI for direct extraction.
This removes the need to use separate OCR processing and multiple tool integrations.


Who should use this workflow

Anyone who needs to grab information from PDFs stored in Google Drive regularly.
This is great if you want to save time on repetitive data extraction like VAT numbers from invoice PDFs.

You do not need to be a tech expert. Basic familiarity with n8n and API keys is enough.
If you want to compare two AI providers to find what works best for your documents, this is helpful too.


Tools and services used


Beginner step-by-step: How to use this workflow in n8n production

Download and import

  1. Click the Download button on this page to save the workflow file (.json).
  2. Inside the n8n editor, choose “Import from File” and upload the saved workflow.

Configure credentials and settings

  1. Add your Google Drive OAuth2 credentials in n8n credential settings.
  2. Enter your Claude 3.5 Sonnet API key in the appropriate HTTP Request node.
  3. Enter your Google Gemini 2.0 Flash API key in its HTTP Request node.
  4. Find the Google Drive node and replace the sample fileId with your actual PDF file ID from Google Drive.
  5. Open the Define Prompt node and update the prompt field if you want to extract other information beyond VAT numbers.

Test and activate

  1. Run the workflow once using the Manual Trigger node to check if everything works and outputs the data.
  2. If results are good, activate the workflow for regular use or schedule it inside n8n.

If self hosting n8n, visit self-host n8n for help with setup and scaling.


Inputs, processing, and output explained

Inputs

  • Google Drive PDF file specified by file ID.
  • User-defined prompt telling AI what data to extract.

Processing Steps

  • The PDF file is downloaded from Google Drive.
  • It is converted from binary to a base64-encoded string inside n8n.
  • This base64 PDF and prompt are sent in parallel HTTP requests to Claude 3.5 Sonnet and Google Gemini 2.0 Flash APIs.
  • Each AI endpoint reads the PDF and extracts requested data.
  • Outputs from both AI models are returned to compare and choose the best.

Output

Structured text or JSON containing extracted data like VAT numbers.
The user can review outputs side by side for accuracy, latency, and cost trade-offs.


What to do if errors happen

  • Google Drive file not found or permission denied: Check if the file ID is correct.
    Re-authorize Google Drive OAuth2 credentials in n8n.
  • Invalid authentication on HTTP Request nodes: Make sure API keys for Claude and Gemini are active and entered correctly.
    Verify API scopes in platform consoles.
  • Empty or malformed AI responses: Confirm the prompt and JSON body formats follow the workflow examples exactly.
    Test with simple prompts first.

Customization ideas

  • Edit the prompt text in the Define Prompt node to extract info other than VAT, like invoice dates or supplier names.
  • Disable one AI provider’s HTTP Request node to reduce API calls and costs.
  • Modify request bodies to ask for JSON responses for easier parsing, using "generationConfig": { "responseMimeType": "application/json" }.
  • Replace the fileId in the Google Drive node with your own PDF file ID for your documents.
  • Add extra logging nodes after API calls to keep records of outputs.

Final summary

→ Automates data extraction from PDF files on Google Drive using AI.
→ Combines two AI models to improve accuracy and let user compare results.
→ Removes complex OCR and manual processing steps.
→ Easy to set up, runs inside n8n with minimal technical skill.
→ Helps you save time, reduce errors, and customize extraction needs.


Extract PDF data with n8n and Claude 3.5

Visit through Desktop to Interact with the Workflow.

Frequently Asked Questions

Yes, disabling either the Claude 3.5 Sonnet or Google Gemini 2.0 Flash HTTP Request node lets the workflow run with just one AI model.
Yes, sending the whole PDF encoded in base64 uses more tokens, so API costs can increase. Optimizing prompt length and PDF size helps control usage.
Check that the file ID is correct and that Google Drive OAuth2 credentials have proper permissions granted.
Yes, changing the prompt in the Define Prompt node allows extraction of any text-based information from the PDF files.

Promoted by BULDRR AI

Related Workflows

Automate Twist Channel Creation and Messaging with n8n

This workflow automates creating and updating a channel in Twist and sending a personalized message to specific users. It eliminates manual setup errors and saves time managing Twist communications.

Automate Ideogram Image Generation with Google Sheets & Gmail

This workflow automates graphic design image generation via Ideogram AI, storing image data in Google Sheets and Google Drive, with email alerts via Gmail. It saves designers hours by automating image creation, remixing, review, and record-keeping.

Automate IT Support with Slack and OpenAI in n8n

Streamline IT support by automating Slack message handling using n8n and OpenAI. This workflow handles Slack DMs, filters bots, queries a Confluence knowledge base, and delivers AI-generated responses, improving support efficiency and response time.

Automate Crypto Analysis with CoinMarketCap & n8n AI Agent

Discover how this unique n8n workflow leverages CoinMarketCap’s multi-agent AI to deliver precise, real-time cryptocurrency insights directly via Telegram. Manage crypto data analysis efficiently with automated multi-source API integration.

Automate Gumroad to Beehiiv Subscriber Sync with n8n

Learn how to automatically add new Gumroad sales customers as Beehiiv newsletter subscribers using n8n automation. This workflow saves time by syncing sales data to Google Sheets CRM and notifying your Telegram channel instantly.

Generate On-Brand Blog Articles Using n8n and OpenAI

This workflow automates the creation of on-brand blog articles by analyzing existing company content using n8n and OpenAI. It extracts article structures and brand voice to produce consistent draft articles, saving significant content creation time.
1:1 Free Strategy Session
Your competitors are already automating. Are you still paying for it manually?

Do you want to adopt AI Automation?

Every hour your team does repetitive work, you're burning real money.
While you wait, faster businesses are cutting costs and moving quicker.
AI and automations aren't the future anymore — they're the present.

Book a live 1-on-1 session where we show you exactly which of your daily tasks can be automated — and what it’s costing you not to.