Automate Google Drive PDFs with n8n & Pinecone Search

Save hours of manual work by automatically monitoring Google Drive for new PDFs, extracting and cleaning their content, and enabling powerful AI-driven search and chat using Pinecone and Google Gemini. This workflow turns your document storage into an intelligent, queryable knowledge base.
Workflow Identifier: 1785
Nodes in Use: googleDriveTrigger, googleDrive, extractFromFile, code, vectorStorePinecone, embeddingsGoogleGemini, documentDefaultDataLoader, textSplitterRecursiveCharacterTextSplitter, chatTrigger, lmChatOpenRouter, agent

What This Workflow Does

This workflow watches a Google Drive folder for new PDF files.

It downloads each PDF, then extracts and cleans the text.

Next, it creates embeddings from the clean text using the Google Gemini model.

It adds these embeddings and document data to a Pinecone vector store for quick searches.

When a user sends a question, the workflow retrieves relevant documents from Pinecone and answers with a Google Gemini chat model accessed through OpenRouter.

This replaces manual downloading, reading, and searching of PDFs and speeds up document search with AI.


Who Should Use This Workflow

This workflow is for anyone who manages many PDF files on Google Drive.

It helps teams that spend hours downloading, reading, and searching PDFs manually.

It is especially useful for legal, research, or admin staff who need fast text search across documents.

It works for small or large document volumes where an AI chat needs to answer questions from PDFs.


Tools and Services Used

  • Google Drive Trigger node: Watches a folder for new PDFs.
  • Google Drive node: Downloads files by file ID.
  • Extract From File node: Extracts text from PDF binary data.
  • Code node: Cleans text by removing line breaks and special characters.
  • Generate Document Embeddings (Google Gemini) node: Creates text embeddings for search.
  • Insert Document into Pinecone Vector Store node: Adds vectors and metadata for similarity search.
  • Chat Message Trigger node: Webhook for user chat queries.
  • Generate Query Embeddings (Google Gemini) node: Turns chat questions into embeddings.
  • Retrieve Relevant Documents from Pinecone node: Finds documents matching the question.
  • Code node (prompt builder): Combines top documents with user query to build AI prompt.
  • OpenRouter Chat Model Interface node: Runs Google Gemini chat model to answer questions.
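
The cleaning step in the Code node can be pictured with a small standalone sketch. This is not the workflow's actual code: the function name `clean_text` and the list of characters that are kept are assumptions chosen for illustration.

```python
import re


def clean_text(raw: str) -> str:
    """Collapse line breaks and drop characters outside a small allow-list.

    Hypothetical helper; the allow-list below is an assumption, not the
    rule used by the original workflow.
    """
    # Replace any run of line breaks or tabs with a single space.
    text = re.sub(r"[\r\n\t]+", " ", raw)
    # Keep word characters, whitespace, and common punctuation; drop the rest.
    text = re.sub(r"[^\w\s.,;:!?()'%/-]", "", text)
    # Collapse the repeated spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", text).strip()


cleaned = clean_text("Hello\nWorld\u2022  test!")
```

Widening or narrowing the allow-list is the main lever when documents contain symbols that matter, such as currency signs or section marks.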

Inputs, Process, and Outputs

Inputs

  • New PDF files uploaded to a specific monitored Google Drive folder.
  • User questions sent to the chat webhook.

Processing Steps

  • Trigger detects new PDF files on Google Drive.
  • Downloads file binary data.
  • Extracts raw text from PDFs.
  • Cleans text by removing line breaks and special symbols.
  • Generates vector embeddings of cleaned text using Google Gemini text-embedding-004.
  • Inserts embeddings and document metadata into Pinecone index.
  • When a question arrives, generates query embeddings from user input.
  • Searches Pinecone for the documents closest to the query vector.
  • Builds a prompt that combines the top document snippets with the question.
  • Sends prompt to Google Gemini chat model for an answer.
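
The similarity search in the steps above can be illustrated with an in-memory stand-in. Pinecone performs this search server-side; the sketch below only shows the idea, and it assumes a cosine metric and a record shape (`id`, `values`) picked for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat zero vectors as unrelated to everything.
    return dot / (norm_a * norm_b)


def top_k(query_vec, records, k=3):
    """Return the k records whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, r["values"]), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:k]]


# Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
records = [
    {"id": "a", "values": [1.0, 0.0]},
    {"id": "b", "values": [0.0, 1.0]},
    {"id": "c", "values": [0.9, 0.1]},
]
best = top_k([1.0, 0.0], records, k=2)
```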

Outputs

  • Fast AI-generated answers to user questions based on PDF content.
  • Stored indexed document vectors for repeated fast similarity searches.

Beginner Step-by-Step: How To Use This Workflow in n8n

Importing and Setup

  1. Download the workflow file using the Download button on this page.
  2. Open the n8n editor and choose Import from File to upload the workflow file.
  3. After import, add Google Drive OAuth credentials to the nodes that need them.
  4. Configure Pinecone credentials and make sure the index name matches n8n-rag-demo or update as needed.
  5. Enter the Google Gemini API key for the embedding nodes and the OpenRouter API key for the chat model node.
  6. For the Google Drive Trigger node, update the folder ID if monitoring a different folder.
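
When updating the folder ID, it is easy to paste a full Drive link by mistake. The sketch below pulls the ID out of a folder URL. It assumes the common `.../drive/folders/<ID>` link shape, and the helper name `extract_folder_id` is hypothetical.

```python
import re
from typing import Optional


def extract_folder_id(url_or_id: str) -> Optional[str]:
    """Return the folder ID from a Drive folder link, or the input if it
    already looks like a bare ID. Returns None when neither applies."""
    match = re.search(r"/folders/([A-Za-z0-9_-]+)", url_or_id)
    if match:
        return match.group(1)
    # A bare token made only of ID-safe characters is passed through as-is.
    if re.fullmatch(r"[A-Za-z0-9_-]+", url_or_id):
        return url_or_id
    return None


# The ID in this link is made up for the example.
folder_id = extract_folder_id(
    "https://drive.google.com/drive/folders/1AbC_dEf-123?usp=sharing"
)
```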

Testing and Activation

  1. Upload a test PDF to the monitored Google Drive folder and verify the workflow triggers.
  2. Send a test chat query to the webhook URL provided by the Chat Message Trigger node.
  3. Check if the workflow extracts text, creates embeddings, stores data, and returns a relevant AI answer.
  4. Fix any errors by checking credentials and configuration, then retest.
  5. Activate the workflow for production use by toggling it active.
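
A test chat query can also be sent from a script instead of the n8n chat window. The sketch below only builds the HTTP request. The payload field names (`action`, `sessionId`, `chatInput`) are an assumption about what the Chat Message Trigger expects, and the URL is a placeholder, so check both against your own node before relying on it.

```python
import json
import urllib.request


def build_chat_request(webhook_url: str, question: str,
                       session_id: str = "test-session") -> urllib.request.Request:
    """Build a POST request carrying one chat question as JSON."""
    payload = {
        "action": "sendMessage",   # Assumed field name.
        "sessionId": session_id,   # Assumed field name.
        "chatInput": question,     # Assumed field name.
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; copy the real one from the Chat Message Trigger node.
req = build_chat_request("https://example.invalid/webhook/abc/chat",
                         "What does the contract say about renewal?")

# To actually send it, uncomment the lines below:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.read().decode("utf-8"))
```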

For stable operation when self-hosting n8n, keep the n8n service running continuously so the Drive trigger and the chat webhook stay available.


Common Edge Cases and Failures

  • No trigger on new files: Usually the folder ID is wrong or Google Drive permissions are missing.
  • Empty text extraction: Happens if PDFs are scanned images, which need OCR support (not included).
  • Embedding or Pinecone errors: Often caused by invalid API keys or index names.
  • Slow or missing chat answers: Check the connection to the chat model service (Google Gemini through OpenRouter) and the webhook setup.
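
Scanned PDFs can be caught before they reach the embedding step. The sketch below is a rough heuristic and not part of the original workflow: the function name and the threshold of 50 characters per page are assumptions to tune for your documents.

```python
def looks_like_scanned_pdf(extracted_text: str, page_count: int,
                           min_chars_per_page: int = 50) -> bool:
    """Guess whether a PDF is image-only from how little text came out.

    Hypothetical helper; the threshold is an arbitrary starting point.
    """
    if page_count <= 0:
        return True  # Nothing usable to index.
    # Count only visible characters, ignoring all whitespace.
    visible = "".join(extracted_text.split())
    return len(visible) / page_count < min_chars_per_page


flag_empty = looks_like_scanned_pdf("", page_count=3)
flag_text = looks_like_scanned_pdf("word " * 100, page_count=2)
```

Files flagged this way could be routed to a separate branch for OCR or manual review instead of being indexed as empty documents.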

Customization Ideas

  • Change monitored Google Drive folder ID to track other folders.
  • Adjust text cleaning code to keep more or fewer characters depending on document style.
  • Change embedding model names to other Google Gemini or supported models.
  • Include more document snippets in chat prompts by editing the prompt builder Code node.
  • Add notification nodes (email, Slack) after Pinecone insertion to alert when new files are processed.
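
As an illustration of the prompt-builder customization, the sketch below makes the number of snippets a parameter. The function name, the prompt wording, and the default of three snippets are assumptions, not the workflow's actual prompt.

```python
from typing import List


def build_prompt(question: str, snippets: List[str], max_snippets: int = 3) -> str:
    """Combine the top snippets and the user question into one prompt."""
    selected = snippets[:max_snippets]
    # Number each snippet so the answer can refer back to its source.
    context = "\n\n".join(
        f"[{i}] {snippet}" for i, snippet in enumerate(selected, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt("Q?", ["a", "b", "c", "d"], max_snippets=2)
```

Raising `max_snippets` gives the model more context at the cost of a longer prompt.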

Summary of Results

✓ Automated monitoring and ingestion of PDFs from Google Drive.

✓ Clean, searchable document text extracted and normalized.

✓ Embeddings stored in Pinecone for fast similarity searches.

✓ AI chat answers user queries using the most relevant document context.

→ Saves hours of manual file handling and text searching.

→ Speeds up legal or document-based decision workflows.

Frequently Asked Questions

The Google Drive Trigger node may not detect new files if the folder ID is incorrect or if the OAuth credentials do not have permission to access the monitored folder.
Empty text results occur when PDFs are scanned images without embedded text. This workflow does not include OCR to handle such files.
Embedding errors can be fixed by verifying the Pinecone API key is correct and the specified index name exists and matches the node configuration.
The folder ID inside the Google Drive Trigger node can be updated to any other folder the user has access to.
