Automate Telegram PDF Q&A Using Pinecone & LangChain

Discover how to automatically process PDFs sent to Telegram, extract searchable content using LangChain with Pinecone, and instantly reply to chat queries. This workflow saves hours by turning lengthy documents into smart answers.
Workflow Identifier: 2333
NODES in Use: Telegram Trigger, Check If, Telegram, Code, Recursive Character Text Splitter, Default Data Loader, Embeddings OpenAI, Pinecone Vector Store, Limit, Telegram Response, Stop and Error, Question and Answer Chain, Vector Store Retriever, Groq Chat Model, Telegram Response about Database, Pinecone Vector Store1

1. Opening Problem Statement

Meet Ana, a remote consultant who often receives long PDF reports from clients via Telegram. She spends hours sifting through each document manually to find key insights or answer specific client questions. This tedious process not only wastes Ana’s valuable time but also increases the chance of missing critical information hidden deep within documents.

Ana needs a solution that can instantly understand and respond to questions based on the PDF content she receives through Telegram, without her having to manually read and search those files. This workflow solves exactly that problem by automating the extraction, indexing, and querying process, drastically reducing manual effort.

2. What This Automation Does

When this n8n workflow runs, it transforms Telegram into an intelligent document assistant that can:

  • Automatically detect PDF files sent via Telegram and download them seamlessly.
  • Convert PDFs into manageable text chunks using recursive character splitting.
  • Generate OpenAI embeddings for these chunks to create a searchable vector database.
  • Store this vector data in Pinecone for lightning-fast, similarity-based retrieval.
  • Allow users to ask natural language questions and get precise answers from the uploaded PDFs.
  • Send immediate replies in Telegram chat with the AI-generated response based on document content.

This automation eliminates hours of manual searching in documents and helps Ana quickly deliver accurate answers to client queries.

3. Prerequisites ⚙️

  • n8n account for building and running the workflow.
  • Telegram account and a bot created with @BotFather; its bot token goes into n8n's Telegram API credentials 📧.
  • OpenAI API key for generating embeddings (and for chat, if you swap the Groq model for an OpenAI one) 🔑.
  • Pinecone account and API key for vector database management 📁.
  • Groq API credentials for language model chat operations 🔐.
  • Basic knowledge of JSON and n8n interfaces recommended but not required.

4. Step-by-Step Guide

Step 1: Set Up Telegram Trigger Node

Go to your n8n editor, click Add Node → search Telegram Trigger. Configure it to listen for “message” updates. Connect your Telegram Bot by supplying its API credentials under telegramApi. This node fires whenever someone sends a message (like PDFs) in your chat.

Expected outcome: Workflow starts on receiving any Telegram message.

Common mistake: Forgetting to add the Telegram API credentials will prevent triggering.
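
To see what the trigger hands to the rest of the workflow, here is a rough sketch of an abridged "message" update and how its two paths can be told apart. The values are made up; the field names (`message`, `chat.id`, `text`, `document`) follow the Telegram Bot API, while `describeUpdate` is a hypothetical helper, not part of n8n.

```javascript
// Abridged Telegram update for a "message" event (values are illustrative).
const sampleUpdate = {
  update_id: 1001,
  message: {
    message_id: 42,
    chat: { id: 123456789, type: "private" },
    document: {
      file_id: "BQACAgIAAxkBAAIB", // placeholder; real file IDs are longer
      file_name: "report.pdf",
      mime_type: "application/pdf",
      file_size: 482133,
    },
  },
};

// Hypothetical helper: classify an update the way the workflow's branches do.
function describeUpdate(update) {
  const message = update.message;
  if (!message) return { chatId: null, kind: "other" }; // e.g. edited_message, callback_query
  const chatId = message.chat.id; // every reply node later needs this value
  if (message.document) return { chatId, kind: "document" };
  if (typeof message.text === "string") return { chatId, kind: "text" };
  return { chatId, kind: "other" }; // photos, stickers, voice notes, ...
}

console.log(describeUpdate(sampleUpdate)); // { chatId: 123456789, kind: 'document' }
```

The chat ID extracted here is the value the Telegram reply nodes in Steps 9 and 11 send their messages to.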

Step 2: Check If Incoming Message Contains a Document

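Add an If node that checks whether $json.message.document exists (use the If node's "exists" check on the message.document object rather than a loose truthiness test). Messages with a document go down the PDF-processing branch; everything else is treated as a question. Note that this condition matches any document type, not only PDFs; add a condition on $json.message.document.mime_type if you want to reject other files.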

Expected outcome: Messages without documents will skip the PDF processing path.

Common mistake: Using loose checking can cause errors if message contains other media like photos.
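
Checking only that `message.document` exists lets Word files, ZIP archives and other documents through. A stricter test can also look at the `mime_type` Telegram reports, falling back to the file name when the MIME type is missing or generic. The sketch below expresses that logic in plain JavaScript; `isPdfDocument` is a hypothetical helper, and in the If node the same idea would be written as separate conditions.

```javascript
// Hypothetical helper: true only for Telegram documents that look like PDFs.
function isPdfDocument(message) {
  const doc = message && message.document;
  if (!doc || typeof doc !== "object") return false; // no attachment, or a photo/sticker/etc.

  const mime = (doc.mime_type || "").toLowerCase();
  if (mime === "application/pdf") return true;

  // Some clients send a missing or generic MIME type; fall back to the extension.
  const name = (doc.file_name || "").toLowerCase();
  return (mime === "" || mime === "application/octet-stream") && name.endsWith(".pdf");
}

console.log(isPdfDocument({ document: { mime_type: "application/pdf", file_name: "a.pdf" } })); // true
console.log(isPdfDocument({ text: "What does section 3 say?" })); // false
```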

Step 3: Download the PDF File from Telegram

Add a Telegram node with the resource set to File, pass $json.message.document.file_id to the File ID field, and enable Download so the node returns the PDF binary rather than only the file's metadata.

Expected outcome: Workflow receives the actual PDF binary data.

Common mistake: Not passing the correct file ID will cause a failure.
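
Under the hood, downloading a Telegram file is a two-step exchange in the Bot API: `getFile` resolves the `file_id` to a temporary `file_path`, and the bytes are then fetched from a separate file endpoint. The sketch only builds the two URLs (no network calls) so the shape is easy to see; the token and `file_path` values are placeholders, and the helper names are hypothetical. Bots on the public Bot API can only download files of up to 20 MB this way, so checking `file_size` first avoids a confusing failure.

```javascript
const TELEGRAM_API = "https://api.telegram.org";
const MAX_DOWNLOAD_BYTES = 20 * 1024 * 1024; // approximate Bot API getFile limit (20 MB)

// Step 1: URL that resolves a file_id to a file_path.
function getFileUrl(token, fileId) {
  return `${TELEGRAM_API}/bot${token}/getFile?file_id=${encodeURIComponent(fileId)}`;
}

// Step 2: URL that downloads the bytes, using the file_path returned by getFile.
function downloadUrl(token, filePath) {
  return `${TELEGRAM_API}/file/bot${token}/${filePath}`;
}

// Guard against files the Bot API will refuse to serve.
// If Telegram did not report a size, assume the download may work.
function isDownloadable(document) {
  return typeof document.file_size !== "number" || document.file_size <= MAX_DOWNLOAD_BYTES;
}

const BOT_TOKEN = "123456:ABC-placeholder"; // placeholder; never hard-code a real token
console.log(getFileUrl(BOT_TOKEN, "BQACAgIAAxkBAAIB"));
console.log(downloadUrl(BOT_TOKEN, "documents/file_7.pdf"));
```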

Step 4: Normalize File Metadata with Code Node

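Add a Code node named “Change to application/pdf” that rewrites the binary metadata so the MIME type is application/pdf and the file name ends in .pdf. Files fetched from Telegram can arrive with a generic MIME type such as application/octet-stream, which the PDF loader downstream would not recognize. In n8n's binary metadata, fileType is a short string (for example 'pdf'), not an object, so set it directly together with fileExtension.

// Normalize binary metadata so downstream nodes treat the file as a PDF
function modifyBinaryMetadata(items) {
  for (const item of items) {
    const data = item.binary && item.binary.data;
    if (!data) continue; // no binary "data" property on this item: leave it unchanged

    data.mimeType = 'application/pdf';

    // fileName can be missing; fall back to a generic name before appending .pdf
    const name = data.fileName || 'document';
    data.fileName = name.toLowerCase().endsWith('.pdf') ? name : `${name}.pdf`;

    // fileType is a string such as 'pdf', not an object with a contentType field
    data.fileType = 'pdf';
    data.fileExtension = 'pdf';
  }
  return items;
}

return modifyBinaryMetadata($input.all());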

Expected outcome: Consistent PDF metadata for downstream processing.

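Common mistake: Skipping this step can make the Default Data Loader treat the file as generic binary instead of a PDF, so text extraction, and therefore embedding, fails or produces nothing useful.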

Step 5: Split PDF into Text Chunks

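Add a Recursive Character Text Splitter node configured with chunk size 3000 and overlap 200 (both measured in characters). The splitter tries paragraph breaks first, then line breaks, then spaces, so chunks tend to end at natural boundaries; the 200-character overlap keeps a sentence that straddles a boundary retrievable from either chunk.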

Expected outcome: Document is split into text chunks ready for vectorization.

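Common mistake: Very large chunks spread each embedding across many topics, which makes similarity search less precise, and chunks longer than the embedding model's input limit are rejected. Very small chunks lose surrounding context.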
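
The idea behind recursive character splitting can be sketched in a few dozen lines: split on the coarsest separator present, re-split any piece that is still too long with finer separators, then merge small pieces back into chunks of at most `chunkSize` characters, carrying up to `overlap` characters into the next chunk. This is a simplified sketch under the assumption of the separator order `["\n\n", "\n", " ", ""]`; LangChain's real splitter differs in details (for example, how it keeps separators), and `splitText` and `mergePieces` are hypothetical names.

```javascript
// Merge small pieces into chunks of at most chunkSize characters,
// carrying up to `overlap` characters of trailing pieces into the next chunk.
function mergePieces(pieces, sep, chunkSize, overlap) {
  const chunks = [];
  let current = [];
  let total = 0; // length of current.join(sep)
  for (const piece of pieces) {
    const addLen = piece.length + (current.length > 0 ? sep.length : 0);
    if (current.length > 0 && total + addLen > chunkSize) {
      chunks.push(current.join(sep));
      // Drop pieces from the front until the remainder fits the overlap budget
      // and leaves room for the incoming piece.
      while (current.length > 0 && (total > overlap || total + addLen > chunkSize)) {
        total -= current[0].length + (current.length > 1 ? sep.length : 0);
        current.shift();
      }
    }
    total += piece.length + (current.length > 0 ? sep.length : 0);
    current.push(piece);
  }
  if (current.length > 0) chunks.push(current.join(sep));
  return chunks;
}

// Split text recursively, falling back to finer separators for oversized pieces.
function splitText(text, chunkSize, overlap, separators = ["\n\n", "\n", " ", ""]) {
  // Use the coarsest separator that occurs in the text; "" means split into characters.
  const idx = separators.findIndex((s) => s === "" || text.includes(s));
  const sep = idx === -1 ? "" : separators[idx];
  const finer = idx === -1 ? [] : separators.slice(idx + 1);
  const pieces = (sep === "" ? Array.from(text) : text.split(sep)).filter((p) => p.length > 0);

  const chunks = [];
  let fitting = []; // consecutive pieces short enough to be merged
  for (const piece of pieces) {
    if (piece.length <= chunkSize) {
      fitting.push(piece);
      continue;
    }
    // Piece is too long: flush what we have, then split it with finer separators.
    if (fitting.length > 0) {
      chunks.push(...mergePieces(fitting, sep, chunkSize, overlap));
      fitting = [];
    }
    chunks.push(...(finer.length > 0 ? splitText(piece, chunkSize, overlap, finer) : [piece]));
  }
  if (fitting.length > 0) chunks.push(...mergePieces(fitting, sep, chunkSize, overlap));
  return chunks;
}

console.log(splitText("aaaa bbbb cccc", 9, 4)); // [ 'aaaa bbbb', 'bbbb cccc' ]
```

With the workflow's settings the call would be `splitText(pdfText, 3000, 200)`, where `pdfText` stands for the extracted PDF text. The small example shows the overlap at work: "bbbb" appears at the end of the first chunk and the start of the second.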

Step 6: Load Text Chunks as Documents

Next, add a Default Data Loader with dataType (Type of Data) set to Binary and attach the text splitter to it. The loader extracts the text from the PDF binary and passes it through the splitter, producing the documents that will be embedded.

Expected outcome: Text chunks are transformed into document format for embedding.
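
Conceptually, the loader emits LangChain-style documents: each holds the chunk text in `pageContent` plus a `metadata` object. The sketch below only shows that shape; `toDocuments` and the metadata keys `source` and `chunk` are hypothetical, and the real loader attaches its own metadata (for PDFs this may include page information).

```javascript
// Hypothetical helper: wrap split chunks in LangChain-style document objects.
function toDocuments(chunks, fileName) {
  return chunks.map((text, index) => ({
    pageContent: text, // the text that will be embedded
    metadata: { source: fileName, chunk: index }, // illustrative metadata only
  }));
}

const docs = toDocuments(["First chunk", "Second chunk"], "report.pdf");
console.log(docs.length); // 2
```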

Step 7: Generate Embeddings with OpenAI Node

Add the Embeddings OpenAI node, which turns each text chunk into a semantic vector. Attach it to the Pinecone Vector Store node's Embedding input (the embeddings sub-node connects to the vector store, not to the data loader) and select your OpenAI API credentials.

Expected outcome: Each text chunk is converted into an embedding vector.
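
For reference, the node's work corresponds roughly to one call to OpenAI's embeddings endpoint, which accepts a list of texts and returns one vector per text under `data[i].embedding`. The sketch builds the request without sending it. The model name `text-embedding-3-small` is an assumption (the workflow's model is not stated); it produces 1536-dimensional vectors by default, and whatever model you use, the Pinecone index dimension must match it. `buildEmbeddingRequest` is a hypothetical helper.

```javascript
// Hypothetical helper: build (but do not send) an OpenAI embeddings request.
function buildEmbeddingRequest(apiKey, texts, model = "text-embedding-3-small") {
  return {
    url: "https://api.openai.com/v1/embeddings",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // input may be a single string or an array of strings (one vector per entry)
      body: JSON.stringify({ model, input: texts }),
    },
  };
}

// With Node 18+ the request could be sent like this (commented out to avoid a live call):
// const { url, init } = buildEmbeddingRequest(process.env.OPENAI_API_KEY, ["chunk one", "chunk two"]);
// const res = await fetch(url, init);
// const vectors = (await res.json()).data.map((d) => d.embedding);
```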

Step 8: Store Embeddings in Pinecone Vector Database

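Use the Pinecone Vector Store node in Insert Documents mode, with the Default Data Loader and the Embeddings OpenAI node attached as its sub-nodes. Provide your Pinecone API credentials and select the “telegram” index. The index must already exist in Pinecone, with a dimension that matches the embedding model.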

Expected outcome: Embeddings are indexed and ready for similarity search.
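
At the API level, inserting means upserting records of the form `{ id, values, metadata }` to the index's `/vectors/upsert` endpoint, authenticated with an `Api-Key` header. The sketch below builds such a request without sending it. The index host is a placeholder (each Pinecone index has its own host), the ID scheme is hypothetical, and storing the chunk text in a `text` metadata field mirrors what LangChain's Pinecone integration typically does; check these details against your own setup.

```javascript
// Hypothetical helper: build a Pinecone upsert request for embedded chunks.
function buildUpsertRequest(indexHost, apiKey, docs, vectors, namespace = "") {
  if (docs.length !== vectors.length) throw new Error("one vector per document is required");
  const records = docs.map((doc, i) => ({
    id: `${doc.metadata.source}#${i}`, // hypothetical ID scheme
    values: vectors[i],
    metadata: { ...doc.metadata, text: doc.pageContent }, // keep the text for answering later
  }));
  // Omit the namespace field entirely when using the default namespace.
  const payload = namespace ? { vectors: records, namespace } : { vectors: records };
  return {
    url: `https://${indexHost}/vectors/upsert`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "Api-Key": apiKey },
      body: JSON.stringify(payload),
    },
  };
}

const upsert = buildUpsertRequest(
  "YOUR-INDEX-HOST", // placeholder
  "YOUR-API-KEY", // placeholder
  [{ pageContent: "Quarterly revenue rose 4%.", metadata: { source: "report.pdf" } }],
  [[0.12, -0.03, 0.57]] // real vectors have the full embedding dimension
);
console.log(upsert.url); // https://YOUR-INDEX-HOST/vectors/upsert
```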

Step 9: Inform User of Completion

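Add a Telegram Response about Database node that replies in the original chat (using the chat ID from the trigger) to confirm how many PDF pages were saved into Pinecone. Because the insert step outputs one item per stored document, the Limit node in the node list presumably sits before this reply so that only one confirmation message is sent.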

Expected outcome: Chat user receives immediate feedback that the PDF has been processed.
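
The confirmation boils down to counting the items the insert step returned and sending a single `sendMessage` call with `chat_id` and `text`. In an n8n expression the count could be read with something like `{{ $('Pinecone Vector Store').all().length }}`; whether that number equals PDF pages depends on how the loader and splitter divided the file. The plain-JavaScript sketch below, with the hypothetical helper `buildConfirmation`, shows the same logic.

```javascript
// Hypothetical helper: one confirmation message per upload, however many items were stored.
function buildConfirmation(chatId, storedItems, fileName) {
  const count = storedItems.length;
  const noun = count === 1 ? "chunk" : "chunks";
  return {
    method: "sendMessage", // Telegram Bot API method
    body: { chat_id: chatId, text: `${fileName}: ${count} ${noun} saved to Pinecone.` },
  };
}

console.log(buildConfirmation(123456789, [{}, {}, {}], "report.pdf").body.text);
// report.pdf: 3 chunks saved to Pinecone.
```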

Step 10: Handle Chat Queries Using the Q&A Chain

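For incoming messages without a document, the workflow passes the message text to a Question and Answer Chain node. Its Vector Store Retriever embeds the question with the same OpenAI embeddings model and runs a vector similarity search against the same Pinecone index (through the second Pinecone Vector Store node, Pinecone Vector Store1). The Groq Chat Model then writes an answer grounded in the retrieved chunks.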

Expected outcome: User questions receive precise answers derived from the stored PDF data.
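
What the retriever asks Pinecone for is, in essence, "the k stored vectors most similar to the question vector". The in-memory sketch below shows that ranking with cosine similarity, then "stuffs" the retrieved text into one prompt for the chat model. Pinecone does this at scale with approximate nearest-neighbour search; the cosine metric, `k = 4`, the prompt wording, and the helper names are all assumptions for illustration, not the chain's actual prompt.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored records most similar to the query vector.
function topK(queryVector, records, k = 4) {
  return records
    .map((r) => ({ ...r, score: cosine(queryVector, r.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Combine the retrieved chunks into a single prompt for the chat model.
function buildPrompt(question, hits) {
  const context = hits.map((h) => h.metadata.text).join("\n---\n");
  return `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

const records = [
  { id: "a", values: [1, 0], metadata: { text: "Revenue rose 4%." } },
  { id: "b", values: [0, 1], metadata: { text: "The office moved." } },
  { id: "c", values: [0.9, 0.1], metadata: { text: "Costs fell 2%." } },
];
console.log(topK([1, 0], records, 2).map((h) => h.id)); // [ 'a', 'c' ]
```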

Step 11: Send User the AI-generated Response

The final step uses a Telegram Response node to send the AI’s answer back to the Telegram chat.

Expected outcome: Conversation completes with helpful feedback instantly.
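
One practical snag at this step: a single Telegram message can hold at most 4096 characters of text, and a long AI answer can exceed that. A minimal sketch, assuming you split oversized answers in a Code node before the Telegram Response node, is shown below; `splitForTelegram` is a hypothetical helper that prefers to break at a newline or space.

```javascript
const TELEGRAM_TEXT_LIMIT = 4096; // maximum text length of one sendMessage call

// Split a long answer into parts that each fit in one Telegram message,
// preferring to break at a newline or a space before the limit.
function splitForTelegram(text, limit = TELEGRAM_TEXT_LIMIT) {
  const parts = [];
  let rest = text;
  while (rest.length > limit) {
    let cut = rest.lastIndexOf("\n", limit);
    if (cut <= 0) cut = rest.lastIndexOf(" ", limit);
    if (cut <= 0) cut = limit; // no natural break: hard cut (may split an emoji)
    parts.push(rest.slice(0, cut));
    rest = rest.slice(cut).replace(/^\s+/, ""); // drop the whitespace we broke on
  }
  if (rest.length > 0) parts.push(rest);
  return parts;
}

console.log(splitForTelegram("aaaa bbbb cccc", 9)); // [ 'aaaa bbbb', 'cccc' ]
```

Each part can then be sent as its own message, in order.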

5. Customizations ✏️

  • Change Chunk Size: In the Recursive Character Text Splitter, modify chunkSize from 3000 to smaller or larger values to control granularity of searchable content.
  • Switch Language Model: Replace the Groq Chat Model node with other supported models like OpenAI GPT-4 by changing the node and credentials if desired.
  • Modify Telegram Response Text: Customize the text templates in the Telegram Response nodes to better fit your communication style or add additional metadata.
  • Enhance Error Handling: Customize Stop and Error nodes’ messages for more descriptive user feedback if something goes wrong.
  • Index Multiple Telegram Channels: Use a separate Pinecone index (or a separate namespace within one index) for each Telegram group or channel, so questions asked in one chat only search documents uploaded to that chat.
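
When changing the chunk size, a quick estimate of how many chunks (and therefore embedding calls and Pinecone records) a document will produce helps. Assuming chunks are packed close to full size, a text of length L yields about ceil((L − overlap) / (chunkSize − overlap)) chunks. Because the recursive splitter breaks at separators rather than at exact character counts, real counts tend to be a little higher. The 30,000-character document below is a made-up example, and `estimateChunks` is a hypothetical helper.

```javascript
// Rough estimate of the number of chunks for a text of `length` characters,
// assuming each chunk is filled close to chunkSize and shares `overlap` characters.
function estimateChunks(length, chunkSize, overlap) {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  if (length <= 0) return 0;
  if (length <= chunkSize) return 1;
  return Math.ceil((length - overlap) / (chunkSize - overlap));
}

for (const [size, overlap] of [[3000, 200], [1000, 100]]) {
  console.log(size, overlap, estimateChunks(30000, size, overlap));
}
// 3000 200 11
// 1000 100 34
```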

6. Troubleshooting 🔧

Problem: “Error: missing or invalid credentials”

Cause: API credentials for Telegram, OpenAI, Pinecone, or Groq are missing or incorrect.

Solution: Navigate to each respective node’s credentials tab, re-enter valid API keys, and re-test the workflow trigger.

Problem: “Failed to fetch file from Telegram”

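Cause: Incorrect file ID, a file larger than the Bot API's 20 MB download limit, or a Telegram bot that lacks access to the user's document.

Solution: Verify that file_id is taken from $json.message.document.file_id, keep uploads under 20 MB, and make sure the bot has been added to the chat. In groups, a bot with privacy mode enabled may not receive ordinary messages at all, so you may need to disable privacy mode via @BotFather or make the bot an admin.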

Problem: “No relevant results found in Pinecone”

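Cause: The vector store was never populated, the index name differs between the insert and retrieval nodes, or the two sides use different embedding models.

Solution: Ensure the PDF upload flow completes successfully, that both Pinecone nodes point at exactly the same index, and that both use the same embedding model (and therefore the same vector dimension).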
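
Pinecone's `describe_index_stats` endpoint reports the index dimension and per-namespace vector counts, which covers the two most common causes above. The sketch below checks a stats object for both problems. It assumes the REST response shape (`dimension`, and `namespaces` keyed by name with a `vectorCount` field, with `""` as the default namespace); `diagnoseIndex` is a hypothetical helper and the stats values are made up.

```javascript
// Hypothetical helper: flag an empty namespace or a dimension mismatch,
// given the JSON returned by Pinecone's describe_index_stats endpoint.
function diagnoseIndex(stats, expectedDimension, namespace = "") {
  const problems = [];
  if (stats.dimension !== expectedDimension) {
    problems.push(`index dimension ${stats.dimension} != embedding dimension ${expectedDimension}`);
  }
  const ns = (stats.namespaces || {})[namespace];
  const count = ns ? ns.vectorCount : 0;
  if (count === 0) problems.push(`namespace "${namespace}" holds no vectors`);
  return problems;
}

const stats = { dimension: 1024, namespaces: {}, totalVectorCount: 0 }; // made-up example
console.log(diagnoseIndex(stats, 1536));
```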

7. Pre-Production Checklist ✅

  • Confirm Telegram bot is active and correctly linked to the workflow.
  • Test sending various PDF documents in Telegram and verify they undergo full processing.
  • Ensure OpenAI and Groq API keys are valid and have sufficient quota.
  • Validate Pinecone index exists and is reachable via API.
  • Test chat question feature by asking related queries in Telegram after PDF is processed.
  • Check error message nodes trigger correctly under failure scenarios.

8. Deployment Guide

Activate the workflow in n8n with the Active toggle. Monitor the Executions tab for incoming Telegram messages and the replies sent back. To keep a record of live runs, enable saving of successful and failed production executions in the workflow settings; saving manual executions only covers runs you start from the editor.

If you self-host n8n, choose a host (Hostinger is one option) that keeps the instance online and reachable over HTTPS, since Telegram delivers updates to the trigger's webhook URL and only accepts HTTPS webhooks.
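
If messages stop arriving, Telegram's `getWebhookInfo` method (GET `https://api.telegram.org/bot<token>/getWebhookInfo`) shows whether a webhook is registered, how many updates are waiting, and the last delivery error. A bot has only one webhook at a time, so two active workflows using the same bot token can take over each other's updates. The sketch inspects the `result` object of that response; `checkWebhook` is a hypothetical helper and the sample values are made up.

```javascript
// Hypothetical helper: inspect the `result` object returned by getWebhookInfo.
function checkWebhook(info) {
  const issues = [];
  if (!info.url) issues.push("no webhook registered (is the workflow active?)");
  if (info.pending_update_count > 0) {
    issues.push(`${info.pending_update_count} updates waiting to be delivered`);
  }
  if (info.last_error_message) issues.push(`last delivery error: ${info.last_error_message}`);
  return issues;
}

// Made-up example of a webhook that is registered but failing.
console.log(checkWebhook({
  url: "https://n8n.example.com/webhook/abc/webhook",
  pending_update_count: 3,
  last_error_message: "Connection timed out",
}));
```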

9. Conclusion

You have successfully automated the ingestion and querying of PDFs via Telegram using n8n integrated with LangChain, OpenAI embeddings, Pinecone vector databases, and Groq chat models. This workflow saves you hours otherwise spent searching documents and gives immediate, accurate answers to questions based on those documents.

Next, you might explore automating other messaging platforms, integrating additional AI models for summarization, or scaling this bot to support multiple languages or Telegram groups.

Take your document processing to the next level and enjoy smarter, faster workflows with n8n!
