Automate Document Analysis & Chat with n8n LangChain Nodes

This n8n workflow automates document parsing, analysis, and interactive Q&A via an AI chatbot. It solves the challenge of manually reviewing multi-file submissions and enables fast, accurate document insights delivered by email and chatbot.
Workflow Identifier: 1697
NODES in Use: Form Trigger, Code, SplitInBatches, HTTP Request, If, Aggregate, Google Gemini Chat Model, Markdown, Gmail, ConvertToFile, LangChain Agent, Information Extractor, Pinecone Vector Store, Embeddings Mistral Cloud, Default Data Loader, Recursive Character Text Splitter, Chat Trigger, Question and Answer Chain, Vector Store Retriever

1. Opening Problem Statement

Imagine Pavithran, a project manager at a consulting firm, overwhelmed by the flood of client documents submitted for review every week. Each submission often includes multiple files—contracts, specs, reports—that must be carefully analyzed for key insights and shared with the client. Pavithran spends hours manually reading, extracting essential information, and crafting summaries. He struggles with the repetitive task of answering follow-up questions based on these long documents. This slow, error-prone process wastes valuable time and delays client communications.

This workflow tackles exactly that problem: it automates multi-file document ingestion, analysis, and client query handling through an AI-powered chatbot, all triggered by form submissions, and delivers clear email summaries. The time savings translate to faster project turnarounds and fewer manual errors.

2. What This Automation Does

Once this workflow runs, here’s what happens step-by-step:

  • Form Trigger: It listens for user submissions that include multiple files and an email address.
  • Splits Files Individually: Uses a Code node to separate each uploaded binary file for independent processing.
  • Uploads & Parses: Sends each file to an external parsing API (LlamaIndex) to convert file contents into structured markdown.
  • Checks Parsing Status: Repeatedly polls the API until parsing completes successfully.
  • Aggregates Markdown: Combines the parsed markdown outputs from all files into a single consolidated document.
  • Language Processing: Translates non-English content to English and reformats text for clarity using Google Gemini and LangChain agents.
  • Vector Storage: Stores the analyzed text as embeddings in a Pinecone vector database for semantic search and chat retrieval.
  • Email Delivery: Sends the summarized and annotated document back to the submitter via Gmail, including a link to an AI chatbot for interactive Q&A about their documents.
  • Chatbot Listener & Q&A: Waits for user questions via chat webhook, retrieves relevant information from the vector store, and responds intelligently.

The benefits are tangible: hours saved per submission, consistent quality of document summaries, and the ability to provide instant interactive support through chatbot technology.

3. Prerequisites ⚙️

  • n8n automation platform account (cloud or self-hosted)
  • Access to Google Gemini Chat Model (via LangChain nodes) 🔐
  • LlamaIndex API key for document parsing 🔐
  • Pinecone account with an index set up for vector storage 🔐
  • Gmail account for sending emails 📧

Optional: self-host your n8n for more control using a service like Hostinger (https://buldrr.com/hostinger) 🔌

4. Step-by-Step Guide

Step 1: Configure the Form Submission Trigger

Add a Form Trigger node (Triggers → Form Trigger) and title the form “form which gets multiple files”. Give it two required file fields (file1, file2) and a required email field named “provide your mail Id”. The workflow runs each time a user submits this form.

Tip: The webhook URL is generated automatically; test your form submission to confirm triggering.

Step 2: Split Uploaded Files Into Separate Items

Add a Code node named “split the binary item” after the form trigger. Copy-paste this JavaScript code to iterate over all binary fields in the submission and output each file as a separate item:

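// Collect every item emitted by the Form Trigger
const items = $input.all();
const splitItems = [];

for (const item of items) {
  // Skip items that carry no uploaded files
  if (!item.binary) continue;

  // Emit one output item per binary field, always under the key "data"
  // so downstream nodes can reference a single, predictable binary property
  for (const file of Object.values(item.binary)) {
    splitItems.push({ json: {}, binary: { data: file } });
  }
}

return splitItems;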

The outcome: each file becomes an independent data item, with its contents under the binary property “data”, ready to be processed on its own.

Step 3: Process Each File in Batches

Use a SplitInBatches node “Loop Over Items1” to handle one file at a time. This helps manage API rate limits.
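
To make the batching idea concrete, here is a minimal sketch in plain JavaScript of what a batch size of 1 achieves. The helper name toBatches and the sample file names are hypothetical; the SplitInBatches node does this for you inside n8n.

```javascript
// Hypothetical helper: split a list of items into batches of at most
// `batchSize` entries, similar in spirit to what the loop node does.
function toBatches(items, batchSize = 1) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Three uploaded files (sample names), handled one per iteration.
const files = ['contract.pdf', 'spec.docx', 'report.pdf'];
const batches = toBatches(files, 1);
console.log(batches.length); // 3
```

A larger batch size means fewer loop iterations but more requests sent to the parsing API at once, which is the trade-off to weigh against its rate limits.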

Step 4: Send Files to LlamaIndex Parsing API

Configure HTTP Request node “Parsing the document” to:

  • Method: POST
  • URL: https://api.cloud.llamaindex.ai/api/parsing/upload
  • Body: Multipart form with field “file” referencing binary data
  • Headers: An Authorization header carrying your API key as a Bearer token

Expected: The API responds with a job ID for asynchronous parsing.
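
For readers who want to test the upload outside n8n, the request can be sketched roughly as follows. This assumes Node.js 18 or later (for the global FormData and Blob); the function name buildUploadRequest, the sample file, and the placeholder key are illustrative and not part of the workflow.

```javascript
// URL taken from the HTTP Request node configuration above.
const UPLOAD_URL = 'https://api.cloud.llamaindex.ai/api/parsing/upload';

// Hypothetical helper: build the multipart request for one file.
// `fileBytes` is a Uint8Array (or Buffer) holding the file contents.
function buildUploadRequest(fileBytes, fileName, apiKey) {
  const form = new FormData();
  // The form field is named "file", matching the node's body setting.
  form.append('file', new Blob([fileBytes]), fileName);
  return {
    url: UPLOAD_URL,
    options: {
      method: 'POST',
      // No Content-Type header here: fetch sets the multipart boundary itself.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

// Sample call with placeholder values; nothing is sent over the network.
const request = buildUploadRequest(
  new TextEncoder().encode('sample file contents'),
  'contract.pdf',
  'YOUR_API_KEY'
);
console.log(request.options.method, request.url);

// To actually send it (inside an async function):
//   const response = await fetch(request.url, request.options);
//   const job = await response.json(); // should contain the job ID
```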

Step 5: Poll Parsing Status

Add an HTTP Request node “Check the parsing status” that queries the job status endpoint by job ID. Use If node “If2” to check if the status returned is “SUCCESS”.

If the status is “SUCCESS”, proceed to the “Provide the markdown” node; otherwise, wait briefly and poll again.
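
The polling decision can be sketched as a small function. Only the “SUCCESS” status comes from the workflow; the “ERROR” status name, the attempt cap, and the delay values below are assumptions for illustration, since the original loop simply repeats after a delay.

```javascript
// Hypothetical helper: decide what the polling loop should do next.
// `attempt` is 1-based. Delays double on each attempt up to a ceiling.
function decideNextStep(status, attempt, options = {}) {
  const { maxAttempts = 20, baseDelayMs = 2000, maxDelayMs = 30000 } = options;

  // Parsing finished: move on to fetching the markdown result.
  if (status === 'SUCCESS') return { action: 'fetch_markdown' };

  // Assumed failure status, or too many attempts: stop instead of looping forever.
  if (status === 'ERROR' || attempt >= maxAttempts) return { action: 'give_up' };

  // Still in progress: wait with exponential backoff, then poll again.
  const delayMs = Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
  return { action: 'wait_and_retry', delayMs };
}

console.log(decideNextStep('PENDING', 3).delayMs); // 8000
console.log(decideNextStep('SUCCESS', 3).action);  // fetch_markdown
```

Capping the number of attempts is a design choice worth copying into the workflow itself, so that a failed or stuck parsing job cannot keep an execution polling indefinitely.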

Step 6: Retrieve Markdown Result and Aggregate

Once successful, call HTTP Request node “Provide the markdown” to get the parsed document as markdown text. Then use Aggregate node to combine markdown from all files into one text stream.
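
As a rough picture of the aggregation step, the sketch below joins per-file markdown into one text. The field name markdown, the separator, and the comment labels are assumptions; the Aggregate node's real output depends on how it is configured.

```javascript
// Hypothetical helper: merge the markdown of several n8n-style items
// ({ json: { markdown: '...' } }) into one document, skipping empty results.
function aggregateMarkdown(items) {
  return items
    .map((item, index) => {
      const raw = item.json && item.json.markdown ? String(item.json.markdown) : '';
      const text = raw.trim();
      // Label each part with its original position so sources stay traceable.
      return text ? `<!-- document ${index + 1} -->\n${text}` : '';
    })
    .filter(Boolean)
    .join('\n\n---\n\n');
}

const combined = aggregateMarkdown([
  { json: { markdown: '# Contract\nPayment is due in 30 days.' } },
  { json: { markdown: '# Spec\nThe API must support pagination.' } },
]);
console.log(combined);
```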

Step 7: Translate & Analyze the Combined Text

Send the aggregated markdown to the “Translator Agent”, a LangChain Agent node backed by the Google Gemini Chat Model. It detects the language and translates any non-English text to English while keeping the original content alongside the translation.

Follow it with the Analyzer Agent (another LangChain Agent node), whose prompt drives the analysis: it reviews the text, reformats it for clarity, and prepares it for storage.

Step 8: Convert Analyzed Text to Files and Store in Pinecone

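Use the ConvertToFile node to save the analyzed output as a text file. Then insert the content into the Pinecone Vector Store node named “Pinecone Vector Store”. Its sub-nodes handle the preparation: the Default Data Loader with the Recursive Character Text Splitter breaks the text into chunks, and Embeddings Mistral Cloud turns each chunk into a vector. Once stored, the chunks are available for semantic search by the chatbot.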
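
To illustrate what recursive character splitting does before the text is embedded, here is a simplified sketch. It is not the node's actual implementation: it omits chunk overlap, and the chunk size and separator list are illustrative defaults.

```javascript
// Simplified recursive splitter: try the coarsest separator first and fall
// back to finer ones only for pieces that are still longer than chunkSize.
function splitRecursively(text, chunkSize = 200, separators = ['\n\n', '\n', ' ']) {
  if (text.length <= chunkSize) return text.trim() ? [text.trim()] : [];

  const [separator, ...finer] = separators;
  if (separator === undefined) {
    // No separators left: hard-cut the text into fixed-size slices.
    const slices = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      slices.push(text.slice(i, i + chunkSize));
    }
    return slices;
  }

  const chunks = [];
  let current = '';
  for (const piece of text.split(separator)) {
    const candidate = current ? current + separator + piece : piece;
    if (candidate.length <= chunkSize) {
      current = candidate; // the piece still fits into the chunk being built
      continue;
    }
    if (current.trim()) chunks.push(current.trim());
    if (piece.length <= chunkSize) {
      current = piece; // start a new chunk with this piece
    } else {
      // The piece alone is too long: split it with the finer separators.
      chunks.push(...splitRecursively(piece, chunkSize, finer));
      current = '';
    }
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

const sample = 'alpha beta\n\ngamma delta epsilon\n\nzeta';
console.log(splitRecursively(sample, 12));
// [ 'alpha beta', 'gamma delta', 'epsilon', 'zeta' ]
```

Smaller chunks tend to give more precise retrieval hits but less context per hit, so the chunk size is worth tuning against the kinds of questions users ask.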

Step 9: Send Email Summary with Attachments

Use the Gmail node to send the prepared text file back to the user email collected from the form field. The email includes a link to an interactive chatbot for detailed Q&A.
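
The email contents can be sketched as a small builder function. Everything here is illustrative: the function name, subject line, body wording, chatbot URL, and the simple address check are assumptions, and in the workflow the Gmail node fields would be filled with n8n expressions instead.

```javascript
// Hypothetical helper: assemble the fields a summary email needs.
function buildSummaryEmail({ to, chatUrl, attachmentName }) {
  // Deliberately simple sanity check, not full address validation.
  const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  if (!emailPattern.test(to)) {
    throw new Error(`Invalid recipient address: ${to}`);
  }
  return {
    to,
    subject: 'Your document analysis is ready',
    message: [
      'Hello,',
      '',
      `The analysis of your documents is attached as ${attachmentName}.`,
      `To ask follow-up questions, open the chatbot: ${chatUrl}`,
    ].join('\n'),
    attachments: [attachmentName],
  };
}

const email = buildSummaryEmail({
  to: 'client@example.com',
  chatUrl: 'https://example.com/chat', // placeholder for the Chat Trigger URL
  attachmentName: 'analysis.txt',
});
console.log(email.subject);
```

Checking the address before sending catches the most common cause of undelivered summaries, a mistyped or wrongly mapped form field, early in the run.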

Step 10: Enable Chatbot Interaction

Set up Chat Trigger node “When chat message received” with a webhook URL to listen for user messages from the chatbot interface.

Each incoming message triggers a retrieval chain: the Vector Store Retriever pulls the most relevant chunks from Pinecone, and the Question and Answer Chain passes them to the chat model, which answers from the user's own documents.
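
The retrieval step can be pictured with a small in-memory sketch. The three-dimensional vectors and sample texts are toy data, cosine similarity is assumed as the metric (the Pinecone index may be configured differently), and the prompt wording is illustrative; in the workflow, the embedding model and Pinecone do this work.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored records most similar to the query vector.
function retrieveTopK(queryVector, records, k = 2) {
  return records
    .map((record) => ({ ...record, score: cosineSimilarity(queryVector, record.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Combine the retrieved chunks and the question into one prompt.
function buildPrompt(question, matches) {
  const context = matches.map((m) => `- ${m.text}`).join('\n');
  return `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
}

// Toy records standing in for embedded document chunks.
const records = [
  { id: 'payment', vector: [1, 0, 0], text: 'Payment is due in 30 days.' },
  { id: 'scope', vector: [0, 1, 0], text: 'The scope covers two deliverables.' },
  { id: 'terms', vector: [0.9, 0.1, 0], text: 'Late payments incur a fee.' },
];

const matches = retrieveTopK([1, 0, 0], records, 2);
console.log(matches.map((m) => m.id)); // [ 'payment', 'terms' ]
console.log(buildPrompt('When is payment due?', matches));
```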

5. Customizations ✏️

  • Adjust Accepted Files: In the Form Trigger node, add or remove file fields to change how many files a submission can include, and restrict each field to the file formats your parsing service supports.
  • Change Parsing Service: Replace the LlamaIndex API endpoint with another document processing API by reconfiguring the HTTP Request nodes.
  • Modify Email Template: In the Gmail node, customize your email body and subject dynamically using user data or markdown summaries.
  • Enhance Chatbot Capabilities: Tweak the LangChain agent system messages to tailor chatbot responses for different industries or languages.
  • Batch Size Tuning: Adjust batch processing size in SplitInBatches node to optimize throughput vs. API rate limits.

6. Troubleshooting 🔧

Problem: “Parsing status never reaches SUCCESS”
Cause: The API token is expired or invalid, or the file format is unsupported.
Solution: Verify the LlamaIndex API credentials, confirm that the uploaded file format is supported by the parsing API, and review the responses returned to the HTTP Request nodes.

Problem: “Email not sent to user after processing”
Cause: Invalid email address extraction or Gmail node misconfigured.
Solution: Confirm email field mapping from Form Trigger and test using a fixed valid email address in Gmail node.

Problem: “Chatbot fails to retrieve relevant answers”
Cause: Vector store indexing incomplete or embedding failed.
Solution: Confirm embeddings are successfully inserted in Pinecone and re-index if necessary.

7. Pre-Production Checklist ✅

  • Test form submission with multiple files and valid email.
  • Monitor HTTP requests to LlamaIndex API for successful parsing response.
  • Verify that the Aggregate node outputs one consistent, consolidated markdown text.
  • Check email delivery to submitted addresses with attachments intact.
  • Test chatbot webhook and ensure queries receive relevant responses.
  • Backup workflow configuration before enabling live activation.

8. Deployment Guide

Activate your workflow in n8n after thorough testing. Ensure all sensitive credentials (API tokens, Gmail account) are stored in n8n's credentials manager and not hard-coded in nodes. Monitor execution logs for errors. For robust operation, enable the Retry On Fail setting on the HTTP Request nodes and attach an error workflow that alerts you when an execution fails. For production, consider scalable hosting options and API rate limit management.

9. FAQs

Q: Can I use other vector stores instead of Pinecone?
A: Yes, n8n LangChain nodes support various vector stores. Swap the Pinecone node with compatible alternatives by adjusting configurations.

Q: Does using Google Gemini incur extra costs?
A: It can. Gemini API usage is billed by Google according to the model and volume used, and free-tier limits may apply. Check Google's current pricing and quotas before running this at scale.

Q: Is my document data secure?
A: API calls use bearer tokens and HTTPS for secure transmission, but ensure you comply with your organization’s data policies.

Q: Can this workflow scale to hundreds of document submissions?
A: Yes, but monitor API rate limits and consider increased batch processing resources and queue controls.

10. Conclusion

By building and deploying this workflow, you turn cumbersome manual document analysis into an automated process. You gain hours back each week, improve the consistency of summaries, and offer interactive chatbot support that gives users instant, contextual answers. Pavithran now spends time on strategic tasks instead of tedious document reading.

Next, you could extend this automation by integrating additional document formats, adding sentiment analysis, or linking results to project management tools for greater workflow impact.

Let’s keep automating smarter and building helpful AI-driven tools for everyday work challenges!
