1. Opening Problem Statement
Imagine Pavithran, a project manager at a consulting firm, overwhelmed by the flood of client documents submitted for review every week. Each submission often includes multiple files—contracts, specs, reports—that must be carefully analyzed for key insights and shared with the client. Pavithran spends hours manually reading, extracting essential information, and crafting summaries. He struggles with the repetitive task of answering follow-up questions based on these long documents. This slow, error-prone process wastes valuable time and delays client communications.
This is exactly the pain point this workflow tackles: automating multi-file document ingestion, analysis, and client query handling through an AI-powered chatbot—all triggered by form submissions—and delivering clear email summaries. The time savings translate to faster project turnarounds and fewer manual errors.
2. What This Automation Does
Once this workflow runs, here’s what happens step-by-step:
- Form Trigger: It listens for user submissions that include multiple files and an email address.
- Splits Files Individually: Uses a Code node to separate each uploaded binary file for independent processing.
- Uploads & Parses: Sends each file to an external parsing API (LlamaIndex) to convert file contents into structured markdown.
- Checks Parsing Status: Repeatedly polls the API until parsing completes successfully.
- Aggregates Markdown: Combines the parsed markdown outputs from all files into a single consolidated document.
- Language Processing: Translates non-English content to English and reformats text for clarity using Google Gemini and LangChain agents.
- Vector Stores: Stores structured knowledge into a Pinecone vector database for semantic search and chat retrieval.
- Email Delivery: Sends the summarized and annotated document back to the submitter via Gmail, including a link to an AI chatbot for interactive Q&A about their documents.
- Chatbot Listener & Q&A: Waits for user questions via chat webhook, retrieves relevant information from the vector store, and responds intelligently.
The benefits are tangible: hours saved per submission, consistent quality of document summaries, and the ability to provide instant interactive support through chatbot technology.
3. Prerequisites ⚙️
- n8n automation platform account (cloud or self-hosted)
- Access to Google Gemini Chat Model (via LangChain nodes) 🔐
- LlamaIndex API key for document parsing 🔐
- Pinecone account with an index set up for vector storage 🔐
- Gmail account for sending emails 📧
Optional: self-host your n8n for more control using a service like Hostinger (https://buldrr.com/hostinger) 🔌
4. Step-by-Step Guide
Step 1: Configure the Form Submission Trigger
Navigate to Triggers → Form Trigger node. Select or create the form “form which gets multiple files”. Ensure it expects two required file fields (file1, file2) and a required email field “provide your mail Id”. This trigger activates the workflow once a user submits this form.
Tip: The webhook URL is generated automatically; test your form submission to confirm triggering.
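If you want to smoke-test the trigger without opening the form UI, you can POST a multipart submission directly to the webhook. A minimal Node.js (18+) sketch—the webhook URL below is a placeholder, so copy the real test URL from your Form Trigger node; the field names match the form described above:

```javascript
// Placeholder: replace with the test URL shown on your Form Trigger node.
const WEBHOOK_URL = "https://your-n8n-host/webhook-test/your-form-id";

// Build a multipart body matching the form's fields.
function buildSubmission(email, files) {
  const form = new FormData();
  form.append("provide your mail Id", email);
  for (const { field, name, content } of files) {
    form.append(field, new Blob([content], { type: "application/pdf" }), name);
  }
  return form;
}

const form = buildSubmission("user@example.com", [
  { field: "file1", name: "contract.pdf", content: "dummy pdf bytes" },
  { field: "file2", name: "spec.pdf", content: "dummy pdf bytes" },
]);

// Uncomment to actually fire the webhook:
// await fetch(WEBHOOK_URL, { method: "POST", body: form });
```

A successful test submission should appear as a new execution in n8n with both files attached as binary properties.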
Step 2: Split Uploaded Files Into Separate Items
Add a Code node named “split the binary item” after the form trigger. Copy-paste this JavaScript code to iterate over all binary fields in the submission and output each file as a separate item:
```javascript
// Get all input items from the form trigger
const items = $input.all();
const splitItems = [];

items.forEach(item => {
  if (item.binary) {
    // Each binary property (file1, file2, …) becomes its own item,
    // re-keyed to "data" so downstream nodes read a uniform property name
    for (const [key, value] of Object.entries(item.binary)) {
      splitItems.push({ json: {}, binary: { data: value } });
    }
  }
});

return splitItems;
```
The outcome: each file is an independent data item for parallel processing.
Step 3: Process Each File in Batches
Use a SplitInBatches node “Loop Over Items1” to handle one file at a time. This helps manage API rate limits.
Step 4: Send Files to LlamaIndex Parsing API
Configure HTTP Request node “Parsing the document” to:
- Method: POST
- URL: https://api.cloud.llamaindex.ai/api/parsing/upload
- Body: Multipart form with field “file” referencing binary data
- Headers: Include Bearer token authorization
Expected: The API responds with a job ID for asynchronous parsing.
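For reference, the request the HTTP Request node assembles can be sketched in plain JavaScript. This is an illustrative sketch, not the node's internals; `LLAMA_API_KEY` is a placeholder credential:

```javascript
// Build the multipart upload request for the LlamaIndex parsing endpoint.
// apiKey is a placeholder; store the real key in n8n's credentials manager.
function buildUploadRequest(apiKey, fileBuffer, filename) {
  const form = new FormData();
  form.append("file", new Blob([fileBuffer]), filename);
  return {
    url: "https://api.cloud.llamaindex.ai/api/parsing/upload",
    options: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

const req = buildUploadRequest("LLAMA_API_KEY", Buffer.from("dummy"), "contract.pdf");

// To actually send it and capture the job ID for polling:
// const { id } = await (await fetch(req.url, req.options)).json();
```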
Step 5: Poll Parsing Status
Add an HTTP Request node “Check the parsing status” that queries the job status endpoint by job ID. Use If node “If2” to check if the status returned is “SUCCESS”.
If success, proceed to “Provide the markdown” node; otherwise, repeat polling after some delay.
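The poll-until-done pattern is worth seeing in isolation. A generic sketch, where `checkStatus` stands in for whatever call returns the job's status string (in the workflow itself, the Wait + If node loop plays this role):

```javascript
// Poll an async status check until it reports SUCCESS, with a fixed delay
// between attempts and a cap on total attempts.
async function pollUntilSuccess(checkStatus, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status === "SUCCESS") return status;
    if (status === "ERROR") throw new Error("Parsing job failed");
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error("Parsing did not finish within the polling window");
}
```

Keeping a `maxAttempts` cap matters: without it, an unsupported file format that never reaches SUCCESS would loop forever (see Troubleshooting below).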
Step 6: Retrieve Markdown Result and Aggregate
Once successful, call HTTP Request node “Provide the markdown” to get the parsed document as markdown text. Then use Aggregate node to combine markdown from all files into one text stream.
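The aggregation itself is simple concatenation with separators. A sketch of what the combined output looks like—the `markdown` property name and heading format here are illustrative, not the Aggregate node's exact output shape:

```javascript
// Combine per-file markdown results into one consolidated document,
// with a heading and divider between each source file.
function aggregateMarkdown(items) {
  return items
    .map((item, i) => `## Document ${i + 1}\n\n${item.markdown}`)
    .join("\n\n---\n\n");
}

const combined = aggregateMarkdown([
  { markdown: "Contract terms and conditions..." },
  { markdown: "Technical specification details..." },
]);
```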
Step 7: Translate & Analyze the Combined Text
Send the aggregated markdown to the LangChain Google Gemini Chat node “Translator Agent”, which detects the language and translates any non-English text to English while keeping the original content attached.
Follow with the Analyzer Agent (a LangChain Agent node), which analyzes the full text, reformats it for clarity, and prepares it for storage.
Step 8: Convert Analyzed Text to Files and Store in Pinecone
Use a ConvertToFile node to save the analyzed output as a text file. Then insert the content into the “Pinecone Vector Store” node, which embeds it for the semantic search that powers chatbot retrieval.
Step 9: Send Email Summary with Attachments
Use the Gmail node to send the prepared text file back to the user email collected from the form field. The email includes a link to an interactive chatbot for detailed Q&A.
Step 10: Enable Chatbot Interaction
Set up Chat Trigger node “When chat message received” with a webhook URL to listen for user messages from the chatbot interface.
This triggers a retrieval chain: Retriever Vector Store → Question and Answer Chain → AI Agent nodes to fetch relevant document knowledge and respond intelligently.
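Under the hood, the retrieval step ranks stored chunks by vector similarity to the embedded question. A toy illustration of that idea—the three-dimensional vectors and chunk texts below are hand-made stand-ins, since Pinecone handles real embeddings at scale:

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK chunks most similar to the query vector.
function retrieve(queryVec, chunks, topK = 2) {
  return chunks
    .map(c => ({ ...c, score: cosineSimilarity(queryVec, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const chunks = [
  { text: "Payment terms: net 30 days.", vector: [1, 0, 0] },
  { text: "Delivery schedule: Q3.", vector: [0, 1, 0] },
  { text: "Late payment penalty: 2%.", vector: [0.9, 0.1, 0] },
];

// A question about payment embeds close to the payment-related chunks.
const hits = retrieve([1, 0, 0], chunks);
```

The retrieved chunks are then handed to the Question and Answer Chain as context, so the AI Agent answers from the user's own documents rather than from general knowledge.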
5. Customizations ✏️
- Adjust File Types Accepted: In the Form Trigger node, add or remove file fields to accept different or more file formats.
- Change Parsing Service: Replace the LlamaIndex API endpoint with another document processing API by reconfiguring the HTTP Request nodes.
- Modify Email Template: In the Gmail node, customize your email body and subject dynamically using user data or markdown summaries.
- Enhance Chatbot Capabilities: Tweak the LangChain agent system messages to tailor chatbot responses for different industries or languages.
- Batch Size Tuning: Adjust batch processing size in SplitInBatches node to optimize throughput vs. API rate limits.
6. Troubleshooting 🔧
Problem: “Parsing status never reaches SUCCESS”
Cause: API token expired or file format unsupported.
Solution: Verify LlamaIndex API credentials, check formats, and review API response logs in HTTP Request nodes.
Problem: “Email not sent to user after processing”
Cause: Invalid email address extraction or Gmail node misconfigured.
Solution: Confirm email field mapping from Form Trigger and test using a fixed valid email address in Gmail node.
Problem: “Chatbot fails to retrieve relevant answers”
Cause: Vector store indexing incomplete or embedding failed.
Solution: Confirm embeddings are successfully inserted in Pinecone and re-index if necessary.
7. Pre-Production Checklist ✅
- Test form submission with multiple files and valid email.
- Monitor HTTP requests to LlamaIndex API for successful parsing response.
- Verify Markdown aggregation outputs consistent consolidated text.
- Check email delivery to submitted addresses with attachments intact.
- Test chatbot webhook and ensure queries receive relevant responses.
- Backup workflow configuration before enabling live activation.
8. Deployment Guide
Activate your workflow in n8n after thorough testing. Ensure all sensitive credentials (API tokens, Gmail account) are securely stored in n8n credentials manager. Monitor execution logs for errors. Use n8n’s built-in retry and alerting for robust operation. For production, consider scalable hosting options and API rate limit management.
9. FAQs
Q: Can I use other vector stores instead of Pinecone?
A: Yes, n8n LangChain nodes support various vector stores. Swap the Pinecone node with compatible alternatives by adjusting configurations.
Q: Does using Google Gemini incur extra costs?
A: Google Gemini API usage is billed by the provider, typically per token or request, with free-tier allowances that vary by model. Check Google's current pricing for rates and usage limits.
Q: Is my document data secure?
A: API calls use bearer tokens and HTTPS for secure transmission, but ensure you comply with your organization’s data policies.
Q: Can this workflow scale to hundreds of document submissions?
A: Yes, but monitor API rate limits and consider increased batch processing resources and queue controls.
10. Conclusion
By building and deploying this documented workflow, you transform cumbersome manual document analysis into a seamless, automated process. You gain hours back each week, improve consistency of summaries, and offer interactive chatbot support that delights users with instant, contextual answers. Pavithran now spends time on strategic tasks instead of tedious document reading.
Next, you could extend this automation by integrating additional document formats, adding sentiment analysis, or linking results to project management tools for greater workflow impact.
Let’s keep automating smarter and building helpful AI-driven tools for everyday work challenges!