1. Opening Problem Statement
Meet Sarah, a junior analyst at a venture capital firm who spends countless tedious hours every week manually reviewing startup pitch decks submitted in PDF format. Each deck can be 20–50 pages long, full of diverse layouts, graphs, images, and tables. Extracting key data like funding stage, founders, traction metrics, and business model requires painstaking note-taking and cross-referencing. Inevitably, Sarah misses or misinterprets details, delaying decisions and costing her firm potential opportunities.
Imagine the time wasted: Sarah reviews up to 100 pitch decks monthly, spending an average of 45 minutes per deck. That is up to 75 hours a month (100 × 45 minutes) lost to manual processing alone, not to mention the risk of inaccuracies or inconsistent reporting across the team. The workflow in this tutorial is designed specifically for pitch decks: it extracts and summarizes key information automatically and provides an AI-powered chatbot that answers deck-related questions on demand.
2. What This Automation Does
When triggered, this n8n workflow automates the entire pitch deck ingestion and analysis process end-to-end. Here’s what happens:
- Pulls pending pitch decks (PDF files) from an Airtable database.
- Downloads each pitch deck PDF directly from Airtable attachment URLs.
- Splits each PDF into individual page images via the Stirling PDF API, then extracts these images from the returned zip file.
- Resizes images for AI vision processing, then sends pages to an AI multimodal vision model to transcribe contents into detailed markdown, faithfully capturing text, tables, images, and charts.
- Aggregates all pages’ transcriptions and uses an AI information extractor tuned for VC analysis to generate a structured, comprehensive report with key startup data points.
- Upserts extracted data back into Airtable, updating the pitch deck record with metrics like founders, funding, traction, and various profiles.
- Uploads transcribed markdown to a Qdrant vector store, enabling semantic search capabilities over all processed pitch decks.
- Launches an AI chatbot agent integrated with the vector store for your team to instantly ask detailed questions about any pitch deck in the database.
This workflow saves hours of manual labor monthly, reduces errors dramatically, and empowers your investment team with instant insights and conversational access to pitch deck data.
3. Prerequisites ⚙️
- n8n Account — to build and run the workflow.
- Airtable — as the pitch deck database (with attachment URLs and columns to store extracted data).
- Stirling PDF API — free public API for splitting PDFs into images (optional self-host for privacy).
- OpenAI API — for AI vision transcription, chat models, embeddings, and information extraction.
- Qdrant Vector Store — for storing vector embeddings of pitch decks enabling semantic search and chatbot querying (self-host or cloud).
4. Step-by-Step Guide
Step 1: Trigger the Workflow on New Pitch Decks
Open your n8n editor and locate the Airtable Trigger For Pending Rows node. This node watches your Airtable base for pitch decks with a PDF file but missing the executive summary — meaning they need processing.
Configure it with your Airtable base and table IDs, and your Airtable personal access token. The formula ensures it only triggers on pitch decks ready for analysis.
Upon trigger, it passes data including the pitch deck name and file URL downstream for processing.
Common mistake: Forgetting to enable the trigger, or entering incorrect Airtable credentials, will prevent the workflow from running.
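The trigger's filter condition can be pictured as a plain predicate over a record's fields. The sketch below is illustrative only: `needsProcessing` is a hypothetical helper, the field name `Executive Summary` is an assumption (the text only says the summary is missing), and `File` is taken from the expression used in Step 2.

```javascript
// Hypothetical helper mirroring the trigger's filter: a record qualifies when
// it has at least one attachment and no executive summary yet.
// Assumed field names: "File" (attachment list) and "Executive Summary".
function needsProcessing(fields) {
  const hasFile = Array.isArray(fields.File) && fields.File.length > 0;
  const summary = fields["Executive Summary"];
  const hasSummary = typeof summary === "string" && summary.trim() !== "";
  return hasFile && !hasSummary;
}

// Made-up sample records for illustration.
const sampleRecords = [
  { Name: "Acme", File: [{ url: "https://example.com/acme.pdf" }] },
  { Name: "Beta", File: [{ url: "https://example.com/beta.pdf" }], "Executive Summary": "Done" },
  { Name: "Gamma" },
];

const pending = sampleRecords.filter(needsProcessing).map((r) => r.Name);
console.log(pending); // only "Acme" has a file and no summary
```

If the trigger never fires, checking a few real records against a condition like this one is a quick way to see which half of the filter excludes them.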
Step 2: Download the PDF Pitch Deck
The Download Deck From Airtable node takes the file URL from the trigger and performs an HTTP GET to fetch the binary PDF data.
Set this node’s URL parameter to the expression ={{ $json.File[0].url }}, which dynamically pulls the link of the first attached file from Airtable.
You should see the binary data attached when you inspect the node output.
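A quick sanity check on the downloaded bytes can catch cases where the request returned an error page instead of the file. The sketch below relies only on the fact that PDF files begin with the header `%PDF-`; the helper name `looksLikePdf` and the sample buffers are made up for illustration.

```javascript
// Returns true when the buffer starts with the PDF header "%PDF-".
function looksLikePdf(buffer) {
  return buffer.subarray(0, 5).toString("latin1") === "%PDF-";
}

// Made-up sample payloads: a PDF-like header and an HTML error page.
const pdfBytes = Buffer.from("%PDF-1.7\n...", "latin1");
const htmlBytes = Buffer.from("<html>Access denied</html>", "latin1");

console.log(looksLikePdf(pdfBytes));  // true
console.log(looksLikePdf(htmlBytes)); // false
```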
Step 3: Convert Binary Data to File Object
The Extract from File node converts the incoming binary data into a property for further handling.
Follow with the Convert to File node to encode the property as a proper PDF binary file named data.pdf with MIME type application/pdf.
This prepares the PDF to be sent to the next node for conversion.
Step 4: Split PDF Into Page Images Using Stirling PDF API
The Split PDF into Images node is an HTTP Request node configured to POST the PDF file to the Stirling PDF API.
Parameters include:
- fileInput: binary file input name
- imageFormat: set to jpg
- singleOrMultiple: set to multiple for one image per page
- dpi: 300 for high resolution
The node returns a zip archive containing individual JPG images for each page.
Privacy Note: If your document is sensitive, consider self-hosting Stirling PDF or replacing this with a private PDF-to-image service.
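Outside n8n, the same request could be sketched roughly as follows. The form fields come from the parameter list above; everything else is an assumption: the route `/api/v1/convert/pdf/img`, the `baseUrl` argument, and both helper names are illustrative, so check your Stirling PDF instance's API documentation before relying on them. The sketch needs Node 18 or later for the global `fetch`, `FormData`, and `Blob`.

```javascript
// Builds the multipart form for the split request. Field names and values
// follow the parameters listed above; "data.pdf" matches the name from Step 3.
function buildSplitForm(pdfBytes) {
  const form = new FormData();
  form.append("fileInput", new Blob([pdfBytes], { type: "application/pdf" }), "data.pdf");
  form.append("imageFormat", "jpg");
  form.append("singleOrMultiple", "multiple");
  form.append("dpi", "300");
  return form;
}

// Assumption: `baseUrl` points at your Stirling PDF instance and the route
// below is the PDF-to-image endpoint. Verify both against your instance.
async function splitPdfIntoImages(baseUrl, pdfBytes) {
  const response = await fetch(`${baseUrl}/api/v1/convert/pdf/img`, {
    method: "POST",
    body: buildSplitForm(pdfBytes),
  });
  if (!response.ok) {
    throw new Error(`Split request failed with status ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer()); // zip archive bytes
}

// Inspect the form without sending anything over the network.
const form = buildSplitForm(Buffer.from("%PDF-1.7\n"));
console.log([...form.keys()]);
```

Keeping the base URL as an argument makes it straightforward to point the same call at a self-hosted instance.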
Step 5: Extract Zip Archive of Page Images
Use the Extract Zip File node to decompress the zip binary and produce separate binary items with each page image.
Step 6: Transform Images into a List for Sorting
The Images To List is a Code node that iterates over the separate binary images, converting them into individual items with filenames and binary data, making sure each page image is a separate item for the next step.
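    // Flatten every binary attachment into its own item so that each page
    // image can be sorted and processed individually.
    const results = [];
    for (const item of items) {
      // Items without binary data contribute nothing.
      for (const key of Object.keys(item.binary ?? {})) {
        results.push({
          json: {
            fileName: item.binary[key].fileName,
          },
          binary: {
            data: item.binary[key],
          },
        });
      }
    }
    return results;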
This transformation is crucial for sorted and sequential processing.
Step 7: Sort Pages by Filename
The Sort Pages node sorts images alphabetically by their fileName property to ensure the page order is preserved exactly as in the original document.
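One caveat with alphabetical sorting: it only preserves page order when the page numbers in the file names are zero-padded. Whether your splitter pads its file names is something to verify on a real deck. The file names below are made up to show the difference between a plain string sort and a numeric-aware comparison.

```javascript
// Made-up file names without zero padding.
const fileNames = ["page_10.jpg", "page_2.jpg", "page_1.jpg"];

// Plain lexicographic order: "page_10.jpg" sorts before "page_2.jpg".
const lexicographic = [...fileNames].sort();

// Numeric-aware comparison treats digit runs as numbers.
const byPageNumber = [...fileNames].sort((a, b) =>
  a.localeCompare(b, undefined, { numeric: true })
);

console.log(lexicographic); // [ 'page_1.jpg', 'page_10.jpg', 'page_2.jpg' ]
console.log(byPageNumber);  // [ 'page_1.jpg', 'page_2.jpg', 'page_10.jpg' ]
```

If your file names turn out to be unpadded, a comparison like the second one could replace the sort in a Code node.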
Step 8: Resize Images for Vision AI Model
Resize Images For AI uses the Edit Image node to scale images to 50% width and height. This reduces processing overhead while maintaining quality for AI transcription.
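To see what the 50% setting means in practice, note that halving both dimensions leaves a quarter of the original pixels. The arithmetic is sketched below; the starting size of 4000 × 2250 pixels is a hypothetical example, and `scaleDimensions` is an illustrative helper rather than part of the Edit Image node.

```javascript
// Scales both dimensions by the same percentage, rounding to whole pixels.
function scaleDimensions(width, height, percent) {
  const factor = percent / 100;
  return {
    width: Math.round(width * factor),
    height: Math.round(height * factor),
  };
}

// Hypothetical page image size before resizing.
const original = { width: 4000, height: 2250 };
const resized = scaleDimensions(original.width, original.height, 50);
const pixelRatio =
  (resized.width * resized.height) / (original.width * original.height);

console.log(resized);    // { width: 2000, height: 1125 }
console.log(pixelRatio); // 0.25
```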
Step 9: Transcribe Pages to Markdown using AI Vision Model
The Transcribe to Markdown node uses the Chain LLM from LangChain to process each resized image and transcribe detailed markdown output. The prompt instructs the model to faithfully reproduce all text, tables, charts, and images with descriptions.
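In the workflow this call is handled by the LangChain node, but it can help to see the shape of a comparable request. The sketch below builds a body in the OpenAI Chat Completions format with the page image passed as a base64 data URL. The prompt text is a shortened placeholder, not the workflow's actual prompt, and the model name `gpt-4o-mini` and the helper `buildTranscriptionRequest` are assumptions for illustration.

```javascript
// Builds a chat completion request body containing one text part and one
// image part. The image is embedded as a base64 data URL.
function buildTranscriptionRequest(imageBytes, model) {
  const dataUrl = `data:image/jpeg;base64,${imageBytes.toString("base64")}`;
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          // Placeholder prompt; the workflow's real prompt is more detailed.
          { type: "text", text: "Transcribe this slide to markdown, including tables and chart descriptions." },
          { type: "image_url", image_url: { url: dataUrl } },
        ],
      },
    ],
  };
}

// Three bytes that open every JPEG file stand in for a real page image.
const request = buildTranscriptionRequest(Buffer.from([0xff, 0xd8, 0xff]), "gpt-4o-mini");
console.log(request.messages[0].content[1].image_url.url); // data:image/jpeg;base64,/9j/
```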
Step 10: Combine Transcribed Pages
The Combine All Pages aggregate node collects all individual markdown transcriptions into one array, which becomes the input for generating the report.
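If you need the pages as a single document, for example to inspect the full transcription by hand, they can be joined with page markers. This is a minimal sketch under the assumption that each page is a markdown string in page order; `combinePages` and the HTML-comment marker format are hypothetical choices, not something the workflow prescribes.

```javascript
// Joins per-page markdown into one document, labelling each page so the
// origin of a passage stays traceable.
function combinePages(pages) {
  return pages
    .map((markdown, index) => `<!-- page ${index + 1} -->\n${markdown.trim()}`)
    .join("\n\n");
}

// Made-up transcriptions for two pages.
const combined = combinePages(["# Acme\nSeed round ", "## Team\nTwo founders"]);
console.log(combined);
```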
Step 11: Generate Detailed VC Report
The Generate Report node employs the Information Extractor AI model with a structured system prompt that mimics a seasoned VC’s evaluation. It pulls out specific data: founders, funding, business model, traction metrics, contact details, and investment fit analysis.
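Because the extractor's output feeds directly into Airtable, it can be worth checking that a report actually contains every expected field before writing it back. The sketch below assumes one key per data point named above; the key names and the helper `missingFields` are hypothetical and should be adapted to the schema you configure in the node.

```javascript
// Hypothetical keys, one per data point the report is expected to contain.
const REQUIRED_FIELDS = [
  "founders",
  "funding",
  "businessModel",
  "traction",
  "contact",
  "investmentFit",
];

// Returns the names of required fields that are absent or empty.
function missingFields(report) {
  return REQUIRED_FIELDS.filter((key) => {
    const value = report[key];
    return value === undefined || value === null || value === "";
  });
}

// Made-up partial report for illustration.
const partialReport = {
  founders: "Jane Doe, John Roe",
  funding: "Seed",
  businessModel: "SaaS subscription",
  traction: "",
};
console.log(missingFields(partialReport)); // [ 'traction', 'contact', 'investmentFit' ]
```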
Step 12: Update Airtable Record with Extracted Data
The Update Pitchdecks Table Airtable node upserts extracted fields (e.g. Amount Raised, Team Size, Founders) back into the corresponding pitch deck record by matching on the company Name.
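For reference, an upsert against Airtable's REST API uses a `performUpsert` object naming the fields to match on. The sketch below builds such a request body, matching on `Name` as described above. The helper `buildUpsertBody` and the sample values are made up; the field names `Amount Raised`, `Team Size`, and `Founders` are the examples mentioned in this step.

```javascript
// Builds the body for an Airtable upsert (sent via PATCH to the table's
// records endpoint). Records are matched on the "Name" field.
function buildUpsertBody(name, extractedFields) {
  return {
    performUpsert: { fieldsToMergeOn: ["Name"] },
    records: [{ fields: { Name: name, ...extractedFields } }],
  };
}

// Made-up extracted values for illustration.
const upsertBody = buildUpsertBody("Acme", {
  "Amount Raised": "2M USD",
  "Team Size": 12,
  Founders: "Jane Doe, John Roe",
});
console.log(JSON.stringify(upsertBody, null, 2));
```

Since matching is by name, two different decks that share a company name would update the same record, so it may be worth keeping names unique in the table.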
Step 13: Manage Vector Store Data
The Delete Existing Vectors HTTP Request cleans out previously indexed vectors for the current pitch deck from your Qdrant collection to keep data fresh.
The Pitchdecks Vector Store node inserts new vector embeddings created from the extracted text into Qdrant, making the pitch deck searchable.
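The cleanup step corresponds to Qdrant's delete-points endpoint called with a payload filter. A rough sketch of such a request follows. The payload key `metadata.name`, the collection name `pitchdecks`, the local URL, and the helper `buildDeleteRequest` are all assumptions for illustration; use whatever key your insert step actually writes into each point's payload.

```javascript
// Describes a request that deletes every point whose payload matches the
// given company name. Assumed payload key: "metadata.name".
function buildDeleteRequest(qdrantUrl, collection, companyName) {
  return {
    url: `${qdrantUrl}/collections/${encodeURIComponent(collection)}/points/delete`,
    method: "POST",
    body: {
      filter: {
        must: [{ key: "metadata.name", match: { value: companyName } }],
      },
    },
  };
}

// Hypothetical local instance and collection name.
const deleteRequest = buildDeleteRequest("http://localhost:6333", "pitchdecks", "Acme");
console.log(deleteRequest.url); // http://localhost:6333/collections/pitchdecks/points/delete
```

Deleting before inserting keeps reprocessed decks from leaving stale duplicates in the collection, provided the filter key matches the one used at insert time.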
Step 14: Enable AI Chatbot Interaction
The workflow includes nodes to support an AI chatbot agent that listens for chat messages, identifies companies being queried, and leverages the vector store to answer questions about pitch decks on demand using OpenAI chat models and memory buffers.
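Conceptually, answering a question means restricting the search to the company being asked about and ranking that company's text chunks by similarity to the question's embedding. Qdrant does this server-side, and the distance metric depends on how the collection is configured. The toy sketch below uses cosine similarity with made-up two-dimensional vectors purely to illustrate the idea; `topMatches` and the chunk shape are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keeps only the chunks of one company, then returns the k most similar.
function topMatches(queryVector, chunks, company, k) {
  return chunks
    .filter((chunk) => chunk.company === company)
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data: real embeddings have hundreds or thousands of dimensions.
const chunks = [
  { company: "Acme", text: "traction", vector: [1, 0] },
  { company: "Acme", text: "team", vector: [0, 1] },
  { company: "Beta", text: "traction", vector: [1, 0] },
];
const best = topMatches([1, 0], chunks, "Acme", 1);
console.log(best[0].text, best[0].score); // traction 1
```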
5. Customizations ✏️
- Replace Stirling PDF API with Private Instance: Swap the Split PDF into Images HTTP Request node URL to your self-hosted Stirling PDF endpoint for data privacy.
- Adjust Image Resize Percentage: In the Resize Images For AI Edit Image node, modify width and height to balance between AI accuracy and processing speed.
- Customize VC Persona in Report: Edit the system prompt in the Generate Report Information Extractor node to match your investment style or industry focus.
- Change Airtable Base/Table: Update all Airtable node credentials and base/table IDs to connect to your own data source.
- Use Different Vision or Chat Models: Switch OpenAI Chat Model nodes to other available models or providers as preferred.
6. Troubleshooting 🔧
Problem: “No new records found to process.”
Cause: Airtable trigger’s filter formula might exclude records or the input conditions are not met.
Solution: Verify the Airtable formula and that pitch decks have files and no executive summary yet.
Problem: “Error uploading to vector store”
Cause: Incorrect Qdrant API URL or credentials.
Solution: Check the Delete Existing Vectors node for the correct Qdrant URL and update your credentials.
Problem: “Transcription output incomplete or inaccurate.”
Cause: Image resize parameters or AI model prompt might need tuning.
Solution: Adjust image quality or tweak the transcription prompt in the Transcribe to Markdown node.
7. Pre-Production Checklist ✅
- Verify Airtable credentials and base/table IDs are correct.
- Test Stirling PDF API endpoint with sample pitch deck PDF.
- Check OpenAI API keys and permissions for all AI model nodes.
- Confirm Qdrant vector store connection and collection configuration.
- Run test execution with a sample pitch deck and inspect intermediate outputs at key nodes (image extraction, transcription, report generation).
8. Deployment Guide
Once you have tested the workflow end-to-end, activate the workflow so that the Airtable Trigger For Pending Rows node starts checking for new pitch decks automatically. The workflow then runs whenever the trigger finds a qualifying pitch deck that was added or updated in Airtable.
Monitor executions via the n8n dashboard and check Airtable updates for proper data insertion. Set up alerting if needed for failures or errors.
9. FAQs
Q: Can I use a different PDF splitter than Stirling PDF?
A: Yes, any service or node that converts PDFs to images as individual pages can be substituted, including self-hosted options.
Q: Does this workflow consume a lot of OpenAI credits?
A: Usage depends on deck length, because every page is sent to the vision model for transcription. Consider processing decks in batches, and monitor your API costs.
Q: Is my pitch deck data secure?
A: If using public APIs like Stirling PDF, be cautious with confidential data. Self-hosting components improves security.
10. Conclusion
By following this tutorial, you’ve built a powerful n8n workflow automating pitch deck ingestion, transcription, data extraction, reporting, and AI chatbot interaction. This solution can save your team dozens of hours monthly, reduce errors, and provide instant, conversational access to pitch deck insights.
Next, consider automating investor follow-ups with personalized email campaigns, integrating your pitch deck analysis with CRM software, or expanding the vector store to handle other document types like investor reports or market analysis.
Happy automating and investing smarter with n8n and AI!