Opening Problem Statement
Meet Joe, a busy content manager juggling dozens of audio recordings from interviews, meetings, and podcasts each week. Every time a new audio file lands in his Google Drive’s “Audio Recordings” folder, Joe faces a daunting manual task: downloading the file, transcribing the audio accurately, summarizing the lengthy transcript into insightful reports, and then saving those reports for his team. This process takes hours, costs money if outsourced, and is prone to human error, especially when Joe is swamped with other critical projects.
With no easy way to automate this complex multi-step task, Joe risks falling behind on delivering timely, organized meeting highlights and documentation for his colleagues. If only Joe had a smart assistant that could detect new audio files, handle transcription and summarization instantly, and neatly archive the results without any manual intervention.
What This Automation Does ⚙️
This n8n workflow solves Joe’s problem by automatically processing new audio files placed in a specific Google Drive folder. When triggered (either manually or by the file-creation trigger), it does the following:
- Searches Google Drive for the latest .m4a audio file uploaded into the “Audio Recordings” folder.
- Downloads the audio file for processing.
- Uses the OpenAI “audio transcribe” feature to convert speech to text with high accuracy.
- Generates a structured JSON summary of the transcript, capturing main points, action items, sentiments, and more.
- Converts the structured JSON into a readable Markdown report for easy sharing.
- Saves the raw transcript, structured JSON summary, and Markdown report back to Google Drive in the same folder for centralized access.
- Optionally sends notifications via Gmail and Telegram with links to the generated reports.
With this automation, Joe can save hours of manual work, avoid transcription errors, and provide polished reports that enhance team productivity and decision-making.
Prerequisites ⚙️
- 📁 Google Drive Account: For storing audio files and reports.
- 📧 Gmail Account: To send approval requests or notifications (optional human approval step included).
- 🤖 OpenAI Account: For transcription and AI summarization.
- 💬 Telegram Account: (Optional) To send notification messages with links.
- 🔑 n8n Workflow Automation Platform: Either n8n cloud or self-hosted instance.
Before starting, configure credentials for each integration in n8n: OAuth2 for Google Drive and Gmail, an API key for OpenAI, and a bot token for Telegram.
Step-by-Step Guide 🛠️
1. Start Workflow Manually or Use Google Drive Trigger
Navigate to your n8n instance and open this workflow. You can start it manually via the Manual Trigger or enable the Google Drive Trigger node to activate on new audio file creation in your specified folder.
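Tip: The Google Drive Trigger watches the “Audio Recordings” folder for audio file uploads of MIME type “application/vnd.google-apps.audio”. If the trigger does not fire for your uploads, check the MIME type Drive reports for them (for example, audio/x-m4a or audio/mp4 for .m4a files) and adjust the trigger's file type setting accordingly.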
2. Search Google Drive for Audio Files
The “Search Google Drive” node looks into the “Audio Recordings” folder using its Folder ID to list files uploaded recently.
Configuration Highlight: Make sure to set the folderId to your own Google Drive folder ID containing audio files.
If using the manual trigger, this step still executes to fetch the latest files.
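For readers who want to reproduce this search outside n8n, here is a minimal sketch of the request parameters that the Google Drive API v3 `files.list` method accepts. The helper name `build_folder_query` and the placeholder folder ID are assumptions made for illustration; the n8n node builds its own request internally, and its exact parameters may differ.

```python
def build_folder_query(folder_id: str, include_trashed: bool = False) -> str:
    """Return a Drive API v3 `q` string that lists files directly inside a folder."""
    # Escape backslashes and single quotes, as the Drive query syntax requires.
    safe_id = folder_id.replace("\\", "\\\\").replace("'", "\\'")
    clauses = [f"'{safe_id}' in parents"]
    if not include_trashed:
        clauses.append("trashed = false")
    return " and ".join(clauses)


# Hypothetical folder ID: replace it with the ID from your folder's URL.
params = {
    "q": build_folder_query("YOUR_FOLDER_ID"),
    "orderBy": "createdTime desc",  # newest files first
    "fields": "files(id, name, mimeType, createdTime)",
}
```

Sorting by `createdTime desc` means the first result is the newest upload, which is the file the later steps care about.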
3. Filter .m4a Audio Files
Next, the Filter node ensures only files with a .m4a extension are processed. This avoids accidental processing of other file types in the folder.
Common Mistake: Forgetting to adjust filtering if your files have different extensions like .mp3.
4. Limit to the Most Recent File
The Limit node keeps only the newest audio file, so each run processes a single recording instead of the whole folder.
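The filter and limit steps together amount to "keep files with the right extension, then take the newest". The sketch below shows that logic in plain Python. It assumes each file is a dictionary with `name` and `createdTime` keys shaped like Drive API results; the function name `pick_latest_audio` is made up for this example.

```python
from datetime import datetime


def pick_latest_audio(files, extension=".m4a"):
    """Return the newest file whose name ends with `extension`, or None."""
    wanted = extension.lower()
    matches = [f for f in files if f["name"].lower().endswith(wanted)]
    if not matches:
        return None

    def created(f):
        # Drive reports createdTime as RFC 3339, e.g. "2024-05-01T09:30:00.000Z".
        return datetime.fromisoformat(f["createdTime"].replace("Z", "+00:00"))

    return max(matches, key=created)
```

Passing `extension=".mp3"` gives the same behaviour for a different format, which mirrors changing the extension in the Filter node.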
5. Download the Audio File
The Google Drive node downloads the selected audio file by its ID for local processing within the workflow.
6. Transcribe Audio Using OpenAI
Pass the downloaded audio to the OpenAI (Langchain) node configured for the transcribe audio operation. This node sends the audio file to OpenAI’s transcription API and returns a text transcript.
Technical Insight: The node uses the audio resource and transcribe operation with your OpenAI credentials.
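To test transcription outside n8n, a call along these lines should work with the official `openai` Python package. This is a sketch under assumptions: the model name `whisper-1` and the file name in the usage comment are illustrative, and the n8n node may use a different model or settings. The client is passed in as a parameter so the function can be exercised with a stand-in object.

```python
def transcribe_audio(client, audio_path: str, model: str = "whisper-1") -> str:
    """Send a local audio file to the transcription endpoint and return the text."""
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text


# Usage (requires `pip install openai` and OPENAI_API_KEY in the environment):
#   from openai import OpenAI
#   transcript = transcribe_audio(OpenAI(), "interview.m4a")  # hypothetical file
```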
7. Set Transcription Context Variables
The Set node stores the transcript text and timestamp for use in subsequent steps.
8. Summarize Transcript to Structured JSON
Using another OpenAI Langchain node, the workflow converts the raw transcript text into a structured JSON summary. This summary includes title, main points, action items, sentiments, and more, formatted for clear understanding.
This structured summary helps teams digest long transcripts quickly.
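Because a language model can return malformed or incomplete JSON, it can help to validate the reply before saving it. The sketch below checks for a handful of fields. The field names (`title`, `main_points`, `action_items`, `sentiment`) are assumptions based on the description above; match them to whatever your prompt actually requests.

```python
import json

# Assumed field names and types; adjust them to match your own prompt.
REQUIRED_FIELDS = {
    "title": str,
    "main_points": list,
    "action_items": list,
    "sentiment": str,
}


def parse_summary(raw: str) -> dict:
    """Parse the model's reply and check it has the expected fields and types."""
    summary = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(summary, dict):
        raise ValueError("summary must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in summary:
            raise ValueError(f"missing field: {field}")
        if not isinstance(summary[field], expected_type):
            raise ValueError(f"{field} should be of type {expected_type.__name__}")
    return summary
```

A failed check is a good point to retry the summarization call or route the item to manual review rather than saving a broken file.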
9. Summarize Transcript to a More Detailed JSON Report
Another OpenAI node takes the transcript and creates a more elaborate technical document, structured with headers, lists, executive summaries, and detailed analysis.
10. Convert JSON Summary to Markdown
To provide a user-friendly version, a separate OpenAI node converts the JSON transcript summary into Markdown text.
This Markdown report is suitable for sharing or publishing in team documentation.
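The workflow uses a model for this conversion, but if you prefer identical formatting on every run, a deterministic renderer is a possible alternative. The sketch below is that swapped-in approach, not what the workflow does. It assumes the summary has `title`, `main_points`, `action_items`, and `sentiment` keys, which are illustrative names.

```python
def summary_to_markdown(summary: dict) -> str:
    """Render a summary dictionary as a small Markdown report."""
    lines = [f"# {summary['title']}", ""]
    lines += ["## Main Points", ""]
    lines += [f"- {point}" for point in summary.get("main_points", [])]
    lines += ["", "## Action Items", ""]
    # Render action items as unchecked task-list entries.
    lines += [f"- [ ] {item}" for item in summary.get("action_items", [])]
    if summary.get("sentiment"):
        lines += ["", f"**Sentiment:** {summary['sentiment']}"]
    return "\n".join(lines) + "\n"
```

The trade-off is flexibility: a template cannot adapt its layout to unusual transcripts the way a prompt can, but it costs no API credits and never varies.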
11. Generate Filenames for Saved Files
Two Set nodes dynamically create filenames for the JSON and Markdown files, including the original audio file ID, name, and current timestamp.
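The naming scheme can be sketched as follows. The exact pattern the Set nodes use is not shown here, so the ordering of the parts, the timestamp format, and the function name `build_filename` are all assumptions chosen for illustration.

```python
import re
from datetime import datetime, timezone


def build_filename(file_id, original_name, extension, now=None):
    """Combine the audio file's name, its Drive ID, and a UTC timestamp."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Drop the audio extension, then replace unsafe characters with underscores.
    stem = original_name.rsplit(".", 1)[0]
    safe_stem = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return f"{safe_stem}_{file_id}_{stamp}.{extension}"
```

Including the file ID keeps names unique even when two recordings share a title, and the timestamp separates repeated runs on the same file.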
12. Save JSON and Markdown Files to Google Drive
The Google Drive nodes create the transcript summary files in the same “Audio Recordings” folder, storing the clean JSON and Markdown report for reference.
13. Retrieve File Metadata and Prepare Response
Separate Google Drive nodes fetch the metadata for the newly created JSON and Markdown files to obtain their web view links. Set nodes then gather this data and prepare it for the notifications.
14. Save Raw Transcript Text
The raw transcript text file is also saved back to Google Drive with a .txt extension for archival.
15. Notify Users via Gmail and Telegram
Finally, the workflow sends automated notifications through Gmail and Telegram, sharing the generated report links with users like Joe.
This closes the automation loop by delivering results directly to stakeholders.
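For a sense of what the Telegram notification carries, here is a sketch that builds the request body for the Telegram Bot API `sendMessage` method. The message wording, the function name `build_telegram_payload`, and the example links are assumptions; the n8n Telegram node assembles its own request.

```python
def build_telegram_payload(chat_id: str, audio_name: str, links: dict) -> dict:
    """Build the JSON body for the Telegram Bot API `sendMessage` method."""
    lines = [f"Reports ready for {audio_name}:"]
    # One line per report, e.g. "Markdown: https://..."
    lines += [f"{label}: {url}" for label, url in links.items()]
    return {"chat_id": chat_id, "text": "\n".join(lines)}


# The body would be POSTed to https://api.telegram.org/bot<token>/sendMessage,
# where <token> is your bot token.
```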
Customizations ✏️
- Change Audio File Type Filter: Adjust the Filter node to process other audio formats such as .mp3 or .wav. Change the rightValue to your desired extension.
- Skip Human Approval: Disable or bypass the Gmail User for Approval node if you want fully automated processing without manual confirmation.
- Configure Notifications: Modify or add messaging nodes to customize recipients or include Slack instead of Telegram.
- Advanced Summarization: Tweak the OpenAI prompts inside the Summarize to Structured JSON or Summarize to JSON nodes to fit specific report formats like meeting minutes or interview highlights.
- Change Google Drive Folder: Update folder IDs in all Google Drive nodes to store files in a different location.
Troubleshooting 🔧
Problem: “OpenAI transcription fails or returns empty text.”
Cause: Audio file is corrupted, unsupported format, or API quota exceeded.
Solution: Verify the audio file plays correctly; convert it to a format the API accepts, such as .m4a or .mp3, if needed; and check your OpenAI account's quota and rate limits.
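A quick local check can rule out the most common causes before you spend an API call. The sketch below tests the extension and file size. The format list and the 25 MB ceiling reflect OpenAI's published limits as I understand them at the time of writing, so treat both as assumptions to verify against the current documentation.

```python
import os

# Assumed limits; confirm against the current OpenAI documentation.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024


def preflight_problems(path: str) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
        return problems
    size = os.path.getsize(path)
    if size == 0:
        problems.append("file is empty")
    elif size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    return problems
```

This does not detect a corrupted recording, so playing the file locally is still worthwhile when the checks pass but transcription fails.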
Problem: “Google Drive nodes can’t find or upload files.”
Cause: Wrong folder ID or insufficient permissions.
Solution: Double-check folder IDs in each node and ensure OAuth credentials have full access.
Problem: “Notifications not received in Gmail or Telegram.”
Cause: Incorrect chat ID or email address.
Solution: Verify environment variables EMAIL_ADDRESS_JOE and TELEGRAM_CHAT_ID are correct and active.
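If you self-host n8n, a small script like the one below can confirm that both variables are set in the environment n8n runs in. The variable names come from the workflow; the function name `missing_notification_vars` is made up for this example, and the check only detects missing or blank values, not wrong ones.

```python
import os

REQUIRED_VARS = ("EMAIL_ADDRESS_JOE", "TELEGRAM_CHAT_ID")


def missing_notification_vars(environ=None) -> list:
    """Return the names of required variables that are unset or blank."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_notification_vars()
    print("All set" if not missing else f"Missing: {', '.join(missing)}")
```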
Pre-Production Checklist ✅
- Verify Google Drive Folder ID is correct and accessible.
- Test OpenAI API with a sample audio file outside n8n to confirm transcription works.
- Check Gmail and Telegram credentials and environment variables for notifications.
- Run manual test with recent .m4a audio and confirm files are saved back to Drive.
- Backup existing audio and transcript files before full deployment as a precaution.
Deployment Guide
Once tested, activate the Google Drive Trigger node for automatic processing on new audio uploads. Keep the Gmail User for Approval enabled if you want manual review, or disable it for seamless automation. Monitor workflow runs periodically in n8n to check status and adjust for scaling if processing many files daily.
FAQs
Can I use MP3 or WAV files instead of M4A?
Yes, just update the filter node to match your file extension and ensure OpenAI supports your format.
Does this consume OpenAI API credits?
Yes, transcription and large language model summarizations use your OpenAI account’s credits.
Is my data secure with this workflow?
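Google Drive and Gmail connect through OAuth2, while OpenAI and Telegram use an API key and a bot token, and your files stay in your own Google Drive. Keep in mind that the audio and transcript are sent to OpenAI for processing, so review its data-handling terms before using sensitive recordings.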
Can I get notified via Slack instead of Telegram?
Definitely. Add a Slack node alongside the Telegram node, or swap it in as a replacement, and configure it with your Slack credentials.
Conclusion
After setting up this comprehensive n8n workflow, you’ve automated the entire process of audio transcription, AI summarization, and report generation, all directly integrated with Google Drive. Joe now saves hours weekly, eliminates manual transcription errors, and delivers insightful reports effortlessly. Next, consider expanding to multi-language transcription, real-time audio processing, or integrating with project management tools for task tracking.
You’re well on the way to mastering AI-powered automation that transforms tedious manual tasks into seamless, reliable workflows!