What this workflow does
This workflow helps handle different types of WhatsApp messages automatically using n8n and OpenAI.
It solves the problem of spending many hours manually reading and replying to texts, voice notes, images, and PDFs on WhatsApp.
The result is faster replies tailored to each message type, with less manual work in customer chats.
Who should use this workflow
This is for anyone using the WhatsApp Business API who wants to reply quickly to multi-format messages.
It is a good fit if you receive many voice notes, photos, PDFs, or long texts from clients.
Non-technical users can benefit if they use n8n and OpenAI for automation.
Tools and services used
- WhatsApp Business API: Receives messages and sends replies.
- n8n: Automates the workflow with nodes to process different inputs.
- OpenAI GPT-4o-mini model: Creates smart text replies and analyzes images.
- OpenAI Whisper model: Converts voice messages to text.
- HTTP Request nodes: Download media like audio, images, and documents.
- n8n credentials: Store secure access details for the WhatsApp API and the OpenAI API.
Inputs, processing steps, and outputs
Inputs
- Incoming WhatsApp messages which may be of different types: text, voice audio, images, PDFs, or other documents.
Processing steps
- The WhatsApp Trigger node catches every new message.
- A Switch node sorts messages into text, audio, image, PDF document, or unsupported types.
- Texts go to the AI node for response generation.
- For voice messages, the workflow looks up the media URL, downloads the audio, and transcribes it with OpenAI Whisper before the AI generates a reply.
- Images are downloaded and analyzed by OpenAI GPT-4o-mini to create detailed descriptions.
- PDFs are downloaded, their text is extracted and summarized, and the AI uses the summary to answer.
- Unsupported types get a polite notification about allowed formats.
- At the end, replies are sent back via WhatsApp. Audio replies first pass through a Code node that sets the correct MIME type so they play properly.
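The sorting step performed by the Switch node can be sketched in Python as follows. This is a minimal illustration and not the workflow's actual node configuration: the function name route_message is hypothetical, and the field names (type, document.mime_type) assume the message object shape delivered by WhatsApp Cloud API webhooks.

```python
from typing import Any, Mapping


def route_message(message: Mapping[str, Any]) -> str:
    """Return the branch name for one incoming message.

    Assumes the message carries a "type" field and, for documents,
    a nested "document" object with a "mime_type" field.
    """
    msg_type = message.get("type")
    if msg_type in ("text", "audio", "image"):
        return msg_type
    if msg_type == "document":
        mime = message.get("document", {}).get("mime_type", "")
        if mime == "application/pdf":
            return "pdf"
    # Anything else (video, stickers, non-PDF documents, ...) falls through.
    return "unsupported"
```

Checking the document's MIME type, rather than only the message type, keeps non-PDF documents out of the PDF branch so they receive the polite "unsupported" reply instead of failing at text extraction.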
Outputs
- Clear, relevant text or audio replies sent to users on WhatsApp.
Beginner step-by-step: How to build this in n8n
1. Import the workflow
- Download the workflow file using the Download button on this page.
- Open the n8n editor and click “Import from File” to load the workflow.
2. Configure credentials
- Add your WhatsApp API OAuth credentials in the n8n credential manager.
- Add an OpenAI API key with access to GPT-4o-mini and the audio and image models.
- Update any IDs, emails, or channel info if your setup requires it.
3. Test and activate
- Send test messages in different formats to your WhatsApp business number.
- Check that replies arrive correctly: text replies for text messages, transcriptions for voice, descriptions for images, and summaries for PDFs.
- When satisfied, activate the workflow in n8n to run it in production.
You can explore self-hosting n8n if you want full control over a private server setup.
Handling message types: Input → Process → Output
- Text: The input text is sent to GPT-4o-mini, which generates a reply. The output is a text message sent back on WhatsApp.
- Voice messages: The audio media ID is used to fetch the media URL → download the audio → transcribe with OpenAI Whisper → send the transcript to GPT-4o-mini for a reply → output a text or audio reply.
- Images: Image media downloaded → base64 encoded → analyzed by GPT-4o-mini for detailed descriptions → output descriptive text reply.
- PDF documents: Document URL fetched → file download → text extracted → summarized by AI → output reply with summary or main points.
- Unsupported: If the message type is none of the above, output a polite WhatsApp message listing the allowed content types.
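For the image branch, the base64 step can be sketched as below. It assumes the encoded image is handed to the model as a data URL, which is one common way to send inline images; the workflow itself may wire this differently. The helper name to_data_url and its default MIME type are illustrative assumptions.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL.

    The default MIME type is an assumption; pass the type reported
    for the downloaded media when it is known.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

For example, to_data_url(b"hi", "image/png") returns "data:image/png;base64,aGk=".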
Edge cases and failures
- If the WhatsApp media URL request fails, check the OAuth credentials and refresh tokens.
- If the OpenAI API returns an error, verify the API key and check quota usage.
- If audio replies do not play, check the Code node that sets the MIME type before sending. The workflow relies on it to make audio playable.
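The kind of MIME type correction described for audio replies might look like the sketch below. The exact logic inside the workflow's Code node is not shown here, so the alias table is an assumption: it maps the non-standard label "audio/mp3" to "audio/mpeg", the registered MIME type for MP3 audio. The name normalize_audio_mime is hypothetical.

```python
# Assumed alias table: non-standard labels mapped to the registered MIME type.
MIME_ALIASES = {
    "audio/mp3": "audio/mpeg",
}


def normalize_audio_mime(mime_type: str) -> str:
    """Return a cleaned MIME type, replacing known non-standard aliases."""
    cleaned = mime_type.strip().lower()
    return MIME_ALIASES.get(cleaned, cleaned)
```

Types that are not in the alias table pass through unchanged apart from trimming and lowercasing, so the helper is safe to apply to every outgoing audio file.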
Possible customizations
- Switch the OpenAI model in the AI Agent node to change reply style or complexity.
- Add multi-language support by changing the AI prompt to detect the user's language and reply in it.
- Adjust the voice or audio quality in the audio generation node to match your brand.
- Extend supported files beyond PDFs by adding checks and extraction for other document types.
- Customize message templates with branding or extra info in the response Set nodes.
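As one way to picture the "extend supported files" idea, the sketch below keeps a lookup table from MIME type to a text extractor, so adding a format means adding one entry. Everything here is hypothetical: the table name EXTRACTORS, the helper names, and the choice of plain text and CSV as example formats are assumptions for illustration, not part of the workflow.

```python
from typing import Callable, Dict, Optional

Extractor = Callable[[bytes], str]


def extract_plain_text(data: bytes) -> str:
    """Decode text-like files, replacing undecodable bytes."""
    return data.decode("utf-8", errors="replace")


# Hypothetical registry: add one entry per newly supported document type.
EXTRACTORS: Dict[str, Extractor] = {
    "text/plain": extract_plain_text,
    "text/csv": extract_plain_text,
}


def extract_text(mime_type: str, data: bytes) -> Optional[str]:
    """Return extracted text, or None when the type is not supported."""
    extractor = EXTRACTORS.get(mime_type.strip().lower())
    return extractor(data) if extractor else None
```

A None result signals that the file should take the unsupported branch and receive the polite notification.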
Summary of outcomes
→ Saves hours per day by automating message handling on WhatsApp.
→ Reduces manual transcription mistakes by transcribing voice messages automatically.
→ Creates quick, clear, and helpful replies for texts, images, audio, and documents.
→ Improves customer conversations with richer content and less manual work.
