What this workflow does

This workflow helps handle different types of WhatsApp messages automatically using n8n and OpenAI.

It removes the need to spend hours manually reading and replying to texts, voice notes, images, and PDFs on WhatsApp.

The result is faster, more consistent replies with less manual work in customer chats.

Who should use this workflow

This is for anyone using WhatsApp Business API who wants to reply quickly to multi-format messages.

It is good if you get many voice notes, photos, PDFs, or long texts from clients.

Non-technical users can benefit if they use n8n and OpenAI for automation.

Tools and services used

WhatsApp Business API: Receives messages and sends replies.

n8n: Automates the workflow with nodes to process different inputs.

OpenAI GPT-4o-mini model: Creates smart text replies and analyzes images.

OpenAI Whisper model: Converts voice messages to text.

HTTP Request nodes: Download media like audio, images, and documents.

n8n credentials: Store secure access details for the WhatsApp API and the OpenAI API.

Inputs, processing steps, and outputs

Inputs

Incoming WhatsApp messages which may be of different types: text, voice audio, images, PDFs, or other documents.

Processing steps

The WhatsApp Trigger node catches every new message.

A Switch node sorts messages into text, audio, image, PDF document, or unsupported types.

Texts go to the AI node for response generation.

For voice messages, the workflow looks up the media URL, downloads the audio, and transcribes it with OpenAI Whisper before the AI generates a reply.

Images are downloaded and analyzed by OpenAI GPT-4o-mini to create detailed descriptions.

PDFs are downloaded, their text is extracted and summarized, and the AI uses the summary to answer.

Unsupported types get a polite notification about allowed formats.

At the end, replies are sent back via WhatsApp. For audio replies, a Code node sets the correct MIME type so the audio plays properly.
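The sorting done by the Switch node can be sketched in a few lines. This is a minimal illustration, not the workflow's actual configuration: it assumes the incoming message object carries a "type" field and, for documents, a "mime_type" field, as in WhatsApp Cloud API webhooks. The route names are hypothetical labels chosen for this example.

```python
# Message types that map straight to a branch of the same name.
# Route names here are hypothetical; the Switch node outputs may be labeled differently.
DIRECT_ROUTES = {"text", "audio", "image"}


def route_message(message: dict) -> str:
    """Return the branch name for one incoming WhatsApp message object."""
    msg_type = message.get("type", "")
    if msg_type in DIRECT_ROUTES:
        return msg_type
    if msg_type == "document":
        # Only PDF documents are handled; other documents fall through.
        mime = message.get("document", {}).get("mime_type", "")
        return "pdf" if mime == "application/pdf" else "unsupported"
    return "unsupported"


if __name__ == "__main__":
    sample = {"type": "document", "document": {"mime_type": "application/pdf"}}
    print(route_message(sample))
```

Anything that is not explicitly recognized ends up in the "unsupported" branch, so new or unexpected message types get the polite fallback reply instead of causing an error.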

Outputs

Clear, relevant text or audio replies sent to users on WhatsApp.
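For orientation, the reply that reaches the user corresponds roughly to the request bodies below. In n8n the WhatsApp node assembles the request for you, so this sketch only illustrates the shape of the data, assuming the WhatsApp Cloud API message format. The function names and the sample phone number and media ID are made up for this example.

```python
def build_text_reply(to: str, body: str) -> dict:
    """Build the request body for a plain text reply."""
    return {
        "messaging_product": "whatsapp",
        "to": to,
        "type": "text",
        "text": {"body": body},
    }


def build_audio_reply(to: str, media_id: str) -> dict:
    """Build the request body for an audio reply that was uploaded beforehand."""
    return {
        "messaging_product": "whatsapp",
        "to": to,
        "type": "audio",
        "audio": {"id": media_id},
    }


if __name__ == "__main__":
    # Sample values are placeholders, not real identifiers.
    print(build_text_reply("15551234567", "Thanks, we received your PDF."))
    print(build_audio_reply("15551234567", "media-id-from-upload"))
```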

Beginner step-by-step: How to build this in n8n

1. Import the workflow

Download the workflow file using the Download button on this page.

Open the n8n editor and click “Import from File” to load the workflow.

2. Configure credentials

Add your WhatsApp API OAuth credentials under Credentials in n8n.

Add an OpenAI API key with access to GPT-4o-mini and the audio and image-capable models.

Update any IDs, emails, or channel info if your setup requires it.

3. Test and activate

Send test messages in different formats to your WhatsApp business number.

Check if replies come correctly: text replies for text messages, transcriptions for voice, descriptions for images, and summaries for PDFs.

When you are satisfied, activate the workflow in n8n so it runs in production.

You can also self-host n8n if you want full control on a private server.

Handling message types: Input → Process → Output

Text: Input text is sent to GPT-4o-mini → the model generates a reply → output is text sent back on WhatsApp.

Voice messages: Media ID is used to fetch the media URL → audio is downloaded → transcribed with OpenAI Whisper → transcript is sent to GPT-4o-mini for a reply → output is a text or audio reply.

Images: Image is downloaded → base64 encoded → analyzed by GPT-4o-mini for a detailed description → output is a descriptive text reply.

PDF documents: Document URL is fetched → file is downloaded → text is extracted → summarized by AI → output is a reply with the summary or main points.

Unsupported: If the message type is none of the above, the output is a polite WhatsApp message listing the allowed content types.
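The media steps for voice, image, and document messages share the same pattern: look up the media URL by ID, download the file, and (for images) base64-encode it. The sketch below only builds the request descriptions and the encoded string; it does not call any API. It assumes the WhatsApp Cloud API behavior where the lookup returns a JSON object with a "url" field and both requests need a bearer token. The API version in GRAPH_BASE and all function names are assumptions for illustration.

```python
import base64

# Assumed Graph API base URL and version; adjust to whatever your setup uses.
GRAPH_BASE = "https://graph.facebook.com/v20.0"


def media_lookup_request(media_id: str, token: str) -> dict:
    """Step 1: describe the request that resolves a media ID to a download URL."""
    return {
        "method": "GET",
        "url": f"{GRAPH_BASE}/{media_id}",
        "headers": {"Authorization": f"Bearer {token}"},
    }


def media_download_request(lookup_response: dict, token: str) -> dict:
    """Step 2: describe the request that downloads the file from the returned URL."""
    url = lookup_response.get("url")
    if not url:
        raise ValueError("media lookup response has no 'url' field")
    return {
        "method": "GET",
        "url": url,
        "headers": {"Authorization": f"Bearer {token}"},
    }


def to_data_url(data: bytes, mime_type: str) -> str:
    """Step 3 (images): base64-encode the bytes as a data URL for a vision model."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    print(media_lookup_request("1234567890", "YOUR_TOKEN")["url"])
    print(to_data_url(b"hi", "image/png"))
```

In the workflow, the two HTTP Request nodes play the role of steps 1 and 2. If the download step fails, the token sent with the second request is the first thing to check.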

Edge cases and failures

If a WhatsApp media URL request fails, check that the WhatsApp OAuth credentials are valid and refresh the tokens if needed.

If the OpenAI API returns an error, verify the API key and check your quota usage.

If audio replies do not play, check the Code node that fixes the MIME type before sending and confirm it ran without errors.
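The MIME type fix mentioned above could look something like the sketch below. The source does not show the Code node's contents, so this is an assumption: it supposes the generated audio is labeled "audio/mp3", which is not a standard MIME type, and relabels it as "audio/mpeg". The "mimeType" key mirrors how n8n describes binary data; the mapping and function name are hypothetical.

```python
# Assumed mapping from non-standard labels to standard MIME types.
MIME_FIXES = {"audio/mp3": "audio/mpeg"}


def fix_audio_mime(binary_meta: dict) -> dict:
    """Return a copy of the binary metadata with a corrected MIME type, if needed."""
    fixed = dict(binary_meta)  # copy so the input is not modified
    mime = fixed.get("mimeType")
    if mime in MIME_FIXES:
        fixed["mimeType"] = MIME_FIXES[mime]
    return fixed


if __name__ == "__main__":
    print(fix_audio_mime({"fileName": "reply.mp3", "mimeType": "audio/mp3"}))
```

Metadata that already has an accepted type passes through unchanged, so the step is safe to run on every audio reply.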

Possible customizations

Switch OpenAI model in the AI Agent node to change reply style or complexity.

Add multi-language support by changing the AI prompt to detect the user's language and reply in it.

Adjust the voice or audio quality in the audio generation node to match your branding.

Extend supported files beyond PDFs by adding checks and extraction for other document types.

Customize message templates with branding or extra info in the response Set nodes.
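As one way to approach the file-type extension above, the check for "PDF or not" can be generalized into a lookup from MIME type to an extraction method. This is a sketch under assumptions: the extractor names are hypothetical labels, and which formats you can actually extract depends on the nodes available in your n8n setup.

```python
from typing import Optional

# Hypothetical mapping from document MIME type to an extraction method label.
EXTRACTORS = {
    "application/pdf": "pdf",
    "text/plain": "text",
    "text/csv": "csv",
}


def pick_extractor(mime_type: str) -> Optional[str]:
    """Return the extraction method for a MIME type, or None if unsupported."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    base_type = mime_type.split(";")[0].strip().lower()
    return EXTRACTORS.get(base_type)


if __name__ == "__main__":
    print(pick_extractor("text/csv; charset=utf-8"))
    print(pick_extractor("application/zip"))
```

A result of None should still lead to the polite "unsupported format" reply, so adding a new file type only means adding one entry to the mapping and one extraction branch.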

Summary of outcomes

→ Saves hours per day by automating message handling on WhatsApp.

→ Reduces transcription mistakes in voice messages.

→ Creates quick, clear, and helpful replies for texts, images, audio, and documents.

→ Improves customer conversations with richer content and less manual work.

Build an AI-Powered WhatsApp Chatbot with n8n & OpenAI

This workflow automates AI-driven responses on WhatsApp for texts, voice, images, and PDFs, saving hours in customer interaction handling with instant, smart replies powered by OpenAI and n8n.

Workflow Identifier: 1175

Nodes in use: whatsAppTrigger, httpRequest, openAi, lmChatOpenAi, agent, extractFromFile, if, code, set, whatsApp, switch, memoryBufferWindow

Updated: June 2025

Frequently Asked Questions

Can this workflow be used with other chat apps besides WhatsApp?

This workflow is built specifically for WhatsApp Business API. The AI parts can be adapted, but other chat platforms need different message trigger and send nodes.

Does this workflow use many OpenAI API credits?

Yes. Generating replies and analyzing media both call the OpenAI API. Usage costs depend on message volume and prompt complexity.

How does the workflow handle user privacy?

Messages are processed in real time and are not stored permanently by the workflow, which limits exposure of sensitive data. Message content is still sent to OpenAI for processing.

What should be done if audio replies do not play on WhatsApp?

A Code node fixes the audio MIME type before sending. Make sure this node runs correctly to fix playback issues.