April 13, 2026

The True Cost of an AI Voice Agent Per Minute (2026 Breakdown)

Itemized cost breakdown of an AI voice agent per minute

Vapi advertises a base rate near $0.05 per minute. Retell publishes roughly $0.07. ElevenLabs sits around $0.08 for standard usage and can burst to about $0.16 under load. If you built your client pricing around those numbers, you built it on a myth. Those are component rates for a single layer of the stack, not the price of running a working voice agent on a real phone call. Once you add the reasoning model, the transcription, the voice synthesis, and the telephony that actually carries the call, the real all-in cost lands between $0.13 and $0.33 per minute. Measured against a $0.05 headline rate, that is roughly a 2.6x to 6.6x gap between the marketing number and the invoice, and it is where agencies quietly bleed margin.
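As a quick sanity check on that gap, the ratio between the all-in band and each advertised rate can be computed directly. This is a minimal sketch that uses only the figures quoted above; the names `advertised` and `gap_multiples` are illustrative, not part of any platform's API.

```python
# Advertised component rates quoted above, in dollars per minute.
advertised = {"Vapi": 0.05, "Retell": 0.07, "ElevenLabs": 0.08}

# All-in band quoted above, in dollars per minute.
ALL_IN_LOW, ALL_IN_HIGH = 0.13, 0.33


def gap_multiples(rate):
    """Return (low, high): how many times larger the all-in band is than `rate`."""
    return ALL_IN_LOW / rate, ALL_IN_HIGH / rate


for name, rate in advertised.items():
    low, high = gap_multiples(rate)
    print(f"{name}: {low:.1f}x to {high:.1f}x the advertised rate")
```

The multiple shrinks as the advertised rate rises, so the size of the gap depends on which headline number you anchored to.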

This is the breakdown to internalize before you quote a single client. Every dollar you misjudge here comes straight out of your retainer.

Why the Advertised Rate Is Only One Line Item

A voice agent is not a single product. It is a pipeline of at least four services working in sequence: speech-to-text turns the caller's words into text, a language model decides what to say, text-to-speech turns that answer back into a voice, and telephony connects the whole thing to an actual phone number. The rate a platform advertises usually covers its own slice of that pipeline, or its orchestration layer, and quietly assumes you will pay for the rest separately.

That is not deception so much as convention, but it catches agencies constantly. When Vapi says $0.05, it means the Vapi layer. When ElevenLabs quotes $0.08, it means the voice synthesis. Neither number includes the model reasoning or the phone line. The trap is treating a component rate as a total, and it is the most common pricing mistake in this entire category.

Line Item 1: The Platform Layer

This is the advertised number: roughly $0.05 for Vapi, $0.07 for Retell, in that neighborhood. On an orchestration platform this layer buys you the coordination between components. On a more bundled platform it may fold in more of the stack, which raises the headline rate but lowers what you owe elsewhere. Either way, read carefully what the platform rate actually includes, because that determines how much of the rest of this list you are personally responsible for.

Line Item 2: The Language Model

The model is the brain, and on most stacks it is billed separately based on tokens consumed per conversation. A chattier agent, longer prompts, and larger context windows all push this up. A frontier model produces better conversations and costs more per minute than a smaller, faster one. This is a genuine quality-versus-cost lever: you can materially change your per-minute economics by choosing a lighter model, but you may pay for it in how the agent handles nuance. For an agency, matching model choice to the client's actual complexity is where a lot of margin is won or lost.
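To see how token billing turns into a per-minute number, here is a rough sketch. Every figure in it is a hypothetical placeholder, not a real vendor price: the token counts per minute and both price tiers are assumptions chosen only to show how the lever moves.

```python
def llm_cost_per_minute(input_tokens, output_tokens,
                        price_in_per_million, price_out_per_million):
    """Dollar cost of one minute of conversation, given the tokens billed in that minute."""
    cost_in = input_tokens * price_in_per_million / 1_000_000
    cost_out = output_tokens * price_out_per_million / 1_000_000
    return cost_in + cost_out


# Hypothetical usage: the prompt and history are resent each turn, so input
# tokens dominate. These counts and prices are assumptions for illustration.
INPUT_TOKENS_PER_MIN = 8000
OUTPUT_TOKENS_PER_MIN = 300

lighter = llm_cost_per_minute(INPUT_TOKENS_PER_MIN, OUTPUT_TOKENS_PER_MIN,
                              price_in_per_million=0.50, price_out_per_million=1.50)
frontier = llm_cost_per_minute(INPUT_TOKENS_PER_MIN, OUTPUT_TOKENS_PER_MIN,
                               price_in_per_million=5.00, price_out_per_million=15.00)

print(f"lighter model:  ${lighter:.4f}/min")
print(f"frontier model: ${frontier:.4f}/min")
```

Swap in your own measured token counts and your provider's published prices; the shape of the calculation is the point, not these particular numbers.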

Line Item 3: Speech-to-Text and Text-to-Speech

Transcription and voice synthesis are two more meters running the entire call. Speech-to-text accuracy matters because errors cascade into the model, and premium voice synthesis is what makes an agent sound human rather than robotic. ElevenLabs anchors the high end of voice quality at around $0.08 per minute for standard usage, with burst pricing near $0.16 when demand spikes. That burst behavior matters for agencies running outbound campaigns, because a concentrated calling window is exactly the scenario that triggers the higher rate. Budget for the burst, not the base.
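One way to budget for burst is to blend the two rates by the share of minutes you expect to bill at the higher one. The sketch below uses the $0.08 and $0.16 figures quoted above; treating burst as a simple share of minutes is an assumption, since the exact trigger depends on the provider's plan.

```python
TTS_BASE = 0.08   # standard rate quoted above, $/min
TTS_BURST = 0.16  # burst rate quoted above, $/min


def blended_tts_rate(burst_share):
    """Average $/min when `burst_share` of minutes (0.0 to 1.0) bill at the burst rate."""
    if not 0.0 <= burst_share <= 1.0:
        raise ValueError("burst_share must be between 0 and 1")
    return TTS_BASE * (1 - burst_share) + TTS_BURST * burst_share


for share in (0.0, 0.25, 0.5, 1.0):
    print(f"{share:.0%} of minutes at burst -> ${blended_tts_rate(share):.2f}/min")
```

If an outbound campaign concentrates most of its minutes in one calling window, the blended rate sits much closer to the burst figure than to the base.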

Line Item 4: Telephony, the Cost Everyone Forgets

Here is the one that surprises almost everyone: telephony is billed entirely separately, on top of everything above. The platform rate does not include the phone line. Providers like Twilio and Telnyx charge their own per-minute rate to originate and carry the call, and that charge stacks on top of the platform, the model, and the voice. We compare the two in our guide to Twilio vs Telnyx for AI voice agents, because the provider you pick meaningfully moves your all-in number. If you have ever quoted a client based on the platform rate and watched your margin evaporate, unaccounted-for telephony is almost always the reason.

Where a Voice Agent Minute Actually Goes (share of a typical all-in cost)

Platform / orchestration layer: 22%
Language model (reasoning): 30%
Speech-to-text + text-to-speech: 26%
Telephony (billed separately, on top): 22%
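Those shares can be turned into dollar amounts for any all-in figure. The sketch below applies the typical split above to both ends of the $0.13 to $0.33 band; it assumes the same split holds across the band, which a real stack may not match.

```python
# Typical share of all-in cost per layer, from the breakdown above.
SHARES = {
    "Platform / orchestration layer": 0.22,
    "Language model (reasoning)": 0.30,
    "Speech-to-text + text-to-speech": 0.26,
    "Telephony": 0.22,
}


def split_all_in(all_in):
    """Return the dollars per minute attributed to each layer for a given all-in cost."""
    return {layer: all_in * share for layer, share in SHARES.items()}


for all_in in (0.13, 0.33):
    print(f"All-in ${all_in:.2f}/min")
    for layer, dollars in split_all_in(all_in).items():
        print(f"  {layer}: ${dollars:.3f}")
```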

Adding It Up: The $0.13 to $0.33 Reality

Stack the four line items and a working voice agent costs between $0.13 and $0.33 per minute all-in. Where you land inside that band depends on your model choice, your voice provider, whether you hit burst pricing, and which telephony provider you route through. A lean configuration on a lighter model with an efficient voice provider sits near the bottom. A frontier model with premium voice synthesis during a burst-heavy outbound campaign pushes toward the top.
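Stacking the line items is plain addition, which a small sketch makes concrete. The two configurations below are hypothetical: their per-layer rates are assumptions picked only so the totals land at the two ends of the band, not measured prices for any provider.

```python
from dataclasses import dataclass


@dataclass
class VoiceStack:
    """Per-minute cost of each layer, in dollars."""
    platform: float
    language_model: float
    speech: float      # speech-to-text plus text-to-speech
    telephony: float

    def all_in(self) -> float:
        """Sum of the four line items."""
        return self.platform + self.language_model + self.speech + self.telephony


# Hypothetical configurations for illustration only.
lean = VoiceStack(platform=0.05, language_model=0.02, speech=0.04, telephony=0.02)
premium = VoiceStack(platform=0.07, language_model=0.08, speech=0.16, telephony=0.02)

print(f"lean:    ${lean.all_in():.2f}/min")
print(f"premium: ${premium.all_in():.2f}/min")
```

Replace each field with the rate on your own invoices and the total is the number to quote against.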

The practical takeaway is to price against the top of the band, not the bottom. If you quote a client assuming $0.13 and your real cost runs $0.28, your cost per minute has more than doubled, and the extra $0.15 comes straight out of a margin you thought you had. Build your client price on the realistic all-in figure and let a favorable configuration become upside rather than the assumption that keeps you solvent. Run your specific numbers through the outreach ROI calculator before you commit anything in writing.
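The effect on margin is easy to work through. In the sketch below, the $0.13 and $0.28 costs come from the example above, while the $0.40 per-minute client price is a hypothetical figure chosen for illustration.

```python
def margin_per_minute(price, cost):
    """Dollars of margin kept on each billed minute."""
    return price - cost


def margin_pct(price, cost):
    """Margin as a fraction of the client price."""
    return (price - cost) / price


CLIENT_PRICE = 0.40   # hypothetical price charged to the client, $/min
ASSUMED_COST = 0.13   # cost assumed when quoting
REAL_COST = 0.28      # cost that actually shows up on the invoice

expected = margin_per_minute(CLIENT_PRICE, ASSUMED_COST)
actual = margin_per_minute(CLIENT_PRICE, REAL_COST)

print(f"expected margin: ${expected:.2f}/min ({margin_pct(CLIENT_PRICE, ASSUMED_COST):.1%})")
print(f"actual margin:   ${actual:.2f}/min ({margin_pct(CLIENT_PRICE, REAL_COST):.1%})")
print(f"share of expected margin lost: {(expected - actual) / expected:.1%}")
```

At this hypothetical price, more than half of the expected margin disappears. The lower the client price, the larger the share that the same $0.15 miss takes.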

What This Means for Your Client Pricing

None of this argues against selling voice agents. A cost of $0.13 to $0.33 per minute is still dramatically cheaper than the human labor it replaces, which is exactly why the offer works. The point is that you must know your true cost to price with confidence and defend your margin when a client pushes back. Agencies that quote on the marketing rate get squeezed the moment volume scales; agencies that quote on the all-in reality keep their margin intact at any volume.

The other half of the equation is what a client is willing to pay, which has far more to do with the value of the outcome than your cost per minute. We work through pricing models in depth in how much to charge for an AI voice agent. Read the cost breakdown here to protect your floor, then read the pricing guide to set your ceiling.

The Bottom Line

The $0.05 per minute figure is real, but it is one line item pretending to be a total. The honest number, once platform, model, transcription, voice, and separately billed telephony are all counted, is $0.13 to $0.33 per minute. Internalize that band, quote against its top end, and you will never again be surprised by an invoice. Whether you build on Vapi, Retell, or anything else, and whether you package your demos through a tool like Ciela to win the deal, the underlying unit economics are the same, and knowing them cold is what separates a profitable agency from a busy one.

Ciela is the demo platform for AI agencies and AI consultants. It turns any prospect's website into a live, personalized AI demo (chat, voice, or missed-call text-back) you can send before the first call.
