Voice Is Going to Fail for the Same Reason Chat Did

Two things can be true at once. Voice as a default surface, the "talk to your AI" product, is going to fail the way chat boxes did in 2024. Voice as an embedded wedge, the kind that's quietly running insurance intake and recruiter screens, has already won. The teams confusing the two are about to find out which side they bet on.

Anthropic shipped Claude 4 six days ago. Read the launch post and notice what isn't there. No voice mode. No consumer chat flourish. No new modality demos. The entire announcement is about coding, long-horizon agentic work, and the IDE. Claude Code is the headline product; Cursor, Replit, and Block are the headline customers. The frontier lab's own launch says it plainly: the surface is the workflow you are already in, not a new thing to talk to.

Meanwhile, every product roadmap I am reviewing this quarter wants to add voice. ChatGPT advanced voice mode. Apple's agentic Siri (now officially slipped to 2026). Sesame's demos. Voice-first startups still raising. The 2024 chat bolt-on cycle is replaying on fast-forward, in a new modality.

I made the chat box argument in February of last year and the empathy gap argument in December. They both apply here, with the one important refinement above. The teams whose product roadmaps don't sort their voice bets into the right column by Q4 are going to spend Q1 2026 rewriting them.

The Pete Koomen frame

Pete Koomen wrote the spring 2025 essay that everyone in product circles now points to: "AI Horseless Carriages", published April 23. His running example is the Gemini "draft" button in Gmail: you click it, Gemini writes a generic email, and you then have to edit it into something that sounds like you. Koomen's line:

The Gmail team built a horseless carriage because they set out to add AI to the email client they already had, rather than ask what an email client would look like if it were designed from the ground up with AI.

The horseless carriage problem is now the dominant failure mode in AI product engineering. You take an existing surface, you bolt an AI feature on, you ask the user to do the prompting work the AI was supposed to eliminate. The user does it once, badly. The output doesn't fit their voice. They never use the feature again.

Allen Pike followed up a week later with "Post-Chat UI", which is the cleanest statement of the architectural alternative I have read this year:

While chat is powerful, for most products chatting with the underlying LLM should be more of a debug interface, a fallback mode, and not the primary UX.

That is the right shape. Chat as the debug interface for power users. Embedded as the default. The argument generalizes one-to-one to voice. Voice as the surface you talk to is the horseless carriage; voice as the channel for a workflow that already worked is the actual product.

The voice data is uncomfortable for both camps

The "voice is failing" reading is wrong. The "voice is winning" reading is also wrong. The data through Q1 2025 says something more specific.

a16z's January report on AI voice agents is the bullish read. Voice is winning where it is embedded in narrow, high-volume B2B workflows: recruiting screens, insurance intake, healthcare scheduling, sales follow-up. The user doesn't know an LLM is involved, doesn't have to learn to prompt, doesn't develop a relationship with a personality. Voice is just the channel. a16z's own framing, which I think is exactly right:

Voice will become the wedge, not the product.

a16z's "Top 100 Gen AI Consumer Apps, 4th edition" (March 6) is the empirical counterweight. The breakouts are Lovable, Bolt, and Cursor: tools that produce artifacts, not conversations. The voice-related apps that do show up are the embedded utility kind: notetakers (Granola, Fathom, Otter) and specialist creation tools (Suno, ElevenLabs). Standalone voice-first consumer products do not crack the rankings.

The Pi shutdown story from April makes the point bluntly. "The Rise and Fall of Inflection's AI Chatbot, Pi" has a quote from a former Inflection engineer that I keep going back to:

It's only really useful if you want to talk about your feelings.

Humane bricked every AI Pin in February. Rabbit r1 still ships but the discourse has moved on. The pattern is consistent. When voice is the product, users churn. When voice is one channel inside something they were going to do anyway, it sticks.

The fluency tax in a new modality

I keep coming back to a question I asked in December's user post: are we shipping for ourselves or for the 93% of users who don't have AI fluency?

Voice does not solve the fluency tax. It moves it. A user who didn't know what to type into a chat box also doesn't know what to say to a voice agent. The same first-try bounce I described in last February's chat post happens in voice, with one extra failure mode: voice doesn't give the user a way to look at their own input. They can't see what they said. They can't refine it. The interaction is more ephemeral and the cost of a bad reply is higher.

The teams that win in voice this year use it for what voice is actually good at: hands-busy, eyes-busy, narrow-task, scripted-domain interactions. Notetakers work because the user is already speaking; the AI is just transcribing and structuring. Insurance intake works because the script is fixed and the user is answering questions, not asking them. The model is doing recognition and form-filling, not conversation. The user does not have to be the prompter.
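To make "recognition and form-filling, not conversation" concrete, here is a minimal sketch of a fixed-script intake loop. It assumes transcription has already happened upstream, and it uses regexes as stand-ins for the model's recognition step. The slot names, questions, and the `NEEDS_HUMAN` fallback are hypothetical, chosen only for illustration. The agent asks every question, extracts one value per answer, and hands off to a person instead of improvising when it can't recognize a reply.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Slot:
    """One fixed question plus a recognizer that extracts a single value."""
    name: str
    question: str
    recognize: Callable[[str], Optional[str]]


def match(pattern: str) -> Callable[[str], Optional[str]]:
    # Stand-in for a model call: return the first captured group, or None.
    compiled = re.compile(pattern)

    def recognize(answer: str) -> Optional[str]:
        m = compiled.search(answer)
        return m.group(1) if m else None

    return recognize


def yes_no(answer: str) -> Optional[str]:
    # Normalize a spoken yes/no to lowercase; None if neither word appears.
    m = re.search(r"\b(yes|no)\b", answer, re.IGNORECASE)
    return m.group(1).lower() if m else None


# Hypothetical insurance-intake script: the agent asks, the caller answers.
INTAKE_SCRIPT = [
    Slot("policy_number", "What is your policy number?", match(r"\b([A-Z]{2}\d{6})\b")),
    Slot("incident_date", "What date did the incident happen?", match(r"\b(\d{4}-\d{2}-\d{2})\b")),
    Slot("injuries", "Was anyone injured? Yes or no.", yes_no),
]


def run_intake(
    listen: Callable[[str], str],
    script: list[Slot] = INTAKE_SCRIPT,
    max_retries: int = 1,
) -> dict[str, str]:
    """Walk the fixed script, filling one slot per question.

    `listen` takes the question and returns the caller's transcribed reply.
    An unrecognized reply is re-asked up to `max_retries` times, then the
    slot is flagged for a human rather than turned into open conversation.
    """
    form: dict[str, str] = {}
    for slot in script:
        value = None
        for _ in range(1 + max_retries):
            value = slot.recognize(listen(slot.question))
            if value is not None:
                break
        form[slot.name] = value if value is not None else "NEEDS_HUMAN"
    return form
```

For example, simulating a caller with canned replies:

```python
replies = iter(["Sure, it's AB123456.", "Um, last Tuesday?", "2025-05-20", "No, nobody was hurt."])
form = run_intake(lambda question: next(replies, ""))
```

The design choice is the point: the caller never has to decide what to say, and the model never has to decide what to ask.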

The teams that fail are the ones who put a microphone button on an existing app and tell the user "ask me anything." That is the chat box with worse affordances.

The IA layer is often the real failure

One pattern I have watched across product reviews this quarter: when an AI feature underperforms, the team's first instinct is to blame the model. Switch from Claude to GPT, try a bigger context window, fine-tune. Half the time, the model is fine. The information architecture under it is what's broken.

Concrete shape. A team ships a "summarize this document" feature. The summaries are bad. They try a stronger model. The summaries are still bad. They eventually realize the documents have inconsistent headings, the metadata is stale, the source corpus is missing half of the canonical content because it lives in a different system, and the model is doing exactly what you would expect a model to do with bad inputs. The IA was the problem all along.
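A cheap way to check the inputs before swapping models is to audit the corpus for exactly the failures listed above. This is a minimal sketch, not a complete IA review: the `Doc` fields, the 180-day staleness threshold, and the "every doc needs a Summary heading" rule are all hypothetical assumptions standing in for whatever your own corpus conventions are.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable


@dataclass
class Doc:
    doc_id: str
    headings: list[str]   # top-level headings, in document order
    last_updated: date    # taken from the document's metadata


def audit_corpus(
    docs: Iterable[Doc],
    canonical_ids: Iterable[str],
    today: date,
    max_age_days: int = 180,                 # hypothetical staleness threshold
    required_headings: tuple[str, ...] = ("Summary",),  # hypothetical convention
) -> dict[str, list[str]]:
    """Flag missing, stale, and inconsistently structured documents."""
    docs = list(docs)
    present = {d.doc_id for d in docs}
    cutoff = today - timedelta(days=max_age_days)
    report: dict[str, list[str]] = {
        # Canonical content the model never sees (e.g. it lives in another system).
        "missing": sorted(set(canonical_ids) - present),
        "stale": [],
        "bad_headings": [],
    }
    for d in docs:
        if d.last_updated < cutoff:
            report["stale"].append(d.doc_id)
        if not all(h in d.headings for h in required_headings):
            report["bad_headings"].append(d.doc_id)
    return report
```

If this report comes back non-empty, a stronger model is unlikely to fix the summaries; the inputs are the first thing to repair.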

This applies doubly to voice. A voice agent that needs to answer questions from a knowledge base is exposing whatever your knowledge base actually contains. If your docs are organized for SEO and not for retrieval, your voice agent sounds incoherent. If your tools are wrappers around three different stale internal systems, your voice agent gives wrong answers confidently. The model is not the problem. The IA underneath is.

I don't think this is well understood yet. It is going to be the story of the second half of 2025.

A Q3 punch list for product engineers shipping AI surfaces

If you are leading product engineering and your roadmap has voice or "agent" surfaces on it for H2:

Audit every new AI surface for the "horseless carriage" smell. Is the AI added to a thing that already existed, or is the thing redesigned around the AI? If the former, the surface has a ceiling and you should know what it is before you commit headcount.

Stop shipping default-chat surfaces. Chat as a fallback for power users is fine. Chat as the primary affordance for a non-fluent user is a known failure mode. If your product currently opens to a chat input, your product currently has a bounce problem.

For voice specifically, find the wedge. Where is your user already speaking? Where is the script narrow enough that the model can recognize and respond rather than converse? Build there. Don't build "talk to your product."

Audit your IA before you audit your model. When an AI feature is failing, the next move should be to look at the inputs the model is getting, not the model itself. If your docs are stale, your tools are thin wrappers, and your metadata is lossy, the strongest model in the world will produce the same bad answer.

Expose the prompt under the hood to power users. Almost no one will look at it. The few who do will replicate your behavior on their own terms and become the evangelists for your AI feature. The cost of exposing it is small. The benefit is asymmetric.

Where I land

Claude 4 launched without a voice mode and almost no one in product engineering noticed. That is the signal. The frontier lab that owns the highest-quality models in the world looked at what 2025's surface war is actually about and decided the answer is workflows, not conversations. The product engineering teams that read that signal are quietly building embedded AI inside the surfaces their users already use. The teams that didn't are building voice products that are going to look like Pi by year-end.

The user is still not you. The user is not your CEO who loves talking to Claude in his car, either. Your user is the 93% who do not have a relationship with an AI. They want their product to be smarter. They do not want a thing to talk to.

Build for that.