Your User Is Not You

This is the post I have wanted to write at the end of every quarter and only now have the data to write.

2024 was the year every company shipped an AI feature. By any reasonable count, enterprise gen-AI spend grew sixfold, from $2.3 billion to $13.8 billion. Apple shipped Apple Intelligence. Microsoft shipped Copilot everywhere. Slack shipped Slack AI. Salesforce shipped Agentforce. Every B2B SaaS company shipped a sparkle icon. The race was on, and everyone treated it as one.

Then the year-end data came in, and it is sobering.

The Slack Workforce Lab's Fall 2024 Workforce Index found AI adoption growth stalled over the summer: US desk worker AI use crawled from 32% in May to 33% in August. Only 7% of workers consider themselves expert users. 48% would be uncomfortable telling their manager they use AI.

SellCell's survey of 2,000+ iPhone and Galaxy owners found that 73% of iPhone users and 87% of Galaxy AI users say the new AI features add little or no value, and 58.4% of eligible iPhone owners had never tried a single Apple Intelligence feature six weeks after launch.

Menlo Ventures' year-end report on the state of enterprise gen-AI found that spend grew from $2.3 billion to $13.8 billion year-over-year, while the top reason pilots fail is disappointing ROI, with over a third of buyers lacking a clear implementation vision.

Marc Benioff was calling Microsoft Copilot "Clippy 2.0" in October, citing a Gartner stat that only 6% of surveyed IT leaders had moved a Copilot pilot to wide adoption. Self-interested attack, sure. But the number checked out. Every flagship AI product of 2024 is underperforming the headline.

I do not think this is a model quality problem. The models got better all year. Claude 3.5 Sonnet, GPT-4o, o1, Gemini 1.5 Pro. They are demonstrably more capable in December than they were in January. The capability curve went up. The adoption curve went flat.

I think the reason is that the people building these features, myself included, shipped them for ourselves. The users are not us.

The Curse of Being Fluent

If you are reading this, you probably use Claude or ChatGPT every day. You have ChatGPT muscle memory. You know how to phrase a prompt to get past a refusal. You know what the model is good at and what it isn't. You can tell the difference between a hallucination and a real answer, usually. You assume the people who use the products you ship can do all of this too.

They cannot. The base rate of LLM fluency in the general population is much lower than the base rate inside the companies building AI features. It is lower in your customer base than you think it is. It is lower in your power users than you think it is. It is lower in your own colleagues outside engineering than you think it is.

I have been making this argument for most of 2024. I wrote in February that putting AI behind a chat box was product engineering laziness, because users who don't know how to prompt hit one bad response and never come back. The empirical data this year has been a slow, public confirmation of that thesis. Every AI feature behind a "Try AI" button and a modal performed worse than every AI feature embedded into the existing workflow. Not by a little. By multiples.

The Slack Workforce Lab number that haunts me is the 7% who self-identify as expert users. That is the ceiling of fluency in the workforce right now. Everything you ship is being used by the other 93%. If your feature requires the 7% behavior, it does not have a market.

The Google Parallel, Revisited

I keep coming back to Google in 2002, because it is the cleanest analogue to where we are with AI in 2024.

Search felt impossible to non-engineers in 2002. The blank box was hostile. You typed words and got a list of links, and you had no idea why those links and not others. You had to learn to use it: keyword search, not sentence search. Quotes meant something. Boolean operators meant something. The "I'm Feeling Lucky" button was a joke that took a year to land. Plenty of people in 2002 used Yahoo's directory or asked their nephew because Google was, to them, broken.

It took a decade to turn "google" into a verb. The product did not change much. The users adapted. The next generation grew up with the assumption that you typed what you wanted and the right answer appeared. A generational shift, not a UX shift.

Chat in 2024 is hard in the same way. The blank box is hostile. Users don't know what to type. They don't know what the model can do. They don't trust the output. The ones who do figure it out (the 7%) are the ones who use it daily and develop fluency by repetition. The other 93% are not going to develop that fluency this year, and the products that bet on them developing it this year are the ones with the disappointing ROI numbers.

The fluency is going to come. It is going to take a decade. The companies that win between now and then are the ones who build for the 93%, not the 7%.

What Actually Worked This Year

The features that found PMF in 2024 share a shape.

ChatGPT itself worked, because ChatGPT is the AI. The chat is the product. Users go there expecting to converse with a model. The expectation matches the surface. That works.

Cursor worked, because the LLM is embedded in the workflow the developer is already in. The user does not have to learn a new vocabulary or open a new modal. They get tab completions, inline edits, and a side panel that opens to a familiar prompt. The fluency required is low because the affordance is high.

GitHub Copilot worked, for the same reason at a smaller scale. Inline completion. No modal. No box.

Notion AI's inline blocks worked. Linear's auto-titling worked. Gmail's Smart Compose worked. Every place an LLM call got hidden behind a normal-looking button or autofill, adoption was strong.

The pattern is the same as what I have been writing all year. Hide the model behind familiar UI. Treat the agentic pipeline as the unit of work, not the chat exchange. Build the eval infrastructure that lets you swap models without users noticing. Meet users where they already are.

What did not work was every product that asked the user to be the prompter. Apple Intelligence's Writing Tools? Most users don't open them. Microsoft Copilot's chat panel in Office? See Benioff. The hundreds of "Ask AI" buttons across SaaS? Vanishingly low engagement. The pattern is loud in the data once you start looking for it.

The Counterargument Worth Taking Seriously

Ben Thompson made the strongest version of the contrary read in September: copilots lose because they require users to change behavior, and most users won't. The first big wave of enterprise AI, in his framing, is not assistive features that augment work. It is autonomous agents that replace work entirely, so the user does not have to learn anything new.

I think Thompson is half-right. He is right that the assistive copilot pattern hits a behavior-change ceiling. He is right that "you have to remember to use the new tool" is a losing product position. He is right that the autonomous shape (the model does the work, the user reviews the output) is the future.

He is wrong, in my read, that this means skipping the embedded UI step. The autonomous shape still needs surfaces. Surfaces for the user to specify what they want. Surfaces for the user to review the output. Surfaces for the user to course-correct.

Those surfaces should be embedded in the workflows users are already in, not in a new "agent console" tab that users have to remember to open. The autonomous-vs-assistive distinction is real, but the embedded-vs-overlay distinction is the one that determines whether the feature ships or sits unused.

A Q1 2025 Punch List for Product Engineers Shipping AI

If you are building AI features going into next year, these are the moves the data supports:

Treat the engineer on your team who uses AI the least as the canary. The engineers fluent in Claude are going to ship features the engineers not fluent in Claude can't use. Reverse the polarity. Ask the engineer who has never opened ChatGPT to try the feature. Their bounce is the bounce you should design against.

Audit every AI surface for the "fluency tax." For each AI feature you shipped this year, ask: does using it require the user to know how to prompt? If yes, redesign or rebuild. The 7% will keep using it. The other 93% are the market.

Replace at least one chat surface with an embedded action. Pick the highest-traffic chat in your product. Identify the three most common intents users actually have. Ship those as buttons, hide the LLM behind them, and watch the engagement numbers move.
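
Here is a minimal sketch of what "ship those as buttons, hide the LLM behind them" can look like in code. Everything in it is an assumption for illustration: the three intents, the prompt wording, the `EmbeddedAction` and `handle_click` names, and the `ModelFn` type, which stands in for whatever client call your product already makes. The point is only that the user picks a label and the prompt stays on your side of the button.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for your real model client: prompt in, text out.
ModelFn = Callable[[str], str]


@dataclass(frozen=True)
class EmbeddedAction:
    label: str            # what the user sees on the button
    prompt_template: str  # what the model sees; the user never does

    def run(self, model: ModelFn, selection: str) -> str:
        # The user's selected text is the only input they provide.
        return model(self.prompt_template.format(selection=selection))


# Illustrative intents only; derive yours from real chat logs.
ACTIONS = {
    "summarize": EmbeddedAction(
        "Summarize",
        "Summarize the following text in three bullet points:\n\n{selection}",
    ),
    "shorten": EmbeddedAction(
        "Make shorter",
        "Rewrite the following text at half the length:\n\n{selection}",
    ),
    "fix_tone": EmbeddedAction(
        "Make friendlier",
        "Rewrite the following text in a friendlier tone:\n\n{selection}",
    ),
}


def button_labels() -> list[str]:
    # The entire vocabulary the user has to learn.
    return [action.label for action in ACTIONS.values()]


def handle_click(action_id: str, selection: str, model: ModelFn) -> str:
    # One click, one intent, no prompt box.
    return ACTIONS[action_id].run(model, selection)
```

Because the prompt lives in a template and not in the user's head, it also becomes something you can version, test, and change without retraining anyone.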

Stop measuring AI adoption with "have you used the feature." Start measuring outcomes. A button click counted as adoption is meaningless. The metrics that actually matter: accept rate on inline suggestions, return rate within seven days, intent-completion rate per surface, and bounce on first interaction. Most teams I talk to do not have these instrumented yet. Fix that in January.
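
The four measurements named above can be computed from a plain event log. The sketch below is one way to do it, not a standard: the `Event` shape, the four `kind` values, and the exact definitions are my assumptions. In particular, "return" here means a user-initiated event on a later calendar day within seven days of the first one, and "bounce" means a user with exactly one user-initiated event ever. Pick your own definitions, but write them down.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    user: str
    surface: str  # e.g. "inline", "summarize_button" (hypothetical names)
    kind: str     # "shown" | "accepted" | "intent_start" | "intent_done"
    at: datetime


def _ratio(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else 0.0


def accept_rate(events: list[Event]) -> float:
    # Accepted inline suggestions divided by suggestions shown.
    shown = sum(e.kind == "shown" for e in events)
    accepted = sum(e.kind == "accepted" for e in events)
    return _ratio(accepted, shown)


def intent_completion_by_surface(events: list[Event]) -> dict[str, float]:
    # Completed intents divided by started intents, per surface.
    started: dict[str, int] = defaultdict(int)
    done: dict[str, int] = defaultdict(int)
    for e in events:
        if e.kind == "intent_start":
            started[e.surface] += 1
        elif e.kind == "intent_done":
            done[e.surface] += 1
    return {s: _ratio(done[s], n) for s, n in started.items()}


def _active_times(events: list[Event]) -> dict[str, list[datetime]]:
    # Only user-initiated events count as activity; "shown" is passive.
    by_user: dict[str, list[datetime]] = defaultdict(list)
    for e in events:
        if e.kind in ("accepted", "intent_start"):
            by_user[e.user].append(e.at)
    return {u: sorted(ts) for u, ts in by_user.items()}


def seven_day_return_rate(events: list[Event]) -> float:
    times = _active_times(events)
    returned = 0
    for ts in times.values():
        first = ts[0]
        if any(
            t.date() > first.date() and t - first <= timedelta(days=7)
            for t in ts[1:]
        ):
            returned += 1
    return _ratio(returned, len(times))


def first_interaction_bounce_rate(events: list[Event]) -> float:
    # Users who acted exactly once and never again.
    times = _active_times(events)
    bounced = sum(len(ts) == 1 for ts in times.values())
    return _ratio(bounced, len(times))
```

None of this needs an analytics vendor. If the four event kinds are logged with a user, a surface, and a timestamp, the rest is arithmetic.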

Get the eval infrastructure in place. Not because models are getting better, though they are. Because you are about to be asked to swap models, change prompts, and ship faster, and you cannot do any of that responsibly without evals telling you whether you regressed user-visible behavior.
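
The smallest useful version of this is a regression check: a fixed set of cases, each with a pass/fail test on the output, run against the current model and the candidate. The sketch below assumes that shape. `EvalCase`, `regressions`, and `safe_to_swap` are hypothetical names, `ModelFn` stands in for your real client call, and the rule "block the swap if any case that used to pass now fails" is a deliberately strict starting point, not a recommendation for every team.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a model call (model + prompt config): prompt in, text out.
ModelFn = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if a user would accept this output


def passing(model: ModelFn, cases: list[EvalCase]) -> set[str]:
    # Names of the cases whose output passes its check.
    return {c.name for c in cases if c.check(model(c.prompt))}


def regressions(
    baseline: ModelFn, candidate: ModelFn, cases: list[EvalCase]
) -> list[str]:
    # Cases the baseline passes and the candidate fails.
    return sorted(passing(baseline, cases) - passing(candidate, cases))


def safe_to_swap(
    baseline: ModelFn, candidate: ModelFn, cases: list[EvalCase]
) -> bool:
    # Strict rule (an assumption): any regression blocks the swap.
    return not regressions(baseline, candidate, cases)
```

The checks are where the work is. Write them in terms of what the user sees (a title under 80 characters, a summary that mentions the customer name), not in terms of what the model is.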

Where I Land

The single biggest product engineering mistake of 2024 was assuming users were like the people building features. They aren't. They are less fluent, less patient, and less forgiving than we expect. They bounce on the first bad reply. They do not iterate. They do not develop the workflow we assume they will develop.

The teams that figured this out shipped features that found PMF. The teams that didn't shipped sparkle icons and called it AI strategy.

2025 is the year the gap between those two postures starts showing up in revenue, not just engagement metrics. The Menlo data point this year, where disappointing ROI was the top reason enterprise pilots failed, is the leading indicator. Next year that pressure moves from pilots to renewals.

Build for the 93%. Hide the model. Meet users where they are. The capability curve is going to keep going up. Whether the adoption curve follows it is a product engineering question, not an AI question.

Your user is not you. Start there.