ChatGPT Just Made AI a Product Engineering Problem


Eight days ago, OpenAI introduced ChatGPT as a "research preview". The framing of the launch post was deliberately understated: a new sibling model to InstructGPT, trained with reinforcement learning from human feedback, free to use during a feedback window, with its known limitations listed up front (hallucinations, sensitivity to phrasing, excessive verbosity). Nothing in that launch reads like a product announcement.

Then Sam Altman tweeted on December 5 that ChatGPT had crossed a million users in five days. The growth curve since then has been one I have not seen in this industry before. The Hacker News launch thread from November 30 hit hundreds of points within hours; the engineering audience, en masse, sat down on a Wednesday afternoon and started stress-testing the thing instead of reading marketing about it. Aaron Levie posted on December 3: "ChatGPT is one of those rare moments in technology where you see a glimmer of how everything is going to be different going forward." That tweet aged well within 48 hours.

For mobile platform engineers, the obvious reaction is: this isn't my problem. ChatGPT is a web product. The serious AI applications are still server-side. Most of the mobile-relevant ML work, the on-device stuff Apple keeps shipping at WWDC, is its own track. The natural conclusion is to keep your head down on the platform investments that mattered last week and reassess when the dust settles.

I think that reaction is wrong. Not subtly wrong. Architecturally wrong. The product engineering implications of ChatGPT are going to dominate the 2023 conversation, mobile is not exempt, and the platform investments mobile teams are starting to plan for next year need to take this into account.

What ChatGPT actually is, for engineers who don't follow ML

For engineers who don't track the field closely, here is the one-paragraph version. ChatGPT is fine-tuned from a model in OpenAI's GPT-3.5 series, a refinement of the GPT-3 family they shipped two years ago. The technical breakthrough is not the model. The breakthrough is the surface. Conversation as the interface unlocks a capability the underlying model has had for a while but that nobody knew how to expose. The model itself produces plausible-sounding language; the conversational wrapping makes that language feel like talking to something. That difference, plausible language vs. conversation, is what shipped on November 30, and it is what every product team in tech is now staring at.

Ben Thompson's piece on December 5, "AI Homework", is the cleanest articulation of why this is a product event and not a research event. His line worth keeping: "The real skill in the homework assignment will be in verifying the answers the system churns out, learning how to be a verifier and an editor, instead of a regurgitator." That is exactly the shape the product engineering reckoning is about to take.

Why "this doesn't affect mobile" is wrong

The reflexive case for mobile being unaffected goes: ChatGPT is a chat product, mobile apps already have known interaction patterns, the relevant work for mobile is on-device ML which has its own roadmap, and besides, mobile orgs aren't staffed for AI. Each of those is partially true. None of them is the right read.

The right read, I think, is this. The product engineering conversation across the entire industry is about to shift toward "what does AI mean for our product." That conversation will be wrong-headed in most places. Almost every team will reach for the same answer: bolt a chat box onto the existing product. The chat box will be a sparkle icon in the corner that opens a modal that lets the user type to an AI. The product team will declare AI shipped. The metrics will be bad. The cycle will repeat.

Mobile is not exempt from this. If anything, mobile is where the worst examples will show up first, because mobile is where product engineering teams are most willing to ship a sparkle icon and call it innovation. The mistakes will be expensive, and so will the opportunity cost of getting it wrong.

The platform investments mobile teams are planning for 2023 need to account for this. Not by spinning up an AI team in January. By making the right groundwork investments now.

The product engineering reflex that's about to be wrong

The Stack Overflow moderators made an instructive call on December 5. Five days after the launch, they temporarily banned ChatGPT-generated answers because the volume of plausible-but-wrong submissions was overwhelming their volunteer moderators. The mods' line: "the volume of these answers (thousands) ... has effectively swamped our volunteer-based quality curation infrastructure."

That's the failure mode of the next twelve months in one sentence. The cost of producing plausible content has dropped by an order of magnitude. The cost of verifying it hasn't. The institutions that depend on a roughly stable ratio between production and verification are going to break first. Stack Overflow is the first canary. Search is next. Customer support is after that. Any product surface where "plausibility" was a workable proxy for "correctness" is now broken.

The product engineering reflex that's about to be wrong is to translate this into mobile UI as "let users chat with our app's AI." Every product team is going to propose it. Most should not ship it, for the same reason Stack Overflow had to ban the submissions: confident wrongness at scale is a worse user experience than the slow-but-reliable thing it replaces.

The right reflex is to think about which existing mobile UI surfaces have a verification problem hidden inside them, and how to make those better. Auto-correct that you can train per user. Spam detection that explains itself. Search ranking that takes the user's actual context into account. Smart defaults in forms that the user can correct. These are the places where the new model class gives you a real product win and where the verification problem is bounded.
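To make "smart defaults that the user can correct" concrete, here is a minimal TypeScript sketch. Every name in it (`FieldState`, `effectiveValue`, `suggestionAccepted`) is hypothetical, and it assumes the suggestion comes from some model or heuristic the form does not need to know about. The point is the shape: the suggestion is only a default, the user's edit always wins, and whether the suggestion survived is recorded as a cheap, bounded verification signal.

```typescript
// Where a suggested value came from; the form itself does not depend on this.
type SuggestionSource = "model" | "heuristic";

interface Suggestion<T> {
  value: T;
  source: SuggestionSource;
}

// Hypothetical state for one form field.
interface FieldState<T> {
  suggestion?: Suggestion<T>;
  userValue?: T; // set only when the user edits the field
}

// The value the form submits: a user edit always overrides the suggestion.
function effectiveValue<T>(field: FieldState<T>): T | undefined {
  return field.userValue !== undefined ? field.userValue : field.suggestion?.value;
}

// true if the suggestion was submitted unchanged, false if the user corrected it,
// undefined if there was no suggestion. Uses strict equality, so it is only
// meaningful for primitive values such as strings or numbers.
function suggestionAccepted<T>(field: FieldState<T>): boolean | undefined {
  if (field.suggestion === undefined) return undefined;
  return field.userValue === undefined || field.userValue === field.suggestion.value;
}
```

Logging `suggestionAccepted` at submit time yields an acceptance rate per field: the user does the verification as a side effect of using the form, and a wrong suggestion costs one correction rather than a wrong answer presented as fact.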

The platform investments that compound

Mobile platform teams looking at 2023 should think about a small number of foundational investments that compound regardless of what the AI product wave looks like.

Make your app introspectable. Most mobile apps are black boxes from the perspective of any system that wants to integrate AI features. The app's content, its state, its user actions, its history, none of it is structured or queryable. Even a basic local AI feature is going to need clean access to the user's recent actions and content. The work to make the app introspectable looks like ordinary platform work: better state management, structured content models, event streams. It compounds whether or not the AI features ever show up, because the same work also makes the app testable, observable, and analyzable.
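As an illustration of the "event streams" piece, here is a TypeScript sketch of a typed, in-memory event stream. The event kinds, the `EventStream` class, and the 500-event default buffer are all assumptions made for the example, not a recommended schema; a real app would define its own vocabulary and likely persist events.

```typescript
// Hypothetical event vocabulary; each app would define its own.
type AppEvent =
  | { kind: "screenViewed"; screen: string; at: number }
  | { kind: "itemCreated"; itemId: string; at: number }
  | { kind: "searchPerformed"; query: string; at: number };

type Listener = (event: AppEvent) => void;

// Minimal in-memory event stream: records typed events in a bounded buffer,
// notifies subscribers, and answers "what did the user just do?" queries.
class EventStream {
  private readonly buffer: AppEvent[] = [];
  private readonly listeners: Listener[] = [];

  constructor(private readonly capacity: number = 500) {}

  emit(event: AppEvent): void {
    this.buffer.push(event);
    // Drop the oldest event once the buffer exceeds its capacity.
    if (this.buffer.length > this.capacity) this.buffer.shift();
    for (const listener of this.listeners) listener(event);
  }

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Up to `limit` most recent events, newest first, optionally filtered by kind.
  recent(limit: number, kind?: AppEvent["kind"]): AppEvent[] {
    if (limit <= 0) return [];
    const matching = kind === undefined ? this.buffer : this.buffer.filter((e) => e.kind === kind);
    return matching.slice(-limit).reverse();
  }
}
```

The same `recent()` query can back a debug overlay, a UI test that asserts on a user flow, and, eventually, a feature that needs the user's recent context. That reuse is why this kind of work compounds.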

Get serious about local data structure. If you have a year of users' content in your app and that content is stored as unstructured strings, you cannot do anything useful with it. The teams that have invested in well-typed, well-structured local data over the past few years are going to find themselves with a real moat. The teams that haven't are going to spend 2023 retroactively cleaning up their data layer to enable basic features.
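A small TypeScript illustration of the difference, assuming a notes-style app. The `Note` fields and `recentlyEditedWithTag` are hypothetical; the point is that a question like "which finance notes did I touch this week?" becomes a one-line filter once the data is typed, and a parsing project when it is not.

```typescript
// Before: a note is one opaque string, so every feature has to parse prose.
// type LegacyNote = string;

// After (hypothetical schema): fields a feature can query directly.
interface Note {
  id: string;
  title: string;
  body: string;
  tags: string[];
  createdAt: Date;
  updatedAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Notes carrying `tag` that were edited within the last `days` days, newest first.
// Trivial against typed data; fragile or impossible against free-form strings.
function recentlyEditedWithTag(notes: Note[], tag: string, days: number, now: Date = new Date()): Note[] {
  const cutoff = now.getTime() - days * DAY_MS;
  return notes
    .filter((n) => n.tags.includes(tag) && n.updatedAt.getTime() >= cutoff)
    .sort((a, b) => b.updatedAt.getTime() - a.updatedAt.getTime());
}
```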

Eval and instrument. If you ever ship an AI feature, you will need to know whether it works. The team that ships AI without an evaluation framework is going to ship a feature that breaks silently. The cost of building the evaluation framework is small. The cost of not having one is high. This is true today for any non-trivial feature, but AI features will force the issue, because when they fail they produce plausible output rather than crashes.
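An evaluation harness can start very small. This TypeScript sketch assumes the feature is exposed as an async function and that each test case carries its own pass/fail check; `Capability`, `EvalCase`, and `runEval` are illustrative names, not an existing library.

```typescript
// A hypothetical capability under test: any async function from input to output.
type Capability<I, O> = (input: I) => Promise<O>;

interface EvalCase<I, O> {
  name: string;
  input: I;
  // Returns true when the output is acceptable for this case.
  check: (output: O) => boolean;
}

interface EvalReport {
  passed: number;
  failed: string[]; // names of failing cases
  passRate: number; // 0 when there are no cases
}

// Runs every case sequentially. A thrown error (timeout, provider outage)
// counts as a failure for that case instead of aborting the whole run.
async function runEval<I, O>(capability: Capability<I, O>, cases: EvalCase<I, O>[]): Promise<EvalReport> {
  const failed: string[] = [];
  for (const c of cases) {
    try {
      if (!c.check(await capability(c.input))) failed.push(c.name);
    } catch {
      failed.push(c.name);
    }
  }
  const passed = cases.length - failed.length;
  return { passed, failed, passRate: cases.length === 0 ? 0 : passed / cases.length };
}
```

Running this in CI against a fixed set of cases, and failing the build when `passRate` drops below an agreed threshold, is one way to turn silent breakage into visible breakage. The same cases can be rerun whenever the backend model changes.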

Own the bridge to whatever AI backend you eventually use. Don't hardcode against a single provider's API at the call site; build a capability layer instead. The mobile app should ask for a "summarize this thread" capability without knowing which model is on the other end. This is the same pattern that has worked for every external dependency for the past twenty years, and AI providers are about to be the most volatile external dependency a mobile app has had in a long time.
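One way the capability layer could look, as a TypeScript sketch. `SummarizeThread`, both adapters, and `makeSummarizer` are hypothetical; the remote adapter takes an injected `complete` function rather than calling any real vendor SDK, so nothing here depends on a particular provider's API. The heuristic fallback is deliberately crude and only stands in for "no model available".

```typescript
// The only thing feature code sees: a capability, not a provider.
interface SummarizeThread {
  summarize(messages: string[]): Promise<string>;
}

// Adapter for some remote model. `complete` is injected; in a real app it would
// wrap whichever backend or vendor SDK the platform team has chosen.
class RemoteModelSummarizer implements SummarizeThread {
  constructor(private readonly complete: (prompt: string) => Promise<string>) {}

  async summarize(messages: string[]): Promise<string> {
    const prompt = "Summarize this thread:\n" + messages.join("\n");
    return this.complete(prompt);
  }
}

// Fallback that needs no model at all: the first line of the latest message.
class HeuristicSummarizer implements SummarizeThread {
  async summarize(messages: string[]): Promise<string> {
    const last = messages[messages.length - 1] ?? "";
    return last.split("\n")[0];
  }
}

// Provider selection lives in one place, so swapping backends never touches call sites.
function makeSummarizer(config: { useRemote: boolean; complete?: (prompt: string) => Promise<string> }): SummarizeThread {
  return config.useRemote && config.complete !== undefined
    ? new RemoteModelSummarizer(config.complete)
    : new HeuristicSummarizer();
}

// Feature code depends only on the interface.
async function threadPreview(summarizer: SummarizeThread, messages: string[]): Promise<string> {
  return summarizer.summarize(messages);
}
```

Because feature code only ever sees `SummarizeThread`, moving to a different provider, an on-device model, or no model at all is a change inside `makeSummarizer`, and the same evaluation cases can be run against each implementation.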

None of these investments require an AI team. They require platform discipline.

The org question

The harder question is who owns AI features in a mobile-first company. Most orgs today don't have a clear answer. The natural homes are product engineering (because it's about new product features), platform engineering (because it's about infrastructure), data science (because it's about models), or research (because it's about the underlying work). All of those are partially right. None of them is fully the answer.

The shape I would bet on for 2023 is a small AI product engineering team, embedded close to the product surfaces, with platform support for the bridge layer and data structures, and with eval infrastructure as a shared concern. The team is product engineering in flavor, not research. Their job is shipping features that work, not advancing the state of the model.

This is the same shape as the platform-vs-feature ownership question I have written about in the context of modularization, and the same one that runs through last month's mandates-vs-examples argument. AI features in mobile are going to need the platform to own the bridge and the feature team to own the surface. If either side tries to own both, it will go badly. The pattern is consistent.

Where mobile platform engineers should pay attention in Q1 2023

If you are leading mobile platform engineering and trying to figure out what to do next quarter:

Read. The discourse is moving fast. Read Ben Thompson, Simon Willison, the OpenAI blog. Read the failure modes (Stack Overflow's ban) and the early demos (Jonas Degrave's "virtual machine inside ChatGPT" Linux shell demo is the one I keep going back to). Your job for the next quarter is to be informed enough that when product teams come asking, you can have a real conversation.

Don't ship a chat box. Resist the proposal when it comes; it will come. The right question is "what specific user problem are we solving?", and the right response to "users want to chat with our app" is "no, they don't. They want their app to be smarter, and chat is the lazy version of that."

Hire one engineer who's serious about ML. Not a team. One engineer, if you run a platform org of meaningful size. The job is to keep up with the field, evaluate proposals, and be the bridge when the product teams want to do something. This is a real role and it pays for itself within a year.

Where I land

ChatGPT is not a research event. It is a product event, and it is the start of a product engineering cycle that is going to define the next two years. Mobile platform engineers who treat it as out of scope are going to find themselves on the back foot when their product orgs start shipping AI features in Q1.

The right response is not to spin up an AI roadmap. It is to make the small set of platform investments that compound regardless of how the AI product wave actually unfolds. Introspectable apps. Structured local data. Eval infrastructure. Capability layers over external providers. None of this is exotic. All of it is the kind of work platform teams should be doing for other reasons.

Eight days into the new era, the shape of the year ahead is already visible. Pay attention now. The teams that do are going to look very prepared in six months. The teams that don't are going to spend Q2 explaining to leadership why their AI roadmap is behind.