Enterprise Design Thinking for AI: What Breaks
Photo by Dimitri / Pexels


design-thinking, ai-product-management, enterprise-ai, ux, product-leadership

Enterprise Design Thinking for AI is mostly the right idea executed on the wrong assumption. The method is sound, and its bias toward user outcomes is exactly what AI projects need. But it was built for deterministic software, the kind where you design an interface to a known, fixed behavior, and AI is not that. Three of its load-bearing parts break on probabilistic systems, and if you do not rebuild them, you ship a beautifully facilitated workshop that produces a demo and not a product.

Start with the fact that should reframe the whole conversation. MIT's research found that roughly 95% of enterprise generative AI pilots produced no measurable profit, and the cause was integration and the learning gap, not model quality. RAND similarly found that more than 80% of AI projects fail, mostly for organizational reasons. Read that plainly: enterprise AI is failing on workflow fit, trust, and adoption. Those are design problems, which makes design thinking the correct lens. It just needs surgery.

What Enterprise Design Thinking actually is

IBM built Enterprise Design Thinking to run design thinking at the scale of a large organization. The core is the Loop, a continuous cycle of observe, reflect, and make, plus three Keys for keeping big distributed teams aligned: Hills, statements of intent written as user outcomes rather than features; Playbacks, story-driven reviews that surface misalignment; and Sponsor Users, real users embedded in the work from the start. A Forrester study commissioned by IBM reported that teams using it were more efficient and faster to market, though a vendor-commissioned study is directional evidence, not independent proof.

The reason it fits AI is the Hill. The single most common way enterprise AI dies is starting from the technology: "we have an LLM; what can it do?" Google's PAIR team names this directly as an anti-pattern, advising teams to emphasize the user benefit rather than the underlying technology. A Hill forces the outcome first. That is design thinking's gift to AI, and it is more valuable now than it was for deterministic software.

The assumption that breaks

Here is the surgery. Classic design thinking, including IBM's, assumes you can substantially design the thing before users touch it. You research, you define, you prototype the intended behavior, you test it. The prototype represents the product because deterministic software does what you built it to do.

AI violates this. The behavior is probabilistic, so the same input can produce different outputs, and you cannot know what the model will actually do for your specific workflow until you run it against real data. You are now designing for two unknowns, the user and the model, and you discover the model's real capability envelope empirically, the same way you discover user needs. A design process that treats model behavior as a known quantity is designing on a foundation it has not checked.
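A minimal sketch of what discovering behavior empirically looks like in practice. It assumes a hypothetical `toy_model` that stands in for a sampled model call, and its label distribution is invented purely for illustration. Instead of reading one answer, you send the same input many times and look at the spread of outputs.

```python
import random
from collections import Counter


def toy_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a sampled LLM call: same prompt, varying output.

    The label distribution is invented for illustration, not measured from any model.
    """
    labels = ["refund", "refund", "refund", "escalate", "chargeback"]
    return rng.choice(labels)


def output_distribution(prompt: str, n_samples: int, seed: int = 0) -> dict[str, float]:
    """Call the model n_samples times on one input and return each output's share."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    counts = Counter(toy_model(prompt, rng) for _ in range(n_samples))
    return {label: count / n_samples for label, count in counts.items()}


dist = output_distribution("Customer disputes a duplicate charge", n_samples=200)
print(dist)  # shares near 0.6 / 0.2 / 0.2; exact values depend on the seed
```

With a real system, the same loop runs against your own inputs, and the spread you observe is the capability envelope the design has to accommodate.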

Three parts of the method inherit this flaw. Each needs rebuilding.

Rebuild one: a prototype is no longer proof

In deterministic design, a working prototype is evidence. It does the thing, so the thing works. In AI, a demo that works once tells you almost nothing, because you saw one sample from a distribution. The flattering first demo is the single most expensive illusion in enterprise AI, because it greenlights budgets against behavior that does not hold up across the real spread of inputs.
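One way to see how little a single demo proves is to put a confidence interval on the success rate. The sketch below uses the Wilson score interval. That is a standard choice for a binomial proportion, not something the method itself prescribes, and the 170-of-200 figure is a made-up example.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success rate (z = 1.96 gives roughly 95% coverage)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


# The flattering demo: 1 success in 1 try.
print(wilson_interval(1, 1))      # roughly (0.21, 1.0)
# A hypothetical eval set: 170 good outputs out of 200 real cases.
print(wilson_interval(170, 200))  # roughly (0.79, 0.89)
```

The one-shot demo is consistent with a system that fails most of the time. Only the larger sample narrows the range enough to justify a budget decision.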

So split the prototype in two. Prototype the experience the old way, including with Wizard of Oz methods where a human stands in for the model, to test whether the interaction is even desirable. Then validate the behavior separately and quantitatively, which is what PAIR's chapters on data, evaluation, and defining success are pointing at. In practice that means evals: a rubric for what a good output looks like, scored across dozens or hundreds of real cases, not a vibe check on the one example that made it into the slide.
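A minimal eval harness might look like the following. Everything here is hypothetical: the `EvalCase` schema, the keyword rubric, and `stub_model`. Real rubrics usually need human or model graders rather than keyword checks. The structure is the point: a written-down definition of "good", applied to every case and reduced to a pass rate.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One real input plus what a good answer must and must not say (hypothetical schema)."""
    prompt: str
    must_mention: list[str]
    must_not_mention: list[str]


def score_output(output: str, case: EvalCase) -> bool:
    """Deliberately simple rubric: all required terms present, no forbidden ones."""
    text = output.lower()
    has_required = all(term.lower() in text for term in case.must_mention)
    has_forbidden = any(term.lower() in text for term in case.must_not_mention)
    return has_required and not has_forbidden


def run_eval(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Score the model on every case and return the pass rate."""
    passed = sum(score_output(model(case.prompt), case) for case in cases)
    return passed / len(cases)


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in; in practice this would call the system under test."""
    if "duplicate" in prompt:
        return "Refund issued for the duplicate charge."
    return "Please contact your bank."


cases = [
    EvalCase("Customer reports a duplicate charge", ["refund"], ["contact your bank"]),
    EvalCase("Customer disputes an annual renewal fee", ["renewal"], ["contact your bank"]),
]
print(run_eval(stub_model, cases))  # 0.5: the first case passes, the second fails
```

In a real project, the case list holds the dozens or hundreds of real inputs, and the pass rate is the number that goes on the slide instead of the one good example.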

Rebuild two: failure is a primary design surface, not an edge case

Deterministic design treats the error state as a corner you handle after the happy path. AI inverts this, because the model will be confidently wrong some fraction of the time and you cannot prompt that to zero. The unhappy path is not a corner. It is a main road with heavy traffic.

This is why two of the most useful references are organized around it. Google's People + AI Guidebook devotes a full chapter to errors and graceful failure, and Microsoft's 18 Guidelines for Human-AI Interaction, drawn from two decades of research, are heavy on scoping the service when uncertain, supporting correction, and making consequences clear. Fold this into the EDT Keys directly. Your Sponsor Users' most valuable output is not "I like it"; it is a catalogue of where the system was wrong and what that cost them. Your Playbacks should play back failures, not just the demo reel. A Playback that only shows the happy path is theater.
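Treating the wrong-answer path as a designed surface can be as simple as letting confidence decide how much the system takes on, in the spirit of scoping the service when uncertain. The sketch below is hypothetical throughout. It assumes a confidence score in [0, 1] that has been checked for calibration, which many model scores are not. The thresholds are placeholders to be set from eval data.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    answer: str
    confidence: float  # assumed calibrated score in [0, 1]; verify calibration separately


def route(pred: Prediction, auto_threshold: float = 0.9, suggest_threshold: float = 0.6) -> str:
    """Scale the system's role down as confidence drops (thresholds are illustrative)."""
    if pred.confidence >= auto_threshold:
        # Act, but keep correction cheap: the agent can undo.
        return f"AUTO: {pred.answer} (agent can undo)"
    if pred.confidence >= suggest_threshold:
        # Suggest only: a human confirms before anything reaches the customer.
        return f"SUGGEST: {pred.answer} (agent must confirm)"
    # Too uncertain to recommend anything: hand over context, not an answer.
    return "DEFER: hand to agent with the case summary, no recommended answer"


print(route(Prediction("refund duplicate charge", 0.95)))
print(route(Prediction("refund duplicate charge", 0.70)))
print(route(Prediction("refund duplicate charge", 0.30)))
```

Each branch is a screen someone has to design, and a Playback should walk through all three, not just the first.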

Rebuild three: the Loop never closes

In deterministic software, test is a phase. You pass, you ship, you trust it. With AI the test never finishes, because a model upgrade, a prompt change, or quiet data drift can degrade a live feature without anyone touching the code. The observe-reflect-make Loop has to keep running after launch, instrumented with evals and monitoring in production, not just in the design studio. If your design process ends at ship, you have built something that will silently rot and you will not see it.
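Keeping the Loop running in production can start with a rolling pass rate over scored production samples, compared against the launch baseline. This is a hypothetical sketch. It assumes some fraction of live traffic is scored with the same rubric used before launch. The baseline, window size, and tolerance are placeholders.

```python
from collections import deque
from typing import Optional


class EvalMonitor:
    """Rolling pass rate on scored production samples, flagged against a baseline."""

    def __init__(self, baseline_pass_rate: float, window: int = 100, tolerance: float = 0.05):
        if window <= 0:
            raise ValueError("window must be positive")
        self.baseline = baseline_pass_rate
        self.tolerance = tolerance
        self.results = deque(maxlen=window)  # oldest results fall off automatically

    def record(self, passed: bool) -> None:
        self.results.append(passed)

    def pass_rate(self) -> Optional[float]:
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def regressed(self) -> bool:
        """True once the window is full and the rolling rate falls below baseline - tolerance."""
        if len(self.results) < self.results.maxlen:
            return False
        return self.pass_rate() < self.baseline - self.tolerance


monitor = EvalMonitor(baseline_pass_rate=0.85, window=10)
for passed in [True] * 7 + [False] * 3:  # e.g. results after a silent model upgrade
    monitor.record(passed)
print(monitor.pass_rate(), monitor.regressed())  # 0.7 True
```

No code changed in this scenario, yet the monitor fires. That is exactly the kind of silent rot the Loop has to catch.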

Keep the Hill, and sharpen it for AI

The Hill survives all of this, and it gets one upgrade. IBM's format is Who, What, and Wow: who the user is, what they can now do, and the emotional payoff. For AI, add two clauses that the deterministic version never needed: the reliability at which the outcome is good enough to ship, and what happens when the system is wrong. A Hill that says "a support agent resolves a billing dispute in one touch" is incomplete for AI. The AI version is "a support agent resolves a billing dispute in one touch, the system is right often enough that agents trust it, and when it is wrong the agent catches it before the customer does." That last clause is where most enterprise AI products are actually won or lost.
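One way to keep the two new clauses from staying aspirational is to write them down as a release gate. In this sketch, the `AIHill` fields and the bar values are hypothetical. The accuracy input is meant to be a lower confidence bound from your eval set rather than a point estimate. The catch rate is the measured share of wrong outputs caught before they reach the customer.

```python
from dataclasses import dataclass


@dataclass
class AIHill:
    """IBM's Who / What / Wow plus the two AI clauses (field names are hypothetical)."""
    who: str
    what: str
    wow: str
    reliability_bar: float   # minimum acceptable share of correct outputs
    wrong_answer_path: str   # what happens when the system is wrong
    catch_rate_bar: float    # minimum share of wrong outputs caught before the customer sees them


def ready_to_ship(hill: AIHill, accuracy_lower_bound: float, catch_rate: float) -> bool:
    """Ship only if the wrong-answer path exists and both measured bars are met."""
    if not hill.wrong_answer_path.strip():
        return False  # no designed failure path means no design yet
    return accuracy_lower_bound >= hill.reliability_bar and catch_rate >= hill.catch_rate_bar


hill = AIHill(
    who="A support agent",
    what="resolves a billing dispute",
    wow="in one touch",
    reliability_bar=0.80,
    wrong_answer_path="Agent reviews each suggested resolution before it reaches the customer",
    catch_rate_bar=0.95,
)
print(ready_to_ship(hill, accuracy_lower_bound=0.79, catch_rate=0.97))  # False: below the 0.80 bar
```

If the team cannot fill in `reliability_bar` and `wrong_answer_path`, the gate cannot even be constructed.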

The takeaway

Enterprise Design Thinking is necessary and not sufficient for AI. Keep the machinery that aligns teams on human outcomes, because outcome framing is the best defense against the technology-first chasing that kills most pilots. Then rebuild the three pieces that assume deterministic material: stop treating a demo as proof and validate behavior with evals, promote failure from edge case to primary design surface, and keep the Loop running in production because the test never ends.

If you do one thing this week, take your current AI initiative and write its Hill with the two new clauses attached: the reliability bar, and the wrong-answer path. If your team cannot answer either, you do not yet have a design. You have a demo waiting to disappoint someone in production.

Frequently asked questions

What is Enterprise Design Thinking for AI?
It is applying IBM's Enterprise Design Thinking, the Loop of observe, reflect, and make plus the Keys of Hills, Playbacks, and Sponsor Users, to building AI products. The framework's outcome focus fits AI well, but its prototyping and testing steps assume deterministic software and need to be adapted for probabilistic systems.
Does design thinking still work for AI products?
Yes, and it matters more, because most enterprise AI fails on integration, workflow fit, and adoption rather than model quality, and those are design problems. The catch is that classic design thinking assumes you can fully design behavior before users touch it. With AI you discover behavior empirically, so prototyping, failure handling, and testing all have to change.
What breaks when you apply design thinking to AI?
Three things. Prototyping stops being proof, because an AI demo that works once does not show the real distribution of outputs. Failure stops being an edge case and becomes a primary design surface. And the test phase never ends, because models and data drift, so evaluation has to run continuously in production.
What resources help design AI products?
Google's People + AI Guidebook from the PAIR team and Microsoft's 18 Guidelines for Human-AI Interaction are the two most practical references. Pair either with IBM's Enterprise Design Thinking for team alignment and with eval-driven development for measuring whether AI behavior is actually good.
