The AI implementation roadmap — from "we should use AI" to working software

Most AI strategy decks die in production. The team gets excited, picks five use cases, hires a consultant, gets a Notion page full of recommendations, and twelve months later still doesn't have anything live.

We've been on the engineering side of this enough times to recognize the pattern. The roadmap below is what we actually use when clients ask us where to start. It's not the McKinsey version. It's the one that ships software.

The honest starting question

Before any roadmap discussion: are you trying to use AI or are you trying to solve a specific problem?

These sound similar; they're not.

Trying to use AI: You want AI in your business because the competition has AI, the board is asking about AI, or AI feels strategically important. Use cases come second.
Trying to solve a problem: You have a specific operational pain (support overflow, lead qualification, contract review, document extraction) and you suspect AI could help. The use case comes first.

The second framing produces working software. The first produces strategy decks.

Most successful AI implementations we've seen start with one specific problem. Sometimes the problem grows into a bigger AI footprint over time; sometimes it stays focused. Either way, start with one real problem you'd be embarrassed to still have in six months.

The 90-day roadmap (the version that works)

Days 1–14: Discovery

The job here is picking the right first use case.

What to do:

Workshop your team. Where is the most repetitive, judgment-light work happening? Support tickets? Lead enrichment? Contract review? Sales drafting?
Estimate the volume. How many of these happen per week? Per month? Per year?
Estimate the current cost. Hours × people × hourly rate. This is your savings target.
Identify the data. Is the data the AI would need accessible? In what shape?
Identify the constraint. Is this safety-critical? Regulated? Customer-facing? Internal?
Rank candidates. Score each on: volume, cost, data accessibility, constraint difficulty.

Pick the top candidate: the one with high volume, high current cost, accessible data, and low constraint difficulty. That's your first use case. Resist the temptation to pick something more exciting that scores lower on those four dimensions.
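The scoring step can be kept deliberately simple. The sketch below is one possible way to do it, not a standard method: the `Candidate` fields, the 1 to 5 scales, the equal weighting, and the example numbers are all assumptions made up for illustration. It computes the cost estimate (hours × people × hourly rate, folded here into volume × hours per item × rate) and ranks candidates on the four dimensions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    weekly_volume: int          # items handled per week
    hours_per_item: float       # human time spent on each item
    hourly_rate: float          # loaded cost per hour, in dollars
    data_access: int            # 1 (hard to reach) to 5 (ready to use)
    constraint_difficulty: int  # 1 (internal, low risk) to 5 (regulated)

    def annual_cost(self) -> float:
        """Current yearly cost of doing this work by hand (the savings target)."""
        return self.weekly_volume * 52 * self.hours_per_item * self.hourly_rate


def rank(candidates):
    """Sort candidates best-first using an equal-weight score (an assumption)."""
    max_volume = max(c.weekly_volume for c in candidates)
    max_cost = max(c.annual_cost() for c in candidates)

    def score(c):
        volume = c.weekly_volume / max_volume      # 0..1, higher is better
        cost = c.annual_cost() / max_cost          # 0..1, higher is better
        data = (c.data_access - 1) / 4             # 0..1, higher is better
        ease = (5 - c.constraint_difficulty) / 4   # 0..1, lower difficulty wins
        return volume + cost + data + ease

    return sorted(candidates, key=score, reverse=True)


# Illustrative numbers only; replace with your own estimates.
candidates = [
    Candidate("contract review", 30, 2.0, 120.0, 3, 4),
    Candidate("ticket triage", 1200, 0.1, 40.0, 5, 2),
    Candidate("lead enrichment", 400, 0.25, 35.0, 4, 1),
]

for c in rank(candidates):
    print(f"{c.name}: about ${c.annual_cost():,.0f} per year")
```

With these made-up inputs, contract review has the highest annual cost but still ranks last, because its volume is low and its constraints are hard. That is the trade-off the four dimensions are meant to surface.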

Output of discovery: a one-page written brief naming the use case, expected impact, data inventory, success criteria.

Days 15–28: Architecture

Translate the brief into a system design.

What to decide:

Build, buy, or integrate? If there's a SaaS tool that already does this well, use it. If there isn't, build. If you have a SaaS tool that almost does it, extend.
Model choice. Claude, GPT, open models. Cost, latency, quality, compliance — pick based on the task, not the brand.
Architecture shape. Workflow automation (deterministic with AI decision points) vs. agent (autonomous with tools). Most first use cases are workflow automation. Don't reach for an agent unless the task genuinely requires it.
Integration points. Which existing systems does it read from? Write to? Through what auth pattern?
Eval criteria. How will you measure quality? You need a real test set with real expected outputs.
Failure mode plan. When the AI gets it wrong, what happens? Who reviews? How do edge cases escalate?
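To make the "workflow automation" shape concrete, here is a minimal sketch of a deterministic pipeline with a single AI decision point and an explicit escalation path. The label set, the `handle_ticket` function, and the `keyword_stub` classifier are hypothetical; in a real system the stub would be replaced by a call to whichever model you chose.

```python
from typing import Callable

# Assumed label set for illustration; yours comes from the architecture document.
ALLOWED_LABELS = {"billing", "bug", "how_to"}


def keyword_stub(text: str) -> str:
    """Stand-in for a model call so the sketch runs without any API."""
    lowered = text.lower()
    if "invoice" in lowered or "refund" in lowered:
        return "billing"
    if "error" in lowered or "crash" in lowered:
        return "bug"
    return "unsure"


def handle_ticket(text: str, classify: Callable[[str], str] = keyword_stub) -> dict:
    # Step 1 (deterministic): normalize whitespace and reject empty input.
    cleaned = " ".join(text.split())
    if not cleaned:
        return {"route": "human_review", "reason": "empty ticket"}

    # Step 2 (AI decision point): the only non-deterministic step.
    label = classify(cleaned)

    # Step 3 (deterministic): validate the output and escalate anything unexpected.
    if label not in ALLOWED_LABELS:
        return {"route": "human_review", "reason": f"unexpected label: {label}"}
    return {"route": f"queue:{label}", "reason": "classified"}


print(handle_ticket("Please refund my last invoice"))
print(handle_ticket("Just saying hello"))
```

The model only picks a label. Everything before and after that step is ordinary code, which keeps the failure-mode plan easy to reason about.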

Output: a one-page architecture document and an eval set.
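An eval set does not need special tooling to start. One common low-tech option, assumed here rather than prescribed, is a JSON Lines file with one case per line. The field names (`id`, `input`, `expected`) and the sample tickets are hypothetical.

```python
import json

REQUIRED_FIELDS = {"id", "input", "expected"}


def load_eval_set(jsonl_text: str) -> list[dict]:
    """Parse JSON Lines text into eval cases, failing loudly on incomplete rows."""
    cases = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        case = json.loads(line)
        missing = REQUIRED_FIELDS - case.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        cases.append(case)
    return cases


# Illustrative rows; a real set should come from real historical inputs.
SAMPLE = """\
{"id": "t-001", "input": "I was charged twice this month", "expected": "billing"}
{"id": "t-002", "input": "App crashes when I upload a file", "expected": "bug"}
"""

cases = load_eval_set(SAMPLE)
print(f"loaded {len(cases)} eval cases")
```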

This step is where most projects go wrong because it's skipped. The team goes straight from "we should do this" to "we're building" without writing down the design. Then six weeks in, a question comes up that the design would have answered.

Days 29–56: Build the MVP

Build the actual system. Real code, real data, real users (small set).

What to do:

Build against the real data. Mocked data hides the failures that matter.
Wire up evals first. Before optimizing prompts, set up the eval harness. Otherwise you're guessing.
Ship slim. First version handles the happy path. Edge cases get escalated to humans.
Get real users in early. Internal team, week 2 of build. Their feedback drives priorities.
Track every model call. Observability from day one. LangSmith, Helicone, or your own logs.
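"Wire up evals first" can be as small as the harness below. It is a sketch under stated assumptions: exact-match scoring, a four-case inline eval set, and a `baseline_system` stub standing in for the real pipeline. Tasks with free-text output usually need a looser comparison than `!=`.

```python
def run_evals(system, cases):
    """Run `system` on every case and report accuracy plus the failing cases."""
    failures = []
    for case in cases:
        actual = system(case["input"])
        if actual != case["expected"]:  # exact match; an assumption for this sketch
            failures.append(
                {"id": case["id"], "expected": case["expected"], "actual": actual}
            )
    passed = len(cases) - len(failures)
    accuracy = passed / len(cases) if cases else 0.0
    return {"total": len(cases), "passed": passed,
            "accuracy": accuracy, "failures": failures}


def baseline_system(text: str) -> str:
    """Hypothetical stand-in for the real system (prompt + model + parsing)."""
    lowered = text.lower()
    if "charged" in lowered or "refund" in lowered:
        return "billing"
    if "crash" in lowered or "error" in lowered:
        return "bug"
    return "how_to"


# Illustrative cases; in practice these come from the eval set built in Architecture.
EVAL_CASES = [
    {"id": "t-001", "input": "I was charged twice this month", "expected": "billing"},
    {"id": "t-002", "input": "App crashes when I upload a file", "expected": "bug"},
    {"id": "t-003", "input": "How do I export my data?", "expected": "how_to"},
    {"id": "t-004", "input": "The export button does nothing", "expected": "bug"},
]

report = run_evals(baseline_system, EVAL_CASES)
print(f"{report['passed']}/{report['total']} passed")
for failure in report["failures"]:
    print("FAILED", failure)
```

Once this exists, every prompt or model change gets the same treatment: rerun, compare the number, read the failures.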

Output: a working system with eval suite, used by a small set of real users.
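If you start with your own logs rather than a hosted tool, tracking every model call can begin as a decorator. The sketch below assumes an in-memory `CALL_LOG` list and a hypothetical `draft_reply` function with a placeholder model name; a real setup would write the records to a file or log pipeline and would likely add token counts and cost.

```python
import functools
import json
import time
import uuid

CALL_LOG = []  # in-memory for the sketch; assumed to be a real log sink in production


def tracked(model_name):
    """Decorator that records status, sizes, and latency for each model call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(prompt, **kwargs):
            record = {
                "call_id": uuid.uuid4().hex,
                "model": model_name,
                "prompt_chars": len(prompt),
            }
            start = time.perf_counter()
            try:
                output = fn(prompt, **kwargs)
                record["status"] = "ok"
                record["output_chars"] = len(output)
                return output
            except Exception as exc:
                record["status"] = "error"
                record["error"] = repr(exc)
                raise
            finally:
                # Runs on both success and failure, so no call goes unlogged.
                record["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
                CALL_LOG.append(record)
        return wrapper
    return decorator


@tracked("hypothetical-model-v1")
def draft_reply(prompt):
    """Stand-in for a real model call."""
    return f"Thanks for reaching out about: {prompt}"


draft_reply("late delivery")
print(json.dumps(CALL_LOG[-1], indent=2))
```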

Days 57–84: Harden

Take the MVP and prepare it for broader rollout.

What to do:

Run the eval suite on real production traffic. Where does it fail? Why?
Tune. Prompt engineering, model selection, retrieval improvements, schema tightening.
Build the failure-mode handling. Human-in-the-loop for low-confidence outputs. Escalation paths. Override mechanisms.
Cost controls. Budgets, alerts, rate limits per user.
Documentation. What the system does, what it doesn't do, and how people use it.
Training. Workshop with the broader team that'll use it.
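The human-in-the-loop piece of hardening often reduces to a routing rule. The sketch below assumes the system can attach a confidence score between 0 and 1 to each output. How that score is produced (model self-rating, a separate classifier, heuristics) is left open, and the `ModelOutput` type and 0.8 threshold are illustrative choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelOutput:
    text: str
    confidence: float  # 0.0 to 1.0; how it is computed is an assumption left open


def route(output: ModelOutput, threshold: float = 0.8,
          human_override: Optional[str] = None) -> dict:
    """Decide whether the model's answer ships, waits for review, or is overridden."""
    # Override mechanism: a human decision always wins.
    if human_override is not None:
        return {"final": human_override, "source": "human_override"}
    # Escalation path: low confidence goes to a review queue, nothing ships yet.
    if output.confidence < threshold:
        return {"final": None, "source": "needs_review"}
    return {"final": output.text, "source": "model"}


print(route(ModelOutput("Refund approved per policy 4.2", 0.93)))
print(route(ModelOutput("Possibly a billing issue?", 0.41)))
print(route(ModelOutput("Possibly a billing issue?", 0.41),
            human_override="Duplicate charge, refund issued"))
```

The threshold is something to tune against the eval suite rather than pick once: lowering it ships more answers and sends fewer to review, at the price of more mistakes reaching users.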

Output: a production-ready system, documented, used by the broader team.
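For the cost controls mentioned above, a per-user budget with an alert threshold is a reasonable first guard. This is a sketch with assumed numbers (a $1.00 daily limit, an alert at 80%) and a hypothetical `BudgetGuard` class. It keeps state in memory and leaves out the daily reset, which a real deployment would need.

```python
from collections import defaultdict


class BudgetGuard:
    """Tracks spend per user and blocks calls that would exceed the daily limit."""

    def __init__(self, daily_limit_usd: float, alert_fraction: float = 0.8):
        self.daily_limit_usd = daily_limit_usd
        self.alert_fraction = alert_fraction
        self.spent = defaultdict(float)  # user_id -> dollars spent today

    def charge(self, user_id: str, cost_usd: float) -> str:
        """Return 'ok', 'alert', or 'blocked'. Blocked calls are not recorded."""
        projected = self.spent[user_id] + cost_usd
        if projected > self.daily_limit_usd:
            return "blocked"
        self.spent[user_id] = projected
        if projected >= self.alert_fraction * self.daily_limit_usd:
            return "alert"
        return "ok"


guard = BudgetGuard(daily_limit_usd=1.00)
for cost in (0.50, 0.35, 0.25):
    print(cost, guard.charge("user-1", cost))
```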

Days 85–90: Roll out and decide what's next

Final week is rollout to the full user base and a retrospective.

What to do:

Roll out gradually. New users opt in. Watch usage and quality metrics.
Capture the wins. Hours saved? Tickets handled? Drafts generated? Make it visible.
Honest review. What worked? What didn't? What surprised the team?
Pick use case #2. With this experience, the next discovery cycle is faster and the criteria sharper.
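Gradual rollout is commonly done with a stable hash bucket per user, so the same people stay enabled as the percentage grows. The sketch below is one way to do that, combined with the opt-in rule from the list above. The function name and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib


def in_rollout(user_id: str, percent: int, opted_in: bool) -> bool:
    """True if an opted-in user falls inside the current rollout percentage."""
    if not opted_in:
        return False
    # A stable hash puts each user in a fixed bucket from 0 to 99, so raising
    # `percent` only ever adds users and never removes anyone already enabled.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < percent


users = [f"user-{n}" for n in range(1000)]
for percent in (10, 50, 100):
    enabled = sum(in_rollout(u, percent, opted_in=True) for u in users)
    print(f"{percent}% rollout: {enabled} of {len(users)} users enabled")
```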

Output: production rollout complete. Use case #2 in discovery.

What changes after the first use case

The first AI implementation teaches the team how this works. After it ships, three things change:

Future discoveries are faster. Your team now has intuition for what AI can and can't do.
Shared infrastructure pays back. Eval harness, observability, MCP servers — built once, reused for every future use case.
The architecture decision changes. You go from "should we build this?" to "should we build it the same way we built the last one?" — usually yes, with deltas.

A common pattern: the first use case is a workflow automation; the second is similar; by the third or fourth you're building a small platform of related automations with shared infrastructure. That's the right shape for most companies' AI footprint — not a single mega-agent, but a portfolio of focused automations on shared rails.

The four mistakes we see most

Skipping discovery

Teams that jump from "AI" to "build" without picking a specific high-quality use case end up with something half-finished that nobody uses. Discovery is two weeks; it saves you four months of wandering.

Picking too ambitious a first use case

"Replace our entire support team with AI" is a terrible first project. "Triage incoming tickets and draft first-touch responses for the 40% we can confidently handle" is a great first project. Pick boring and high-volume over exciting and complex.

No evals

Without an eval suite, you have no idea if your changes are improving or regressing quality. You're guessing. We'd rather a project ship six weeks later with evals than three months later without them, because the version without evals regresses quietly and gets pulled.
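Catching a quiet regression mostly means comparing two eval runs case by case, not just comparing the headline accuracy. A minimal sketch, assuming each run is stored as a mapping from case id to pass/fail (the ids and results below are made up):

```python
def compare_runs(baseline: dict, candidate: dict) -> dict:
    """Compare two eval runs given as {case_id: passed} mappings."""
    # Cases that passed before and fail (or are missing) now.
    regressions = sorted(
        cid for cid, ok in baseline.items() if ok and not candidate.get(cid, False)
    )
    # Cases that pass now and did not pass before.
    fixes = sorted(
        cid for cid, ok in candidate.items() if ok and not baseline.get(cid, False)
    )
    return {"regressions": regressions, "fixes": fixes}


# Illustrative results: same overall pass count, different cases.
before = {"t-001": True, "t-002": True, "t-003": False}
after = {"t-001": True, "t-002": False, "t-003": True}

print(compare_runs(before, after))
```

In this example both runs pass two of three cases, so the accuracy number alone would hide that the change broke `t-002` while fixing `t-003`.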

Treating AI like deterministic software

It isn't. Edge cases are guaranteed. Failure modes need explicit handling. Human-in-the-loop is part of the system, not a fallback. Design for it from day one.

How to use this roadmap

If you're inside a company and trying to figure out where to start: walk through Days 1–14 with your team this month. The output is a written brief that surfaces whether you have a real opportunity. If you don't, you've saved 90 days of wasted work.

If you have a clear use case and just need it built: skip to Days 15+ and we (or another senior team) can take it from there.

How we work with this

Our AI Implementation service follows exactly this structure — Discovery → Architecture → MVP → Harden → Roll out. The two-week Discovery engagement (fixed $8k) is a common starting point for clients who aren't sure where to begin.

If you already have the use case and need someone to ship it, our AI Agent Development and Custom Software Development services pick up from Architecture and run to production.

And if you want a deeper dive on the build side, the agentic coding guide covers the AI-augmented development practices we use to compress the build phase.
