Every AI project has the same skeleton: discover, pilot, build, launch, operate. The differences between projects that deliver value and projects that stall are in how the phases are scoped, what the handoffs look like, and which pitfalls get avoided at each step. This post is the phase-by-phase map we use to structure engagements and to run project reviews.
Phase 1: Discover (2-4 weeks)
Audit the problem, the data, the team, and the success criteria. The deliverable is a short document: what problem are we solving, how will we measure success, what data and capabilities exist today, what gaps need to close before Pilot can start.
Pitfalls: skipping this phase to 'save time'. Every week saved in Discover costs three weeks in Build when data turns out to be inaccessible or the problem was misunderstood. The opposite pitfall is over-spending: on most projects, more than 4 weeks in Discover yields diminishing returns. The Discover output should be crisp, not comprehensive.
Phase 2: Pilot (4-8 weeks)
Prove feasibility at small scale. End-to-end flow working for a narrow slice of the problem, against real data, with initial eval numbers. Not production quality; enough to show the approach can work.
Deliverables: working prototype, initial eval suite, cost and latency estimates, updated timeline and cost for full build. The Pilot de-risks the Build phase. A failed Pilot is valuable — it tells you to stop before spending Build budget.
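To make "initial eval suite" and "latency estimates" concrete, here is a minimal sketch of a Pilot-grade eval harness. Everything in it is an assumption for illustration: the `EvalCase` shape, the exact-match scoring, and `toy_system`, which stands in for whatever prototype the Pilot actually produces. Real evals usually need richer scoring than string equality.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected: str


def run_pilot_eval(system: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Run every case through the system and report accuracy and latency."""
    if not cases:
        raise ValueError("eval set is empty")
    latencies = []
    correct = 0
    for case in cases:
        start = time.perf_counter()
        output = system(case.prompt)
        latencies.append(time.perf_counter() - start)
        # Exact match after normalisation; an assumption, not a recommendation.
        if output.strip().lower() == case.expected.strip().lower():
            correct += 1
    return {
        "cases": len(cases),
        "accuracy": correct / len(cases),
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
    }


def toy_system(prompt: str) -> str:
    # Hypothetical stand-in for the real prototype.
    return "refund" if "money back" in prompt else "other"


cases = [
    EvalCase("I want my money back", "refund"),
    EvalCase("Where is my order?", "shipping"),
    EvalCase("Can I get my money back?", "refund"),
    EvalCase("How do I reset my password?", "other"),
]
report = run_pilot_eval(toy_system, cases)
print(report["accuracy"])  # 0.75: three of the four cases match
```

Even a harness this small gives the Pilot a number to report and a baseline that the Build-phase evals can later be compared against.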
Pitfalls: scoping the Pilot so broadly that it's indistinguishable from a partial Build. Scoping so narrowly that it doesn't prove anything. Accepting Pilot-quality performance as 'done' and deploying — Pilot is not production.
Phase 3: Build (8-16 weeks)
Turn the Pilot into a production-ready system. Real infrastructure, real guardrails, real observability, real eval discipline. The bulk of engineering time and cost lives here.
Sub-phases: (a) core engineering — extending the Pilot to handle edge cases, (b) guardrails and safety — see our guardrails post, (c) eval infrastructure — the production-grade version of the Pilot evals, (d) operational tooling — monitoring, alerts, rollback, runbook.
Pitfalls: declaring done at 'works in demo' without guardrails, evals, or ops tooling. Adding scope beyond what the Pilot validated. Skipping the operational tooling because 'we'll add it after launch' (you won't).
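One way to guard against the 'works in demo' trap is to write the Build exit criteria down as an explicit checklist and refuse to call the phase done while anything is missing. The sketch below is illustrative only: the item names are our shorthand for the sub-phases listed above, not a standard.

```python
# Hypothetical exit checklist derived from the Build sub-phases.
BUILD_EXIT_CHECKLIST = [
    "edge_cases_handled",
    "guardrails_in_place",
    "production_eval_suite",
    "monitoring",
    "alerts",
    "rollback",
    "runbook",
]


def missing_items(status: dict[str, bool]) -> list[str]:
    """Return the checklist items not yet marked complete, in checklist order."""
    return [item for item in BUILD_EXIT_CHECKLIST if not status.get(item, False)]


# A system that demos well but has no eval or ops tooling yet.
demo_ready = {"edge_cases_handled": True, "guardrails_in_place": True}
print(missing_items(demo_ready))
```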
Phase 4: Launch (2-4 weeks)
The phase most teams underinvest in. Launch isn't flipping a switch; it's canary rollout, user training, support readiness, feedback loops, and the first round of production-driven tuning.
Activities: canary to 1-5% of traffic for a week to catch production-specific issues. Ramp to 25%, then 100% over 2-3 weeks. Training sessions for affected users. Support team readiness (what do they do when a user complains about AI output?). Feedback capture mechanism live from day 1.
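The canary and ramp can be sketched with deterministic hash bucketing, so that a given user stays in the rollout as the percentage grows. This is a minimal illustration under assumptions: the `salt` value, the 10,000-bucket granularity, and the week-by-week `RAMP_SCHEDULE` are hypothetical choices, and many teams will get the same behaviour from a feature-flag service instead.

```python
import hashlib

# Hypothetical ramp: (first launch week it applies, percent of traffic).
RAMP_SCHEDULE = [(1, 5.0), (2, 25.0), (4, 100.0)]


def rollout_percent(week: int) -> float:
    """Return the traffic share for a 1-based launch week (0 before launch)."""
    percent = 0.0
    for start_week, share in RAMP_SCHEDULE:
        if week >= start_week:
            percent = share
    return percent


def in_rollout(user_id: str, percent: float, salt: str = "ai-launch") -> bool:
    """Deterministically map a user to one of 10,000 buckets and compare."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000
    return bucket < percent * 100


# The same user gets the same answer on every request in a given week.
print(in_rollout("user-42", rollout_percent(week=1)))
```

Because the bucket depends only on the salt and the user ID, anyone included at 5% is still included at 25% and 100%, which keeps the experience stable for early users while the ramp proceeds.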
Pitfalls: 100% launch from day 1. Assuming training isn't needed. Ignoring feedback in the first weeks — that data is the most valuable you'll ever collect about your system.
Phase 5: Operate (ongoing)
Production AI systems need continuous attention: model updates from providers, prompt regressions, eval drift, cost trends, new use cases from users. The team structure supporting Operate is different from Build — more on-call discipline, more review cycles, fewer feature-building cycles.
A healthy Operate phase includes: weekly eval review, monthly cost review, quarterly roadmap update, ongoing incident handling via the runbook, and systematic capture of new cases into the eval set.
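The weekly eval review is easier to sustain when the comparison against the last accepted baseline is automated. Here is a minimal sketch of such a check; the metric names, the scores, and the 0.02 tolerance are made up for illustration and would come from your own eval suite.

```python
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.02,
) -> dict[str, float]:
    """Return the change for each metric that dropped by more than `tolerance`.

    A metric missing from `current` is treated as 0.0, so it is flagged too.
    """
    regressions = {}
    for name, old_score in baseline.items():
        new_score = current.get(name, 0.0)
        if old_score - new_score > tolerance:
            regressions[name] = round(new_score - old_score, 4)
    return regressions


# Hypothetical scores from last week's accepted run and this week's run.
baseline = {"accuracy": 0.91, "groundedness": 0.88, "format_valid": 0.99}
current = {"accuracy": 0.90, "groundedness": 0.83, "format_valid": 0.99}

print(find_regressions(baseline, current))  # {'groundedness': -0.05}
```

A non-empty result is a prompt for the review meeting, not an automatic rollback: the point is that a provider model update or a prompt change cannot quietly degrade a metric without someone seeing it.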
Pitfalls: abandoning the system to 'just run'. Losing the team context as people rotate off. Letting the eval dataset atrophy. Not having a process for adding scope incrementally — new features emerge from Operate and need a lightweight path to production.
Team shape across phases
Discover: a senior technologist plus a domain expert, 2-3 people in total for 2-4 weeks. Pilot: 2-3 engineers plus the domain expert. Build: 3-6 engineers plus product, design, and domain as needed. Launch: the Build team plus customer-success and support engagement. Operate: 1-3 engineers on rotation, depending on system complexity.
The same people shouldn't be on all phases if the project lasts more than 4-5 months. Phase-appropriate expertise matters — the engineer who loves Discover often isn't the engineer who loves Operate.
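As a quick check on how often that threshold applies, the phase durations quoted in the headings can be summed. The sketch assumes the four time-boxed phases run back to back with no overlap or gaps, which is a simplification.

```python
# Duration ranges in weeks, taken from the phase headings above.
PHASE_WEEKS = {
    "Discover": (2, 4),
    "Pilot": (4, 8),
    "Build": (8, 16),
    "Launch": (2, 4),
}

WEEKS_PER_MONTH = 52 / 12

# Assumption: phases are strictly sequential.
min_weeks = sum(low for low, _ in PHASE_WEEKS.values())
max_weeks = sum(high for _, high in PHASE_WEEKS.values())

min_months = round(min_weeks / WEEKS_PER_MONTH, 1)
max_months = round(max_weeks / WEEKS_PER_MONTH, 1)

print(f"{min_weeks}-{max_weeks} weeks, about {min_months}-{max_months} months")
# 16-32 weeks, about 3.7-7.4 months
```

Under that assumption, only projects near the short end of every range finish Launch inside 4-5 months, so planning for team rotation is the common case rather than the exception.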