eazyware
Playbook·July 7, 2025·11 min read

The anatomy of an AI project: phases, deliverables, pitfalls

Every AI project has the same skeleton. Knowing the bones helps you see where yours will break.

Kushal R.
Engineering lead

Every AI project has the same skeleton: discover, pilot, build, launch, operate. What separates projects that deliver value from projects that stall is how the phases are scoped, what the handoffs look like, and which pitfalls get avoided at each step. This post is the phase-by-phase map we use to structure engagements and to run project reviews.

Five phases
Figure: AI project anatomy
1. Discover: audit, scope, data check (2-4 weeks)
2. Pilot: prove feasibility, evals baseline (4-8 weeks)
3. Build: production quality, infra, guards (8-16 weeks)
4. Launch: canary, ramp, training (2-4 weeks)
5. Operate: monitor, tune, extend (ongoing)

Pitfalls by phase
1. Skipping data audit: shows up as 6-month delay later
2. Over-scoping pilot: rebuilt anyway; pilot should cover minimum valuable slice
3. Declaring "done" at build: skipping evals, monitoring, rollback paths
4-5. Launch without training + operate plan: adoption stalls, system drifts
Discover (2-4w) → Pilot (4-8w) → Build (8-16w) → Launch (2-4w) → Operate (ongoing). Phase-specific pitfalls at every step; most failures trace to skipping or underscoping one.
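
As a quick sanity check on the ranges above, the time-boxed phases can be summed to get an end-to-end window before Operate begins. This is a minimal Python sketch: the `PHASES` table and the `total_weeks` helper are illustrative names, the only inputs are the week ranges stated in this post, and it assumes the phases run back to back with no overlap.

```python
# Week ranges per phase, copied from the map above. Operate is ongoing,
# so it has no upper bound and is left out of the total.
PHASES = {
    "Discover": (2, 4),
    "Pilot": (4, 8),
    "Build": (8, 16),
    "Launch": (2, 4),
}


def total_weeks(phases):
    """Return (shortest, longest) end-to-end duration in weeks."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low, high


if __name__ == "__main__":
    low, high = total_weeks(PHASES)
    print(f"Discover through Launch: {low}-{high} weeks")
```

Under those assumptions the window is 16-32 weeks from kickoff to full launch, which is a useful number to have in hand when someone asks for a delivery date during Discover.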

Phase 1: Discover (2-4 weeks)

Audit the problem, the data, the team, and the success criteria. The deliverable is a short document: what problem are we solving, how will we measure success, what data and capabilities exist today, what gaps need to close before Pilot can start.

Pitfalls: skipping this phase to 'save time'. Every week saved in Discover costs three weeks in Build when data turns out to be inaccessible or the problem was misunderstood. The opposite mistake is over-spending in Discover: on most projects, more than 4 weeks brings diminishing returns. The Discover output should be crisp, not comprehensive.

Phase 2: Pilot (4-8 weeks)

Prove feasibility at small scale. End-to-end flow working for a narrow slice of the problem, against real data, with initial eval numbers. Not production quality; enough to show the approach can work.

Deliverables: working prototype, initial eval suite, cost and latency estimates, updated timeline and cost for full build. The Pilot de-risks the Build phase. A failed Pilot is valuable — it tells you to stop before spending Build budget.
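
One way to make the go/no-go decision explicit is to compare the Pilot's numbers against the success criteria written down in Discover. The sketch below is an assumption about how such a gate could look, not a prescribed format: the `PilotResult` and `SuccessCriteria` classes, their field names, and the thresholds in the usage example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PilotResult:
    eval_score: float        # fraction of eval cases passed, 0..1
    cost_per_request: float  # estimated cost per request, any currency
    p95_latency_s: float     # estimated 95th-percentile latency, seconds


@dataclass
class SuccessCriteria:
    min_eval_score: float
    max_cost_per_request: float
    max_p95_latency_s: float


def pilot_gaps(result, criteria):
    """List the criteria the Pilot missed; an empty list means 'go'."""
    gaps = []
    if result.eval_score < criteria.min_eval_score:
        gaps.append("eval score below target")
    if result.cost_per_request > criteria.max_cost_per_request:
        gaps.append("cost per request above budget")
    if result.p95_latency_s > criteria.max_p95_latency_s:
        gaps.append("p95 latency above limit")
    return gaps


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    criteria = SuccessCriteria(0.80, 0.05, 4.0)
    print(pilot_gaps(PilotResult(0.72, 0.03, 6.5), criteria))
```

Returning the list of gaps rather than a bare yes/no keeps the conversation concrete: a Pilot that misses only on latency calls for a different decision than one that misses on eval quality.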

Pitfalls: scoping the Pilot so broadly that it's indistinguishable from a partial Build. Scoping so narrowly that it doesn't prove anything. Accepting Pilot-quality performance as 'done' and deploying — Pilot is not production.

Phase 3: Build (8-16 weeks)

Turn the Pilot into a production-ready system. Real infrastructure, real guardrails, real observability, real eval discipline. The bulk of engineering time and cost lives here.

Sub-phases: (a) core engineering — extending the Pilot to handle edge cases, (b) guardrails and safety — see our guardrails post, (c) eval infrastructure — the production-grade version of the Pilot evals, (d) operational tooling — monitoring, alerts, rollback, runbook.

Pitfalls: declaring done at 'works in demo' without guardrails, evals, or ops tooling. Adding scope beyond what the Pilot validated. Skipping the operational tooling because 'we'll add it after launch' (you won't).

Phase 4: Launch (2-4 weeks)

The phase most teams underinvest in. Launch isn't flipping a switch; it's canary rollout, user training, support readiness, feedback loops, and the first round of production-driven tuning.

Activities: canary to 1-5% of traffic for a week to catch production-specific issues. Ramp to 25%, then 100% over 2-3 weeks. Training sessions for affected users. Support team readiness (what do they do when a user complains about AI output?). Feedback capture mechanism live from day 1.
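
The canary-then-ramp schedule can be sketched as a percentage gate keyed on a stable hash of the user ID. This is a minimal illustration, not a description of any particular feature-flag product: the `RAMP` table, the function names, and the exact switch-over days are assumptions chosen to fit a one-week canary followed by a two-week ramp.

```python
import hashlib

# Ramp schedule as (first day of stage, percent of traffic on the AI path).
# The 5% and 25% steps come from the text; the exact days are assumptions.
RAMP = [(0, 5), (7, 25), (21, 100)]


def rollout_percent(day):
    """Percent of traffic that should see the new system on a given day."""
    percent = 0
    for start_day, stage_percent in RAMP:
        if day >= start_day:
            percent = stage_percent
    return percent


def bucket(user_id):
    """Map a user ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_rollout(user_id, day):
    """True if this user is routed to the new system on this day."""
    return bucket(user_id) < rollout_percent(day)
```

Hashing the user ID instead of sampling randomly per request keeps assignment sticky: a user who lands in the canary stays in the rollout as the percentage grows, which makes their feedback easier to interpret.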

Pitfalls: 100% launch from day 1. Assuming training isn't needed. Ignoring feedback in the first weeks — that data is the most valuable you'll ever collect about your system.

Phase 5: Operate (ongoing)

Production AI systems need continuous attention: model updates from providers, prompt regressions, eval drift, cost trends, new use cases from users. The team structure supporting Operate is different from Build — more on-call discipline, more review cycles, fewer feature-building cycles.

A healthy Operate phase includes: weekly eval review, monthly cost review, quarterly roadmap update, ongoing incident handling via the runbook, and systematic capture of new cases into the eval set.
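
The weekly eval review is easier to sustain when the comparison against the last accepted baseline is automated. Below is a minimal sketch of such a check; `find_regressions`, the 0.02 tolerance, and the metric names in the usage example are hypothetical, and it assumes every metric is a "higher is better" score between 0 and 1.

```python
def find_regressions(baseline, current, tolerance=0.02):
    """Return metrics whose score dropped by more than `tolerance`.

    A metric missing from `current` is reported too, since an eval
    that silently stopped running is itself a problem.
    """
    regressions = {}
    for metric, old_score in baseline.items():
        new_score = current.get(metric)
        if new_score is None or old_score - new_score > tolerance:
            regressions[metric] = (old_score, new_score)
    return regressions


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    baseline = {"accuracy": 0.91, "groundedness": 0.88}
    current = {"accuracy": 0.90, "groundedness": 0.81}
    print(find_regressions(baseline, current))
```

In this example only the second metric is flagged, because its drop exceeds the tolerance while the first stays within it. New cases captured from production would be added to the eval set first, then the baseline refreshed once the team accepts the new scores.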

Pitfalls: abandoning the system to 'just run'. Losing the team context as people rotate off. Letting the eval dataset atrophy. Not having a process for adding scope incrementally — new features emerge from Operate and need a lightweight path to production.

Team shape across phases

Discover: a senior technologist plus a domain expert. 2-3 people for 2-4 weeks.
Pilot: 2-3 engineers plus the domain expert.
Build: 3-6 engineers plus product, design, and domain as needed.
Launch: Build team plus customer-success and support engagement.
Operate: 1-3 engineers on rotation depending on system complexity.

The same people shouldn't be on all phases if the project lasts more than 4-5 months. Phase-appropriate expertise matters — the engineer who loves Discover often isn't the engineer who loves Operate.
