Prioritizing AI features requires weighing factors that don't matter for traditional SaaS: per-use cost, the quality ceiling set by model capability, regulatory and liability risk, and eval coverage. This post describes the prioritization framework we use and how to rank AI features against each other and against traditional SaaS features on a shared roadmap.
Cost per use
Input tokens × price. Output tokens × price. Estimated volume. Calculate expected cost per month, per user, per feature. Surprisingly expensive features get deprioritized even when valuable.
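As a rough sketch of that arithmetic, the snippet below computes cost per use, per user, and per feature per month. The field names, the per-million-token pricing unit, and the example numbers are illustrative assumptions, not figures from any particular provider.

```python
from dataclasses import dataclass


@dataclass
class FeatureUsage:
    # All fields are hypothetical estimates you would supply yourself.
    input_tokens: int            # average input tokens per call
    output_tokens: int           # average output tokens per call
    calls_per_user_month: float  # estimated calls per active user per month
    active_users: int            # estimated monthly active users of the feature


def cost_per_use(usage: FeatureUsage,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float) -> float:
    """Input tokens x price plus output tokens x price, in dollars per call.

    Prices are assumed to be quoted in dollars per million tokens.
    """
    return (usage.input_tokens * input_price_per_mtok
            + usage.output_tokens * output_price_per_mtok) / 1_000_000


def monthly_costs(usage: FeatureUsage,
                  input_price_per_mtok: float,
                  output_price_per_mtok: float) -> dict:
    """Expected cost per use, per user per month, and per feature per month."""
    per_use = cost_per_use(usage, input_price_per_mtok, output_price_per_mtok)
    per_user = per_use * usage.calls_per_user_month
    return {
        "per_use": per_use,
        "per_user": per_user,
        "per_feature": per_user * usage.active_users,
    }


# Made-up example: 2,000 input and 500 output tokens per call,
# 40 calls per user per month, 10,000 users, $3 / $15 per million tokens.
example = FeatureUsage(2000, 500, 40, 10_000)
costs = monthly_costs(example, 3.0, 15.0)
print({name: round(value, 4) for name, value in costs.items()})
```

Running this for every candidate feature before roadmap review makes the surprisingly expensive ones visible early.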
Ratio of cost to value. $0.10 cost per use generating $1 of user value is viable. $1 cost generating $1.10 value is not. Margin matters.
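A minimal sketch of that check follows. The `min_ratio` threshold of 3.0 is a hypothetical default chosen for illustration; the only anchors from the text are that a 10x ratio is viable and a 1.1x ratio is not.

```python
def value_to_cost_ratio(value_per_use: float, cost_per_use: float) -> float:
    """Dollars of user value generated per dollar of inference cost."""
    if cost_per_use <= 0:
        raise ValueError("cost_per_use must be positive")
    return value_per_use / cost_per_use


def is_viable(value_per_use: float, cost_per_use: float,
              min_ratio: float = 3.0) -> bool:
    """True if the ratio clears min_ratio (an assumed, tunable threshold)."""
    return value_to_cost_ratio(value_per_use, cost_per_use) >= min_ratio


print(is_viable(1.00, 0.10))  # $0.10 cost for $1 of value
print(is_viable(1.10, 1.00))  # $1 cost for $1.10 of value
```

Where to set the threshold depends on your margin targets; the point is to make it explicit rather than argue it feature by feature.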
Cost trajectories. Is the feature's cost likely to drop as models get cheaper? Long-arc features may be viable at future prices but not at current ones, so roadmap timing matters. See the cost modeling post.
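One way to reason about timing is to ask how long it would take for a feature's cost to reach a viable level. The sketch below assumes a constant fractional price decline per month, which is a simplifying assumption for illustration and not a forecast; the function name and parameters are hypothetical.

```python
import math


def months_until_viable(current_cost: float, target_cost: float,
                        monthly_decline: float) -> int:
    """Whole months until cost per use falls to target_cost.

    ASSUMPTION: cost shrinks by the same fraction (monthly_decline) every
    month. Real price changes arrive in steps, so treat this as a rough guide.
    """
    if current_cost <= 0 or target_cost <= 0:
        raise ValueError("costs must be positive")
    if not 0 < monthly_decline < 1:
        raise ValueError("monthly_decline must be between 0 and 1")
    if current_cost <= target_cost:
        return 0  # already viable at current prices
    # Solve current_cost * (1 - d)^n <= target_cost for the smallest integer n.
    return math.ceil(math.log(target_cost / current_cost)
                     / math.log(1 - monthly_decline))


# Made-up example: cost must halve, prices assumed to fall 5% per month.
print(months_until_viable(1.00, 0.50, 0.05))
```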
Quality ceiling
What accuracy can current models achieve on this task? If the ceiling is 70% and your UX requires 95%, defer shipping.
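The gate can be written down as a simple comparison between measured accuracy and the accuracy the UX requires. This is a sketch that assumes you already have pass/fail counts from an eval set; the function name is hypothetical.

```python
def clears_quality_bar(correct: int, total: int,
                       required_accuracy: float) -> bool:
    """True if measured accuracy on an eval set meets the UX requirement."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total >= required_accuracy


# The example from the text: a 70% ceiling against a 95% requirement.
print(clears_quality_bar(70, 100, 0.95))
```

With a small eval set the measured accuracy is noisy, so a borderline pass deserves a larger sample before committing.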
UX error tolerance. How bad is a wrong answer? A wrong coding autocomplete is easily dismissed; a wrong medical diagnosis is catastrophic. Error tolerance varies by feature.
User-perceivable quality gap. Sometimes users don't notice 5% errors; sometimes they notice 1%. Test with users before committing.
Capability trajectory. Are models improving rapidly on this task? Features near the frontier of current capability but improving quickly may be worth early investment.
Regulatory and legal risk
Legal exposure per error. Low-stakes errors: minor. High-stakes errors (medical, financial, legal advice): potentially huge. Factor into go/no-go.
Explainability requirements. Some jurisdictions and sectors require explainable AI decisions. Cost and complexity of explainability matter.
Compliance overhead. Documentation, monitoring, and audit capability all add overhead. For some features, that overhead runs to 3x the base development cost.
Data handling. PII, regulated data (healthcare, financial) creates constraints. Some features require on-prem or private deployment.
Value — same as always
Traditional prioritization inputs still matter. User demand, competitive differentiation, strategic value, revenue impact. AI features compete on these against each other and against non-AI features.
User testing matters even more. AI features are harder to spec on paper; user reaction to prototypes drives prioritization.
Adjacent value. AI features often enable further features downstream. Earlier foundational AI work unlocks later capabilities. See roadmap post.
Combining the factors
Score on each axis. Use a simple 1-5 scale for cost friendliness, quality ceiling clearance, low regulatory risk, and high value, so that higher is always better. Combine the scores by multiplying them or by taking a weighted sum.
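Both ways of combining scores can be sketched in a few lines. The axis keys, the example scores, and the weights below are illustrative assumptions; the weighted sum is normalized by total weight so the result stays on the 1-5 scale.

```python
AXES = ("cost", "quality", "regulatory", "value")  # higher is always better


def _validate(scores: dict) -> None:
    for axis in AXES:
        if not 1 <= scores[axis] <= 5:
            raise ValueError(f"{axis} score must be between 1 and 5")


def product_score(scores: dict) -> float:
    """Multiply the axis scores; one weak axis drags the total down sharply."""
    _validate(scores)
    result = 1.0
    for axis in AXES:
        result *= scores[axis]
    return result


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted average of the axis scores, normalized by total weight."""
    _validate(scores)
    total_weight = sum(weights[axis] for axis in AXES)
    return sum(scores[axis] * weights[axis] for axis in AXES) / total_weight


# Hypothetical feature: expensive, decent quality, low risk, high value.
feature = {"cost": 2, "quality": 3, "regulatory": 5, "value": 4}
equal = {"cost": 1, "quality": 1, "regulatory": 1, "value": 1}
value_heavy = {"cost": 1, "quality": 1, "regulatory": 1, "value": 2}

print(product_score(feature))
print(weighted_score(feature, equal))
print(weighted_score(feature, value_heavy))
```

Multiplying punishes any single low score, which suits hard constraints like regulatory risk; a weighted sum lets strengths offset weaknesses.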
Tiebreak on trajectory. When features are close, one with improving inputs (cost dropping, quality rising) beats one at plateau.
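The tiebreak can be sketched as a pairwise comparison. The `closeness` tolerance that defines "close" and the `Candidate` fields are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float     # combined score from the step above
    improving: bool  # True if cost is dropping or quality is rising


def prefer(a: Candidate, b: Candidate, closeness: float = 0.25) -> Candidate:
    """Pick the higher score, unless the scores are close and only one
    candidate has improving inputs, in which case that one wins."""
    if abs(a.score - b.score) <= closeness and a.improving != b.improving:
        return a if a.improving else b
    return a if a.score >= b.score else b


plateaued = Candidate("summaries", 3.6, improving=False)
rising = Candidate("agent workflow", 3.5, improving=True)
clear_leader = Candidate("search rerank", 4.5, improving=False)

print(prefer(plateaued, rising).name)
print(prefer(clear_leader, rising).name)
```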
Portfolio view. Ship some low-risk, high-quality features now; start on some higher-risk, high-reward bets; defer marginal ones.
Common prioritization errors
Ignoring cost. A PM ships a feature that looks compelling in a demo but destroys unit economics at scale. I've run into this at several teams I've worked with.
Underestimating regulatory effort. Healthcare or finance features get treated as normal SaaS features, and the compliance effort surfaces late. See the healthcare AI compliance post.
Assuming quality will improve. Some tasks are already at the limit of what frontier models can do; shipping means accepting the current ceiling.
Not investing in evals early. Without evals, there's no baseline for what counts as 'good enough' quality. The feature ships, and quality regresses silently.
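A minimal sketch of what an eval baseline buys you: once accuracy is recorded at ship time, a later run can be compared against it automatically. The function names and the one-point tolerance are assumptions for illustration.

```python
def accuracy(results: list) -> float:
    """Fraction of eval cases that passed; results is a list of booleans."""
    if not results:
        raise ValueError("no eval results")
    return sum(1 for passed in results if passed) / len(results)


def has_regressed(baseline_accuracy: float, current_accuracy: float,
                  tolerance: float = 0.01) -> bool:
    """True if accuracy dropped below the baseline by more than tolerance.

    The tolerance (an assumed default) absorbs run-to-run noise.
    """
    return current_accuracy < baseline_accuracy - tolerance


baseline = 0.92  # hypothetical accuracy recorded when the feature shipped
latest_run = [True] * 90 + [False] * 10  # hypothetical eval results

print(has_regressed(baseline, accuracy(latest_run)))
```

Wiring a check like this into CI or a scheduled job is what turns a silent regression into a visible one.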
Process recommendations
Quarterly AI roadmap review. Revisit all in-flight features with current cost, quality, capability data. Kill what doesn't work; accelerate what does.
Executive visibility into cost and quality. Dashboards that show AI feature economics in near real-time. Builds trust with leadership.
Partnership with legal and compliance early. Not at launch; at prioritization. Risks surface earlier; features that can't clear risk get deprioritized before investment.
Framework summary
Score: cost, quality ceiling, regulatory risk, value. Weight by context (company stage, product bet, market). Revisit quarterly. Communicate scoring to stakeholders.
What makes this different from traditional RICE: cost per use and quality ceiling are AI-specific. Apply them explicitly; don't let AI features get treated as standard features.