eazyware
Ops · December 30, 2024 · 9 min read

AI changelogs: communicating behavior changes to users

Traditional release notes miss what matters for AI. What to publish, what to hide, and how to handle regressions.

Kushal R.
Engineering lead

Traditional release notes were designed for deterministic software: 'Added feature X. Fixed bug Y.' AI systems change behavior in ways that don't fit that mold cleanly: a prompt tweak, a model swap, or a retrieval change can alter behavior users notice without any visible 'feature' change. This post describes the changelog pattern we've evolved for AI products.

What to include
AI changelog: what to include

Every release (publish)
· Behavior changes users will notice (tone, format, refusal patterns)
· New features and capabilities with example prompts
· Deprecations and removed capabilities, with long notice

Significant events (publish promptly)
· Model version switches, quality regressions, rollbacks
· Data handling changes, pricing changes, SLA adjustments

Generally don't publish
· Internal prompt tweaks that don't change user-observable behavior
· Minor eval tuning, ops-side routing changes, infra swaps

Every release: behavior changes, new features, deprecations. Significant events: model swaps, regressions, rollbacks. Generally skip: internal prompt tweaks, minor eval tuning.

Three tiers of communication

Tier 1 — Routine changelog. Standard release cadence. For AI: behavior changes users will notice (tone, formatting, refusal patterns), new features with example prompts, deprecations with long notice. User-facing summaries of what they'll experience differently.

Tier 2 — Significant events, published promptly. Model version switches: 'We're upgrading from GPT-4 to GPT-4o on date X. Expected impact: faster responses, slightly different writing style.' Quality regressions: 'We detected lower quality on [category] since [date]; fixing by Y.' Rollbacks. Data handling or pricing changes.

Tier 3 — Generally don't publish. Internal prompt tweaks that don't change user-observable behavior. Minor eval tuning. Ops-side routing changes. Infrastructure swaps.

Rule of thumb: if a user might see a change and be confused, publish. If no user would notice, skip.
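
The three tiers and the rule of thumb can be expressed as a small classification step in a release pipeline. The following is a minimal sketch, not our actual tooling: the `Change` type, the `kind` labels, and the `classify` function are all hypothetical names chosen for illustration, and it assumes someone has already judged whether a change is user-visible.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ROUTINE = 1        # publish in the regular changelog
    SIGNIFICANT = 2    # publish promptly, outside the normal cadence
    INTERNAL_ONLY = 3  # generally not published


# Hypothetical labels for the significant events listed above.
SIGNIFICANT_KINDS = {
    "model_switch", "quality_regression", "rollback",
    "data_handling", "pricing", "sla",
}


@dataclass
class Change:
    kind: str           # e.g. "prompt_tweak", "model_switch", "new_feature"
    user_visible: bool  # could a user see this change and be confused?


def classify(change: Change) -> Tier:
    """Significant events first, then the rule of thumb on visibility."""
    if change.kind in SIGNIFICANT_KINDS:
        return Tier.SIGNIFICANT
    if change.user_visible:
        return Tier.ROUTINE
    return Tier.INTERNAL_ONLY
```

Note that a prompt tweak is not automatically tier 3 here: if it changes tone or formatting in a way users will notice, `user_visible` is true and it lands in the routine changelog.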

Patterns for specific events

Model upgrades. Give 30+ days notice when possible. Describe the expected behavior changes. Provide a migration window during which users can pin to the old version if they depend on specific behavior. After migration, document the change clearly.

Regressions. Acknowledge quickly. Be specific. Don't over-apologize, but don't minimize. Set expectations for resolution, then follow through with a notification when the fix ships. If the regression is significant, publish a post-mortem.

New features that change existing behavior. Example: adding eval-based rejection for certain queries, which changes the 'shape' of what the AI will answer. Announce in advance and again at rollout. Explain why.

Deprecations. Longest notice for biggest changes. 90+ days for capability removals. Offer migration paths. If possible, keep old behavior reachable via an explicit flag during transition.
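
The notice periods above are easy to check mechanically before a rollout date is committed. Here is a minimal sketch, assuming the 30-day and 90-day figures are treated as hard minimums (the text says "when possible" for model upgrades, so a real check would probably allow overrides). The dictionary keys and function names are hypothetical.

```python
from datetime import date, timedelta

# Minimum notice in days, taken from the guidance above.
# The keys are hypothetical labels, not an established taxonomy.
MIN_NOTICE_DAYS = {
    "model_upgrade": 30,
    "capability_removal": 90,
}


def earliest_rollout(kind: str, announced_on: date) -> date:
    """Return the earliest date a change can ship after being announced."""
    return announced_on + timedelta(days=MIN_NOTICE_DAYS[kind])


def has_enough_notice(kind: str, announced_on: date, rollout_on: date) -> bool:
    """True if the gap between announcement and rollout meets the minimum."""
    return rollout_on >= earliest_rollout(kind, announced_on)
```

For example, a capability removal announced on January 1, 2025 could ship no earlier than April 1, 2025 under these assumptions.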

Format that works

Date. What changed. Who's affected. What to do (if anything). Link to more detail for technical users. Written for the user of the product, not for the internal team.

Example: 'Mar 15 — Response style update. We've tuned the summary feature to produce shorter bullet-point outputs by default. Affected: all users of the summary feature. Action: if you prefer longer paragraphs, use the "detailed" option in settings. Technical notes: we adjusted the system prompt after evaluating feedback from Jan-Feb.'
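
One way to keep entries consistent is to store the fields separately and render them at publish time. This is a sketch under assumptions: the `ChangelogEntry` type, its field names, and the `render` function are hypothetical, and the separator after the date is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class ChangelogEntry:
    date: str                  # display date, e.g. "Mar 15"
    title: str
    what_changed: str
    affected: str
    action: str = ""           # leave empty when users need to do nothing
    technical_notes: str = ""  # optional detail for technical users


def render(entry: ChangelogEntry) -> str:
    """Render an entry as one user-facing paragraph."""
    parts = [
        f"{entry.date}: {entry.title}.",
        entry.what_changed,
        f"Affected: {entry.affected}.",
    ]
    # Optional sections are skipped entirely when empty.
    if entry.action:
        parts.append(f"Action: {entry.action}.")
    if entry.technical_notes:
        parts.append(f"Technical notes: {entry.technical_notes}.")
    return " ".join(parts)
```

Keeping `action` optional matches the "What to do (if anything)" part of the format: most entries should need no action, and an empty "Action" line only adds noise.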

Internal vs external changelogs

Internal changelog captures everything — including the tier-3 items. Good for incident investigation ('what changed last Tuesday?') and for onboarding new engineers. External changelog is filtered to what users should know.

Separating these lets the internal log be complete and the external log be relevant. Don't try to combine them: you either overwhelm users with technical minutiae or leave gaps in the internal record.
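
In practice the two logs do not need separate authoring. A minimal sketch, assuming every change is recorded once in the internal log with a tier number and the external log is derived by filtering: the `LogRecord` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    day: str      # ISO date, e.g. "2025-03-11"
    summary: str
    tier: int     # 1 = routine, 2 = significant, 3 = internal only


def external_view(internal_log: list[LogRecord]) -> list[LogRecord]:
    """Keep only the records users should see (tiers 1 and 2)."""
    return [r for r in internal_log if r.tier in (1, 2)]


def changed_on(internal_log: list[LogRecord], day: str) -> list[LogRecord]:
    """Answer 'what changed on this day?' from the complete internal log."""
    return [r for r in internal_log if r.day == day]
```

The internal summaries would still need rewriting for users before publication; the filter only decides which records are candidates.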
