Traditional release notes were designed for deterministic software. 'Added feature X. Fixed bug Y.' AI systems change behavior in ways that don't fit that mold: a prompt tweak, a model swap, or a retrieval change can alter behavior users notice without any visible 'feature' change. This post describes the changelog pattern we've evolved for AI products.

What to include

Every release: behavior changes, new features, deprecations. Significant events: model swaps, regressions, rollbacks. Generally skip: internal prompt tweaks, minor eval tuning.

Three tiers of communication

Tier 1 — Routine changelog. Standard release cadence. For AI: behavior changes users will notice (tone, formatting, refusal patterns), new features with example prompts, deprecations with long notice. User-facing summaries of what they'll experience differently.

Tier 2 — Significant events, published promptly. Model version switches: 'We're upgrading from GPT-4 to GPT-4o on date X. Expected impact: faster responses, slightly different writing style.' Quality regressions: 'We detected lower quality on [category] since [date]; fixing by Y.' Rollbacks. Data handling or pricing changes.

Tier 3 — Generally don't publish. Internal prompt tweaks that don't change user-observable behavior. Minor eval tuning. Ops-side routing changes. Infrastructure swaps.

Rule of thumb: if a user might see a change and be confused, publish. If no user would notice, skip.
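To make the rule of thumb concrete, here is a minimal sketch of how a change could be sorted into the three tiers. The event kinds, the `Change` type, and the `classify` function are names made up for illustration, not part of any real tool.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ROUTINE = 1        # goes in the regular changelog
    SIGNIFICANT = 2    # published promptly, outside the normal cadence
    INTERNAL_ONLY = 3  # recorded internally, generally not published


# Hypothetical event kinds that are treated as significant events.
SIGNIFICANT_KINDS = {"model_swap", "regression", "rollback", "data_handling", "pricing"}


@dataclass
class Change:
    kind: str              # e.g. "prompt_tweak", "model_swap", "feature"
    user_observable: bool  # could a user see this change and be confused?


def classify(change: Change) -> Tier:
    """Apply the rule of thumb: significant events first, then user visibility."""
    if change.kind in SIGNIFICANT_KINDS:
        return Tier.SIGNIFICANT
    if change.user_observable:
        return Tier.ROUTINE
    return Tier.INTERNAL_ONLY
```

The hard part in practice is deciding `user_observable` honestly. The code only records that decision; it doesn't make it for you.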

Patterns for specific events

Model upgrades. Give 30+ days notice when possible. Describe expected behavior changes. Provide a migration window during which users can pin to the old version if they depend on specific behavior. Document the change clearly after migration.
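A migration window with pinning can be as simple as one decision at request time. The sketch below assumes a per-user pin setting and a fixed end date for the window; `resolve_model` and the version strings are hypothetical.

```python
from datetime import date
from typing import Optional


def resolve_model(
    pinned: Optional[str],
    today: date,
    *,
    old: str,
    new: str,
    window_ends: date,
) -> str:
    """Return the model version to serve for one request.

    A user who pinned the old version keeps it until the window closes;
    everyone else, and everyone after the window, gets the new version.
    """
    if pinned == old and today <= window_ends:
        return old
    return new
```

For example, with `old="model-v1"`, `new="model-v2"`, and a window ending on April 30, a pinned user is served `model-v1` through April 30 and `model-v2` from May 1.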

Regressions. Acknowledge quickly. Be specific. Don't over-apologize but don't minimize. Set expectations for resolution. Follow through with a notification once it's resolved. If significant, publish a post-mortem.

New features that change existing behavior. Example: adding eval-based rejection for certain queries. This changes the 'shape' of what the AI will answer. Announce in advance and again at rollout. Explain why.

Deprecations. Longest notice for biggest changes. 90+ days for capability removals. Offer migration paths. If possible, keep old behavior reachable via an explicit flag during transition.
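The notice periods above (30+ days for model upgrades, 90+ days for capability removals) are easy to enforce mechanically before anything ships. This is a rough sketch; the `MIN_NOTICE_DAYS` table and function names are assumptions for illustration.

```python
from datetime import date, timedelta

# Minimum notice in days, taken from the guidance above.
MIN_NOTICE_DAYS = {
    "model_upgrade": 30,
    "capability_removal": 90,
}


def earliest_rollout(announced: date, kind: str) -> date:
    """Earliest rollout date that still honors the minimum notice."""
    return announced + timedelta(days=MIN_NOTICE_DAYS[kind])


def has_enough_notice(announced: date, rollout: date, kind: str) -> bool:
    """True if the planned rollout leaves users at least the minimum notice."""
    return rollout >= earliest_rollout(announced, kind)


# Announcing on March 1 allows a model upgrade from March 31
# and a capability removal from May 30.
print(earliest_rollout(date(2025, 3, 1), "model_upgrade"))       # 2025-03-31
print(earliest_rollout(date(2025, 3, 1), "capability_removal"))  # 2025-05-30
```

A check like this could run in a release checklist, with "when possible" handled by an explicit, logged override rather than by skipping the check.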

Format that works

Date. What changed. Who's affected. What to do (if anything). Link to more detail for technical users. Written for the user of the product, not for the internal team.

Example: 'Mar 15 — Response style update. We've tuned the summary feature to produce shorter bullet-point outputs by default. Affected: all users of the summary feature. Action: if you prefer longer paragraphs, use the "detailed" option in settings. Technical notes: we adjusted the system prompt after evaluating feedback from Jan-Feb.'
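If entries are stored as structured data, the format stays consistent and the required fields can't be forgotten. Here is one possible shape, assuming a small `ChangelogEntry` type of our own invention. It renders ISO dates rather than 'Mar 15', and the year in the usage example is made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ChangelogEntry:
    when: date
    title: str
    what_changed: str                      # written for the product's user
    affected: str                          # who is affected
    action: Optional[str] = None           # what to do, if anything
    technical_notes: Optional[str] = None  # extra detail for technical users

    def render(self) -> str:
        """Render the entry as one user-facing paragraph."""
        parts = [
            f"{self.when.isoformat()}: {self.title}.",
            self.what_changed,
            f"Affected: {self.affected}.",
        ]
        if self.action:
            parts.append(f"Action: {self.action}.")
        if self.technical_notes:
            parts.append(f"Technical notes: {self.technical_notes}.")
        return " ".join(parts)


entry = ChangelogEntry(
    when=date(2025, 3, 15),  # illustrative year
    title="Response style update",
    what_changed="The summary feature now produces shorter bullet-point outputs by default.",
    affected="all users of the summary feature",
    action='if you prefer longer paragraphs, use the "detailed" option in settings',
)
print(entry.render())
```

Making `action` and `technical_notes` optional matches the format: 'what to do' only appears when there is something to do.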

Internal vs external changelogs

Internal changelog captures everything — including the tier-3 items. Good for incident investigation ('what changed last Tuesday?') and for onboarding new engineers. External changelog is filtered to what users should know.

Separating these lets the internal log be complete and the external log be relevant. Don't try to combine them: a single log either overwhelms users with technical minutiae or, once trimmed for readers, leaves gaps in the internal record.
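One way to keep the two logs separate without doing the bookkeeping twice is to record everything internally and attach a user-facing summary only to the items users should hear about. The sketch below assumes that approach; `InternalRecord`, `external_view`, and `changes_on` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class InternalRecord:
    day: date
    detail: str                         # engineer-facing description, always present
    user_summary: Optional[str] = None  # set only when users should know


def external_view(log: List[InternalRecord]) -> List[Tuple[date, str]]:
    """The external changelog: only records that carry a user-facing summary."""
    return [(r.day, r.user_summary) for r in log if r.user_summary is not None]


def changes_on(log: List[InternalRecord], day: date) -> List[str]:
    """Answer 'what changed on this day?' from the complete internal record."""
    return [r.detail for r in log if r.day == day]


log = [
    InternalRecord(date(2025, 3, 11), "Rerouted 10% of traffic to a second region"),
    InternalRecord(
        date(2025, 3, 11),
        "Shortened the summary system prompt",
        user_summary="Summaries are now shorter bullet points by default.",
    ),
]
```

With this log, `changes_on(log, date(2025, 3, 11))` returns both items, while `external_view(log)` returns only the summary change. The user-facing text is still written separately for each published item, not generated from the internal detail.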

AI changelogs: communicating behavior changes to users
