Continual learning — updating a model continuously as new data arrives — is a topic where production practice diverges significantly from academic research. The patterns that ship in production are less ambitious than research papers suggest but actually work. This post is the pragmatic view: what production teams actually do to keep AI systems current, what doesn't work, and why 'just periodically retrain' beats clever continual learning schemes.

What actually works

Periodic retraining wins in most production systems. Continual online learning is research territory. RAG updates, adapter swapping, and scheduled batches cover most real needs.

Why continual learning is hard

Catastrophic forgetting. Updating weights on new data can overwrite learning on old data. A model fine-tuned on yesterday's tickets might perform worse on last month's patterns.

Data distribution shift. User behavior changes; product changes; external context changes. What the model learned last month may not apply this month.

Eval stability. A continuously updated model is a moving target: you can't compare week-to-week scores meaningfully if the model that produced them has changed in between.

Rollback complexity. When a continual learning update causes regression, what do you roll back to? If updates happen hourly, the state space is enormous.

What actually works in production

Periodic retraining. Retrain the adapter or the fine-tuned model weekly or monthly: collect data for the period, train on the accumulated data, evaluate, and deploy only if quality improves. Boring but reliable. 90% of production AI systems use this pattern.
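The "deploy if quality improves" step can be made explicit as a promotion gate. The following is a minimal sketch, not a prescribed implementation: `EvalReport`, `should_promote`, and the thresholds are hypothetical names and values chosen for illustration. It also checks per-slice scores, so a candidate that improves on average but has forgotten older patterns is held back.

```python
from dataclasses import dataclass


@dataclass
class EvalReport:
    overall: float               # aggregate score on the fixed test set
    per_slice: dict[str, float]  # score per data slice, e.g. per month or category


def should_promote(candidate: EvalReport, production: EvalReport,
                   min_gain: float = 0.0, max_slice_drop: float = 0.02) -> bool:
    """Promote only if the candidate beats production overall
    and no existing slice regresses by more than max_slice_drop."""
    if candidate.overall <= production.overall + min_gain:
        return False
    for name, prod_score in production.per_slice.items():
        cand_score = candidate.per_slice.get(name)
        # A slice missing from the candidate report is treated as a failure.
        if cand_score is None or prod_score - cand_score > max_slice_drop:
            return False
    return True
```

Both thresholds are assumptions to tune per system; the point is that the comparison runs against the same frozen test set for both models.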

RAG updates. Most 'continual learning' needs are actually retrieval needs. Update the RAG index as new documents arrive; the model sees new information through retrieval without any weight updates. See RAG patterns post.
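As a rough illustration of updating an index without touching weights, the sketch below uses a toy in-memory store. `DocIndex` is a hypothetical stand-in for a real vector database, and `embed` is whatever embedding function the caller supplies; neither comes from a specific library. A content hash lets unchanged documents skip re-embedding.

```python
import hashlib


class DocIndex:
    """Toy in-memory stand-in for a vector store, keyed by document id."""

    def __init__(self, embed):
        self.embed = embed   # callable: str -> list[float], supplied by the caller
        self.entries = {}    # doc_id -> (content_hash, vector, text)

    def upsert(self, doc_id: str, text: str) -> bool:
        """Insert or refresh a document. Returns True if the index changed."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        existing = self.entries.get(doc_id)
        if existing is not None and existing[0] == digest:
            return False     # unchanged content: skip re-embedding
        self.entries[doc_id] = (digest, self.embed(text), text)
        return True

    def delete(self, doc_id: str) -> None:
        """Remove a document that no longer exists upstream."""
        self.entries.pop(doc_id, None)
```

Deletes matter as much as inserts here: stale documents left in the index keep being retrieved long after the source has changed.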

Adapter swapping. When new task distributions emerge, train a new LoRA adapter rather than updating existing weights. Keep adapters isolated; combine at inference if needed. See LoRA post.
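One way to keep adapters isolated is a small registry that maps each task to its own adapter versions. The sketch below is an assumption about how such routing might look, not any library's API: `AdapterRegistry` and the adapter ids are hypothetical, and the actual loading of adapter weights is left out.

```python
class AdapterRegistry:
    """Maps a task name to a stack of adapter ids; the top of the stack is live."""

    def __init__(self, default_adapter: str):
        self.default_adapter = default_adapter
        self.versions = {}   # task -> list of adapter ids, oldest first

    def register(self, task: str, adapter_id: str) -> None:
        """Make a newly trained adapter the live one for this task."""
        self.versions.setdefault(task, []).append(adapter_id)

    def rollback(self, task: str) -> str:
        """Drop the newest adapter for a task and return the one now live."""
        stack = self.versions.get(task, [])
        if stack:
            stack.pop()
        return self.resolve(task)

    def resolve(self, task: str) -> str:
        """Return the live adapter for a task, or the default if none exists."""
        stack = self.versions.get(task)
        return stack[-1] if stack else self.default_adapter
```

Because each task has its own stack, rolling back one adapter leaves every other task untouched.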

Prompt adjustments. Many drift problems can be addressed by prompt changes rather than model updates. Cheaper, faster, easier to roll back. See prompt version control post.

What doesn't work (yet) in production

Online gradient updates. Updating weights on every production interaction sounds elegant but breaks everything in practice: reproducibility, evals, rollback, cost.

Replay buffers. Storing old examples and replaying them to prevent forgetting. Works in research; operationally complex in production. Most teams don't implement it and get by with periodic full retraining instead.
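For reference, the core of a replay buffer is small; the operational cost lies in storing, refreshing, and governing the retained data. The sketch below uses reservoir sampling, one common way to keep a bounded uniform sample of everything seen so far. The class name and the 50% replay ratio are illustrative assumptions.

```python
import random


class ReservoirReplayBuffer:
    """Keeps a uniform random sample of at most `capacity` past examples."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Reservoir sampling: keep the new example with
            # probability capacity / seen, replacing a random slot.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def mixed_batch(self, new_examples: list, replay_fraction: float = 0.5) -> list:
        """Return the new examples followed by a sample of old ones."""
        k = min(len(self.items), int(len(new_examples) * replay_fraction))
        return list(new_examples) + self.rng.sample(self.items, k)
```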

Elastic weight consolidation (EWC) and similar regularization methods. Research-active; limited production adoption. The operational overhead exceeds the benefit for most use cases.
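To make the idea concrete: EWC adds a quadratic penalty that pulls each weight back toward its value after the previous task, scaled by an estimate of how important that weight was (the diagonal of the Fisher information). The sketch below computes only that penalty on plain Python lists; the function and parameter names are illustrative. Estimating and storing the Fisher diagonal for every previous task is where much of the overhead comes from.

```python
def ewc_penalty(params, anchor_params, fisher_diag, lam: float) -> float:
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

    params         current weights (theta)
    anchor_params  weights after training on the previous task (theta_star)
    fisher_diag    per-weight importance estimates (F)
    lam            strength of the pull toward the old weights
    """
    return 0.5 * lam * sum(
        f * (p - a) ** 2
        for p, a, f in zip(params, anchor_params, fisher_diag)
    )
```

During training this term is added to the loss on the new data, so moving an important weight costs more than moving an unimportant one by the same amount.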

Drift detection

Any update scheme presumes you can detect when an update is needed. Standard drift detection: run evals against a stable, held-out test set weekly and alert when scores drop. See drift detection post for specific patterns.
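A minimal sketch of the score-drop alert, assuming weekly scores on the same frozen test set are kept in a list. The function name, window size, and tolerance are illustrative assumptions, not recommended values.

```python
from statistics import mean


def score_dropped(history: list[float], latest: float,
                  window: int = 4, tolerance: float = 0.02) -> bool:
    """True if the latest score is more than `tolerance` below
    the mean of the last `window` scores."""
    if not history:
        return False   # no baseline yet, nothing to compare against
    baseline = mean(history[-window:])
    return baseline - latest > tolerance
```

Comparing against a short rolling baseline rather than only the previous week makes the alert less sensitive to one noisy eval run.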

Input distribution monitoring. Statistical tests on input token distributions, query categories, user patterns. Significant shifts trigger eval re-run.
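As one possible check on query categories, the sketch below computes the Population Stability Index (PSI) between a reference window and the current window. PSI is a drift heuristic rather than a formal significance test; a chi-squared test on the same counts would be the stricter option. The function name is hypothetical, and both windows are assumed to be non-empty.

```python
import math
from collections import Counter


def population_stability_index(reference: list[str], current: list[str],
                               eps: float = 1e-6) -> float:
    """PSI = sum over categories of (q - p) * ln(q / p), where p and q are
    the category shares in the reference and current windows."""
    ref_counts, cur_counts = Counter(reference), Counter(current)
    psi = 0.0
    for category in set(ref_counts) | set(cur_counts):
        # Floor shares at eps so categories missing from one window
        # do not cause a division by zero or log of zero.
        p = max(ref_counts[category] / len(reference), eps)
        q = max(cur_counts[category] / len(current), eps)
        psi += (q - p) * math.log(q / p)
    return psi
```

A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, but the cutoff that should trigger an eval re-run is an assumption to calibrate against your own traffic.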

Output quality monitoring. User feedback signals (thumbs-up/down), escalation rates, regeneration rates. Leading indicators of quality drift.
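These signals can be compared against a baseline window with very little code. In the sketch below, `FeedbackWindow`, `degraded_signals`, and the 25% relative-increase threshold are hypothetical and only illustrate the shape of the check.

```python
from dataclasses import dataclass


@dataclass
class FeedbackWindow:
    responses: int
    thumbs_down: int
    escalations: int
    regenerations: int

    def rates(self) -> dict[str, float]:
        n = max(self.responses, 1)   # avoid dividing by zero on empty windows
        return {
            "thumbs_down": self.thumbs_down / n,
            "escalations": self.escalations / n,
            "regenerations": self.regenerations / n,
        }


def degraded_signals(baseline: FeedbackWindow, current: FeedbackWindow,
                     max_relative_increase: float = 0.25) -> list[str]:
    """Names of signals whose rate rose by more than the allowed fraction."""
    base, cur = baseline.rates(), current.rates()
    return [name for name in base
            if cur[name] > base[name] * (1 + max_relative_increase)]
```

Rates are used instead of raw counts so that a traffic spike alone does not look like a quality problem.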

Retraining cadence

High-drift systems (user behavior changes rapidly, product evolves): weekly.

Moderate drift (content changes, some user pattern evolution): monthly. Most production systems.

Stable systems (enterprise workflows with stable patterns): quarterly or only when triggered by significant change.

More frequent isn't automatically better. Each retraining run carries a risk of regression, so the cadence should match the actual drift rate.

Operationalizing

Pipeline: data collection → cleaning → training → eval → deploy (with shadow or canary). See shadow post and canary post.
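For the canary step, a simple deterministic split sends a fixed share of requests to the candidate model. This is a sketch under assumptions: `routes_to_canary` is a hypothetical helper, and hashing a request or user id is just one way to get a stable assignment.

```python
import hashlib


def routes_to_canary(request_id: str, canary_percent: int) -> bool:
    """Deterministically assign a request to the canary model.

    The id is hashed into one of 100 buckets; buckets below
    canary_percent go to the candidate, the rest stay on production.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < canary_percent
```

Hashing a user id instead of a request id keeps each user on one model for the whole canary period, which makes feedback signals easier to attribute.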

Automate the boring parts (data collection, eval runs, deploy) but keep the deploy decision human. Continual learning that auto-deploys without human review is how regressions reach users.

Version everything. Dataset version used for training, model version produced, eval score achieved. Lineage enables rollback. See dataset versioning post.
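A lineage record can be as small as the sketch below. The names (`LineageRecord`, `dataset_fingerprint`, `rollback_target`) are hypothetical, and a content hash is used here as a stand-in for whatever dataset versioning tool is actually in place.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def dataset_fingerprint(rows: list[dict]) -> str:
    """Short content hash of a dataset; row order matters, key order does not."""
    h = hashlib.sha256()
    for row in rows:
        h.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()[:16]


@dataclass(frozen=True)
class LineageRecord:
    model_version: str
    dataset_version: str
    eval_score: float
    parent_model_version: Optional[str]   # the model this one replaced


def rollback_target(records: list[LineageRecord], current: str) -> Optional[str]:
    """Return the version that `current` replaced, or None if unknown."""
    for record in records:
        if record.model_version == current:
            return record.parent_model_version
    return None
```

With the parent recorded at deploy time, the question of what to roll back to becomes a lookup instead of a reconstruction.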

Research vs practice gap

Continual learning research publishes impressive results on curated benchmarks. Production teams almost all use periodic retraining because the operational simplicity dominates the quality delta. Don't feel like your approach is 'behind' if you're doing monthly retraining rather than online learning — you're doing what works.

