What I’m Learning About MLOps
Early notes on the engineering discipline needed to move machine learning systems from experiments to dependable products.
MLOps is teaching me that machine learning systems are not finished when the model works once. The real challenge begins when the model has to keep working as data shifts, product needs evolve, and users interact with the system in unexpected ways.
Traditional software delivery gives us useful instincts: version control, automated testing, deployment pipelines, observability, rollback plans, and clear ownership. MLOps extends those instincts into data, features, models, prompts, evaluations, and monitoring.
The model is one part of the system
It is tempting to focus on model selection because models are visible and exciting. But the surrounding system often decides whether the product succeeds. Data quality, labeling strategy, retrieval design, evaluation datasets, and user feedback loops can matter as much as the model itself.
A good model in a weak system becomes unreliable. A modest model in a well-designed system can create real value.
Evaluation needs intent
One lesson I keep returning to is that evaluation must be tied to the job the product performs. A single generic quality score is not enough, because it hides which kinds of failure are happening. Teams need representative examples, explicit acceptance criteria, named failure categories, and a way to inspect behavior over time.
This is especially important for generative AI products, where outputs can look fluent while still being incomplete, wrong, or misaligned with the user’s intent.
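To make this concrete for myself, here is a minimal sketch of what an intent-driven evaluation could look like in Python. Everything in it is hypothetical and chosen for illustration: the EvalCase structure, the two failure categories ("empty" and "incomplete"), the substring-based acceptance check, and the fake_generate function standing in for a real model call. A real product would need richer criteria and human review.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EvalCase:
    """One example tied to a concrete job the product performs."""
    name: str
    prompt: str
    must_include: list[str]  # acceptance criteria: terms the answer needs


def classify(output: str, case: EvalCase) -> Optional[str]:
    """Return a failure category, or None if the case passes."""
    text = output.strip().lower()
    if not text:
        return "empty"
    missing = [term for term in case.must_include if term.lower() not in text]
    if missing:
        return "incomplete"
    return None


def run_eval(generate: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Run every case and tally failures by category."""
    failures = Counter()
    for case in cases:
        category = classify(generate(case.prompt), case)
        if category is not None:
            failures[category] += 1
    passed = len(cases) - sum(failures.values())
    return {"pass_rate": passed / len(cases), "failures": dict(failures)}


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a model call, hard-coded for illustration."""
    canned = {
        "How do I get a refund?": "Refunds are accepted within 30 days.",
        "How do I reset my password?": "",
        "What are your opening hours?": "We are open 9am to 5pm, Monday to Friday.",
    }
    return canned.get(prompt, "I am not sure.")


cases = [
    EvalCase("refund", "How do I get a refund?", ["30 days", "receipt"]),
    EvalCase("password", "How do I reset my password?", ["reset link"]),
    EvalCase("hours", "What are your opening hours?", ["9am", "5pm"]),
]

report = run_eval(fake_generate, cases)
print(report)
```

The refund answer reads fluently but never mentions the receipt, so it is counted as "incomplete" rather than passing. Storing a report like this for every model or prompt change is one simple way to inspect behavior over time, and the failure breakdown says more than the pass rate alone.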
The leadership angle
MLOps is not only an engineering concern. Leaders need to understand the lifecycle well enough to ask better questions: how do we know this is working, what happens when it fails, who owns detecting and responding to drift, and how will the team learn from production behavior?
That is the bridge I am trying to build in my own learning.