Data & AI | 90 days | 2-3 hours/day | Updated 2026-06-01

MLOps 90-Day Learning Path

Build production-grade MLOps skills in 90 days: MLflow experiment tracking, feature stores, model registries, serving pipelines, drift detection, and retraining automation. Go from notebook to live model.

What MLOps means

MLOps applies DevOps engineering discipline to the machine learning lifecycle, from data preparation and experiment tracking through model packaging, serving, monitoring, and retraining. It addresses challenges specific to ML systems: data drift, model decay, reproducibility, and the shared ownership of models between data scientists and platform engineers. A mature MLOps practice keeps production models reliable, observable, and continuously improved.

Who should follow this path

Data scientists who want to own their models in production
ML engineers building model serving infrastructure
Platform engineers supporting ML workloads on Kubernetes
DevOps engineers adding ML pipeline expertise
Software engineers at ML-driven product companies

Prerequisites

Python proficiency and familiarity with scikit-learn or PyTorch
Basic understanding of ML model training concepts
Docker and containerization experience
Familiarity with CI/CD pipelines
Basic cloud platform experience (AWS, GCP, or Azure)

The 90-day plan

Daily study recommendation: 2-3 hours/day, six days a week. Consistency beats intensity — block the time in your calendar like a meeting.

Days 1–15: Foundation

MLOps maturity levels and lifecycle stages
ML pipeline components: data, features, training, evaluation, serving
Reproducibility challenges in ML
ML project structure and cookiecutter templates
Version control for data, code, and models

Outcome: Structure an ML project for reproducibility with version-controlled data, code, and model artifacts.
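One way to make this outcome concrete is to write a small manifest for every training run that records the data content hashes, the random seed, and the code version. The sketch below uses only the Python standard library. The names `file_sha256` and `build_manifest`, the manifest layout, and the file `train.csv` are illustrative assumptions, not part of any tool named in this path. DVC and MLflow cover the same ground far more thoroughly.

```python
import hashlib
import json
import random
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_files, seed, code_version):
    """Collect what is needed to reproduce a run (hypothetical layout)."""
    return {
        "code_version": code_version,  # for example, a git commit hash
        "seed": seed,
        "data": {str(p): file_sha256(p) for p in sorted(data_files)},
    }


# Demo with a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "train.csv"  # hypothetical dataset
    data_path.write_text("x,y\n1,2\n")
    manifest = build_manifest([data_path], seed=42, code_version="abc123")
    random.seed(manifest["seed"])  # seed the RNG from the recorded value
    print(json.dumps(manifest, indent=2))
```

If the data file changes by even one byte, its hash in the manifest changes, which is the signal that a rerun is no longer comparing like with like.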

Days 16–30: Core concepts

Experiment tracking with MLflow
Hyperparameter tuning with Optuna or Ray Tune
Model registry: versioning and stage transitions (newer MLflow releases favor model version aliases over fixed stages)
Dataset versioning with DVC
Comparing and selecting experiments systematically

Outcome: Track all experiments in MLflow, register the best model, and promote it through staging to production.
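The selection and promotion logic behind this outcome can be sketched without any tracking server. The example below is a plain-Python stand-in, not the MLflow API: the `Run` record, `select_best_run`, `should_promote`, and the `val_auc` metric name are all hypothetical, and the promotion gate assumes a higher metric value is better.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """A tracked experiment run (hypothetical stand-in for a tracking record)."""
    run_id: str
    params: dict
    metrics: dict = field(default_factory=dict)


def select_best_run(runs, metric, higher_is_better=True):
    """Return the run with the best value of `metric`, skipping runs that lack it."""
    candidates = [r for r in runs if metric in r.metrics]
    if not candidates:
        raise ValueError(f"no run logged metric {metric!r}")
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda r: r.metrics[metric])


def should_promote(candidate, production, metric, min_gain=0.0):
    """Promote only if the candidate beats production by more than min_gain.

    Assumes a higher metric value is better.
    """
    if production is None:  # nothing in production yet
        return True
    return candidate.metrics[metric] - production.metrics[metric] > min_gain


runs = [
    Run("run-a", {"lr": 0.1}, {"val_auc": 0.81}),
    Run("run-b", {"lr": 0.01}, {"val_auc": 0.86}),
    Run("run-c", {"lr": 0.001}),  # crashed before logging metrics
]
best = select_best_run(runs, "val_auc")
print(best.run_id, should_promote(best, runs[0], "val_auc", min_gain=0.01))
```

Requiring a minimum gain before promotion keeps noise-level improvements from churning the production model.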

Days 31–45: Tools and workflows

Feature stores with Feast or Tecton
Feature engineering pipelines at scale
Training pipelines with Kubeflow Pipelines or ZenML
Containerizing ML training jobs
Distributed training concepts with Horovod or Ray

Outcome: Build a feature store-backed training pipeline deployed on Kubernetes with Kubeflow.
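A property worth understanding before wiring a feature store into training is point-in-time correctness: each training example should see only the feature values that existed at its event time, otherwise future information leaks into the model. The sketch below shows the idea with the standard library. The function name `point_in_time_lookup` and the integer timestamps are assumptions for illustration. Feast and similar tools perform this join for you at scale.

```python
from bisect import bisect_right


def point_in_time_lookup(feature_history, event_time):
    """Return the latest feature value recorded at or before event_time.

    feature_history: list of (timestamp, value) pairs sorted by timestamp.
    Returns None when no value existed yet at event_time.
    """
    timestamps = [ts for ts, _ in feature_history]
    idx = bisect_right(timestamps, event_time)  # number of entries with ts <= event_time
    return feature_history[idx - 1][1] if idx > 0 else None


# Hypothetical history of one feature for one entity.
history = [(1, 10.0), (5, 12.5), (9, 20.0)]

# Label events at different times each get the value that was current then.
for event_time in (0, 4, 5, 100):
    print(event_time, point_in_time_lookup(history, event_time))
```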

Days 46–60: Hands-on projects

Model serving with FastAPI, TorchServe, and BentoML
Kubernetes-native serving with KServe (formerly KFServing)
A/B testing and canary deployments for models
Shadow mode deployment patterns
Low-latency inference optimization (quantization, ONNX)

Outcome: Deploy a model as a production API with canary rollout and latency benchmarking.
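The two mechanics in this outcome, splitting traffic for a canary and summarizing latency, can be sketched in a few lines. This is an illustration under stated assumptions, not how any serving platform implements it: `route` and `percentile` are hypothetical helper names, the split is done by hashing a request ID, and the percentile uses the nearest-rank method. With KServe, the traffic split is configured on the InferenceService rather than in application code.

```python
import hashlib
import math


def route(request_id: str, canary_percent: int) -> str:
    """Deterministically send roughly canary_percent% of request IDs to the canary."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


def percentile(latencies_ms, pct):
    """Nearest-rank percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("latencies_ms must not be empty")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Route synthetic requests and report the observed canary share.
targets = [route(f"req-{i}", canary_percent=10) for i in range(10_000)]
print("canary share:", targets.count("canary") / len(targets))

# Summarize made-up latency samples (milliseconds).
samples = [12.0, 15.5, 11.2, 40.1, 13.3, 14.8, 90.0, 12.9]
print("p50:", percentile(samples, 50), "p95:", percentile(samples, 95))
```

Hashing the request ID instead of drawing a random number keeps routing stable, so a retried request reaches the same model version.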

Days 61–75: Advanced practices

Data drift detection with Evidently AI
Model performance monitoring and alerting
Concept drift vs data drift detection methods
Automated retraining triggers and pipelines
Model explainability with SHAP and LIME

Outcome: Implement drift detection, automated alerting, and triggered retraining for a production model.
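As a minimal illustration of this loop, the sketch below computes the Population Stability Index (PSI) between a reference sample and a current sample of one numeric feature, then applies a threshold to decide whether to trigger retraining. It is not Evidently's API: `psi` and `should_retrain` are hypothetical names, the equal-width binning is a simplification, and the 0.2 threshold is a common rule of thumb that you should tune for your own data.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the reference `expected`."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins  # equal-width bins over the reference range

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_retrain(psi_value, threshold=0.2):
    """Trigger retraining when drift exceeds the (assumed) threshold."""
    return psi_value > threshold


reference = [i / 100 for i in range(1000)]  # synthetic training-time feature
current = [x + 5 for x in reference]        # same feature, shifted in production
score = psi(reference, current)
print("PSI:", round(score, 3), "retrain:", should_retrain(score))
```

PSI only detects a change in the input distribution (data drift). Concept drift, where the relationship between inputs and labels changes, needs monitoring of model performance against ground truth.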

Days 76–90: Portfolio, interview & certification prep

MLOps portfolio: end-to-end model pipeline project
Preparing for the Databricks Certified Machine Learning Professional certification
MLOps interview questions and system design
ML platform metrics: training cost, inference latency, model accuracy SLAs
Emerging topics: LLMOps, foundation model fine-tuning pipelines

Outcome: Present a complete MLOps pipeline project and be ready for ML engineer and MLOps engineer interviews.

Phase outcomes at a glance

Phase	Outcome
Days 1–15	Structure an ML project for reproducibility with version-controlled data, code, and model artifacts.
Days 16–30	Track all experiments in MLflow, register the best model, and promote it through staging to production.
Days 31–45	Build a feature store-backed training pipeline deployed on Kubernetes with Kubeflow.
Days 46–60	Deploy a model as a production API with canary rollout and latency benchmarking.
Days 61–75	Implement drift detection, automated alerting, and triggered retraining for a production model.
Days 76–90	Present a complete MLOps pipeline project and be ready for ML engineer and MLOps engineer interviews.

Tools to learn

MLflow
Kubeflow Pipelines
DVC
Feast
KServe
BentoML
Evidently AI
ZenML
Ray
Optuna
FastAPI
Seldon Core

Labs to practice

Mini projects

Build an end-to-end MLflow + DVC + KServe pipeline from experiment tracking to production serving with drift monitoring
Implement a Kubeflow Pipelines workflow for automated model retraining triggered by Evidently drift alerts
Deploy a model A/B test on Kubernetes using KServe traffic splitting with latency and accuracy monitoring

Interview questions to prepare

What is the difference between a model registry and a feature store?
How do you detect and respond to data drift in a production model?
Explain the concept of shadow mode deployment for ML models.
How would you design a retraining pipeline that automatically triggers on performance degradation?
What is the difference between data drift and concept drift?
How do you ensure reproducibility of ML experiments across different environments?
Describe the architecture of a production-grade feature store.
What metrics would you track for an ML model in production?

Certification suggestions

Databricks Certified Machine Learning Professional — Databricks
AWS Certified Machine Learning - Specialty — AWS
Google Professional Machine Learning Engineer — Google Cloud
Kubeflow Fundamentals — Linux Foundation

Browse the full certification registry for exam details and official links.
