Why MLOps matters for Machine Learning Engineers
3) Reproduce environments with a container
# Dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY requirements.lock /app/requirements.lock
# requirements.lock contains exact versions + hashes
RUN pip install --no-cache-dir -r requirements.lock
COPY . /app
CMD ["python", "train.py"]Key idea: pin dependencies with a lockfile and build from a known base image to reduce "works on my machine" issues.
4) Automate training with a CI-like pipeline
# Conceptual pipeline (GitHub Actions-style syntax)
name: Train
on: [push]
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        run: |
          python -m venv .venv
          . .venv/bin/activate
          pip install -r requirements.lock
      - name: Pull exact data snapshot
        run: dvc pull
      - name: Train (idempotent)
        env:
          RUN_ID: ${{ github.sha }}
        run: |
          . .venv/bin/activate
          python train.py --epochs 5 --lr 0.01 --seed 123 --data data/raw/customers.csv
      - name: Upload artifacts
        run: |
          tar -czf run_artifacts.tgz artifacts
          echo "Artifacts archived"

Key idea: a repeatable, parameterized, and idempotent workflow that ties code and data versions together.
5) Register and promote a model with approvals
# Conceptual flow (registry-agnostic)
# 1) Register a new model version from your run artifacts
$ register-model --name churn_model --run-id <run-id> --path artifacts/model.json
# 2) Attach evaluations and a model card
$ attach-eval --model churn_model --version 7 --metrics "val_auc=0.89,ks=0.41"
$ attach-card --model churn_model --version 7 --file model_card.md
# 3) Request approval to move to Staging
$ request-approval --model churn_model --version 7 --to Staging --owner ml-eng
# 4) Approver promotes after checks
$ promote --model churn_model --version 7 --to Staging --approved-by lead

Key idea: treat promotion as a governed change with recorded evaluations and approvals.
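The commands above are deliberately registry-agnostic. To make the flow concrete without assuming a specific product, the sketch below keeps versions, metrics, and approvals in a single registry.json file and refuses to promote without a recorded approver; every name in it is illustrative.

# registry.py -- conceptual sketch of a file-backed model registry with approvals
import json
from pathlib import Path

REGISTRY = Path("registry.json")

def _load() -> dict:
    return json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {"models": {}}

def _save(data: dict) -> None:
    REGISTRY.write_text(json.dumps(data, indent=2))

def register(name: str, run_id: str, artifact_path: str) -> int:
    data = _load()
    versions = data["models"].setdefault(name, [])
    versions.append({"version": len(versions) + 1, "run_id": run_id,
                     "artifact": artifact_path, "stage": "None",
                     "metrics": {}, "approvals": []})
    _save(data)
    return versions[-1]["version"]

def attach_eval(name: str, version: int, metrics: dict) -> None:
    data = _load()
    data["models"][name][version - 1]["metrics"].update(metrics)
    _save(data)

def promote(name: str, version: int, to_stage: str, approved_by: str | None) -> None:
    data = _load()
    entry = data["models"][name][version - 1]
    if not approved_by:
        raise PermissionError("promotion requires a recorded approver")
    entry["approvals"].append({"to": to_stage, "by": approved_by})
    entry["stage"] = to_stage
    _save(data)

if __name__ == "__main__":
    v = register("churn_model", run_id="abc123", artifact_path="artifacts/model.json")
    attach_eval("churn_model", v, {"val_auc": 0.89, "ks": 0.41})
    promote("churn_model", v, "Staging", approved_by="lead")

The point is not the storage format but the invariant: no stage change without an evaluation and an approval on record.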
Drills and exercises
- Run a training script twice with a fixed seed and confirm identical metrics.
- Create a small dataset snapshot and restore it from version control on a new machine.
- Build a container and run training inside it; compare dependencies with your local env.
- Log one run with complete metadata: params, metrics, artifacts, code commit, and data version (a minimal logging sketch follows this list).
- Draft a one-page model card: purpose, data used, metrics, risks, and limitations.
- Write a one-paragraph incident rollback plan for a model.
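For the metadata drill above, a run log can be as simple as one JSON file per run, provided it captures everything needed to reproduce the result: parameters, metrics, artifact paths, the exact code commit, and a hash of the data. A minimal, tool-agnostic sketch (the runs/ file layout is an assumption, not a standard):

# log_run.py -- conceptual sketch: record params, metrics, artifacts, code commit, and data version
import hashlib, json, subprocess, time
from pathlib import Path

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def git_commit() -> str:
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

def log_run(params: dict, metrics: dict, artifacts: list[str], data_path: str) -> Path:
    record = {
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
        "artifacts": artifacts,
        "code_commit": git_commit(),
        "data_sha256": sha256_of(data_path),
    }
    out = Path("runs") / f"{record['code_commit'][:8]}_{int(record['timestamp'])}.json"
    out.parent.mkdir(exist_ok=True)
    out.write_text(json.dumps(record, indent=2))
    return out

# Example usage:
# log_run({"lr": 0.01, "epochs": 5, "seed": 123},
#         {"val_auc": 0.89},
#         ["artifacts/model.json"],
#         "data/raw/customers.csv")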
Subskills
- Experiment Tracking Concepts: Log runs consistently and compare them to choose the best candidate.
- Model Registry Concepts: Register versions, manage stages, and promote with approvals.
- Reproducibility And Artifact Management: Pin seeds and environments; store artifacts like models and plots.
- Data And Model Versioning: Tie code commits to specific dataset and model versions.
- Training Automation: Build parameterized, idempotent pipelines for training and evaluation.
- Environment Reproducibility: Use lockfiles or containers to rebuild the same environment anywhere.
- Governance And Approval Flows Basics: Record approvals, sign-offs, and model cards for promotions.
- Handling PII And Compliance Basics: Minimize, mask, or avoid logging sensitive data; document controls (see the masking sketch after this list).
- Incident Response For ML: Detect issues, rollback quickly, and communicate with a clear runbook.
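To make the PII subskill concrete, a small helper can drop or pseudonymize direct identifiers before any record is written to logs or artifacts. A minimal sketch; the field names and salting scheme are hypothetical.

# pii.py -- conceptual sketch: mask direct identifiers before logging
import hashlib

DROP_FIELDS = {"name", "email", "phone"}      # hypothetical direct identifiers
PSEUDONYMIZE_FIELDS = {"customer_id"}          # keep joinability without raw IDs

def mask_record(record: dict, salt: str = "rotate-me") -> dict:
    safe = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue  # never log these at all
        if key in PSEUDONYMIZE_FIELDS:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()[:12]
            safe[key] = f"anon_{digest}"
        else:
            safe[key] = value
    return safe

# Example: mask_record({"customer_id": 42, "email": "a@b.c", "tenure_months": 7})
# -> {"customer_id": "anon_...", "tenure_months": 7}

Dropping beats masking whenever a field is not needed downstream, and the salt must be handled like a secret if pseudonyms need to stay stable across runs.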
Common mistakes and debugging tips
- Not tracking data versions: Fix by tying runs to exact dataset hashes or snapshots.
- Unpinned dependencies: Use a lockfile and container base; record both in run metadata.
- Non-idempotent pipelines: Make steps pure (same inputs → same outputs); clear temp state.
- Storing PII in artifacts: Mask, aggregate, or exclude sensitive fields; review logs before upload.
- Skipping approvals: Require a recorded sign-off before any stage promotion.
- No rollback path: Keep the previous production model ready; document the switch-back command (a rollback sketch follows this list).
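A rollback can be as simple as moving the "current Production" pointer back to the previous approved version and redeploying from it. A minimal sketch, reusing the hypothetical file-backed registry.json layout from section 5 (not a real tool):

# rollback.py -- conceptual sketch: point Production back at the previous approved version
import json
from pathlib import Path

REGISTRY = Path("registry.json")

def rollback(name: str) -> int:
    data = json.loads(REGISTRY.read_text())
    versions = data["models"][name]
    in_prod = [v for v in versions if v["stage"] == "Production"]
    if not in_prod:
        raise RuntimeError("nothing is in Production to roll back from")
    current = in_prod[-1]
    # Roll back to the most recent earlier version that had a recorded approval.
    previous = next((v for v in reversed(versions)
                     if v["version"] < current["version"] and v["approvals"]), None)
    if previous is None:
        raise RuntimeError("no previously approved version to roll back to")
    current["stage"] = "Archived"
    previous["stage"] = "Production"
    REGISTRY.write_text(json.dumps(data, indent=2))
    return previous["version"]

# Example: rollback("churn_model")  # returns the version now serving traffic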
Debugging tips
- Mismatched results? Compare seed, package versions, and data hashes first (see the run-diff sketch after these tips).
- Pipeline failures? Re-run each step locally with the same inputs to isolate the failing stage.
- Drift suspected? Plot feature distributions and key metrics across time windows.
- Slow runs? Cache immutable steps (e.g., feature extraction) and parallelize independent tasks.
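For the first debugging tip, comparing two run records side by side usually finds the culprit fastest. A minimal sketch, assuming each run wrote a JSON record like the one in the drills section:

# diff_runs.py -- conceptual sketch: show which recorded fields differ between two runs
import json, sys
from pathlib import Path

def diff_runs(path_a: str, path_b: str) -> None:
    a = json.loads(Path(path_a).read_text())
    b = json.loads(Path(path_b).read_text())
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            print(f"{key}:\n  A = {a.get(key)}\n  B = {b.get(key)}")

if __name__ == "__main__":
    diff_runs(sys.argv[1], sys.argv[2])  # e.g. runs/<run_a>.json runs/<run_b>.json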
Practical projects
- Reproducible baseline: Turn an existing notebook into a script with experiment tracking and a lockfile.
- Versioned data pipeline: Ingest a small dataset, create a snapshot, and retrain on the exact version later.
- Model registry demo: Register two model versions, attach metrics, and promote only the better one to Staging.
Mini project: churn prediction pipeline
Build a small, end-to-end pipeline that trains a churn model with tracked experiments, versioned data, reproducible environment, and a registry promotion.
Requirements
- Experiment tracking: log params (lr, epochs, seed), metrics (AUC), and artifacts (model file).
- Versioning: dataset is tracked and restored by version; code commit recorded.
- Environment: build and run training in a container with a lockfile.
- Automation: a script or CI-like config to run the training end-to-end.
- Registry: register the model, attach metrics, and simulate a promotion to Staging with an approval note.
- Governance & PII: confirm no PII is logged; include a model card.
- Incident plan: describe how to rollback to the previous model version.
Acceptance criteria
- Anyone can reproduce your best run with one command and get identical metrics.
- The dataset and code commit used by the best run are discoverable from run metadata.
- A model version exists in the registry with an attached model card and metrics.
- A documented rollback command exists and was tested once.
Next steps
- Add unit tests for feature code and a smoke test for the trained model.
- Introduce automated evaluation gates (e.g., block promotion if AUC drops or fairness metrics regress); a minimal gate sketch follows this list.
- Add monitoring for input drift and performance decay after deployment.
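A simple version of such a gate compares the candidate's metrics file against the current production model's and fails the pipeline on regression. A minimal sketch; the metric name and tolerance are assumptions.

# gate.py -- conceptual sketch: block promotion if the candidate regresses
import json, sys
from pathlib import Path

def passes_gate(candidate_path: str, production_path: str,
                metric: str = "val_auc", tolerance: float = 0.005) -> bool:
    cand = json.loads(Path(candidate_path).read_text())[metric]
    prod = json.loads(Path(production_path).read_text())[metric]
    if cand < prod - tolerance:
        print(f"GATE FAILED: {metric} dropped from {prod:.4f} to {cand:.4f}")
        return False
    print(f"GATE PASSED: {metric} {cand:.4f} vs production {prod:.4f}")
    return True

if __name__ == "__main__":
    # Usage: python gate.py candidate/metrics.json production/metrics.json
    sys.exit(0 if passes_gate(sys.argv[1], sys.argv[2]) else 1)

Wiring a check like this between training and promotion keeps the approval step focused on judgment calls rather than catching obvious regressions.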
FAQ
- Do I need a specific tool? No. Concepts apply across many stacks. Start with any tracker, a registry, and containerization.
- What if compute is limited? Use small samples and fast models to practice the pipeline first.
- How do I practice safely with PII? Use synthetic or anonymized data. Avoid storing direct identifiers in artifacts.