Who this is for
MLOps Engineers, Data Scientists, and Platform Engineers who need reliable, reproducible model releases across dev, staging, and production.
Prerequisites
- Basic Git knowledge (commits, tags).
- Familiarity with training pipelines and model artifacts (e.g., pickle, ONNX, TorchScript).
- Understanding of evaluation metrics and datasets (train/val/test splits).
Why this matters
Real tasks you will face:
- Rolling back a bad production model fast and safely.
- Answering “Which code, data, and parameters produced this model?”
- Promoting models from Staging to Production with audit-ready metadata.
- Running A/B or shadow tests with clean lineage and comparison.
Good versioning and metadata make releases predictable, debuggable, and compliant.
Concept explained simply
Think of your model registry like a library:
- Each model version is a new edition of the same book.
- Metadata is the library card: who wrote it, when, which printing press (code), which paper (data), quality score (metrics), and how to find it again.
Mental model
Version = immutable snapshot of artifacts + rich metadata. Stages (e.g., Staging, Production) are movable labels pointing to specific versions.
Core components you need
- Model artifact: file(s) you deploy (e.g., model.onnx, tokenizer.json).
- Versioning scheme: e.g., semantic versioning MAJOR.MINOR.PATCH or auto-incremented numeric versions.
- Metadata fields (a minimal schema sketch follows this list):
- Identity: model name, version, stage (Dev/Staging/Production), owners.
- Lineage: code commit hash or tag, training pipeline run ID, feature store version.
- Data: dataset snapshot ID or hash, time window, sampling rules.
- Training: hyperparameters, random seeds, hardware, training time.
- Metrics: offline metrics (val/test), thresholds, confidence intervals where applicable.
- Runtime: framework version, Docker image digest, dependency lockfile hash.
- Compliance: PII notes, risk level, approval status, reviewer.
- Notes/Tags: free-form key–value tags, change summary, deprecation status.
- Stages/aliases: movable pointers like “Production → v12”.
- Policies: who can create, promote, deprecate, or delete versions.
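To make the metadata fields above concrete, here is a minimal sketch as a Python structure. The field names are illustrative and mirror the checklist in this lesson, not any particular registry tool's schema.

from dataclasses import dataclass, field

@dataclass(frozen=True)  # frozen: fields cannot be reassigned once the version record exists
class ModelVersionMetadata:
    # Identity
    model_name: str
    version: str
    stage: str                    # e.g., "Dev", "Staging", "Production"
    owners: list[str]
    # Lineage
    code_commit: str
    pipeline_run_id: str
    # Data
    dataset_snapshot_id: str
    # Training
    hyperparameters: dict[str, float]
    seed: int
    # Metrics
    metrics: dict[str, float]     # e.g., {"auc": 0.86}
    # Runtime
    docker_digest: str
    lockfile_hash: str
    # Free-form extras
    tags: dict[str, str] = field(default_factory=dict)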
Versioning strategies
- Immutable versions: once published, never change the artifacts or metadata that impact reproducibility.
- Semantic Versioning (SemVer):
- MAJOR: incompatible model interface or schema change.
- MINOR: new feature/architecture or retrain with expected behavior change.
- PATCH: bug fix with same intended behavior (e.g., fixed seed, minor preprocessing bug).
- Stage aliases: promote by moving aliases (Staging → v15, Production → v12) instead of rewriting versions.
- Branching/Forking: create a new model name or branch when experiments diverge significantly (e.g., classic-ctr vs gbdt-ctr).
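The SemVer rules above can be encoded as a small helper so bump decisions stay consistent across the team. This is a sketch; the change categories are simply the ones listed above.

def bump_semver(version: str, change: str) -> str:
    """Return the next version for a change of type 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":   # incompatible model interface or schema change
        return f"{major + 1}.0.0"
    if change == "minor":   # new feature set/architecture or expected behavior change
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # bug fix with the same intended behavior
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")

# bump_semver("1.2.0", "patch") -> "1.2.1"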
Worked examples
Example 1: First release with clean lineage
- Train a model with commit abc123 and dataset snapshot ds_2025_10_01.
- Register as v1.0.0 with metadata: commit=abc123, dataset=ds_2025_10_01, metrics={auc: 0.86}, docker_digest=sha256:..., seed=42.
- Set Staging → v1.0.0. After validation, promote Production → v1.0.0.
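One way to picture this is a simple file-based registry: one immutable folder per version plus a small aliases file. The directory layout, model name, and helper names below are assumptions for illustration, not a specific tool's API.

import json
import pathlib

REGISTRY = pathlib.Path("registry")

def register_version(model_name: str, version: str, metadata: dict) -> None:
    """Create an immutable version folder with a metadata.json; refuse to overwrite."""
    version_dir = REGISTRY / model_name / version
    if version_dir.exists():
        raise FileExistsError(f"{model_name} {version} already registered; versions are immutable")
    version_dir.mkdir(parents=True)
    (version_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))

def set_alias(model_name: str, alias: str, version: str) -> None:
    """Point a stage alias (e.g., 'Production') at a specific registered version."""
    alias_file = REGISTRY / model_name / "aliases.json"
    aliases = json.loads(alias_file.read_text()) if alias_file.exists() else {}
    aliases[alias] = version
    alias_file.write_text(json.dumps(aliases, indent=2))

register_version("ctr_model", "1.0.0", {
    "commit": "abc123",
    "dataset": "ds_2025_10_01",
    "metrics": {"auc": 0.86},
    "docker_digest": "sha256:...",
    "seed": 42,
})
set_alias("ctr_model", "Staging", "1.0.0")
# After validation passes: set_alias("ctr_model", "Production", "1.0.0")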
Example 2: Patch hotfix for preprocessing bug
- Bug fix in code (commit def456) that corrects a normalization step; interface unchanged.
- Retrain; register as v1.0.1 with metadata change_type=patch, commit=def456.
- Compare metrics to v1.0.0; if improved or equal and safe, move Production → v1.0.1.
Example 3: Rollback from bad production metrics
- Production points to v12; monitoring shows conversion drop.
- Move Production → v11 (immutable version already in registry). No rebuild needed.
- Tag v12 as deprecated and add root-cause notes in metadata.
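In the file-based sketch from Example 1, this rollback is just an alias move plus an additive note on the bad version; nothing is rebuilt. The names below are the same illustrative ones, not a real registry API.

import json
import pathlib

REGISTRY = pathlib.Path("registry")

def rollback(model_name: str, alias: str, to_version: str, reason: str) -> None:
    """Move a stage alias back to a known-good version and mark the bad one."""
    model_dir = REGISTRY / model_name
    alias_file = model_dir / "aliases.json"
    aliases = json.loads(alias_file.read_text())
    bad_version = aliases[alias]                      # e.g., "12"
    aliases[alias] = to_version                       # e.g., "11"; no rebuild needed
    alias_file.write_text(json.dumps(aliases, indent=2))
    # Deprecation notes are additive tags; the reproducibility fields stay untouched.
    meta_path = model_dir / bad_version / "metadata.json"
    meta = json.loads(meta_path.read_text())
    meta.setdefault("tags", {}).update({"deprecated": "true", "rollback_reason": reason})
    meta_path.write_text(json.dumps(meta, indent=2))

# rollback("ctr_model", "Production", "11", reason="conversion drop observed in monitoring")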
Example 4: A/B test with two candidates
- Register v20 and v21 with different feature sets; both reference the same dataset snapshot where possible.
- Attach experiment_id to both versions; deploy as canary/shadow.
- Record online metrics per version; promote the winner by updating Production alias.
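Keeping the same file-based sketch, online results can live next to each candidate version so the A/B comparison stays attached to its lineage. File and field names are illustrative.

import json
import pathlib

REGISTRY = pathlib.Path("registry")

def record_online_metrics(model_name: str, version: str, metrics: dict) -> None:
    """Append online metrics beside a version without touching its training metadata."""
    path = REGISTRY / model_name / version / "online_metrics.json"
    history = json.loads(path.read_text()) if path.exists() else []
    history.append(metrics)
    path.write_text(json.dumps(history, indent=2))

# record_online_metrics("ctr_model", "20", {"experiment_id": "exp_ab_01", "ctr": 0.031})
# record_online_metrics("ctr_model", "21", {"experiment_id": "exp_ab_01", "ctr": 0.034})
# Promote the winner by moving the Production alias, exactly as in the earlier sketches.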
Practical workflow (step-by-step)
- Train: Produce artifacts and capture run metadata automatically.
- Register: Create a new immutable version with all lineage fields filled.
- Validate: Gate with checks (schema, metrics thresholds, bias tests) recorded in metadata.
- Promote: Move stage alias to the approved version; record approver and reason.
- Monitor: Link online metrics dashboards to the version; store drift alerts as tags.
- Rollback or Iterate: Use aliases to switch versions; keep notes for audit trail.
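The Validate and Promote steps rely on objective gates that are recorded rather than decided ad hoc. A minimal sketch of such a gate, with illustrative threshold names:

def validate_for_promotion(metadata: dict, thresholds: dict) -> list[str]:
    """Return the checks that fail; an empty list means the version may be promoted."""
    failures = []
    for metric, minimum in thresholds.items():
        value = metadata.get("metrics", {}).get(metric)
        if value is None or value < minimum:
            failures.append(f"{metric}={value} (required >= {minimum})")
    return failures

# failures = validate_for_promotion(candidate_metadata, {"auc": 0.85})
# If the list is empty, move the stage alias and record the approver and reason;
# otherwise block the promotion and store the failures with the version.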
Metadata checklist
- Model identity: name, version, stage, owners.
- Code lineage: repo URL, commit hash, pipeline run ID.
- Data lineage: dataset snapshot ID/hash, feature definitions version, time window.
- Training setup: hyperparameters, seeds, hardware, duration.
- Artifacts: model files, preprocessing assets, schema, signature.
- Environment: framework version, Docker image digest, dependency lockfile hash.
- Metrics: offline metrics, dataset split, acceptance thresholds.
- Approvals: reviewer, date, decision, risk notes.
- Online linkage: monitoring job ID, alert thresholds.
- Change summary: what changed and why (SemVer bump rationale).
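A quick way to enforce this checklist is a required-fields check run before registration. The field names below mirror the example schema later in this lesson and are illustrative, not a standard.

REQUIRED_FIELDS = [
    "model_name", "version", "owners",   # identity
    "code", "data", "training",          # lineage and training setup
    "artifacts", "env", "metrics",       # what ships and how it was judged
    "approvals", "notes",                # audit trail and change summary
]

def check_metadata(metadata: dict) -> list[str]:
    """Return the checklist fields that are missing or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]

# missing = check_metadata(candidate_metadata)
# if missing: refuse to register and report the gaps, e.g., ["env", "approvals"]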
Exercises
Do these now.
Exercise 1: Design a versioning plan for a CTR model
Define how you will version a click-through-rate (CTR) model across Dev, Staging, Production. Include SemVer rules and stage alias policy.
- Decide which changes trigger MAJOR/MINOR/PATCH.
- Describe how to roll back within 2 minutes.
- Specify who can promote to Production and what must be checked.
Exercise 2: Draft a minimal metadata schema
Write a compact JSON-like schema that ensures full reproducibility for your model. Include code, data, metrics, and environment fields.
Need a hint?
- Tie versions to immutable artifacts and commit hashes.
- Use aliases like Production/Staging for promotions and rollbacks.
- Include dataset snapshot IDs and dependency lockfile/Docker digests.
Example solutions
Exercise 1 — Example
{
"versioning": {
"scheme": "semver",
"rules": {
"MAJOR": "feature schema or output contract change",
"MINOR": "new feature set/architecture retrain",
"PATCH": "bug fix or non-behavioral change"
}
},
"stages": {
"aliases": ["Dev", "Staging", "Production"],
"promotion": {
"required_checks": ["schema_ok", "metric_auc>=0.85", "bias_check_ok"],
"approver_role": "ML Lead"
},
"rollback": "move Production alias to previous known-good version; immutable versions guaranteed"
}
}
Exercise 2 — Example
{
"model_name": "ctr_ranking",
"version": "1.2.0",
"owners": ["mlops@company"],
"code": {"repo": "git@example:ml/ctr.git", "commit": "abc123"},
"data": {"snapshot_id": "ds_2025_10_01", "features_version": "feats_v7"},
"training": {"params": {"lr": 0.01, "depth": 8}, "seed": 42, "hardware": "A10G"},
"artifacts": {"model": "model.onnx", "preproc": "featspec.json"},
"env": {"framework": "onnxruntime 1.17", "docker_digest": "sha256:...", "lockfile_hash": "..."},
"metrics": {"split": "val", "auc": 0.87, "logloss": 0.42, "thresholds": {"auc": ">=0.85"}},
"approvals": {"reviewer": "ml_lead", "status": "approved", "date": "2025-10-03"},
"notes": {"change": "Added user_age feature", "risk": "low"}
}
Self-check checklist
- Can you promote or roll back without rebuilding artifacts?
- Can someone reproduce the model with only the metadata?
- Is the SemVer bump justified by the actual change?
Common mistakes and self-check
- Mutable versions: Editing artifacts after registration. Self-check: the hash of a model file should never change for a version (see the hash sketch after this list).
- Missing data lineage: Not recording snapshot/hash. Self-check: can you locate the exact dataset used?
- Unclear promotion criteria: Ad-hoc decisions. Self-check: list objective thresholds before training.
- Environment drift: Deploying with different libraries than training. Self-check: store Docker digest and lockfile hash.
- Overloaded tags: Free-form notes without structure. Self-check: keep core fields structured, use tags only for extras.
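The immutability self-check above can be automated by recording a content hash at registration time and recomputing it at audit time. A minimal sketch:

import hashlib

def file_sha256(path: str) -> str:
    """Hash a model artifact so later audits can confirm it was never modified."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest()

# At registration: metadata["artifacts"]["model_sha256"] = file_sha256("model.onnx")
# At audit time: recompute and compare; a mismatch means the version was mutated.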
Practical projects
- Build a local registry directory structure with immutable version folders and a metadata.json per version; write a CLI script to promote stage aliases.
- Create a reproducibility report generator that turns metadata into a one-page audit (who, what, when, thresholds, results).
- Implement a rollback playbook: given a Production alias, automatically switch to the last passing version and notify owners (simulation).
Learning path
- Before this: experiment tracking, dataset versioning, CI basics.
- Now: model versioning + metadata (this lesson).
- Next: promotion workflows, governance, monitoring and drift handling.
Next steps
- Adopt immutable versioning and stage aliases in your current project.
- Fill the metadata checklist for your latest model and fix any gaps.
- Document promotion criteria that your team will enforce.
Mini challenge
Scenario: A new Production model v3.1.0 shows a 4% KPI drop. In 5 steps, outline how you will roll back, investigate, and update metadata so the issue cannot repeat.
Suggested outline
- Move Production → previous known-good version (v3.0.2); record rollback timestamp and reason.
- Freeze v3.1.0 with a tag "investigation"; collect online metrics and input samples.
- Compare lineage: commit, dataset snapshot, env digest; check for mismatches vs Staging.
- Fix root cause; register v3.1.1 with change_type=patch; attach validation evidence.
- Promote after checks; add postmortem link/note to both v3.1.0 and v3.1.1 metadata.