Example 4: GitOps overlays for environments
Goal: Keep one deployment template with environment-specific overlays.
deploy/
  base/
    model-deployment.yaml
  overlays/
    dev/
      kustomization.yaml
      values.yaml     # small resources, debug logging
    staging/
      kustomization.yaml
      values.yaml     # production-like, canary enabled
    prod/
      kustomization.yaml
      values.yaml     # full resources, strict SLOs
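To make the layout concrete, here is a minimal sketch of what the dev overlay's kustomization.yaml could look like. The patch file name, resource name, and config keys are illustrative assumptions, not part of the example above.

# overlays/dev/kustomization.yaml -- illustrative sketch
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: resources-patch.yaml   # hypothetical patch shrinking CPU/memory requests for dev
    target:
      kind: Deployment
      name: model-deployment     # assumes the base Deployment uses this name
configMapGenerator:
  - name: model-config
    literals:
      - LOG_LEVEL=debug          # dev-only debug logging, per the tree above

The staging and prod overlays follow the same shape, differing only in the patches they apply.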
Outcome
Promotion becomes a pull request merging the staging overlay into production with policy checks running automatically.
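One common way to run those policy checks is a CI workflow triggered by pull requests that touch the prod overlay. The sketch below assumes GitHub Actions; the checker script and file names (scripts/check_gate.py, gate.yaml, metrics.json) are hypothetical.

# .github/workflows/promote.yml -- hypothetical sketch
name: prod-promotion-checks
on:
  pull_request:
    paths:
      - "deploy/overlays/prod/**"   # run only when the prod overlay changes
jobs:
  policy-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Evaluate promotion gate
        # hypothetical checker comparing exported metrics against the gate file
        run: python scripts/check_gate.py --gate gate.yaml --metrics metrics.json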
Typical promotion workflow
- Build: Train, package, and sign artifacts; export metrics and metadata.
- Validate: Run unit/integration tests, data quality checks, and performance benchmarks.
- Gate: Evaluate metrics vs. baseline; run fairness and drift checks; require approvals.
- Promote: Transition registry stage; update Git overlay; trigger CD to target environment.
- Progressive delivery: Canary or shadow releases with active SLO monitoring and auto-rollback (see the rollout sketch after this list).
- Audit: Record promotion decision, approvers, artifact digest, and environment versions.
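The progressive-delivery step can be expressed declaratively. Below is a minimal sketch assuming Argo Rollouts; slo-check is a hypothetical AnalysisTemplate watching error rate and latency, and a failed analysis aborts the rollout, which is the auto-rollback.

# rollout.yaml -- sketch assuming Argo Rollouts; names, image, and durations are illustrative
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: model-deployment
spec:
  replicas: 4
  selector:
    matchLabels:
      app: model
  template:
    metadata:
      labels:
        app: model
    spec:
      containers:
        - name: model
          image: registry.example.com/model:1.2.0   # illustrative image reference
  strategy:
    canary:
      steps:
        - setWeight: 10            # send 10% of traffic to the new version
        - pause: {duration: 5m}    # hold while SLO metrics accumulate
        - setWeight: 50
        - pause: {duration: 10m}
      analysis:
        templates:
          - templateName: slo-check   # hypothetical AnalysisTemplate; failure rolls back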
Reusable templates you can adapt
Minimal promotion gate (YAML)
gate:
  require:
    - tests_passed
    - artifact_signed
  metrics:
    accuracy_delta: ">= 0.00"
    latency_p95_ms: "<= 180"
  approvals:
    min: 1
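In CI, a small checker (such as the hypothetical scripts/check_gate.py in the workflow sketch above) would load this file alongside the exported metrics.json and fail the pipeline when any require item is missing, a metric falls outside its threshold, or approvals are below min.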
Environment-specific variables
env:
  dev:
    traffic: 100
    monitoring: light
  staging:
    traffic: 100
    monitoring: full
  prod:
    traffic: 100
    monitoring: full
    rollout: canary
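In the GitOps layout from Example 4, these values would typically live in each overlay's values.yaml, with overlays patching only what differs. Here is a sketch of the prod overlay, assuming the canary Rollout shown earlier; both patch files are hypothetical names.

# overlays/prod/kustomization.yaml -- illustrative; only prod enables canary and strict SLOs
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: canary-patch.yaml       # hypothetical patch adding the canary steps shown earlier
  - path: slo-alerts-patch.yaml   # hypothetical patch tightening alert thresholds for prod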
Exercises
Hands-on practice.
- Exercise 1: Write a promotion policy that blocks prod if F1 drops by >1%, fairness gap >0.05, or latency p95 > 150 ms. Require one approval. Include auto-rollback for canary failures.
- Exercise 2: Create dev/staging/prod overlays where only prod enables canary and strict SLOs. Show how a registry stage change triggers the prod deployment.
Checklist
- [ ] Metrics thresholds defined and commented
- [ ] Approval rule specified
- [ ] Canary steps and rollback conditions set
- [ ] Overlays differ only where necessary
- [ ] Registry stage transition wired into CD
Common mistakes and self-check
- No baseline comparison. Self-check: do your gates compare the new model against the last stable version?
- Overfitting to one metric. Self-check: do you balance accuracy with latency, and include at least one fairness check?
- Manual approvals without an audit trail. Self-check: is each approval recorded in code or logs?
- Environment drift. Self-check: are overlays minimal and reviewed, with the same base image across environments?
- Missing rollback. Self-check: does rollback trigger automatically on an SLO breach?
Practical projects
- Implement a policy-as-code gate that reads metrics.json and blocks promotion if thresholds fail.
- Set up canary rollout with two steps (10% → 50%) and auto-rollback on error spikes.
- Wire model registry stage transitions to trigger GitOps promotion to prod.
- Create environment overlays with strict prod SLOs, light dev checks, and parity in dependencies.
Learning path
- Build and Test Pipelines → Artifact Management → Environment Promotion Automation → Progressive Delivery → Monitoring and Rollback
Next steps
- Complete the exercises and take the quick test to validate understanding.
- Pick one practical project and implement it in a demo repo.
- Iterate on your policy thresholds based on real SLOs.
Mini challenge
Design a single YAML policy that: (1) reads baseline metrics, (2) blocks if fairness gap > 0.03, (3) allows up to 0.5% accuracy drop if latency improves by 10%, and (4) requires one human approval for prod. Keep it under 40 lines.