Why this matters
In real ML engineering, the model only creates value once it’s serving reliably. Deployment automation lets you ship models and services quickly, safely, and repeatably—without manual steps that break under pressure.
- Push a change to code or model and have it built, tested, and deployed automatically.
- Roll out gradually (canary or blue/green) to reduce risk and roll back fast if metrics degrade.
- Keep environments consistent with containers and infrastructure as code.
- Ship both online inference APIs and batch jobs on predictable schedules.
Concept explained simply
Deployment automation is a conveyor belt from commit to production. Every commit rides the belt through gates: build, test, package, deploy, verify. No hand-holding, just reliable steps.
Mental model
- Source of truth: code + model artifacts (registry) + configs.
- Factory steps: build image, run tests, publish artifact.
- Traffic control: safely route users to new versions.
- Observability: watch health and metrics; auto-stop if needed.
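The conveyor-belt idea can be sketched as a tiny gate runner. This is an illustration only: the gate names and the `run_pipeline` helper are hypothetical, and a real pipeline would shell out to docker, pytest, and kubectl instead of using stub lambdas.

```python
# Minimal sketch of a commit-to-production conveyor belt.
# Each gate is a function returning True (pass) or False (stop the belt).

def run_pipeline(commit_sha, gates):
    """Run gates in order; stop at the first failure and report how far we got."""
    completed = []
    for name, gate in gates:
        if not gate(commit_sha):
            return {"sha": commit_sha, "completed": completed, "failed_at": name}
        completed.append(name)
    return {"sha": commit_sha, "completed": completed, "failed_at": None}

# Illustrative stub gates; real ones would invoke docker/pytest/kubectl.
gates = [
    ("build",   lambda sha: True),
    ("test",    lambda sha: True),
    ("package", lambda sha: True),
    ("deploy",  lambda sha: True),
    ("verify",  lambda sha: len(sha) == 40),  # e.g. sanity-check the full SHA before sign-off
]

result = run_pipeline("a" * 40, gates)
```

The point of the model: a failed gate stops the belt, and you always know exactly which step blocked promotion.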
Tip: Map it to your setup
Write down: where your code lives, where Docker images go, how you deploy (e.g., Kubernetes), and what tests block promotion. That’s your conveyor belt.
Key building blocks
- Versioned artifacts: package models with explicit versions. Store them in a registry or as immutable image tags.
- Containers: the same image runs in dev/staging/prod for environment parity.
- Infrastructure as Code: declarative manifests for services, jobs, secrets, and networks.
- CI runner: executes the pipeline (build, test, deploy).
- Deployment targets: cluster or serverless runtime for APIs and batch jobs.
- Rollout strategies: blue/green, canary, or shadow to reduce risk.
- Secrets management: inject keys and configs securely at deploy time.
- Observability gates: smoke tests, health checks, and simple SLO guards to block bad releases.
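One concrete habit from the list above is deriving image tags from the commit SHA so artifacts stay immutable. A small sketch of that idea; `image_tag` is a hypothetical helper, not part of any real tool:

```python
import re

def image_tag(registry, name, git_sha):
    """Build an immutable image reference like registry.example.com/ml-api:3f2c1ab.
    Rejects anything that is not a git SHA, so mutable tags like 'latest' cannot slip in."""
    if not re.fullmatch(r"[0-9a-f]{7,40}", git_sha):
        raise ValueError(f"expected a git SHA, got {git_sha!r}")
    return f"{registry}/{name}:{git_sha}"

tag = image_tag("registry.example.com", "ml-api", "3f2c1ab")
```

Because the tag is a pure function of the commit, redeploying the same commit always resolves to the same artifact.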
Worked examples
Example 1 — Auto-deploy an online inference API
Goal: On push to main, build a Docker image, deploy to staging, smoke test, then allow manual approval to production.
Minimal pipeline (generic CI syntax)
# .github/workflows/deploy.yml (example syntax, adapt to your CI)
name: deploy-api
on:
  push:
    branches: ["main"]
jobs:
  build-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: |
          docker build -t registry.example.com/ml-api:${GITHUB_SHA} .
      - name: Unit tests
        run: |
          pip install -r requirements.txt
          pytest -q
      - name: Push image
        run: |
          echo "$REGISTRY_TOKEN" | docker login registry.example.com -u token --password-stdin
          docker push registry.example.com/ml-api:${GITHUB_SHA}
  deploy-staging:
    needs: build-test
    runs-on: ubuntu-latest
    steps:
      - name: Kube auth
        run: |
          mkdir -p $HOME/.kube
          echo "$KUBE_CONFIG_STAGING" > $HOME/.kube/config
      - name: Update image and apply
        run: |
          kubectl set image deploy/ml-api ml-api=registry.example.com/ml-api:${GITHUB_SHA} -n staging
          kubectl rollout status deploy/ml-api -n staging --timeout=120s
      - name: Smoke test
        run: |
          curl -fsS http://staging.example.local/healthz
  deploy-prod:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production # manual approval gate: require reviewers on this environment in your CI settings
    steps:
      - name: Kube auth
        run: |
          mkdir -p $HOME/.kube
          echo "$KUBE_CONFIG_PROD" > $HOME/.kube/config
      - name: Deploy prod
        run: |
          kubectl set image deploy/ml-api ml-api=registry.example.com/ml-api:${GITHUB_SHA} -n prod
          kubectl rollout status deploy/ml-api -n prod --timeout=180s
      - name: Post-deploy check
        run: |
          curl -fsS http://api.example.com/healthz
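The one-line curl smoke test can fail spuriously while pods are still warming up right after a rollout, so a short retry loop is worth having. A sketch with the HTTP call injected as a parameter so the logic is testable without a live service; `smoke_test` and its signature are illustrative, not a real library API:

```python
import time

def smoke_test(fetch, url, retries=3, delay=0.1):
    """Call fetch(url) up to `retries` times; treat HTTP 200 as healthy.
    `fetch` is injected so the check can be exercised without a cluster
    (in production, pass a urllib- or requests-based function)."""
    for _ in range(retries):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            pass  # e.g. connection refused while pods are still starting
        time.sleep(delay)
    return False
```

In the pipeline above, a False return would fail the job and block promotion, just like the failed curl does.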
Kubernetes deployment (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-api
  namespace: staging
spec:
  replicas: 2
  selector:
    matchLabels: { app: ml-api }
  template:
    metadata:
      labels: { app: ml-api }
    spec:
      containers:
        - name: ml-api
          image: registry.example.com/ml-api:TAG
          ports: [{ containerPort: 8080 }]
          readinessProbe:
            httpGet: { path: /healthz, port: 8080 }
            initialDelaySeconds: 5
            periodSeconds: 5
          resources:
            requests: { cpu: "200m", memory: "256Mi" }
            limits: { cpu: "1", memory: "512Mi" }
Example 2 — Automate a batch scoring job
Goal: Package a batch job (predict file-to-file) into a container and run nightly.
Cron-style runtime
apiVersion: batch/v1
kind: CronJob
metadata:
  name: batch-scoring
  namespace: prod
spec:
  schedule: "0 2 * * *" # nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: scorer
              image: registry.example.com/batch-scorer:TAG
              args: ["--input", "/data/input.parquet", "--output", "/data/pred.parquet"]
Pipeline step to update CronJob image
kubectl set image cronjob/batch-scoring scorer=registry.example.com/batch-scorer:${GIT_SHA} -n prod
# CronJobs have no rollout status; verify the image was updated instead
kubectl get cronjob batch-scoring -n prod -o jsonpath='{.spec.jobTemplate.spec.template.spec.containers[0].image}'
Example 3 — Blue/Green promotion for low-risk releases
Goal: Deploy v2 alongside v1, switch traffic instantly, and keep v1 ready for rollback.
Two deployments, one service
# v1 deployment has label app: ml-api, version: v1
# v2 deployment has label app: ml-api, version: v2
# Service selects by version label; we switch the selector to promote.
apiVersion: v1
kind: Service
metadata:
  name: ml-api
  namespace: prod
spec:
  selector:
    app: ml-api
    version: v1 # switch to v2 to promote
  ports:
    - port: 80
      targetPort: 8080
Promotion and rollback
# Promote
kubectl patch svc ml-api -n prod -p '{"spec":{"selector":{"app":"ml-api","version":"v2"}}}'
# Rollback (instant)
kubectl patch svc ml-api -n prod -p '{"spec":{"selector":{"app":"ml-api","version":"v1"}}}'
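The two patch commands are easy to wrap in a small script so promotion and rollback each stay one call. A sketch that only builds the kubectl argument list; `promote` and `rollback` are hypothetical helpers, and in real use you would pass the result to `subprocess.run(..., check=True)` against a live cluster:

```python
import json

def switch_selector_cmd(service, namespace, version):
    """Build the kubectl patch command that points the Service at one color."""
    patch = {"spec": {"selector": {"app": service, "version": version}}}
    return ["kubectl", "patch", "svc", service, "-n", namespace, "-p", json.dumps(patch)]

def promote(service="ml-api", namespace="prod"):
    return switch_selector_cmd(service, namespace, "v2")

def rollback(service="ml-api", namespace="prod"):
    return switch_selector_cmd(service, namespace, "v1")

# Real use (assumes kubectl and cluster access): subprocess.run(promote(), check=True)
```

Separating command construction from execution keeps the promotion logic unit-testable without a cluster.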
Step-by-step: build your first automated deployment
- Prepare repo: app/ with API or batch code, tests/ with unit tests, a Dockerfile, k8s/ manifests, and a CI pipeline file.
- Containerize:
  FROM python:3.11-slim
  WORKDIR /app
  COPY requirements.txt .
  RUN pip install -r requirements.txt
  COPY . .
  CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]
- Add tests:
  def test_predict_shape():
      from app.model import predict
      assert predict([[1, 2, 3]]).shape == (1,)
- Write CI pipeline: build, test, push image; fail fast on test errors.
- Deploy to staging: apply manifests; wait for rollout; smoke test /healthz.
- Manual approval gate: require a human click to deploy to prod.
- Post-deploy checks: hit /healthz, verify logs and readiness status. Consider automated rollback if checks fail.
Exercises
- ex1: Write a CI pipeline that builds an image, deploys to staging, and runs a smoke test. See details below.
- ex2: Implement blue/green in Kubernetes using two deployments and switch traffic by updating the Service selector.
Exercise checklist
- Pipeline builds deterministically with versioned image tags.
- Deploy waits for readiness and runs a smoke test.
- No hard-coded secrets in code or manifests.
- Blue and green are independently scalable.
- Rollback is a single command.
Common mistakes and self-check
- Skipping health checks: Add readiness/liveness probes; verify rollouts wait for readiness.
- Mutable tags: Avoid latest; use commit SHA or a version string.
- Manual config drift: Keep manifests in version control; do not edit live resources by hand.
- Secrets in repo: Inject via CI secrets or your platform’s secret store.
- No smoke tests: a simple GET /healthz catches many basic issues before users do.
- One-step prod deploys: Use staging + approval, or a controlled rollout.
Self-check mini audit
- Can you redeploy the same commit and get the same result?
- Can you promote/rollback in under 1 minute?
- Is the current production image tag findable in your CI logs?
Practical projects
- API: Containerize a FastAPI model server, deploy with blue/green, add a one-minute rollback script.
- Batch: Nightly feature generation + scoring CronJob with success/failure alerts.
- Shadow traffic: Send a copy of requests to a new model version and compare metrics offline before promotion.
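For the shadow-traffic project, the offline comparison can start as simple as an agreement rate between the live model's logged predictions and the shadow model's. An illustrative sketch; `agreement_rate` and its tolerance default are assumptions, not a standard metric API:

```python
def agreement_rate(live_preds, shadow_preds, tol=1e-6):
    """Fraction of requests where live and shadow predictions match within tol.
    A low rate flags the candidate model for review before promotion."""
    if len(live_preds) != len(shadow_preds):
        raise ValueError("prediction logs must be aligned per request")
    if not live_preds:
        return 1.0  # nothing disagreed
    matches = sum(abs(a - b) <= tol for a, b in zip(live_preds, shadow_preds))
    return matches / len(live_preds)
```

For classifiers you would compare labels instead of numeric outputs, but the gating idea is the same: promote only above an agreed threshold.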
Who this is for
- Machine Learning Engineers owning model serving and reliability.
- Data Scientists promoting models to production with minimal ops.
- MLOps/Platform Engineers building paved roads for teams.
Prerequisites
- Basic containerization (Docker) and command-line familiarity.
- Comfort with writing unit tests and simple HTTP endpoints.
- Understanding of Kubernetes or your chosen deployment runtime.
Learning path
- Start: containerize your app and add health endpoints.
- Add CI: build, test, push images on every commit.
- Add CD: deploy to staging with smoke tests; add manual prod approval.
- Introduce blue/green or canary; practice rollbacks.
- Automate post-deploy checks and metric guards.
Next steps
- Integrate automated rollback triggers on failed smoke tests.
- Add basic SLOs for latency and error rate.
- Automate migration tasks (e.g., feature store backfills) as pre-deploy hooks.
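The "basic SLOs" item can start as a pure function over a window of recent requests. A sketch under stated assumptions: the thresholds below are illustrative placeholders, and the p95 here is a simple nearest-rank estimate over the window, not a streaming quantile:

```python
def slo_guard(latencies_ms, errors, total, p95_target_ms=300.0, max_error_rate=0.01):
    """Return (healthy, reasons); intended to gate promotion or trigger rollback."""
    reasons = []
    if total > 0 and errors / total > max_error_rate:
        reasons.append(f"error rate {errors / total:.2%} above {max_error_rate:.2%}")
    if latencies_ms:
        ordered = sorted(latencies_ms)
        p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]  # nearest-rank p95
        if p95 > p95_target_ms:
            reasons.append(f"p95 latency {p95:.0f}ms above {p95_target_ms:.0f}ms")
    return (not reasons, reasons)
```

Wired into the pipeline, a False result after deploy would trigger the same rollback path as a failed smoke test.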
Mini challenge
Pick an existing model service. Implement blue/green and demonstrate: deploy v2, run smoke tests, switch traffic, confirm health, and roll back. Capture the commands you used as a runbook.