Why CI/CD for ML matters for Machine Learning Engineers
CI/CD turns your ML work from one-off notebooks into reliable, repeatable releases. As a Machine Learning Engineer, you’ll automate tests for data and code, train and evaluate models in pipelines, enforce quality gates, package artifacts, deploy with confidence, and promote them across environments. This reduces regressions, speeds up delivery, and keeps models safe and traceable in production.
Who this is for
- Machine Learning Engineers formalizing model delivery.
- Data Scientists moving from notebooks to production.
- MLOps/Platform Engineers standardizing ML pipelines.
Prerequisites
- Comfort with Python and virtual environments.
- Basic Git usage (branches, commits, pull requests).
- Familiarity with testing (pytest) and packaging (pip/poetry).
- Basic Docker skills helpful for deployment automation.
Learning path
1. Set up linting, formatting, unit tests, and type checks in CI for fast feedback on every pull request.
2. Add data schema checks, sample-based validation, and smoke tests that run before training.
3. Run quick training on small samples, compute metrics, and fail the build if quality gates aren’t met.
4. Version and publish model packages and metadata; upload build artifacts for reproducibility.
5. Automate blue/green or canary deployments using environment-specific configs and approval gates.
6. Promote immutable artifacts across dev → staging → prod; define fast rollback/rollforward procedures.
What to automate first
- Run lint + tests on every pull request.
- Validate a small data sample before training.
- Cache dependencies to keep CI fast (<10 min).
Worked examples
Example 1 — CI pipeline with lint, tests, and data checks (GitHub Actions)
name: ci-ml
on: [pull_request, push]
jobs:
  build-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - name: Cache pip
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements*.txt') }}
      - name: Install
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Lint & static checks
        run: |
          ruff check .
          black --check .
          mypy src || true  # treat types as advisory early on
      - name: Run unit tests
        run: pytest -q
      - name: Data schema check (Pandera)
        run: pytest -q tests/test_data_schema.py
Minimal Pandera-based data schema test:
# tests/test_data_schema.py
import pandas as pd
import pandera as pa
from pandera import Column, DataFrameSchema

def test_training_data_schema():
    schema = DataFrameSchema({
        "age": Column(pa.Int, checks=pa.Check.ge(0)),
        "income": Column(pa.Float, checks=pa.Check.ge(0)),
        "label": Column(pa.Int, checks=pa.Check.isin([0, 1])),
    }, coerce=True)
    sample = pd.DataFrame({
        "age": [25, 44, 61],
        "income": [55000.0, 83000.0, 42000.0],
        "label": [0, 1, 0],
    })
    schema.validate(sample)
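The learning path also calls for smoke tests that run before training. A minimal, self-contained sketch on a tiny synthetic sample (the file name is illustrative):
# tests/test_smoke_train.py (illustrative name)
import numpy as np
from sklearn.linear_model import LogisticRegression

def test_training_smoke_on_tiny_sample():
    # 20 synthetic rows keep this well under a second in CI
    rng = np.random.default_rng(42)
    X = rng.normal(size=(20, 3))
    y = (X[:, 0] > 0).astype(int)
    model = LogisticRegression(max_iter=100).fit(X, y)
    preds = model.predict(X)
    assert preds.shape == (20,)
    assert set(preds) <= {0, 1}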
Example 2 — Train small and gate on metrics
Train on a small sample in CI, compute F1, and fail if below threshold.
# scripts/train_small.py
import json
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=200).fit(Xtr, ytr)
preds = model.predict(Xte)
metric = f1_score(yte, preds)
print(f"f1={metric:.4f}")
with open("metrics.json", "w") as f:
    json.dump({"f1": float(metric)}, f)
# scripts/quality_gate.sh
set -euo pipefail
THRESHOLD=${1:-0.90}
F1=$(jq -r .f1 metrics.json)
echo "F1=$F1 threshold=$THRESHOLD"
awk -v f1="$F1" -v t="$THRESHOLD" 'BEGIN{exit !(f1 >= t)}' || {
echo "Quality gate failed"; exit 1; }
# add to the workflow steps after the data schema check
- name: Train (small) and compute metrics
  run: python scripts/train_small.py
- name: Enforce quality gate
  run: bash scripts/quality_gate.sh 0.90
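The gate above uses a fixed threshold. One of the drills below asks for a relative gate that fails when the metric drops more than 2% versus main; a minimal sketch, assuming a baseline metrics file (for example, downloaded from the latest main build) and using an absolute drop of 0.02:
# scripts/relative_gate.sh (sketch; where the baseline comes from is an assumption)
set -euo pipefail
BASELINE=${1:-baseline/metrics.json}
MAX_DROP=${2:-0.02}
CURRENT=$(jq -r .f1 metrics.json)
MAIN=$(jq -r .f1 "$BASELINE")
echo "current=$CURRENT main=$MAIN max_drop=$MAX_DROP"
awk -v c="$CURRENT" -v m="$MAIN" -v d="$MAX_DROP" 'BEGIN{exit !(c >= m - d)}' || {
  echo "F1 dropped more than $MAX_DROP vs main"; exit 1
}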
Example 3 — Package and upload build artifacts
# scripts/package.sh
set -euo pipefail
mkdir -p dist
cp metrics.json dist/
echo "1.2.${GITHUB_RUN_NUMBER:-0}" > dist/VERSION
python -m pip install build
python -m build # if you use pyproject.toml
# workflow steps
- name: Package
  run: bash scripts/package.sh
- name: Upload artifact
  uses: actions/upload-artifact@v4
  with:
    name: ml-build
    path: dist/
Artifacts make runs reproducible: keep version, metrics, and the built wheel/model files together.
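One way to keep them together is to write a small metadata file next to the wheel. A minimal sketch, assuming the GITHUB_SHA variable that GitHub Actions sets and the metrics.json produced in Example 2 (the script name is illustrative):
# scripts/write_metadata.sh (illustrative)
set -euo pipefail
VERSION=$(cat dist/VERSION)
jq -n --arg version "$VERSION" \
      --arg commit "${GITHUB_SHA:-unknown}" \
      --slurpfile metrics metrics.json \
      '{version: $version, commit: $commit, metrics: $metrics[0]}' > dist/metadata.json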
Example 4 — Deployment automation with environment gates
# Deploy only on main after artifact build
jobs:
  deploy:
    if: github.ref == 'refs/heads/main'
    needs: [build-test]
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - name: Download artifact
        uses: actions/download-artifact@v4
        with: { name: ml-build, path: dist }
      - name: Deploy to staging (blue)
        env:
          KUBE_CONFIG: ${{ secrets.KUBE_CONFIG_STAGING }}
        run: |
          echo "$KUBE_CONFIG" > kubeconfig
          export KUBECONFIG=$PWD/kubeconfig
          kubectl apply -f k8s/staging/blue.yaml
Use protected environments for manual approval steps before promoting to production.
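A minimal sketch of the promotion job, assuming a protected production environment with required reviewers, a KUBE_CONFIG_PROD secret, and a k8s/prod/blue.yaml manifest (all hypothetical names):
  deploy-prod:
    if: github.ref == 'refs/heads/main'
    needs: [deploy]
    runs-on: ubuntu-latest
    environment: production   # protected; the job waits for manual approval
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with: { name: ml-build, path: dist }
      - name: Deploy to production
        env:
          KUBE_CONFIG: ${{ secrets.KUBE_CONFIG_PROD }}
        run: |
          echo "$KUBE_CONFIG" > kubeconfig
          export KUBECONFIG=$PWD/kubeconfig
          kubectl apply -f k8s/prod/blue.yaml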
Example 5 — Rollback and rollforward
# scripts/rollback.sh
set -euo pipefail
APP=${1:-ml-service}
NAMESPACE=${2:-prod}
# Roll back to the previous release (Helm) or re-apply the previous manifest
if command -v helm >/dev/null; then
  # Omitting the revision (or passing 0) rolls back to the previous release
  helm rollback "$APP" --namespace "$NAMESPACE"
else
  echo "Applying previous manifest"
  kubectl apply -n "$NAMESPACE" -f k8s/prod/previous.yaml
fi
# scripts/rollforward.sh
set -euo pipefail
VERSION=${1:?provide version}
# Re-deploy the fixed build
kubectl set image deploy/ml-service ml-service=registry.example.com/ml-service:"$VERSION" -n prod
Security and secrets tips
- Never commit secrets; use the CI platform’s encrypted secrets.
- Prefer short-lived credentials and workload identity over static keys (see the sketch after this list).
- Restrict secret exposure to specific jobs/environments.
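On GitHub Actions, workload identity means exchanging the job’s OIDC token for short-lived cloud credentials instead of storing static keys. A minimal sketch for AWS (the role ARN is a placeholder; other clouds offer equivalent actions):
permissions:
  id-token: write   # allow the job to request an OIDC token
  contents: read

steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/ml-ci-deployer   # placeholder ARN
      aws-region: eu-west-1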
Drills and exercises
- [ ] Add ruff/black/mypy to your pipeline and make the job complete under 5 minutes.
- [ ] Write one data schema test that fails on negative values.
- [ ] Train on a 1% data sample in CI and output accuracy/F1 to a JSON file.
- [ ] Implement a quality gate that fails if your key metric drops by 2% from main.
- [ ] Package your model and upload a versioned artifact.
- [ ] Create a staging deploy job that requires manual approval to proceed to prod.
- [ ] Document a rollback playbook and test it in a sandbox environment.
Common mistakes and debugging tips
- Training full datasets in CI: CI should be fast. Train on small samples; schedule full retraining separately.
- Ignoring data validation: Most failures are data-related. Validate schemas and distributions before training.
- Non-deterministic runs: Seed random generators, pin dependencies, and capture commit SHA + data version (see the sketch after this list).
- Mixing build and environment configs: Keep artifacts immutable; apply env-specific configs at deploy time.
- Weak secrets hygiene: No plaintext keys in code or logs. Scope secrets to environments.
- No rollback plan: Practice rollback and rollforward. Keep previous versions readily available.
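For the non-determinism point above, a minimal sketch of seeding and recording provenance at the start of a training run (the helper name and DATA_VERSION variable are illustrative):
# scripts/run_context.py (illustrative helper)
import json
import os
import random

import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

context = {
    "seed": SEED,
    "commit": os.environ.get("GITHUB_SHA", "unknown"),          # set by GitHub Actions
    "data_version": os.environ.get("DATA_VERSION", "unknown"),  # e.g. a DVC or Git tag; assumption
}
with open("run_context.json", "w") as f:
    json.dump(context, f)
print(context)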
Debugging checklist
- Did the job pick the correct commit SHA and artifact version?
- Are seeds and dependency versions fixed?
- Do data checks run before training?
- Are quality gates reading the right metrics file?
- Is the deploy job pointing to staging/prod namespaces as intended?
Mini project: From PR to production
Build a small classification model and deliver it through CI/CD.
- Create a repo with src/, tests/, scripts/, and k8s/ folders.
- Implement linting and unit tests (src/ code + data schema tests).
- Train on a tiny sample in CI, compute F1, and enforce a 0.90 threshold.
- Package and upload an artifact with VERSION and metrics.json.
- Deploy to staging after main merges; require approval to promote to prod.
- Write rollback and rollforward scripts and validate them in a mock prod namespace.
Subskills
- Automated Tests For Data And Code: Linting, unit tests, and data validation to catch issues early.
- Pipeline Linting And Static Checks: Enforce style and types for maintainable ML repos.
- Training And Evaluation In CI: Small-sample training and fast metric calculation.
- Model Quality Gates: Fail builds if metrics regress beyond thresholds.
- Packaging And Publishing Artifacts: Versioned, immutable model builds with metadata.
- Deployment Automation: Scripted rollouts with environment separation.
- Rollbacks And Rollforward Strategy: Fast recovery and safe re-deploys.
- Secrets Management Basics: Safe storage and restricted exposure in CI jobs.
- Promotion Across Environments: Dev → staging → prod using approvals and immutable artifacts.
Next steps
- Harden quality gates with drift checks and business KPIs.
- Add monitoring alerts for latency, errors, and model performance.
- Introduce scheduled full retraining and compare to last prod metrics.
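A minimal sketch of a scheduled retraining workflow, assuming a train_full.py counterpart to train_small.py (hypothetical) and reusing the quality gate from Example 2:
# .github/workflows/retrain.yml (sketch)
name: scheduled-retrain
on:
  schedule:
    - cron: '0 3 * * 1'   # weekly, Monday 03:00 UTC
  workflow_dispatch: {}
jobs:
  retrain:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - run: pip install -r requirements.txt
      - name: Full training run
        run: python scripts/train_full.py   # hypothetical full-data script
      - name: Compare against the last prod metrics
        run: bash scripts/quality_gate.sh 0.90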
FAQ
- How often should CI train models? Usually only small, quick runs per PR; schedule full training separately.
- How do I control costs? Cache dependencies, sample data, parallelize tests, and prune artifacts with retention policies (see the snippet after this list).
- What about salaries? They vary widely by country and company; if you research figures elsewhere, treat them as rough ranges.
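For the cost question above, artifact pruning can be declared on the upload step itself; retention-days is a standard input of actions/upload-artifact:
- uses: actions/upload-artifact@v4
  with:
    name: ml-build
    path: dist/
    retention-days: 14   # prune CI artifacts after two weeks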