Why this matters
In MLOps, reliable and repeatable containers are the backbone of CI/CD. You will use them to run model training, batch scoring, real-time inference services, and offline jobs. Solid build and publishing practices give you small images, fast pipelines, secure deployments, and easy rollbacks.
- Ship inference services that start fast and fit security policies.
- Package training code with pinned dependencies for reproducible runs.
- Promote the same image from staging to production using immutable digests.
- Trace each deployment back to code, data, and model versions.
Who this is for
- Aspiring and current MLOps Engineers implementing CI/CD for ML services.
- Data/ML Engineers who package models and pipelines for production.
- Software Engineers adding ML components to existing stacks.
Prerequisites
- Basic command line and Git skills.
- Familiarity with Docker fundamentals (images, containers, Dockerfile).
- Python project structure (for examples) or ability to translate to another language.
Concept explained simply
Think of a container image like a frozen, ready-to-run workspace. It includes your code, dependencies, and entry command. Publishing pushes that image to a registry so your CI/CD and runtime can pull it reliably.
Mental model
- Base image: your starting point (e.g., python:3.11-slim).
- Layers: each Dockerfile instruction contributes a cacheable layer; fewer and smaller layers mean faster builds and pulls.
- Multi-stage builds: do the heavy work (compilers, build tools) in a builder stage, then copy only the resulting artifacts into a minimal runtime stage.
- Tags vs digest: tags are human labels; digests are immutable content identifiers. Deploy by digest for safety.
- Reproducibility: pin versions, avoid secrets in the image, and generate an SBOM to know what’s inside.
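The tag-versus-digest distinction can be made concrete with a toy sketch: a digest is a hash of content, so the same bytes always yield the same identifier, while a tag is just a mutable label. (The manifest bytes below are a stand-in, not a real image manifest.)

```python
import hashlib

# Toy stand-in for an image manifest; real digests hash the manifest bytes.
manifest = b'{"schemaVersion": 2, "layers": []}'

digest = "sha256:" + hashlib.sha256(manifest).hexdigest()
print(digest)

# The same content always produces the same digest (content-addressed)...
assert digest == "sha256:" + hashlib.sha256(manifest).hexdigest()

# ...whereas a tag is a mutable pointer that can be repointed to new content.
tags = {"latest": digest}
```

This is why deploying by digest is safe: the identifier cannot silently start referring to different content the way a tag can.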
The standard flow
- Design your Dockerfile: multi-stage, install only what you need, run as non-root.
- Pin dependencies: lock files or exact versions; pin base image by digest when possible.
- Speed up builds: add a .dockerignore, order layers wisely, and use the BuildKit build cache.
- Tag images for traceability: include semantic version, date, and Git SHA.
- Scan and produce SBOM: identify vulnerabilities and contents.
- Publish to a registry: login, tag, push; prefer immutable deployment by digest.
Example tag scheme
my-registry/ml/serving:1.3.0
my-registry/ml/serving:1.3.0-20260104-ab12cd3
my-registry/ml/serving:sha-ab12cd3
# Deploy by digest (example):
my-registry/ml/serving@sha256:9f...e1
Worked example 1: FastAPI inference service (multi-stage)
Goal: small, non-root, reproducible image.
Dockerfile (multi-stage)
# syntax=docker/dockerfile:1.6
FROM python:3.11-slim AS base
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
WORKDIR /app
FROM base AS builder
RUN apt-get update && apt-get install -y --no-install-recommends build-essential && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt
FROM base AS runtime
RUN useradd -m -u 10001 appuser
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
COPY . .
USER 10001
EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD python -c "import socket; s=socket.socket(); s.settimeout(3); s.connect(('127.0.0.1',8000))" || exit 1
CMD ["python","-m","uvicorn","app:app","--host","0.0.0.0","--port","8000"]
.dockerignore (keep builds fast and clean)
.git
__pycache__
*.pyc
*.pyo
*.pyd
.env
.vscode
.idea
.venv
venv
build/
dist/
**/.ipynb_checkpoints
.data/
artifacts/
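The HEALTHCHECK in the Dockerfile above is just a TCP connect probe. The same check can be exercised outside a container; in this sketch a throwaway local listener stands in for the uvicorn server:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """The same check the HEALTHCHECK runs: can we open a TCP connection?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Throwaway listener standing in for the service on port 8000.
server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

print(can_connect("127.0.0.1", port))   # True: something is listening
server.close()
print(can_connect("127.0.0.1", port))   # False: listener is gone
```

A connect probe only proves the process accepts connections; for real services consider probing an HTTP health endpoint instead.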
Build and tag
export IMAGE=my-registry/ml/fastapi-service
export SHA=$(git rev-parse --short HEAD 2>/dev/null || echo local)
export DATE=$(date -u +%Y%m%d)
docker buildx build \
--platform linux/amd64 \
-t $IMAGE:1.0.0 \
-t $IMAGE:1.0.0-$DATE-$SHA \
-t $IMAGE:sha-$SHA \
--push .
Worked example 2: Training job container
Goal: package training script with pinned dependencies and a clear entrypoint.
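Inside the container, the script typically reads its hyperparameters from environment variables, which is what lets `docker run -e EPOCHS=3 -e LR=0.001 ...` configure a run. A hypothetical skeleton of train.py (variable names are illustrative):

```python
import os

def read_config() -> dict:
    """Read hyperparameters from env vars with safe defaults."""
    return {
        "epochs": int(os.environ.get("EPOCHS", "1")),
        "lr": float(os.environ.get("LR", "0.01")),
    }

if __name__ == "__main__":
    cfg = read_config()
    print(f"training for {cfg['epochs']} epochs at lr={cfg['lr']}")
```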
Dockerfile
FROM python:3.11-slim AS base
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1
WORKDIR /app
COPY requirements-train.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY train.py .
ENTRYPOINT ["python","train.py"]
Build and run
export IMAGE=my-registry/ml/train-job
export SHA=$(git rev-parse --short HEAD 2>/dev/null || echo local)
docker buildx build -t $IMAGE:0.2.0 -t $IMAGE:sha-$SHA --load .
# Example run (local)
docker run --rm -e EPOCHS=3 -e LR=0.001 $IMAGE:0.2.0
Worked example 3: Publish with cache, SBOM, digest
Build, cache, push, and SBOM
export IMAGE=my-registry/ml/fastapi-service
export DATE=$(date -u +%Y%m%d)
export SHA=$(git rev-parse --short HEAD 2>/dev/null || echo local)
docker buildx build \
--platform linux/amd64 \
--cache-from=type=registry,ref=$IMAGE:buildcache \
--cache-to=type=registry,ref=$IMAGE:buildcache,mode=max \
-t $IMAGE:1.1.0 \
-t $IMAGE:1.1.0-$DATE-$SHA \
-t $IMAGE:sha-$SHA \
--push .
# Generate SBOM (stored locally)
docker sbom $IMAGE:1.1.0 --format spdx-json > sbom.spdx.json
# Get digest for immutable deploy
DIGEST=$(docker buildx imagetools inspect $IMAGE:1.1.0 | awk '/Digest:/ {print $2; exit}')
echo "Deploy using: $IMAGE@$DIGEST"
Security and compliance essentials
- Run as non-root and drop unnecessary Linux capabilities.
- Pin base image by digest when possible; pin dependency versions.
- Keep secrets out of images; use build secrets and runtime env/secret stores.
- Generate and store SBOM; scan for vulnerabilities and update regularly.
- Keep images small to reduce surface area and speed up CI/CD.
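"Keep secrets out of images" in practice means resolving credentials when the process starts, never at build time. A minimal sketch, assuming an env var named API_TOKEN (the name is illustrative):

```python
import os

def load_token() -> str:
    """Read a credential injected at runtime (env var or mounted secret file),
    rather than one baked into an image layer."""
    token = os.environ.get("API_TOKEN")
    if not token:
        raise RuntimeError("API_TOKEN not set; inject it from your secret store")
    return token

os.environ["API_TOKEN"] = "demo-only-value"  # stand-in for a real secret store
print(load_token() == "demo-only-value")
```

Failing loudly at startup when the secret is missing is deliberate: a misconfigured container should crash early rather than run with empty credentials.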
Publishing strategy
- Login once in CI, then tag and push.
- Use three tags per build: version, version+date+sha, and sha-only.
- Deploy by digest for immutability; keep a mapping from tag to digest.
- Promote artifacts: move the same digest from staging to production.
- Apply retention policies for caches and old tags.
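The tag-to-digest mapping can be as simple as a JSON record written at publish time and stored next to your other build metadata; the field names here are one possible convention, not a standard:

```python
import json

# One possible release record: every mutable tag maps back to the immutable
# digest pushed for this build, so promotion and rollback are unambiguous.
record = {
    "image": "my-registry/ml/fastapi-service",
    "digest": "sha256:<digest-from-imagetools-inspect>",  # placeholder value
    "tags": ["1.1.0", "1.1.0-20260104-ab12cd3", "sha-ab12cd3"],
    "commit": "ab12cd3",
}
print(json.dumps(record, indent=2))
```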
CI/CD integration
- Pre-merge checks: lint Dockerfile, build image, run unit tests inside container.
- Use BuildKit with registry cache to speed up subsequent builds.
- On main branch or release tag: build, push, produce SBOM, store metadata (tags, digest).
- Optionally build multi-arch images if required.
- Record provenance (who built what, when, from which commit).
Exercises
These mirror the tasks below. You can complete them locally or adapt to your environment.
Exercise 1: Minimal, secure inference image
- Create a multi-stage Dockerfile for a FastAPI service with uvicorn.
- Install dependencies via wheels in builder stage; copy only what is needed.
- Run as non-root, expose 8000, add a basic HEALTHCHECK.
- Add a .dockerignore to keep the build clean.
- Build with three tags: 1.0.0, 1.0.0-YYYYMMDD-&lt;sha&gt;, and sha-&lt;sha&gt;.
Need a nudge?
- Use python:3.11-slim base.
- pip wheel in builder; pip install from local wheels in runtime.
- useradd to create a non-root user.
Exercise 2: Cache, push, and SBOM
- Configure buildx with registry cache-from/cache-to.
- Push your image to a registry namespace you control.
- Generate an SBOM in SPDX JSON format and save it to a file.
- Retrieve and print the image digest you will deploy with.
Need a nudge?
- Use --cache-from/--cache-to with type=registry.
- Use docker sbom to generate SBOM.
- Use docker buildx imagetools inspect to read the digest.
Checklist: am I ready?
- Multi-stage Dockerfile builds successfully and runs as non-root.
- Image size is reasonably small for the stack you chose.
- Tags include semantic version and Git SHA; you can obtain the digest.
- SBOM is generated; you know where it’s stored.
- Builds are fast on repeat due to effective caching.
- No secrets or large data bundled into the image.
Common mistakes and self-check
- Bundling secrets in the image. Self-check: search layers and .dockerignore; ensure secrets are injected at runtime.
- Not pinning versions. Self-check: look for exact versions or a lock file for base and dependencies.
- Using only latest tag. Self-check: confirm you have immutable tags and deploy by digest.
- Overly large images. Self-check: compare sizes; enable multi-stage and slim base.
- Root user. Self-check: confirm USER is set to a non-root UID.
- Poor caching. Self-check: rearrange Dockerfile so rarely changed instructions run first.
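Two of the self-checks above (root user, mutable latest tag) can be automated in a few lines. This is a rough sketch, not a substitute for a real Dockerfile linter such as hadolint:

```python
def lint_dockerfile(text: str) -> list[str]:
    """Rough self-check for two common Dockerfile mistakes."""
    findings = []
    if not any(line.strip().startswith("USER ") for line in text.splitlines()):
        findings.append("no USER instruction: container runs as root")
    if ":latest" in text:
        findings.append("mutable :latest tag referenced")
    return findings

sample = 'FROM python:latest\nCOPY . .\nCMD ["python","app.py"]\n'
print(lint_dockerfile(sample))
# ['no USER instruction: container runs as root', 'mutable :latest tag referenced']
```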
Practical projects
- Inference container for a text classification model with FastAPI and background warmup.
- Batch scoring container that mounts data at runtime and writes results to an output directory.
- Training container that saves checkpoints to a mounted path and supports configurable hyperparameters via env vars.
Learning path
- Before this: Docker basics, Python packaging, environment variables and secrets.
- Now: Container build and publishing with caching, tagging, SBOM.
- Next: CI/CD pipeline automation, deployment strategies (blue/green, canary), and monitoring.
Next steps
- Automate your build and publish steps in your CI system.
- Add vulnerability scanning and enforce non-root images in CI.
- Adopt a registry promotion flow that moves the same digest through environments.
Mini challenge
Take an existing ML service and reduce its image size by at least 30% without changing functionality. Provide the before/after sizes, the new Dockerfile, and the exact tags used. Bonus: add SBOM and a health check.