Containerization Docker

Learn Docker containerization for Machine Learning Engineers for free: roadmap, worked examples, subskills, and a skill exam.

Published: January 1, 2026 | Updated: January 1, 2026

Why Docker matters for Machine Learning Engineers

Docker makes your ML code, models, and dependencies portable and reproducible across laptops, servers, and cloud. It reduces the “works on my machine” problem, speeds up deployment, and helps you standardize inference and training environments, including GPU acceleration.

With Docker, you can: build reliable inference services, bundle models and feature logic, spin up local stacks (API + cache + message queue), and run scheduled training jobs in the same environment you test locally.

Who this is for

  • ML Engineers and Data Scientists moving models to production.
  • Backend-leaning ML practitioners who deploy APIs or batch jobs.
  • Anyone who needs reproducible training/inference environments, including GPUs.

Prerequisites

  • Comfortable with Python and virtual environments.
  • Basic command line usage (build, run, copy files).
  • Familiarity with ML libraries (scikit-learn, PyTorch or TensorFlow) and how to load a trained model.
  • Optional: NVIDIA GPU on host + NVIDIA Container Toolkit for GPU exercises.

Learning path (roadmap)

  1. Containers basics: images vs containers, layers, tags, .dockerignore, build context (see the sample .dockerignore after this list).
  2. Writing Dockerfiles for ML: base images, pip installs, copying models, non-root user, health checks.
  3. Dependency and runtime optimization: slim images, multi-stage builds, caching, BuildKit, pinned deps.
  4. GPU-enabled containers: choosing CUDA bases, running with --gpus all, verifying device visibility.
  5. Local stacks with Docker Compose: API + model + cache/message broker, networks, volumes, healthchecks.
  6. Security scanning basics: scan images, minimize surface, least privilege, secrets handling.
  7. Container debugging: logs, exec shell, health checks, inspecting layers, reproducible builds.
  8. Reproducible builds: pinned versions, image digests, lock files, deterministic ARGs, repeatable build args.
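
A lean build context starts with a .dockerignore. The entries below are a typical starting point for ML repositories, where datasets, virtualenvs, and experiment artifacts would otherwise bloat every build; adjust them to your own layout.

# .dockerignore (example starting point)
.git
__pycache__/
*.pyc
.venv/
venv/
data/
notebooks/
mlruns/
*.ckpt
.env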

Milestones checklist

  • Build a working CPU inference container that serves a simple model.
  • Reduce image size by at least 50% using multi-stage and slim bases.
  • Run a GPU container that sees the device and performs a quick CUDA op.
  • Spin up a Compose stack with API + Redis and pass a healthcheck.
  • Scan the image and fix at least two medium-severity issues.
  • Produce a reproducible build from a clean machine with the same image digest.

Worked examples

Example 1 — Minimal CPU inference service (FastAPI + scikit-learn)

Goal: Package a small model and expose a /predict endpoint.

# Dockerfile
FROM python:3.11-slim AS base
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1
WORKDIR /app
# A .dockerignore in the build context root keeps the context small (it is not copied into the image)
# System deps (only what's needed)
RUN apt-get update && apt-get install -y --no-install-recommends build-essential && rm -rf /var/lib/apt/lists/*
# Use a lock file if available; else requirements.txt
COPY requirements.txt ./
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
COPY app/ ./app/
COPY model/ ./model/
EXPOSE 8000
CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

# app/main.py
from fastapi import FastAPI
import joblib
import numpy as np
app = FastAPI()
model = joblib.load("model/model.pkl")
@app.get("/health")
def health():
    return {"status": "ok"}
@app.post("/predict")
def predict(x: list[float]):
    arr = np.array([x])
    y = model.predict(arr).tolist()
    return {"y": y}

# requirements.txt
fastapi==0.110.0
uvicorn[standard]==0.29.0
scikit-learn==1.4.0
joblib==1.3.2
numpy==1.26.4

Build and run:

docker build -t ml-infer:cpu .
docker run --rm -p 8000:8000 ml-infer:cpu
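
Quick smoke test once the container is up (the three-feature payload is only an example; match it to your model's expected input size):

curl http://localhost:8000/health
curl -X POST http://localhost:8000/predict -H "Content-Type: application/json" -d "[1.0, 2.0, 3.0]"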

Example 2 — Multi-stage build to shrink image size

Goal: Compile wheels in a builder, copy only artifacts into a slim runtime.

# Dockerfile
# Stage 1: build wheels
FROM python:3.11-slim AS builder
WORKDIR /w
RUN apt-get update && apt-get install -y --no-install-recommends build-essential python3-dev && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip pip wheel --wheel-dir=/wheels -r requirements.txt
# Stage 2: runtime
FROM python:3.11-slim AS runtime
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1
WORKDIR /app
COPY --from=builder /wheels /wheels
COPY requirements.txt /app/requirements.txt
# Install only from the prebuilt wheels (no index access at this stage)
RUN pip install --no-cache-dir --no-index --find-links=/wheels -r /app/requirements.txt
COPY app/ ./app/
COPY model/ ./model/
USER 1000:1000
CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Tip: Keep COPY of code after dependency layers to maximize cache hits during iterative development.
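
To see the effect, build the multi-stage variant under a separate tag (ml-infer:multi is just an assumed name) and compare sizes and layers:

docker build -t ml-infer:multi .
docker image ls ml-infer
docker image history ml-infer:multi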

Example 3 — GPU-enabled container (PyTorch)

Goal: Use a CUDA-enabled base and verify GPU access.

# Dockerfile.gpu
FROM pytorch/pytorch:2.1.0-cuda11.8-cudnn8-runtime
WORKDIR /app
RUN pip install --no-cache-dir fastapi uvicorn[standard]
COPY gpu_app.py .
CMD ["python", "-m", "uvicorn", "gpu_app:app", "--host", "0.0.0.0", "--port", "8000"]

# gpu_app.py
from fastapi import FastAPI
import torch
app = FastAPI()
@app.get("/gpu")
def gpu():
    ok = torch.cuda.is_available()
    n = torch.cuda.device_count() if ok else 0
    return {"cuda": ok, "devices": n}

Build and run (host must have NVIDIA drivers + container toolkit):

docker build -f Dockerfile.gpu -t ml-infer:gpu .
docker run --rm --gpus all -p 8000:8000 ml-infer:gpu
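
To verify GPU visibility, call the /gpu endpoint or run a one-off check against the same image (expected output depends on your host's GPUs):

curl http://localhost:8000/gpu
docker run --rm --gpus all ml-infer:gpu python -c "import torch; print(torch.cuda.get_device_name(0))"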

Example 4 — Local stack with Docker Compose (API + Redis)

Goal: Run an inference API and a Redis cache locally.

# docker-compose.yml
services:
  api:
    build: .
    ports: ["8000:8000"]
    depends_on:
      redis:
        condition: service_healthy
    environment:
      - REDIS_HOST=redis
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 2s
      retries: 10

Start the stack:

docker compose up --build

Your app can connect using host "redis" on default port 6379.
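
As a sketch of the application side, the snippet below caches predictions in Redis keyed by the input. It assumes the redis package is added to requirements.txt and that the model object from Example 1 is passed in; names are illustrative.

# app/cache.py (illustrative sketch)
import json
import os
import redis

# "redis" resolves to the redis service on the Compose network
r = redis.Redis(host=os.getenv("REDIS_HOST", "redis"), port=6379, decode_responses=True)

def cached_predict(model, x: list[float]) -> list[float]:
    key = "pred:" + json.dumps(x)
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)
    y = model.predict([x]).tolist()
    r.set(key, json.dumps(y), ex=300)  # cache for 5 minutes
    return y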

Example 5 — Security scan and run as non-root

Goal: Scan image and harden container.

# Hardened snippet in Dockerfile
RUN adduser --disabled-password --gecos "" appuser && chown -R appuser:appuser /app
USER appuser
# expose and CMD as before

# Image scan using Trivy container (no host install required)
docker build -t myimg:latest .
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock aquasec/trivy image myimg:latest

Address findings by updating the base image, upgrading packages, and removing unnecessary tools.
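
To focus on the most important findings and make the scan fail loudly (handy later in CI), Trivy supports severity filtering and a non-zero exit code; the flags below are a sketch, check your Trivy version's docs:

docker run --rm -v /var/run/docker.sock:/var/run/docker.sock aquasec/trivy image --severity HIGH,CRITICAL --exit-code 1 myimg:latest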

Example 6 — Reproducible builds with pinned deps and image digests

Goal: Ensure consistent builds across machines.

# requirements.txt (pinned)
fastapi==0.110.0
uvicorn[standard]==0.29.0
scikit-learn==1.4.0
numpy==1.26.4
# Consider using hashes via pip-compile and --require-hashes for even stronger guarantees

# Dockerfile base pinned by digest
# Replace with the exact digest of your chosen tag
FROM python@sha256:...

Build twice on clean machines; when inputs are identical and image timestamps are normalized (for example via SOURCE_DATE_EPOCH with a recent BuildKit), the resulting image digests should match.
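
A sketch of both pinning steps, using pip-tools for hashed requirements and docker to look up a base image digest (file names and tags are illustrative):

# Generate requirements.txt with hashes from requirements.in
pip install pip-tools
pip-compile --generate-hashes -o requirements.txt requirements.in

# Find the digest for a chosen base tag, then pin FROM to it
docker pull python:3.11-slim
docker inspect --format '{{index .RepoDigests 0}}' python:3.11-slim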

Drills and exercises

  • Create a Dockerfile for a CPU-only inference service under 300 MB compressed size.
  • Add a healthcheck endpoint and verify the container's health status with docker inspect.
  • Convert your Dockerfile to a multi-stage build and measure size/time differences.
  • Run your service with a mounted model volume and update the model without rebuilding.
  • Use docker compose to add a cache and a message broker; confirm inter-service networking.
  • Run a GPU container that returns torch.cuda.is_available() == True.
  • Scan your image; fix at least two issues and re-scan to confirm.
  • Pin Python deps and base image digest; rebuild to confirm identical image digest.

Common mistakes and debugging tips

  • Installing deps after copying code: Leads to cache busts. Install dependencies before copying frequently changing app code.
  • Huge images: Use slim bases, multi-stage builds, and remove build tools in final images.
  • Running as root: Switch to a non-root user to reduce risk.
  • Forgetting .dockerignore: Large contexts slow builds and leak secrets. Add virtualenvs, data, and build artifacts to .dockerignore.
  • Mutable tags: Pin exact versions and consider image digests for reproducibility.
  • GPU not visible: Ensure host has correct NVIDIA drivers and container toolkit; run with --gpus all and verify with nvidia-smi or framework call.
  • Compose waits: Use healthchecks plus depends_on with condition to ensure readiness (a Dockerfile HEALTHCHECK sketch follows this list).
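
A minimal HEALTHCHECK sketch for the Example 1 image; it uses only the Python standard library so nothing extra is needed in the slim base, and the interval/retry values are just reasonable defaults:

# Add to the Dockerfile after EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000/health')" || exit 1

Check the reported status with:

docker inspect --format '{{.State.Health.Status}}' <container>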

Quick debug commands

  • Logs: docker logs -f <container>
  • Shell: docker exec -it <container> sh (or bash)
  • Inspect: docker image history <image:tag>
  • Test network: docker exec -it <container> sh -c "apk add --no-cache curl || (apt-get update && apt-get install -y curl); curl -v http://service:port/health"

Mini project — Production-ready ML inference microservice

Build a FastAPI service that loads a trained model, serves /predict, caches responses in Redis, and supports optional GPU inference.

  • Deliverables:
    • Dockerfile (multi-stage, non-root, pinned deps).
    • docker-compose.yml with api + redis + healthchecks.
    • Makefile or scripts to build, run, test, and scan (a minimal helper-script sketch follows these lists).
    • Benchmark: cold start time and p95 latency under small load.
  • Stretch goals:
    • Mount model via volume and hot-swap the file.
    • Optional GPU variant and automatic CUDA detection.
    • Prometheus-ready /metrics endpoint (simple counters).
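
A minimal helper-script sketch covering the build/run/test/scan deliverable; the image name and paths are illustrative, adapt them to your project:

#!/usr/bin/env bash
# scripts/dev.sh: build, run, test, and scan helpers
set -euo pipefail
IMAGE="${IMAGE:-ml-infer:cpu}"

case "${1:-}" in
  build) docker build -t "$IMAGE" . ;;
  run)   docker compose up --build ;;
  test)  curl -f http://localhost:8000/health ;;
  scan)  docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
           aquasec/trivy image --severity HIGH,CRITICAL "$IMAGE" ;;
  *)     echo "usage: $0 {build|run|test|scan}"; exit 1 ;;
esac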

Practical projects

  • Batch scoring job: container that pulls data, scores with a model, writes results to storage, then exits with proper codes.
  • Feature service: lightweight container exposing feature transformations as an HTTP endpoint, tested via docker compose.
  • Training pipeline step: containerized training that logs metrics and saves artifacts to a mounted volume.

Subskills

  • Writing Dockerfiles For ML — Build reliable images with proper bases, .dockerignore, healthchecks, and non-root users.
  • Managing Images And Tags — Tagging, pushing, pulling, and organizing images for environments (dev/stage/prod).
  • Dependency And Runtime Optimization — Multi-stage builds, slim bases, caching, and keeping final images minimal.
  • GPU Enabled Containers Basics — Choose CUDA-enabled bases and run with GPUs exposed to containers.
  • Docker Compose For Local Stacks — Define multi-service stacks with healthchecks, volumes, and networks.
  • Security Scanning Basics — Scan images, patch vulnerabilities, least privilege, and secrets hygiene.
  • Container Debugging — Use logs, exec shells, healthchecks, and inspect layers to diagnose issues.
  • Reproducible Builds — Pin versions, use lock files and image digests for deterministic builds.

Next steps

  • Automate builds in CI with cached layers and image signing.
  • Add load testing to validate performance envelopes and resource needs.
  • Move to orchestration (Kubernetes) once your container is production-hardened.

Skill exam

The skill exam is available to everyone. Only logged-in users get saved progress and certificates. When ready, start the exam below.

Containerization Docker — Skill Exam

This timed quiz checks your understanding of Docker for ML engineering. You can retake it anytime. Aim for 70% or higher to pass.

14 questions · 70% to pass
