Why this matters
As a Machine Learning Engineer, you ship training and inference code that must run the same way on laptops, CI, and cloud. A well-crafted Dockerfile gives you:
- Reproducible experiments and model training
- Fast builds via caching (lower cost, less waiting)
- Reliable GPU access and compatible CUDA/cuDNN stacks
- Smaller images for quicker deployments
- Safer images with non-root users and no secrets baked in
Concept explained simply
A Dockerfile is a recipe for building a container image. Each instruction adds a new layer. When a layer hasn’t changed, Docker reuses it from cache. So if you structure your Dockerfile from the most stable parts (base OS, system packages, Python deps) to the most frequently changing parts (your code), builds stay fast and predictable.
Mental model
- Base image = your starting kitchen
- RUN/apt + pip install = stocking the pantry
- COPY code = adding your unique recipe
- ENTRYPOINT/CMD = how to start cooking
- .dockerignore = don’t bring unnecessary clutter into the kitchen
Core building blocks for ML Dockerfiles
- Base images: python:3.10-slim (CPU) or CUDA-enabled images (GPU). Avoid floating latest tags; pin versions.
- System deps: install them in one RUN layer; clean apt caches to keep images small.
- Python deps: copy only lock/requirements files before installing to maximize caching.
- Non-root user: reduce risk; write to app dirs without root.
- ENV vs ARG: ARG is for build time, ENV for runtime (see the sketch after this list).
- Multi-stage builds: build heavy stuff first, copy only what you need into a slim runtime.
- .dockerignore: exclude data, models, __pycache__, and secrets.
- GPU: use a CUDA-capable base and run with --gpus all (the host must have NVIDIA drivers/toolkit).
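For example, a minimal ARG vs ENV sketch (the PYTHON_VERSION argument and its default are illustrative, not taken from this lesson's examples):
# ARG exists only at build time; override it with: docker build --build-arg PYTHON_VERSION=3.11 .
ARG PYTHON_VERSION=3.10
FROM python:${PYTHON_VERSION}-slim
# ENV is baked into the image and visible to the running container
ENV PYTHONUNBUFFERED=1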
Why caching matters (quick example)
Good:
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src/ ./src
Bad (breaks cache on every code change):
COPY . .
RUN pip install -r requirements.txt
Choosing a base image
- CPU training/inference: python:3.10-slim or similar.
- GPU training/inference: CUDA runtime images (e.g., nvidia/cuda:<version>-runtime-ubuntu22.04) or framework vendor images.
- Pin versions (e.g., 3.10-slim, specific CUDA numbers) for reproducibility; see the sketch after this list.
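As a quick sketch, pinned FROM lines for either case (tags are examples, pick one per Dockerfile):
# CPU: pinned Python tag
FROM python:3.10-slim
# GPU: pinned CUDA runtime tag
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
# Avoid: a floating tag that can change between builds
# FROM python:latest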
Worked examples
Example 1 — CPU training image (scikit-learn)
Goal: Reproducible training without baking data inside the image.
# Dockerfile
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1
RUN apt-get update \
&& apt-get install -y --no-install-recommends build-essential git \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
# 1) Cache-friendly dependency install
COPY requirements.txt ./
RUN pip install --upgrade pip \
&& pip install -r requirements.txt
# 2) Copy only the code last (changes often)
COPY src/ ./src
COPY train.py .
# 3) Non-root for safety; give it ownership of /app so training can write /app/models
RUN useradd -m -u 1000 appuser \
 && chown -R appuser:appuser /app
USER appuser
CMD ["python", "train.py"]
requirements.txt
pandas==2.1.4
scikit-learn==1.3.2
joblib==1.3.2
train.py
from pathlib import Path
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import joblib
# Expect CSV via mounted volume: /data/train.csv
csv_path = Path("/data/train.csv")
if not csv_path.exists():
    print("Missing /data/train.csv. Mount a data volume.")
    raise SystemExit(1)
# Simple demo: first N-1 columns as features, last column as label
df = pd.read_csv(csv_path)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
model = LogisticRegression(max_iter=200)
model.fit(X, y)
acc = accuracy_score(y, model.predict(X))
print(f"Training complete. In-sample accuracy: {acc:.3f}")
Path("/app/models").mkdir(exist_ok=True)
joblib.dump(model, "/app/models/model.pkl")
print("Saved /app/models/model.pkl")
.dockerignore
__pycache__/
*.pyc
.env
.git
models/
data/
How to run
# Build
docker build -t ml-sklearn-train:cpu .
# Run (mount local data dir containing train.csv)
docker run --rm -v "$PWD/data":/data ml-sklearn-train:cpu
Example 2 — GPU training image (PyTorch)
Goal: Use CUDA-enabled base and verify GPU visibility.
# Dockerfile.gpu
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1
RUN apt-get update \
&& apt-get install -y --no-install-recommends python3 python3-pip python3-venv \
&& rm -rf /var/lib/apt/lists/*
RUN python3 -m venv /opt/venv \
&& /opt/venv/bin/pip install --upgrade pip \
&& /opt/venv/bin/pip install torch==2.2.0+cu121 torchvision==0.17.0+cu121 --index-url https://download.pytorch.org/whl/cu121
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY gpu_check.py .
RUN useradd -m -u 1000 appuser
USER appuser
CMD ["python", "gpu_check.py"]
gpu_check.py
import torch
print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())
if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))
How to run (host must have NVIDIA drivers and container toolkit)
docker build -f Dockerfile.gpu -t ml-pytorch-train:gpu .
docker run --rm --gpus all ml-pytorch-train:gpu
Example 3 — Multi-stage: slim inference image (FastAPI)
Goal: Build dependencies in one stage, then copy only the essentials into a slim runtime that runs as a non-root user.
# Dockerfile.infer
# 1) Builder stage
FROM python:3.10-slim AS builder
ENV PIP_NO_CACHE_DIR=1
RUN apt-get update \
&& apt-get install -y --no-install-recommends build-essential \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /build
COPY requirements.txt .
RUN python -m venv /opt/venv \
&& /opt/venv/bin/pip install --upgrade pip \
&& /opt/venv/bin/pip install -r requirements.txt
# 2) Runtime stage
FROM python:3.10-slim AS runtime
ENV PATH="/opt/venv/bin:$PATH" \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
WORKDIR /app
# Copy only the virtualenv and app sources
COPY --from=builder /opt/venv /opt/venv
COPY app ./app
# Security: run as non-root
RUN useradd -m -u 1000 appuser
USER appuser
EXPOSE 8000
CMD ["uvicorn", "app.main:api", "--host", "0.0.0.0", "--port", "8000"]
requirements.txt
fastapi==0.109.0
uvicorn[standard]==0.25.0
scikit-learn==1.3.2
joblib==1.3.2
app/main.py
from fastapi import FastAPI
api = FastAPI()
@api.get("/")
def root():
    return {"status": "ok"}
How to run
docker build -f Dockerfile.infer -t ml-fastapi-infer:cpu .
docker run --rm -p 8000:8000 ml-fastapi-infer:cpu
# Visit http://localhost:8000 (returns {"status": "ok"})
Pre-build checklist
- Pin base image and key package versions
- Place dependency install before copying the full source
- Use a .dockerignore to exclude data, models, and secrets
- Install system packages in one RUN and clean apt lists
- Create a non-root user and set WORKDIR
- Choose CMD/ENTRYPOINT clearly (see the sketch after this list); expose ports only if needed
- For GPU: choose a CUDA-matching base and test with a minimal script
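If you want a fixed launcher with overridable defaults, one common ENTRYPOINT/CMD pattern is sketched below (the --epochs flag is an illustrative script argument, not part of this lesson's train.py):
# ENTRYPOINT fixes the executable; CMD supplies default arguments
ENTRYPOINT ["python", "train.py"]
CMD ["--epochs", "3"]
# docker run my-image --epochs 10   # replaces only the CMD part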
Exercises
These mirror the practice section. You can complete them locally.
Exercise 1 — CPU training image (cache-friendly)
- Create the files Dockerfile, requirements.txt, train.py, and .dockerignore using Example 1 as a guide.
- Build the image: docker build -t ex1-sklearn:cpu .
- Prepare ./data/train.csv with simple numeric columns and a binary label.
- Run: docker run --rm -v "$PWD/data":/data ex1-sklearn:cpu
- Self-check: Edit only train.py and rebuild; the dependency layers should be cached (much faster). See the timing sketch after this list.
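One way to run the timing self-check (standard docker and shell commands, using the tag from the build step above):
time docker build -t ex1-sklearn:cpu .   # first build installs dependencies (slow)
# edit train.py (e.g., add a comment), then rebuild:
time docker build -t ex1-sklearn:cpu .   # dependency layers should be reported as CACHED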
Exercise 2 — Multi-stage inference (non-root, slim)
- Create Dockerfile.infer, requirements.txt, and app/main.py as in Example 3.
- Build: docker build -f Dockerfile.infer -t ex2-infer:cpu .
- Run: docker run --rm -p 8000:8000 ex2-infer:cpu
- Open http://localhost:8000 and confirm the JSON response.
- Self-check: Compare image size with and without multi-stage (single-stage is typically larger); see the command sketch after this list.
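To compare sizes, standard commands like these work (the single-stage variant is hypothetical; build and tag one yourself if you want the comparison):
docker images ex2-infer                                    # lists sizes for all ex2-infer tags
docker image inspect ex2-infer:cpu --format '{{.Size}}'   # size in bytes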
Common mistakes and self-check
- Copying full source before installing deps, breaking cache. Fix: copy only requirements first.
- Using latest tags. Fix: pin versions.
- Leaving apt caches. Fix: add && rm -rf /var/lib/apt/lists/* to the install RUN.
- Running as root. Fix: create and switch to a non-root user.
- Baking large datasets or models into images. Fix: mount volumes or download at runtime.
- Leaking secrets via ENV or COPY. Fix: use runtime env vars or secret managers; add secret files to .dockerignore (see the sketch after this list).
- Mismatched CUDA/toolkit vs framework wheels. Fix: match CUDA versions and test with a small script.
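For build-time secrets, one hedged sketch using BuildKit secret mounts (the api_token id, file name, and download_model.py script are illustrative placeholders):
# syntax=docker/dockerfile:1
# The secret is mounted only for this RUN and is not written to any image layer
RUN --mount=type=secret,id=api_token \
    API_TOKEN="$(cat /run/secrets/api_token)" python download_model.py
# Build with: docker build --secret id=api_token,src=./api_token.txt -t my-image .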
Practical projects
- Reproducible training: Create a CPU image for a tabular model with dependency caching and non-root user.
- GPU training: Build a PyTorch CUDA image; run a mini training loop; log GPU name at startup.
- Slim inference: Ship a FastAPI model service using multi-stage build and measure image size difference.
Who this is for
- ML Engineers and Data Scientists who need portable training and inference environments
- MLOps practitioners standardizing project containers
Prerequisites
- Basic Python packaging and virtual environments
- Command line and Docker basics (build, run, volumes)
- Optional: NVIDIA GPU on host for GPU exercise
Learning path
- Before: Docker fundamentals, Linux basics
- Now: Writing Dockerfiles for ML (this lesson)
- Next: Docker Compose for multi-service ML stacks; CI build caching; model serving patterns
Next steps
- Automate builds in CI with pinned images and vulnerability scans
- Create team templates for CPU/GPU training and inference
- Adopt multi-stage builds and non-root defaults across repos
Mini challenge
Take your current ML repo and ship two images:
- Training image (CPU) with cached deps and a train.py entrypoint
- Inference image (slim) exposing port 8000 with a health endpoint (a sketch follows the hint below)
Hint
Start from Example 1 and Example 3. Verify cache by timing rebuilds after changing only code.
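For the health endpoint, a minimal app/main.py sketch that extends Example 3 (the /health path is a common convention, not a requirement):
from fastapi import FastAPI

api = FastAPI()

@api.get("/")
def root():
    return {"status": "ok"}

@api.get("/health")
def health():
    # Lightweight liveness check for load balancers and orchestrators
    return {"status": "ok"}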