Why this matters
MLOps Engineers frequently containerize two critical workloads: model training (often GPU-accelerated, data-heavy) and model serving (fast, secure, and lightweight). Well-structured Dockerfiles make builds faster, images smaller, deployments repeatable, and incidents easier to debug. In real projects you will:
- Package training jobs with pinned dependencies and reproducible environments.
- Ship serving images that start fast, are secure (non-root), and expose the right ports.
- Use caching and multi-stage builds to keep images small and CI builds fast.
- Handle GPUs, model artifacts, and configuration cleanly across environments.
Who this is for
Engineers and practitioners who need reliable containers for ML training and inference—especially those integrating with CI/CD, orchestration, and registries.
Prerequisites
- Basic Docker commands (build, run, push, tag).
- Comfortable with Python project structure.
- Familiarity with training scripts and simple web servers (FastAPI/Uvicorn or Flask/Gunicorn).
Concept explained simply
A Dockerfile is a recipe for your runtime environment. For training, it should reproduce your experiment reliably and efficiently. For serving, it should boot a web service with your model as quickly and securely as possible. The main difference: training images emphasize toolchains and data access; serving images emphasize minimal size, startup speed, and security.
Mental model
Think of layers like a stack of cached steps. The lower layers change rarely (base image, system packages), while the top layers change often (your code). Order instructions so that the least changing layers come first and the most changing ones come last. This maximizes cache hits and speeds up builds.
Key components you will use
- FROM: choose a slim base; for GPU use an NVIDIA CUDA base.
- WORKDIR: set a stable working directory.
- COPY/ADD: copy only what you need; use .dockerignore (a sample follows this list).
- RUN: install system deps and Python packages; clean caches.
- ENV/ARG: pass configuration and build-time values.
- USER: run as non-root for security.
- EXPOSE: document service port (serving).
- CMD vs ENTRYPOINT: use CMD for defaults users can override; ENTRYPOINT for required commands.
- Multi-stage builds: build in a heavy stage, copy into a slim final stage.
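A typical .dockerignore for these projects excludes version control, data, artifacts, and local caches; the exact entries depend on your repo layout, so treat this as a starting point:
# .dockerignore
.git
__pycache__/
*.pyc
.venv/
data/
models/
outputs/
.env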
Worked examples
Example 1 — Training image (CPU) with caching
# Dockerfile.train
FROM python:3.11-slim
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
# System deps first (rarely change)
RUN apt-get update \
&& apt-get install -y --no-install-recommends build-essential \
&& rm -rf /var/lib/apt/lists/*
# Requirements next for better caching
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Then copy source (changes frequently)
COPY . .
# Create non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
# Default command: one-epoch demo training
CMD ["python", "train.py", "--epochs", "1", "--output", "/outputs/model.bin"]
Notes: this instruction order maximizes cache reuse; the non-root user improves security; outputs go to a volume mounted at /outputs rather than into the image.
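Because the training command is supplied via CMD, you can override it at run time without rebuilding. A quick usage sketch (the image tag is illustrative):
docker build -f Dockerfile.train -t my-train:latest .
mkdir -p outputs
docker run --rm -v $(pwd)/outputs:/outputs my-train:latest
# Override the default one-epoch run:
docker run --rm -v $(pwd)/outputs:/outputs my-train:latest python train.py --epochs 5 --output /outputs/model.bin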
Example 2 — Serving image (FastAPI + Uvicorn/Gunicorn)
# Dockerfile.serve
FROM python:3.11-slim
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PORT=8080 \
MODEL_DIR=/models
RUN apt-get update \
&& apt-get install -y --no-install-recommends build-essential \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements-serve.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Copy only serving code, not training extras
COPY api/ ./api/
# Add user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
EXPOSE 8080
# Use gunicorn with uvicorn workers for production
CMD ["gunicorn", "api.main:app", "-k", "uvicorn.workers.UvicornWorker", "-b", "0.0.0.0:8080", "--workers", "2", "--timeout", "60"]
Notes: model files are provided at runtime via a volume mounted at MODEL_DIR, and the Gunicorn/Uvicorn command is a reasonable production default. Two caveats: the exec-form CMD hard-codes port 8080, so the PORT variable is informational unless you switch to a shell-form command; and build-essential is only needed when a dependency compiles from source (Example 3 keeps it out of the final image).
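Inside the service, the model path should come from MODEL_DIR rather than a baked-in file. A minimal sketch of api/main.py along those lines; the model.bin filename and the pickle loader are illustrative, not part of the examples above:
# api/main.py (sketch)
import os
import pickle

from fastapi import FastAPI

MODEL_DIR = os.getenv("MODEL_DIR", "/models")
MODEL_PATH = os.path.join(MODEL_DIR, "model.bin")  # illustrative filename

# Load once at import time; real projects may prefer lazy loading or a startup hook.
model = None
if os.path.exists(MODEL_PATH):
    with open(MODEL_PATH, "rb") as f:
        model = pickle.load(f)

app = FastAPI()

@app.get("/health")
def health():
    return {"status": "ok", "model_dir": MODEL_DIR, "model_loaded": model is not None}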
Example 3 — Multi-stage build to keep serving image small
# Stage 1: build wheels for native deps
FROM python:3.11 as build
WORKDIR /wheels
COPY requirements-serve.txt .
RUN pip wheel --wheel-dir=/wheels -r requirements-serve.txt
# Stage 2: minimal runtime
FROM python:3.11-slim
ENV PYTHONUNBUFFERED=1 PORT=8080 MODEL_DIR=/models
WORKDIR /app
# Install from prebuilt wheels
COPY --from=build /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
COPY api/ ./api/
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
EXPOSE 8080
CMD ["gunicorn", "api.main:app", "-k", "uvicorn.workers.UvicornWorker", "-b", "0.0.0.0:8080"]
Notes: Heavy builds happen in the first stage; the final image is slim and fast to pull.
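To see the effect, build the single-stage and multi-stage variants under different tags and compare their sizes (file and tag names are illustrative):
docker build -f Dockerfile.serve -t ds-serve:single .
docker build -f Dockerfile.serve.multi -t ds-serve:multi .
docker images ds-serve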
GPU notes
For GPU training or serving, base your image on an NVIDIA CUDA runtime (for example, nvidia/cuda:12.1.0-runtime-ubuntu22.04) that matches your CUDA/cuDNN requirements. At runtime, enable the GPU device using your container runtime's GPU support. Keep CUDA/cuDNN versions aligned with your ML framework to avoid runtime errors.
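A minimal GPU training sketch, assuming PyTorch built against CUDA 12.1; adjust the base image and wheel index to match your framework, and note that the image tag and non-root/user details are omitted for brevity:
# Dockerfile.train.gpu (sketch)
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 python3-pip \
 && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip3 install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cu121 \
 && pip3 install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python3", "train.py"]

# Run with GPUs exposed (requires the NVIDIA Container Toolkit on the host; tag is illustrative):
docker run --rm --gpus all -v $(pwd)/outputs:/outputs my-gpu-train:latest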
Security and size optimizations
- Use slim bases and remove build tools if not needed at runtime.
- Combine apt-get update and install into a single RUN (fewer layers) and clean apt lists in the same step (smaller image).
- Pin package versions for reproducibility.
- Run as non-root (USER) and limit filesystem permissions; avoid writing to the app directory.
- Use .dockerignore to exclude .git, data, models, and local caches.
- Keep model weights out of the image; mount at runtime or pull on start.
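Several of these points come together at runtime: mount the weights read-only and keep the container filesystem read-only, allowing writes only to a tmpfs. A sketch using standard docker run flags (the ds-serve:latest tag is illustrative):
docker run --rm -p 8080:8080 \
  --read-only --tmpfs /tmp \
  -v $(pwd)/models:/models:ro \
  ds-serve:latest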
Exercises
Hands-on tasks that mirror real MLOps work. Build locally and run the container to see the expected output.
Exercise 1 — Training Dockerfile (CPU)
Goal: Build a training image that runs a simple script and writes a model file to /outputs.
Instructions
- Create files: requirements.txt (can be empty), train.py (simple script below), and Dockerfile.train.
- Train script (save as train.py):
import os, time, argparse
parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=1)
parser.add_argument('--output', type=str, default='/outputs/model.bin')
args = parser.parse_args()
print(f"Starting training for {args.epochs} epoch(s)...")
for e in range(args.epochs):
    time.sleep(1)
    print(f"Epoch {e+1} done")
os.makedirs(os.path.dirname(args.output), exist_ok=True)
with open(args.output, 'wb') as f:
    f.write(b'FAKE_MODEL')
print('Training complete; saved model to', args.output)
- Write Dockerfile.train using python:3.11-slim, non-root user, and default CMD to run train.py.
- Build: docker build -f Dockerfile.train -t ds-train:latest .
- Run: mkdir -p outputs && docker run --rm -v $(pwd)/outputs:/outputs ds-train:latest (create outputs first so the mounted directory is writable by the non-root user)
Expected output: Training complete; saved model to /outputs/model.bin
Exercise 2 — Serving Dockerfile (FastAPI)
Goal: Build a serving image that exposes a /health endpoint and reads model path from MODEL_DIR.
Instructions
- Create structure: api/main.py, requirements-serve.txt, Dockerfile.serve.
- requirements-serve.txt contents:
fastapi==0.110.0
uvicorn==0.25.0
gunicorn==21.2.0
- api/main.py contents:
import os
from fastapi import FastAPI
app = FastAPI()
MODEL_DIR = os.getenv('MODEL_DIR', '/models')
@app.get('/health')
def health():
    present = os.path.isdir(MODEL_DIR)
    return {'status':'ok','model_dir':MODEL_DIR,'present':present}
- Write Dockerfile.serve (similar to Example 2) with EXPOSE 8080 and non-root user.
- Build: docker build -f Dockerfile.serve -t ds-serve:latest .
- Run: docker run --rm -p 8080:8080 -e MODEL_DIR=/models -v $(pwd)/models:/models ds-serve:latest
- Test (from host): curl http://localhost:8080/health
Expected output: {"status":"ok","model_dir":"/models","present":true}
Checklist before you build
- [ ] .dockerignore excludes .git, data/, outputs/, models/, __pycache__/
- [ ] Use python:3.11-slim (or similar) and clean package caches
- [ ] Non-root USER is set
- [ ] Requirements are copied and installed before app code for caching
- [ ] Training writes artifacts to a mounted volume, not the image
- [ ] Serving reads MODEL_DIR from env and exposes correct port
Common mistakes and self-check
Mistake: baking large datasets or model weights into the image
Impact: Huge images and long pulls. Fix: Mount datasets/models at runtime or download on startup.
Mistake: placing COPY . . before installing requirements
Impact: Cache invalidation on every code change. Fix: COPY requirements first, install, then copy the rest.
Mistake: running as root
Impact: Security risk. Fix: Create a user and switch with USER before CMD.
Mistake: missing .dockerignore
Impact: Slow builds and leaked secrets. Fix: Add .dockerignore with common patterns.
Mistake: using CMD for mandatory startup behavior
Impact: Easy to override accidentally. Fix: Use ENTRYPOINT for required commands; CMD for defaults.
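For example, keep the command fixed with ENTRYPOINT and leave only the tunable arguments in CMD; a sketch based on the training image from Example 1 (tag is illustrative):
ENTRYPOINT ["python", "train.py"]
CMD ["--epochs", "1", "--output", "/outputs/model.bin"]
# docker run my-train:latest             -> runs with the defaults above
# docker run my-train:latest --epochs 5  -> replaces only the CMD arguments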
Self-check: Can you rebuild quickly after a code-only change? Are images under a few hundred MB for serving? Does your container start as non-root and still work?
Practical projects
- Create a training image that logs metrics to stdout and writes a model to /outputs; wire it into a minimal CI build.
- Build a serving image that loads the latest model from a mounted volume and provides /predict and /health endpoints.
- Refactor both into multi-stage builds and measure image size reduction and build time improvements.
Learning path
- Docker basics: images, containers, volumes, networks
- Writing efficient Dockerfiles: caching, .dockerignore, non-root
- Training images: reproducibility, artifact outputs, GPU variants
- Serving images: lightweight bases, ports, start commands
- Multi-stage builds and dependency wheels
- Compose/Kubernetes runtime configs (env vars, secrets, volumes)
- Image registries and CI/CD integration
Next steps
- Add health and readiness endpoints to your serving app.
- Introduce GPU support for your training image if needed.
- Pin all dependency versions and record them at build time for reproducibility.
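For the last point, one option is to freeze the resolved versions into the image right after installation (the lock file path is illustrative):
RUN pip install --no-cache-dir -r requirements.txt \
 && pip freeze > /app/requirements.lock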
Mini challenge
Create a repo with two Dockerfiles (train and serve) and a docker-compose.yml that:
- Runs training to write a model into a shared volume.
- Starts the serving service mounting the same volume.
- Lets you curl /health and see present=true once the model exists.
Ready to check yourself?
Take the Quick Test below. It’s available to everyone; only logged-in users get saved progress.