Why this matters
As a Machine Learning Engineer, you ship training and inference code that must run the same way on laptops, CI, and cloud. A well-crafted Dockerfile gives you:
- Reproducible experiments and model training
- Fast builds via caching (lower cost, less waiting)
- Reliable GPU access and compatible CUDA/cuDNN stacks
- Smaller images for quicker deployments
- Safer images with non-root users and no secrets baked in
Concept explained simply
A Dockerfile is a recipe for building a container image. Each instruction adds a new layer. When a layer hasn’t changed, Docker reuses it from cache. So if you structure your Dockerfile from the most stable parts (base OS, system packages, Python deps) to the most frequently changing parts (your code), builds stay fast and predictable.
Mental model
- Base image = your starting kitchen
- RUN/apt + pip install = stocking the pantry
- COPY code = adding your unique recipe
- ENTRYPOINT/CMD = how to start cooking
- .dockerignore = don’t bring unnecessary clutter into the kitchen
Core building blocks for ML Dockerfiles
- Base images: python:3.10-slim (CPU) or CUDA-enabled images (GPU). Avoid floating latest tags; pin versions.
- System deps: install them in one RUN layer; clean apt caches to keep images small.
- Python deps: copy only lock/requirements files before installing to maximize caching.
- Non-root user: reduce risk; write to app dirs without root.
- ENV vs ARG: ARG is for build time, ENV for runtime (see the sketch after this list).
- Multi-stage builds: build heavy stuff first, copy only what you need into a slim runtime.
- .dockerignore: exclude data, models, __pycache__, and secrets.
- GPU: use a CUDA-capable base and run with --gpus all (the host must have NVIDIA drivers/toolkit).
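For example, a minimal ARG vs ENV sketch (the PYTHON_VERSION argument and its default are illustrative, not taken from this lesson's examples):
# ARG exists only at build time; override it with: docker build --build-arg PYTHON_VERSION=3.11 .
ARG PYTHON_VERSION=3.10
FROM python:${PYTHON_VERSION}-slim
# ENV is baked into the image and visible to the running container
ENV PYTHONUNBUFFERED=1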
Why caching matters (quick example)
Good:
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src/ ./src
Bad (breaks cache on every code change):
COPY . .
RUN pip install -r requirements.txt
Choosing a base image
- CPU training/inference: python:3.10-slim or similar.
- GPU training/inference: CUDA runtime images (e.g., nvidia/cuda:<version>-runtime-ubuntu22.04) or framework vendor images.
- Pin versions (e.g., 3.10-slim, specific CUDA numbers) for reproducibility; see the sketch after this list.
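As a quick sketch, pinned FROM lines for either case (tags are examples, pick one per Dockerfile):
# CPU: pinned Python tag
FROM python:3.10-slim
# GPU: pinned CUDA runtime tag
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
# Avoid: a floating tag that can change between builds
# FROM python:latest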
Worked examples
Example 1 — CPU training image (scikit-learn)
Goal: Reproducible training without baking data inside the image.
# Dockerfile
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1
RUN apt-get update \
&& apt-get install -y --no-install-recommends build-essential git \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
# 1) Cache-friendly dependency install
COPY requirements.txt ./
RUN pip install --upgrade pip \
&& pip install -r requirements.txt
# 2) Copy only the code last (changes often)
COPY src/ ./src
COPY train.py .
# 3) Non-root for safety; give it ownership of /app so training can write /app/models
RUN useradd -m -u 1000 appuser \
 && chown -R appuser:appuser /app
USER appuser
CMD ["python", "train.py"]
requirements.txt
pandas==2.1.4
scikit-learn==1.3.2
joblib==1.3.2
train.py
from pathlib import Path
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import joblib
# Expect CSV via mounted volume: /data/train.csv
csv_path = Path("/data/train.csv")
if not csv_path.exists():
    print("Missing /data/train.csv. Mount a data volume.")
    raise SystemExit(1)
# Simple demo: first N-1 columns as features, last column as label
df = pd.read_csv(csv_path)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
model = LogisticRegression(max_iter=200)
model.fit(X, y)
acc = accuracy_score(y, model.predict(X))
print(f"Training complete. In-sample accuracy: {acc:.3f}")
Path("/app/models").mkdir(exist_ok=True)
joblib.dump(model, "/app/models/model.pkl")
print("Saved /app/models/model.pkl")
.dockerignore
__pycache__/
*.pyc
.env
.git
models/
data/
How to run
# Build
docker build -t ml-sklearn-train:cpu .
# Run (mount local data dir containing train.csv)
docker run --rm -v "$PWD/data":/data ml-sklearn-train:cpu
Example 2 — GPU training image (PyTorch)
Goal: Use CUDA-enabled base and verify GPU visibility.
# Dockerfile.gpu
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
ENV DEBIAN_FRONTEND=noninteractive \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1
RUN apt-get update \
&& apt-get install -y --no-install-recommends python3 python3-pip python3-venv \
&& rm -rf /var/lib/apt/lists/*
RUN python3 -m venv /opt/venv \
&& /opt/venv/bin/pip install --upgrade pip \
&& /opt/venv/bin/pip install torch==2.2.0+cu121 torchvision==0.17.0+cu121 --index-url https://download.pytorch.org/whl/cu121
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY gpu_check.py .
RUN useradd -m -u 1000 appuser
USER appuser
CMD ["python", "gpu_check.py"]
gpu_check.py
import torch
print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())
if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))
How to run (host must have NVIDIA drivers and container toolkit)
docker build -f Dockerfile.gpu -t ml-pytorch-train:gpu .
docker run --rm --gpus all ml-pytorch-train:gpu
Example 3 — Multi-stage: slim inference image (FastAPI)
Goal: Build dependencies in one stage, then copy only the essentials into a slim runtime that runs as a non-root user.
# Dockerfile.infer
# 1) Builder stage
FROM python:3.10-slim AS builder
ENV PIP_NO_CACHE_DIR=1
RUN apt-get update \
&& apt-get install -y --no-install-recommends build-essential \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /build
COPY requirements.txt .
RUN python -m venv /opt/venv \
&& /opt/venv/bin/pip install --upgrade pip \
&& /opt/venv/bin/pip install -r requirements.txt
# 2) Runtime stage
FROM python:3.10-slim AS runtime
ENV PATH="/opt/venv/bin:$PATH" \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
WORKDIR /app
# Copy only the virtualenv and app sources
COPY --from=builder /opt/venv /opt/venv
COPY app ./app
# Security: run as non-root
RUN useradd -m -u 1000 appuser
USER appuser
EXPOSE 8000
CMD ["uvicorn", "app.main:api", "--host", "0.0.0.0", "--port", "8000"]
requirements.txt
fastapi==0.109.0
uvicorn[standard]==0.25.0
scikit-learn==1.3.2
joblib==1.3.2
app/main.py
from fastapi import FastAPI
api = FastAPI()
@api.get("/")
def root():
    return {"status": "ok"}
How to run
docker build -f Dockerfile.infer -t ml-fastapi-infer:cpu .
docker run --rm -p 8000:8000 ml-fastapi-infer:cpu
# Visit http://localhost:8000 (returns {"status": "ok"})
Pre-build checklist
- Pin base image and key package versions
- Place dependency install before copying the full source
- Use a .dockerignore to exclude data, models, and secrets
- Install system packages in one RUN and clean apt lists
- Create a non-root user and set WORKDIR
- Choose CMD/ENTRYPOINT clearly (see the sketch after this list); expose ports only if needed
- For GPU: choose a CUDA-matching base and test with a minimal script
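If you want a fixed launcher with overridable defaults, one common ENTRYPOINT/CMD pattern is sketched below (the --epochs flag is an illustrative script argument, not part of this lesson's train.py):
# ENTRYPOINT fixes the executable; CMD supplies default arguments
ENTRYPOINT ["python", "train.py"]
CMD ["--epochs", "3"]
# docker run my-image --epochs 10   # replaces only the CMD part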
Exercises
These mirror the practice section. You can complete them locally.
Exercise 1 — CPU training image (cache-friendly)
- Create the files Dockerfile, requirements.txt, train.py, and .dockerignore using Example 1 as a guide.
- Build the image: docker build -t ex1-sklearn:cpu .
- Prepare ./data/train.csv with simple numeric columns and a binary label.
- Run: docker run --rm -v "$PWD/data":/data ex1-sklearn:cpu
- Self-check: Edit only train.py and rebuild; the dependency layers should be cached (much faster). See the timing sketch after this list.
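One way to run the timing self-check (standard docker and shell commands, using the tag from the build step above):
time docker build -t ex1-sklearn:cpu .   # first build installs dependencies (slow)
# edit train.py (e.g., add a comment), then rebuild:
time docker build -t ex1-sklearn:cpu .   # dependency layers should be reported as CACHED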
Exercise 2 — Multi-stage inference (non-root, slim)
- Create Dockerfile.infer, requirements.txt, and app/main.py as in Example 3.
- Build: docker build -f Dockerfile.infer -t ex2-infer:cpu .
- Run: docker run --rm -p 8000:8000 ex2-infer:cpu
- Open http://localhost:8000 and confirm the JSON response.
- Self-check: Compare image size with and without multi-stage (single-stage is typically larger); see the command sketch after this list.
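To compare sizes, standard commands like these work (the single-stage variant is hypothetical; build and tag one yourself if you want the comparison):
docker images ex2-infer                                    # lists sizes for all ex2-infer tags
docker image inspect ex2-infer:cpu --format '{{.Size}}'   # size in bytes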
Common mistakes and self-check
- Copying full source before installing deps, breaking cache. Fix: copy only requirements first.
- Using latest tags. Fix: pin versions.
- Leaving apt caches. Fix: add && rm -rf /var/lib/apt/lists/* to the install RUN.
- Running as root. Fix: create and switch to a non-root user.
- Baking large datasets or models into images. Fix: mount volumes or download at runtime.
- Leaking secrets via ENV or COPY. Fix: use runtime env vars or secret managers; add secret files to .dockerignore (see the sketch after this list).
- Mismatched CUDA/toolkit vs framework wheels. Fix: match CUDA versions and test with a small script.
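For build-time secrets, one hedged sketch using BuildKit secret mounts (the api_token id, file name, and download_model.py script are illustrative placeholders):
# syntax=docker/dockerfile:1
# The secret is mounted only for this RUN and is not written to any image layer
RUN --mount=type=secret,id=api_token \
    API_TOKEN="$(cat /run/secrets/api_token)" python download_model.py
# Build with: docker build --secret id=api_token,src=./api_token.txt -t my-image .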
Practical projects
- Reproducible training: Create a CPU image for a tabular model with dependency caching and non-root user.
- GPU training: Build a PyTorch CUDA image; run a mini training loop; log GPU name at startup.
- Slim inference: Ship a FastAPI model service using multi-stage build and measure image size difference.
Who this is for
- ML Engineers and Data Scientists who need portable training and inference environments
- MLOps practitioners standardizing project containers
Prerequisites
- Basic Python packaging and virtual environments
- Command line and Docker basics (build, run, volumes)
- Optional: NVIDIA GPU on host for GPU exercise
Learning path
- Before: Docker fundamentals, Linux basics
- Now: Writing Dockerfiles for ML (this lesson)
- Next: Docker Compose for multi-service ML stacks; CI build caching; model serving patterns
Next steps
- Automate builds in CI with pinned images and vulnerability scans
- Create team templates for CPU/GPU training and inference
- Adopt multi-stage builds and non-root defaults across repos
Mini challenge
Take your current ML repo and ship two images:
- Training image (CPU) with cached deps and a train.py entrypoint
- Inference image (slim) exposing port 8000 with a health endpoint (a sketch follows the hint below)
Hint
Start from Example 1 and Example 3. Verify cache by timing rebuilds after changing only code.
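For the health endpoint, a minimal app/main.py sketch that extends Example 3 (the /health path is a common convention, not a requirement):
from fastapi import FastAPI

api = FastAPI()

@api.get("/")
def root():
    return {"status": "ok"}

@api.get("/health")
def health():
    # Lightweight liveness check for load balancers and orchestrators
    return {"status": "ok"}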