Why this matters
As a Machine Learning Engineer, you often train models, fine-tune large checkpoints, and serve low-latency inference. GPUs dramatically speed these tasks. GPU-enabled containers ensure your code runs consistently across machines while accessing the GPU for acceleration.
- Reproducible training: same CUDA, drivers, and libs in every run.
- Fast inference: leverage GPU in Dockerized microservices.
- Team collaboration: share images that "just work" with GPUs.
- Efficient CI/experiments: run identical GPU jobs on different hosts.
Concept explained simply
A GPU-enabled container is a regular Docker container that can talk to the host GPU through a thin compatibility layer. On Linux with NVIDIA GPUs, the NVIDIA Container Toolkit makes the host GPU and CUDA libraries visible inside the container. You then run your image with a flag that requests GPUs.
Mental model
- Host provides the physical GPU and driver.
- NVIDIA Container Toolkit connects the container to the host GPU.
- The container image includes CUDA runtimes and ML frameworks.
- Your app calls CUDA via frameworks (e.g., PyTorch, TensorFlow).
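A quick way to see this layering in action (a sanity check, assuming a Linux host with the toolkit installed): the driver libraries inside the container come from the host, not from the image.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 \
  bash -lc "ldconfig -p | grep -iE 'libcuda|libnvidia-ml'"
If the toolkit is working, libcuda.so.1 and libnvidia-ml.so.1 show up even though the base image does not ship them; they are injected from the host driver when the container starts.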
Who this is for
- Machine Learning Engineers and Data Scientists moving training/inference into Docker.
- Engineers deploying GPU-backed APIs or batch jobs.
- Anyone needing consistent, portable GPU environments.
Prerequisites
- Basic Docker usage: images, containers, Dockerfile.
- Linux command line familiarity.
- Access to a machine with an NVIDIA GPU and drivers installed.
- NVIDIA Container Toolkit installed on the host. (Exact steps vary by OS; follow your OS guidance.)
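For reference, the typical Ubuntu/Debian flow looks roughly like this. This is only a sketch that assumes NVIDIA's apt repository is already configured; package names and setup steps vary by distribution, so follow NVIDIA's install guide for your OS.
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker info | grep -i nvidia   # the nvidia runtime should be listed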
Setup checklist
- Host: run nvidia-smi on the host. It should list your GPU and driver.
- Docker: verify you can run containers (e.g., docker run --rm hello-world).
- NVIDIA Container Toolkit: running a CUDA image with --gpus should work.
Step-by-step: first GPU container
- Check GPU on host:
nvidia-smi
- Run a test CUDA container and query the GPU:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
Expected: the NVIDIA-SMI table appears inside the container.
- Limit to a specific GPU (example: GPU 0 only):
docker run --rm --gpus device=0 nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
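On multi-GPU hosts you can also request GPUs by count instead of by index (this assumes the host actually has that many GPUs; otherwise the run fails):
docker run --rm --gpus 2 nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
Expected: nvidia-smi lists two GPUs inside the container.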
Notes on quoting and platforms
If your shell complains about the --gpus argument, try quoting the device selector: --gpus "device=0". Shell quoting differs across Linux/macOS/Windows terminals.
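For example, selecting several specific devices requires the selector to be quoted so the comma is not parsed as a separate option; the commonly documented form nests the quotes:
docker run --rm --gpus '"device=0,1"' nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi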
Worked examples
Example 1 — Inspect GPU from a CUDA base image
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
What you learn: your container can see the GPU and the driver stack provided by the host.
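You can also ask nvidia-smi for specific fields, which is handy in scripts (a small extension of Example 1):
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 \
  nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv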
Example 2 — Minimal PyTorch GPU check
Create a Dockerfile that installs Python and PyTorch with CUDA support, then verify CUDA availability.
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04
RUN apt-get update \
&& apt-get install -y python3 python3-pip \
&& rm -rf /var/lib/apt/lists/*
# Install a PyTorch build with CUDA support (example version)
RUN pip3 install --no-cache-dir torch==2.1.0+cu121 torchvision==0.16.0+cu121 --index-url https://download.pytorch.org/whl/cu121
CMD ["python3", "-c", "import torch;print('CUDA available:', torch.cuda.is_available());\n\
print('Device count:', torch.cuda.device_count());\n\
print('Device name:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A')"]
Build and run:
docker build -t torch-gpu-check .
docker run --rm --gpus all torch-gpu-check
Expected output: CUDA available: True and your GPU name.
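To double-check which PyTorch build and CUDA version the image actually contains (useful later when debugging version mismatches), override the default command:
docker run --rm --gpus all torch-gpu-check python3 -c "import torch; print(torch.__version__, torch.version.cuda)"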
Example 3 — Mount data and pin GPU
Run an inference script with a mounted data directory and a specific GPU device:
docker run --rm \
  --gpus device=0 \
  -v $(pwd)/data:/app/data \
  -w /app \
  torch-gpu-check \
  python3 -c "import torch; print(torch.cuda.get_device_name(0))"
What you learn: volume mounting and device selection work together for practical workflows.
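As a quick extension (a sketch reusing the torch-gpu-check image from Example 2), you can confirm the GPU is not just visible but usable for compute by running a small matrix multiply on the selected device:
docker run --rm --gpus device=0 torch-gpu-check \
  python3 -c "import torch; x = torch.rand(1024, 1024, device='cuda'); print((x @ x).sum().item())"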
Advanced: a simple Compose snippet
Compose configurations vary. This example requests a GPU and runs a quick check.
services:
  app:
    image: torch-gpu-check  # image built in Example 2, so Python and PyTorch are available
    command: ["bash", "-lc", "nvidia-smi && python3 -c 'import torch; print(torch.cuda.is_available())'"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: ["gpu"]
    volumes:
      - ./work:/work
Note: Some Compose features (e.g., deploy reservations) may require Docker Engine/Compose versions that support them or Swarm mode. For basics, docker run --gpus is sufficient.
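Assuming the snippet is saved as compose.yaml and the torch-gpu-check image from Example 2 exists locally, you can start it with:
docker compose up
docker compose run --rm app nvidia-smi   # one-off check without keeping the service running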
Exercises you can try now
Tackle these hands-on tasks. Compare with solutions if you get stuck.
- Exercise 1 (mirrors Example 1): Run a CUDA base image with GPU access and print the NVIDIA-SMI table from inside the container. Save a screenshot or copy the first line of the output.
- Exercise 2 (mirrors Example 2): Build a minimal PyTorch GPU image and verify that torch.cuda.is_available() returns True inside the container.
Exercise checklist
- Host nvidia-smi shows your GPU.
- docker run --gpus all works without errors.
- Inside-container nvidia-smi matches your host GPU model.
- PyTorch reports CUDA available and detects at least one device.
Common mistakes and self-check
- Forgetting the --gpus flag: the container runs, but no GPU is visible. Self-check: run nvidia-smi inside the container (see the comparison after this list).
- Host driver missing or inactive: nvidia-smi fails on the host and in the container. Self-check: fix the host first, then retry the container.
- Version mismatches: using an image without the CUDA runtime your framework needs. Self-check: print the framework version and CUDA build, then adjust the base image or wheel as needed.
- Wrong quoting for --gpus device=0: shell error or ignored flag. Self-check: try quoting the argument or simplify to --gpus all to confirm basics.
- Assuming Windows/Mac behave like Linux: GPU passthrough differs. Self-check: test on a Linux host with supported NVIDIA drivers and the Container Toolkit for reliable results.
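A quick way to experience the first mistake deliberately (assuming the torch-gpu-check image from Example 2 is built): run the same image with and without the flag and compare the output.
docker run --rm torch-gpu-check              # no --gpus: expect "CUDA available: False"
docker run --rm --gpus all torch-gpu-check   # with --gpus: expect "CUDA available: True"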
Practical projects
- GPU inference microservice: containerize a small FastAPI server that loads a CUDA-enabled model and serves predictions.
- Reproducible training job: package a training script and dependencies; run with a mounted dataset and a fixed GPU device.
- Benchmark suite: write a script that times CPU vs GPU for a matrix multiply or a small model forward pass from within a container.
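As a starting point for the benchmark project, here is a rough sketch that reuses the torch-gpu-check image. It takes single, un-warmed measurements, so treat the numbers as indicative only.
docker run --rm -i --gpus all torch-gpu-check python3 - <<'EOF'
# Rough CPU-vs-GPU timing for a single matrix multiply.
# Not a rigorous benchmark: no warm-up, one measurement, transfer cost excluded.
import time
import torch

n = 4096
a = torch.rand(n, n)
b = torch.rand(n, n)

t0 = time.perf_counter()
a @ b
print(f"CPU matmul: {time.perf_counter() - t0:.3f}s")

if torch.cuda.is_available():
    ag, bg = a.cuda(), b.cuda()
    torch.cuda.synchronize()      # finish the host-to-device copies before timing
    t0 = time.perf_counter()
    ag @ bg
    torch.cuda.synchronize()      # wait for the kernel to complete
    print(f"GPU matmul: {time.perf_counter() - t0:.3f}s")
else:
    print("No GPU visible; did you pass --gpus?")
EOF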
Learning path
- Start: GPU-enabled containers basics (this page).
- Next: Building optimized images (slim base images, layer caching).
- Then: Multi-GPU scheduling and limits (--gpus options, process pinning).
- Finally: Production deployment patterns (Compose, orchestration, healthchecks, logging).
Next steps
- Automate: write a Makefile with targets to build, run, and test GPU containers.
- Harden: add non-root user, pinned versions, and minimal runtimes.
- Document: include a short README with run commands and expected outputs.
Mini challenge
Create a single docker run command that:
- Uses GPU device 0 only,
- Mounts ./models to /app/models,
- Runs a Python one-liner that prints the current CUDA device name.
Hint
Combine --gpus device=0, -v, and a small Python command that queries torch.cuda.get_device_name(0).