Why this matters
As a Machine Learning Engineer, you often train models, fine-tune large checkpoints, and serve low-latency inference. GPUs dramatically speed these tasks. GPU-enabled containers ensure your code runs consistently across machines while accessing the GPU for acceleration.
- Reproducible training: same CUDA, drivers, and libs in every run.
- Fast inference: leverage GPU in Dockerized microservices.
- Team collaboration: share images that "just work" with GPUs.
- Efficient CI/experiments: run identical GPU jobs on different hosts.
Concept explained simply
A GPU-enabled container is a regular Docker container that can talk to the host GPU through a thin compatibility layer. On Linux with NVIDIA GPUs, the NVIDIA Container Toolkit makes the host GPU and CUDA libraries visible inside the container. You then run your image with a flag that requests GPUs.
Mental model
- Host provides the physical GPU and driver.
- NVIDIA Container Toolkit connects the container to the host GPU.
- The container image includes CUDA runtimes and ML frameworks.
- Your app calls CUDA via frameworks (e.g., PyTorch, TensorFlow).
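A quick way to see this layering in action (a sanity check, assuming a Linux host with the toolkit installed): the driver libraries inside the container come from the host, not from the image.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 \
  bash -lc "ldconfig -p | grep -iE 'libcuda|libnvidia-ml'"
If the toolkit is working, libcuda.so.1 and libnvidia-ml.so.1 show up even though the base image does not ship them; they are injected from the host driver when the container starts.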
Who this is for
- Machine Learning Engineers and Data Scientists moving training/inference into Docker.
- Engineers deploying GPU-backed APIs or batch jobs.
- Anyone needing consistent, portable GPU environments.
Prerequisites
- Basic Docker usage: images, containers, Dockerfile.
- Linux command line familiarity.
- Access to a machine with an NVIDIA GPU and drivers installed.
- NVIDIA Container Toolkit installed on the host. (Exact steps vary by OS; follow your OS guidance.)
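For reference, the typical Ubuntu/Debian flow looks roughly like this. This is only a sketch that assumes NVIDIA's apt repository is already configured; package names and setup steps vary by distribution, so follow NVIDIA's install guide for your OS.
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker info | grep -i nvidia   # the nvidia runtime should be listed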
Setup checklist
- Host: run nvidia-smi on the host. It should list your GPU and driver.
- Docker: verify you can run containers (e.g., docker run --rm hello-world).
- NVIDIA Container Toolkit: running a CUDA image with --gpus should work.
Step-by-step: first GPU container
- Check GPU on host:
nvidia-smi
- Run a test CUDA container and query the GPU:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
Expected: the NVIDIA-SMI table appears inside the container.
- Limit to a specific GPU (example: GPU 0 only):
docker run --rm --gpus device=0 nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
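On multi-GPU hosts you can also request GPUs by count instead of by index (this assumes the host actually has that many GPUs; otherwise the run fails):
docker run --rm --gpus 2 nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
Expected: nvidia-smi lists two GPUs inside the container.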
Notes on quoting and platforms
If your shell complains about the --gpus argument, try quoting the device selector: --gpus "device=0". Shell quoting differs across Linux/macOS/Windows terminals.
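For example, selecting several specific devices requires the selector to be quoted so the comma is not parsed as a separate option; the commonly documented form nests the quotes:
docker run --rm --gpus '"device=0,1"' nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi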
Worked examples
Example 1 — Inspect GPU from a CUDA base image
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
What you learn: your container can see the GPU and the driver stack provided by the host.
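You can also ask nvidia-smi for specific fields, which is handy in scripts (a small extension of Example 1):
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 \
  nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv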
Example 2 — Minimal PyTorch GPU check
Create a Dockerfile that installs Python and PyTorch with CUDA support, then verify CUDA availability.
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04
RUN apt-get update \
&& apt-get install -y python3 python3-pip \
&& rm -rf /var/lib/apt/lists/*
# Install a PyTorch build with CUDA support (example version)
RUN pip3 install --no-cache-dir torch==2.1.0+cu121 torchvision==0.16.0+cu121 --index-url https://download.pytorch.org/whl/cu121
CMD ["python3", "-c", "import torch;print('CUDA available:', torch.cuda.is_available());\n\
print('Device count:', torch.cuda.device_count());\n\
print('Device name:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A')"]
Build and run:
docker build -t torch-gpu-check .
docker run --rm --gpus all torch-gpu-check
Expected output: CUDA available: True and your GPU name.
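To double-check which PyTorch build and CUDA version the image actually contains (useful later when debugging version mismatches), override the default command:
docker run --rm --gpus all torch-gpu-check python3 -c "import torch; print(torch.__version__, torch.version.cuda)"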
Example 3 — Mount data and pin GPU
Run an inference script with a mounted data directory and a specific GPU device:
docker run --rm \
  --gpus device=0 \
  -v $(pwd)/data:/app/data \
  -w /app \
  torch-gpu-check \
  python3 -c "import torch; print(torch.cuda.get_device_name(0))"
What you learn: volume mounting and device selection work together for practical workflows.
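As a quick extension (a sketch reusing the torch-gpu-check image from Example 2), you can confirm the GPU is not just visible but usable for compute by running a small matrix multiply on the selected device:
docker run --rm --gpus device=0 torch-gpu-check \
  python3 -c "import torch; x = torch.rand(1024, 1024, device='cuda'); print((x @ x).sum().item())"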
Advanced: a simple Compose snippet
Compose configurations vary. This example requests a GPU and runs a quick check.
services:
  app:
    image: torch-gpu-check  # image built in Example 2, so Python and PyTorch are available
    command: ["bash", "-lc", "nvidia-smi && python3 -c 'import torch; print(torch.cuda.is_available())'"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: ["gpu"]
    volumes:
      - ./work:/work
Note: Some Compose features (e.g., deploy reservations) may require Docker Engine/Compose versions that support them or Swarm mode. For basics, docker run --gpus is sufficient.
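Assuming the snippet is saved as compose.yaml and the torch-gpu-check image from Example 2 exists locally, you can start it with:
docker compose up
docker compose run --rm app nvidia-smi   # one-off check without keeping the service running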
Exercises you can try now
Tackle these hands-on tasks. Compare with solutions if you get stuck.
- Exercise 1 (mirrors Example 1): Run a CUDA base image with GPU access and print the NVIDIA-SMI table from inside the container. Save a screenshot or copy the first line of the output.
- Exercise 2 (mirrors Example 2): Build a minimal PyTorch GPU image and verify that torch.cuda.is_available() returns True inside the container.
Exercise checklist
- Host nvidia-smi shows your GPU.
- docker run --gpus all works without errors.
- Inside-container nvidia-smi matches your host GPU model.
- PyTorch reports CUDA available and detects at least one device.
Common mistakes and self-check
- Forgetting the --gpus flag: the container runs, but no GPU is visible. Self-check: run nvidia-smi inside the container (see the comparison after this list).
- Host driver missing or inactive: nvidia-smi fails on the host and in the container. Self-check: fix the host first, then retry the container.
- Version mismatches: using an image without the CUDA runtime your framework needs. Self-check: print the framework version and CUDA build, then adjust the base image or wheel as needed.
- Wrong quoting for --gpus device=0: shell error or ignored flag. Self-check: try quoting the argument or simplify to --gpus all to confirm basics.
- Assuming Windows/Mac behave like Linux: GPU passthrough differs. Self-check: test on a Linux host with supported NVIDIA drivers and the Container Toolkit for reliable results.
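A quick way to experience the first mistake deliberately (assuming the torch-gpu-check image from Example 2 is built): run the same image with and without the flag and compare the output.
docker run --rm torch-gpu-check              # no --gpus: expect "CUDA available: False"
docker run --rm --gpus all torch-gpu-check   # with --gpus: expect "CUDA available: True"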
Practical projects
- GPU inference microservice: containerize a small FastAPI server that loads a CUDA-enabled model and serves predictions.
- Reproducible training job: package a training script and dependencies; run with a mounted dataset and a fixed GPU device.
- Benchmark suite: write a script that times CPU vs GPU for a matrix multiply or a small model forward pass from within a container.
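As a starting point for the benchmark project, here is a rough sketch that reuses the torch-gpu-check image. It takes single, un-warmed measurements, so treat the numbers as indicative only.
docker run --rm -i --gpus all torch-gpu-check python3 - <<'EOF'
# Rough CPU-vs-GPU timing for a single matrix multiply.
# Not a rigorous benchmark: no warm-up, one measurement, transfer cost excluded.
import time
import torch

n = 4096
a = torch.rand(n, n)
b = torch.rand(n, n)

t0 = time.perf_counter()
a @ b
print(f"CPU matmul: {time.perf_counter() - t0:.3f}s")

if torch.cuda.is_available():
    ag, bg = a.cuda(), b.cuda()
    torch.cuda.synchronize()      # finish the host-to-device copies before timing
    t0 = time.perf_counter()
    ag @ bg
    torch.cuda.synchronize()      # wait for the kernel to complete
    print(f"GPU matmul: {time.perf_counter() - t0:.3f}s")
else:
    print("No GPU visible; did you pass --gpus?")
EOF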
Learning path
- Start: GPU-enabled containers basics (this page).
- Next: Building optimized images (slim base images, layer caching).
- Then: Multi-GPU scheduling and limits (--gpus options, process pinning).
- Finally: Production deployment patterns (Compose, orchestration, healthchecks, logging).
Next steps
- Automate: write a Makefile with targets to build, run, and test GPU containers.
- Harden: add non-root user, pinned versions, and minimal runtimes.
- Document: include a short README with run commands and expected outputs.
Mini challenge
Create a single docker run command that:
- Uses GPU device 0 only,
- Mounts ./models to /app/models,
- Runs a Python one-liner that prints the current CUDA device name.
Hint
Combine --gpus device=0, -v, and a small Python command that queries torch.cuda.get_device_name(0).