
Dependency And Runtime Optimization

Learn Dependency And Runtime Optimization for free with explanations, exercises, and a quick test (for Machine Learning Engineers).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

As a Machine Learning Engineer, your models must ship reliably and run fast. Poor dependency management and bloated runtimes lead to slow builds, large images, cold-start delays, security risks, and inconsistent behavior across environments. Optimizing dependencies and runtimes means faster deployments, smaller images, reproducible builds, and lower costs.

  • Real task: Build a CPU-only inference image under 800MB that starts in under 1s.
  • Real task: Produce a GPU image compatible with a specific CUDA and driver version.
  • Real task: Ensure reproducible builds from a clean CI machine.

Concept explained simply

Dependency and runtime optimization is choosing only what you need (base image, OS packages, Python/conda packages, model runtimes) and arranging Docker layers so builds are fast, reproducible, and minimal. For GPU, it also means matching CUDA/cuDNN versions precisely to your framework wheels.

Mental model

  • Start lean: pick the smallest base that can run your code.
  • Build then trim: compile in a builder stage, copy only artifacts to a slim runtime stage (see the skeleton after this list).
  • Freeze the recipe: pin versions and hashes so results don’t change unexpectedly.
  • Cache wisely: separate infrequent from frequent changes to reuse layers.
  • Match the GPU stack: CUDA runtime version must match your framework build.
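
As a skeleton, the "build then trim" and "cache wisely" ideas look like this (a minimal sketch with placeholder paths; the worked examples below flesh it out):

# Builder: heavy work (compilers, wheel building) stays here
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip wheel --wheel-dir=/wheels -r requirements.txt

# Runtime: only the built artifacts cross over
FROM python:3.11-slim AS runtime
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/*
# App code is copied last so the dependency layers stay cached between edits
COPY . /app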

Core principles

  • Minimal base images: prefer slim or distroless where feasible; avoid full OS images unless required.
  • Multi-stage builds: compile native deps (e.g., numpy, opencv) in a builder; copy wheels/binaries into a clean runtime.
  • Layer caching: copy dependency files (requirements.txt, lock files) before app code; install deps in a separate step.
  • Pin versions and hashes: use exact versions; when possible, include hash checks for deterministic installs.
  • OS package hygiene: combine apt-get update with install in one RUN; remove apt lists and build tools after use.
  • .dockerignore: exclude data, venvs, build artifacts, and caches to keep context small.
  • Non-root runtime: run as a non-root user for security and least privilege.
  • Runtime choice: CPU vs GPU; for GPU, prefer the CUDA runtime image (not devel) unless you need to compile CUDA code in the image.
  • Environment parity: align the Python version and C library (glibc vs musl) with your target environment (a quick check follows this list).
  • Deterministic builds: use lock files (requirements.txt with pins, poetry.lock, conda-lock) and consistent indexes.
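
For the environment-parity point, a quick check (the image tag here is just an example) is to print the Python and glibc versions a candidate base actually ships and compare them with your build hosts and production nodes:

docker run --rm python:3.11-slim sh -c 'python --version && ldd --version | head -n1'

Musl-based images (e.g., Alpine) report a different C library, which is why many prebuilt manylinux wheels do not work there.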

Worked examples

Example 1 — Shrink a CPU FastAPI inference image

Naive Dockerfile:

FROM python:3.11
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
EXPOSE 8080
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

Issues: large base, no caching for deps, copies junk, runs as root.

Improved:

# Builder
FROM python:3.11-slim AS builder
ENV PIP_NO_CACHE_DIR=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1
RUN apt-get update && apt-get install -y --no-install-recommends build-essential gcc && rm -rf /var/lib/apt/lists/*
WORKDIR /wheels
COPY requirements.txt ./
RUN pip wheel --wheel-dir=/wheels -r requirements.txt

# Runtime
FROM python:3.11-slim AS runtime
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir --no-compile /wheels/*
COPY . /app
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
EXPOSE 8080
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

Typical result: significantly smaller image and faster rebuilds (varies by project).
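
To quantify the win on your own project, build both variants and compare (the tags and Dockerfile names here are placeholders):

docker build -t inference:naive -f Dockerfile.naive .
docker build -t inference:slim -f Dockerfile .
docker images --format "table {{.Tag}}\t{{.Size}}" inference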

Example 2 — Clean up OS packages and context
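# Note: the cache mount below requires BuildKit (the default builder in current Docker releases).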
RUN --mount=type=cache,target=/var/cache/apt \
    apt-get update && apt-get install -y --no-install-recommends \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

Plus ensure .dockerignore includes:

__pycache__/
*.pyc
.env
.venv/
.data/
models/
.git/
.dist/
node_modules/

Effect: smaller context, fewer invalidated layers, smaller final image.
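
To see which instructions contribute most to the final size, inspect the layer history (the image name is a placeholder):

docker history --format "table {{.Size}}\t{{.CreatedBy}}" inference:slim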

Example 3 — GPU runtime with CUDA
ARG CUDA_VERSION=12.2.0
FROM nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu22.04 AS runtime
ENV NVIDIA_VISIBLE_DEVICES=all \
    NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements-gpu.txt ./
# Use the correct torch/TF build that matches CUDA_VERSION
# Example (adjust versions to match your CUDA):
# RUN pip install --no-cache-dir --extra-index-url https://download.pytorch.org/whl/cu121 torch==<ver>+cu121 torchvision==<ver>+cu121
RUN pip install --no-cache-dir -r requirements-gpu.txt
COPY . /app
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
CMD ["python3", "serve.py"]

Notes: Use CUDA runtime (not devel) for inference. Ensure framework wheel matches CUDA version. Host must have a compatible NVIDIA driver.
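
A quick smoke test for the whole GPU stack (image names are placeholders; the Python check assumes PyTorch, so swap in the equivalent call for your framework):

# Is the host driver visible inside a container at all?
docker run --rm --gpus all nvidia/cuda:12.2.0-runtime-ubuntu22.04 nvidia-smi

# Does the framework see the GPU and report the CUDA version it was built against?
docker run --rm --gpus all my-gpu-image \
    python3 -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"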

Example 4 — Deterministic installs with lock + hashes

Use a locked requirements file:

# requirements.txt (pinned)
fastapi==0.111.0 --hash=sha256:<hash1>
uvicorn==0.30.0 --hash=sha256:<hash2>

Then:

RUN pip install --require-hashes -r requirements.txt

Benefit: exact, reproducible installs across machines.
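
Writing hashes by hand is tedious; one common workflow (shown with pip-tools, assuming your top-level dependencies live in requirements.in) is to generate the fully pinned, hashed file and commit it:

pip install pip-tools
pip-compile --generate-hashes requirements.in -o requirements.txt

With --require-hashes, pip refuses to install anything that is not pinned with a matching hash, including transitive dependencies, which is exactly what the generated lock file provides.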

Exercises

Complete these tasks locally. A simple CPU machine is enough unless noted. Mirror answers in the Exercises panel below.

Exercise ex1 (CPU inference): Optimize the provided naive Dockerfile for a FastAPI model server. Goals: image <= 800MB (CPU-only), non-root user, multi-stage build, pinned deps.
Exercise ex2 (GPU inference): Create a CUDA runtime image for PyTorch/TensorFlow inference. Goals: use a runtime CUDA base, framework build matching CUDA, OS cleanup, non-root user.

Starter files and tips

Starter Dockerfile (ex1):

FROM python:3.11
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
EXPOSE 8080
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

Checklist you should hit:

  • Use slim/minimal base images.
  • Use multi-stage to build wheels, copy only what’s needed.
  • Pin dependency versions (and hashes if possible).
  • Combine apt-get update/install, remove lists after install.
  • Use .dockerignore to exclude junk.
  • Run as a non-root user.
  • Separate dependency install from source copy to maximize cache.

Common mistakes and self-check

  • Using a full OS image when slim/distroless suffices. Self-check: List native tools actually needed at runtime.
  • Installing build tools in the final image. Self-check: Are gcc/build-essential in your final stage? If yes, move to builder.
  • Not pinning versions. Self-check: requirements show exact versions? If not, pin them.
  • Invalidating cache by copying source before dependencies. Self-check: Does Dockerfile copy requirements before app code?
  • CUDA mismatch. Self-check: Does your framework wheel match the CUDA runtime tag?
  • Leaving apt cache and lists. Self-check: rm -rf /var/lib/apt/lists/* at the end of apt RUN?
  • Running as root. Self-check: USER set to a non-root account?

Mini challenge

Pick any of your current images. In 30 minutes, apply: slim base, multi-stage build, pinned requirements, and non-root user. Measure image size and cold start before/after. Write down three changes that delivered the biggest wins.
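
One way to capture the before/after numbers (the image name, port, and health path are placeholders for your service):

docker image inspect -f '{{.Size}}' my-service:slim    # size in bytes
start=$(date +%s%N)
docker run -d --rm -p 8080:8080 --name coldstart my-service:slim
until curl -sf http://localhost:8080/healthz > /dev/null; do sleep 0.05; done
echo "cold start: $(( ($(date +%s%N) - start) / 1000000 )) ms"
docker stop coldstart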

Who this is for

  • Machine Learning Engineers deploying inference/training services.
  • Data/Platform Engineers managing ML microservices and batch jobs.
  • MLOps engineers maintaining GPU fleets.

Prerequisites

  • Basic Docker knowledge (images, layers, Dockerfile, build, run).
  • Working Python project (FastAPI/Flask/CLI) to containerize.
  • For GPU: access to NVIDIA driver and nvidia-container-runtime.

Learning path

  1. Start with minimal base images and .dockerignore.
  2. Add multi-stage builds for native deps.
  3. Pin versions and enable deterministic installs.
  4. Optimize OS layers and remove build-time packages.
  5. Handle GPU runtimes with correct CUDA/framework pairing.
  6. Adopt non-root users, healthchecks, and sensible entrypoints.
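
Step 6 in Dockerfile form might look like this (a sketch; the port and /healthz path are assumptions about your app):

RUN useradd -m appuser
USER appuser
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s \
    CMD python -c "import sys, urllib.request; sys.exit(0 if urllib.request.urlopen('http://localhost:8080/healthz').status == 200 else 1)"
ENTRYPOINT ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]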

Practical projects

  • CPU inference service: FastAPI with a small sklearn model; image target <= 500–800MB.
  • GPU inference service: PyTorch ResNet; ensure CUDA match and measure throughput.
  • Batch job image: nightly feature computation; validate deterministic installs by rebuilding on a clean runner.

Next steps

  • Automate image scans and size checks in CI.
  • Create a base image you own (internal standard) and inherit from it.
  • Document your dependency policy (pinning, hashes, approved indexes).
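
A CI size gate can be a few lines of shell (a sketch; the image name and the 800 MB limit are examples):

size=$(docker image inspect -f '{{.Size}}' my-service:slim)
limit=$((800 * 1024 * 1024))
if [ "$size" -gt "$limit" ]; then
  echo "Image is $((size / 1024 / 1024)) MB, limit is 800 MB" >&2
  exit 1
fi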

Practice Exercises

2 exercises to complete

Instructions

You are given a naive Dockerfile for a FastAPI ML inference server. Optimize it to:

  • Use a minimal base image and multi-stage build.
  • Install dependencies using cached layers (copy requirements before app code).
  • Pin dependency versions; add hashes if you can.
  • Remove build tools from the final image.
  • Run as a non-root user.
  • Keep the image under roughly 800MB (varies by project).

Naive Dockerfile
FROM python:3.11
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
EXPOSE 8080
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
Expected Output
A Dockerfile that builds successfully, installs pinned dependencies via a builder stage, runs as a non-root user, and produces a CPU-only image around or under 800MB. 'curl /healthz' should return 200 when the container is started with your app.
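
One way to check your solution against these expectations (names and tags are placeholders):

docker build -t ex1:solution .
docker image inspect -f '{{.Size}}' ex1:solution   # reported in bytes; aim for <= ~800MB
docker run -d --rm -p 8080:8080 --name ex1 ex1:solution
docker exec ex1 whoami                             # should not print root
curl -i http://localhost:8080/healthz              # expect HTTP 200
docker stop ex1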

Dependency And Runtime Optimization — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

