Topic 6 of 7

Minimizing Image Size

Learn Minimizing Image Size for free with explanations, exercises, and a quick test (for MLOps Engineers).

Published: January 4, 2026 | Updated: January 4, 2026

Who this is for

  • MLOps engineers deploying model services and batch jobs.
  • Data scientists packaging experiments into reproducible containers.
  • Platform/SRE folks optimizing build times, registry costs, and cold starts.

Prerequisites

  • Basic Docker knowledge: FROM, RUN, COPY, CMD.
  • Familiarity with Python packaging (requirements.txt or pyproject) or Conda.
  • Comfort with Linux shell and package managers (apt, pip, micromamba).
  • Optional: GPU containers basics (NVIDIA CUDA runtime vs devel images).

Why this matters

  • Faster CI/CD: smaller images push/pull faster and build more reliably.
  • Lower costs: reduced registry storage and bandwidth.
  • Quicker rollouts/cold starts: scale-to-zero services start faster.
  • Better cache efficiency: lean layers mean fewer cache misses.
  • Smaller attack surface: fewer packages, fewer vulnerabilities.

Quick checklist for small images

  • Pick minimal base (slim/runtime/distroless when feasible).
  • Use multi-stage builds; keep build deps out of final image.
  • Pin dependencies, build wheels in builder, install with --no-cache-dir.
  • Combine apt-get update && install in one RUN, then clean apt lists.
  • Use .dockerignore to exclude .git, data, notebooks, caches.
  • Avoid COPY . when you only need a subset; copy precise paths.
  • Prefer CPU-only libs when GPU isn’t needed.
  • Remove build artifacts, caches, and test data before final stage.
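
As a concrete sketch of the apt-get item in the checklist (the package name `libgomp1` is just a placeholder, substitute whatever your stack actually needs):

```dockerfile
# One RUN: update, install without recommends, then delete the apt
# lists in the same layer so they are never committed to the image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        libgomp1 \
    && rm -rf /var/lib/apt/lists/*
```

If the update and the cleanup lived in separate RUN instructions, the apt metadata would persist in an earlier layer even though a later layer deletes it.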

Concept explained simply

A container image is a stack of layers. Size mostly comes from the base OS, language runtime, and dependencies you install. Your code is usually tiny compared to those. To reduce size, cut out what you don’t need and avoid baking temporary stuff into layers.

Mental model

Think of your image as a travel backpack:

  • Base image = the backpack itself. Choose a smaller backpack (slim/distroless).
  • Build tools = tools you only need before the trip. Leave them at home (multi-stage).
  • Dependencies = items you’ll actually use. Pack only essentials and in compact form (wheels, micromamba).
  • Artifacts & junk = receipts you don’t need. Don’t put them in the backpack (.dockerignore, clean caches).

Deep dive: Alpine vs Slim vs Distroless

  • Alpine is tiny but uses musl libc; many scientific Python wheels target glibc. You may end up compiling heavy packages (bigger and slower builds). Use Alpine only when you’re sure dependencies have musl-compatible wheels.
  • Debian/Ubuntu slim images are a safe default for Python data/ML stacks.
  • Distroless images are minimal runtime-only; great for final stage if your app is fully self-contained. Debugging is harder.
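
One possible shape for a distroless final stage is sketched below. It assumes a self-contained app: dependencies are installed into `/app/deps` with pip's `--target` and exposed via `PYTHONPATH` (both are conventions of this sketch, not requirements), the base `gcr.io/distroless/python3-debian12` is one commonly used image, and the builder and runtime Python minor versions must match for compiled wheels to work:

```dockerfile
# Stage 1: install dependencies into a self-contained directory
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --target /app/deps -r requirements.txt
COPY app app

# Stage 2: runtime-only base with no shell and no package manager
FROM gcr.io/distroless/python3-debian12
WORKDIR /app
COPY --from=builder /app /app
ENV PYTHONPATH=/app/deps
# The distroless python3 entrypoint is the interpreter itself,
# so CMD supplies the interpreter's arguments.
CMD ["-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]
```

Because there is no shell in the final image, debugging means attaching an ephemeral debug container or switching temporarily to a `:debug` tag of the base.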

Worked examples

Example 1: FastAPI service — naive vs optimized

Naive Dockerfile (big image)
FROM python:3.11
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]

Optimized multi-stage (much smaller)
# syntax=docker/dockerfile:1.6
FROM python:3.11-slim AS builder
WORKDIR /wheels
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install --upgrade pip wheel
COPY requirements.txt .
RUN pip wheel --no-cache-dir -r requirements.txt -w /wheels

FROM python:3.11-slim
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir --no-deps /wheels/*
COPY app app
USER 1000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]

Add a .dockerignore to keep the build context small:

.git
__pycache__/
*.ipynb
tests/
data/
venv/
*.csv
*.parquet
.wheels

Example 2: CPU-only vs GPU image

If your service doesn’t need CUDA, install CPU builds to avoid multi-GB images.

Single Dockerfile with build arg to switch
# syntax=docker/dockerfile:1.6
ARG DEVICE=cpu

FROM python:3.11-slim AS base-cpu
FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04 AS base-gpu

FROM base-${DEVICE} AS builder
# A global ARG must be redeclared inside each stage that uses it in RUN
ARG DEVICE
SHELL ["/bin/bash", "-lc"]
WORKDIR /wheels
RUN if [[ "${DEVICE}" == "gpu" ]]; then apt-get update && apt-get install -y --no-install-recommends python3-pip && rm -rf /var/lib/apt/lists/*; fi
RUN python3 -m pip install --upgrade pip wheel || pip install --upgrade pip wheel
# Copy only requirement files for better cache
COPY requirements.txt .
# Create wheels (CPU or GPU specific below)
RUN if [[ "${DEVICE}" == "gpu" ]]; then \
      pip wheel --no-cache-dir -r requirements.txt -w /wheels --extra-index-url https://download.pytorch.org/whl/cu121; \
    else \
      pip wheel --no-cache-dir -r requirements.txt -w /wheels; \
    fi

FROM base-${DEVICE}
ARG DEVICE
SHELL ["/bin/bash", "-lc"]
WORKDIR /app
# The CUDA runtime base ships without Python, so install pip for the GPU variant
RUN if [[ "${DEVICE}" == "gpu" ]]; then apt-get update && apt-get install -y --no-install-recommends python3-pip && rm -rf /var/lib/apt/lists/*; fi
COPY --from=builder /wheels /wheels
RUN python3 -m pip install --no-cache-dir --no-deps /wheels/* || pip install --no-cache-dir --no-deps /wheels/*
COPY app app
CMD ["python3", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]

Build CPU: docker build -t svc:cpu --build-arg DEVICE=cpu .
Build GPU: docker build -t svc:gpu --build-arg DEVICE=gpu .

CPU images are typically hundreds of MB; GPU runtime images are several GB. Choose only what you need.

Example 3: Slimming Conda with micromamba

Micromamba-based image
# environment.yml
name: svc
channels: [conda-forge]
dependencies:
  - python=3.11
  - fastapi
  - uvicorn
  - pip
  - pip:
      - pydantic

# Dockerfile
FROM mambaorg/micromamba:1.5.5
COPY --chown=$MAMBA_USER:$MAMBA_USER environment.yml /tmp/env.yml
RUN micromamba install -y -n base -f /tmp/env.yml && micromamba clean --all --yes
WORKDIR /app
COPY --chown=$MAMBA_USER:$MAMBA_USER app app
CMD ["micromamba", "run", "-n", "base", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]

Micromamba images are minimal and include a fast solver. The clean step trims caches.

Common mistakes and self-check

  • Installing build tools in the final image. Fix: use multi-stage; keep compilers and headers in the builder stage.
  • Running apt-get update and apt-get install in separate layers. Fix: combine them and clean apt lists in the same RUN.
  • Copying the whole repo with COPY . when only a few files are needed. Fix: copy specific paths and use .dockerignore.
  • Leaving pip and conda caches. Fix: pip --no-cache-dir; micromamba/conda clean --all.
  • Using Alpine with scientific Python without prebuilt musl wheels. Fix: prefer slim Debian/Ubuntu for data/ML stacks.

How to self-check sizes

  • docker images | grep your-tag to see final size.
  • docker history your-image:tag to see which layer is huge.
  • docker build --no-cache . to validate improvements are real (not cached).
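
A small sketch of turning an inspected size into a number you can eyeball or script against. The `docker image inspect` line is left as a comment so the snippet runs anywhere; `fastapi:opt` is the tag from the exercise below:

```shell
#!/bin/sh
# bytes_to_mb: integer megabytes (decimal MB) from a byte count
bytes_to_mb() {
  echo $(( $1 / 1000000 ))
}

# In a real check you would get the size from the engine, e.g.:
#   bytes=$(docker image inspect fastapi:opt --format '{{.Size}}')
bytes=187000000   # illustrative value
echo "$(bytes_to_mb "$bytes") MB"
```

Note that `docker images` prints decimal units (MB), so dividing by 1,000,000 keeps your numbers comparable with its output.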

Exercises

Do these hands-on tasks, then compare with the solutions. Aim for small, repeatable builds.

  1. Exercise 1: Shrink a FastAPI image below 200 MB using a multi-stage Dockerfile, wheelhouse, and .dockerignore. Measure size before/after.
  2. Exercise 2: Create CPU and GPU variants of a model server image with a build ARG. Verify the CPU image is far smaller.
  • Checklist while working:
    • Minimal base chosen.
    • Build deps removed from final image.
    • Caches cleaned; no temporary files copied.
    • .dockerignore excludes large/unneeded files.

Practical projects

  • Refactor an existing ML service Dockerfile to a multi-stage build and record pull time difference in CI.
  • Package a batch inference job with micromamba and compare image sizes against pip-only and full Conda.
  • Create a parametric Dockerfile (ARG) switching between CPU and GPU stacks; publish both tags to a registry.

Learning path

  • Now: Minimizing Image Size.
  • Next: Layer caching strategies, reproducible builds (lock files), and build attestation.
  • Later: Vulnerability scanning, distroless + SBOM, and content trust.

Next steps

  • Apply the checklist to one of your current services.
  • Automate image size checks in CI (e.g., fail builds if size increases beyond a threshold).
  • Create a team template Dockerfile with multi-stage and cleaning patterns.
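
One way to sketch the CI size gate from the list above. The commented `docker image inspect` line shows how `SIZE_BYTES` would be fed in a real pipeline; the 200 MB budget is a placeholder to adjust per service:

```shell
#!/bin/sh
# Fail the build when the image exceeds a size budget.
# In CI you would set SIZE_BYTES from the engine, e.g.:
#   SIZE_BYTES=$(docker image inspect "$IMAGE" --format '{{.Size}}')
MAX_BYTES=$((200 * 1000000))          # 200 MB budget (decimal MB)
SIZE_BYTES=${SIZE_BYTES:-150000000}   # illustrative default

if [ "$SIZE_BYTES" -gt "$MAX_BYTES" ]; then
  echo "FAIL: image is $(( SIZE_BYTES / 1000000 )) MB, budget is $(( MAX_BYTES / 1000000 )) MB"
  exit 1
fi
echo "OK: image is $(( SIZE_BYTES / 1000000 )) MB, within budget"
```

Run it as a step after the build; a nonzero exit fails the pipeline, which is exactly the behavior you want from a size regression guard.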

Mini challenge

Take a small web or model service and get the image under 150 MB without losing functionality. Tip: multi-stage + slim base + precise COPY.


Practice Exercises

2 exercises to complete

Instructions

  1. Create a simple FastAPI app (hello world) and a naive Dockerfile that installs from requirements.txt on python:3.11.
  2. Build it (tag: fastapi:naive) and record the image size.
  3. Rewrite the Dockerfile as a multi-stage build using python:3.11-slim. In the builder stage, create wheels; in the final stage, install from wheelhouse with --no-cache-dir.
  4. Add a .dockerignore to exclude .git, notebooks, data, tests, caches.
  5. Build (tag: fastapi:opt) and compare sizes. Show docker history fastapi:opt to confirm no large leftover layers.

Expected Output
The optimized image fastapi:opt is typically below 200 MB for a minimal FastAPI app, with clean layers and a small build context.

Minimizing Image Size — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

