Who this is for
- Junior to mid-level Machine Learning Engineers who need consistent results across laptops, servers, and CI.
- Data Scientists preparing models to hand off to engineering.
- MLOps/Platform engineers standardizing environments for teams.
Prerequisites
- Basic Python and command line usage.
- Familiarity with pip or conda.
- Optional: basic Docker knowledge.
Why this matters
Environment reproducibility is your guarantee that code behaves the same on every machine. In the ML lifecycle, it prevents "works on my machine" issues and makes debugging, collaboration, and deployment predictable.
- Real task: Train a model on a GPU server today, retrain next month with the same results.
- Real task: Share a project with pinned dependencies so teammates can run it without surprises.
- Real task: Rebuild a serving container identically to match the model you validated.
Concept explained simply
Think of your ML project like baking. The recipe is your code. The ingredients are your libraries and system packages. The oven is your OS/CPU/GPU. Reproducibility means you precisely list ingredients, their brands and versions, and control the oven settings so the cake always turns out the same.
Mental model
- Pin: exact versions for everything you can (packages, base images, CUDA versions, Python version).
- Isolate: avoid polluting global systems (use venv/conda/containers).
- Record: save lockfiles, hashes, and the metadata needed to rerun (see the sketch after this list).
- Verify: recreate in a clean environment and compare outcomes.
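A minimal sketch of the "Record" step, using only the Python standard library: it captures the interpreter version, platform, and installed package versions in a JSON file you can commit next to a run. The run_metadata.json filename is just an example.

import importlib.metadata
import json
import platform
import sys

# Hypothetical output path; adjust to your project layout.
RECORD_PATH = "run_metadata.json"

metadata = {
    "python": sys.version,
    "platform": platform.platform(),
    "packages": {
        dist.metadata["Name"]: dist.version
        for dist in importlib.metadata.distributions()
    },
}

with open(RECORD_PATH, "w") as f:
    json.dump(metadata, f, indent=2, sort_keys=True)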
Core components of a reproducible ML environment
- Version pinning: Use exact versions (e.g., pandas==2.2.0). Prefer lockfiles (poetry.lock, requirements.txt with exact pins, conda lock files).
- Environment isolation: Python venv, conda environments, or containers (Docker).
- System-level dependencies: Capture OS libs (e.g., libgomp, gcc) via containers or documented setup steps.
- Base images: Pin Docker base images by version tag or digest for stability.
- Randomness control: Set seeds across libraries (random, numpy, torch) and use deterministic backends where possible.
- Data versioning: Reference immutable data snapshots (e.g., by checksum or versioned path) so training inputs don’t change unexpectedly.
- Config management: Centralize configuration in files (YAML/TOML) and avoid hidden environment differences (see the sketch after this list).
- Verification: Rebuild from scratch on a clean machine/CI and run sanity checks.
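As a sketch of the config-management idea, assuming PyYAML is installed and a hypothetical config.yaml committed to the repo, every tunable value is read from the file rather than from ad hoc environment variables:

import yaml  # PyYAML, assumed pinned in your requirements

# Hypothetical config.yaml committed to the repo, e.g.:
#   seed: 42
#   data_path: data/train.csv
with open("config.yaml") as f:
    config = yaml.safe_load(f)

SEED = config["seed"]
DATA_PATH = config["data_path"]
print(f"Using seed={SEED}, data={DATA_PATH}")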
Sample seed setup (deterministic where possible)
import os
import random

import numpy as np

SEED = 42
random.seed(SEED)
# Note: PYTHONHASHSEED is read at interpreter startup, so also set it in the
# shell or container environment if you rely on stable hash ordering.
os.environ["PYTHONHASHSEED"] = str(SEED)
np.random.seed(SEED)

try:
    import torch
    torch.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)
    # Trade speed for determinism in cuDNN-backed operations.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
except ImportError:
    pass  # PyTorch not installed; stdlib and NumPy seeding still apply
Worked examples
Example 1: Reproducible Python venv with pinned requirements
- Create a clean environment:
  python -m venv .venv
  source .venv/bin/activate  # Windows: .venv\Scripts\activate
  python -m pip install --upgrade pip
- Pin versions in requirements.txt:
  echo "numpy==1.26.4" > requirements.txt
  echo "pandas==2.2.0" >> requirements.txt
  echo "scikit-learn==1.4.0" >> requirements.txt
- Install and freeze a lockfile (optional but helpful):
  pip install -r requirements.txt
  pip freeze > requirements.lock
- Recreate on another machine using the lockfile for exact transitive deps:
  pip install --no-deps --require-virtualenv -r requirements.lock
Tip: requirements.txt pins your direct deps; requirements.lock pins every package including transitive deps.
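One way to confirm that a rebuilt environment matches the lockfile is a short standard-library check like the sketch below; it assumes requirements.lock contains simple name==version lines as produced by pip freeze.

import importlib.metadata

mismatches = []
with open("requirements.lock") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip comments, blank lines, and non-pinned entries
        name, expected = line.split("==", 1)
        try:
            installed = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            mismatches.append(f"{name}: not installed")
            continue
        if installed != expected:
            mismatches.append(f"{name}: expected {expected}, found {installed}")

print("Environment matches lockfile." if not mismatches else "\n".join(mismatches))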
Example 2: Reproducible conda environment
- Create environment.yml with exact versions:
  name: mlproj
  channels:
    - conda-forge
  dependencies:
    - python=3.10.13
    - numpy=1.26.4
    - pandas=2.2.0
    - scikit-learn=1.4.0
- Create the environment:
  conda env create -f environment.yml
  conda activate mlproj
- Export an exact spec for rebuilds:
  conda list --explicit > conda-spec.txt
  conda create --name mlproj2 --file conda-spec.txt  # later, to rebuild
Note: The explicit spec file locks exact build strings and channels, improving reproducibility.
Example 3: Reproducible Docker image for training
- Create a Dockerfile with a pinned base image and explicit versions:
  FROM python:3.10-slim@sha256:REPLACE_WITH_DIGEST
  ENV PYTHONDONTWRITEBYTECODE=1 \
      PYTHONUNBUFFERED=1
  WORKDIR /app
  COPY requirements.txt /app/requirements.txt
  RUN pip install --no-cache-dir --upgrade pip \
      && pip install --no-cache-dir -r requirements.txt
  COPY train.py /app/train.py
  CMD ["python", "train.py"]
- requirements.txt (pinned):
  numpy==1.26.4
  pandas==2.2.0
  scikit-learn==1.4.0
- Build and run:
  docker build -t ml-train:1 .
  docker run --rm ml-train:1 python -c "import pandas, sklearn; print('OK')"
Pinning the base image by digest prevents upstream tag drift.
Reproducibility checklist
- [ ] Python version pinned (e.g., 3.10.13)
- [ ] Dependencies pinned exactly; lockfile saved
- [ ] Environment isolated (venv/conda/container)
- [ ] Random seeds set; deterministic flags used where possible
- [ ] Base image/version pinned (if using Docker)
- [ ] Data snapshot referenced immutably (path/checksum/version)
- [ ] Config file committed (YAML/TOML) instead of hidden env-only settings
- [ ] Rebuild verified on a clean machine/CI
How to verify quickly
- Create a fresh venv or new container.
- Install from lockfile or build from Dockerfile.
- Run a short script to print library versions and a hash of a small dataset/model artifact (a sketch follows this list).
- Compare against the expected versions and hash.
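A sketch of such a check, standard library only; ARTIFACT_PATH and EXPECTED_SHA256 are placeholders for your own artifact and its previously recorded digest.

import hashlib
import sys

ARTIFACT_PATH = "data/train.csv"            # hypothetical artifact to verify
EXPECTED_SHA256 = "REPLACE_WITH_EXPECTED"   # digest recorded from the validated run

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Hash the file in chunks so large artifacts don't need to fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

actual = sha256_of(ARTIFACT_PATH)
if actual != EXPECTED_SHA256:
    sys.exit(f"Hash mismatch: expected {EXPECTED_SHA256}, got {actual}")
print("Artifact hash matches the recorded value.")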
Common mistakes and self-checks
- Mistake: Using floating dependency ranges (e.g., pandas>=2.0). Self-check: Is every package pinned with == in the file that others will use?
- Mistake: Relying only on requirements.txt without a lockfile. Self-check: Do you have a frozen list (pip freeze or explicit conda spec)?
- Mistake: Forgetting system libs. Self-check: Can you rebuild in a minimal container successfully?
- Mistake: Not setting seeds. Self-check: Do repeated runs produce equivalent metrics within expected noise? (See the sketch after this list.)
- Mistake: Mixing global and project environments. Self-check: Does deactivating your venv break your run? It should; otherwise you're leaking globals.
- Mistake: Tag-only Docker base images (e.g., latest). Self-check: Is your base image pinned by version or digest?
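A minimal repeatability self-check, assuming only NumPy: run the same seeded computation twice and confirm the results agree. Bitwise equality is expected for repeated runs in the same environment; across different machines or library versions, compare within a tolerance instead.

import numpy as np

def seeded_run(seed: int = 42) -> float:
    # Stand-in for a training run: any computation fully driven by the seed.
    rng = np.random.default_rng(seed)
    sample = rng.normal(size=1000)
    return float(sample.mean())

first = seeded_run()
second = seeded_run()
assert first == second, f"Runs diverged: {first} vs {second}"
print(f"Repeated runs agree: {first:.6f}")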
Exercises
Exercise 1: Pin and recreate a Python environment
- Create a venv and upgrade pip.
- Create requirements.txt with exact versions for numpy, pandas, scikit-learn (choose versions that are compatible).
- Install, then freeze to requirements.lock.
- Delete the venv, recreate it, and install from requirements.lock.
- Run a script printing imported package versions and confirm they match.
What to keep for your own records
- requirements.txt and requirements.lock
- Console output showing version prints matching the lockfile
Exercise 2: Minimal reproducible Docker image
- Write a Dockerfile using a pinned Python slim image (specify a version tag; if you know the digest, pin it).
- Copy a pinned requirements.txt and install.
- Create app.py that prints library versions and a seeded random number.
- Build and run the container twice; verify identical outputs for versions and the random number.
What to keep for your own records
- Dockerfile and requirements.txt
- Two identical runs of the container output
Practical projects
- Project A: Reproducible training baseline. Create a small training script (e.g., Iris classification) with seeds, pinned env, and a script that prints a checksum of the trained model file. Verify the same checksum across two clean rebuilds.
- Project B: Data snapshot runner. Add a tiny dataset (or generate synthetic) and compute its SHA256 before training; fail the run if the checksum differs from a stored value.
- Project C: CI environment test. Write a script that rebuilds the env from lockfiles in a clean environment and runs a smoke test to confirm everything loads.
Learning path
- Environment isolation and pinning (this lesson)
- Data versioning and artifact tracking
- Experiment tracking and configuration management
- CI checks for reproducibility and drift detection
- Deployment with pinned containers and staged rollouts
Next steps
- Adopt a lockfile workflow in all new ML repos.
- Add a reproducibility check script that verifies versions, seeds, and data hashes.
- Integrate a CI job that rebuilds from scratch and runs a smoke test.
Mini challenge
Given an existing ML repo that only has requirements.txt with version ranges, make it reproducible. Deliver: pinned requirements, a lockfile, a seed setup, and a short README section describing how to rebuild and verify determinism.