Why this matters
As a Machine Learning Engineer, you regularly switch between projects that need different Python and library versions. Without isolation and careful version pinning, you get conflicts, broken notebooks, and unreproducible results. Virtual environments and dependency management solve this by making your experiments and deployments consistent and repeatable.
Real tasks:
- Reproduce a teammate’s training run exactly
- Update pandas safely without breaking feature engineering
- Prepare a clean inference environment for deployment
What you’ll be able to do after this lesson
- Create and activate virtual environments quickly
- Pin exact package versions and freeze them to files
- Rebuild the same environment on any machine
- Connect your venv to Jupyter for reliable notebooks
Concept explained simply
A virtual environment is a private folder containing its own Python interpreter and packages. It isolates your project from the system and other projects. Dependency management is the practice of choosing, pinning, and recording package versions so others (and future you) can recreate the same environment.
Mental model
- Think of each project as a "sealed lab" (the virtual environment). Everything you install lives in that lab.
- Your lab’s "shopping list" (requirements.txt) records exact package versions you used.
- Anyone can rebuild the lab by creating a new venv and installing from your shopping list.
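To see the isolation concretely, compare interpreter locations before and after activation (a sketch assuming a venv named .venv already exists in the project):
python -c "import sys; print(sys.prefix)"  # system interpreter location
source .venv/bin/activate
python -c "import sys; print(sys.prefix)"  # now points inside .venv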
Key tools you’ll use
- python -m venv: create isolated environments
- Activate scripts: source .venv/bin/activate (macOS/Linux), .venv\Scripts\activate (Windows)
- pip install / pip uninstall: add or remove packages
- pip freeze > requirements.txt: capture exact versions
- pip install -r requirements.txt: reproduce the environment
- Optional advanced: constraints.txt for controlled upgrades; ipykernel for Jupyter integration
Worked examples
Example 1 — Create a clean venv and install packages
- Make a project folder and venv:
mkdir ml-env-demo
cd ml-env-demo
python -m venv .venv
- Activate it:
- macOS/Linux:
source .venv/bin/activate
- Windows (PowerShell):
.venv\Scripts\Activate.ps1
- Install exact versions (stable example):
pip install numpy==1.26.4 scikit-learn==1.3.2
- Verify:
python -c "import numpy, sklearn; print(numpy.__version__, sklearn.__version__)"
Example 2 — Freeze and reproduce
- Freeze the environment:
pip freeze > requirements.txt
- Create a brand-new venv to simulate a teammate’s machine:
deactivate  # if active
python -m venv .venv2
source .venv2/bin/activate  # or .venv2\Scripts\Activate.ps1 on Windows
pip install -r requirements.txt
- Confirm versions match:
python -c "import numpy, sklearn; print(numpy.__version__, sklearn.__version__)"
Example 3 — Controlled upgrades with constraints
constraints.txt lets you pin transitive dependencies while you upgrade a top-level package.
- Create constraints from your current lock:
pip freeze > constraints.txt
- Edit constraints.txt to remove (or bump) its pandas line first; otherwise the old pandas pin in the constraints file conflicts with the upgrade.
- Upgrade a single package while holding the others steady:
pip install --upgrade pandas==2.1.4 -c constraints.txt
- Test your code, then regenerate requirements.txt if everything passes:
pip freeze > requirements.txt
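For reference, a freeze-generated constraints file is just a list of pinned lines. An illustrative excerpt (versions are examples, not prescriptions):
numpy==1.26.4
pandas==2.0.3
scikit-learn==1.3.2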
Example 4 — Use your venv in Jupyter
- Install ipykernel inside the active venv:
pip install ipykernel
- Register the kernel with a friendly name:
python -m ipykernel install --user --name ml-env-demo --display-name "Python (ml-env-demo)"
- Open Jupyter and select the "Python (ml-env-demo)" kernel to ensure the notebook uses this environment.
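If the kernel does not appear in Jupyter, you can list the registered kernels (assumes Jupyter is installed):
jupyter kernelspec list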
Who this is for
- Machine Learning Engineers who run multiple projects and experiments
- Data Scientists transitioning from notebooks to production workflows
- Anyone who needs repeatable training and inference environments
Prerequisites
- Basic command line usage
- Python installed (3.9+ recommended)
- pip available (bundled with recent Python)
Learning path
- Create and activate virtual environments
- Install specific package versions and verify
- Freeze to requirements.txt and rebuild elsewhere
- Use constraints.txt for controlled upgrades
- Connect venvs to Jupyter kernels
- Apply to a small ML project and share with a teammate
Exercises
Complete these hands-on tasks.
Exercise 1 — Clean venv, pin versions, print versions
- Create a folder named project-a, then a venv named .venv.
- Activate it and install:
pip install numpy==1.26.4 scikit-learn==1.3.2
- Create print_versions.py:
import numpy, sklearn
print("NumPy:", numpy.__version__)
print("scikit-learn:", sklearn.__version__)
- Run the script and confirm versions print.
Exercise 2 — Freeze and reproduce in a fresh venv
- In project-a, run:
pip freeze > requirements.txt
- Deactivate, create .venv2, activate it.
- Run:
pip install -r requirements.txt
- Re-run print_versions.py and confirm same versions.
Checklist — I did this
- I created and activated a venv without errors
- I installed exact package versions
- I froze dependencies to requirements.txt
- I rebuilt the environment successfully
- My reproduced versions matched exactly
Common mistakes and self-check
Using system Python for everything
Risk: breaking system tools or mixing dependencies. Fix: always create a venv per project and activate before installing.
Not pinning versions
Risk: silent upgrades change results. Fix: pin with == and commit requirements.txt; rebuild with -r.
Mixing environments
Symptom: python uses a different interpreter than pip. Self-check: run which python and which pip (or where on Windows). They should both point inside your venv.
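A quick self-check, assuming your venv lives at .venv inside the project:
which python  # expect .../project/.venv/bin/python
which pip     # expect .../project/.venv/bin/pip
python -m pip --version  # also reports which interpreter pip belongs to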
Forgetting Jupyter kernel binding
Symptom: notebook imports differ from terminal. Fix: install ipykernel in the venv and select the correct kernel.
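To verify from inside a notebook cell, print the interpreter the kernel is running; it should point into your venv:
import sys
print(sys.executable)  # expect a path ending in .venv/bin/python (or .venv\Scripts\python.exe)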
Upgrading everything at once
Risk: large surface area for breakage. Fix: use constraints.txt and upgrade one package at a time with tests.
Practical projects
- Reproducible notebook: Train a small classifier on Iris, freeze requirements, and rebuild environment on a second venv to verify identical accuracy.
- Controlled upgrade: Start with a working pandas-based feature pipeline, then upgrade pandas using constraints.txt and ensure unit tests still pass.
- Team handoff: Package a minimal inference script and requirements.txt that a teammate can run in a fresh venv to replicate your predictions.
Next steps
- Automate environment setup with a simple setup script (create venv, install -r); see the sketch after this list
- Add pre-commit checks to prevent accidental unpinned installs
- Learn packaging basics (pyproject.toml) when turning code into reusable modules
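A minimal setup-script sketch for the first item above (the filename setup.sh and the .venv location are assumptions; macOS/Linux):
#!/usr/bin/env bash
# setup.sh: create a venv and install pinned dependencies
set -euo pipefail
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
python -c "import sys; print('Environment ready:', sys.executable)"
Run it with: bash setup.sh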
Mini challenge
Create a tiny ML project that trains LogisticRegression on Iris. Pin versions, save requirements.txt, and write a short README with 3 commands: create venv, install -r, run script. Ask a peer to reproduce the same accuracy (±1e-6). If results differ, investigate differences in Python version, BLAS, or package pins.
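A possible starting point for the training script (train.py is a hypothetical name; fixing random_state is what makes the accuracy reproducible across machines):
# train.py: train LogisticRegression on Iris and print test accuracy
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
model = LogisticRegression(max_iter=200, random_state=42)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))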