Who this is for
You write Python for data pipelines, feature engineering, or ML training/inference and need fast, reliable tests. Ideal for Machine Learning Engineers and data-minded software engineers.
Prerequisites
- Comfortable with Python functions, modules, and virtual environments.
- Basic understanding of data pipelines or model training loops.
- Ability to run commands in a terminal (e.g., pip install pytest; pytest).
Why this matters
In ML systems, most failures come from data drift, edge cases in preprocessing, or brittle glue code rather than from the model itself. Pytest helps you:
- Lock down feature logic so refactors don’t silently change model inputs.
- Validate train/infer parity so production predictions match expectations.
- Test file I/O and config wiring with temporary paths and monkeypatching.
- Catch regressions quickly with fast feedback and focused error messages.
Real tasks you will handle with pytest
- Guarantee that a scaling function never divides by zero and handles NaNs.
- Ensure feature columns are present, ordered, and dtype-safe before inference.
- Verify saved model artifacts have expected files, metadata, and version tags.
- Mock environment variables and external calls (e.g., to secrets/config services).
Concept explained simply
Pytest finds test_*.py files, runs functions starting with test_, and shows what failed and why. Use fixtures to prepare reusable data/resources. Use parametrization to run the same test against multiple inputs. Use tmp_path to safely write files. Use monkeypatch to replace environment variables or attributes during tests.
Mental model
- Arrange: set up inputs/resources (fixtures).
- Act: call the function under test.
- Assert: verify outputs, side effects, or raised exceptions.
Keep tests small and specific. If a test fails, you should immediately know what broke.
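To make arrange/act/assert concrete, here is a minimal sketch; bucketize is a hypothetical feature function used only for illustration.
# file: test_bucketize.py (sketch)
def bucketize(age, boundaries=(18, 35, 50)):
    # Count how many boundaries the age has passed; 40 falls into bucket 2.
    return sum(age >= b for b in boundaries)
def test_bucketize_assigns_expected_bucket():
    # Arrange: pick a known input.
    age = 40
    # Act: call the function under test.
    bucket = bucketize(age)
    # Assert: verify the output.
    assert bucket == 2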
Core pytest features for ML code
- Test discovery and asserts: plain assert with rich diff.
- Parametrization: run one test with many datasets using @pytest.mark.parametrize.
- Fixtures: reusable setup with optional scopes (function, class, module, session).
- tmp_path: create temporary files/directories safely.
- monkeypatch: override environment variables or functions during the test.
- Markers: label slow or integration tests (e.g., @pytest.mark.slow).
When to use fixtures vs helper functions
- Use fixtures for resources (paths, config dicts, a seeded RNG) that many tests share; see the sketch after this list.
- Use plain helper functions for pure data transformations or assertions.
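For example, a seeded RNG and a shared config dict fit naturally into fixtures. A minimal sketch, assuming hypothetical fixture names (seeded_rng, base_config) and config keys that are not used elsewhere in this lesson:
# file: conftest.py (sketch)
import random
import pytest
@pytest.fixture
def seeded_rng():
    # Function scope (the default): each test gets a freshly seeded generator.
    return random.Random(42)
@pytest.fixture(scope="session")
def base_config():
    # Session scope: built once and shared by every test in the run.
    return {"model_version": "test", "threshold": 0.5}
# file: test_uses_fixtures.py (sketch)
def test_threshold_is_valid(base_config, seeded_rng):
    assert 0.0 <= base_config["threshold"] <= 1.0
    assert 0.0 <= seeded_rng.random() < 1.0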
Worked examples
Example 1 — Parametrized test for a feature scaler
We test a simple normalize function: it should rescale a list of numbers into the range [0, 1].
# file: features.py
def normalize(values):
    """Rescale values linearly into [0, 1]; identical values all map to 0.0."""
    if not values:
        raise ValueError("empty input")
    mn = min(values)
    mx = max(values)
    if mx == mn:
        return [0.0 for _ in values]
    return [(v - mn) / (mx - mn) for v in values]
# file: test_features.py
import pytest
from features import normalize
@pytest.mark.parametrize(
    "inp, expected",
    [([0, 10], [0.0, 1.0]), ([2, 2, 2], [0.0, 0.0, 0.0]), ([1, 4, 7], [0.0, 0.5, 1.0])],
)
def test_normalize_basic(inp, expected):
    assert normalize(inp) == expected
def test_normalize_empty_raises():
    with pytest.raises(ValueError):
        normalize([])
Example 2 — Testing file outputs with tmp_path
We ensure predictions are saved as a JSONL file in a directory.
# file: io_utils.py
import json
def save_predictions(out_dir, preds):
    p = out_dir / "preds.jsonl"
    with p.open("w", encoding="utf-8") as f:
        for item in preds:
            f.write(json.dumps(item) + "\n")
    return p
# file: test_io_utils.py
from io_utils import save_predictions
def test_save_predictions_writes_jsonl(tmp_path):
    preds = [{"id": 1, "y": 0.8}, {"id": 2, "y": 0.2}]
    p = save_predictions(tmp_path, preds)
    assert p.exists()
    lines = p.read_text(encoding="utf-8").strip().splitlines()
    assert len(lines) == 2
    assert "\"id\": 1" in lines[0]
Example 3 — monkeypatch to control environment/config
We test that a function reads an environment variable with a default. We use monkeypatch to avoid touching global state permanently.
# file: config.py
import os
def get_model_version():
    return os.getenv("MODEL_VERSION", "latest")
# file: test_config.py
from config import get_model_version
def test_get_model_version_default(monkeypatch):
    monkeypatch.delenv("MODEL_VERSION", raising=False)
    assert get_model_version() == "latest"
def test_get_model_version_env(monkeypatch):
    monkeypatch.setenv("MODEL_VERSION", "v3")
    assert get_model_version() == "v3"
Why these tests help ML systems
- Feature stability: normalize remains consistent across refactors.
- Reproducible artifacts: saved predictions have expected structure for downstream consumers.
- Safe configuration: environment-driven behavior is deterministic and testable.
Hands-on exercises
Run tests with: pytest -q
Exercise 1 — Robust normalization
Create tests for normalize that validate typical cases, identical values, negatives, and bad inputs.
Instructions
- Create features.py with normalize as in Example 1 (copy or implement).
- Create test_features.py and add parametrized tests for at least three input/expected pairs.
- Add a test that empty input raises ValueError.
- Add a test that negative inputs are handled (e.g., [-5, 5] maps to [0.0, 1.0]).
Hints
- Use @pytest.mark.parametrize for multiple (input, expected) pairs.
- Use with pytest.raises(ValueError) for error paths.
Expected when you run: at least 5 passed (pytest counts each parametrized case as a separate test).
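If you want a starting point, one possible skeleton is sketched below; it covers only the negative-input pair, and you fill in the remaining cases:
# file: test_features.py (skeleton, one possible starting point)
import pytest
from features import normalize
@pytest.mark.parametrize(
    "inp, expected",
    [
        ([-5, 5], [0.0, 1.0]),
        # TODO: add at least three more (input, expected) pairs here.
    ],
)
def test_normalize_cases(inp, expected):
    assert normalize(inp) == expected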
Exercise 2 — Test file I/O and idempotence
Write tests for a saver function that must overwrite existing files cleanly.
Starter code
# file: saver.py
from pathlib import Path
def save_text(out_path: Path, content: str) -> Path:
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(content, encoding="utf-8")
    return out_path
Instructions
- Create test_saver.py.
- Test that save_text(tmp_path/"a"/"b.txt", "hello") creates the file with exact content.
- Call save_text again with different content and assert the file contains only the new content (no duplication).
- Assert parent directories are created automatically.
Hints
- Use tmp_path / "a" / "b.txt" to compose paths.
- Read back with out_path.read_text(encoding="utf-8").
Expected when you run: 2 passed
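One possible shape for the overwrite test (a sketch; your version may split the checks differently):
# file: test_saver.py (sketch)
from saver import save_text
def test_save_text_overwrites_cleanly(tmp_path):
    out = tmp_path / "a" / "b.txt"
    save_text(out, "hello")
    save_text(out, "goodbye")
    # Only the latest content should remain after the second call.
    assert out.read_text(encoding="utf-8") == "goodbye"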
Checklist: before you ship tests
- Each test has a single clear purpose and name.
- Edge cases covered: empty, single-item, identical values, negatives, unexpected types.
- File tests use tmp_path; no writes to the repo.
- Environment/config-dependent code uses monkeypatch.
- Parametrization used for repetitive scenarios.
- Fast unit tests: each test runs in milliseconds where possible.
Common mistakes and self-check
- Over-mocking: If you mock too much, tests become meaningless. Self-check: Does at least one test hit the real code path and I/O you actually depend on?
- Gluing many assertions into one test: Hard to diagnose failures. Self-check: Would a failure tell me exactly what broke?
- Not testing error handling: Real data is messy. Self-check: Do you assert on ValueError/TypeError cases you intentionally raise?
- Leaking state across tests: Use fixtures and tmp_path. Self-check: Can tests run in any order and still pass?
Practical projects
- Feature store sanity suite: Write tests for 5–10 feature functions (range checks, no-NaN guarantees, column order).
- Artifact contract tests: Verify a training job writes model.bin, metrics.json, and schema.json with expected keys (a sketch follows this list).
- Inference parity check: Given a small frozen input batch and a stub model (e.g., deterministic), assert the outputs remain identical across refactors.
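For the artifact contract idea above, a test could look roughly like the sketch below; run_training and the metrics key auc are hypothetical stand-ins for your real training job and metrics.
# file: test_artifacts.py (sketch)
import json
def run_training(out_dir):
    # Hypothetical stub: a real job would train a model and serialize it here.
    (out_dir / "model.bin").write_bytes(b"\x00")
    (out_dir / "metrics.json").write_text(json.dumps({"auc": 0.91}), encoding="utf-8")
    (out_dir / "schema.json").write_text(json.dumps({"columns": ["f1", "f2"]}), encoding="utf-8")
def test_training_artifacts_contract(tmp_path):
    run_training(tmp_path)
    for name in ("model.bin", "metrics.json", "schema.json"):
        assert (tmp_path / name).exists()
    metrics = json.loads((tmp_path / "metrics.json").read_text(encoding="utf-8"))
    assert "auc" in metrics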
Learning path
- Start: Write unit tests for pure functions (today).
- Next: Add fixtures for datasets/config and use tmp_path for files.
- Then: Use monkeypatch for env/config and label slow/integration tests with markers (see the sketch after this list).
- Later: Add coverage and pre-commit hooks to run pytest on each change.
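When you reach the markers step, a minimal sketch of labeling and filtering slow tests looks like this; the marker name slow is a project convention you register yourself, not something built into pytest.
# file: pytest.ini (sketch)
[pytest]
markers =
    slow: marks tests as slow (deselect with -m "not slow")
# file: test_training_markers.py (sketch)
import pytest
@pytest.mark.slow
def test_full_training_loop():
    assert True  # stand-in for a long-running integration check
Run only the fast tests with: pytest -m "not slow"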
Next steps
- Complete the exercises above and ensure all tests pass.
- Take the quick test below to check your understanding.
- Apply the checklist to one of your current ML components.
Mini challenge
Write a fixture that returns a deterministic list of random numbers by seeding Python's random module. Test that two calls within the same test yield the same sequence, while separate tests get fresh sequences.
Tip
Use a fixture that sets random.seed(0) and returns [random.random() for _ in range(3)]. Consider function scope so each test is independent.
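One possible shape for that fixture and the within-test check (a sketch; naming is up to you):
# file: test_random_fixture.py (sketch)
import random
import pytest
@pytest.fixture
def deterministic_numbers():
    random.seed(0)
    return [random.random() for _ in range(3)]
def test_sequence_repeats_within_a_test(deterministic_numbers):
    # Re-seeding with the same value reproduces the fixture's sequence.
    random.seed(0)
    again = [random.random() for _ in range(3)]
    assert deterministic_numbers == again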
Testing With Pytest — Quick Test