Who this is for
You write Python for data pipelines, feature engineering, or ML training/inference and need fast, reliable tests. Ideal for Machine Learning Engineers and data-minded software engineers.
Prerequisites
- Comfortable with Python functions, modules, and virtual environments.
- Basic understanding of data pipelines or model training loops.
- Ability to run commands in a terminal (e.g., pip install pytest; pytest).
Why this matters
In ML systems, most failures come from data drift, edge cases in preprocessing, or brittle glue code rather than from the model itself. Pytest helps you:
- Lock down feature logic so refactors don’t silently change model inputs.
- Validate train/infer parity so production predictions match expectations.
- Test file I/O and config wiring with temporary paths and monkeypatching.
- Catch regressions quickly with fast feedback and focused error messages.
Real tasks you will handle with pytest
- Guarantee that a scaling function never divides by zero and handles NaNs.
- Ensure feature columns are present, ordered, and dtype-safe before inference.
- Verify saved model artifacts have expected files, metadata, and version tags.
- Mock environment variables and external calls (e.g., to secrets/config services).
Concept explained simply
Pytest finds test_*.py files, runs functions starting with test_, and shows what failed and why. Use fixtures to prepare reusable data/resources. Use parametrization to run the same test against multiple inputs. Use tmp_path to safely write files. Use monkeypatch to replace environment variables or attributes during tests.
Mental model
- Arrange: set up inputs/resources (fixtures).
- Act: call the function under test.
- Assert: verify outputs, side effects, or raised exceptions.
Keep tests small and specific. If a test fails, you should immediately know what broke.
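To make arrange/act/assert concrete, here is a minimal sketch; bucketize is a hypothetical feature function used only for illustration.
# file: test_bucketize.py (sketch)
def bucketize(age, boundaries=(18, 35, 50)):
    # Count how many boundaries the age has passed; 40 falls into bucket 2.
    return sum(age >= b for b in boundaries)
def test_bucketize_assigns_expected_bucket():
    # Arrange: pick a known input.
    age = 40
    # Act: call the function under test.
    bucket = bucketize(age)
    # Assert: verify the output.
    assert bucket == 2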
Core pytest features for ML code
- Test discovery and asserts: plain assert with rich diff.
- Parametrization: run one test with many datasets using @pytest.mark.parametrize.
- Fixtures: reusable setup with optional scopes (function, class, module, session).
- tmp_path: create temporary files/directories safely.
- monkeypatch: override environment variables or functions during the test.
- Markers: label slow or integration tests (e.g., @pytest.mark.slow).
When to use fixtures vs helper functions
- Use fixtures for resources (paths, config dicts, a seeded RNG) that many tests share; see the sketch after this list.
- Use plain helper functions for pure data transformations or assertions.
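For example, a seeded RNG and a shared config dict fit naturally into fixtures. A minimal sketch, assuming hypothetical fixture names (seeded_rng, base_config) and config keys that are not used elsewhere in this lesson:
# file: conftest.py (sketch)
import random
import pytest
@pytest.fixture
def seeded_rng():
    # Function scope (the default): each test gets a freshly seeded generator.
    return random.Random(42)
@pytest.fixture(scope="session")
def base_config():
    # Session scope: built once and shared by every test in the run.
    return {"model_version": "test", "threshold": 0.5}
# file: test_uses_fixtures.py (sketch)
def test_threshold_is_valid(base_config, seeded_rng):
    assert 0.0 <= base_config["threshold"] <= 1.0
    assert 0.0 <= seeded_rng.random() < 1.0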
Worked examples
Example 1 — Parametrized test for a feature scaler
We test a simple normalize function: it should rescale a list of numbers into the range [0, 1].
# file: features.py
def normalize(values):
    """Rescale values linearly into [0, 1]; identical values all map to 0.0."""
    if not values:
        raise ValueError("empty input")
    mn = min(values)
    mx = max(values)
    if mx == mn:
        return [0.0 for _ in values]
    return [(v - mn) / (mx - mn) for v in values]
# file: test_features.py
import pytest
from features import normalize
@pytest.mark.parametrize(
    "inp, expected",
    [([0, 10], [0.0, 1.0]), ([2, 2, 2], [0.0, 0.0, 0.0]), ([1, 4, 7], [0.0, 0.5, 1.0])],
)
def test_normalize_basic(inp, expected):
    assert normalize(inp) == expected
def test_normalize_empty_raises():
    with pytest.raises(ValueError):
        normalize([])
Example 2 — Testing file outputs with tmp_path
We ensure predictions are saved as a JSONL file in a directory.
# file: io_utils.py
import json
def save_predictions(out_dir, preds):
    p = out_dir / "preds.jsonl"
    with p.open("w", encoding="utf-8") as f:
        for item in preds:
            f.write(json.dumps(item) + "\n")
    return p
# file: test_io_utils.py
from io_utils import save_predictions
def test_save_predictions_writes_jsonl(tmp_path):
    preds = [{"id": 1, "y": 0.8}, {"id": 2, "y": 0.2}]
    p = save_predictions(tmp_path, preds)
    assert p.exists()
    lines = p.read_text(encoding="utf-8").strip().splitlines()
    assert len(lines) == 2
    assert "\"id\": 1" in lines[0]
Example 3 — monkeypatch to control environment/config
We test that a function reads an environment variable with a default. We use monkeypatch to avoid touching global state permanently.
# file: config.py
import os
def get_model_version():
    return os.getenv("MODEL_VERSION", "latest")
# file: test_config.py
from config import get_model_version
def test_get_model_version_default(monkeypatch):
    monkeypatch.delenv("MODEL_VERSION", raising=False)
    assert get_model_version() == "latest"
def test_get_model_version_env(monkeypatch):
    monkeypatch.setenv("MODEL_VERSION", "v3")
    assert get_model_version() == "v3"
Why these tests help ML systems
- Feature stability: normalize remains consistent across refactors.
- Reproducible artifacts: saved predictions have expected structure for downstream consumers.
- Safe configuration: environment-driven behavior is deterministic and testable.
Hands-on exercises
Run tests with: pytest -q
Exercise 1 — Robust normalization
Create tests for normalize that validate typical cases, identical values, negatives, and bad inputs.
Instructions
- Create features.py with normalize as in Example 1 (copy or implement).
- Create test_features.py and add parametrized tests for at least three input/expected pairs.
- Add a test that empty input raises ValueError.
- Add a test that negative inputs are handled (e.g., [-5, 5] maps to [0.0, 1.0]).
Hints
- Use @pytest.mark.parametrize for multiple (input, expected) pairs.
- Use with pytest.raises(ValueError) for error paths.
Expected when you run: at least 5 passed (pytest counts each parametrized case as a separate test).
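If you want a starting point, one possible skeleton is sketched below; it covers only the negative-input pair, and you fill in the remaining cases:
# file: test_features.py (skeleton, one possible starting point)
import pytest
from features import normalize
@pytest.mark.parametrize(
    "inp, expected",
    [
        ([-5, 5], [0.0, 1.0]),
        # TODO: add at least three more (input, expected) pairs here.
    ],
)
def test_normalize_cases(inp, expected):
    assert normalize(inp) == expected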
Exercise 2 — Test file I/O and idempotence
Write tests for a saver function that must overwrite existing files cleanly.
Starter code
# file: saver.py
from pathlib import Path
def save_text(out_path: Path, content: str) -> Path:
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(content, encoding="utf-8")
    return out_path
Instructions
- Create test_saver.py.
- Test that save_text(tmp_path/"a"/"b.txt", "hello") creates the file with exact content.
- Call save_text again with different content and assert the file contains only the new content (no duplication).
- Assert parent directories are created automatically.
Hints
- Use tmp_path / "a" / "b.txt" to compose paths.
- Read back with out_path.read_text(encoding="utf-8").
Expected when you run: 2 passed
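One possible shape for the overwrite test (a sketch; your version may split the checks differently):
# file: test_saver.py (sketch)
from saver import save_text
def test_save_text_overwrites_cleanly(tmp_path):
    out = tmp_path / "a" / "b.txt"
    save_text(out, "hello")
    save_text(out, "goodbye")
    # Only the latest content should remain after the second call.
    assert out.read_text(encoding="utf-8") == "goodbye"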
Checklist: before you ship tests
- Each test has a single clear purpose and name.
- Edge cases covered: empty, single-item, identical values, negatives, unexpected types.
- File tests use tmp_path; no writes to the repo.
- Environment/config-dependent code uses monkeypatch.
- Parametrization used for repetitive scenarios.
- Fast unit tests: each test runs in milliseconds where possible.
Common mistakes and self-check
- Over-mocking: If you mock too much, tests become meaningless. Self-check: Does at least one test hit the real code path and I/O you actually depend on?
- Gluing many assertions into one test: Hard to diagnose failures. Self-check: Would a failure tell me exactly what broke?
- Not testing error handling: Real data is messy. Self-check: Do you assert on ValueError/TypeError cases you intentionally raise?
- Leaking state across tests: Use fixtures and tmp_path. Self-check: Can tests run in any order and still pass?
Practical projects
- Feature store sanity suite: Write tests for 5–10 feature functions (range checks, no-NaN guarantees, column order).
- Artifact contract tests: Verify a training job writes model.bin, metrics.json, and schema.json with expected keys (a sketch follows this list).
- Inference parity check: Given a small frozen input batch and a stub model (e.g., deterministic), assert the outputs remain identical across refactors.
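For the artifact contract idea above, a test could look roughly like the sketch below; run_training and the metrics key auc are hypothetical stand-ins for your real training job and metrics.
# file: test_artifacts.py (sketch)
import json
def run_training(out_dir):
    # Hypothetical stub: a real job would train a model and serialize it here.
    (out_dir / "model.bin").write_bytes(b"\x00")
    (out_dir / "metrics.json").write_text(json.dumps({"auc": 0.91}), encoding="utf-8")
    (out_dir / "schema.json").write_text(json.dumps({"columns": ["f1", "f2"]}), encoding="utf-8")
def test_training_artifacts_contract(tmp_path):
    run_training(tmp_path)
    for name in ("model.bin", "metrics.json", "schema.json"):
        assert (tmp_path / name).exists()
    metrics = json.loads((tmp_path / "metrics.json").read_text(encoding="utf-8"))
    assert "auc" in metrics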
Learning path
- Start: Write unit tests for pure functions (today).
- Next: Add fixtures for datasets/config and use tmp_path for files.
- Then: Use monkeypatch for env/config and label slow/integration tests with markers (see the sketch after this list).
- Later: Add coverage and pre-commit hooks to run pytest on each change.
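When you reach the markers step, a minimal sketch of labeling and filtering slow tests looks like this; the marker name slow is a project convention you register yourself, not something built into pytest.
# file: pytest.ini (sketch)
[pytest]
markers =
    slow: marks tests as slow (deselect with -m "not slow")
# file: test_training_markers.py (sketch)
import pytest
@pytest.mark.slow
def test_full_training_loop():
    assert True  # stand-in for a long-running integration check
Run only the fast tests with: pytest -m "not slow"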
Next steps
- Complete the exercises above and ensure all tests pass.
- Take the quick test below to check your understanding.
- Apply the checklist to one of your current ML components.
Mini challenge
Write a fixture that returns a deterministic list of random numbers by seeding Python's random module. Test that two calls within the same test yield the same sequence, while separate tests get fresh sequences.
Tip
Use a fixture that sets random.seed(0) and returns [random.random() for _ in range(3)]. Consider function scope so each test is independent.
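One possible shape for that fixture and the within-test check (a sketch; naming is up to you):
# file: test_random_fixture.py (sketch)
import random
import pytest
@pytest.fixture
def deterministic_numbers():
    random.seed(0)
    return [random.random() for _ in range(3)]
def test_sequence_repeats_within_a_test(deterministic_numbers):
    # Re-seeding with the same value reproduces the fixture's sequence.
    random.seed(0)
    again = [random.random() for _ in range(3)]
    assert deterministic_numbers == again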
Testing With Pytest — Quick Test