Who this is for
Machine Learning Engineers who write Python for data processing, training, and serving. Ideal if you collaborate in teams, work with notebooks and packages, and want safer refactoring.
Prerequisites
- Comfortable with Python functions, classes, and modules
- Basic NumPy and pandas usage
- Familiar with training and using ML models
Why this matters
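- Stops subtle bugs early: wrong dtypes or unexpected None values usually break training only at runtime; a type checker flags many of them before the code runs. (Shapes mostly still need runtime checks.)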
- Safer refactors: adding features or changing pipelines without guessing data contracts.
- Better collaboration: teammates see function expectations instantly.
- Production-ready code: consistent style and fewer defects under review.
Mental model: contracts + guardrails
Type hints are contracts that describe what your code expects and returns. Linters are guardrails that keep style and simple logic errors in check. Formatters are auto-cleaners that make code uniform so your brain focuses on logic. Together they make your ML code easier to change without fear.
Concept explained simply
Type hints describe values. They don’t change runtime behavior; they help tools (and humans) reason about code.
- Built-ins: str, int, float, bool, list[str], dict[str, float]
- Optional: T | None (meaning the value may be missing)
- Union: int | float (accept multiple types)
- Callable: Callable[[int, int], float]
- TypeVar/Generics: reusable type placeholders for containers/utilities
- Protocol: duck-typed interfaces (e.g., any object with predict)
- TypedDict: for dicts with known keys/typed values
- NumPy: numpy.typing.NDArray[np.float64]
- pandas: use pd.DataFrame and pd.Series as coarse types
- Annotated: Annotated[NDArray[np.float64], "shape=(n, d)"] for extra human hints
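A few of these building blocks are easier to see together. The sketch below is illustrative only: the names batched, accuracy, evaluate, and Metric are hypothetical helpers, not part of any library. It combines a TypeVar, a Callable alias, and an optional return value.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")  # placeholder: whatever element type the caller passes in

# A metric takes true labels and predicted labels and returns a score.
Metric = Callable[[Sequence[int], Sequence[int]], float]


def batched(items: Sequence[T], batch_size: int) -> list[list[T]]:
    """Split items into consecutive batches; the last one may be shorter."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i : i + batch_size]) for i in range(0, len(items), batch_size)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of positions where prediction and label agree."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def evaluate(metric: Metric, y_true: Sequence[int], y_pred: Sequence[int]) -> float | None:
    """Return the metric value, or None when there is nothing to score."""
    if not y_true:
        return None
    return metric(y_true, y_pred)
```

Because batched is generic, a checker infers list[list[str]] for a list of strings and list[list[int]] for a list of ints without any extra annotations at the call site.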
Linting enforces consistency and catches common errors:
- Unused imports/variables
- Shadowed names and ambiguous single-letter vars
- Comparisons to None should use is / is not
- Complex functions get flagged for refactoring
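As a rough illustration of the checks listed above, here is a before/after sketch. The rule codes in the comments (F401, E741, E711) are the pyflakes/pycodestyle codes that ruff reports for these patterns; the function first_valid is a made-up example.

```python
from __future__ import annotations

# Before (kept as comments so this block runs cleanly):
#   import os                              # F401: imported but unused
#   def first_valid(l, default=None):      # E741: ambiguous variable name "l"
#       for x in l:
#           if x != None:                  # E711: comparison to None should use "is not"
#               return x
#       return default


# After: descriptive names, identity comparison, no unused imports.
def first_valid(values: list[float | None], default: float | None = None) -> float | None:
    """Return the first entry that is not None, else the default."""
    for value in values:
        if value is not None:
            return value
    return default
```

Note that the identity check also keeps 0.0 as a valid value, which a plain truthiness test (if value:) would silently skip.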
Minimal tool stack you can adopt today
- Type checker: mypy or pyright
- Linter: ruff (fast, covers many rules)
- Formatter: black; Import sorter: isort (or ruff's import rules)
# Example: pyproject.toml (conceptual snippet)
[tool.black]
line-length = 100

[tool.ruff]
line-length = 100

[tool.ruff.lint]
# Recent ruff releases expect rule selection under [tool.ruff.lint]
select = ["E", "F", "I", "UP", "B"]  # pycodestyle, pyflakes, isort, pyupgrade, bugbear
ignore = []

[tool.mypy]
python_version = "3.11"
warn_unused_ignores = true
warn_return_any = true
strict_optional = true
check_untyped_defs = true
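To make the strict_optional setting concrete, here is a small sketch of the kind of code it affects. The functions find_threshold and is_positive and the 0.5 fallback are hypothetical; the point is only that a value typed float | None has to be narrowed before it is used as a float.

```python
from __future__ import annotations


def find_threshold(thresholds: dict[str, float], name: str) -> float | None:
    """Look up a per-class threshold; None means no threshold is configured."""
    return thresholds.get(name)


def is_positive(score: float, thresholds: dict[str, float], name: str) -> bool:
    threshold = find_threshold(thresholds, name)
    # Without this check, mypy with strict_optional would be expected to reject
    # `score >= threshold`, because threshold may be None at this point.
    if threshold is None:
        threshold = 0.5  # assumed default, for illustration only
    return score >= threshold
```

After the is None branch, the checker treats threshold as a plain float, so the comparison type-checks.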
Worked examples
Example 1 — Typed CSV loader with column filter
from __future__ import annotations

from pathlib import Path
from typing import Iterable

import pandas as pd

PathLike = str | Path


def load_csv(path: PathLike, keep_cols: Iterable[str] | None = None) -> pd.DataFrame:
    """Load a CSV and optionally select columns.

    path: file path
    keep_cols: subset of columns to keep
    """
    df = pd.read_csv(path)
    if keep_cols is not None:
        df = df.loc[:, list(keep_cols)]
    return df
- Type clarity: callers know they can pass str or Path.
- Linter tip: avoid ambiguous names; use descriptive ones such as keep_cols.
Example 2 — Predict Protocol for interchangeable models
from typing import Protocol

import numpy as np
import numpy.typing as npt

ArrayF = npt.NDArray[np.float64]
ArrayI = npt.NDArray[np.int64]


class Classifier(Protocol):
    def predict(self, X: ArrayF) -> ArrayI: ...


def accuracy(model: Classifier, X: ArrayF, y_true: ArrayI) -> float:
    y_pred = model.predict(X)
    return float((y_pred == y_true).mean())
Any object with predict(X) returning class ids will work, even without inheritance. That’s powerful for swapping models in experiments.
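To illustrate the swapping, the sketch below passes two unrelated classes to the same accuracy function. MajorityModel and ThresholdModel are hypothetical stand-ins invented for this example; neither inherits from Classifier, yet both satisfy it structurally.

```python
from typing import Protocol

import numpy as np
import numpy.typing as npt

ArrayF = npt.NDArray[np.float64]
ArrayI = npt.NDArray[np.int64]


class Classifier(Protocol):
    def predict(self, X: ArrayF) -> ArrayI: ...


class MajorityModel:
    """Hypothetical model: always predicts one fixed class."""

    def __init__(self, label: int) -> None:
        self.label = label

    def predict(self, X: ArrayF) -> ArrayI:
        return np.full(X.shape[0], self.label, dtype=np.int64)


class ThresholdModel:
    """Hypothetical model: class 1 when the first feature exceeds a cutoff."""

    def __init__(self, cutoff: float) -> None:
        self.cutoff = cutoff

    def predict(self, X: ArrayF) -> ArrayI:
        return (X[:, 0] > self.cutoff).astype(np.int64)


def accuracy(model: Classifier, X: ArrayF, y_true: ArrayI) -> float:
    return float((model.predict(X) == y_true).mean())


X = np.array([[0.2, 1.0], [0.9, 0.0], [0.7, 0.3]], dtype=np.float64)
y = np.array([0, 1, 1], dtype=np.int64)
majority_acc = accuracy(MajorityModel(1), X, y)    # 2 of 3 labels are class 1
threshold_acc = accuracy(ThresholdModel(0.5), X, y)  # first feature separates the classes
```

A class whose predict returned, say, a list of strings would still run until the comparison, but a type checker would reject it at the accuracy call.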
Example 3 — TypedDict for inference response
from typing import TypedDict


class Pred(TypedDict):
    label: int
    prob: float


def to_response(label: int, prob: float) -> Pred:
    return {"label": label, "prob": prob}
Client code now knows the response shape exactly.
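Here is a small sketch of what that looks like on the consuming side. The helper format_pred is hypothetical; Pred and to_response are repeated from the example so the block runs on its own.

```python
from __future__ import annotations

from typing import TypedDict


class Pred(TypedDict):
    label: int
    prob: float


def to_response(label: int, prob: float) -> Pred:
    return {"label": label, "prob": prob}


def format_pred(pred: Pred) -> str:
    """Render a prediction for logs."""
    # A checker knows pred["prob"] is a float, so the :.2f format is safe;
    # a misspelled key such as pred["probability"] would be reported as unknown.
    return f"class={pred['label']} p={pred['prob']:.2f}"


message = format_pred(to_response(3, 0.8731))
```

At runtime a TypedDict is still an ordinary dict, so serializing it with the standard json module needs no extra conversion.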
Practical workflow
- Add type hints as you write functions. Prefer precise container types like list[str].
- Run a type checker regularly. Fix narrowest issues first (e.g., wrong return types).
- Run linter and formatter. Commit only clean code.
- When unsure, start broad (Any) but add TODOs and refine later.
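The last step, starting broad and refining, might look like the following sketch. Both functions are hypothetical: column_means_draft is the first pass with Any and a TODO, and column_means is the same logic once the row format is known.

```python
from __future__ import annotations

from typing import Any, Mapping, Sequence


# First pass: broad types, clearly marked for follow-up.
def column_means_draft(rows: Any) -> Any:  # TODO: replace Any once the row format settles
    keys = rows[0].keys()
    return {key: sum(row[key] for row in rows) / len(rows) for key in keys}


# Refined: callers and the checker now know what goes in and what comes out.
def column_means(rows: Sequence[Mapping[str, float]]) -> dict[str, float]:
    """Average each numeric column over a sequence of row mappings."""
    if not rows:
        return {}
    return {key: sum(row[key] for row in rows) / len(rows) for key in rows[0]}
```

Writing the precise signature also surfaced an edge case the draft ignored: an empty input, which the refined version handles explicitly.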
Exercises
Do these locally or in a notebook. Aim to pass a type checker and get zero linter violations.
Exercise 1 — Clean DataFrame with clear types
# Instructions:
# 1) Add type hints so a type checker finds no issues.
# 2) Ensure variables have descriptive names and no unused imports.
# 3) Return type should make sense for callers.
import pandas as pd
# Given skeleton
def clean_data(df, keep_cols, min_age):
    df = df.dropna(subset=keep_cols)
    df = df[df["age"] >= min_age]
    return df
# Example usage (for your own check):
# demo = pd.DataFrame({"age": [20, None, 35], "city": ["NY", "SF", "LA"]})
# print(clean_data(demo, ["age", "city"], 21))
- Checklist:
- Function arguments and return annotated
- keep_cols accepts an iterable of strings
- No linter warnings about style or unused names
Exercise 2 — Protocol-based evaluator
# Instructions:
# 1) Define a Protocol named Classifier with predict(X) -> int array.
# 2) Type alias ArrayF (float64) and ArrayI (int64) using numpy.typing.
# 3) Implement evaluate(model, X, y) -> float returning accuracy.
# 4) Ensure linting and typing pass.
import numpy as np
import numpy.typing as npt
# Your code here
# ...
# Example dummy model for manual testing:
# class Majority:
#     def __init__(self, cls: int):
#         self.cls = cls
#
#     def predict(self, X: ArrayF) -> ArrayI:
#         return np.full(X.shape[0], self.cls, dtype=np.int64)
#
# X = np.array([[0.0, 1.0], [1.0, 1.0]], dtype=np.float64)
# y = np.array([1, 1], dtype=np.int64)
# print(evaluate(Majority(1), X, y)) # expect 1.0
- Checklist:
- Correct Protocol definition
- Correct dtypes in arrays
- No unused imports or variables
Common mistakes and self-check
- Using Any everywhere. Self-check: can you replace it with a concrete type in 1–2 minutes? If yes, do it now.
- Forgetting | None for optional params. Self-check: where do you check is None? Annotate accordingly.
- Typing arrays as bare np.ndarray without a dtype. Self-check: pin to NDArray[np.float64] or NDArray[np.int64].
- Comparing to None with ==. Use is and is not.
- Wide functions without contracts. Extract helpers with clear types.
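For the last point, one possible shape of the fix is sketched below. The three functions are hypothetical; the idea is that each helper carries a small, explicit contract, and the top-level function only composes them.

```python
from __future__ import annotations

from typing import Sequence


def drop_missing(values: Sequence[float | None]) -> list[float]:
    """Remove None entries; the result is guaranteed to hold only floats."""
    return [value for value in values if value is not None]


def min_max_scale(values: Sequence[float]) -> list[float]:
    """Scale non-empty values to [0, 1]; constant input maps to all zeros."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return [0.0 for _ in values]
    return [(value - low) / span for value in values]


def prepare_feature(raw: Sequence[float | None]) -> list[float]:
    """Compose the helpers: clean first, then scale."""
    cleaned = drop_missing(raw)
    return min_max_scale(cleaned) if cleaned else []
```

The return type of drop_missing is what lets the checker accept the call to min_max_scale: the None case is handled in exactly one place.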
Practical projects
- Annotate a feature engineering module (joins, encoders, scalers), add a Classifier Protocol, and ensure mypy/ruff pass.
- Create an inference module returning a TypedDict response; add basic validation and types for arrays.
- Refactor one training notebook into a typed script with a small pyproject.toml config and pre-commit steps.
Mini challenge
Add types to a function that standardizes features and returns both transformed data and the per-feature stats.
import numpy as np
import numpy.typing as npt
# Add precise types and make linter happy
def standardize(X):
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    Z = (X - mean) / np.where(std == 0, 1.0, std)
    return Z, {"mean": mean, "std": std}
Hint
- Use NDArray[np.float64] for arrays.
- Consider TypedDict for the stats dict.
Learning path
- Now: Type hints and linting (this lesson)
- Next: Packaging and project structure to keep types across modules
- Then: Testing strategies with typed fixtures
- Later: Data validation libraries to complement static types
Next steps
- Add types to your most-used data functions
- Configure ruff and a type checker; run them before each commit
- Extend Protocols for your common model interfaces