How to learn Type Hints and Linting in Python as a Machine Learning Engineer, for free

Who this is for

Machine Learning Engineers who write Python for data processing, training, and serving. Ideal if you collaborate in teams, work with notebooks and packages, and want safer refactoring.

Prerequisites

Comfortable with Python functions, classes, and modules
Basic NumPy and pandas usage
Familiar with training and using ML models

Why this matters

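Stops subtle bugs early: a wrong argument type or dtype can crash a training run long after it starts; a type checker flags many such mismatches before you run anything (array shapes still need runtime checks).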
Safer refactors: adding features or changing pipelines without guessing data contracts.
Better collaboration: teammates see function expectations instantly.
Production-ready code: consistent style and fewer defects under review.

Mental model: contracts + guardrails

Type hints are contracts that describe what your code expects and returns. Linters are guardrails that keep style and simple logic errors in check. Formatters are auto-cleaners that make code uniform so your brain focuses on logic. Together they make your ML code easier to change without fear.

Concept explained simply

Type hints describe values. They don’t change runtime behavior; they help tools (and humans) reason about code.

Built-ins: str, int, float, bool, list[str], dict[str, float]
Optional: T | None (meaning value may be missing)
Union: int | float (accept multiple types)
Callable: Callable[[int, int], float]
TypeVar/Generics: reusable type placeholders for containers/utilities
Protocol: duck-typed interfaces (e.g., any object with predict)
TypedDict: for dicts with known keys/typed values
NumPy: numpy.typing.NDArray[np.float64]
pandas: use pd.DataFrame and pd.Series as coarse types
Annotated: Annotated[NDArray[np.float64], "shape=(n, d)"] for extra human hints

Linting enforces consistency and catches common errors:

Unused imports/variables
Shadowed names and ambiguous single-letter names such as l, O, and I
Comparisons to None should use is/is not
Complex functions get flagged for refactoring

Minimal tool stack you can adopt today

Type checker: mypy or pyright
Linter: ruff (fast, covers many rules)
Formatter: black; import sorter: isort (or ruff's "I" rules, which the config below enables)

# Example: pyproject.toml (minimal starting point)
[tool.black]
line-length = 100

[tool.ruff]
line-length = 100
target-version = "py311"  # keep in sync with mypy's python_version

[tool.ruff.lint]  # rule selection lives here in current ruff versions
select = ["E", "F", "I", "UP", "B"]  # pycodestyle, pyflakes, isort, pyupgrade, bugbear
ignore = []

[tool.mypy]
python_version = "3.11"
warn_unused_ignores = true
warn_return_any = true
strict_optional = true
check_untyped_defs = true

Worked examples

Example 1 — Typed CSV loader with column filter

from __future__ import annotations

from collections.abc import Iterable  # ruff's UP rules prefer this over typing.Iterable
from pathlib import Path

import pandas as pd

PathLike = str | Path

def load_csv(path: PathLike, keep_cols: Iterable[str] | None = None) -> pd.DataFrame:
    """Load a CSV and optionally select columns.

    path: file path
    keep_cols: subset of columns to keep
    """
    df = pd.read_csv(path)
    if keep_cols is not None:
        df = df.loc[:, list(keep_cols)]
    return df

Type clarity: callers know they can pass str or Path.
Linter tip: prefer a descriptive name like keep_cols over an ambiguous single letter.

Example 2 — Predict Protocol for interchangeable models

from typing import Protocol
import numpy as np
import numpy.typing as npt

ArrayF = npt.NDArray[np.float64]
ArrayI = npt.NDArray[np.int64]

class Classifier(Protocol):
    def predict(self, X: ArrayF) -> ArrayI: ...

def accuracy(model: Classifier, X: ArrayF, y_true: ArrayI) -> float:
    y_pred = model.predict(X)
    return float((y_pred == y_true).mean())

Any object whose predict(X) returns class ids satisfies Classifier, with no inheritance required (structural typing). That makes it easy to swap models in experiments.
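To see structural typing at work, here is a sketch with two hypothetical models, ThresholdModel and ConstantModel, neither of which inherits from Classifier. Annotating the list as list[Classifier] asks the type checker to confirm that each one matches the protocol; without that annotation, mypy would infer a common base of object and reject the accuracy call.

```python
from typing import Protocol

import numpy as np
import numpy.typing as npt

ArrayF = npt.NDArray[np.float64]
ArrayI = npt.NDArray[np.int64]


class Classifier(Protocol):
    def predict(self, X: ArrayF) -> ArrayI: ...


def accuracy(model: Classifier, X: ArrayF, y_true: ArrayI) -> float:
    y_pred = model.predict(X)
    return float((y_pred == y_true).mean())


class ThresholdModel:
    """Hypothetical toy model: predicts 1 when the first feature exceeds a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def predict(self, X: ArrayF) -> ArrayI:
        return (X[:, 0] > self.threshold).astype(np.int64)


class ConstantModel:
    """Hypothetical baseline that always predicts the same class id."""

    def __init__(self, label: int) -> None:
        self.label = label

    def predict(self, X: ArrayF) -> ArrayI:
        return np.full(X.shape[0], self.label, dtype=np.int64)


X = np.array([[0.2, 1.0], [0.8, 0.0], [0.6, 0.5]], dtype=np.float64)
y = np.array([0, 1, 1], dtype=np.int64)

# The annotation makes the type checker verify each model against the Protocol.
models: list[Classifier] = [ThresholdModel(0.5), ConstantModel(1)]
for model in models:
    # ThresholdModel scores 1.0 here; ConstantModel about 0.667.
    print(type(model).__name__, accuracy(model, X, y))
```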

Example 3 — TypedDict for inference response

from typing import TypedDict

class Pred(TypedDict):
    label: int
    prob: float

def to_response(label: int, prob: float) -> Pred:
    return {"label": label, "prob": prob}

Client code now knows the response shape exactly.

Practical workflow

Add type hints as you write functions. Prefer precise container types like list[str].
Run a type checker regularly. Fix the most specific, local errors first (e.g., a wrong return type), since one of them often triggers follow-on errors in its callers.
Run linter and formatter. Commit only clean code.
When unsure, start broad (Any) but add TODOs and refine later.
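A sketch of the "start broad, refine later" step, using clip_scores as an illustrative helper: the first version accepts Any while the data contract is unclear, and the refined version pins down what callers actually pass and adds a cheap runtime check that the types cannot express.

```python
from typing import Any

import numpy as np
import numpy.typing as npt


# Step 1: broad first pass while the contract is still unclear.
def clip_scores_v1(scores: Any, low: Any, high: Any) -> Any:  # TODO: narrow these types
    return np.clip(scores, low, high)


# Step 2: refined once you know callers pass float64 arrays and float bounds.
def clip_scores(
    scores: npt.NDArray[np.float64], low: float = 0.0, high: float = 1.0
) -> npt.NDArray[np.float64]:
    """Clamp model scores into [low, high]."""
    if low > high:
        raise ValueError(f"low ({low}) must not exceed high ({high})")
    return np.clip(scores, low, high)


raw = np.array([-0.5, 0.3, 1.7])
print(clip_scores(raw))  # [0.  0.3 1. ]
```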

Exercises

Do these locally or in a notebook. Aim to pass a type checker and get zero linter violations.

Exercise 1 — Clean DataFrame with clear types

# Instructions:
# 1) Add type hints so a type checker finds no issues.
# 2) Ensure variables have descriptive names and no unused imports.
# 3) Return type should make sense for callers.

import pandas as pd

# Given skeleton

def clean_data(df, keep_cols, min_age):
    df = df.dropna(subset=keep_cols)
    df = df[df["age"] >= min_age]
    return df

# Example usage (for your own check):
# demo = pd.DataFrame({"age": [20, None, 35], "city": ["NY", "SF", "LA"]})
# print(clean_data(demo, ["age", "city"], 21))

Checklist:
- Function arguments and return annotated
- keep_cols accepts an iterable of strings
- No linter warnings about style or unused names

Exercise 2 — Protocol-based evaluator

# Instructions:
# 1) Define a Protocol named Classifier with predict(X) -> int array.
# 2) Type alias ArrayF (float64) and ArrayI (int64) using numpy.typing.
# 3) Implement evaluate(model, X, y) -> float returning accuracy.
# 4) Ensure linting and typing pass.

import numpy as np
import numpy.typing as npt

# Your code here
# ...

# Example dummy model for manual testing:
# class Majority:
#     def __init__(self, cls: int):
#         self.cls = cls
#     def predict(self, X: ArrayF) -> ArrayI:
#         return np.full(X.shape[0], self.cls, dtype=np.int64)
#
# X = np.array([[0.0, 1.0], [1.0, 1.0]], dtype=np.float64)
# y = np.array([1, 1], dtype=np.int64)
# print(evaluate(Majority(1), X, y))  # expect 1.0

Checklist:
- Correct Protocol definition
- Correct dtypes in arrays
- No unused imports or variables

Common mistakes and self-check

Using Any everywhere. Self-check: can you replace it with a concrete type in 1–2 minutes? If yes, do it now.
Forgetting | None for optional params. Self-check: where do you check is None? Annotate accordingly.
Annotating arrays as a plain np.ndarray with no dtype. Self-check: pin the dtype with NDArray[np.float64] or NDArray[np.int64].
Comparing to None with ==. Use is and is not.
Long, do-everything functions without clear contracts. Extract smaller helpers with explicit parameter and return types.

Practical projects

Annotate a feature engineering module (joins, encoders, scalers), add a Classifier Protocol, and ensure mypy/ruff pass.
Create an inference module returning a TypedDict response; add basic validation and types for arrays.
Refactor one training notebook into a typed script with a small pyproject.toml config and pre-commit steps.

Mini challenge

Add types to a function that standardizes features and returns both transformed data and the per-feature stats.

import numpy as np
import numpy.typing as npt

# Add precise types and make linter happy

def standardize(X):
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    Z = (X - mean) / np.where(std == 0, 1.0, std)
    return Z, {"mean": mean, "std": std}

Hint

Use NDArray[np.float64] for arrays.
Consider TypedDict for the stats dict.
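One possible answer, for checking your work after trying it yourself. It keeps the original logic and adds an illustrative ArrayF alias and a FeatureStats TypedDict; other annotations, such as a plain dict[str, ArrayF] for the stats, would also type-check.

```python
from typing import TypedDict

import numpy as np
import numpy.typing as npt

ArrayF = npt.NDArray[np.float64]


class FeatureStats(TypedDict):
    mean: ArrayF
    std: ArrayF


def standardize(X: ArrayF) -> tuple[ArrayF, FeatureStats]:
    """Z-score each column; constant columns (std == 0) are divided by 1.0 instead."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    Z = (X - mean) / np.where(std == 0, 1.0, std)
    return Z, {"mean": mean, "std": std}


features = np.array([[1.0, 5.0], [3.0, 5.0]])
scaled, stats = standardize(features)
print(scaled)  # column 0 becomes [-1, 1]; constant column 1 becomes [0, 0]
```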

Learning path

Now: Type hints and linting (this lesson)
Next: Packaging and project structure to keep types across modules
Then: Testing strategies with typed fixtures
Later: Data validation libraries to complement static types

Next steps

Add types to your most-used data functions
Configure ruff and a type checker; run them before each commit
Extend Protocols for your common model interfaces
