Type Hints And Linting

Learn type hints and linting for free, with explanations, exercises, and a quick test. Written for Machine Learning Engineers.

Published: January 1, 2026 | Updated: January 1, 2026

Who this is for

Machine Learning Engineers who write Python for data processing, training, and serving. Ideal if you collaborate in teams, work with notebooks and packages, and want safer refactoring.

Prerequisites

  • Comfortable with Python functions, classes, and modules
  • Basic NumPy and pandas usage
  • Familiar with training and using ML models

Why this matters

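  • Stops subtle bugs early: a wrong dtype or container type otherwise surfaces only when training runs; a type checker flags it before you execute anything. (Array shapes are mostly outside what static types can check.)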
  • Safer refactors: adding features or changing pipelines without guessing data contracts.
  • Better collaboration: teammates see function expectations instantly.
  • Production-ready code: consistent style and fewer defects under review.

Mental model: contracts + guardrails

Type hints are contracts that describe what your code expects and returns. Linters are guardrails that keep style and simple logic errors in check. Formatters are auto-cleaners that make code uniform so your brain focuses on logic. Together they make your ML code easier to change without fear.

Concept explained simply

Type hints describe values. They don’t change runtime behavior; they help tools (and humans) reason about code.

  • Built-ins: str, int, float, bool, list[str], dict[str, float]
  • Optional: T | None (the value may be None; this is independent of whether a parameter has a default)
  • Union: int | float (accept multiple types)
  • Callable: Callable[[int, int], float]
  • TypeVar/Generics: reusable type placeholders for containers/utilities
  • Protocol: duck-typed interfaces (e.g., any object with predict)
  • TypedDict: for dicts with known keys/typed values
  • NumPy: numpy.typing.NDArray[np.float64]
  • pandas: use pd.DataFrame and pd.Series as coarse types
  • Annotated: Annotated[NDArray[np.float64], "shape=(n, d)"] for extra human hints

Linting enforces consistency and catches common errors:

  • Unused imports/variables
  • Shadowed names and ambiguous single-letter vars
  • Comparisons to None should use is/is not
  • Complex functions get flagged for refactoring

Minimal tool stack you can adopt today

  • Type checker: mypy or pyright
  • Linter: ruff (fast, covers many rules)
  • Formatter: black; Import sorter: isort (or ruff's import rules)

# Example: pyproject.toml (conceptual snippet)
[tool.black]
line-length = 100

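[tool.ruff]
line-length = 100

# Recent ruff versions expect lint settings under [tool.ruff.lint];
# the top-level form is deprecated.
[tool.ruff.lint]
select = ["E", "F", "I", "UP", "B"]  # pycodestyle, pyflakes, isort, pyupgrade, bugbear
ignore = []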

[tool.mypy]
python_version = "3.11"
warn_unused_ignores = true
warn_return_any = true
strict_optional = true
check_untyped_defs = true

Worked examples

Example 1 — Typed CSV loader with column filter
from __future__ import annotations

from collections.abc import Iterable  # typing.Iterable is a deprecated alias
from pathlib import Path

import pandas as pd

PathLike = str | Path

def load_csv(path: PathLike, keep_cols: Iterable[str] | None = None) -> pd.DataFrame:
    """Load a CSV and optionally select columns.

    path: file path
    keep_cols: subset of columns to keep
    """
    df = pd.read_csv(path)
    if keep_cols is not None:
        df = df.loc[:, list(keep_cols)]
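    return df

  • Type clarity: callers know they can pass str or Path.
  • Linter tip: avoid ambiguous names; use descriptive keep_cols.
  • Caveat: a bare str also satisfies Iterable[str], so pass ["age"] rather than "age"; otherwise list(keep_cols) splits the string into characters.
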
Example 2 — Predict Protocol for interchangeable models
from typing import Protocol

import numpy as np
import numpy.typing as npt

ArrayF = npt.NDArray[np.float64]
ArrayI = npt.NDArray[np.int64]

class Classifier(Protocol):
    def predict(self, X: ArrayF) -> ArrayI: ...

def accuracy(model: Classifier, X: ArrayF, y_true: ArrayI) -> float:
    y_pred = model.predict(X)
    return float((y_pred == y_true).mean())

Any object with predict(X) returning class ids will work, even without inheritance. That’s powerful for swapping models in experiments.

Example 3 — TypedDict for inference response
from typing import TypedDict

class Pred(TypedDict):
    label: int
    prob: float

def to_response(label: int, prob: float) -> Pred:
    return {"label": label, "prob": prob}

Client code now knows the response shape exactly.
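
One thing worth keeping in mind: that knowledge lives in the type checker. At runtime a TypedDict is an ordinary dict and nothing is validated. The sketch below shows this, with a range check on prob added purely as an illustrative assumption (it is not part of Example 3).

```python
from typing import TypedDict


class Pred(TypedDict):
    label: int
    prob: float


def to_response(label: int, prob: float) -> Pred:
    # Runtime validation is separate from typing; this check is an added assumption.
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"prob must be in [0, 1], got {prob}")
    return {"label": label, "prob": prob}


resp = to_response(2, 0.91)
is_plain_dict = type(resp) is dict

# A type checker rejects the next line, but Python itself runs it without error.
unchecked = Pred(label="cat", prob=1.7)  # type: ignore[typeddict-item]
```

If responses come from untrusted input, pair the TypedDict with explicit checks like the one in to_response or with a data validation library.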

Practical workflow

  1. Add type hints as you write functions. Prefer precise container types like list[str].
  2. Run a type checker regularly. Fix the most local errors first (e.g., a wrong return type), since they often clear downstream errors too.
  3. Run linter and formatter. Commit only clean code.
  4. When unsure, start broad (Any) but add TODOs and refine later.
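
Steps 2 and 3 can be wrapped in a small script so the same checks run every time. This is only a sketch: the command list is an assumption about which tools and paths a project uses, and run_checks and all_clean are hypothetical helper names. The demo at the end runs harmless Python one-liners rather than the real tools.

```python
import subprocess
import sys
from collections.abc import Sequence

# Assumed commands; adjust to the tools and paths your project actually uses.
DEFAULT_CHECKS: list[list[str]] = [
    ["ruff", "check", "."],
    ["black", "--check", "."],
    ["mypy", "."],
]


def run_checks(commands: Sequence[Sequence[str]]) -> dict[str, int]:
    """Run each command and map its text to its exit code (0 means clean)."""
    results: dict[str, int] = {}
    for command in commands:
        try:
            code = subprocess.run(list(command), check=False).returncode
        except FileNotFoundError:
            code = 127  # conventional shell code for "command not found"
        results[" ".join(command)] = code
    return results


def all_clean(results: dict[str, int]) -> bool:
    """True only if every command exited with code 0."""
    return all(code == 0 for code in results.values())


# Demo with stand-in commands: one succeeds, one exits with code 3.
demo = run_checks(
    [
        [sys.executable, "-c", "pass"],
        [sys.executable, "-c", "raise SystemExit(3)"],
    ]
)
```

In a real project you would call run_checks(DEFAULT_CHECKS) and refuse to commit unless all_clean returns True; a pre-commit hook is the more common way to get the same effect.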

Exercises

Do these locally or in a notebook. Aim to pass a type checker and get zero linter violations.

Exercise 1 — Clean DataFrame with clear types

# Instructions:
# 1) Add type hints so a type checker finds no issues.
# 2) Ensure variables have descriptive names and no unused imports.
# 3) Return type should make sense for callers.

import pandas as pd

# Given skeleton

def clean_data(df, keep_cols, min_age):
    df = df.dropna(subset=keep_cols)
    df = df[df["age"] >= min_age]
    return df

# Example usage (for your own check):
# demo = pd.DataFrame({"age": [20, None, 35], "city": ["NY", "SF", "LA"]})
# print(clean_data(demo, ["age", "city"], 21))

  • Checklist:
    • Function arguments and return annotated
    • keep_cols accepts an iterable of strings
    • No linter warnings about style or unused names

Exercise 2 — Protocol-based evaluator

# Instructions:
# 1) Define a Protocol named Classifier with predict(X) -> int array.
# 2) Type alias ArrayF (float64) and ArrayI (int64) using numpy.typing.
# 3) Implement evaluate(model, X, y) -> float returning accuracy.
# 4) Ensure linting and typing pass.

import numpy as np
import numpy.typing as npt

# Your code here
# ...

# Example dummy model for manual testing:
# class Majority:
#     def __init__(self, cls: int):
#         self.cls = cls
#     def predict(self, X: ArrayF) -> ArrayI:
#         return np.full(X.shape[0], self.cls, dtype=np.int64)
#
# X = np.array([[0.0, 1.0], [1.0, 1.0]], dtype=np.float64)
# y = np.array([1, 1], dtype=np.int64)
# print(evaluate(Majority(1), X, y))  # expect 1.0

  • Checklist:
    • Correct Protocol definition
    • Correct dtypes in arrays
    • No unused imports or variables

Common mistakes and self-check

  • Using Any everywhere. Self-check: can you replace it with a concrete type in 1–2 minutes? If yes, do it now.
  • Forgetting | None for optional params. Self-check: where do you check is None? Annotate accordingly.
  • Annotating arrays as bare np.ndarray with no dtype. Self-check: pin to NDArray[np.float64] or NDArray[np.int64].
  • Comparing to None with ==. Use is and is not.
  • Wide functions without contracts. Extract helpers with clear types.

Practical projects

  • Annotate a feature engineering module (joins, encoders, scalers), add a Classifier Protocol, and ensure mypy/ruff pass.
  • Create an inference module returning a TypedDict response; add basic validation and types for arrays.
  • Refactor one training notebook into a typed script with a small pyproject.toml config and pre-commit steps.

Mini challenge

Add types to a function that standardizes features and returns both transformed data and the per-feature stats.

import numpy as np
import numpy.typing as npt

# Add precise types and make linter happy

def standardize(X):
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    Z = (X - mean) / np.where(std == 0, 1.0, std)
    return Z, {"mean": mean, "std": std}

Hint

  • Use NDArray[np.float64] for arrays.
  • Consider TypedDict for the stats dict.

Learning path

  • Now: Type hints and linting (this lesson)
  • Next: Packaging and project structure to keep types across modules
  • Then: Testing strategies with typed fixtures
  • Later: Data validation libraries to complement static types

Next steps

  • Add types to your most-used data functions
  • Configure ruff and a type checker; run them before each commit
  • Extend Protocols for your common model interfaces


