Production Quality Code Style

Learn production-quality code style through explanations, exercises, and a quick test. Written for Machine Learning Engineers.

Published: January 1, 2026 | Updated: January 1, 2026

Who this is for

Machine Learning Engineers and Data Scientists moving code from notebooks to production. If you hand code to others (data platform, backend, DevOps) or deploy batch/online ML, this is for you.

Prerequisites

  • Comfortable with Python basics (functions, modules, virtual environments).
  • Basic familiarity with pandas, scikit-learn, and reading/writing files.
  • Know how to run a script from the command line.

Why this matters

  • Fewer bugs and outages: clear names, types, and tests make failures obvious.
  • Faster reviews: consistent style means reviewers focus on logic, not formatting.
  • Reproducibility: deterministic, well-logged code is easier to rerun and debug.
  • Handoffs: platform and backend teams can integrate your code without guessing intent.

Real tasks where this shows up

  • Turning a notebook feature engineering block into a reusable module used by training and inference.
  • Writing a batch inference script that ops runs nightly with logs and clear exit codes.
  • Fixing a production bug quickly because logs, types, and small functions isolate the issue.

Concept explained simply

Production code style is the set of readable, repeatable rules that make your Python code predictable for humans and safe for systems. Think of it as traffic rules for your project: consistent lanes (formatting), clear signs (names, docstrings), seatbelts (types, tests), and a dashboard (logging).

Mental model: Code should be easy to read first, then easy to change, and finally easy to run. If someone new can guess what a function does without running it, you’re doing it right.

Core guidelines to follow

1. Structure

Imports at top, one purpose per module, short functions. Keep data IO and business logic separated.
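
A minimal sketch of the IO/logic split, using only the standard library. The function names (`load_amounts`, `total_amount`, `run`) and the one-number-per-line file layout are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from pathlib import Path


def total_amount(amounts: list[float]) -> float:
    """Business logic: a pure function with no file access, easy to test."""
    return sum(amounts)


def load_amounts(path: Path) -> list[float]:
    """Data IO: read one number per line, skipping blank lines."""
    lines = path.read_text().splitlines()
    return [float(line) for line in lines if line.strip()]


def run(path: Path) -> float:
    """Thin glue layer: the only place where IO and logic meet."""
    return total_amount(load_amounts(path))
```

Because `total_amount` never touches the filesystem, a unit test can call it with a plain list, and the storage format can change without affecting the logic.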

2. Naming

Descriptive, consistent: lower_snake_case for functions/variables, UpperCamelCase for classes, UPPER_SNAKE_CASE for constants.
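
The three casing rules can be seen side by side in a short sketch. All names here (`MAX_MISSING_RATIO`, `MissingValueReport`, `exceeds_missing_limit`) and the 0.2 limit are made up for illustration.

```python
from __future__ import annotations

# UPPER_SNAKE_CASE: a module-level constant instead of a magic number.
MAX_MISSING_RATIO = 0.2


class MissingValueReport:
    """UpperCamelCase for classes."""

    def __init__(self, n_missing: int, n_total: int) -> None:
        self.n_missing = n_missing
        self.n_total = n_total

    def missing_ratio(self) -> float:
        """lower_snake_case for methods, functions, and variables."""
        return self.n_missing / self.n_total if self.n_total else 0.0


def exceeds_missing_limit(report: MissingValueReport) -> bool:
    """Return True when the share of missing values is above the limit."""
    return report.missing_ratio() > MAX_MISSING_RATIO
```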

3. Documentation

Docstrings explain intent, inputs, outputs, and errors. Keep them close to the code. Pick one convention, Google or NumPy style, and use it consistently.
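
For comparison with the Google-style docstring used in the worked examples, here is a sketch of the same kind of information in NumPy style. The function `min_max_scale` is a hypothetical example, not part of the lesson's code.

```python
from __future__ import annotations


def min_max_scale(values: list[float]) -> list[float]:
    """Scale values linearly into the range [0, 1].

    Parameters
    ----------
    values : list of float
        Non-empty list of numbers.

    Returns
    -------
    list of float
        Scaled values; all zeros when every input is equal.

    Raises
    ------
    ValueError
        If `values` is empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid dividing by zero when there is no spread.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Either convention works; what matters is that every module in the project uses the same one.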

4. Types

Type hints for function boundaries and tricky variables. They communicate contracts and prevent common mistakes.
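
A small sketch of hints acting as a contract. The helper names (`clip_probability`, `mean_or_none`) and the default `eps` are assumptions for illustration; a static checker such as mypy could then flag callers that ignore the `None` case.

```python
from __future__ import annotations

from collections.abc import Sequence


def clip_probability(p: float, *, eps: float = 1e-6) -> float:
    """Clamp a probability into [eps, 1 - eps]."""
    return min(max(p, eps), 1.0 - eps)


def mean_or_none(values: Sequence[float]) -> float | None:
    """Return the mean, or None for an empty sequence.

    The `float | None` return type tells callers they must handle
    the empty case explicitly.
    """
    if not values:
        return None
    return sum(values) / len(values)
```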

5. Logging

Use logging instead of print. Include context (counts, paths, parameters). Choose sensible levels: DEBUG, INFO, WARNING, ERROR.
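
As a rough illustration of levels and context, the sketch below logs counts and the data source at DEBUG, WARNING, and INFO. The function `drop_empty_rows` and its `source` parameter are hypothetical.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def drop_empty_rows(
    rows: list[dict[str, str]], *, source: str
) -> list[dict[str, str]]:
    """Remove rows whose values are all blank, logging what happened."""
    # DEBUG: detail that is only useful while investigating.
    logger.debug("Filtering %d rows from %s", len(rows), source)
    kept = [row for row in rows if any(v.strip() for v in row.values())]
    n_dropped = len(rows) - len(kept)
    if n_dropped:
        # WARNING: unusual but recoverable; include the count and the source.
        logger.warning("Dropped %d empty rows from %s", n_dropped, source)
    # INFO: the normal summary an operator would want to see.
    logger.info("Kept %d of %d rows from %s", len(kept), len(rows), source)
    return kept
```

Passing values as arguments (`"%d rows", n`) rather than pre-formatting the string lets the logging module skip the formatting work when the level is disabled.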

6. Errors

Fail fast with clear exceptions. Validate inputs at boundaries.
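
One way to fail fast at a boundary is sketched below. The function `validate_threshold` and the choice of `TypeError` versus `ValueError` are assumptions for illustration.

```python
from __future__ import annotations


def validate_threshold(threshold: float) -> float:
    """Validate a classification threshold at the function boundary.

    Raises:
        TypeError: if threshold is not a real number.
        ValueError: if threshold is outside [0, 1] (NaN included).
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        raise TypeError(
            f"threshold must be a number, got {type(threshold).__name__}"
        )
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be in [0, 1], got {threshold}")
    return float(threshold)
```

The error message names the parameter and shows the offending value, so the person reading a log does not need to rerun the job to see what went wrong.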

7. Formatting & lint

Auto-format (e.g., Black) and lint (e.g., Ruff). Keep imports ordered and unused code out.

Worked examples

Example 1 — Refactor a messy feature function

Before:

def prep(df):
    df=df.copy()
    df["age"]=df["age"].fillna(0)
    df['country']=df['country'].fillna('UNK')
    from sklearn.preprocessing import StandardScaler
    s=StandardScaler()
    df['x_norm'] = s.fit_transform(df[['x']])
    return df

After (production-style):

from __future__ import annotations

import logging

import pandas as pd
from sklearn.preprocessing import StandardScaler

logger = logging.getLogger(__name__)

def prepare_features(
    df: pd.DataFrame,
    *,
    scaler: StandardScaler | None = None,
) -> tuple[pd.DataFrame, StandardScaler]:
    """
    Clean and transform features for model input.

    Args:
        df: DataFrame with columns ['age', 'country', 'x'].
        scaler: Optional fitted StandardScaler to reuse.

    Returns:
        (df_out, scaler): df_out includes 'x_norm'.

    Raises:
        KeyError: if required columns are missing.
    """
    required = {"age", "country", "x"}
    missing = required.difference(df.columns)
    if missing:
        raise KeyError(f"Missing required columns: {sorted(missing)}")

    out = df.copy()
    out["age"] = out["age"].fillna(0)
    out["country"] = out["country"].fillna("UNK")

    sc = scaler or StandardScaler()
    out["x_norm"] = (
        sc.fit_transform(out[["x"]]) if scaler is None else sc.transform(out[["x"]])
    )

    logger.debug("Prepared %d rows", len(out))
    return out, sc

  • Imports at top
  • Type hints and docstring
  • Input validation and logging
  • Pure transformation (no prints, no hidden globals)

Example 2 — CLI predict script with logging

from __future__ import annotations

import argparse
import logging
import sys
from pathlib import Path

import pandas as pd
from joblib import load

LOGGER_NAME = "predict_cli"

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Batch predict")
    parser.add_argument("--model", type=Path, required=True)
    parser.add_argument("--data", type=Path, required=True)
    parser.add_argument("--out", type=Path, required=True)
    parser.add_argument(
        "--log-level",
        default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    )
    return parser.parse_args()

def configure_logging(level: str) -> None:
    logging.basicConfig(
        level=getattr(logging, level),
        format="%(asctime)s %(levelname)s %(name)s — %(message)s",
    )

def main() -> int:
    args = parse_args()
    configure_logging(args.log_level)
    logger = logging.getLogger(LOGGER_NAME)

    logger.info("Loading model: %s", args.model)
    model = load(args.model)

    logger.info("Reading data: %s", args.data)
    df = pd.read_csv(args.data)

    preds = model.predict(df)
    args.out.write_text("\n".join(map(str, preds)))

    logger.info("Wrote %d predictions to %s", len(preds), args.out)
    return 0

if __name__ == "__main__":
    sys.exit(main())
  • Clear entrypoint and exit code
  • Configurable logging level
  • Path-safe IO and no prints
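
The script above returns 0 on success; any failure surfaces as an uncaught exception with Python's default exit status of 1. If a scheduler needs to tell bad input apart from unexpected crashes, one possible approach is sketched below. The helper `run_job`, the exit code values, and the choice of which exceptions count as bad input are all assumptions, not part of the original script.

```python
from __future__ import annotations

import logging
from collections.abc import Callable

logger = logging.getLogger("predict_cli")

EXIT_OK = 0
EXIT_FAILURE = 1  # unexpected error
EXIT_BAD_INPUT = 2  # missing file or invalid data


def run_job(job: Callable[[], None]) -> int:
    """Run a job and translate exceptions into exit codes."""
    try:
        job()
    except (FileNotFoundError, ValueError) as exc:
        logger.error("Bad input: %s", exc)
        return EXIT_BAD_INPUT
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Unexpected failure")
        return EXIT_FAILURE
    return EXIT_OK
```

The load, predict, and write steps could be wrapped in a function and passed as `job`, so that `sys.exit` receives one of these codes.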

Example 3 — Lightweight project layout and imports

my_project/
  pyproject.toml
  README.md
  src/
    my_project/
      __init__.py
      features.py
      predict.py
  tests/
    test_features.py

# src/my_project/features.py
"""Feature engineering utilities."""
from __future__ import annotations

from dataclasses import dataclass

import pandas as pd

@dataclass(frozen=True)
class Binner:
    bins: list[float]

    def transform(self, x: pd.Series) -> pd.Series:
        """Bucketize a numeric series into integer bins."""
        return pd.cut(x, bins=self.bins, labels=False, include_lowest=True)

def to_float(s: pd.Series) -> pd.Series:
    """Convert a series to float, coercing errors to NaN."""
    return pd.to_numeric(s, errors="coerce").astype(float)
  • Imports grouped: stdlib, third-party, local
  • Short, focused modules
  • Docstrings on public API
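
The layout lists tests/test_features.py without showing its contents. Below is a sketch of what such a test file could look like, assuming pytest as the runner (the lesson does not name one). The test names are hypothetical, and `to_float` is repeated inline only so the sketch runs on its own.

```python
from __future__ import annotations

import math

import pandas as pd


# In the project this would be: from my_project.features import to_float
# (assuming the package is installed in the environment).
def to_float(s: pd.Series) -> pd.Series:
    """Convert a series to float, coercing errors to NaN."""
    return pd.to_numeric(s, errors="coerce").astype(float)


def test_to_float_coerces_bad_values_to_nan() -> None:
    result = to_float(pd.Series(["1", "2.5", "bad"]))
    assert result.iloc[0] == 1.0
    assert result.iloc[1] == 2.5
    assert math.isnan(result.iloc[2])


def test_to_float_does_not_mutate_input() -> None:
    original = pd.Series(["1", "2"])
    to_float(original)
    assert original.tolist() == ["1", "2"]
```

Each test checks one behavior and says so in its name, which keeps failures easy to read in CI output.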

Exercises

Do these locally or in a notebook cell. Then compare with solutions below. Use the checklist to self-review.

Exercise 1

Refactor a function to production quality:

# Given
import pandas as pd

def bad_fn(df):
    if 'x' not in df: print('no x!')
    df['y']=df['x']*2
    return df

# Task:
# - Rename to something descriptive
# - Add type hints and a docstring
# - Validate input and raise a clear exception if 'x' missing
# - Make a copy, avoid mutating caller data
# - Add logging at DEBUG level with row count

Hints

  • Use logging.getLogger(__name__) to create a module logger.
  • Return the transformed DataFrame; avoid prints.

Exercise 2

Create a minimal CLI that reads a CSV, selects a column, and writes its mean to a text file, with logging and a proper exit code.

# Requirements
# - argparse for --data, --column, --out, --log-level
# - logging with a basicConfig format including level and name
# - Validate that the column exists; raise SystemExit(2) on error
# - Write a single float to the output path

Checklist (tick as you finish)

  • Imports at top, grouped by standard/third-party/local
  • Descriptive names and constants for magic values
  • Docstrings on public functions
  • Type hints on function signatures
  • Logging instead of print
  • Clear exceptions and validation
  • No hidden side effects; functions return values
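
For Exercise 2, one possible shape is sketched here. It uses the standard library's `csv` and `statistics` modules instead of pandas, which is a simplification on my part. The logger name `column_mean` and the helper names are hypothetical, and treating empty or non-numeric data as exit code 2 goes slightly beyond the stated requirements.

```python
from __future__ import annotations

import argparse
import csv
import logging
import statistics
from pathlib import Path

logger = logging.getLogger("column_mean")


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Write the mean of a CSV column")
    parser.add_argument("--data", type=Path, required=True)
    parser.add_argument("--column", required=True)
    parser.add_argument("--out", type=Path, required=True)
    parser.add_argument(
        "--log-level",
        default="INFO",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
    )
    return parser.parse_args(argv)


def column_mean(rows: list[dict[str, str]], column: str) -> float:
    """Return the mean of a column.

    Raises:
        ValueError: if there are no rows or a value is not numeric.
        KeyError: if the column is missing.
    """
    if not rows:
        raise ValueError("No data rows")
    if column not in rows[0]:
        raise KeyError(column)
    return statistics.fmean([float(row[column]) for row in rows])


def main(argv: list[str] | None = None) -> int:
    args = parse_args(argv)
    logging.basicConfig(
        level=getattr(logging, args.log_level),
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    with args.data.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    logger.info("Read %d rows from %s", len(rows), args.data)
    try:
        mean = column_mean(rows, args.column)
    except (KeyError, ValueError) as exc:
        logger.error("Invalid input: %r", exc)
        raise SystemExit(2) from exc
    args.out.write_text(f"{mean}\n")
    logger.info("Wrote mean of %s to %s", args.column, args.out)
    return 0
```

To run it as a script, add the usual `if __name__ == "__main__": sys.exit(main())` guard (with `import sys`), as in Example 2.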

Common mistakes and self-check

  • Prints in library code. Self-check: “Could a scheduled job parse these messages?” Fix: use logging with levels.
  • Mutating input DataFrames. Self-check: “Does caller’s df change?” Fix: always copy when transforming.
  • Long, mixed-responsibility functions. Self-check: “Can I name this in 5 words?” Fix: split into smaller functions.
  • Hidden imports inside functions. Self-check: “Are imports at top?” Fix: move them to the module top so dependencies are visible and import errors surface at startup.
  • Ambiguous names. Self-check: “Could a new teammate guess the purpose?” Fix: rename to business vocabulary.
  • Silent failures. Self-check: “Do I raise on invalid input?” Fix: validate and raise clear exceptions.

Practical projects

  • Turn your notebook feature block into a reusable module with docstrings, types, and unit tests.
  • Write a batch inference CLI with logging and exit codes that your scheduler can run nightly.
  • Create a small "utils" package inside src/ with clear API and import order, plus tests/ for it.

Learning path

  • Before this: Python functions and modules, pandas basics.
  • Now: Production style (this lesson) — focus on readability, logging, and deterministic functions.
  • Next: Packaging, tests, and CI basics; config management; data and model versioning.

Quick Test

Take the short test below to check your understanding.

Mini challenge

Pick one of your recent notebooks. Extract two reusable functions into a module with types, docstrings, and logging. Replace the notebook cells with calls to your new module. Timebox to 45 minutes.

Optional stretch goal
  • Add a CLI that runs the notebook’s preprocessing and writes a features.parquet file.
  • Write one unit test for each extracted function.

Next steps

  • Finish Exercises 1–2 and ensure every checklist item is ticked.
  • Take the Quick Test to confirm mastery, then move to the next subskill.

Practice Exercises

Instructions

Refactor the given function so it is safe, readable, and reviewable.

import logging
import pandas as pd

# Given

def bad_fn(df):
    if 'x' not in df: print('no x!')
    df['y']=df['x']*2
    return df

# Your tasks:
# 1) Rename function appropriately
# 2) Add type hints and a docstring explaining args/returns/errors
# 3) Validate required column 'x' and raise a clear exception
# 4) Do not mutate caller input
# 5) Add a DEBUG log with resulting row count

Expected Output

A function prepare_y(df: pd.DataFrame) -> pd.DataFrame that returns a copy with a new 'y' column, logs a DEBUG message like 'Prepared 100 rows', and raises KeyError if 'x' is missing.
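
A sketch of one function that meets this description is shown below. The docstring wording and the exact error message are my own choices; only the name, signature, log message, and `KeyError` come from the expected output.

```python
from __future__ import annotations

import logging

import pandas as pd

logger = logging.getLogger(__name__)


def prepare_y(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a new column y = 2 * x.

    Args:
        df: DataFrame with a numeric column 'x'.

    Returns:
        A new DataFrame with the added 'y' column; the input is not modified.

    Raises:
        KeyError: if column 'x' is missing.
    """
    if "x" not in df.columns:
        raise KeyError("Missing required column: 'x'")

    out = df.copy()  # work on a copy so the caller's data stays untouched
    out["y"] = out["x"] * 2
    logger.debug("Prepared %d rows", len(out))
    return out
```

Try writing your own version first, then compare it against the checklist.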

Production Quality Code Style — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.
