Who this is for
Machine Learning Engineers and Data Scientists moving code from notebooks to production. If you hand code to others (data platform, backend, DevOps) or deploy batch/online ML, this is for you.
Prerequisites
- Comfortable with Python basics (functions, modules, virtual environments).
- Basic familiarity with pandas, scikit-learn, and reading/writing files.
- Know how to run a script from the command line.
Why this matters
- Fewer bugs and outages: clear names, types, and tests make failures obvious.
- Faster reviews: consistent style means reviewers focus on logic, not formatting.
- Reproducibility: deterministic, well-logged code is easier to rerun and debug.
- Handoffs: platform and backend teams can integrate your code without guessing intent.
Real tasks where this shows up
- Turning a notebook feature engineering block into a reusable module used by training and inference.
- Writing a batch inference script that ops runs nightly with logs and clear exit codes.
- Fixing a production bug quickly because logs, types, and small functions isolate the issue.
Concept explained simply
Production code style is the set of readable, repeatable rules that make your Python code predictable for humans and safe for systems. Think of it as traffic rules for your project: consistent lanes (formatting), clear signs (names, docstrings), seatbelts (types, tests), and a dashboard (logging).
Mental model: Code should be easy to read first, then easy to change, and finally easy to run. If someone new can guess what a function does without running it, you’re doing it right.
Core guidelines to follow
- Structure: imports at top, one purpose per module, short functions. Keep data IO and business logic separated.
- Naming: descriptive and consistent. Use lower_snake_case for functions/variables, UpperCamelCase for classes, UPPER_SNAKE_CASE for constants.
- Docstrings: explain intent, inputs, outputs, and errors. Keep them close to the code and stick to one style (Google or NumPy) consistently.
- Type hints: annotate function boundaries and tricky variables. They communicate contracts and prevent common mistakes.
- Logging: use logging instead of print. Include context (counts, paths, parameters) and choose sensible levels: DEBUG, INFO, WARNING, ERROR.
- Errors: fail fast with clear exceptions. Validate inputs at boundaries (see the short sketch after this list).
- Tooling: auto-format (e.g., Black) and lint (e.g., Ruff). Keep imports ordered and unused code out.
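The worked examples below apply these rules end to end. As a quick first taste, here is a minimal sketch combining a module-level logger, a named constant, fail-fast validation, and type hints; the function name, column name, and threshold are invented for illustration:

import logging

import pandas as pd

logger = logging.getLogger(__name__)  # module-level logger instead of print

MIN_ROWS = 1  # named constant instead of a magic value


def drop_missing_amounts(df: pd.DataFrame, *, column: str = "amount") -> pd.DataFrame:
    """Return a copy of df without rows where `column` is missing."""
    if column not in df.columns:  # validate at the boundary, fail fast
        raise KeyError(f"Missing required column: {column!r}")
    out = df.dropna(subset=[column]).copy()
    if len(out) < MIN_ROWS:
        logger.warning("No rows left after dropping missing %r values", column)
    return out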
Worked examples
Example 1 — Refactor a messy feature function
Before:
def prep(df):
    df=df.copy()
    df["age"]=df["age"].fillna(0)
    df['country']=df['country'].fillna('UNK')
    from sklearn.preprocessing import StandardScaler
    s=StandardScaler()
    df['x_norm'] = s.fit_transform(df[['x']])
    return df
After (production-style):
from __future__ import annotations

import logging

import pandas as pd
from sklearn.preprocessing import StandardScaler

logger = logging.getLogger(__name__)


def prepare_features(
    df: pd.DataFrame,
    *,
    scaler: StandardScaler | None = None,
) -> tuple[pd.DataFrame, StandardScaler]:
    """Clean and transform features for model input.

    Args:
        df: DataFrame with columns ['age', 'country', 'x'].
        scaler: Optional fitted StandardScaler to reuse.

    Returns:
        (df_out, scaler): df_out includes 'x_norm'.

    Raises:
        KeyError: if required columns are missing.
    """
    required = {"age", "country", "x"}
    missing = required.difference(df.columns)
    if missing:
        raise KeyError(f"Missing required columns: {sorted(missing)}")

    out = df.copy()
    out["age"] = out["age"].fillna(0)
    out["country"] = out["country"].fillna("UNK")

    sc = scaler or StandardScaler()
    out["x_norm"] = (
        sc.fit_transform(out[["x"]]) if scaler is None else sc.transform(out[["x"]])
    )
    logger.debug("Prepared %d rows", len(out))
    return out, sc
- Imports at top
- Type hints and docstring
- Input validation and logging
- Pure transformation (no prints, no hidden globals)
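A quick usage sketch (train_df and score_df are hypothetical DataFrames with the required columns). Returning the fitted scaler is what lets training and inference share the exact same transformation:

# Fit the scaler on training data, then reuse it at inference time.
train_features, fitted_scaler = prepare_features(train_df)
score_features, _ = prepare_features(score_df, scaler=fitted_scaler)  # transform only, no refit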
Example 2 — CLI predict script with logging
from __future__ import annotations

import argparse
import logging
import sys
from pathlib import Path

import pandas as pd
from joblib import load

LOGGER_NAME = "predict_cli"


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Batch predict")
    parser.add_argument("--model", type=Path, required=True)
    parser.add_argument("--data", type=Path, required=True)
    parser.add_argument("--out", type=Path, required=True)
    parser.add_argument(
        "--log-level", default="INFO", choices=["DEBUG", "INFO", "WARNING", "ERROR"]
    )
    return parser.parse_args()


def configure_logging(level: str) -> None:
    logging.basicConfig(
        level=getattr(logging, level),
        format="%(asctime)s %(levelname)s %(name)s - %(message)s",
    )


def main() -> int:
    args = parse_args()
    configure_logging(args.log_level)
    logger = logging.getLogger(LOGGER_NAME)

    logger.info("Loading model: %s", args.model)
    model = load(args.model)

    logger.info("Reading data: %s", args.data)
    df = pd.read_csv(args.data)

    preds = model.predict(df)
    args.out.write_text("\n".join(map(str, preds)))
    logger.info("Wrote %d predictions to %s", len(preds), args.out)
    return 0


if __name__ == "__main__":
    sys.exit(main())
- Clear entrypoint and exit code
- Configurable logging level
- Path-safe IO and no prints
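One optional hardening step, not shown in the script above: catch predictable failures inside main() so a nightly run ends with an ERROR log and a nonzero exit code instead of a raw traceback. A sketch of the shape this could take (the exit code 2 is an arbitrary choice):

def main() -> int:
    args = parse_args()
    configure_logging(args.log_level)
    logger = logging.getLogger(LOGGER_NAME)
    try:
        ...  # load the model, read the data, predict, and write output as in the script above
    except FileNotFoundError as exc:
        logger.error("Input file missing: %s", exc)
        return 2  # nonzero exit code tells the scheduler the run failed
    return 0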
Example 3 — Lightweight project layout and imports
my_project/
    pyproject.toml
    README.md
    src/
        my_project/
            __init__.py
            features.py
            predict.py
    tests/
        test_features.py
"""Feature engineering utilities."""
from __future__ import annotations
from dataclasses import dataclass
from typing import Iterable
import numpy as np
import pandas as pd
@dataclass(frozen=True)
class Binner:
bins: list[float]
def transform(self, x: pd.Series) -> pd.Series:
"""Bucketize a numeric series into integer bins."""
return pd.cut(x, bins=self.bins, labels=False, include_lowest=True)
def to_float(s: pd.Series) -> pd.Series:
"""Convert a series to float, coercing errors to NaN."""
return pd.to_numeric(s, errors="coerce").astype(float)
- Imports grouped: stdlib, third-party, local
- Short, focused modules
- Docstrings on public API
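To make the tests/ directory concrete, here is one possible tests/test_features.py. It assumes the package is importable (for example via an editable install), and the expected values are small enough to verify by hand:

import pandas as pd

from my_project.features import Binner, to_float


def test_binner_buckets_values():
    binner = Binner(bins=[0, 10, 20])
    result = binner.transform(pd.Series([5, 15]))
    assert list(result) == [0, 1]  # 5 lands in the first bin, 15 in the second


def test_to_float_coerces_bad_values():
    result = to_float(pd.Series(["1.5", "oops"]))
    assert result.iloc[0] == 1.5
    assert pd.isna(result.iloc[1])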
Exercises
Do these locally or in a notebook cell. Then compare with solutions below. Use the checklist to self-review.
Exercise 1 (mirrors Example 1)
Refactor a function to production quality:
# Given
import pandas as pd

def bad_fn(df):
    if 'x' not in df: print('no x!')
    df['y']=df['x']*2
    return df

# Task:
# - Rename to something descriptive
# - Add type hints and a docstring
# - Validate input and raise a clear exception if 'x' is missing
# - Make a copy, avoid mutating caller data
# - Add logging at DEBUG level with row count
Hint:
- Use logging.getLogger(__name__) to create a module logger.
- Return the transformed DataFrame; avoid prints.
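If you want to check your result, here is one possible shape of a solution (the function name and log message are just suggestions):

import logging

import pandas as pd

logger = logging.getLogger(__name__)


def add_doubled_x(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'y' column equal to twice 'x'.

    Raises:
        KeyError: if the 'x' column is missing.
    """
    if "x" not in df.columns:
        raise KeyError("Missing required column: 'x'")
    out = df.copy()
    out["y"] = out["x"] * 2
    logger.debug("Doubled 'x' for %d rows", len(out))
    return out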
Exercise 2 (mirrors Example 2)
Create a minimal CLI that reads a CSV, selects a column, and writes its mean to a text file, with logging and a proper exit code.
# Requirements
# - argparse for --data, --column, --out, --log-level
# - logging with a basicConfig format including level and name
# - Validate that the column exists; raise SystemExit(2) on error
# - Write a single float to the output path
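If you get stuck, here is one possible skeleton that satisfies these requirements (names like column_mean and the log format are arbitrary choices):

from __future__ import annotations

import argparse
import logging
import sys
from pathlib import Path

import pandas as pd


def main() -> int:
    parser = argparse.ArgumentParser(description="Write the mean of one CSV column")
    parser.add_argument("--data", type=Path, required=True)
    parser.add_argument("--column", required=True)
    parser.add_argument("--out", type=Path, required=True)
    parser.add_argument("--log-level", default="INFO", choices=["DEBUG", "INFO", "WARNING", "ERROR"])
    args = parser.parse_args()

    logging.basicConfig(
        level=getattr(logging, args.log_level),
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    logger = logging.getLogger("column_mean")

    df = pd.read_csv(args.data)
    if args.column not in df.columns:
        logger.error("Column %r not found in %s", args.column, args.data)
        raise SystemExit(2)  # clear, nonzero exit code for the caller

    args.out.write_text(str(float(df[args.column].mean())))
    logger.info("Wrote mean of %r to %s", args.column, args.out)
    return 0


if __name__ == "__main__":
    sys.exit(main())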
Checklist (tick as you finish)
- Imports at top, grouped by standard/third-party/local
- Descriptive names and constants for magic values
- Docstrings on public functions
- Type hints on function signatures
- Logging instead of print
- Clear exceptions and validation
- No hidden side effects; functions return values
Common mistakes and self-check
- Prints in library code. Self-check: “Could a scheduled job parse these messages?” Fix: use logging with levels.
- Mutating input DataFrames. Self-check: “Does the caller’s df change?” Fix: always copy when transforming (see the small sketch after this list).
- Long, mixed-responsibility functions. Self-check: “Can I name this in 5 words?” Fix: split into smaller functions.
- Hidden imports inside functions. Self-check: “Are imports at top?” Fix: move to module top for speed and clarity.
- Ambiguous names. Self-check: “Could a new teammate guess the purpose?” Fix: rename to business vocabulary.
- Silent failures. Self-check: “Do I raise on invalid input?” Fix: validate and raise clear exceptions.
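A small before/after of the mutation pitfall (the column names are made up):

import pandas as pd


# Mutates the caller's DataFrame: the caller's df silently gains a column.
def add_ratio(df):
    df["ratio"] = df["clicks"] / df["views"]
    return df


# Safer: transform a copy and return it; the caller's df is untouched.
def add_ratio_safe(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["ratio"] = out["clicks"] / out["views"]
    return out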
Practical projects
- Turn your notebook feature block into a reusable module with docstrings, types, and unit tests.
- Write a batch inference CLI with logging and exit codes that your scheduler can run nightly.
- Create a small "utils" package inside src/ with clear API and import order, plus tests/ for it.
Learning path
- Before this: Python functions and modules, pandas basics.
- Now: Production style (this lesson) — focus on readability, logging, and deterministic functions.
- Next: Packaging, tests, and CI basics; config management; data and model versioning.
Quick Test
Take the short test below to check your understanding. Everyone can take it for free. If you log in, your score and progress will be saved.
Mini challenge
Pick one of your recent notebooks. Extract two reusable functions into a module with types, docstrings, and logging. Replace the notebook cells with calls to your new module. Timebox to 45 minutes.
Optional stretch goal
- Add a CLI that runs the notebook’s preprocessing and writes a features.parquet file.
- Write one unit test for each extracted function.
Next steps
- Finish Exercises 1–2 and ensure every checklist item is ticked.
- Take the Quick Test to confirm mastery, then move to the next subskill.