Why this matters
As a Machine Learning Engineer, your models run in long training jobs, pipelines, and production services. When things go wrong or performance drifts, you must answer: What happened? With which parameters? On which data? Logging gives you an auditable trail; configuration management ensures runs are reproducible and controllable.
- Training: capture hyperparameters, seeds, dataset versions, metrics, and time-to-epoch.
- Batch inference: trace which model and config processed which batch, with correlation IDs.
- Online inference: structured logs for each request, latency breakdown, and error details without leaking sensitive data.
- On-call: quickly filter logs by level, component, or request ID to locate issues.
Concept explained simply
Logging is your system's diary. Config management is how you decide what your system should do before it runs.
- Logging: record significant events with levels (DEBUG, INFO, WARNING, ERROR, CRITICAL) and context (model version, run_id).
- Config management: load parameters from defaults, files, environment variables, and CLI flags, in a clear precedence order.
Mental model: Flight recorder + switchboard
Think of logging as a flight recorder: it continuously writes structured facts you can replay later. Config is the switchboard where you set the dials before takeoff (dataset path, learning rate, log level) so runs are deliberate and repeatable.
Essential pieces you will use
- Logger hierarchy: loggers (getLogger(__name__)), handlers (Console, File), formatters (text/JSON), filters, levels.
- Configuration: logging.config.dictConfig for centralized setup; avoid ad hoc basicConfig in production.
- Structured logs: key=value or JSON fields for easy parsing (e.g., run_id, request_id, model_version).
- Config sources: defaults in code, file (JSON/YAML), environment variables, CLI flags. Prefer precedence: CLI > ENV > file > defaults.
- Secrets: never log API keys, tokens, passwords. Scrub before logging.
- Reproducibility: always log seeds, data snapshot IDs, code version (git SHA), and config used for a run.
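The secrets point above can be enforced in code rather than by convention. A minimal sketch of a scrubbing filter, assuming secrets appear as key=value pairs inside messages (RedactFilter and the pattern are illustrative, not a standard-library API):

```python
import logging
import re

# Illustrative pattern: mask anything that looks like api_key=..., token=..., password=...
REDACT_PATTERN = re.compile(r"(api_key|token|password)=\S+", re.IGNORECASE)

class RedactFilter(logging.Filter):
    """Rewrite the record message so secrets never reach any handler."""
    def filter(self, record):
        record.msg = REDACT_PATTERN.sub(r"\1=***", record.getMessage())
        record.args = None  # message is already fully rendered
        return True

logger = logging.getLogger("secure")
logger.addFilter(RedactFilter())
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.INFO)
logger.warning("calling API with api_key=sk-12345")
# logged as: calling API with api_key=***
```

Attaching the filter to the logger (rather than one handler) guarantees the redaction applies no matter how many handlers are added later.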
Worked examples
Example 1: Minimal robust logger (console + file, level from ENV)
Goal: human-readable console logs and a rotating file for deeper debugging.
import logging
import os
from logging.config import dictConfig

LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
LOG_FILE = os.getenv("LOG_FILE", "logs/train.log")

class RunContextFilter(logging.Filter):
    """Attach a run_id (from the RUN_ID env var) to every record."""
    def filter(self, record):
        record.run_id = os.getenv("RUN_ID", "local")
        return True

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "console_fmt": {"format": "%(levelname)s %(name)s: %(message)s"},
        "file_fmt": {"format": "%(asctime)s %(levelname)s %(name)s run=%(run_id)s: %(message)s"},
    },
    "filters": {
        "run_ctx": {"()": RunContextFilter},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": LOG_LEVEL,
            "formatter": "console_fmt",
        },
        "file": {
            "class": "logging.handlers.TimedRotatingFileHandler",
            "level": "DEBUG",
            "formatter": "file_fmt",
            "filters": ["run_ctx"],  # must be attached here, or %(run_id)s raises at format time
            "filename": LOG_FILE,
            "when": "midnight",
            "backupCount": 7,
        },
    },
    "root": {
        "level": "DEBUG",
        "handlers": ["console", "file"],
    },
}

os.makedirs(os.path.dirname(LOG_FILE), exist_ok=True)  # the log directory must exist
dictConfig(LOGGING)
logger = logging.getLogger(__name__)
logger.info("Starting training")
logger.debug("Batch loaded: size=128")
logger.warning("Validation accuracy plateaued")
What you get
Console (filtered by LOG_LEVEL):
INFO __main__: Starting training
WARNING __main__: Validation accuracy plateaued
File (DEBUG+ with run_id and timestamp):
2026-01-01 00:00:00,000 INFO __main__ run=local: Starting training
2026-01-01 00:00:00,001 DEBUG __main__ run=local: Batch loaded: size=128
2026-01-01 00:00:00,002 WARNING __main__ run=local: Validation accuracy plateaued
Example 2: Structured JSON logs for services
Goal: machine-parseable logs. Implement a tiny JSON formatter.
import logging
import json
import time

class JsonFormatter(logging.Formatter):
    """Serialize each record as one JSON line."""
    def format(self, record):
        payload = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(record.created)),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Add custom fields if present
        for key in ("run_id", "request_id", "model_version"):
            if key in record.__dict__:
                payload[key] = record.__dict__[key]
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("inference")
logger.setLevel(logging.INFO)
logger.handlers = [handler]  # replace rather than append, to avoid duplicates
logger.info("predict start", extra={"request_id": "abc123", "model_version": "v5"})
What you get
{"ts": "2026-01-01T00:00:00", "level": "INFO", "logger": "inference", "msg": "predict start", "request_id": "abc123", "model_version": "v5"}
Example 3: Simple config system (defaults + file + ENV + CLI)
Goal: predictable overrides without external dependencies.
import os
import json
import argparse
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    data_path: str = "./data/train.csv"
    lr: float = 1e-3
    epochs: int = 10
    seed: int = 42
    log_level: str = "INFO"

    @staticmethod
    def from_sources(file_path: str | None, args: list[str] | None = None) -> "TrainingConfig":
        cfg = TrainingConfig()
        # 1) File (JSON) if provided
        if file_path and os.path.exists(file_path):
            with open(file_path) as f:
                d = json.load(f)
            for k, v in d.items():
                if hasattr(cfg, k):
                    setattr(cfg, k, v)
        # 2) ENV (prefix: APP_)
        env_map = {
            "data_path": os.getenv("APP_DATA_PATH"),
            "lr": os.getenv("APP_LR"),
            "epochs": os.getenv("APP_EPOCHS"),
            "seed": os.getenv("APP_SEED"),
            "log_level": os.getenv("APP_LOG_LEVEL"),
        }
        for k, v in env_map.items():
            if v is not None:
                cast = type(getattr(cfg, k))  # cast the env string to the field's type
                setattr(cfg, k, cast(v))
        # 3) CLI overrides
        parser = argparse.ArgumentParser(add_help=False)
        parser.add_argument("--data_path")
        parser.add_argument("--lr", type=float)
        parser.add_argument("--epochs", type=int)
        parser.add_argument("--seed", type=int)
        parser.add_argument("--log_level")
        ns, _ = parser.parse_known_args(args)
        for k, v in vars(ns).items():
            if v is not None:
                setattr(cfg, k, v)
        return cfg

# Example usage:
# cfg = TrainingConfig.from_sources("config.json", ["--lr", "0.0005", "--epochs", "20"])
Behavior
- Defaults fill missing values.
- config.json overrides defaults.
- Environment (APP_*) overrides file.
- CLI overrides everything else.
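The precedence rules above boil down to a single merge in which later layers win. A minimal sketch, assuming each source is a dict where None means "not set" (merge_config is an illustrative helper, not part of the example class):

```python
def merge_config(defaults, file_cfg, env_cfg, cli_cfg):
    """Merge config layers; later layers override earlier ones, so CLI wins."""
    merged = dict(defaults)
    for layer in (file_cfg, env_cfg, cli_cfg):  # defaults < file < ENV < CLI
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged

cfg = merge_config(
    {"lr": 1e-3, "epochs": 10},  # defaults in code
    {"epochs": 20},              # config.json
    {"lr": 5e-4},                # APP_* environment
    {"epochs": None},            # CLI flag not passed
)
# cfg == {"lr": 0.0005, "epochs": 20}
```

Filtering out None before updating is what keeps an unset higher-priority source from erasing a lower-priority value.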
Example 4: Correlation ID across a service
Goal: attach request_id to every log line without repeating extra={...} each time.
import logging

class RequestAdapter(logging.LoggerAdapter):
    """Inject request_id into the extra dict of every logging call."""
    def process(self, msg, kwargs):
        kwargs.setdefault("extra", {})
        kwargs["extra"]["request_id"] = self.extra.get("request_id", "-")
        return msg, kwargs

base = logging.getLogger("svc")
base.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))  # without a formatter, only the raw message prints
base.addHandler(handler)
logger = RequestAdapter(base, {"request_id": "abc123"})
logger.info("decode input")
logger.warning("timeout talking to feature store")
What you get
INFO svc: decode input
WARNING svc: timeout talking to feature store
Internally, each record carries request_id, which formatters can print or serialize.
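A LoggerAdapter works well when one object handles the whole request. When the request ID must survive across helper functions that never see the adapter, a contextvars-based Filter is a common alternative; this is a sketch of one approach (request_id_var and RequestIdFilter are illustrative names):

```python
import logging
from contextvars import ContextVar

request_id_var: ContextVar[str] = ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Inject the current context's request_id into every record."""
    def filter(self, record):
        record.request_id = request_id_var.get()
        return True

logger = logging.getLogger("svc2")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s req=%(request_id)s: %(message)s"))
handler.addFilter(RequestIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def handle(request_id, payload):
    request_id_var.set(request_id)  # set once at the edge of the service
    logger.info("processing %s", payload)  # request_id attached automatically

handle("req-1", "input-a")  # INFO svc2 req=req-1: processing input-a
handle("req-2", "input-b")  # INFO svc2 req=req-2: processing input-b
```

Because ContextVar is async-aware, this pattern also keeps IDs separate across concurrent asyncio request handlers.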
Exercises
Do these in order. The Quick Test at the end checks your understanding.
- Exercise 1 (ex1): Configure logging with dictConfig to log INFO to console and DEBUG+ to a file logs/train.log. Add run_id to file logs. Prove that DEBUG appears only in file.
- Exercise 2 (ex2): Build a TrainingConfig (dataclass) that loads from config.json, then APP_* env vars, then CLI flags, in that precedence. Print the final config and log it at startup.
- Exercise 3 (ex3): Use LoggerAdapter (or a Filter) to add request_id to all logs in a function handling inference. Show two different request IDs across two calls.
Checklist before taking the test
- I can explain the difference between logger, handler, and formatter.
- I can switch log levels via an environment variable.
- I can persist DEBUG logs to a file while keeping console cleaner.
- I can load config from file and override it with ENV and CLI.
- I can attach a run_id or request_id to every log line.
Common mistakes and self-checks
- Using print instead of logging: prints lack levels, handlers, and context. Replace prints with logger calls.
- No single place to configure logging: spread basicConfig calls lead to duplicates. Centralize with dictConfig.
- Logging secrets: API keys or tokens must be redacted. Add scrubbing or avoid logging raw payloads.
- Unstructured logs: hard to filter or analyze. Prefer key=value or JSON fields for identifiers and metrics.
- Wrong precedence of config sources: document and enforce CLI > ENV > file > defaults.
- Too verbose in production: set appropriate level (INFO/WARNING) and route DEBUG to files only.
- Handler duplication: adding handlers each import/run. Clear or guard against duplicates before adding.
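The handler-duplication guard from the last bullet can be as simple as checking logger.handlers before attaching. A minimal sketch (get_logger is an illustrative helper):

```python
import logging

def get_logger(name):
    """Return a named logger, attaching a console handler only once."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # guard: re-imports and repeated calls add nothing
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger

log_a = get_logger("pipeline")
log_b = get_logger("pipeline")  # same logger object, still one handler
```

Without the guard, each call would stack another handler and every message would print once per call site.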
Self-check prompts
- If latency spikes, can you filter logs by request_id and see where time was spent?
- Given a run from last week, can you reconstruct lr, data snapshot, and seed from logs/config alone?
- Can you turn on DEBUG without code changes (via env/CLI)?
Practical projects
- Training pipeline logger: add JSON logs with run_id, epoch, loss, lr, and time per step; write to rotating files.
- Inference microservice: implement LoggerAdapter for request_id and model_version; console JSON logs only.
- Config pack: a small library that loads defaults + file + ENV + CLI with type casting and prints a redacted config summary at startup.
Who this is for
- Machine Learning Engineers deploying training and inference systems.
- Data Scientists transitioning to production-grade ML workflows.
- Backend engineers integrating ML services.
Prerequisites
- Comfortable with Python basics (functions, modules, imports).
- Familiar with the standard library (os, argparse, dataclasses).
- Basic command-line usage and environment variables.
Next steps
- Add metrics (timers/counters) alongside logs to observe performance.
- Integrate model/version info into every log line for auditability.
- Automate log rotation and retention policies in your deployment environment.
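As a starting point for the metrics suggestion above, a minimal sketch of a timing decorator that emits duration as a structured log field (step_timer is an illustrative name, not a library API):

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("metrics")

def step_timer(func):
    """Log the wall-clock duration of each call as a key=value field."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("step=%s duration_ms=%.1f", func.__name__, elapsed_ms)
    return wrapper

@step_timer
def train_step():
    time.sleep(0.01)  # stand-in for real work

train_step()
```

The try/finally ensures the duration is logged even when the step raises, which is exactly when you want the timing most.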
Mini challenge
Create a small CLI training script that:
- Loads config with the described precedence.
- Logs INFO to console, DEBUG to a rotating file.
- Includes run_id and seed in every line.
- Can toggle JSON logs on/off via a CLI flag.
Quick Test