
Packaging And Project Structure

Learn Packaging And Project Structure for free with explanations, exercises, and a quick test (for Machine Learning Engineers).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

As a Machine Learning Engineer, your code must be reproducible, testable, and easy for teammates to use. Good packaging and project structure lets you:

  • Ship reusable training/inference code as an installable package.
  • Run experiments consistently across machines and CI.
  • Avoid import path chaos in notebooks and scripts.
  • Make model training and serving entry points discoverable.

Real tasks you'll face:

  • Turn a research notebook into a pip-installable module for the team.
  • Expose a CLI (e.g., train-model) to run experiments with config files.
  • Separate package code from data, models, and notebooks to keep repos clean.
  • Version your package and model artifacts reliably.

Concept explained simply

Packaging is how you bundle your Python code so others can install and use it. Project structure is how you arrange files and folders so the project is predictable and maintainable.

Mental model

  • Your package is the product: everything inside src/your_pkg/ is what users import.
  • Your repo is the workshop: docs, tests, data samples, notebooks, and scripts live here but are not necessarily part of the installable package.
  • Entry points are doors: CLIs or functions that teammates can reliably call.

Core components of a solid ML package

  • Use a src layout
    project-root/
      pyproject.toml
      src/
        my_package/
          __init__.py
          dataio.py
          features/
            __init__.py
            build.py
          models/
            __init__.py
            train.py
            predict.py
      tests/
        test_train.py
      notebooks/
        01_exploration.ipynb
      scripts/
        run_training.py
      README.md
    

    Why: prevents accidental imports from the repo root and keeps imports consistent across dev and CI.

  • Declare metadata and dependencies in pyproject.toml

    Define name, version, dependencies, and console scripts. This is the single source of truth for packaging.

  • Prefer absolute imports

    Use from my_package.models import train instead of complex relative imports.

  • Separate code from data

    Never package large datasets or credentials. Keep data in external storage or data/ ignored by packaging.

  • Provide entry points

    Expose training and inference commands via console_scripts so teammates can run them without hunting for files.

  • Versioning

    Use semantic versioning (e.g., 0.1.0). Mirror version in pyproject.toml and src/my_package/__init__.py.

  • Tests

Place tests in tests/ and run them against the package installed with pip install -e . so they exercise the same imports your users see.
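
The Versioning bullet above can be supported with a small standard-library helper, so the runtime version always reflects what pip actually installed. This is a sketch; get_version is a hypothetical helper name, not part of any library API.

```python
# Sketch: read the installed version at runtime via the standard library,
# so __version__ can mirror the version declared in pyproject.toml.
from importlib.metadata import version, PackageNotFoundError

def get_version(dist_name: str) -> str:
    """Return the installed version of dist_name, or a dev placeholder."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Not installed, e.g. running from a raw source checkout.
        return "0.0.0.dev0"

print(get_version("pip"))
```

Calling get_version("my_package") inside src/my_package/__init__.py is one way to avoid maintaining the version string in two places by hand.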

What goes into pyproject.toml?

At minimum:

  • [build-system] with a backend such as setuptools or hatchling.
  • [project] with name, version, dependencies, and optional scripts.
  • [tool.setuptools] (if using setuptools) to configure package discovery from src.

Worked examples

Example 1: Minimal ML package with setuptools

# pyproject.toml
[build-system]
requires = ["setuptools>=68", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "churn_model"
version = "0.1.0"
description = "Churn prediction package"
authors = [{name = "Your Team"}]
requires-python = ">=3.9"
dependencies = [
  "pandas>=1.5",
  "scikit-learn>=1.2",
  "numpy>=1.23"
]

[project.scripts]
train-model = "churn_model.models.train:main"
predict = "churn_model.models.predict:main"

[tool.setuptools.packages.find]
where = ["src"]

# src/churn_model/__init__.py
__version__ = "0.1.0"

# src/churn_model/models/train.py
def main():
    print("Training started...")
    # load data, train, save model
    print("Training finished.")
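
The [project.scripts] table above also references churn_model.models.predict:main, which the example does not show. A matching stub might look like this (a sketch; the model-loading comment is illustrative, not a prescribed API):

```python
# src/churn_model/models/predict.py -- minimal sketch matching the
# `predict = "churn_model.models.predict:main"` entry point declared above.

def main() -> int:
    print("Loading model...")
    # e.g. model = joblib.load("models/churn.joblib"); preds = model.predict(X)
    print("Prediction finished.")
    return 0
```

Returning an int lets the console script double as a proper exit code when wired through an entry point.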

Install locally (editable) and run:

pip install -e .
train-model

Example 2: src layout prevents import confusion

With src/, importing from a notebook uses the installed package, not local files:

# In a notebook after `pip install -e .`
from churn_model.features.build import make_features

This avoids accidental relative imports like from ..features or relying on the notebook's working directory.
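
make_features itself is not defined in this example. A hypothetical implementation might look like the sketch below; the column names are illustrative placeholders, not a required schema.

```python
# src/churn_model/features/build.py -- hypothetical sketch of make_features.
import pandas as pd

def make_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive model features from a raw customer DataFrame."""
    out = df.copy()
    # Illustrative transformations; real features depend on your data.
    out["tenure_years"] = out["tenure_months"] / 12.0
    out["charges_per_month"] = (
        out["total_charges"] / out["tenure_months"].clip(lower=1)
    )
    return out
```

Because the function takes and returns a plain DataFrame, it is easy to unit-test in tests/ without touching real data.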

Example 3: Keep data out of the package

Do not include large data or models in the package. Instead, reference paths via config:

# src/churn_model/dataio.py
import os
from pathlib import Path

def get_data_path(env_var: str = "DATA_DIR") -> Path:
    """Return the data directory, creating it if needed."""
    root = os.environ.get(env_var, "data")
    p = Path(root)
    p.mkdir(parents=True, exist_ok=True)
    return p

The package itself stays small and installs quickly; at runtime, code downloads or loads data from the configured directory instead of shipping it inside the wheel.
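
A quick self-contained demo of this pattern (the helper is repeated here so the snippet runs on its own):

```python
import os
import tempfile
from pathlib import Path

def get_data_path(env_var: str = "DATA_DIR") -> Path:
    """Return the data directory, creating it if needed."""
    root = os.environ.get(env_var, "data")
    p = Path(root)
    p.mkdir(parents=True, exist_ok=True)
    return p

# CI or a teammate can redirect data without touching package code.
with tempfile.TemporaryDirectory() as tmp:
    os.environ["DATA_DIR"] = tmp
    print(get_data_path())  # resolves to the temp directory
    del os.environ["DATA_DIR"]
```

The same environment variable works in notebooks, cron jobs, and containers, which keeps the package free of hard-coded paths.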

Step-by-step: turn a notebook into a package

  1. Create structure: add src/your_pkg/ with __init__.py, move reusable code into modules.
  2. Add pyproject.toml: define name, version, dependencies, and console_scripts.
  3. Refactor imports: use absolute imports (from your_pkg.models import train).
  4. Create entry points: small main() functions for train/predict.
  5. Install editable: pip install -e ., then import in notebooks.
  6. Add tests: write basic tests in tests/ to verify imports and core functions.

Exercises

Do these in a fresh folder.

  1. Exercise 1 — Create pyproject.toml

    Make a package named churn_model (version 0.1.0) using setuptools. Include dependencies: pandas, scikit-learn, numpy. Add scripts: train-model and predict.

  2. Exercise 2 — Add entry points and fix imports

    Create src/churn_model/models/train.py with main() that prints "Training finished." Install in editable mode and run train-model. Use absolute imports if needed.

  3. Exercise 3 — Add version and expose it

    Put __version__ = "0.1.0" in src/churn_model/__init__.py. From Python, verify import churn_model; print(churn_model.__version__) prints 0.1.0.

Exercise checklist

  • pyproject.toml exists with correct build-system and project fields.
  • Package installs with pip install -e . without errors.
  • Entry points run: train-model prints expected lines.
  • Absolute imports succeed from a notebook or REPL.
  • __version__ is accessible.

Common mistakes and self-check

  • No src layout: Imports work locally but fail in CI. Fix: move code to src/ and enable package discovery.
  • Relative import chains: Hard to read and brittle. Fix: use absolute imports from the top-level package.
  • Data included in package: Bloats install. Fix: load data at runtime from external paths.
  • Unpinned critical deps: Repro breaks later. Fix: specify minimal versions; pin in lockfiles or CI as needed.
  • Missing entry points: Teammates guess how to run things. Fix: provide console_scripts with clear names.

Self-check questions:

  • Can a teammate install your package in a clean environment and run training with a single command?
  • Do imports work from any working directory?
  • Is your package free of large data and secrets?

Practical projects

  • Turn a churn prediction notebook into a package with train-model and predict commands.
  • Create a feature engineering subpackage (features/) and add tests validating transformations.
  • Add a simple config system (YAML or JSON) and pass path via CLI flag.
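
The config idea in the last bullet can be sketched with the standard library alone (JSON avoids an extra dependency; the keys below are illustrative, not a required schema):

```python
# Sketch of a minimal JSON-based config loader.
import json
import tempfile
from pathlib import Path

def load_config(path: str) -> dict:
    """Read an experiment config from a JSON file."""
    return json.loads(Path(path).read_text())

# Round-trip demo with a temporary config file.
cfg = {"data_dir": "data", "model": "logreg", "seed": 42}
with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "train.json"
    p.write_text(json.dumps(cfg))
    print(load_config(str(p))["seed"])  # 42
```

Passing the config path through a CLI flag (e.g. --config) then connects this loader to the entry points described earlier.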

Who this is for

  • ML/AI practitioners moving from notebooks to production-grade code.
  • Engineers who collaborate across teams and need reproducible pipelines.

Prerequisites

  • Comfortable with Python functions, modules, and virtual environments.
  • Basic terminal usage and pip.
  • Familiarity with ML libraries such as scikit-learn.

Learning path

  1. Organize code into src/ and refactor to absolute imports.
  2. Create pyproject.toml and install locally.
  3. Add entry points and basic tests.
  4. Refine structure (features, data IO, models) and document usage in README.

Mini challenge

In under 30 minutes, create a package iris_pipeline with:

  • fit-model CLI that trains a simple classifier on Iris and saves a model file.
  • predict-iris CLI that loads the model and predicts from a CSV.
  • Both commands should print concise status messages and exit cleanly.

Next steps

  • Add tests that run in CI.
  • Introduce type hints and a linter configuration via pyproject.toml.
  • Document how to run training and prediction with examples.
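
If you choose ruff as the linter, its configuration can live in the same pyproject.toml (a sketch; the rule selection is a common starting point, not a requirement):

```toml
[tool.ruff]
line-length = 100
target-version = "py39"

[tool.ruff.lint]
select = ["E", "F", "I"]  # pycodestyle errors, pyflakes, import sorting
```

Keeping lint settings in pyproject.toml means one file governs packaging, dependencies, and code style.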

Practice Exercises

Initialize a Python package named churn_model using setuptools. Create pyproject.toml with:

  • name: churn_model
  • version: 0.1.0
  • dependencies: pandas, scikit-learn, numpy
  • console scripts: train-model and predict
  • package discovery from src/

Expected output: pip install -e . completes without errors, and import churn_model works.

Packaging And Project Structure — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.
