Why this matters
As a Machine Learning Engineer, you maintain and ship models in production. OOP and design patterns help you:
- Keep training and inference code clean, testable, and reusable.
- Swap components (preprocessors, models, metrics, storage) without rewriting pipelines.
- Encapsulate complexity behind simple interfaces for faster iteration and safer changes.
Real ML tasks made easier by OOP
- Switching from a linear model to a tree model with minimal code edits.
- Trying different scalers/encoders via a single configuration flag.
- Wrapping a custom deep-learning trainer to look like a scikit-learn estimator.
- Exposing a simple "train-evaluate-serve" facade to orchestrate complex pipelines.
Concept explained simply
Object-Oriented Programming (OOP) is about organizing code around objects that bundle data with behavior.
- Encapsulation: Keep related data and methods together; hide internals.
- Abstraction: Expose only what callers need; hide the rest behind a clear interface.
- Inheritance: Reuse and specialize behavior from a base class.
- Polymorphism: Different classes share the same interface, so you can swap them easily.
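To make the four ideas concrete, here is a minimal, hypothetical sketch (the class names are illustrative, not from any library):

from abc import ABC, abstractmethod

class Model(ABC):  # Abstraction: callers only see predict()
    @abstractmethod
    def predict(self, X):
        pass

class ConstantModel(Model):  # Inheritance: specializes the base class
    def __init__(self, value):
        self._value = value  # Encapsulation: state lives with the behavior that uses it

    def predict(self, X):
        return [self._value for _ in X]

class EchoModel(Model):
    def predict(self, X):
        return list(X)

for m in (ConstantModel(0), EchoModel()):  # Polymorphism: same call, different behavior
    print(m.predict([1, 2, 3]))  # [0, 0, 0] then [1, 2, 3]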
Mental model
Think of your ML system as interchangeable Lego blocks: data loaders, preprocessors, models, metrics. If each block exposes a common shape (interface), you can swap blocks without breaking the build.
Key OOP terms at a glance
- Class: A blueprint for objects.
- Object: An instance of a class.
- Interface (in Python, usually protocols/ABCs): A contract of methods to implement.
- Composition: Building complex objects by combining simpler ones.
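As a quick sketch of the last two terms, a hypothetical SupportsTransform protocol plus a Pipeline built by composition might look like this (names are illustrative):

from typing import Protocol

class SupportsTransform(Protocol):  # the contract: any object with a matching transform() conforms
    def transform(self, xs): ...

class Pipeline:  # composition: built from simpler transform steps
    def __init__(self, steps):
        self.steps = steps

    def run(self, xs):
        for step in self.steps:
            xs = step.transform(xs)  # each step only needs transform()
        return xs

class AddOne:  # satisfies the protocol without inheriting from anything
    def transform(self, xs):
        return [x + 1 for x in xs]

print(Pipeline([AddOne(), AddOne()]).run([1, 2]))  # [3, 4]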
Design patterns for ML codebases
- Strategy: Swap algorithms via a shared interface (e.g., different scalers or optimizers).
- Factory: Create objects from config values without spreading "if/else" across code.
- Adapter: Wrap a third-party tool to match your project’s interface.
- Facade: Provide a simple method (e.g., run_pipeline()) that orchestrates many steps (sketched in example 4 below).
- Builder: Construct complex objects step-by-step (e.g., model + preprocessors + metrics from config; sketched in example 5 below).
Worked examples
1) Strategy pattern for preprocessing
from abc import ABC, abstractmethod

class Scaler(ABC):
    @abstractmethod
    def transform(self, xs):
        pass
class Standardize(Scaler):
    def transform(self, xs):
        m = sum(xs) / len(xs)
        sd = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5  # population std
        return [(x - m) / (sd or 1) for x in xs]  # guard against zero std
class MinMax(Scaler):
    def transform(self, xs):
        mn, mx = min(xs), max(xs)
        rng = (mx - mn) or 1  # guard against division by zero for constant inputs
        return [(x - mn) / rng for x in xs]

class Preprocessor:
    def __init__(self, scaler: Scaler):
        self.scaler = scaler

    def run(self, xs):
        return self.scaler.transform(xs)
xs = [0, 5, 10]
print(Preprocessor(Standardize()).run(xs))  # approx. [-1.22, 0.0, 1.22]
print(Preprocessor(MinMax()).run(xs))       # [0.0, 0.5, 1.0]
Swappable scaler strategies let you experiment with a single-line change.
2) Factory to build models from config
class LinearModel:
    def fit(self, X, y):
        print("Fitting LinearModel")

    def predict(self, X):
        return [0 for _ in X]

class TreeModel:
    def fit(self, X, y):
        print("Fitting TreeModel")

    def predict(self, X):
        return [1 for _ in X]

def create_model(kind: str):
    registry = {
        "linear": LinearModel,
        "tree": TreeModel,
    }
    try:
        return registry[kind]()
    except KeyError:
        # "from None" hides the internal KeyError so callers see a clean domain error
        raise ValueError(f"Unknown model kind: {kind}") from None

for k in ["linear", "tree"]:
    m = create_model(k)
    m.fit([[1], [2], [3]], [0, 1, 0])
    print(type(m).__name__, m.predict([[10], [20], [30]]))
Centralized creation simplifies config-driven pipelines.
3) Adapter to unify third-party interfaces
# Third-party style
class ThirdPartyNet:
    def train(self, X, y):
        print("Training net")

    def infer(self, X):
        return [42 for _ in X]

# Our project expects fit/predict
class NetAdapter:
    def __init__(self, net: ThirdPartyNet):
        self.net = net

    def fit(self, X, y):
        return self.net.train(X, y)

    def predict(self, X):
        return self.net.infer(X)

model = NetAdapter(ThirdPartyNet())
model.fit([[1], [2]], [0, 1])
print(model.predict([[3], [4]]))  # [42, 42]
Adapters reduce refactoring when tools use different method names.
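The Facade and Builder patterns from the list above can be sketched the same way. The two examples below are illustrative, not canonical implementations; they reuse Scaler, Standardize, MinMax, Preprocessor, and create_model from examples 1 and 2, so those definitions must be in scope.
4) Facade to orchestrate the pipeline
class PipelineFacade:
    def __init__(self, scaler: Scaler, model_kind: str):
        self.preprocessor = Preprocessor(scaler)
        self.model = create_model(model_kind)

    def run_pipeline(self, X, y, X_new):
        Xs = [self.preprocessor.run(row) for row in X]  # preprocess each row
        self.model.fit(Xs, y)  # train
        return self.model.predict([self.preprocessor.run(row) for row in X_new])  # serve

facade = PipelineFacade(MinMax(), "linear")
print(facade.run_pipeline([[0, 5, 10]], [0], [[1, 2, 3]]))  # prints "Fitting LinearModel", then [0]
Callers see one run_pipeline() call; strategy selection and factory wiring stay hidden.
5) Builder to assemble the facade step-by-step
class PipelineBuilder:
    def __init__(self):
        self._scaler = Standardize()  # sensible defaults
        self._model_kind = "linear"

    def with_scaler(self, scaler: Scaler):
        self._scaler = scaler
        return self  # returning self enables chaining

    def with_model(self, kind: str):
        self._model_kind = kind
        return self

    def build(self):
        return PipelineFacade(self._scaler, self._model_kind)

facade = PipelineBuilder().with_scaler(MinMax()).with_model("tree").build()
print(facade.run_pipeline([[0, 5, 10]], [0], [[2, 4, 6]]))  # prints "Fitting TreeModel", then [1]
In a real project, the builder would typically read these choices from a config file rather than chained calls.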
Step-by-step practice (15–25 min)
- Define interfaces: Sketch abstract base classes for preprocessor and model with minimal methods.
- Implement two strategies: E.g., Standardize and MinMax scalers.
- Add a factory: Map a config string to a model class.
- Write a tiny demo: Fit, predict, and print outputs for a small dataset.
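If you want a starting point, here is a minimal skeleton under the same assumptions as the worked examples (the names are only suggestions):

from abc import ABC, abstractmethod

class Scaler(ABC):
    @abstractmethod
    def transform(self, xs): ...

class Model(ABC):
    @abstractmethod
    def fit(self, X, y): ...

    @abstractmethod
    def predict(self, X): ...

# TODO: implement Standardize(Scaler) and MinMax(Scaler),
# add create_model(kind) mapping "linear"/"tree" to classes,
# then fit, predict, and print on a tiny dataset like [0, 5, 10].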
Tip: keep inputs tiny
Use small lists (e.g., [0, 5, 10]) and short prints so you can visually verify behavior quickly.
Exercises
These exercises mirror the tasks below. You can run them in your IDE or a notebook.
Exercise 1: Strategy for scaling
Implement two scalers (Standardize, MinMax) sharing a common interface and demonstrate swapping them with the same preprocessor context. See the Exercises list for full instructions.
Exercise 2: Model factory
Create a factory that returns a simple model by name ("linear" or "tree") and print predictions to verify correct object creation.
Checklist before you submit
- Interfaces are minimal and focused.
- Swapping the strategy changes output without changing the caller code.
- Factory raises a helpful error for unknown names.
- Printed outputs match the expected output exactly.
Common mistakes and self-check
- Mistake: Overusing inheritance. Prefer composition unless you truly need specialization (see the sketch after this list).
- Mistake: Leaky abstractions (callers need internal details). Self-check: Can you swap implementations without changing caller code?
- Mistake: God classes doing everything. Self-check: Each class has a single clear purpose.
- Mistake: Factories hidden in random places. Self-check: Centralize object creation.
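As a sketch of the first point above, compare subclassing with wrapping; this reuses LinearModel and create_model from example 2:

class LinearModelWithLogging(LinearModel):  # inheritance: tied to one concrete class
    def fit(self, X, y):
        print("fit called")
        return super().fit(X, y)

class LoggedModel:  # composition: wraps ANY model exposing fit/predict
    def __init__(self, model):
        self.model = model

    def fit(self, X, y):
        print("fit called")
        return self.model.fit(X, y)

    def predict(self, X):
        return self.model.predict(X)

print(LoggedModel(create_model("tree")).predict([[1]]))  # [1]
The composed wrapper adds logging to any model, with no new subclass per model type.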
How to self-review
- Rename a class and swap it with another implementation. Does the rest of the code still run?
- Remove a method from the interface. Do only the implementers break (good) or does calling code also break (bad)?
Mini challenge
Wrap a simple PyTorch-like or custom net (train/infer) with an Adapter that exposes fit/predict. Add a Factory entry to create it by name ("net"). Verify that a pipeline function using fit/predict works unchanged for both a scikit-style model and your adapted net.
Who this is for
- Aspiring ML Engineers wanting cleaner, testable pipelines.
- Data Scientists moving prototypes into production.
- Python developers supporting ML systems.
Prerequisites
- Python basics: functions, classes, modules.
- Comfort with lists/dicts and simple math operations.
- Familiarity with NumPy/Pandas is helpful but not required.
Learning path
- OOP foundations: classes, objects, inheritance, polymorphism.
- Core patterns: Strategy, Factory, Adapter, Facade.
- Typing and interfaces: ABCs and Protocols for safer APIs.
- Packaging and structure: modules, folders, and imports for clarity.
- Testing: unit tests for each block; integration tests for the pipeline.
Practical projects
- Refactor a notebook into a package with Strategy scalers and a Factory model builder.
- Build a tiny training Facade: load data, preprocess, train, evaluate in a single function.
- Create an Adapter for a custom model to match scikit-learn’s fit/predict API.
Next steps
- Apply Strategy and Factory to one of your past ML scripts.
- Add minimal tests for each class. Aim for fast feedback.
- Take the Quick Test below to check your understanding.
Quick Test
Answer a few questions to check your understanding. Pass mark: 70%. Everyone can take the test; only logged-in users will have progress saved.