What is Model Artifact Packaging?
Model artifact packaging is the practice of bundling a trained model together with everything needed to run it reliably: files, code entrypoints, dependencies, metadata, and interface contracts. Think of it as the model's "carry-on bag" that works the same on any machine or cloud.
Why this matters
- You deploy models to different environments (local, CI, staging, prod). Packaging keeps the model's behavior consistent across all of them.
- Engineers need a clear interface: inputs/outputs, versions, and how to load/use the model.
- Security and compliance require traceability: who trained it, with what data/code, and when.
- Rollbacks and A/B tests need reproducible builds and versioned artifacts.
Who this is for
- MLOps Engineers and ML Engineers integrating models into services.
- Data Scientists preparing models for handoff to engineering.
- Platform Engineers maintaining CI/CD for ML.
Prerequisites
- Comfort with Python packaging and virtual environments.
- Basic Docker knowledge (build/run images).
- Familiarity with at least one ML framework (scikit-learn, PyTorch, TensorFlow, or XGBoost).
Concept explained simply
Package = model file + loader + dependencies + a promise about inputs/outputs + metadata that explains the model. When you hand this package to anyone, they can run the model and get the same results.
Mental model
Imagine a lunchbox: it includes the food (model weights), utensils (loader/inference code), nutrition label (metadata), diet notes (dependencies), and portion size (input/output schema). Anyone who opens the lunchbox knows exactly how to eat it and what to expect.
Core components of a good model package
- Model files: e.g., model.joblib, model.pkl, model.ts (TorchScript), model.onnx, or a TensorFlow SavedModel/ directory.
- Environment: requirements.txt/conda.yaml/poetry.lock with pinned versions.
- Entrypoint: how to load and run inference (e.g., predict.py or a package module).
- Signature: input/output schema, dtypes, shapes, and example payload.
- Metadata: version, training code commit, framework, Python version, license, and a short model card (see the capture sketch after this list).
- Integrity: checksums (e.g., SHA256) and possibly a content-addressed path.
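Most of this metadata can be captured automatically at training time rather than typed by hand. The sketch below shows one way to do it; capture_metadata.py, the field names, and the git call are illustrative choices, not a standard.
# capture_metadata.py (minimal sketch; filenames and fields are illustrative)
import json
import platform
import subprocess

import sklearn  # swap for whichever framework the model uses

def git_commit():
    # best-effort: returns the current commit hash, or None outside a git repo
    try:
        return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    except Exception:
        return None

metadata = {
    "name": "churn_model",
    "version": "1.0.0",
    "framework": f"scikit-learn {sklearn.__version__}",
    "python_version": platform.python_version(),
    "training_code_commit": git_commit(),
    "license": "proprietary",
}

with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)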
Step-by-step: a minimal, portable model artifact
- Create a versioned folder such as model-name/1.0.0, or name it after a commit SHA.
- Put the model file inside (e.g., model.joblib or model.onnx).
- Add requirements.txt or conda.yaml with pinned versions.
- Add predict.py with a load_model() and predict(payload) function.
- Add signature.json describing expected inputs/outputs.
- Add model-card.md for summary, training data note, and intended use.
- Add checksum.txt with hashes of all files for integrity.
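The checksum step is easy to script with the standard library. Below is a minimal sketch; the line format (hash, two spaces, filename, matching sha256sum output) is a convention choice, which also lets you verify later with sha256sum -c checksum.txt on Linux.
# make_checksums.py (minimal sketch; run inside the version folder, e.g. churn_model/1.0.0)
import hashlib
from pathlib import Path

def sha256(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

lines = [
    f"{sha256(p)}  {p.name}"
    for p in sorted(Path(".").iterdir())
    if p.is_file() and p.name != "checksum.txt"
]
Path("checksum.txt").write_text("\n".join(lines) + "\n")
print("\n".join(lines))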
Example structure
churn_model/
  1.0.0/
    model.onnx
    requirements.txt
    predict.py
    signature.json
    model-card.md
    checksum.txt
Worked examples
Example 1 — scikit-learn model to portable artifact
# train_and_package.py
import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
joblib.dump(model, "model.joblib")
# requirements.txt (pin versions)
scikit-learn==1.3.2
joblib==1.3.2
numpy==1.26.4
# signature.json (contract)
{
  "inputs": [{"name": "X", "type": "float32", "shape": [null, 30]}],
  "outputs": [{"name": "proba", "type": "float32", "shape": [null, 2]}]
}
# predict.py (entrypoint)
import joblib
import numpy as np

def load_model(path="model.joblib"):
    return joblib.load(path)

def predict(payload, model=None):
    if model is None:
        model = load_model()
    X = np.array(payload["X"], dtype=np.float32)
    proba = model.predict_proba(X)
    return {"proba": proba.tolist()}
# usage
# python -c "from predict import predict; print(predict({'X': [[0]*30]}))"
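Because the contract lives in signature.json, the entrypoint can reject malformed payloads before they reach the model. A minimal sketch of such a guard follows; the validate() helper and its error messages are illustrative, not part of any standard.
# validate_payload.py (optional guard built on signature.json; illustrative)
import json
import numpy as np

def validate(payload, signature_path="signature.json"):
    spec = json.load(open(signature_path))["inputs"][0]
    X = np.asarray(payload[spec["name"]], dtype=spec["type"])
    expected = spec["shape"]  # e.g. [null, 30]; null/None means "any size"
    if X.ndim != len(expected):
        raise ValueError(f"expected {len(expected)} dimensions, got {X.ndim}")
    for axis, size in enumerate(expected):
        if size is not None and X.shape[axis] != size:
            raise ValueError(f"axis {axis}: expected size {size}, got {X.shape[axis]}")
    return X
# usage: validate({"X": [[0.0] * 30]}) returns the checked float32 array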
Example 2 — PyTorch model to TorchScript + minimal service
# export_torchscript.py
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

m = M().eval()
example = torch.randn(1, 10)
# tracing records the ops run on the example input and produces a TorchScript module
scripted = torch.jit.trace(m, example)
scripted.save("model.ts")
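Before packaging model.ts, it is worth a quick parity check that the traced module reproduces the eager module's outputs. A few lines appended to export_torchscript.py are enough; the tolerance here is an arbitrary choice.
# parity check (append to export_torchscript.py)
with torch.no_grad():
    eager_out = m(example)
    traced_out = torch.jit.load("model.ts")(example)
assert torch.allclose(eager_out, traced_out, atol=1e-6), "traced output drifted"
print("TorchScript output matches the eager model")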
# requirements.txt
# note: +cpu wheels come from the PyTorch index, e.g.
#   pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu
torch==2.1.2+cpu
fastapi==0.109.0
uvicorn==0.25.0
# app.py (entrypoint service)
import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = torch.jit.load("model.ts")

class Inp(BaseModel):
    X: list[list[float]]

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/predict")
def predict(inp: Inp):
    x = torch.tensor(inp.X, dtype=torch.float32)
    y = model(x).detach().numpy().tolist()
    return {"y": y}
# Run locally
# uvicorn app:app --host 0.0.0.0 --port 8000
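With the service running, any HTTP client can exercise the contract. A quick smoke test using the requests library (an extra, unpinned dependency here) might look like this:
# smoke_test.py (assumes the service is running on localhost:8000)
import requests

resp = requests.post("http://localhost:8000/predict", json={"X": [[0.0] * 10]})
resp.raise_for_status()
print(resp.json())  # e.g. {"y": [[...]]}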
Example 3 — Convert a tree model to ONNX
# xgb_to_onnx.py (requires skl2onnx, onnxmltools, onnx, xgboost, scikit-learn)
from skl2onnx import convert_sklearn, update_registered_converter
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes
from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

# skl2onnx does not know about XGBoost models by default; register the
# onnxmltools converter so the whole sklearn pipeline can be converted.
update_registered_converter(
    XGBClassifier,
    "XGBoostXGBClassifier",
    calculate_linear_classifier_output_shapes,
    convert_xgboost,
    options={"nocl": [True, False], "zipmap": [True, False]},
)

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([("scaler", StandardScaler()), ("clf", XGBClassifier(n_estimators=50))])
pipe.fit(Xtr, ytr)
print("AUC:", roc_auc_score(yte, pipe.predict_proba(Xte)[:, 1]))

onnx_model = convert_sklearn(
    pipe,
    initial_types=[("X", FloatTensorType([None, X.shape[1]]))],
)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
# signature.json
{
  "inputs": [{"name": "X", "type": "float32", "shape": [null, 30]}],
  "outputs": [{"name": "proba", "type": "float32", "shape": [null, 2]}]
}
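To confirm the exported graph matches the original pipeline, score the same batch through both and compare probabilities. The sketch below assumes onnxruntime is installed and continues from xgb_to_onnx.py (pipe and Xte still in scope); the tolerances are illustrative. Depending on converter options, the classifier's probability output may come back as a list of per-class dictionaries (ZipMap) rather than a plain array, which the snippet normalizes.
# parity check (append to xgb_to_onnx.py; requires onnxruntime)
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
outputs = sess.run(None, {"X": Xte.astype(np.float32)})
proba = outputs[-1]  # converted classifiers typically emit [label, probabilities]
if isinstance(proba, list):  # ZipMap: one {class: probability} dict per row
    proba = np.array([[row[0], row[1]] for row in proba], dtype=np.float32)
np.testing.assert_allclose(pipe.predict_proba(Xte), proba, rtol=1e-3, atol=1e-4)
print("ONNX probabilities agree with the sklearn/XGBoost pipeline within tolerance")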
Formats and when to use them
- Pickle/joblib: quick for Python-only teams; unsafe to load from untrusted sources.
- ONNX: cross-language, hardware-accelerated runtimes available.
- TorchScript: portable PyTorch runtime without Python source at inference.
- TensorFlow SavedModel: standard for TF with high tooling support.
Common mistakes and how to self-check
- Unpinned dependencies: fix by pinning exact versions in requirements.txt or lockfiles.
- Missing signature: add signature.json and verify with a sample payload.
- Pickle from untrusted sources: avoid or sandbox; prefer safer formats for distribution.
- GPU/CPU mismatch: choose correct framework wheels; document device requirements.
- Oversized images: use slim base images and multi-stage builds to reduce size.
- No integrity checks: include checksums and validate them at load time (see the sketch after this list).
- Hidden preprocessing: bundle preprocessing code or parameters alongside the model.
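Several of these self-checks can be scripted once and reused; the sketch below (a natural seed for the validation-script project listed under Practical projects) verifies file hashes and the presence of a signature. verify_artifact.py, the checksum.txt line format, and the error messages are illustrative choices.
# verify_artifact.py (minimal sketch of a load-time integrity check; illustrative)
import hashlib
import json
from pathlib import Path

def sha256(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def verify(artifact_dir="."):
    root = Path(artifact_dir)
    # checksum.txt lines are assumed to look like "<sha256>  <filename>"
    for line in (root / "checksum.txt").read_text().splitlines():
        digest, name = line.split()
        if sha256(root / name) != digest:
            raise RuntimeError(f"checksum mismatch for {name}")
    # the signature must at least exist and declare inputs/outputs
    sig = json.loads((root / "signature.json").read_text())
    if "inputs" not in sig or "outputs" not in sig:
        raise RuntimeError("signature.json is missing inputs/outputs")
    print("artifact OK")

if __name__ == "__main__":
    verify()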
Practical projects
- Build a portable artifact for a classifier with ONNX, include signature and tests.
- Create a Dockerized FastAPI service that loads a TorchScript model and exposes /predict.
- Implement a validation script that checks artifact integrity (hash), dependencies, and signature compliance.
Exercises
These tasks mirror the interactive exercises for this topic. Do them to reinforce the material.
- Design a portable model artifact (ex1): Create the folder layout, add pinned dependencies, a signature, and a checksum. Validate with a dummy predict.py.
- Containerize and serve (ex2): Write a minimal Dockerfile and serve a predict endpoint for a TorchScript or ONNX model.
Self-checklist
- I can explain the difference between pickle, ONNX, TorchScript, and SavedModel.
- I can show a working entrypoint that loads the model and runs inference.
- All dependencies are pinned; the package runs in a clean environment.
- A signature file documents expected inputs/outputs.
- The artifact has a clear version and checksum.
Learning path
- Start: Package a simple scikit-learn model with joblib and a signature.
- Intermediate: Convert to ONNX or TorchScript and validate outputs match the original.
- Advanced: Containerize the artifact with a REST entrypoint and add integrity checks.
Mini challenge
Replace pickle/joblib with a framework-native or portable format (TorchScript/ONNX), keep the same API, and verify predictions differ by less than 1e-6 on a test batch. Document any numeric drift you observe and why it occurs.
Next steps
- Build model servers that scale (async I/O, batching, concurrency).
- Add observability: request logging, latency metrics, and schema validation in production.
- Automate packaging in CI with versioning and a model registry.
Ready? Take the Quick Test