Why this matters
As a Machine Learning Engineer, you need models, code, and data transformations to move reliably from development to production. Packaging and publishing artifacts lets teams:
- Reproduce exact training and inference environments
- Promote tested builds across dev → staging → production
- Roll back safely when something breaks
- Share components (models, features, pipelines) across teams
Typical on-the-job tasks:
- Build a Python wheel for a feature library and publish it to an internal package index
- Bundle a trained model with metadata and checksums, then push to a model registry or artifact repository
- Create a Docker image for inference and tag it with version and git SHA
- Automate promotion of signed, scanned artifacts through environments
Who this is for
- Machine Learning Engineers and Data Scientists moving from notebooks to production
- DevOps/Platform Engineers supporting ML services
- Anyone building reproducible ML pipelines
Prerequisites
- Basic Python packaging (setup.cfg/pyproject.toml) and virtual environments
- Familiarity with Docker images and tags
- Comfort with Git and semantic versioning (e.g., 1.4.2)
- Awareness of what an artifact repository or container registry is
Concept explained simply
An artifact is a packaged, versioned output you can store and reuse: a Python wheel, a Docker image, a model file (.pt/.pkl/.onnx), or a dataset snapshot. In CI/CD, you build artifacts once, test them, sign/scan them, then publish them to a repository. Deployments pull exactly those versions so what you tested is what you run.
Mental model
Think of artifacts as sealed boxes with labels:
- The box: a wheel, a container image, or a model bundle
- Labels: version, commit SHA, build time, metadata (framework, metrics)
- Seals: checksum/signature to ensure integrity
- Warehouse: artifact repository or model registry
CI builds the box, applies labels and seals, and puts it in the warehouse. Deployments only take boxes from the warehouse, not from someone’s laptop.
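For container images, the labels can be literal image labels. A minimal sketch using the predefined OCI annotation keys (the image name and version are placeholders):
# Bake traceability labels into the image at build time
docker build \
  --label "org.opencontainers.image.version=1.5.0" \
  --label "org.opencontainers.image.revision=$(git rev-parse HEAD)" \
  --label "org.opencontainers.image.created=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  -t registry.example.com/ml/churn-infer:1.5.0 .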
Core principles
- Immutability: once published, an artifact with a tag/version never changes
- Determinism: the same source and config should produce the same artifact
- Traceability: every artifact links to its Git commit, build logs, and tests
- Promotion: move the exact artifact across environments
- Security: scan, sign, and verify before releasing
Worked examples
Example 1: Package a Python feature library as a wheel
- Create pyproject.toml and a src layout
- Build the wheel and publish it to an internal index
# pyproject.toml (minimal sample)
[build-system]
requires = ["setuptools>=68", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "featurekit"
version = "0.1.3"
description = "Reusable feature transforms"
authors = [{name = "Your Team"}]
readme = "README.md"
requires-python = ">=3.9"
dependencies = ["pandas>=2.0", "numpy>=1.24"]
# Build (requires the 'build' package: pip install build)
python -m build --wheel
# Publish (example command; set your repository URL and token via environment variables)
twine upload --repository-url "$PYPI_URL" -u "$USER" -p "$TOKEN" dist/*
Result: an immutable wheel like featurekit-0.1.3-py3-none-any.whl is available for pipelines to install with pip.
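Consumers then pin the exact version when installing. A minimal sketch, assuming $PYPI_URL here points at the index's install (simple) endpoint, which often differs from the upload URL:
# Install the published wheel from the internal index, pinned to the exact version
pip install --index-url "$PYPI_URL" featurekit==0.1.3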
Example 2: Bundle a trained model with metadata and checksum
- Save model and attach metadata in a manifest
- Compute a checksum for integrity
- Push the bundle to an artifact repository/model registry
# directory structure
model_bundle/
  model.onnx
  manifest.json
  metrics.json
  sha256.txt
# manifest.json (sample)
{
  "name": "churn-model",
  "version": "1.5.0",
  "git_sha": "<commit>",
  "framework": "onnx-1.15",
  "python": "3.10",
  "train_time": "2025-09-14T10:20:00Z",
  "features": ["tenure", "monthly_charges", "contract_type"],
  "intended_use": "batch_inference",
  "notes": "calibrated with temperature scaling"
}
# checksum (computed relative to the bundle so it can be verified after unpacking)
(cd model_bundle && sha256sum model.onnx > sha256.txt)
Store model_bundle as a single archive (e.g., churn-model-1.5.0.tgz). Your CI job uploads it to an artifact store. Downstream jobs verify the checksum before deploying.
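A minimal sketch of the archive step and the downstream verification, reusing the names from this example:
# Create a single archive for upload
tar -czf churn-model-1.5.0.tgz model_bundle/
# Downstream: unpack and verify integrity before loading the model
tar -xzf churn-model-1.5.0.tgz
(cd model_bundle && sha256sum -c sha256.txt)  # exits non-zero if model.onnx was altered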
Example 3: Build and tag an inference Docker image
# Dockerfile (minimal)
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV PORT=8080
CMD ["python", "serve.py"]
# Build and tag with version and commit
VERSION=1.5.0
GIT_SHA=$(git rev-parse --short HEAD)
IMAGE="registry.example.com/ml/churn-infer:${VERSION}-${GIT_SHA}"
docker build -t "$IMAGE" .
# Optional: sign/scan steps go here
# Push
docker push "$IMAGE"
Downstream environments deploy the exact tag ${VERSION}-${GIT_SHA} to guarantee traceability.
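Tags are conventions; digests are content-addressed and cannot move. Where you need stronger guarantees, record the digest after the push; a sketch using standard Docker commands:
# Capture the immutable content digest of the pushed image
DIGEST=$(docker inspect --format '{{index .RepoDigests 0}}' "$IMAGE")
echo "Deploy by digest: $DIGEST"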
Example 4: Promote the exact artifact
Promotion should not rebuild. Instead, retag the already-pushed image (or mark the model version) after staging tests pass:
# Retag for production without rebuilding
docker pull "$IMAGE"
docker tag "$IMAGE" "registry.example.com/ml/churn-infer:${VERSION}-prod"
docker push "registry.example.com/ml/churn-infer:${VERSION}-prod"
Example 5: SBOM and provenance (optional)
Generate a software bill of materials (SBOM) and attach it as a build artifact. Store build provenance: who built the artifact, when, and from which commit. Both aid compliance audits and debugging.
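One concrete option (among several) is the open-source tool syft; a sketch, assuming syft is available in the CI environment:
# Generate an SPDX-format SBOM for the pushed image and archive it with the build
syft "$IMAGE" -o spdx-json > sbom.spdx.json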
Minimum viable artifact pipeline (step-by-step)
- Build artifacts once: wheel, model bundle, container image
- Attach metadata: version, git SHA, build time, metrics
- Verify: run tests; compute checksum; optionally scan/sign
- Publish: push to artifact and container registries
- Promote: retag or mark versions across environments
- Deploy: downstream pulls by exact version/tag only (a consolidated script sketch follows)
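Put together, the flow can live in one CI script. A minimal sketch reusing names from the examples above (it assumes pytest, the build package, and the model_bundle directory exist in your project):
#!/usr/bin/env bash
set -euo pipefail

VERSION="1.5.0"
GIT_SHA=$(git rev-parse --short HEAD)
IMAGE="registry.example.com/ml/churn-infer:${VERSION}-${GIT_SHA}"

# 1. Build once: wheel and container image
python -m build --wheel
docker build -t "$IMAGE" .

# 2. Verify: run tests and seal the model bundle with a checksum
pytest
(cd model_bundle && sha256sum model.onnx > sha256.txt)

# 3. Publish to the package index and container registry
twine upload --repository-url "$PYPI_URL" -u "$USER" -p "$TOKEN" dist/*
docker push "$IMAGE"

# 4. Promotion happens later by retagging this exact image; no rebuild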
Common mistakes and self-check
- Mistake: Rebuilding during promotion. Fix: Promote by retagging or marking an existing artifact only.
- Mistake: Floating tags like latest. Fix: Require immutable tags (version+git SHA).
- Mistake: Missing metadata. Fix: Enforce manifest fields in CI.
- Mistake: No integrity check. Fix: Store and verify checksums/signatures.
- Mistake: Hidden dependencies. Fix: Use lock files or explicit versions (see the sketch below); include runtime system deps in the image.
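For the lock-file fix, pip-tools is one common choice; a sketch, assuming pip-tools is installed:
# Compile a fully pinned lock file from the dependencies declared in pyproject.toml
pip-compile pyproject.toml -o requirements.txt
# CI and Docker builds then install from the lock file for deterministic environments
pip install -r requirements.txt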
Self-check:
- Can you trace a production artifact back to its commit and tests?
- If staging passes, can you promote to production without rebuilding?
- Can a teammate reproduce the artifact locally using the manifest?
Exercises
Try these hands-on tasks. Solutions and expected outputs are provided in the Exercises section below.
Exercise 1: Build and verify a Python wheel
- Create a minimal package (src layout) with pyproject.toml
- Build the wheel
- Install it in a fresh venv to verify import
# expected: a file like dist/featurekit-0.1.0-py3-none-any.whl
Hints
- Use python -m build --wheel
- Use python -m venv .venv and pip install dist/*.whl
Exercise 2: Build and tag a Docker image with version+git SHA
- Set VERSION and GIT_SHA variables
- Build, tag, and run the Docker image locally
- List images and confirm the tag includes both
# expected: an image like registry.local/app:0.1.0-a1b2c3d
Hints
- Use git rev-parse --short HEAD
- docker build -t <image:tag> . then docker images
Checklist: good artifacts before publish
- [ ] Version and git SHA embedded in name or labels
- [ ] Manifest with framework, Python version, and intended use
- [ ] Tests passed and results archived
- [ ] Checksum/signature generated and stored
- [ ] Image/package scanned (if available)
- [ ] Publication to registry completes with immutable tags
Practical projects
- Project A: Create a reusable feature library as a wheel and deploy it in two pipelines (training and inference)
- Project B: Train a model, bundle with manifest and metrics, and publish; write a small script to verify checksum and load the model
- Project C: Build an inference image with health endpoint; implement retag-based promotion from staging to production
Learning path
- Now: Packaging and publishing artifacts (this page)
- Next: Environment promotion and release strategies (blue/green, canary)
- Then: Continuous monitoring, rollbacks, and incident response
- Security: Scanning, signing, and SBOM generation
Next steps
- Automate the build-and-publish flow in your CI
- Enforce immutable tags and metadata checks
- Add integrity checks and optional signing before promotion
Mini challenge
Within your current project, pick one artifact (wheel, model bundle, or Docker image). Add missing metadata, embed git SHA in its tag, generate a checksum, and publish it once. Demonstrate promotion to the next environment without rebuilding.
Quick test
Take the test below to confirm your understanding.