Why this matters
As a Machine Learning Engineer, you will evolve features over time: fix bugs, change time windows, swap data sources, or optimize transforms. Without clear versioning, you risk training/serving mismatches, broken models, and unreproducible results. Proper feature versioning lets you safely iterate, roll back, A/B test, and audit past model behavior.
Real tasks you will face:
- Rolling out a new definition for a user activity metric while keeping old models stable.
- Running an A/B compare of two feature logic variants before full cutover.
- Reproducing a model from 6 months ago for an audit, using the exact feature definitions.
Concept explained simply
Feature versioning means treating each meaningful change to a feature's logic, window, schema, or source as a new version. You keep multiple versions side-by-side and record the metadata to reproduce results later.
Mental model
Think of features like APIs. A version guarantees a contract: input keys, transformation logic, and output schema. When the contract changes, publish a new version instead of silently changing the old one.
Core components of a feature version
- Feature identity: name, entity key(s), and version (e.g., v1.2.0).
- Transformation signature: code hash (e.g., Git SHA), parameters, and dependency versions.
- Data sources: source identifiers and snapshot/backfill references.
- Temporal semantics: window length, aggregation function, timestamp column, time zone.
- Schema: data type, nullability, default handling.
- Operational: owner, description, freshness/TTL, expected update cadence.
- Lineage and changelog: what changed, why, who approved.
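Taken together, these components form a record you can store in a registry. A minimal sketch in Python; the field names are illustrative and not tied to any particular feature store:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FeatureVersion:
    """Illustrative metadata record for one version of a feature."""
    name: str                 # stable feature identity, e.g. "user_txn_amount"
    entity_keys: tuple        # join keys, e.g. ("user_id",)
    version: str              # semantic version, e.g. "1.2.0"
    code_sha: str             # Git SHA of the transform code
    params: dict = field(default_factory=dict)  # window, agg fn, timezone, ...
    sources: tuple = ()       # upstream dataset / snapshot identifiers
    dtype: str = "float64"    # output schema
    nullable: bool = False
    owner: str = ""
    description: str = ""
    ttl_hours: int = 24       # freshness guarantee
    changelog: str = ""       # what changed, why, who approved

v1 = FeatureVersion(
    name="user_pageviews",
    entity_keys=("user_id",),
    version="1.2.0",
    code_sha="a1b2c3d",
    params={"window": "7d", "agg": "sum", "ts_col": "event_ts", "tz": "UTC"},
    sources=("events.pageviews@snapshot-2024-01-01",),
    owner="ml-platform",
    description="Sum of pageviews over the trailing 7 days.",
    changelog="v1.2.0: stricter bot filtering (MINOR).",
)
```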
When to bump the version
Use semantic versioning as a simple policy:
- MAJOR (X.0.0): the feature's behavior or meaning changes.
  - Change in time window (7d to 30d), different join keys, or a new data source with different coverage.
  - Aggregation switch (mean to sum) or unit change (USD to EUR).
- MINOR (0.X.0): same meaning, with bounded output drift.
  - Bug fix in edge cases, improved null handling, or stricter filtering while preserving semantics.
  - Adding optional fields to a struct while keeping existing fields intact.
- PATCH (0.0.X): no output change by design.
  - Performance refactor, internal code reorganization, infra-only updates.
Tip: set numeric drift thresholds
Define a policy: if the new feature's distribution shifts beyond a set threshold versus the old (e.g., the KS statistic or PSI exceeds a cutoff), treat the change as MAJOR; if the shift is small but non-zero, MINOR; if outputs are identical by design, PATCH.
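A minimal sketch of such a policy check, using scipy's two-sample KS statistic and a hand-rolled PSI; the thresholds and function names are illustrative, not a standard:

```python
import numpy as np
from scipy.stats import ks_2samp

def psi(old: np.ndarray, new: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between two samples, binned on the old data."""
    edges = np.unique(np.quantile(old, np.linspace(0, 1, bins + 1)))
    edges[0], edges[-1] = -np.inf, np.inf   # catch values outside the old range
    p = np.histogram(old, edges)[0] / len(old)
    q = np.histogram(new, edges)[0] / len(new)
    p = np.clip(p, 1e-6, None)              # avoid log(0) on empty bins
    q = np.clip(q, 1e-6, None)
    return float(np.sum((p - q) * np.log(p / q)))

def suggested_bump(old: np.ndarray, new: np.ndarray) -> str:
    """Map observed drift to a version bump; cutoffs here are illustrative."""
    ks = ks_2samp(old, new).statistic
    shift = psi(old, new)
    if ks > 0.1 or shift > 0.2:     # large shift: meaning likely changed
        return "MAJOR"
    if ks > 0.0 or shift > 0.0:     # small but non-zero drift
        return "MINOR"
    return "PATCH"                  # identical by design
```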
Naming strategies
- Separate name + version field (preferred): feature name is stable (e.g., user_txn_amount) and version is a stored attribute (v1.1.0).
- Version suffix in name: encode version into the name (user_txn_amount__v2). Simple for consumers, but names proliferate quickly.
Guideline to choose
- If your tooling supports native version metadata and aliases, use separate version fields.
- If your registry is simple or you need easy A/B wiring, suffix naming can be practical.
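To make the trade-off concrete, here is a toy registry sketch showing both strategies; no specific feature-store API is assumed:

```python
# Strategy 1: stable name, version as metadata; consumers pin a version
# directly or resolve it through an alias.
registry = {
    ("user_txn_amount", "1.0.0"): {"code_sha": "aaa111"},
    ("user_txn_amount", "1.1.0"): {"code_sha": "bbb222"},
}
aliases = {"user_txn_amount@prod": ("user_txn_amount", "1.1.0")}

def resolve(alias: str) -> dict:
    """Look up the concrete version currently behind an alias."""
    return registry[aliases[alias]]

# Strategy 2: version suffix in the name; trivial to wire into an A/B test,
# but every bump mints a brand-new feature name.
suffixed = {
    "user_txn_amount__v1": {"code_sha": "aaa111"},
    "user_txn_amount__v2": {"code_sha": "bbb222"},
}
```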
Worked examples
Example 1: Window change 7d → 30d
- Old: user_pageviews_7d v1.2.0 (sum over last 7 days).
- New: user_pageviews_30d (sum over last 30 days).
- Impact: distribution and meaning change. Bump MAJOR: v2.0.0.
- Rollout: dual-write v1 and v2, backfill 30d offline, canary in serving, evaluate lift, switch consumers, deprecate v1 after a stability period.
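A hedged sketch of the offline backfill step for the new 30-day window, assuming a pandas events table with user_id and event_ts columns (both column names are assumptions):

```python
import pandas as pd

def backfill_pageviews_30d(events: pd.DataFrame,
                           as_of_dates: pd.DatetimeIndex) -> pd.DataFrame:
    """Point-in-time backfill: for each as-of date t, count each user's
    pageview events in the window (t - 30d, t]."""
    out = []
    for t in as_of_dates:
        in_window = events[(events["event_ts"] > t - pd.Timedelta("30d"))
                           & (events["event_ts"] <= t)]
        agg = (in_window.groupby("user_id").size()
               .rename("user_pageviews_30d").reset_index())
        agg["as_of"] = t
        agg["feature_version"] = "2.0.0"  # the new MAJOR version
        out.append(agg)
    return pd.concat(out, ignore_index=True)
```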
Example 2: Bug fix in null handling
- Old: avg_spend_90d v1.0.0 treats null amounts as zero.
- Fix: exclude transactions with null amounts from the denominator.
- Impact: same meaning, mild drift. Bump MINOR: v1.1.0.
- Test: correlation and drift checks vs v1.0.0; document change and expected sign of drift.
Example 3: Infra refactor with UDF optimization
- Refactor UDF for speed; outputs must remain identical.
- Impact: by-design no change. Bump PATCH: v1.1.1 → v1.1.2.
- Guardrail: golden test set must match bit-for-bit before release.
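One way to enforce that guardrail, a sketch assuming pandas-based UDFs and a frozen golden input set:

```python
import pandas as pd

def assert_bit_identical(golden_inputs: pd.DataFrame, old_udf, new_udf) -> None:
    """PATCH guardrail: the refactored transform must reproduce the old
    outputs exactly on a frozen golden set."""
    old_out = old_udf(golden_inputs).reset_index(drop=True)
    new_out = new_udf(golden_inputs).reset_index(drop=True)
    # check_exact=True disables float tolerance: bit-for-bit match or fail.
    pd.testing.assert_frame_equal(old_out, new_out, check_exact=True)
```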
Step-by-step release workflow
- Plan: define the change type (MAJOR/MINOR/PATCH), owners, acceptance criteria, and a deprecation plan.
- Implement: update the code; record the code hash, parameters, and source IDs (see the metadata-capture sketch after this list).
- Backfill: produce offline data for the training window; snapshot it with a timestamp and metadata.
- Validate: run drift and consistency tests.
  - For MINOR/PATCH: expect high correlation and small drift.
  - For MAJOR: document the expected change and its performance impact.
- Dual-write: publish the old and new versions in parallel; monitor serving metrics.
- Canary: route a small percentage of traffic to models using the new version; watch latency and stability.
- Cutover: switch consumers, or update aliases to point at the new version.
- Deprecate: announce a freeze date; delete after the retention window.
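For the Implement step, a minimal sketch of automated metadata capture at job start, assuming the job runs inside a Git checkout and writes to a local manifests/ directory:

```python
import json
import os
import subprocess
import uuid
from datetime import datetime, timezone

def capture_run_manifest(params: dict, sources: list) -> dict:
    """Record enough metadata to rebuild this feature job exactly."""
    code_sha = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()
    manifest = {
        "run_id": str(uuid.uuid4()),
        "code_sha": code_sha,
        "params": params,        # e.g. {"window": "30d", "agg": "sum"}
        "sources": sources,      # upstream dataset / snapshot IDs
        "started_at": datetime.now(timezone.utc).isoformat(),
    }
    os.makedirs("manifests", exist_ok=True)
    with open(f"manifests/{manifest['run_id']}.json", "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest
```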
Exercises
Complete these exercises to apply the concepts. A quick test is also available below.
Exercise 1: Decide the version bump
Scenario: You maintain feature user_avg_order_value_30d v1.3.2. You now:
- Switch currency normalization from USD to local currency per user.
- Keep 30d window, same aggregation, new exchange rate source.
Task: Choose MAJOR/MINOR/PATCH, propose the next version string, and list 3 rollout steps.
Exercise 2: Draft a migration plan
Scenario: You replace user_sessions_7d v1.2.0 with user_sessions_14d. Training must stay on 7d for two weeks while you experiment in serving.
Task: Write a short plan with: backfill, dual-write, canary, aliasing/consumer updates, and deprecation timing.
Exercise checklist before you submit
- You justified the version bump type with semantics and expected drift.
- You included code/source metadata (hash, source IDs) in the plan.
- You described dual-write and canary, not just a hard switch.
- You stated a deprecation/retention period.
Common mistakes and how to self-check
- Silent updates without a new version. Self-check: Can you reproduce last month's training dataset bit-for-bit? If not, you likely changed a version in place.
- Using MINOR when semantics changed. Self-check: Did units, window, or keys change? If yes, MAJOR.
- No backfill before serving cutover. Self-check: Can the next model train immediately on the new version? If not, you skipped backfill.
- Missing code hash and parameters. Self-check: Could you rebuild the exact transform container or job? If unsure, metadata is incomplete.
- Deleting old versions too early. Self-check: Does any active consumer or audit requirement still depend on the old version?
Practical projects
- Versioning policy document: Write a one-pager policy for MAJOR/MINOR/PATCH with numeric thresholds and examples.
- Golden set validator: Create a small script that computes correlation, KS statistic, and PSI between two feature versions.
- Dual-write pipeline: Implement a job that writes both v1 and v2, tagging records with version and job run ID.
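As a starting point for the dual-write project, a sketch assuming pandas transforms and a generic sink object (sink.write is a placeholder):

```python
import pandas as pd

def dual_write(entities: pd.DataFrame, computes: dict, run_id: str, sink) -> None:
    """Compute every registered version for the same entities and write the
    results side by side, tagged with version and job run ID."""
    for version, compute in computes.items():
        out = compute(entities)
        out["feature_version"] = version
        out["job_run_id"] = run_id
        sink.write(out)  # hypothetical sink, e.g. a partitioned Parquet writer

# Usage sketch:
# dual_write(users, {"1.2.0": compute_v1, "2.0.0": compute_v2},
#            run_id="2024-06-01-abc123", sink=parquet_sink)
```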
Learning path
- Before this: Basics of feature stores and offline/online consistency.
- Now: Feature versioning semantics, metadata, rollout strategies.
- Next: Feature lineage and monitoring (drift, freshness, null rates) to maintain versions in production.
Who this is for
- Machine Learning Engineers deploying features to production.
- Data Scientists who own feature logic and need reproducibility.
- Data/ML Platform Engineers designing feature registries.
Prerequisites
- Basic understanding of feature stores (offline/online, entities, timestamps).
- Ability to read ETL/ELT code and simple aggregations.
- Familiarity with A/B testing or canary releases.
Next steps
- Adopt a versioning policy in your team and run a pilot on one feature.
- Automate metadata capture (code hash, parameters) in your pipelines.
- Set up dashboards for drift and adoption of new versions.
Mini challenge
You discover that user_country inferred from IP was replaced by billing_country for GDPR reasons. Window and aggregations unchanged, but segment distributions shift. What version bump and why? Write 3 bullet points on rollout risk mitigation.
Suggested direction
- Likely MAJOR: source and coverage semantics changed; downstream models may shift.
- Dual-write and monitor segment-level metrics; add guardrails for regional bias.
- Communicate change to stakeholders and set deprecation window for the IP-based version.
Quick test note
Take the Feature Versioning — Quick Test below to check your understanding. Everyone can take it for free; only logged-in users will have saved progress.