
Feature Versioning

Learn Feature Versioning for free with explanations, exercises, and a quick test (for Machine Learning Engineers).

Published: January 1, 2026 | Updated: January 1, 2026

Why this matters

As a Machine Learning Engineer, you will evolve features over time: fix bugs, change time windows, swap data sources, or optimize transforms. Without clear versioning, you risk training/serving mismatches, broken models, and unreproducible results. Proper feature versioning lets you safely iterate, roll back, A/B test, and audit past model behavior.

Real tasks you will face:

  • Rolling out a new definition for a user activity metric while keeping old models stable.
  • Running an A/B comparison of two feature-logic variants before full cutover.
  • Reproducing a model from 6 months ago for an audit, using the exact feature definitions.

Concept explained simply

Feature versioning means treating each meaningful change to a feature's logic, window, schema, or source as a new version. You keep multiple versions side-by-side and record the metadata to reproduce results later.

Mental model

Think of features like APIs. A version guarantees a contract: input keys, transformation logic, and output schema. When the contract changes, publish a new version instead of silently changing the old one.

Core components of a feature version

  • Feature identity: name, entity key(s), and version (e.g., v1.2.0).
  • Transformation signature: code hash (e.g., Git SHA), parameters, and dependency versions.
  • Data sources: source identifiers and snapshot/backfill references.
  • Temporal semantics: window length, aggregation function, timestamp column, time zone.
  • Schema: data type, nullability, default handling.
  • Operational: owner, description, freshness/TTL, expected update cadence.
  • Lineage and changelog: what changed, why, who approved.
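
To make these components concrete, here is a minimal sketch of a version record as a Python dataclass. The field names and defaults are illustrative, not a standard schema; real feature stores each define their own metadata model.

  from dataclasses import dataclass, field
  from typing import Optional

  @dataclass
  class FeatureVersion:
      # Identity
      name: str                          # e.g., "user_txn_amount"
      entity_keys: tuple                 # e.g., ("user_id",)
      version: str                       # e.g., "v1.2.0"
      # Transformation signature
      code_sha: str                      # Git SHA of the transform code
      params: dict = field(default_factory=dict)
      # Data sources and temporal semantics
      sources: tuple = ()                # source IDs / snapshot references
      window: Optional[str] = None       # e.g., "30d"
      aggregation: Optional[str] = None  # e.g., "sum"
      timestamp_col: str = "event_ts"
      timezone: str = "UTC"
      # Schema and operations
      dtype: str = "float64"
      nullable: bool = False
      owner: str = ""
      ttl: Optional[str] = None          # freshness/TTL, e.g., "24h"
      changelog: str = ""                # what changed, why, who approved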

When to bump the version

Use semantic versioning as a simple policy:

  • MAJOR (X.0.0) — behavior/meaning changes:
    • Change in time window (7d to 30d), different join keys, or new data source with different coverage.
    • Aggregation switch (mean to sum) or unit change (USD to EUR).
  • MINOR (0.X.0) — same meaning with bounded output drift:
    • Bug fix in edge cases, improved null handling, stricter filtering while preserving semantics.
    • Adding optional fields in a struct while keeping existing fields intact.
  • PATCH (0.0.X) — no output change by design:
    • Performance refactor, internal code reorganization, infra-only updates.

Tip: set numeric drift thresholds

Define a policy: if the new feature's distribution shifts beyond an agreed threshold against the old version (for example, a KS statistic or PSI above a cutoff your team sets), treat the change as MAJOR; if drift is small but non-zero, MINOR; if the output is unchanged by design, PATCH.
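
A minimal sketch of such a policy check, using PSI with placeholder cutoffs your team would tune:

  import numpy as np

  def psi(old: np.ndarray, new: np.ndarray, bins: int = 10) -> float:
      """Population Stability Index of one feature across two versions."""
      edges = np.histogram_bin_edges(old, bins=bins)
      p = np.histogram(old, bins=edges)[0] / len(old)
      q = np.histogram(new, bins=edges)[0] / len(new)
      p, q = np.clip(p, 1e-6, None), np.clip(q, 1e-6, None)  # avoid log(0)
      return float(np.sum((p - q) * np.log(p / q)))

  def suggested_bump(psi_value: float, major_cutoff: float = 0.2) -> str:
      """Map measured drift to a bump type; 0.2 is a placeholder cutoff."""
      if psi_value > major_cutoff:
          return "MAJOR"  # material distribution shift
      if psi_value > 0.0:
          return "MINOR"  # small but non-zero drift
      return "PATCH"      # identical by design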

Naming strategies

  • Separate name + version field (preferred): feature name is stable (e.g., user_txn_amount) and version is a stored attribute (v1.1.0).
  • Version suffix in name: encode version into the name (user_txn_amount__v2). Simple for consumers, but names proliferate quickly.

Guideline to choose
  • If your tooling supports native version metadata and aliases, use separate version fields.
  • If your registry is simple or you need easy A/B wiring, suffix naming can be practical.
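
To illustrate the difference at the consumer call site, here is a toy in-memory registry; real feature stores expose their own lookup APIs, so treat every name here as hypothetical.

  class ToyRegistry:
      def __init__(self):
          self.features = {}  # (name, version) -> definition
          self.aliases = {}   # name -> {alias: version}

      def register(self, name, version, definition):
          self.features[(name, version)] = definition

      def set_alias(self, name, alias, version):
          self.aliases.setdefault(name, {})[alias] = version

      def get(self, name, version=None, alias=None):
          if alias:  # resolve an alias such as "production" to a pinned version
              version = self.aliases[name][alias]
          return self.features[(name, version)]

  reg = ToyRegistry()

  # Strategy 1: stable name; version and alias stored as metadata.
  reg.register("user_txn_amount", "v1.1.0", "sum(amount) over 30d")
  reg.set_alias("user_txn_amount", "production", "v1.1.0")
  print(reg.get("user_txn_amount", alias="production"))

  # Strategy 2: version suffix baked into the name; each version is a new name.
  reg.register("user_txn_amount__v2", "v1.0.0", "sum(amount), new source")
  print(reg.get("user_txn_amount__v2", version="v1.0.0"))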

Worked examples

Example 1: Window change 7d → 30d
  • Old: user_pageviews_7d v1.2.0 (sum over last 7 days).
  • New: user_pageviews_30d (sum over last 30 days).
  • Impact: distribution and meaning change. Bump MAJOR: v2.0.0 (with window-suffixed names, as here, the new name itself marks the break).
  • Rollout: dual-write v1 and v2, backfill 30d offline, canary in serving, evaluate lift, switch consumers, deprecate v1 after stability period.

Example 2: Bug fix in null handling
  • Old: avg_spend_90d v1.0.0 treats null amounts as zero.
  • Fix: exclude null transactions from denominator.
  • Impact: same meaning, mild drift. Bump MINOR: v1.1.0.
  • Test: correlation and drift checks vs v1.0.0; document change and expected sign of drift.
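
In pandas terms, the fix boils down to how nulls enter the denominator; this is an illustrative reconstruction, not the actual pipeline code:

  import pandas as pd

  amounts = pd.Series([10.0, None, 30.0])  # one null transaction

  # v1.0.0 (buggy): nulls counted as zero spend, inflating the denominator.
  avg_v1 = amounts.fillna(0).mean()   # (10 + 0 + 30) / 3 ≈ 13.33

  # v1.1.0 (fixed): null transactions excluded from the denominator.
  avg_v2 = amounts.dropna().mean()    # (10 + 30) / 2 = 20.0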

Example 3: Infra refactor with UDF optimization
  • Refactor UDF for speed; outputs must remain identical.
  • Impact: by-design no change. Bump PATCH: v1.1.1 → v1.1.2.
  • Guardrail: golden test set must match bit-for-bit before release.
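
Such a guardrail can be a simple exact comparison against a frozen snapshot. The sketch below assumes pandas DataFrames and a parquet golden file; adapt the storage format to your stack:

  import pandas as pd
  from pandas.testing import assert_frame_equal

  def golden_test(candidate: pd.DataFrame, golden_path: str) -> None:
      """Fail the release if the refactor changed any output value."""
      golden = pd.read_parquet(golden_path)
      key = list(golden.columns)

      def canon(df: pd.DataFrame) -> pd.DataFrame:
          # Canonical column and row order so only values are compared.
          return df[key].sort_values(key).reset_index(drop=True)

      # check_exact=True enforces bit-for-bit equality, no float tolerance.
      assert_frame_equal(canon(candidate), canon(golden), check_exact=True)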

Step-by-step release workflow

  1. Plan — define change type (MAJOR/MINOR/PATCH), owners, acceptance criteria, and deprecation plan.
  2. Implement — update code; record code hash, parameters, and source IDs.
  3. Backfill — produce offline data for the training window; snapshot with timestamp and metadata.
  4. Validate — run drift and consistency tests:
    • For MINOR/PATCH: high correlation and small drift.
    • For MAJOR: document expected change and performance impacts.
  5. Dual-write — publish old and new versions in parallel; monitor serving metrics (see the sketch after this list).
  6. Canary — route a small percentage of traffic to models using the new version; watch latency and stability.
  7. Cutover — switch consumers or update aliases to new version.
  8. Deprecate — announce freeze date; delete after retention window.
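
Step 5 can be a thin wrapper that runs both definitions and tags every row, roughly as below. The transforms mapping and the sink's write method are stand-ins for your own transforms and store:

  import uuid
  import pandas as pd

  def dual_write(events: pd.DataFrame, transforms: dict, sink) -> None:
      """transforms maps a version string to a callable(events) -> DataFrame."""
      run_id = str(uuid.uuid4())
      for version, compute in transforms.items():
          out = compute(events)
          out = out.assign(feature_version=version,  # tag for downstream filtering
                           job_run_id=run_id)        # tag for lineage/debugging
          sink.write(out)  # your offline/online store writer (assumed API)

  # Usage: dual_write(events, {"v1.2.0": compute_v1, "v2.0.0": compute_v2}, sink)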

Exercises

Complete these to apply the concepts. A quick test is also available below.

Exercise 1: Decide the version bump

Scenario: You maintain feature user_avg_order_value_30d v1.3.2. You now:

  • Switch currency normalization from USD to local currency per user.
  • Keep 30d window, same aggregation, new exchange rate source.

Task: Choose MAJOR/MINOR/PATCH, propose the next version string, and list 3 rollout steps.

Expected output: a clear justification of MAJOR/MINOR/PATCH, a version like v2.0.0 or v1.4.0, and a 3–5 step rollout (backfill, dual-write, canary, cutover, deprecation).

Exercise 2: Draft a migration plan

Scenario: You replace user_sessions_7d v1.2.0 with user_sessions_14d. Training must stay on 7d for two weeks while you experiment in serving.

Task: Write a short plan with: backfill, dual-write, canary, aliasing/consumer updates, and deprecation timing.

Exercise checklist before you submit

  • You justified the version bump type with semantics and expected drift.
  • You included code/source metadata (hash, source IDs) in the plan.
  • You described dual-write and canary, not just a hard switch.
  • You stated a deprecation/retention period.

Common mistakes and how to self-check

  • Silent updates without a new version. Self-check: Can you reproduce last month's training dataset bit-for-bit? If not, you likely changed a version in place.
  • Using MINOR when semantics changed. Self-check: Did units, window, or keys change? If yes, MAJOR.
  • No backfill before serving cutover. Self-check: Can the next model train immediately on the new version? If not, you skipped backfill.
  • Missing code hash and parameters. Self-check: Could you rebuild the exact transform container or job? If unsure, metadata is incomplete.
  • Deleting old versions too early. Self-check: Does any active consumer or audit requirement still depend on the old version?

Practical projects

  • Versioning policy document: Write a one-pager policy for MAJOR/MINOR/PATCH with numeric thresholds and examples.
  • Golden set validator: Create a small script that computes correlation, KS statistic, and PSI between two feature versions (a sketch follows this list).
  • Dual-write pipeline: Implement a job that writes both v1 and v2, tagging records with version and job run ID.
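
For the golden set validator, a possible starting point follows; it assumes two row-aligned NumPy arrays holding the same feature under the old and new definitions:

  import numpy as np
  from scipy.stats import ks_2samp

  def compare_versions(old: np.ndarray, new: np.ndarray, bins: int = 10) -> dict:
      """Correlation, KS statistic, and PSI between two feature versions."""
      corr = float(np.corrcoef(old, new)[0, 1])  # needs row-aligned samples
      ks = float(ks_2samp(old, new).statistic)   # distribution-level shift
      edges = np.histogram_bin_edges(old, bins=bins)
      p = np.histogram(old, bins=edges)[0] / len(old)
      q = np.histogram(new, bins=edges)[0] / len(new)
      p, q = np.clip(p, 1e-6, None), np.clip(q, 1e-6, None)  # avoid log(0)
      psi = float(np.sum((p - q) * np.log(p / q)))
      return {"pearson_r": corr, "ks_stat": ks, "psi": psi}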

Learning path

  • Before this: Basics of feature stores and offline/online consistency.
  • Now: Feature versioning semantics, metadata, rollout strategies.
  • Next: Feature lineage and monitoring (drift, freshness, null rates) to maintain versions in production.

Who this is for

  • Machine Learning Engineers deploying features to production.
  • Data Scientists who own feature logic and need reproducibility.
  • Data/ML Platform Engineers designing feature registries.

Prerequisites

  • Basic understanding of feature stores (offline/online, entities, timestamps).
  • Ability to read ETL/ELT code and simple aggregations.
  • Familiarity with A/B testing or canary releases.

Next steps

  • Adopt a versioning policy in your team and run a pilot on one feature.
  • Automate metadata capture (code hash, parameters) in your pipelines (a sketch follows this list).
  • Set up dashboards for drift and adoption of new versions.
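
Metadata capture can start as small as the sketch below, which records the Git SHA and run parameters next to each output; the JSON layout is illustrative:

  import json
  import subprocess
  from datetime import datetime, timezone

  def capture_metadata(params: dict, source_ids: list, path: str) -> None:
      """Write reproducibility metadata alongside a feature backfill or run."""
      code_sha = subprocess.check_output(
          ["git", "rev-parse", "HEAD"], text=True
      ).strip()
      meta = {
          "code_sha": code_sha,
          "params": params,
          "source_ids": source_ids,
          "run_at": datetime.now(timezone.utc).isoformat(),
      }
      with open(path, "w") as f:
          json.dump(meta, f, indent=2)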

Mini challenge

You discover that user_country inferred from IP was replaced by billing_country for GDPR reasons. Window and aggregations unchanged, but segment distributions shift. What version bump and why? Write 3 bullet points on rollout risk mitigation.

Suggested direction
  • Likely MAJOR: source and coverage semantics changed; downstream models may shift.
  • Dual-write and monitor segment-level metrics; add guardrails for regional bias.
  • Communicate change to stakeholders and set deprecation window for the IP-based version.

Quick test note

Take the Feature Versioning — Quick Test below to check your understanding.

Feature Versioning — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

