Schema Versioning And Contracts

Learn Schema Versioning And Contracts for free with explanations, exercises, and a quick test (for MLOps Engineers).

Published: January 4, 2026 | Updated: January 4, 2026

Why this matters

MLOps Engineers rely on stable data and predictable model interfaces. When schemas change without a plan, pipelines break, models drift, and dashboards show wrong numbers. Schema versioning and clear contracts let teams evolve fast without fear.

  • Keep feature stores and training pipelines compatible across releases.
  • Prevent breaking changes from upstream data producers.
  • Deploy new model inputs/outputs safely (e.g., add confidence scores).

Concept explained simply

A schema is the agreed shape and meaning of data (fields, types, constraints). A contract is the written agreement about that schema plus operational rules (who owns it, SLAs, allowed changes, versioning policy).

Mental model

Think of a schema contract like a public API for data. Producers publish versions; consumers declare which versions they support. Compatibility guarantees guide safe changes.

Compatibility types — quick reference
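  • Backward compatible: a new producer still works with old consumers. Typical changes: add an optional field with a default; add an enum value, provided consumers ignore unknown values.
  • Forward compatible: an old producer still works with new consumers. Typical requirement: new consumers treat fields that older data lacks as optional or defaulted.
  • Fully compatible: both backward and forward. Achieved by making additions optional with defaults and never removing or retyping fields.
  • Terminology note: some schema registries (for example, Confluent Schema Registry) define these terms from the reader schema's point of view, so their BACKWARD and FORWARD modes are the reverse of the wording above. Check which convention your tooling uses.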
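The definitions above can be turned into a small check. This is a minimal sketch, not a registry implementation: the `Schema` shape (field name mapped to a spec with `type` and an optional `default`) and all function names are assumptions made for illustration, and the labels follow this lesson's wording of backward and forward.

```python
from typing import Any

# Hypothetical minimal schema shape: field name -> {"type": ..., optional "default": ...}
Schema = dict[str, dict[str, Any]]


def old_consumers_can_read(old: Schema, new: Schema) -> bool:
    """New producer, old consumers: every old field must survive with the same type."""
    return all(
        name in new and new[name]["type"] == spec["type"]
        for name, spec in old.items()
    )


def new_consumers_can_read(old: Schema, new: Schema) -> bool:
    """Old producer, new consumers: each new field exists in old data or has a default."""
    return all(
        (name in old and old[name]["type"] == spec["type"]) or "default" in spec
        for name, spec in new.items()
    )


def compatibility(old: Schema, new: Schema) -> str:
    """Label a change using this lesson's definitions of backward/forward/full."""
    backward = old_consumers_can_read(old, new)
    forward = new_consumers_can_read(old, new)
    if backward and forward:
        return "full"
    if backward:
        return "backward"
    if forward:
        return "forward"
    return "none"


v1 = {"user_id": {"type": "string"}}
v2 = {**v1, "country": {"type": ["null", "string"], "default": None}}
print(compatibility(v1, v2))
```

Adding `country` with a default is labelled full here because old consumers can skip it and new consumers can fall back to the default when reading older records.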

Versioning basics

  1. Use semantic versioning (MAJOR.MINOR.PATCH)
    • MAJOR: breaking change (e.g., remove/rename a required field).
    • MINOR: backward-compatible addition (e.g., new optional field with default).
    • PATCH: fixes that do not change the schema's shape or meaning (e.g., clarify a field description, correct a typo in the documentation).
  2. Document allowed changes in the contract: what’s safe, what’s not, and deprecation policy.
  3. Automate validation: schema diff in CI, sample data validation, contract checks before release.
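The versioning rules above could be automated roughly as follows. This is a sketch under stated assumptions: fields are dicts with `name`, `type`, and an optional `default`, and the helper names `required_bump` and `next_version` are hypothetical. It treats a new field without a default as breaking, which is a policy choice rather than a universal rule.

```python
def required_bump(old_fields: list[dict], new_fields: list[dict]) -> str:
    """Return 'major', 'minor', or 'patch' for a change between two field lists."""
    old = {f["name"]: f for f in old_fields}
    new = {f["name"]: f for f in new_fields}

    # Removing or renaming a field, or changing its type, breaks consumers.
    for name, spec in old.items():
        if name not in new or new[name]["type"] != spec["type"]:
            return "major"

    added = [spec for name, spec in new.items() if name not in old]
    # Assumption: a new field without a default counts as a breaking change.
    if any("default" not in spec for spec in added):
        return "major"
    return "minor" if added else "patch"


def next_version(version: str, level: str) -> str:
    """Apply a MAJOR/MINOR/PATCH bump to a 'X.Y.Z' version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


old_fields = [{"name": "user_id", "type": "string"}, {"name": "ts", "type": "long"}]
new_fields = old_fields + [{"name": "country", "type": ["null", "string"], "default": None}]
print(next_version("1.2.0", required_bump(old_fields, new_fields)))
```

A CI job could compare the result with the version declared in the pull request and fail when the declared bump is smaller than the required one.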

Worked examples

Example 1 — Add an optional field safely (backward compatible)

Current user events schema:

{
  "version": "1.2.0",
  "type": "record",
  "name": "UserEvent",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "event", "type": "string"},
    {"name": "ts", "type": "long"}
  ]
}

We want to add country. Make it optional with a default:

{
  "version": "1.3.0",
  "type": "record",
  "name": "UserEvent",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "event", "type": "string"},
    {"name": "ts", "type": "long"},
    {"name": "country", "type": ["null", "string"], "default": null}
  ]
}

Reasoning: Old consumers can ignore the new field, and new consumers reading older records fall back to the default (null). Version bump: MINOR.
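To see why this change is safe in both directions, here is a small sketch of how a reader might project a record onto its own schema version. The function `read_with_schema` is a hypothetical helper written for this example, not part of any schema library.

```python
V1_2_FIELDS = [
    {"name": "user_id", "type": "string"},
    {"name": "event", "type": "string"},
    {"name": "ts", "type": "long"},
]
V1_3_FIELDS = V1_2_FIELDS + [
    {"name": "country", "type": ["null", "string"], "default": None},
]


def read_with_schema(record: dict, fields: list[dict]) -> dict:
    """Keep only fields the reader knows; fill missing ones from defaults."""
    result = {}
    for field in fields:
        name = field["name"]
        if name in record:
            result[name] = record[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise KeyError(f"missing required field: {name}")
    return result


old_record = {"user_id": "u1", "event": "signup", "ts": 1700000000}
new_record = {**old_record, "country": "DE"}

# A v1.2 consumer reading a v1.3 record simply never sees "country".
print(read_with_schema(new_record, V1_2_FIELDS))
# A v1.3 consumer reading a v1.2 record gets the default for "country".
print(read_with_schema(old_record, V1_3_FIELDS))
```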

Example 2 — Renaming a field without breaking consumers

Current field: signup_ts. Desire: rename to created_at.

  1. Add created_at as a new field, keep signup_ts, write both for a deprecation window.
  2. Ask consumers to switch to created_at (announce deadline in contract).
  3. After the window, remove signup_ts and bump MAJOR.

This phased approach avoids breaking existing consumers.
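The dual-write step can be sketched as below. The removal date, the row layout, and the function names are all assumptions for illustration; a real producer would take the date from the contract's deprecation policy.

```python
from datetime import date

# Hypothetical end of the deprecation window announced in the contract.
REMOVAL_DATE = date(2026, 3, 1)


def emit_user(user_id: str, created_at: int, today: date) -> dict:
    """Producer side: write the new name always, the legacy name until removal."""
    row = {"user_id": user_id, "created_at": created_at}
    if today < REMOVAL_DATE:
        row["signup_ts"] = created_at  # dual-write for consumers not yet migrated
    return row


def read_created_at(row: dict) -> int:
    """Consumer side during migration: prefer the new name, fall back to the old."""
    if "created_at" in row:
        return row["created_at"]
    return row["signup_ts"]


during_window = emit_user("u1", 1700000000, date(2026, 1, 15))
after_window = emit_user("u1", 1700000000, date(2026, 3, 2))
print(sorted(during_window), sorted(after_window))
```

The consumer-side fallback lets a consumer deploy its switch before or after the producer starts dual-writing, which keeps the two rollouts independent.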

Example 3 — Model I/O contract change

Prediction API v1 response:

{
  "version": "1.0.0",
  "output": {
    "label": "string"
  }
}

Need to add probability. Plan:

{
  "version": "1.1.0",
  "output": {
    "label": "string",
    "probability": {"type": "number", "minimum": 0, "maximum": 1, "default": 0.5}
  }
}

Deploy v1.1 so that existing clients can keep reading only label. Announce the new optional field and monitor adoption. If you later change a field's type or remove a field, release v2.0.
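On the client side, a tolerant parser keeps working across v1.0 and v1.1 responses. The sketch below is an assumption about how a client might be written; `Prediction` and `parse_prediction` are hypothetical names. It keeps a missing probability as `None` instead of substituting the schema's 0.5 default, so callers can tell "not provided" apart from a real score. That is a design choice for this example, not something the contract requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    label: str
    probability: Optional[float] = None  # absent in v1.0 responses


def parse_prediction(payload: dict) -> Prediction:
    """Parse a v1.x response, tolerating a missing 'probability' field."""
    output = payload["output"]
    probability = output.get("probability")
    if probability is not None and not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    return Prediction(label=output["label"], probability=probability)


v1_0 = {"version": "1.0.0", "output": {"label": "churn"}}
v1_1 = {"version": "1.1.0", "output": {"label": "churn", "probability": 0.83}}
print(parse_prediction(v1_0))
print(parse_prediction(v1_1))
```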

How to implement in your workflow

  1. Define a contract template
    • Owner and contact
    • Schema (fields, types, required, defaults)
    • SLAs (freshness, volume ranges)
    • Allowed changes + versioning policy
    • Validation rules (constraints, nullability)
    • Deprecation timelines
  2. Add CI checks
    • Schema diff vs last released version (detect breaking changes).
    • Sample data validation against schema.
    • Version bump verification based on change type.
  3. Release management
    • Tag versions and publish a changelog.
    • Use canary or shadow writes for new fields.
    • Backfill historical data if consumers require it.
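The "sample data validation" check from step 2 might look like the following. It is a minimal sketch: the field format (with a `required` flag), the `PY_TYPES` mapping, and the `validate_sample` helper are assumptions, and a production setup would more likely rely on a dedicated validation library.

```python
# Assumed mapping from contract type names to Python types.
PY_TYPES = {"string": str, "long": int, "number": (int, float)}

FIELDS = [
    {"name": "user_id", "type": "string", "required": True},
    {"name": "event", "type": "string", "required": True},
    {"name": "ts", "type": "long", "required": True},
]


def validate_sample(rows: list[dict], fields: list[dict]) -> list[str]:
    """Return a list of human-readable errors; an empty list means the sample passes."""
    errors = []
    for index, row in enumerate(rows):
        for field in fields:
            name = field["name"]
            value = row.get(name)
            if value is None:
                if field.get("required", True):
                    errors.append(f"row {index}: missing required field '{name}'")
                continue
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, PY_TYPES[field["type"]]):
                errors.append(
                    f"row {index}: field '{name}' expected {field['type']}, "
                    f"got {type(value).__name__}"
                )
    return errors


sample = [
    {"user_id": "u1", "event": "click", "ts": 1700000000},
    {"user_id": "u2", "event": "click"},
]
for problem in validate_sample(sample, FIELDS):
    print(problem)
```

In CI, the job would exit with a non-zero status whenever the returned list is not empty.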

Contracts in practice (data and model)

Example contract snippet — data set

{
  "name": "user_events",
  "owner": "growth-data@company",
  "version": "1.3.0",
  "schema": {
    "fields": [
      {"name": "user_id", "type": "string", "required": true},
      {"name": "event", "type": "string", "required": true},
      {"name": "ts", "type": "long", "required": true},
      {"name": "country", "type": ["null", "string"], "required": false, "default": null}
    ]
  },
  "validation": {
    "constraints": [
      {"field": "ts", "rule": "> 0"},
      {"field": "event", "rule": "in [signup, purchase, click]"}
    ]
  },
  "sla": {"freshness_minutes": 10, "max_lag_minutes": 30},
  "allowed_changes": ["add_optional", "tighten_constraint_if_backward_compatible"],
  "deprecation_policy": {"notice_days": 30}
}

Example contract snippet — model API

{
  "name": "churn_predictor",
  "owner": "ml-platform@company",
  "version": "1.1.0",
  "input": {
    "features": [
      {"name": "tenure_days", "type": "number", "required": true},
      {"name": "plan_type", "type": "string", "required": true},
      {"name": "country", "type": "string", "required": false, "default": "unknown"}
    ]
  },
  "output": {
    "label": {"type": "string"},
    "probability": {"type": "number", "min": 0, "max": 1, "default": 0.5}
  },
  "validation": {
    "feature_ranges": [{"name": "tenure_days", "min": 0, "max": 3650}]
  },
  "compatibility": "backward",
  "deprecation_policy": {"notice_days": 45}
}
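A model service could enforce the input side of such a contract before scoring. The sketch below copies only the relevant parts of the churn_predictor contract into a Python dict; the `prepare_features` helper is hypothetical and shows one possible way to apply defaults and range checks.

```python
CONTRACT = {
    "input": {
        "features": [
            {"name": "tenure_days", "type": "number", "required": True},
            {"name": "plan_type", "type": "string", "required": True},
            {"name": "country", "type": "string", "required": False, "default": "unknown"},
        ]
    },
    "validation": {
        "feature_ranges": [{"name": "tenure_days", "min": 0, "max": 3650}]
    },
}


def prepare_features(raw: dict, contract: dict) -> dict:
    """Apply contract defaults, then enforce required fields and ranges."""
    features = {}
    for spec in contract["input"]["features"]:
        name = spec["name"]
        if name in raw:
            features[name] = raw[name]
        elif spec["required"]:
            raise ValueError(f"missing required feature: {name}")
        else:
            features[name] = spec["default"]

    for rule in contract["validation"]["feature_ranges"]:
        value = features[rule["name"]]
        if not rule["min"] <= value <= rule["max"]:
            raise ValueError(f"{rule['name']}={value} outside [{rule['min']}, {rule['max']}]")
    return features


print(prepare_features({"tenure_days": 120, "plan_type": "pro"}, CONTRACT))
```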

Exercises

Complete these tasks, then compare your work with the expected output in the Practice Exercises section at the bottom of the page.

  1. Exercise 1: Update a schema by adding a field without breaking existing consumers. State the version bump and why.
  2. Exercise 2: Draft a minimal data contract for a metrics table with owner, schema, validation, and allowed changes.

Release checklist

  • Change is categorized (breaking vs non-breaking)
  • Semantic version updated correctly
  • Schema diff reviewed and approved
  • Sample data validates against the new schema
  • Changelog updated and consumers notified
  • Monitoring/alerts for rollout in place

Common mistakes and how to self-check

  • Silent renames: Renaming a field without dual-writing and deprecation. Self-check: Are both names produced for a period? Is the removal date announced?
  • Missing defaults: Adding a new field as required. Self-check: Can old consumers read records without that field?
  • Implicit type changes: Changing number to string or int to float. Self-check: Run a schema diff; require MAJOR bump if types change.
  • Ignoring nullability: Flipping nullable to required. Self-check: Scan historical data for nulls before tightening.
  • No ownership: Contracts with no owner. Self-check: Does the contract list a team and contact?
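The nullability self-check above (scan historical data before tightening) can be sketched in a few lines. The in-memory `rows` list stands in for a historical sample, and `null_counts` and `safe_to_require` are hypothetical helper names.

```python
def null_counts(rows: list[dict], field_names: list[str]) -> dict[str, int]:
    """Count rows where each field is missing or null."""
    counts = {name: 0 for name in field_names}
    for row in rows:
        for name in field_names:
            if row.get(name) is None:
                counts[name] += 1
    return counts


def safe_to_require(rows: list[dict], field_name: str) -> bool:
    """A field can be flipped from nullable to required only if history has no nulls."""
    return null_counts(rows, [field_name])[field_name] == 0


history = [
    {"user_id": "u1", "country": "DE"},
    {"user_id": "u2", "country": None},
    {"user_id": "u3"},
]
print(null_counts(history, ["user_id", "country"]))
print(safe_to_require(history, "country"))
```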

Mini challenge

Your event schema has an enum field event with values [signup, purchase]. You need to add refund. What compatibility type is this, what version bump, and what consumer risk remains?

A possible answer

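Backward compatible in most systems, provided consumers ignore unknown enum values or treat the field as a plain string. Use a MINOR bump. Remaining risk: consumers with hardcoded switches or strict enum parsing on the known values may fail when they first see refund. Announce the new value before the release and monitor downstream logic.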
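One way a consumer can reduce the remaining risk is to map unrecognized values to an explicit fallback instead of failing. This sketch uses the standard library `enum` module and its `_missing_` hook; the `EventType` class, the `UNKNOWN` member, and the routing names are assumptions for illustration.

```python
from enum import Enum


class EventType(Enum):
    SIGNUP = "signup"
    PURCHASE = "purchase"
    UNKNOWN = "unknown"  # fallback for values this consumer does not know yet

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when no member matches; avoids a ValueError on new values.
        return cls.UNKNOWN


def route(event: str) -> str:
    """Route an event to a handler name; unknown events are skipped, not fatal."""
    kind = EventType(event)
    if kind is EventType.SIGNUP:
        return "welcome_flow"
    if kind is EventType.PURCHASE:
        return "receipt_flow"
    return "ignored"


print(route("purchase"), route("refund"))
```

Skipping unknown events keeps the pipeline running, but it also hides them, so pairing this with a counter or log line for the fallback branch helps the team notice when a new value such as refund starts arriving.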

Who this is for

  • MLOps Engineers integrating data producers, feature stores, and model services.
  • Data/ML Engineers responsible for stable pipelines.
  • Analysts building dashboards on evolving datasets.

Prerequisites

  • Basic understanding of data types (string, number, boolean, timestamp).
  • Familiarity with JSON or similar schema formats.
  • CI basics (running checks on pull requests).

Learning path

  1. Learn data validation and profiling.
  2. Master schema versioning and contracts (this lesson).
  3. Set up CI/CD checks for data and model artifacts.
  4. Rollout strategies (canary, shadow, deprecation windows).

Practical projects

  • Implement schema diff + semantic version guard in CI for one dataset.
  • Create a model I/O contract for an existing model and enforce it in a pre-deploy check.
  • Run a two-week deprecation: dual-write a renamed field and migrate one consumer.

Next steps

  • Adopt a single contract template across teams.
  • Automate data validation on sample batches before merging.
  • Publish versioned changelogs and deprecation calendars.

Practice Exercises

Instructions

You maintain a JSON-based schema for transactions with fields: id: string, amount: number, ts: long. Add a new field currency so old consumers are not broken. Provide the updated schema snippet for fields only, specify the version change (from 1.2.0 to ?), and explain your choice.

Expected Output
Fields array includes currency as optional with a default; version bumped to 1.3.0 (MINOR) with a short justification.

Schema Versioning And Contracts — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.
