Versioned Endpoints

Learn versioned endpoints with explanations and exercises (for MLOps engineers).

Published: January 4, 2026 | Updated: January 4, 2026

Why this matters

MLOps engineers ship models that must evolve without breaking users. Versioned endpoints let you release new models safely, compare performance, roll back instantly, and deprecate old behavior predictably. Real tasks you will face:

  • Expose /v1 and /v2 predict endpoints while keeping contract guarantees.
  • Canary release a new model (10% traffic), monitor, then promote to 100%.
  • Pin batch jobs to a specific model version for reproducibility.
  • Deprecate old versions with clear timelines and non-breaking changes.

Concept explained simply

A versioned endpoint is an API you can call for predictions that clearly states which behavior (model + contract) you will get. When you change the API contract, you bump the API version. When you change the model but keep the same contract, you bump the model version. This separation lets clients choose stability or novelty.

Mental model

  • API version (v1, v2, …): The shape of requests/responses. Changing fields = API version change.
  • Model version (1.3.2, 2.0.0): The model artifact or weights behind the API. Swap models safely if the API contract stays the same.
  • Traffic router: A gate that splits traffic across versions (canary, blue/green, shadow).
  • Policy: Rules for compatibility, deprecation, and rollback.

Key components and decisions

API versioning strategies
  • Path-based: /v1/predict, /v2/predict (most explicit for clients).
  • Header-based: X-API-Version: 1 (clean URLs; requires client header support).
  • Rule of thumb: Use path for breaking changes (major), headers for optional minor behavior flags.
Semantic versioning (SemVer) adapted to ML
  • API: v1, v2 for breaking changes.
  • Model: MAJOR.MINOR.PATCH (e.g., 1.4.0). Minor = quality/tuning; Patch = bugfix; Major = incompatible model signature (rare if API abstracts it).
Traffic strategies
  • Blue/Green: Two identical environments. Switch all traffic at once. Fast rollback.
  • Canary: Gradually shift traffic (e.g., 1% → 10% → 50% → 100%). Guard with SLOs.
  • Shadow: Send a copy of requests to the new version for observation only. No user impact.
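A canary split is easier to reason about when the same request (or user) always lands on the same side. The following is a minimal sketch of one common way to do that, hashing an ID into a bucket; the function name `assign_version` and the choice of a request ID as the hash key are assumptions for illustration, not something the text prescribes.

```python
import hashlib


def assign_version(request_id: str, canary_percent: float) -> str:
    """Return "canary" or "stable" for an ID, deterministically.

    The ID is hashed into one of 10,000 buckets, so percentages can be
    set in 0.01% steps and the same ID always gets the same answer.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "canary" if bucket < canary_percent * 100 else "stable"


if __name__ == "__main__":
    ids = [f"req-{i}" for i in range(10_000)]
    share = sum(assign_version(i, 10) == "canary" for i in ids) / len(ids)
    print(f"canary share at 10%: {share:.3f}")
```

Because the bucket for an ID never changes, raising the percentage only adds IDs to the canary group; nobody flips back to stable mid-rollout. Blue/green is the special case of switching from 0 to 100 in one step, and shadow traffic needs a separate copy of the request rather than a split.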
Compatibility rules
  • Non-breaking: Add optional fields, add new endpoint, improve defaults.
  • Breaking: Remove/rename fields, change types, change semantics. Requires new API version.
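Part of these rules can be checked mechanically. The sketch below compares two simplified field maps and lists the changes that would be breaking; the schema representation (field name mapped to a `(type, required)` pair) and the function name `breaking_changes` are assumptions made for this example. Changes in semantics cannot be detected this way and still need review.

```python
def breaking_changes(old: dict, new: dict, is_request: bool) -> list:
    """List breaking differences between two schemas.

    Each schema maps a field name to (type_name, required).
    Set is_request=True for request bodies, False for response bodies.
    """
    reasons = []
    for name, (old_type, old_required) in old.items():
        if name not in new:
            reasons.append(f"removed or renamed: {name}")
            continue
        new_type, new_required = new[name]
        if new_type != old_type:
            reasons.append(f"type changed: {name} ({old_type} -> {new_type})")
        if is_request and new_required and not old_required:
            reasons.append(f"now required: {name}")
    if is_request:
        # Old clients will not send a field they have never heard of.
        for name, (_type, required) in new.items():
            if name not in old and required:
                reasons.append(f"new required field: {name}")
    return reasons


if __name__ == "__main__":
    v1_response = {"label": ("string", True)}
    v2_response = {"labels": ("array", True)}
    print(breaking_changes(v1_response, v2_response, is_request=False))
```

An empty list means the change can ship under the existing API version; anything else calls for a new one.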
Deprecation policy
  • Announce deprecation with a timeline (e.g., 90 days).
  • Expose deprecation headers (e.g., X-Deprecated: true; Sunset: Mon, 01 Jun 2026 00:00:00 GMT). The Sunset header (RFC 8594) takes an HTTP-date value, not YYYY-MM-DD.
  • Provide migration notes and example requests/responses.
Monitoring and rollback
  • Monitor latency (p95/p99), error rate, timeouts, and quality metrics (where possible).
  • Define rollback triggers (e.g., error rate > 1% for 10 min).
  • Keep the last good version warm for instant rollback.

Worked examples

Example 1 — Path-based API versions

Scenario: v1 returns a single label; v2 returns top-k labels with scores.

POST /v1/predict
{
  "text": "I love this!"
}
-->
{
  "label": "positive"
}

POST /v2/predict
{
  "text": "I love this!",
  "top_k": 3
}
-->
{
  "labels": [
    {"label": "positive", "score": 0.97},
    {"label": "joy", "score": 0.71},
    {"label": "neutral", "score": 0.18}
  ]
}

The contract change is breaking, so create /v2. Maintain /v1 until clients migrate, and add these response headers on /v1:

X-Deprecated: true
Sunset: Mon, 01 Jun 2026 00:00:00 GMT
Deprecation-Note: Use /v2/predict with top_k.
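As a rough sketch, the /v1 handler could build these headers with a small helper. The function name `deprecation_headers` is hypothetical, and `X-Deprecated` and `Deprecation-Note` are this page's custom headers rather than standard ones. The Sunset header is defined in RFC 8594 with an HTTP-date value, so the helper formats the date that way.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(sunset: datetime, note: str) -> dict:
    """Build response headers announcing that an API version is deprecated."""
    if sunset.tzinfo is None:
        raise ValueError("sunset must be timezone-aware")
    return {
        "X-Deprecated": "true",
        # HTTP-date format, e.g. "Mon, 01 Jun 2026 00:00:00 GMT".
        "Sunset": format_datetime(sunset.astimezone(timezone.utc), usegmt=True),
        "Deprecation-Note": note,
    }


if __name__ == "__main__":
    headers = deprecation_headers(
        datetime(2026, 6, 1, tzinfo=timezone.utc),
        "Use /v2/predict with top_k.",
    )
    for name, value in headers.items():
        print(f"{name}: {value}")
```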

Example 2 — Canary rollout with guardrails

Initial: /v2 uses model 2.0.0
Plan: 1% → 10% → 50% → 100%
Guardrails (5 min windows):
- p95 latency <= 250 ms
- Error rate <= 0.5%
- Quality metric (if online labels exist): no more than 2% worse than v1
Rollback trigger: any violation sustained > 10 min.

Execution:

  1. Route 1% to v2; watch dashboards.
  2. Increase to 10%, then 50%, then 100% if stable.
  3. If violation occurs, route 0% to v2 and investigate.
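The plan above can be expressed as a small control loop. This is a sketch under stated assumptions: metrics arrive as one summary per 5-minute window, a step is promoted after two consecutive healthy windows (the text does not say how long to wait), and "sustained > 10 min" is read as three consecutive violating windows. The names `WindowMetrics` and `run_canary` are hypothetical.

```python
from dataclasses import dataclass

STEPS = [1, 10, 50, 100]   # traffic percentages from the plan
WINDOW_MIN = 5             # guardrail window length in minutes
ROLLBACK_AFTER_MIN = 10    # roll back when violations last longer than this


@dataclass
class WindowMetrics:
    p95_latency_ms: float
    error_rate: float      # fraction: 0.005 means 0.5%
    quality_drop: float    # fraction by which the new version is worse


def violates(m: WindowMetrics) -> bool:
    return m.p95_latency_ms > 250 or m.error_rate > 0.005 or m.quality_drop > 0.02


def run_canary(windows: list, healthy_windows_to_promote: int = 2) -> list:
    """Return the traffic percentage in effect after each metrics window."""
    step, bad_streak, healthy_streak, history = 0, 0, 0, []
    for m in windows:
        if violates(m):
            bad_streak += 1
            healthy_streak = 0
        else:
            bad_streak = 0
            healthy_streak += 1
        if bad_streak * WINDOW_MIN > ROLLBACK_AFTER_MIN:
            history.append(0)  # route 0% to the new version and stop
            break
        if healthy_streak >= healthy_windows_to_promote and step < len(STEPS) - 1:
            step += 1
            healthy_streak = 0
        history.append(STEPS[step])
    return history
```

A violating window resets the healthy streak, so a flapping canary holds its current percentage instead of being promoted.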

Example 3 — Pin batch jobs to a model version

Batch scoring must be reproducible. Keep the API contract the same (/v2), but pin the model version:

POST /v2/predict
Headers:
  X-Model-Version: 2.1.4
Body:
  {"records": [...]} 

If the header is absent, the router uses the currently promoted model (e.g., 2.2.0). Batch jobs include the header to freeze behavior across reruns.
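On the server side, the pinning rule might look like the sketch below. The set of loaded versions, the exception name, and the decision to reject an unknown pinned version (instead of silently falling back to the promoted model) are all assumptions for illustration.

```python
class UnknownModelVersion(Exception):
    """Raised when a client pins a model version that is not available."""


LOADED_MODELS = {"2.1.4", "2.2.0"}   # hypothetical versions kept warm
PROMOTED_MODEL = "2.2.0"             # default for unpinned requests


def resolve_model_version(headers: dict) -> str:
    """Pick the model version for a request from its headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    pinned = normalized.get("x-model-version")
    if pinned is None:
        return PROMOTED_MODEL
    if pinned not in LOADED_MODELS:
        raise UnknownModelVersion(pinned)
    return pinned


if __name__ == "__main__":
    print(resolve_model_version({}))                            # unpinned
    print(resolve_model_version({"X-Model-Version": "2.1.4"}))  # pinned
```

Failing loudly on an unknown version keeps reruns honest: a batch job that asked for 2.1.4 should not quietly receive results from another model.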

Step-by-step: Implement a versioned endpoint

  1. Define your contracts: JSON schemas for v1, v2. Mark optional vs required fields.
  2. Separate API vs model versions: e.g., /v2 path + X-Model-Version header.
  3. Build a router: Route by API path; inside, pick model by header or default.
  4. Add observability: Log API version, model version, latency, errors, and sampling IDs.
  5. Prepare blue/green: Keep old and new live; warm load models to avoid cold starts.
  6. Canary controls: Config for traffic percentages; verify metrics before each step.
  7. Deprecation headers: Set X-Deprecated and Sunset for old versions.
  8. Rollback plan: One-click switch to last good combo; keep configs versioned.
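Steps 2 to 4 can be sketched in a few lines without any web framework. Everything named here is illustrative: the `MODELS` registry holds stand-in functions with fixed scores, the default `top_k` of 3 is an assumption, and `route` returns a `(status, body)` pair instead of a real HTTP response.

```python
import logging
import time

logger = logging.getLogger("inference")

# Hypothetical registry: model version -> callable returning label scores.
MODELS = {
    "2.1.4": lambda text: {"positive": 0.90, "joy": 0.60, "neutral": 0.25},
    "2.2.0": lambda text: {"positive": 0.97, "joy": 0.71, "neutral": 0.18},
}
PROMOTED_MODEL = "2.2.0"  # used when X-Model-Version is absent


def respond_v1(body: dict, scores: dict) -> dict:
    # v1 contract: a single best label.
    return {"label": max(scores, key=scores.get)}


def respond_v2(body: dict, scores: dict) -> dict:
    # v2 contract: top-k labels with scores (default of 3 is an assumption).
    top_k = int(body.get("top_k", 3))
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return {"labels": [{"label": name, "score": score} for name, score in ranked]}


API_VERSIONS = {"/v1/predict": ("v1", respond_v1), "/v2/predict": ("v2", respond_v2)}


def route(path: str, headers: dict, body: dict) -> tuple:
    """Route by API path, then pick the model by header or default."""
    if path not in API_VERSIONS:
        return 404, {"error": f"unknown endpoint {path}"}
    api_version, respond = API_VERSIONS[path]
    model_version = headers.get("X-Model-Version", PROMOTED_MODEL)
    if model_version not in MODELS:
        return 400, {"error": f"unknown model version {model_version}"}
    started = time.perf_counter()
    payload = respond(body, MODELS[model_version](body["text"]))
    latency_ms = (time.perf_counter() - started) * 1000
    logger.info("api=%s model=%s latency_ms=%.2f", api_version, model_version, latency_ms)
    return 200, payload
```

The same model output feeds both contracts, which is the point of separating the two kinds of version: swapping the entry behind `PROMOTED_MODEL` changes predictions without changing what either API version returns structurally.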

Exercises (hands-on)

These mirror the Practice Exercises section below. Attempt each one before checking your solution.

  1. Design a contract change (Ex1): Propose /v2 for a change from {label} to {labels: [{label, score}]}. Include a deprecation timeline and example requests/responses.
  2. Plan a canary rollout (Ex2): Define a traffic schedule, SLO guardrails, and rollback triggers for promoting v2.
  3. Pin batch jobs (Ex3): Show how a batch client calls /v2 with a fixed X-Model-Version and handles missing versions gracefully.

Self-checklist

  • I can explain the difference between API version and model version.
  • I know when a change is breaking and requires a new API version.
  • I can run a canary rollout with clear guardrails and rollback.
  • I can pin a batch job to a specific model version.
  • I can communicate deprecation timelines in headers and docs.

Common mistakes and how to self-check

  • Mixing API and model versions: If a model swap requires clients to change their code, you likely needed a new API version. Self-check: Can an old client call the new model without changes?
  • No rollback path: Releasing without a warm old version. Self-check: Can you restore traffic in under 1 minute?
  • Silent breaking changes: Renaming fields in-place. Self-check: Does a schema diff show any removed or renamed fields? If so, the change is breaking.
  • Ignoring monitoring: No guardrails during canary. Self-check: Do you track p95 latency, error rate, and a simple quality proxy?
  • Undefined deprecation: Removing v1 without notice. Self-check: Are X-Deprecated and Sunset headers set at least 60–90 days ahead?

Practical projects

  • Build a local service exposing /v1 and /v2 predict endpoints. v2 adds fields but remains backward compatible via defaults. Add a simple in-process router that accepts X-Model-Version.
  • Implement a canary controller: a configuration file with target percentages and minimal SLO checks (mock metrics). Simulate violations and rollback.
  • Create a batch script that reads input records, calls /v2 with X-Model-Version, retries transient errors with backoff, and writes outputs with a run ID.

Who this is for

  • MLOps engineers deploying models behind HTTP/gRPC APIs.
  • Backend/Platform engineers integrating ML inference into services.
  • Data scientists who own model releases and need safe rollouts.

Prerequisites

  • Basic HTTP/JSON familiarity (methods, headers, status codes).
  • Understanding of model packaging and a simple inference server.
  • Basic observability concepts (latency, error rate).

Learning path

  1. Model packaging basics → containerize inference service.
  2. Versioned endpoints → separate API vs model versions.
  3. Traffic management → canary, blue/green, shadow.
  4. Monitoring and rollback → SLOs and automated safeguards.
  5. Deprecation and migration playbooks → stable client experience.

Mini challenge

Design a migration note for moving clients from /v1 to /v2 where v2 adds a top_k parameter and returns ranked labels. Include: version timeline, example requests/responses, and how clients can pin a specific model version.

Next steps

  • Draft your versioning policy (API vs model) and share with your team.
  • Template your canary rollout plan with guardrails and rollback triggers.
  • Instrument your service to log API version, model version, and latency.

Practice Exercises

Instructions

Your current endpoint is:

POST /v1/predict
{ "text": "Great product" } --> { "label": "positive" }

New requirement: return top-k labels with scores and accept an optional top_k parameter. Produce:

  • New endpoint path or version strategy.
  • Example request/response for the new version.
  • Deprecation headers and timeline for v1.
  • One-paragraph migration note for clients.

Expected Output

A /v2 design whose request accepts an optional top_k and whose response contains an array of {label, score} objects, plus deprecation headers on /v1 with a clear sunset date and a concise migration note.