Why this matters
Clear request/response schemas and strong validation make your ML APIs reliable, safe, and easy to integrate. As a Machine Learning Engineer, you will ship models behind HTTP endpoints, stop bad inputs before they crash models, return predictable outputs to clients, and evolve schemas without breaking existing users.
- Real tasks you will face: define JSON for prediction endpoints; validate inputs (types, ranges, enums, formats); handle errors with clear messages; version schemas; ensure backward compatibility.
Concept explained simply
A schema is a contract for your API: what fields exist, their types, constraints, and examples. Validation checks incoming requests against this contract before the model runs, and ensures responses match the documented shape.
Mental model
Think of your API like a form with a bouncer. The form (schema) lists exactly what to fill in. The bouncer (validator) checks every field. If anything is missing or malformed, the bouncer stops it at the door and explains why. Only valid inputs reach your model, and outputs are formatted before they leave.
Core parts of ML API schemas
- Request schema: inputs, types, constraints (e.g., text length, image size), optional vs required, mutually exclusive fields.
- Response schema: predictions, confidences, class labels, metadata (model_version, latency_ms), and error format.
- Validation: type checks, ranges, regex/format (email, uuid), content limits (max bytes), custom rules (e.g., a full probability distribution in a response must sum to 1).
- Compatibility: version fields, default values, deprecation strategy.
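The response-side rules can be checked in code before a response leaves the server. The sketch below is plain Python with no framework. The `probabilities` field (a full label-to-probability map) is a hypothetical addition: top-k confidences like those in the examples below need not sum to 1 (Example 1 returns 0.97 and 0.02), so the sum rule only applies to a complete distribution.

```python
import math
from typing import Any, Dict, List


def check_response(resp: Dict[str, Any], abs_tol: float = 1e-6) -> List[str]:
    """Return a list of problems in a classification response (empty list = valid)."""
    problems: List[str] = []
    # Traceability metadata should always be present.
    if not isinstance(resp.get("model_version"), str):
        problems.append("model_version: missing or not a string")
    latency = resp.get("latency_ms")
    if not isinstance(latency, (int, float)) or latency < 0:
        problems.append("latency_ms: missing or negative")
    # Hypothetical field: the full distribution over all labels.
    probs = resp.get("probabilities", {})
    for label, p in probs.items():
        if not 0.0 <= p <= 1.0:
            problems.append(f"probabilities.{label}: {p} is outside [0, 1]")
    # Only a *full* distribution must sum to 1 (within floating-point tolerance).
    if probs and not math.isclose(sum(probs.values()), 1.0, abs_tol=abs_tol):
        problems.append("probabilities: values do not sum to 1")
    return problems
```

Running this in tests (or behind a debug flag) catches a model that silently starts emitting out-of-range scores.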
Typical status codes
- 200: success
- 400: bad request (malformed JSON, wrong content-type; 415 Unsupported Media Type is the more specific code for the latter)
- 422: validation failed (well-formed JSON but invalid fields)
- 500: unexpected server error
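A framework-agnostic sketch of how these status codes can map to the stages of handling a request: check the media type, parse, validate, then predict. `validate` and `predict` are hypothetical callables you would supply, and the error body mirrors the error examples in this section.

```python
import json
from typing import Any, Callable, Dict, Optional, Tuple

Body = Dict[str, Any]


def _error(code: str, message: str, fields: Optional[Dict[str, str]] = None) -> Body:
    return {"error": {"code": code, "message": message, "fields": fields or {}}}


def handle(
    raw: bytes,
    content_type: str,
    validate: Callable[[Body], Dict[str, str]],
    predict: Callable[[Body], Body],
) -> Tuple[int, Body]:
    """Map each failure stage to its status code: 400, then 422, then 500; else 200."""
    # Ignore parameters such as "; charset=utf-8" when checking the media type.
    if content_type.split(";")[0].strip().lower() != "application/json":
        # Follows the status-code list above; 415 is a common, more specific choice.
        return 400, _error("bad_request", "Content-Type must be application/json")
    try:
        payload = json.loads(raw)
    except ValueError:  # covers json.JSONDecodeError and invalid UTF-8
        return 400, _error("bad_request", "Body is not valid JSON")
    if not isinstance(payload, dict):
        return 422, _error("validation_error", "Body must be a JSON object")
    field_errors = validate(payload)
    if field_errors:
        return 422, _error("validation_error", "Invalid fields", field_errors)
    try:
        return 200, predict(payload)
    except Exception:
        # Do not leak internals (stack traces, raw inputs) to the client.
        return 500, _error("internal_error", "Unexpected server error")
```

Keeping the stages in this order means the model is never called on input that failed parsing or validation.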
Worked examples
Example 1 — Text classification endpoint
{
"request": {
"text": "I love this product!",
"language": "en",
"max_alternatives": 1
},
"constraints": {
"text": {"type": "string", "minLength": 1, "maxLength": 10000},
"language": {"enum": ["en", "es", "fr"], "default": "en"},
"max_alternatives": {"type": "integer", "minimum": 1, "maximum": 5, "default": 1}
},
"response": {
"label": "positive",
"confidence": 0.97,
"alternatives": [{"label": "neutral", "confidence": 0.02}],
"model_version": "tc-1.3.2",
"latency_ms": 35
}
}
Notes: return 422 if text is empty or too long; confidences are floats in [0,1].
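One way to enforce Example 1's constraints in plain Python, applying the documented defaults and collecting every field error at once so a single 422 response can list them all. This is a framework-agnostic sketch; in FastAPI the same rules would usually be declared on a Pydantic model instead. Rejecting unknown keys is an assumption here (it matches `additionalProperties: false` in the JSON Schema snippet later on).

```python
from typing import Any, Dict, Optional, Tuple

LANGUAGES = ("en", "es", "fr")
ALLOWED_FIELDS = {"text", "language", "max_alternatives"}


def validate_text_request(
    payload: Dict[str, Any],
) -> Tuple[Optional[Dict[str, Any]], Dict[str, str]]:
    """Return (cleaned_request, {}) on success or (None, field_errors) on failure."""
    errors: Dict[str, str] = {}
    # Reject unknown keys so typos like "langauge" are not silently ignored.
    for name in sorted(set(payload) - ALLOWED_FIELDS):
        errors[name] = "unexpected_field"
    text = payload.get("text")
    if not isinstance(text, str):
        errors["text"] = "required string"
    elif not 1 <= len(text) <= 10_000:  # length in characters, not bytes
        errors["text"] = "length must be 1..10000"
    language = payload.get("language", "en")  # documented default
    if language not in LANGUAGES:
        errors["language"] = f"must be one of {list(LANGUAGES)}"
    k = payload.get("max_alternatives", 1)  # documented default
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(k, bool) or not isinstance(k, int) or not 1 <= k <= 5:
        errors["max_alternatives"] = "integer in 1..5"
    if errors:
        return None, errors
    return {"text": text, "language": language, "max_alternatives": k}, {}
```

For example, `{"text": "I love this product!"}` comes back with `language` set to `"en"` and `max_alternatives` set to `1`.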
Example 2 — Image inference with mutually exclusive inputs
Request accepts exactly one of: image_url OR image_base64
{
"image_url": "https://.../cat.jpg",
"image_base64": null,
"top_k": 3
}
Constraints:
- image_url: string uri, optional
- image_base64: base64 string, optional
- exactly one must be provided
- top_k: integer 1..5, default 3, must be ≤ number of classes
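The exactly-one-of rule and the top_k bound can be sketched as below, returning the same error shape as this example's 422 response. The `num_classes` default of 1000 is an assumption (an ImageNet-style head); pass your model's real label count. Decoding and size checks for `image_base64` are left to a separate step.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse


def _error(message: str, fields: Dict[str, str]) -> Dict[str, Any]:
    return {"error": {"code": "validation_error", "message": message, "fields": fields}}


def validate_image_request(
    payload: Dict[str, Any], num_classes: int = 1000  # assumption: model label count
) -> Optional[Dict[str, Any]]:
    """Return None if the request is valid, else a 422 error body."""
    url, b64 = payload.get("image_url"), payload.get("image_base64")
    # Treat an explicit null the same as an absent field.
    if (url is None) == (b64 is None):
        return _error(
            "Provide exactly one of image_url or image_base64",
            {
                "image_url": "missing" if url is None else "present",
                "image_base64": "missing" if b64 is None else "present",
            },
        )
    if url is not None:
        parsed = urlparse(url) if isinstance(url, str) else None
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            return _error("image_url must be an absolute http(s) URL", {"image_url": "invalid_uri"})
    elif not isinstance(b64, str):
        return _error("image_base64 must be a string", {"image_base64": "type_error"})
    top_k = payload.get("top_k", 3)  # documented default
    upper = min(5, num_classes)  # top_k can never exceed the number of classes
    if isinstance(top_k, bool) or not isinstance(top_k, int) or not 1 <= top_k <= upper:
        return _error(f"top_k must be an integer in 1..{upper}", {"top_k": "out_of_range"})
    return None
```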
Response:
{
"predictions": [
{"label": "cat", "score": 0.92},
{"label": "lynx", "score": 0.04},
{"label": "dog", "score": 0.03}
],
"model_version": "resnet50-2.1",
"latency_ms": 48
}
Error example (422):
{
"error": {
"code": "validation_error",
"message": "Provide exactly one of image_url or image_base64",
"fields": {"image_url": "present", "image_base64": "present"}
}
}
Example 3 — Tabular regression with strong typing
Request:
{
"features": {
"age": 42,
"income_usd": 72000.5,
"employment_type": "full_time",
"zip": "94103"
}
}
Constraints:
- age: integer 0..120
- income_usd: number 0..1e7
- employment_type: enum ["full_time", "part_time", "contract", "unemployed"]
- zip: string pattern ^\d{5}$
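A plain-Python sketch of these feature constraints. It assumes all four features are required (the constraints above do not say), and it uses the error codes from this example's 422 response where they apply; `type_error`, `out_of_range`, `not_in_enum`, and `missing` are illustrative additions. Two Python details matter: `bool` is a subclass of `int`, and in `str` patterns `\d` matches any Unicode decimal digit unless `re.ASCII` is set.

```python
import math
import re
from typing import Any, Dict

EMPLOYMENT_TYPES = ("full_time", "part_time", "contract", "unemployed")
# re.ASCII keeps \d from matching non-ASCII digits such as Arabic-Indic ones.
ZIP_RE = re.compile(r"\d{5}", re.ASCII)


def _is_int(x: Any) -> bool:
    return isinstance(x, int) and not isinstance(x, bool)


def _is_number(x: Any) -> bool:
    # Any int, or a finite float; NaN and infinity are rejected.
    if isinstance(x, bool):
        return False
    if isinstance(x, int):
        return True
    return isinstance(x, float) and math.isfinite(x)


def validate_features(f: Dict[str, Any]) -> Dict[str, str]:
    """Return {field: error_code}; an empty dict means the features are valid."""
    errors: Dict[str, str] = {}
    for name in ("age", "income_usd", "employment_type", "zip"):
        if name not in f:
            errors[name] = "missing"  # assumption: all four features are required
    if "age" in f:
        if not _is_int(f["age"]):
            errors["age"] = "type_error"
        elif not 0 <= f["age"] <= 120:
            errors["age"] = "out_of_range"
    if "income_usd" in f:
        if not _is_number(f["income_usd"]):
            errors["income_usd"] = "type_error"
        elif not 0 <= f["income_usd"] <= 1e7:
            errors["income_usd"] = "out_of_range"
    if "employment_type" in f and f["employment_type"] not in EMPLOYMENT_TYPES:
        errors["employment_type"] = "not_in_enum"
    if "zip" in f:
        # A string (not an int) preserves leading zeros such as "02139".
        if not isinstance(f["zip"], str):
            errors["zip"] = "type_error"
        elif not ZIP_RE.fullmatch(f["zip"]):
            errors["zip"] = "pattern_mismatch"
    return errors
```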
Response:
{
"prediction": 350000.22,
"prediction_interval": {"low": 310000.10, "high": 390000.34},
"model_version": "house-reg-0.9.0",
"latency_ms": 12
}
Error example (422):
{
"error": {
"code": "validation_error",
"message": "zip must match ^\\d{5}$",
"fields": {"zip": "pattern_mismatch"}
}
}
Practical patterns and rules
- Keep required fields minimal; make everything else optional with safe defaults.
- Keep responses stable: only add optional fields, and ship any other change behind a version bump.
- Include model_version and latency_ms for traceability.
- Return consistent error objects with code, message, and per-field details.
- Limit payloads (text length, image size bytes) to protect compute.
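One way to keep error objects consistent is a single exception type that every endpoint raises and one top-level handler renders. This is a plain-Python sketch: `ApiError`, `validation_error`, and `payload_too_large` are hypothetical names, and the optional `request_id` key is an assumption that lets clients quote an ID matching your logs.

```python
from typing import Any, Dict, Optional


class ApiError(Exception):
    """One error type for every endpoint, rendered into one JSON shape."""

    def __init__(self, status: int, code: str, message: str,
                 fields: Optional[Dict[str, str]] = None) -> None:
        super().__init__(message)
        self.status = status
        self.code = code
        self.message = message
        self.fields = fields or {}

    def to_body(self, request_id: Optional[str] = None) -> Dict[str, Any]:
        error: Dict[str, Any] = {"code": self.code, "message": self.message, "fields": self.fields}
        if request_id is not None:
            error["request_id"] = request_id  # correlates the response with server logs
        return {"error": error}


def validation_error(fields: Dict[str, str]) -> ApiError:
    # Summarize per-field problems in a stable (sorted) order.
    summary = "; ".join(f"{name}: {problem}" for name, problem in sorted(fields.items()))
    return ApiError(422, "validation_error", summary, fields)


def payload_too_large(limit_bytes: int) -> ApiError:
    return ApiError(413, "payload_too_large", f"Request body exceeds {limit_bytes} bytes")
```

A route would `raise validation_error(errors)`, and one `except ApiError as e:` block would return `e.status` with `e.to_body(request_id)`, so every endpoint emits the same shape.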
Simple JSON Schema snippet
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"text": {"type": "string", "minLength": 1, "maxLength": 10000},
"language": {"type": "string", "enum": ["en", "es", "fr"]},
"max_alternatives": {"type": "integer", "minimum": 1, "maximum": 5, "default": 1}
},
"required": ["text"],
"additionalProperties": false
}
Versioning and compatibility
- Start with response field model_version (string) and optionally schema_version (semver).
- Backward-compatible changes: add optional fields, widen request enums, raise max limits; avoid renames and removals.
- Breaking changes: bump schema_version and expose a new route or header-based version; keep old version until clients migrate.
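Route-based and header-based versioning can coexist. A minimal sketch, assuming a URL prefix (`/v2/...`) takes priority over a header; the header name `X-API-Version` and the choice of `v1` as the default are hypothetical, picked so that clients sending nothing keep the old contract.

```python
from typing import Mapping, Tuple

SUPPORTED_VERSIONS = ("v1", "v2")
DEFAULT_VERSION = "v1"  # clients that send no version keep the old contract
VERSION_HEADER = "x-api-version"  # hypothetical header name


def resolve_version(path: str, headers: Mapping[str, str]) -> Tuple[str, str]:
    """Return (version, path_without_version_prefix). A route prefix wins over the header."""
    head, _, rest = path.lstrip("/").partition("/")
    if head in SUPPORTED_VERSIONS:
        return head, "/" + rest
    # HTTP header names are case-insensitive.
    lowered = {k.lower(): v for k, v in headers.items()}
    version = lowered.get(VERSION_HEADER, DEFAULT_VERSION)
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported API version: {version!r}")
    return version, path
```

The resolved version can then select which request validator and response formatter to use, so v1 keeps working unchanged while v2 evolves.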
Security and limits
- Enforce content-type (application/json) and size limits early.
- Reject executable content; for base64 images, cap decoded bytes and validate mime type.
- Never echo raw user inputs into logs; include request_id instead.
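For base64 images, the size cap can be enforced before decoding, because valid padded base64 for N bytes is exactly 4 * ceil(N / 3) characters, and the MIME type can be checked from the decoded bytes rather than trusted from the client. A sketch under stated assumptions: only PNG and JPEG are accepted (by their leading "magic bytes"), and 5 MB is taken as 5 * 1024 * 1024 bytes.

```python
import base64
from typing import Tuple

MAX_DECODED_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" as in Exercise 2

# Leading "magic bytes" for the formats this sketch accepts.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def decode_image_b64(data: str, max_bytes: int = MAX_DECODED_BYTES) -> Tuple[bytes, str]:
    """Decode a base64 image, enforcing a size cap and sniffing its real MIME type.

    Raises ValueError with a short error code suitable for a 413/422 response.
    """
    # Cheap pre-check: anything longer than 4 * ceil(max_bytes / 3) decodes to too many bytes.
    if len(data) > 4 * ((max_bytes + 2) // 3):
        raise ValueError("image_too_large")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(data, validate=True)
    except ValueError:  # binascii.Error subclasses ValueError
        raise ValueError("invalid_base64") from None
    # The pre-check allows up to 2 extra bytes, so check the exact size too.
    if len(raw) > max_bytes:
        raise ValueError("image_too_large")
    # Trust the bytes, not a client-declared content type.
    for magic, mime in SIGNATURES.items():
        if raw.startswith(magic):
            return raw, mime
    raise ValueError("unsupported_mime_type")
```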
Who this is for
- Machine Learning Engineers serving models via HTTP.
- Data/ML practitioners building internal or external inference APIs.
Prerequisites
- Basic HTTP and JSON.
- One server framework (e.g., FastAPI, Flask, or similar).
- Familiarity with your model inputs/outputs.
Learning path
- Define minimal request/response for one model.
- Add validation rules (types, ranges, enums, exclusivity).
- Design error objects and status codes.
- Introduce versioning and defaults for compatibility.
- Add payload limits and performance-friendly constraints.
Exercises
Exercise 1 — Toxicity classifier schema
Create request/response schemas for a toxicity classifier that takes text and optional language. Add constraints and a clear error format.
- Required: text
- Optional: language in ["en","es"], default "en"
- Limit: text length 1..5000
- Response: label in ["toxic","non_toxic"], confidence [0,1], model_version, latency_ms
Self-check
- What happens if language is "de"?
- What if text is an empty string?
Exercise 2 — Image endpoint with mutual exclusivity
Design request validation that accepts exactly one of image_url or image_base64. Cap decoded image size at 5 MB. Include top_k 1..5 with default 3. Define 422 error for violations.
Self-check
- What 422 message do you return if both are provided?
- How do you report too-large image size?
Exercise checklist
- Requests have types, ranges, and enums.
- Exactly-one-of rule enforced where needed.
- Responses include model_version and latency_ms.
- Errors include code, message, and fields map.
- No breaking change introduced without versioning.
Common mistakes and how to self-check
- Too many required fields: make non-critical inputs optional with defaults. Self-check: can a minimal request succeed?
- Inconsistent error shapes: define one error object and reuse. Self-check: compare errors across endpoints.
- Forgetting payload limits: add max length/size. Self-check: try a huge input; ensure 413/422.
- Enum drift: document allowed labels and version changes. Self-check: add a new label without breaking old clients.
- Silent truncation of floats: round explicitly in responses. Self-check: inspect number formatting.
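A small sketch of explicit float handling before serialization. The function name and the 4-digit default are assumptions; pick a precision your API documents. Casting with `float()` also matters in practice: `numpy.float32` values, for instance, make `json.dumps` raise `TypeError`.

```python
from typing import Any, Dict, List


def format_predictions(preds: List[Dict[str, Any]], ndigits: int = 4) -> List[Dict[str, Any]]:
    """Return predictions with scores cast to float, clamped to [0, 1], and rounded."""
    out = []
    for p in preds:
        # float() converts NumPy scalars to plain Python floats that json can serialize.
        score = float(p["score"])
        score = min(max(score, 0.0), 1.0)  # guard against tiny numerical overshoot
        out.append({"label": p["label"], "score": round(score, ndigits)})
    return out
```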
Practical projects
- Wrap a sentiment model behind a /predict endpoint with full validation and error handling.
- Add schema_version and a v2 route that widens an enum; keep v1 functional.
- Implement size-limited image scoring (URL or base64) with top_k and latency reporting.
Next steps
- Instrument your API with request_id and latency tracking.
- Add input/output JSON Schema files and validate them in CI.
- Document examples for success and common errors.
Mini challenge
Redesign a model response to add explanations (saliency per token) without breaking existing clients. Keep original fields stable, add a new optional explanations object, and bump schema_version if necessary.
Hint
Place explanations under an optional key, include type and version, and keep top-level prediction fields unchanged.