Why this matters
Data platforms run on shared data. Schema registries and contracts ensure producers and consumers can evolve independently without breaking each other. As a Data Platform Engineer, you will:
- Prevent breaking data changes across Kafka topics, streams, and APIs.
- Enforce compatibility rules and versioning in CI/CD.
- Provide a single source of truth for schemas used across teams.
- Audit who changed what, when, and why for governance and compliance.
Concept explained simply
A schema registry stores versions of data schemas (Avro, Protobuf, JSON Schema) and enforces rules that define which changes are allowed. A contract is the agreement between data producers and consumers about the shape and meaning of data. Registries automate and enforce those agreements.
Mental model
- Schema = blueprint of a message.
- Subject = logical name under which schema versions are stored (e.g., topic-value, topic-key).
- Compatibility mode = rule for schema evolution (backward, forward, full, transitive).
Think of it as version-controlled interfaces for data. Just like APIs have versioned endpoints, your data messages have versioned schemas with evolution rules.
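A minimal Python sketch of this mental model (an illustration only, not a real registry client; the subject name and schema strings are placeholders): each subject holds an ordered list of schema versions plus a compatibility mode.
from dataclasses import dataclass, field

@dataclass
class Subject:
    # Ordered schema versions; index 0 is version 1, index 1 is version 2, ...
    versions: list = field(default_factory=list)
    # Evolution rule checked whenever a new version is proposed.
    compatibility: str = "BACKWARD"

# Subjects keyed by name, mirroring Kafka's topic-value / topic-key convention.
registry = {"orders-value": Subject()}
registry["orders-value"].versions.append("<Order schema v1>")
registry["orders-value"].versions.append("<Order schema v2, one optional field added>")
print(len(registry["orders-value"].versions))  # 2 -- versions accumulate under the subject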
Core concepts
- Formats: Avro (compact, great with Kafka), Protobuf (IDL-first, language-neutral), JSON Schema (human-readable, flexible).
- Compatibility modes (see the registry sketch after this list):
  - Backward: New readers can read old data.
  - Forward: Old readers can read new data.
  - Full: Both backward and forward hold.
  - Transitive: The rule must hold across all historical versions, not just the latest.
- Safe evolution patterns: add optional fields with defaults, add nullable fields, add enum symbols (with care), deprecate without removing.
- Risky changes: rename without aliases, remove required fields, change types incompatibly, reuse Protobuf field numbers without reserving.
- Subjects: typically one subject per topic-value and per topic-key in Kafka; align naming with environments and domains.
- Governance: approvals for breaking changes, lineage and audit trails, policy checks (e.g., PII tags in schema metadata).
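The compatibility mode referenced above is typically configured per subject and enforced by the registry when new versions are proposed. A minimal sketch, assuming a Confluent-compatible Schema Registry at http://localhost:8081, the Python requests library, and a hypothetical subject that already has at least one registered version:
import json
import requests

REGISTRY = "http://localhost:8081"  # assumed local registry URL
SUBJECT = "users-value"             # hypothetical subject for a topic's value schema

# Set the evolution rule for this subject (BACKWARD, FORWARD, FULL, or a *_TRANSITIVE variant).
requests.put(f"{REGISTRY}/config/{SUBJECT}", json={"compatibility": "BACKWARD"}).raise_for_status()

# Ask the registry whether a candidate schema is compatible with the latest registered version.
candidate = {"type": "record", "name": "User", "fields": [{"name": "id", "type": "string"}]}
resp = requests.post(
    f"{REGISTRY}/compatibility/subjects/{SUBJECT}/versions/latest",
    json={"schema": json.dumps(candidate), "schemaType": "AVRO"},
)
resp.raise_for_status()
print(resp.json())  # e.g. {'is_compatible': True}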
Worked examples
Example 1: Backward-compatible Avro change (add optional field)
{
  "type": "record",
  "name": "User",
  "namespace": "example.v1",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "email", "type": "string"}
  ]
}
Evolve by adding an optional field with a default:
{
  "type": "record",
  "name": "User",
  "namespace": "example.v1",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "phone", "type": ["null", "string"], "default": null}
  ]
}
With backward compatibility, consumers using the new schema can read existing data (which lacks phone) because a default is provided.
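A quick way to see this resolution locally, sketched with the fastavro library (assumed to be installed; not part of the original example): write a record with v1, then read it back using v2 as the reader schema.
import io
from fastavro import parse_schema, schemaless_reader, schemaless_writer

v1 = parse_schema({
    "type": "record", "name": "User", "namespace": "example.v1",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "email", "type": "string"},
    ],
})
v2 = parse_schema({
    "type": "record", "name": "User", "namespace": "example.v1",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "phone", "type": ["null", "string"], "default": None},
    ],
})

buf = io.BytesIO()
schemaless_writer(buf, v1, {"id": "u1", "email": "a@example.com"})  # old producer writes with v1
buf.seek(0)
# New consumer reads old data with v2; the default fills in the missing field.
print(schemaless_reader(buf, v1, v2))  # {'id': 'u1', 'email': 'a@example.com', 'phone': None}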
Example 2: Avoiding a breaking rename with Avro aliases
Original:
{"name":"email","type":"string"}
Renaming it to contact_email outright breaks existing consumers that expect email. Use an alias so readers can resolve data written under the old name:
{
  "name": "contact_email",
  "type": "string",
  "aliases": ["email"]
}
Aliases allow readers to map old field names to new ones, preserving compatibility.
Example 3: Protobuf field number safety
Original .proto:
message Order {
  string id = 1;
  string status = 2; // e.g., NEW, SHIPPED
}
Removing field 2 and reusing number 2 for a different meaning can corrupt reads. Correct approach:
message Order {
  string id = 1;
  reserved 2;
  reserved "status";
}
Use reserved to block reuse of removed field numbers (and, ideally, removed field names). Add new fields with new numbers.
Example 4: Forward compatibility with JSON Schema
Old schema (consumer):
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["id"],
  "properties": {"id": {"type": "string"}}
}
Producer adds an optional field:
{
  "type": "object",
  "required": ["id"],
  "properties": {
    "id": {"type": "string"},
    "note": {"type": "string"}
  },
  "additionalProperties": false
}
Because the old consumer schema does not forbid additional properties, payloads produced under the new schema (including note) still validate against it, so old consumers keep working. Forward compatibility relies on consumers tolerating unknown fields.
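To check this concretely, a small sketch using the Python jsonschema package (assumed here; any validator supporting draft 2020-12 behaves the same way): a payload produced under the new schema is validated against the old consumer schema.
from jsonschema import validate

old_consumer_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["id"],
    "properties": {"id": {"type": "string"}},
}
# Payload produced under the new schema, including the added optional field.
new_payload = {"id": "42", "note": "gift wrap"}
# No exception is raised: the old schema does not forbid extra properties.
validate(instance=new_payload, schema=old_consumer_schema)
print("old consumers still validate the new payload")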
How to choose a format
- Avro: streaming events in Kafka, strong evolution story, compact binary.
- Protobuf: cross-language RPC and events, strict IDL, efficient.
- JSON Schema: human-friendly, great for REST payloads and configuration.
Pick one per domain for consistency. Document the default compatibility mode per domain.
Governance patterns that work
- Subject naming: adopt a consistent convention such as domain.team.topic-value (and the matching -key subject) for clarity.
- Compatibility defaults: start with backward or full; use transitive in critical domains.
- CI policy: block merges on breaking changes; require approvers for full-mode changes (see the CI sketch after this list).
- Metadata: tag fields with sensitivity (e.g., PII) and retention hints.
- Change proposals: require a short migration note for consumers.
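As flagged in the CI policy bullet, a minimal CI gate can ask the registry whether a proposed schema is compatible and fail the build otherwise. A sketch, assuming a Confluent-compatible registry; the environment variables and file path are hypothetical:
import os
import sys
import requests

registry = os.environ.get("SCHEMA_REGISTRY_URL", "http://localhost:8081")  # assumed variable
subject = os.environ["SUBJECT"]  # e.g. "payments.orders.orders-value"
schema_path = sys.argv[1]        # path to the proposed schema file in the pull request

with open(schema_path) as f:
    proposed = f.read()

resp = requests.post(
    f"{registry}/compatibility/subjects/{subject}/versions/latest",
    json={"schema": proposed, "schemaType": "AVRO"},
)
resp.raise_for_status()

if not resp.json().get("is_compatible", False):
    print(f"Breaking change for subject {subject}; blocking merge.")
    sys.exit(1)
print("Schema change is compatible with the latest registered version.")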
Step-by-step: set up a schema workflow locally
- Pick a format and define v1 of a small schema (User, Order, or Event).
- Decide a compatibility mode (try backward).
- Create v2 with one safe change (optional field with default).
- Create v3 with a risky change (rename without alias). Predict the registry decision, then confirm it with the sketch after this list.
- Add governance metadata: owner, description, tags.
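To confirm the prediction from the rename step, a sketch that registers each version against a local, Confluent-compatible registry (assumed at http://localhost:8081, with the subject's compatibility mode left at the common BACKWARD default; subject name is hypothetical):
import json
import requests

REGISTRY = "http://localhost:8081"  # assumed local registry
SUBJECT = "users-value"             # hypothetical subject

def register(schema: dict) -> None:
    # Compatible schemas are accepted; breaking ones are rejected with HTTP 409.
    resp = requests.post(
        f"{REGISTRY}/subjects/{SUBJECT}/versions",
        json={"schema": json.dumps(schema), "schemaType": "AVRO"},
    )
    if resp.status_code == 409:
        print("Rejected as incompatible:", resp.json())
    else:
        resp.raise_for_status()
        print("Registered:", resp.json())

v1 = {"type": "record", "name": "User", "fields": [{"name": "id", "type": "string"}]}
v2 = {"type": "record", "name": "User", "fields": v1["fields"] + [
    {"name": "phone", "type": ["null", "string"], "default": None}]}  # safe: optional with default
v3 = {"type": "record", "name": "User", "fields": [{"name": "user_id", "type": "string"}]}  # risky rename

for schema in (v1, v2, v3):
    register(schema)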
Mini task: write a migration note
Describe in 3 sentences what changed from v1 to v2, why it is compatible, and what consumers must do (usually nothing).
Your exercises
Complete the exercises below. Then check your answers.
- Exercise 1: Avro evolution with backward compatibility.
- Exercise 2: Protobuf safety and contract decision-making.
Checklist before submitting:
- You chose a compatibility mode and justified it.
- You identified any breaking change and proposed a fix.
- You included defaults or aliases where needed.
Common mistakes and self-check
- Renaming without aliases (Avro) or without deprecation mapping.
- Reusing Protobuf field numbers; always reserve removed tags.
- Removing required fields; prefer deprecate or make optional with defaults.
- Forgetting key schemas for Kafka; key evolution rules can differ from value.
- Switching formats midstream without a migration plan.
Self-check: Given your last schema change, can consumers using the new schema still read messages produced before the change? If not, it is not backward compatible.
Practical projects
- Build a sample Kafka topic with Avro User events, enforce BACKWARD_TRANSITIVE compatibility, and evolve it through 4 versions.
- Create a Protobuf contract for an Order service, add fields across versions, and demonstrate reserved tags for removed fields.
- Define JSON Schemas for a REST payload with a lint rule set that blocks breaking changes in CI.
Who this is for
- Data Platform Engineers managing shared event streams and APIs.
- Data Engineers producing/consuming data across teams.
- Developers integrating services via events or contracts.
Prerequisites
- Basic understanding of data serialization (JSON, binary formats).
- Familiarity with Kafka or message queues helps but is not required.
- Comfort reading simple Avro, Protobuf, or JSON Schema definitions.
Learning path
- Learn schema formats (Avro, Protobuf, JSON Schema).
- Understand registry subjects and compatibility modes.
- Practice safe evolution patterns and governance policies.
- Automate checks in CI and document contracts.
Mini challenge
You maintain a topic-value subject set to FULL_TRANSITIVE. You need to add a new optional field and remove a legacy required field. How do you achieve this without breaking compatibility? Outline the steps in 5 bullet points.
Next steps
- Finish the exercises below and take the quick test.
- Apply these patterns in one of the Practical projects.
- Add contract checks to a CI pipeline in your environment.