Why this matters
On real streaming platforms, schemas define the shape of your messages so producers and consumers agree on the data. A schema registry stores and versions those schemas, enforces compatibility rules, and helps prevent silent data corruption. As a Data Platform Engineer, you will configure compatibility modes, guide teams to evolve schemas safely, debug serialization issues, and keep topics healthy during rolling deployments.
- Ensure backward/forward compatibility during deploys
- Standardize Avro/Protobuf/JSON Schema usage
- Avoid breaking consumers when adding fields
- Debug serializer/deserializer (SerDe) errors quickly
Who this is for
- Data Platform Engineers enabling Kafka/Kinesis/Pub-Sub style platforms
- Backend/Streaming engineers integrating producers and consumers
- SREs supporting streaming reliability
Prerequisites
- Basic understanding of topics, producers, consumers
- Familiarity with one schema format (Avro, Protobuf, or JSON Schema)
- Know what serialization/deserialization (SerDe) means
Concept explained simply
A schema registry is a catalog of message schemas with versioning and rules. Producers register schemas and write messages that reference a schema ID. Consumers use that ID to fetch the writer schema and read safely. The registry enforces compatibility so new schemas do not break existing readers or historical data.
Mental model
Think of it like an API contract library for events. Each topic has one or more subjects (e.g., topic-name-key, topic-name-value). Each subject has versions. Compatibility rules are the guardrails that let teams change contracts without breaking others.
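A toy model can make subjects, versions, and IDs concrete. The hypothetical ToyRegistry below is an in-memory sketch, not a real client: it assigns one global ID per distinct schema text, appends a new version to a subject only when that subject has not seen the schema before, and lets a consumer look up a writer schema by ID. It skips compatibility checks and persistence entirely.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    """Hypothetical in-memory registry: subjects hold ordered versions, IDs are global."""
    _id_by_text: dict = field(default_factory=dict)   # canonical schema text -> ID
    _text_by_id: dict = field(default_factory=dict)   # ID -> canonical schema text
    _subjects: dict = field(default_factory=dict)     # subject -> [ID of v1, ID of v2, ...]

    def register(self, subject: str, schema: dict) -> int:
        """Return the schema's ID, adding a subject version only if the schema is new there."""
        text = json.dumps(schema, sort_keys=True, separators=(",", ":"))
        schema_id = self._id_by_text.get(text)
        if schema_id is None:
            schema_id = len(self._text_by_id) + 1
            self._id_by_text[text] = schema_id
            self._text_by_id[schema_id] = text
        history = self._subjects.setdefault(subject, [])
        if schema_id not in history:
            history.append(schema_id)
        return schema_id

    def versions(self, subject: str) -> list:
        """Version numbers start at 1 and follow registration order."""
        return list(range(1, len(self._subjects.get(subject, [])) + 1))

    def schema_by_id(self, schema_id: int) -> dict:
        """What a consumer does with the ID in a message: fetch the writer schema."""
        return json.loads(self._text_by_id[schema_id])


registry = ToyRegistry()
user_v1 = {"type": "record", "name": "User", "namespace": "demo.v1",
           "fields": [{"name": "id", "type": "long"}, {"name": "name", "type": "string"}]}
user_v2 = {**user_v1, "fields": user_v1["fields"] + [
    {"name": "email", "type": ["null", "string"], "default": None}]}

print(registry.register("users-value", user_v1))  # 1
print(registry.register("users-value", user_v1))  # 1 (same schema, no new version)
print(registry.register("users-value", user_v2))  # 2
print(registry.versions("users-value"))           # [1, 2]
```

In this sketch, registering the same schema under a second subject reuses its ID but starts that subject's own version history.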
Key components and terms
- Schema formats: Avro, Protobuf, JSON Schema
- Subject: a named stream of schema versions (often bound to a topic's key or value)
- Compatibility modes: NONE, BACKWARD, FORWARD, FULL (+ TRANSITIVE variants)
- Wire format: messages carry a small schema identifier so consumers can fetch the correct writer schema
- SerDes: serializers/deserializers that talk to the registry
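The schema identifier is usually a short binary header. As one concrete reference point, Confluent Schema Registry's serializers prefix each Avro or JSON Schema payload with a zero "magic" byte followed by the schema ID as a 4-byte big-endian integer; its Protobuf serializer also writes message indexes after the ID, which this sketch does not handle. The functions below are a minimal sketch of that framing only; they do not serialize the payload itself.

```python
import struct

MAGIC_BYTE = 0                 # first byte of every framed message
HEADER = struct.Struct(">bi")  # big-endian: 1-byte magic + 4-byte signed schema ID


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-serialized payload with the 5-byte header."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> tuple[int, bytes]:
    """Split a framed message into (schema_id, payload), rejecting unknown framing."""
    if len(message) < HEADER.size:
        raise ValueError("message is shorter than the 5-byte header")
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unknown magic byte: {magic}")
    return schema_id, message[HEADER.size:]


framed = frame(42, b"<avro bytes>")
print(framed[:5].hex())  # 000000002a
print(unframe(framed))   # (42, b'<avro bytes>')
```

A deserializer would use the returned ID to fetch the writer schema (from a local cache or the registry) before decoding the payload.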
Compatibility in one breath
- BACKWARD: new readers (new schema) can read old data
- FORWARD: old readers (old schema) can read new data
- FULL: both BACKWARD and FORWARD, checked against the latest version
- TRANSITIVE: apply the rule across all previous versions, not just the latest
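To make the four rules concrete, here is a deliberately simplified checker for Avro-style record schemas. It is a toy, not the registry's algorithm: it ignores aliases, nested types, and type promotion, and treats a reader field as readable only if the writer has a field with the same name and identical type, or the reader field has a default. Writer fields the reader does not know are ignored, as in Avro.

```python
def can_read(reader: dict, writer: dict) -> bool:
    """Toy rule: each reader field comes from the writer (same name and type) or has a default."""
    writer_fields = {f["name"]: f for f in writer["fields"]}
    for rf in reader["fields"]:
        wf = writer_fields.get(rf["name"])
        if wf is None:
            if "default" not in rf:
                return False          # nothing to read and nothing to fall back on
        elif wf["type"] != rf["type"]:
            return False              # no type promotion in this toy
    return True


def is_compatible(mode: str, new: dict, previous: list) -> bool:
    """BACKWARD: new reads old. FORWARD: old reads new. FULL: both.
    *_TRANSITIVE checks every previous version; otherwise only the latest."""
    base = mode.removesuffix("_TRANSITIVE")
    targets = previous if mode.endswith("_TRANSITIVE") else previous[-1:]
    for old in targets:
        if base in ("BACKWARD", "FULL") and not can_read(new, old):
            return False
        if base in ("FORWARD", "FULL") and not can_read(old, new):
            return False
    return True  # NONE performs no checks


v1 = {"type": "record", "name": "User", "fields": [{"name": "id", "type": "long"}]}
v2 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": ["null", "string"], "default": None}]}
v3 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": ["null", "string"]}]}  # default removed

print(is_compatible("FULL", v2, [v1]))                     # True
print(is_compatible("BACKWARD", v3, [v1, v2]))             # True: v3 can read v2 data
print(is_compatible("BACKWARD_TRANSITIVE", v3, [v1, v2]))  # False: v3 cannot read v1 data
```

The last two lines show why TRANSITIVE matters: a change can pass against the latest version yet break readers of older data still stored in the topic.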
Worked examples
Example 1 — Add an optional field (safe with BACKWARD/FORWARD depending on deploy order)
Old Avro schema:
{
  "type": "record",
  "name": "User",
  "namespace": "demo.v1",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"}
  ]
}
New Avro schema (adds an optional email field with default null):
{
  "type": "record",
  "name": "User",
  "namespace": "demo.v1",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null","string"], "default": null}
  ]
}
- Deploy consumers first? Use BACKWARD compatibility: new readers fill in email = null when reading old data.
- Deploy producers first? Use FORWARD compatibility: old readers ignore the unknown email field in new data.
- Because the new field has a default, this change passes both checks, so it is also FULL compatible.
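The deploy-order reasoning in Example 1 can be simulated on already-decoded records. The hypothetical read_as helper below projects a writer's record onto a reader schema the way Avro resolves records in the simplest case: known fields are copied, writer-only fields are dropped, and reader-only fields take their default. It works on Python dicts, not Avro binary.

```python
OLD_USER = {"type": "record", "name": "User", "namespace": "demo.v1", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"}]}
NEW_USER = {"type": "record", "name": "User", "namespace": "demo.v1", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": None}]}


def read_as(record: dict, reader_schema: dict) -> dict:
    """Toy record resolution: copy known fields, drop writer-only ones, default the rest."""
    result = {}
    for f in reader_schema["fields"]:
        if f["name"] in record:
            result[f["name"]] = record[f["name"]]
        elif "default" in f:
            result[f["name"]] = f["default"]  # JSON null in the schema becomes None here
        else:
            raise ValueError(f"no value and no default for field {f['name']!r}")
    return result


# Consumer-first (BACKWARD): a new reader meets an old message.
print(read_as({"id": 1, "name": "Ada"}, NEW_USER))
# {'id': 1, 'name': 'Ada', 'email': None}

# Producer-first (FORWARD): an old reader meets a new message.
print(read_as({"id": 2, "name": "Lin", "email": "lin@example.com"}, OLD_USER))
# {'id': 2, 'name': 'Lin'}
```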
Example 2 — Rename a field without breaking consumers
You want to rename name to full_name. Do not simply change the field name. Use aliases:
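{
  "type": "record",
  "name": "User",
  "namespace": "demo.v1",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "full_name", "type": "string", "aliases": ["name"]}
  ]
}
Avro applies aliases from the reader's schema, so readers on the new schema can still read old data written with name: the alias maps the writer's name field to full_name. Readers still on the old schema expect name, which new data no longer contains, and name has no default, so they fail. Deploy consumers before producers (BACKWARD). After all producers and consumers have migrated and no data written with name remains, you can remove the alias in a later version (with care if you use transitive rules, which check against every earlier version).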
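The asymmetry of aliases shows up in a toy lookup. This hypothetical helper mimics how a reader picks the writer field for each of its own fields: exact name first, then the aliases declared in the reader schema. The Avro specification describes aliases as rewriting the writer's schema using the reader's aliases, and notes that implementations may support this optionally.

```python
from typing import Optional


def find_writer_field(reader_field: dict, writer_field_names: list) -> Optional[str]:
    """Pick the writer field that feeds a reader field: exact name first, then reader aliases."""
    if reader_field["name"] in writer_field_names:
        return reader_field["name"]
    for alias in reader_field.get("aliases", []):
        if alias in writer_field_names:
            return alias
    return None  # caller must fall back to a default, or fail


new_reader = {"name": "full_name", "type": "string", "aliases": ["name"]}
old_reader = {"name": "name", "type": "string"}  # v1 field, no default

print(find_writer_field(new_reader, ["id", "name"]))       # name: new consumer, old data
print(find_writer_field(old_reader, ["id", "full_name"]))  # None: old consumer, new data
```

The alias lives only in the new schema, which is why it helps new readers and does nothing for old ones.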
Example 3 — Subject naming strategies
- topic-name-key / topic-name-value: separate subjects for keys and values per topic; common default
- record-name: subject per record type; useful when the same record appears in multiple topics
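- topic-record-name: combines both (topic name plus record name); useful when one topic carries several record types and each type's history should stay scoped to that topic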
Pick the one that fits your reuse patterns and governance model. topic-name-* is simplest to start with.
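The three strategies differ only in how the subject string is built. A minimal sketch, assuming the naming conventions of Confluent's TopicNameStrategy, RecordNameStrategy, and TopicRecordNameStrategy; the strategy labels passed to the hypothetical subject_name function are this page's names, not configuration values.

```python
def fullname(schema: dict) -> str:
    """Fully qualified Avro record name, e.g. demo.v1.User."""
    namespace = schema.get("namespace")
    return f"{namespace}.{schema['name']}" if namespace else schema["name"]


def subject_name(strategy: str, topic: str, schema: dict, is_key: bool = False) -> str:
    """Build the subject a serializer would register under for each strategy."""
    if strategy == "topic-name":
        return f"{topic}-{'key' if is_key else 'value'}"
    if strategy == "record-name":
        return fullname(schema)
    if strategy == "topic-record-name":
        return f"{topic}-{fullname(schema)}"
    raise ValueError(f"unknown strategy: {strategy}")


user = {"type": "record", "name": "User", "namespace": "demo.v1", "fields": []}
for strategy in ("topic-name", "record-name", "topic-record-name"):
    print(strategy, "->", subject_name(strategy, "users", user))
# topic-name -> users-value
# record-name -> demo.v1.User
# topic-record-name -> users-demo.v1.User
```

With record-name, the subject (and its compatibility history) is shared by every topic that carries demo.v1.User, which is what makes cross-topic reuse possible.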
Step-by-step: Register and evolve a schema
- Model the record: write Avro/Protobuf/JSON Schema capturing required vs optional fields.
- Choose subject: usually topic-name-value for message values.
- Set compatibility: start with BACKWARD or FULL; use TRANSITIVE if you need checks across history.
- Register v1: producers start using the schema; messages include the schema ID in the wire format.
- Evolve safely: when adding fields, provide defaults or make them optional; for renames, use aliases (Avro) or keep field numbers stable (Protobuf).
- Deploy in order: consumer-first favors BACKWARD; producer-first favors FORWARD; FULL allows either order (checked against the latest version only, unless you use FULL_TRANSITIVE).
- Monitor: watch registry validations and SerDe errors in logs.
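The configure, validate, and register steps map onto a registry's REST API. A minimal sketch, assuming a Confluent-compatible Schema Registry at http://localhost:8081 (its default port): it builds the three requests with Python's standard urllib without sending them. The helper names are hypothetical; the endpoint paths and the application/vnd.schemaregistry.v1+json content type follow Confluent's REST API. Non-Avro schemas would also need a schemaType field, which is omitted here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8081"  # assumption: local Confluent-compatible registry
CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def _request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request to the registry."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": CONTENT_TYPE},
        method=method,
    )


def set_compatibility(subject: str, level: str) -> urllib.request.Request:
    return _request("PUT", f"/config/{subject}", {"compatibility": level})


def check_compatibility(subject: str, schema: dict) -> urllib.request.Request:
    # The schema travels as a JSON-encoded string inside the JSON body.
    return _request("POST", f"/compatibility/subjects/{subject}/versions/latest",
                    {"schema": json.dumps(schema)})


def register_version(subject: str, schema: dict) -> urllib.request.Request:
    return _request("POST", f"/subjects/{subject}/versions", {"schema": json.dumps(schema)})


user_v2 = {"type": "record", "name": "User", "namespace": "demo.v1", "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": ["null", "string"], "default": None}]}

for req in (set_compatibility("users-value", "BACKWARD"),
            check_compatibility("users-value", user_v2),
            register_version("users-value", user_v2)):
    print(req.get_method(), req.full_url)
    # Sending needs a running registry:
    # with urllib.request.urlopen(req) as resp: print(json.load(resp))
```

Running the compatibility check before registering lets CI fail fast; a passing check returns a body with "is_compatible": true, and a successful registration returns the schema's ID.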
Exercises
These exercises mirror the tasks below. Try them here, then open the solutions only if needed.
Exercise 1 — Pick the right compatibility mode
Scenario: Many consumers still use the previous schema. You must deploy producers first to add an optional field with a safe default. Which compatibility mode should the subject use now?
Type your answer, then compare with the solution in the Exercises section below.
Exercise 2 — Evolve an Avro schema
Given v1:
{
  "type": "record",
  "name": "Order",
  "namespace": "shop.v1",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
Create v2 that:
- Adds optional currency with default "USD"
- Renames amount to total using aliases
Write the full v2 schema, then compare with the solution in the Exercises section below.
Checklist
- I can explain BACKWARD vs FORWARD vs FULL to a teammate
- I know how to add a field without breaking old consumers
- I can safely rename a field (aliases or stable field numbers)
- I understand subject naming strategies
Common mistakes and self-check
- Changing a field's type without a migration path. Fix: use unions (Avro), new fields, or transformation steps.
- Removing required fields immediately. Fix: deprecate first, stop producing the field, then remove after readers migrate.
- No defaults for new fields. Fix: provide sensible defaults or make fields nullable.
- Confusing deploy order and compatibility. Fix: consumer-first -> BACKWARD; producer-first -> FORWARD; uncertainty -> FULL.
- Reusing Protobuf field numbers. Fix: never reuse a number; when you delete a field, list its number (and name) as reserved so it cannot be reused.
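Some of these mistakes can be caught mechanically in CI. Below is a hypothetical, conservative lint for the Protobuf case, working on field-number maps (number to name and type) that you would extract from your .proto files with your own tooling. It flags removed numbers that were not reserved, numbers whose type changed, and fields that use a reserved number. It deliberately allows renames that keep the number and type, since that is the safe Protobuf rename; it flags every type change even though a few scalar changes are wire-compatible; and it cannot detect a number reused with the same type but a new meaning.

```python
def lint_field_numbers(old: dict[int, tuple[str, str]],
                       new: dict[int, tuple[str, str]],
                       new_reserved: set[int]) -> list[str]:
    """old/new map field number -> (field name, type name); returns human-readable problems."""
    problems = []
    for number, (name, type_name) in sorted(old.items()):
        if number not in new:
            if number not in new_reserved:
                problems.append(f"field {name} (#{number}) was removed but its number is not reserved")
        elif new[number][1] != type_name:
            problems.append(f"#{number} changed type from {type_name} to {new[number][1]}")
    for number, (name, _) in sorted(new.items()):
        if number in new_reserved:
            problems.append(f"field {name} uses reserved number #{number}")
    return problems


old = {1: ("id", "int64"), 2: ("name", "string"), 3: ("note", "string")}
renamed = {1: ("id", "int64"), 2: ("full_name", "string")}  # rename keeps #2; note removed
reused = {1: ("id", "int64"), 2: ("full_name", "string"), 3: ("created_at", "int64")}

print(lint_field_numbers(old, renamed, {3}))    # []
print(lint_field_numbers(old, renamed, set()))  # ['field note (#3) was removed but its number is not reserved']
print(lint_field_numbers(old, reused, set()))   # ['#3 changed type from string to int64']
```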
Self-check prompt
Given your last schema change, could the oldest active consumer still read messages after your deploy? If not, which compatibility or change would fix it?
Practical projects
- Single-topic evolution: Start with a simple User schema, register v1, then add email and rename name to full_name using aliases. Validate compatibility at each step.
- Cross-topic reuse: Use record-name strategy for a shared Address record consumed by two services. Evolve it by adding an optional field with default.
- Producer-first rollout: Simulate old consumers reading new messages. Configure FORWARD compatibility and verify old readers keep working.
Mini challenge
You need to drop a field purchase_note that only 10% of consumers still read. Design a two-release plan that avoids breakage. Mention compatibility mode(s), producer/consumer deploy order, and when to actually remove the field.
Learning path
- Start: Understand formats (Avro/Protobuf/JSON Schema) and registry concepts
- Next: Practice compatibility by evolving schemas with defaults and aliases
- Then: Choose subject naming strategies for your org
- Finally: Automate validations in CI and monitor registry/SerDe errors
Next steps
- Do the hands-on exercises below
- Take the Quick Test at the end of this page
- Apply these patterns to a real topic in your environment (in a dev cluster)
Quick Test