Schema Registry Basics

Learn Schema Registry Basics for free with explanations, exercises, and a quick test (for Data Engineer).

Published: January 8, 2026 | Updated: January 8, 2026

Why this matters

In streaming systems, producers and consumers evolve independently. Without a schema registry, a producer could change message shape and silently break consumers. A Schema Registry stores and versions schemas (Avro, Protobuf, JSON Schema), enforces compatibility rules, and lets teams evolve data safely.

  • Real tasks you will face: designing event contracts, reviewing schema changes in PRs, setting compatibility policies per topic, debugging consumer deserialization errors, and migrating fields over time.
  • Typical outcomes: fewer breaking changes, faster team autonomy, easier governance, and reliable

Concept explained simply

A schema says what a message looks like: field names, types, and defaults. The Schema Registry is a catalog for these schemas with rules about how they can change.

Mental model: a contract office for your events

Imagine every topic or record has a contract file. The registry is the office that stores the contract versions and checks new drafts. Compatibility modes are the office policies (e.g., "new drafts must work with all previous clients"). The message on the wire includes a tiny label (schema id) so readers know exactly which contract version was used.

Core concepts

  • Serialization formats: Avro, Protobuf, JSON Schema.
  • Subject: the name under which a schema is versioned (e.g., topic-name-value).
  • Schema id: a number the registry issues; the id goes into messages so consumers fetch the right schema.
  • Compatibility modes: NONE, BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE.
  • Schema evolution: changing schemas safely with defaults, nullability, aliases, and type rules.
  • References: one schema importing another (common with Protobuf or shared Avro types).
  • Transitive vs non-transitive checks: compare against just the previous version or all historical versions.
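
To make the transitive distinction concrete, here is a minimal Python sketch of which earlier versions a candidate schema would be compared against. The function name versions_to_check and the list of version labels are hypothetical, chosen for illustration; a real registry makes this selection internally.

```python
def versions_to_check(mode: str, history: list) -> list:
    """Return the registered versions a new schema is compared against.

    `history` is ordered oldest to newest. Illustrative only.
    """
    if mode == "NONE" or not history:
        return []                  # no compatibility check at all
    if mode.endswith("_TRANSITIVE"):
        return list(history)       # every registered version
    return [history[-1]]           # only the latest registered version


history = ["v1", "v2", "v3"]
print(versions_to_check("BACKWARD", history))             # ['v3']
print(versions_to_check("BACKWARD_TRANSITIVE", history))  # ['v1', 'v2', 'v3']
print(versions_to_check("NONE", history))                 # []
```

The practical consequence: under a non-transitive mode, a chain of individually acceptable changes can still leave the newest schema unable to read data written with a much older version.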

Wire format, simply

Many registries use a small framing: a magic byte (marks the message as registry-framed), then a 4‑byte schema id, then the serialized payload. Consumers read the id, look up the schema (from a local cache, or from the registry on a cache miss), then decode the payload.
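
A minimal Python sketch of this framing and the cache lookup follows. It assumes a magic byte of 0 and a big-endian id, which is the convention in Confluent-style framing; other registries may differ. The names frame, unframe, SchemaCache, and fake_fetch are hypothetical, and the fetch function stands in for a real registry call.

```python
import struct

MAGIC_BYTE = 0   # assumption: framing marker value used by Confluent-style clients
HEADER = ">BI"   # 1 unsigned byte + 4-byte big-endian unsigned int = 5 bytes


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix a serialized payload with the magic byte and schema id."""
    return struct.pack(HEADER, MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes):
    """Split a framed message into (schema_id, payload)."""
    if len(message) < 5:
        raise ValueError("message too short for framing header")
    magic, schema_id = struct.unpack(HEADER, message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


class SchemaCache:
    """Remember schemas by id so the registry is asked at most once per id."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical callable: schema id -> schema
        self._schemas = {}

    def get(self, schema_id: int):
        if schema_id not in self._schemas:
            self._schemas[schema_id] = self._fetch(schema_id)
        return self._schemas[schema_id]


calls = []

def fake_fetch(schema_id):
    # Stand-in for a network call to the registry.
    calls.append(schema_id)
    return {"type": "record", "name": "User"}


cache = SchemaCache(fake_fetch)
schema_id, payload = unframe(frame(42, b"payload"))
cache.get(schema_id)
cache.get(schema_id)                     # second lookup is served from the cache
print(schema_id, payload, len(calls))    # 42 b'payload' 1
```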

Worked examples

Example 1: Adding a new field

v1 has fields id (long), email (string). v2 adds country (string) with default "US".

  • Backward: OK (new readers can read old data; default fills missing field).
  • Forward: OK (old readers ignore extra fields they don’t know).
  • Full: OK (both directions hold).

Why this works

Avro’s resolution allows missing reader fields if a default exists, and it ignores writer fields the reader doesn’t know.
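
The two rules can be sketched in plain Python. This is a toy model of the idea, not the Avro library: it works on dictionaries, ignores type checking, and the function name resolve is hypothetical.

```python
def resolve(record: dict, reader_fields: list) -> dict:
    """Project a written record onto the reader's field list.

    - Reader field present in the data: copy it.
    - Reader field missing from the data: use the reader's default, else fail.
    - Writer fields the reader does not declare: dropped.
    """
    out = {}
    for field in reader_fields:
        name = field["name"]
        if name in record:
            out[name] = record[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            raise ValueError(f"no value or default for reader field {name!r}")
    return out


V1_FIELDS = [
    {"name": "id", "type": "long"},
    {"name": "email", "type": "string"},
]
V2_FIELDS = V1_FIELDS + [{"name": "country", "type": "string", "default": "US"}]

old_record = {"id": 1, "email": "a@example.com"}                    # written with v1
new_record = {"id": 2, "email": "b@example.com", "country": "DE"}   # written with v2

# New reader, old data: the default fills the missing field.
print(resolve(old_record, V2_FIELDS))
# {'id': 1, 'email': 'a@example.com', 'country': 'US'}

# Old reader, new data: the unknown field is ignored.
print(resolve(new_record, V1_FIELDS))
# {'id': 2, 'email': 'b@example.com'}
```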

Example 2: Renaming a field

v1 field email (string). v2 renames to contact_email (string) without alias.

  • Backward: Breaks (the v2 reader expects contact_email, which has no default, but v1 data only carries email).
  • Fix: Add an alias so the v2 reader can map the old name to the new one.

// Avro snippet
{"name":"contact_email","type":"string","aliases":["email"]}
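
How a reader might use the alias can be sketched as follows. This is a simplified illustration in plain Python rather than the Avro library, and resolve_with_aliases is a hypothetical name: for each reader field it tries the field's own name first, then each alias, then the default.

```python
def resolve_with_aliases(record: dict, reader_fields: list) -> dict:
    """Read a written record using the reader's names, aliases, and defaults."""
    out = {}
    for field in reader_fields:
        candidates = [field["name"], *field.get("aliases", [])]
        for name in candidates:
            if name in record:
                out[field["name"]] = record[name]   # store under the reader's name
                break
        else:
            # No name or alias matched the data: fall back to the default.
            if "default" not in field:
                raise ValueError(f"cannot resolve reader field {field['name']!r}")
            out[field["name"]] = field["default"]
    return out


V2_FIELDS = [{"name": "contact_email", "type": "string", "aliases": ["email"]}]

# Data written with v1 still uses the old field name.
print(resolve_with_aliases({"email": "a@example.com"}, V2_FIELDS))
# {'contact_email': 'a@example.com'}
```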

Example 3: Choosing compatibility mode

You have many consumer teams at different upgrade speeds. Picking BACKWARD or FULL reduces breakage risk:

  • BACKWARD: safe when you deploy consumers first.
  • FORWARD: safe when you deploy producers first.
  • FULL: safe in both directions, stricter but reliable for heterogeneous fleets.
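
The modes can be read as two directional checks. The sketch below is a simplified model under stated assumptions: it only looks at field names, aliases, and defaults (no type rules), and can_read and is_compatible are hypothetical helpers, not a registry API.

```python
def can_read(reader_fields: list, writer_fields: list) -> bool:
    """True if every reader field is found in the writer (by name or alias) or has a default."""
    writer_names = {f["name"] for f in writer_fields}
    for field in reader_fields:
        names = {field["name"], *field.get("aliases", [])}
        if not (names & writer_names) and "default" not in field:
            return False
    return True


def is_compatible(mode: str, new_fields: list, old_fields: list) -> bool:
    """Check one new schema against one old schema for the given mode."""
    base = mode.replace("_TRANSITIVE", "")
    backward = can_read(new_fields, old_fields)   # new reader, old data
    forward = can_read(old_fields, new_fields)    # old reader, new data
    if base == "NONE":
        return True
    if base == "BACKWARD":
        return backward
    if base == "FORWARD":
        return forward
    if base == "FULL":
        return backward and forward
    raise ValueError(f"unknown compatibility mode: {mode}")


V1_FIELDS = [{"name": "id", "type": "long"}, {"name": "email", "type": "string"}]
# A new field added WITHOUT a default:
V2_FIELDS = V1_FIELDS + [{"name": "country", "type": "string"}]

for mode in ("BACKWARD", "FORWARD", "FULL"):
    print(mode, is_compatible(mode, V2_FIELDS, V1_FIELDS))
# BACKWARD False
# FORWARD True
# FULL False
```

In this model, adding a default to the new field makes all three modes pass, which is why defaults matter most when consumers upgrade first.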

Hands-on: do it on paper

  1. Pick a subject strategy: If the same record type is shared across topics, record-name may be best. If each topic evolves independently, topic-name-value is simple and clear.
  2. Write v1: List fields, types, and which can be null.
  3. Plan evolution: For each future change, specify default or alias, and whether it passes your chosen compatibility mode.
  4. Simulate a change: “Add optional field X with default.” Check backward, forward, full.
  5. Simulate a breaking change: “Rename field without alias.” Decide the mitigation plan.
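
For step 1, the difference between the two strategies comes down to how the subject name is derived. The sketch below assumes the naming conventions used by Confluent-style serializers (topic plus a -key or -value suffix, versus the fully qualified record name); subject_name, the topic names, and the com.example namespace are hypothetical values for illustration.

```python
def subject_name(strategy: str, topic: str, record_name: str, is_key: bool = False) -> str:
    """Derive the registry subject for a message under a naming strategy."""
    if strategy == "topic-name":
        suffix = "key" if is_key else "value"
        return f"{topic}-{suffix}"      # one subject per topic (and per key/value)
    if strategy == "record-name":
        return record_name              # one subject per record type, shared across topics
    raise ValueError(f"unknown strategy: {strategy}")


record = "com.example.Payment"   # hypothetical fully qualified record name
for topic in ("payments_eu", "payments_us"):
    print(subject_name("topic-name", topic, record), subject_name("record-name", topic, record))
# payments_eu-value com.example.Payment
# payments_us-value com.example.Payment
```

With the topic-based strategy each topic gets its own version history; with the record-based strategy both topics share one history, so a single compatibility check covers them.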

Exercises

Exercise 1 — Check compatibility: add a field

Avro v1:

{
  "type":"record",
  "name":"User",
  "fields":[
    {"name":"id","type":"long"},
    {"name":"email","type":"string"}
  ]
}

Proposed v2 (adds country with default):

{
  "type":"record",
  "name":"User",
  "fields":[
    {"name":"id","type":"long"},
    {"name":"email","type":"string"},
    {"name":"country","type":"string","default":"US"}
  ]
}

  • Decide: Backward? Forward? Full?
  • State your reasoning in 2–3 sentences.

Solution

Backward: compatible (reader v2 can read writer v1; default fills country). Forward: compatible (reader v1 ignores extra writer field). Full: compatible since both directions hold. Reason: Avro resolution rules for added fields with defaults and ignoring unknown fields.

Exercise 2 — Safe rename with aliases

Avro v1:

{
  "type":"record",
  "name":"User",
  "fields":[
    {"name":"id","type":"long"},
    {"name":"email","type":"string"}
  ]
}

Proposed v2 (renames email to contact_email):

{
  "type":"record",
  "name":"User",
  "fields":[
    {"name":"id","type":"long"},
    {"name":"contact_email","type":"string"}
  ]
}

  • Is this compatible under BACKWARD? Under FULL?
  • Provide a corrected v2 that is compatible and explain why.

Solution

The original change is incompatible under both BACKWARD and FULL: a v2 reader needs contact_email, which v1 data does not carry and which has no default. Corrected v2:

{
  "type":"record",
  "name":"User",
  "fields":[
    {"name":"id","type":"long"},
    {"name":"contact_email","type":"string","aliases":["email"]}
  ]
}

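With the alias, BACKWARD passes: when a v2 reader reads v1 data, the alias maps email to contact_email. FULL is harder because it also requires the forward direction. A v1 reader still expects email, v2 data carries only contact_email, and v1 declares no default for email, so an alias on the newer schema alone does not satisfy it. To reach FULL, plan a staged producer/consumer migration (for example, carry both fields for a while, or give the old field a default before retiring it).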

  • Self-check checklist
    • I can explain what subject, schema id, and compatibility mode mean.
    • I know how to add a field safely (default + nullability when needed).
    • I can rename a field using aliases (or a producer/consumer migration plan).
    • I understand when to pick BACKWARD vs FORWARD vs FULL.
    • I can choose a subject naming strategy for a scenario.

Common mistakes and self-check

  • Adding a field without a default under BACKWARD mode.
    Fix: Provide a sensible default or make it nullable with a default null where appropriate.
  • Renaming fields without aliases (Avro) or without care for field numbers (Protobuf).
    Fix: Use aliases in Avro; in Protobuf never reuse field numbers and prefer deprecation over deletion.
  • Switching compatibility modes mid-stream without a plan.
    Fix: Announce changes; use staged rollouts; prefer FULL or BACKWARD for mixed fleets.
  • Subject strategy mismatch: same record used across topics but subject is topic-based.
    Fix: Use record-name strategy when sharing a common type across topics.
  • Assuming JSON without schema is fine for evolution.
    Fix: Use JSON Schema in registry to validate and evolve safely.

Quick self-audit
  • Can I simulate deserialization both ways (old reader/new writer; new reader/old writer)?
  • Do all added fields have defaults or are nullable by design?
  • Are breaking changes behind flags or versioned topics when needed?

Practical projects

  • Design an Avro User v1 and evolve to v3: add country (default), rename email with alias, deprecate age. Record which compatibility modes pass each step.
  • Create a shared CommonAddress type referenced by two records. Practice changing CommonAddress with transitive checks.
  • Subject strategy drill: Write a one-pager picking strategies for three scenarios (shared record across topics; per-topic independence; multi-language microservices).

Who this is for

  • Data Engineers building event-driven pipelines.
  • Backend Engineers producing/consuming Kafka-like streams.

Prerequisites

  • Basic understanding of producers/consumers and topics.
  • Comfort with JSON and one schema format (Avro or Protobuf) basics.

Learning path

  • Before this: Message formats and serialization basics.
  • This lesson: Schema Registry and compatibility.
  • Next: Stream contracts in practice (versioning strategies, rollout plans), then stream processing with exactly-once semantics and idempotency.

Next steps

  • Adopt a default compatibility policy (e.g., FULL_TRANSITIVE) and document exceptions.
  • Introduce schema change checks in CI with a small sample catalog.
  • Practice wire-level thinking: how ids map to cached schemas in your client library.
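
A CI check could be sketched as below. This assumes a registry that exposes a Confluent-compatible REST API, where a POST to /compatibility/subjects/{subject}/versions/latest returns a JSON body with an is_compatible flag; the lesson does not name a specific registry, so treat the endpoint, the localhost URL, and the helper names as assumptions to adapt.

```python
import json
import urllib.parse
import urllib.request


def build_compatibility_request(base_url: str, subject: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) a request that tests a candidate schema
    against the latest registered version of a subject."""
    quoted = urllib.parse.quote(subject, safe="")
    url = f"{base_url.rstrip('/')}/compatibility/subjects/{quoted}/versions/latest"
    # The schema travels as a JSON string inside the JSON request body.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


def parse_compatibility_response(raw: bytes) -> bool:
    """Read the is_compatible flag from the response body."""
    return bool(json.loads(raw).get("is_compatible", False))


candidate = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
        {"name": "country", "type": "string", "default": "US"},
    ],
}

request = build_compatibility_request("http://localhost:8081", "users-value", candidate)
print(request.get_method(), request.full_url)
# POST http://localhost:8081/compatibility/subjects/users-value/versions/latest

# Sending it requires a reachable registry, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     ok = parse_compatibility_response(response.read())
#     raise SystemExit(0 if ok else 1)
```

Failing the build on a false result turns the compatibility policy into a gate that runs before a schema change is merged.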

Mini challenge

Your team plans to split a monolithic topic into payments_eu and payments_us, both carrying the same Payment record. Choose a subject strategy and a compatibility mode, and write two rules for safe evolution over the next six months. Keep it under 6 bullet points.
