Why this matters
Data platforms run on shared data. Schema registries and contracts ensure producers and consumers can evolve independently without breaking each other. As a Data Platform Engineer, you will:
- Prevent breaking data changes across Kafka topics, streams, and APIs.
- Enforce compatibility rules and versioning in CI/CD.
- Provide a single source of truth for schemas used across teams.
- Audit who changed what, when, and why for governance and compliance.
Concept explained simply
A schema registry stores versions of data schemas (Avro, Protobuf, JSON Schema) and enforces rules that define which changes are allowed. A contract is the agreement between data producers and consumers about the shape and meaning of data. Registries automate and enforce those agreements.
Mental model
- Schema = blueprint of a message.
- Subject = logical name under which schema versions are stored (e.g., topic-value, topic-key).
- Compatibility mode = rule for schema evolution (backward, forward, full, transitive).
Think of it as version-controlled interfaces for data. Just like APIs have versioned endpoints, your data messages have versioned schemas with evolution rules.
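A minimal Python sketch of this mental model (an illustration only, not a real registry client; the subject name and schema strings are placeholders): each subject holds an ordered list of schema versions plus a compatibility mode.
from dataclasses import dataclass, field

@dataclass
class Subject:
    # Ordered schema versions; index 0 is version 1, index 1 is version 2, ...
    versions: list = field(default_factory=list)
    # Evolution rule checked whenever a new version is proposed.
    compatibility: str = "BACKWARD"

# Subjects keyed by name, mirroring Kafka's topic-value / topic-key convention.
registry = {"orders-value": Subject()}
registry["orders-value"].versions.append("<Order schema v1>")
registry["orders-value"].versions.append("<Order schema v2, one optional field added>")
print(len(registry["orders-value"].versions))  # 2 -- versions accumulate under the subject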
Core concepts
- Formats: Avro (compact, great with Kafka), Protobuf (IDL-first, language-neutral), JSON Schema (human-readable, flexible).
- Compatibility modes (see the registry sketch after this list):
  - Backward: New readers can read old data.
  - Forward: Old readers can read new data.
  - Full: Both backward and forward hold.
  - Transitive: The rule must hold across all historical versions, not just the latest.
- Safe evolution patterns: add optional fields with defaults, add nullable fields, add enum symbols (with care), deprecate without removing.
- Risky changes: rename without aliases, remove required fields, change types incompatibly, reuse Protobuf field numbers without reserving.
- Subjects: typically one subject per topic-value and per topic-key in Kafka; align naming with environments and domains.
- Governance: approvals for breaking changes, lineage and audit trails, policy checks (e.g., PII tags in schema metadata).
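The compatibility mode referenced above is typically configured per subject and enforced by the registry when new versions are proposed. A minimal sketch, assuming a Confluent-compatible Schema Registry at http://localhost:8081, the Python requests library, and a hypothetical subject that already has at least one registered version:
import json
import requests

REGISTRY = "http://localhost:8081"  # assumed local registry URL
SUBJECT = "users-value"             # hypothetical subject for a topic's value schema

# Set the evolution rule for this subject (BACKWARD, FORWARD, FULL, or a *_TRANSITIVE variant).
requests.put(f"{REGISTRY}/config/{SUBJECT}", json={"compatibility": "BACKWARD"}).raise_for_status()

# Ask the registry whether a candidate schema is compatible with the latest registered version.
candidate = {"type": "record", "name": "User", "fields": [{"name": "id", "type": "string"}]}
resp = requests.post(
    f"{REGISTRY}/compatibility/subjects/{SUBJECT}/versions/latest",
    json={"schema": json.dumps(candidate), "schemaType": "AVRO"},
)
resp.raise_for_status()
print(resp.json())  # e.g. {'is_compatible': True}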
Worked examples
Example 1: Backward-compatible Avro change (add optional field)
{
  "type": "record",
  "name": "User",
  "namespace": "example.v1",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "email", "type": "string"}
  ]
}
Evolve by adding an optional field with a default:
{
  "type": "record",
  "name": "User",
  "namespace": "example.v1",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "phone", "type": ["null", "string"], "default": null}
  ]
}
With backward compatibility, consumers using the new schema can read existing data (which lacks phone) because a default is provided.
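A quick way to see this resolution locally, sketched with the fastavro library (assumed to be installed; not part of the original example): write a record with v1, then read it back using v2 as the reader schema.
import io
from fastavro import parse_schema, schemaless_reader, schemaless_writer

v1 = parse_schema({
    "type": "record", "name": "User", "namespace": "example.v1",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "email", "type": "string"},
    ],
})
v2 = parse_schema({
    "type": "record", "name": "User", "namespace": "example.v1",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "phone", "type": ["null", "string"], "default": None},
    ],
})

buf = io.BytesIO()
schemaless_writer(buf, v1, {"id": "u1", "email": "a@example.com"})  # old producer writes with v1
buf.seek(0)
# New consumer reads old data with v2; the default fills in the missing field.
print(schemaless_reader(buf, v1, v2))  # {'id': 'u1', 'email': 'a@example.com', 'phone': None}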
Example 2: Avoiding a breaking rename with Avro aliases
Original:
{"name":"email","type":"string"}
Renaming it to contact_email outright breaks existing consumers that expect email. Use an alias so readers can resolve data written under the old name:
{
  "name": "contact_email",
  "type": "string",
  "aliases": ["email"]
}
Aliases allow readers to map old field names to new ones, preserving compatibility.
Example 3: Protobuf field number safety
Original .proto:
message Order {
  string id = 1;
  string status = 2; // e.g., NEW, SHIPPED
}
Removing field 2 and reusing number 2 for a different meaning can corrupt reads. Correct approach:
message Order {
  string id = 1;
  reserved 2;
  reserved "status";
}
Use reserved to block reuse of removed field numbers (and, ideally, removed field names). Add new fields with new numbers.
Example 4: Forward compatibility with JSON Schema
Old schema (consumer):
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["id"],
  "properties": {"id": {"type": "string"}}
}
Producer adds an optional field:
{
  "type": "object",
  "required": ["id"],
  "properties": {
    "id": {"type": "string"},
    "note": {"type": "string"}
  },
  "additionalProperties": false
}
Because the old consumer schema does not forbid additional properties, payloads produced under the new schema (including note) still validate against it, so old consumers keep working. Forward compatibility relies on consumers tolerating unknown fields.
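To check this concretely, a small sketch using the Python jsonschema package (assumed here; any validator supporting draft 2020-12 behaves the same way): a payload produced under the new schema is validated against the old consumer schema.
from jsonschema import validate

old_consumer_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["id"],
    "properties": {"id": {"type": "string"}},
}
# Payload produced under the new schema, including the added optional field.
new_payload = {"id": "42", "note": "gift wrap"}
# No exception is raised: the old schema does not forbid extra properties.
validate(instance=new_payload, schema=old_consumer_schema)
print("old consumers still validate the new payload")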
How to choose a format
- Avro: streaming events in Kafka, strong evolution story, compact binary.
- Protobuf: cross-language RPC and events, strict IDL, efficient.
- JSON Schema: human-friendly, great for REST payloads and configuration.
Pick one per domain for consistency. Document the default compatibility mode per domain.
Governance patterns that work
- Subject naming: adopt a consistent convention such as domain.team.topic-value (and the matching -key subject) for clarity.
- Compatibility defaults: start with backward or full; use transitive in critical domains.
- CI policy: block merges on breaking changes; require approvers for full-mode changes (see the CI sketch after this list).
- Metadata: tag fields with sensitivity (e.g., PII) and retention hints.
- Change proposals: require a short migration note for consumers.
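As flagged in the CI policy bullet, a minimal CI gate can ask the registry whether a proposed schema is compatible and fail the build otherwise. A sketch, assuming a Confluent-compatible registry; the environment variables and file path are hypothetical:
import os
import sys
import requests

registry = os.environ.get("SCHEMA_REGISTRY_URL", "http://localhost:8081")  # assumed variable
subject = os.environ["SUBJECT"]  # e.g. "payments.orders.orders-value"
schema_path = sys.argv[1]        # path to the proposed schema file in the pull request

with open(schema_path) as f:
    proposed = f.read()

resp = requests.post(
    f"{registry}/compatibility/subjects/{subject}/versions/latest",
    json={"schema": proposed, "schemaType": "AVRO"},
)
resp.raise_for_status()

if not resp.json().get("is_compatible", False):
    print(f"Breaking change for subject {subject}; blocking merge.")
    sys.exit(1)
print("Schema change is compatible with the latest registered version.")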
Step-by-step: set up a schema workflow locally
- Pick a format and define v1 of a small schema (User, Order, or Event).
- Decide a compatibility mode (try backward).
- Create v2 with one safe change (optional field with default).
- Create v3 with a risky change (rename without alias). Predict the registry decision, then confirm it with the sketch after this list.
- Add governance metadata: owner, description, tags.
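To confirm the prediction from the rename step, a sketch that registers each version against a local, Confluent-compatible registry (assumed at http://localhost:8081, with the subject's compatibility mode left at the common BACKWARD default; subject name is hypothetical):
import json
import requests

REGISTRY = "http://localhost:8081"  # assumed local registry
SUBJECT = "users-value"             # hypothetical subject

def register(schema: dict) -> None:
    # Compatible schemas are accepted; breaking ones are rejected with HTTP 409.
    resp = requests.post(
        f"{REGISTRY}/subjects/{SUBJECT}/versions",
        json={"schema": json.dumps(schema), "schemaType": "AVRO"},
    )
    if resp.status_code == 409:
        print("Rejected as incompatible:", resp.json())
    else:
        resp.raise_for_status()
        print("Registered:", resp.json())

v1 = {"type": "record", "name": "User", "fields": [{"name": "id", "type": "string"}]}
v2 = {"type": "record", "name": "User", "fields": v1["fields"] + [
    {"name": "phone", "type": ["null", "string"], "default": None}]}  # safe: optional with default
v3 = {"type": "record", "name": "User", "fields": [{"name": "user_id", "type": "string"}]}  # risky rename

for schema in (v1, v2, v3):
    register(schema)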
Mini task: write a migration note
Describe in 3 sentences what changed from v1 to v2, why it is compatible, and what consumers must do (usually nothing).
Your exercises
Complete the exercises below. Then check your answers.
- Exercise 1: Avro evolution with backward compatibility.
- Exercise 2: Protobuf safety and contract decision-making.
Checklist before submitting:
- You chose a compatibility mode and justified it.
- You identified any breaking change and proposed a fix.
- You included defaults or aliases where needed.
Common mistakes and self-check
- Renaming without aliases (Avro) or without deprecation mapping.
- Reusing Protobuf field numbers; always reserve removed tags.
- Removing required fields; prefer deprecate or make optional with defaults.
- Forgetting key schemas for Kafka; key evolution rules can differ from value.
- Switching formats midstream without a migration plan.
Self-check: Given your last schema change, can consumers using the new schema still read messages produced before the change? If not, it is not backward compatible.
Practical projects
- Build a sample Kafka topic with Avro User events, enforce BACKWARD_TRANSITIVE compatibility, and evolve it through 4 versions.
- Create a Protobuf contract for an Order service, add fields across versions, and demonstrate reserved tags for removed fields.
- Define JSON Schemas for a REST payload with a lint rule set that blocks breaking changes in CI.
Who this is for
- Data Platform Engineers managing shared event streams and APIs.
- Data Engineers producing/consuming data across teams.
- Developers integrating services via events or contracts.
Prerequisites
- Basic understanding of data serialization (JSON, binary formats).
- Familiarity with Kafka or message queues helps but is not required.
- Comfort reading simple Avro, Protobuf, or JSON Schema definitions.
Learning path
- Learn schema formats (Avro, Protobuf, JSON Schema).
- Understand registry subjects and compatibility modes.
- Practice safe evolution patterns and governance policies.
- Automate checks in CI and document contracts.
Mini challenge
You maintain a topic-value subject set to FULL_TRANSITIVE. You need to add a new optional field and remove a legacy required field. How do you achieve this without breaking compatibility? Outline the steps in 5 bullet points.
Next steps
- Finish the exercises below and take the quick test.
- Apply these patterns in one of the Practical projects.
- Add contract checks to a CI pipeline in your environment.