Topic 2 of 8

Data Contracts Concepts

Learn Data Contracts Concepts for free with explanations, exercises, and a quick test (for Data Architects).

Published: January 18, 2026 | Updated: January 18, 2026

Why this matters

Data contracts turn implicit expectations into explicit, testable agreements between data producers and consumers. As a Data Architect, you use contracts to reduce breaking changes, improve data quality, and enable independent evolution of services and pipelines.

  • Real task: Define a stable event schema for order events across microservices.
  • Real task: Specify SLAs/SLOs for a nightly batch feed and enforce validation on arrival.
  • Real task: Design versioning and deprecation rules so downstream dashboards don’t break.

Who this is for

  • Data Architects who define integration standards and guardrails.
  • Data Engineers building ingestion, ETL/ELT, and event pipelines.
  • Platform/Integration Engineers responsible for schema registries and validation.
  • Analytics Engineers/Consumers who rely on stable, trusted inputs.

Prerequisites

  • Basic data modeling (entities, attributes, types, nullability).
  • Familiarity with ETL/ELT patterns and batch vs streaming.
  • Awareness of common formats (CSV, JSON, Avro, Parquet) and APIs/events.

Concept explained simply

A data contract is a human-readable and machine-validated agreement about the structure, meaning, and delivery of data between a producer and a consumer. It answers: what fields exist, what they mean, when data arrives, how complete/accurate it must be, how changes happen, and who owns it.

Mental model

Think of a data contract as a “service contract for data.” Like an API contract, it defines inputs/outputs and reliability. But it also covers semantics (business meaning), quality expectations, and the change process. If you change the contract, you negotiate and version it—never surprise consumers.

Core elements of a data contract

  1. Structure: schema, types, nullability, constraints, examples.
  2. Semantics: field definitions, units, reference data, lineage notes.
  3. Operational SLOs: delivery cadence/window, latency, completeness, freshness.
  4. Quality rules: uniqueness, referential integrity, valid ranges/patterns.
  5. Change management: versioning policy, deprecation windows, approval workflow.
  6. Ownership & support: producer owner, consumer list, escalation channel.
  7. Security & privacy: PII flags, masking rules, access classification.

Tip: Versioning policy at a glance
  • Non-breaking: add an optional field, widen an enum (if consumers ignore unknown values), relax a constraint.
  • Breaking: rename or remove a field, narrow a type, tighten a constraint, alter semantics.
  • Policy: use semantic versioning (MAJOR.MINOR.PATCH) and keep at least one version of overlap during migrations.
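The bump rules above can be sketched as a small helper. This is illustrative only; the change-kind names and the function are made up for the example, not taken from any particular tool:

```python
# Sketch: map a proposed change to the versioning policy above and suggest
# the next contract version. Change-kind names are illustrative.

BREAKING = {"rename_field", "remove_field", "narrow_type",
            "tighten_constraint", "alter_semantics"}
NON_BREAKING = {"add_optional_field", "widen_enum", "relax_constraint"}

def suggest_bump(change_kind: str, current_version: str) -> str:
    """Return the next semantic version for a proposed contract change."""
    major, minor, _patch = (int(p) for p in current_version.split("."))
    if change_kind in BREAKING:
        # Breaking change: new major version; keep the old version running
        # in parallel during the migration overlap.
        return f"{major + 1}.0.0"
    if change_kind in NON_BREAKING:
        return f"{major}.{minor + 1}.0"  # additive change: minor bump
    raise ValueError(f"unknown change kind: {change_kind}")

print(suggest_bump("add_optional_field", "1.2.0"))  # 1.3.0
print(suggest_bump("rename_field", "1.2.0"))        # 2.0.0
```

Encoding the policy as code like this is what makes it enforceable in review tooling rather than a document nobody reads.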

Worked examples

Example 1 — Event contract (Kafka: order_created)

Scenario: E-commerce service emits order_created events used by fulfillment and BI.

{
  "name": "order_created",
  "version": "1.2.0",
  "schema": {
    "order_id": {"type": "string", "nullable": false},
    "customer_id": {"type": "string", "nullable": false},
    "total_amount": {"type": "number", "nullable": false, "unit": "USD"},
    "items": {"type": "array", "items": {
      "sku": {"type": "string"},
      "qty": {"type": "integer", "min": 1},
      "price": {"type": "number"}
    }},
    "coupon_code": {"type": "string", "nullable": true}
  },
  "semantics": {
    "total_amount": "Sum of item price * qty after discounts, before tax"
  },
  "slo": {"latency_ms_p95": 5000, "availability": "99.9%"},
  "quality": {"order_id_unique": true, "customer_id_exists_in_crm": true},
  "change_management": {"non_breaking": ["add optional fields"], "breaking": ["rename fields"]}
}

Evolution: Adding optional coupon_code is non-breaking; renaming total_amount is breaking and requires a major version.
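As a rough illustration, a consumer or ingest gate could enforce this contract's structural rules with a hand-rolled check. Production setups usually validate via JSON Schema or a schema registry instead; the function below is a minimal sketch with an invented name:

```python
# Hand-rolled validator for the order_created contract above (illustrative).

def validate_order_created(event: dict) -> list[str]:
    """Return a list of contract violations; empty means the event conforms."""
    errors = []
    # Required, non-nullable fields from the contract's schema section.
    for field in ("order_id", "customer_id", "total_amount", "items"):
        if event.get(field) is None:
            errors.append(f"{field}: required and not nullable")
    total = event.get("total_amount")
    if total is not None and not isinstance(total, (int, float)):
        errors.append("total_amount: must be a number")
    for i, item in enumerate(event.get("items") or []):
        if not isinstance(item.get("qty"), int) or item["qty"] < 1:
            errors.append(f"items[{i}].qty: integer >= 1 required")
    # coupon_code is optional and nullable, so its absence is fine.
    return errors

event = {"order_id": "o-1", "customer_id": "c-9", "total_amount": 25.0,
         "items": [{"sku": "A", "qty": 2, "price": 12.5}]}
print(validate_order_created(event))  # []
```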

Example 2 — Batch file contract (CSV to object storage)

Scenario: Finance sends daily payments_YYYYMMDD.csv by 02:00 UTC.

{
  "dataset": "payments_daily",
  "format": {"type": "csv", "delimiter": ",", "header": true, "encoding": "utf-8"},
  "columns": [
    {"name": "payment_id", "type": "string", "nullable": false},
    {"name": "order_id", "type": "string", "nullable": false},
    {"name": "amount_usd", "type": "number", "nullable": false, "min": 0},
    {"name": "paid_at", "type": "timestamp", "nullable": false, "timezone": "UTC"}
  ],
  "delivery": {"cadence": "daily", "deadline_utc": "02:00", "partitioning": "dt=YYYY-MM-DD"},
  "quality": {"completeness": ">= 99.5% rows vs source", "duplicates": "0 by payment_id"},
  "security": {"pii": false},
  "ownership": {"producer": "FinanceOps", "consumers": ["DataPlatform", "BI"]}
}

Validation on arrival: check header order, types, duplicates, and completeness metric before loading.
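A minimal sketch of that arrival check, using only the Python standard library. The function name is illustrative, and the expected source row count is assumed to arrive alongside the file (e.g. in a delivery manifest):

```python
import csv
import io

EXPECTED_HEADER = ["payment_id", "order_id", "amount_usd", "paid_at"]

def validate_payments_csv(text: str, source_row_count: int) -> list[str]:
    """Check header order, types, duplicates, and completeness before loading."""
    errors = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        errors.append(f"header mismatch: expected {EXPECTED_HEADER}")
        return errors
    seen = set()
    for line_no, row in enumerate(rows[1:], start=2):
        payment_id, _order_id, amount, _paid_at = row
        if payment_id in seen:
            errors.append(f"line {line_no}: duplicate payment_id {payment_id}")
        seen.add(payment_id)
        try:
            if float(amount) < 0:
                errors.append(f"line {line_no}: amount_usd must be >= 0")
        except ValueError:
            errors.append(f"line {line_no}: amount_usd is not numeric")
    completeness = (len(rows) - 1) / source_row_count
    if completeness < 0.995:
        errors.append(f"completeness {completeness:.2%} below 99.5% target")
    return errors

sample = ("payment_id,order_id,amount_usd,paid_at\n"
          "p1,o1,10.00,2026-01-17T00:00:00Z\n")
print(validate_payments_csv(sample, source_row_count=1))  # []
```

A non-empty error list should quarantine the file rather than load it, which is the enforcement point the contract promises consumers.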

Example 3 — Reference data API contract (REST)

Scenario: Catalog service exposes /v1/categories.

  • Structure: id (string), name (string), parent_id (nullable string).
  • SLO: p95 latency < 300 ms, 99.9% uptime.
  • Change policy: Add optional fields in v1; breaking changes move to v2 with 90-day deprecation notice.

Evolution: Adding display_order (optional) in v1 is safe; changing id to integer requires v2.
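One way consumers stay safe under this additive v1 policy is the tolerant-reader pattern: parse only the contracted fields and ignore everything else. A minimal sketch (the class and function names are illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Category:
    id: str
    name: str
    parent_id: Optional[str] = None

def parse_category(payload: dict) -> Category:
    # Pick only the contracted fields and ignore anything else, so additive
    # v1 changes such as display_order never break this consumer.
    return Category(id=payload["id"], name=payload["name"],
                    parent_id=payload.get("parent_id"))

cat = parse_category({"id": "c1", "name": "Books", "display_order": 3})
print(cat)  # Category(id='c1', name='Books', parent_id=None)
```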

Exercises and design checklist

Use this checklist before drafting a contract:

  • Structure: types, nullability, constraints, examples provided
  • Semantics: clear definitions and units
  • SLOs: delivery time, latency/freshness, availability
  • Quality: uniqueness, completeness, valid ranges, referential rules
  • Change policy: versioning, deprecation window, approval
  • Ownership: producer, consumers, escalation
  • Security/privacy classification

Exercise 1 — Draft a minimum viable data contract

Write a minimal data contract for a daily Payments CSV feed into your lake. Include structure, semantics, delivery SLO, quality rules, versioning, and ownership.

Exercise 2 — Change impact assessment

Given proposed changes to an existing contract, categorize each as Non-breaking or Breaking and suggest the action (same version vs new major and deprecation):

  • Add field refund_reason (nullable string)
  • Change amount_usd from number to string
  • Tighten amount_usd min from 0 to 1
  • Rename paid_at to payment_timestamp
  • Add enum value "MANUAL" to payment_method

Common mistakes and self-check

  • Mistake: Only documenting schema, ignoring semantics and quality. Self-check: Does every critical field have a clear definition and unit?
  • Mistake: Treating versioning ad hoc. Self-check: Can you explain what triggers a major vs minor version in one sentence?
  • Mistake: No validation at the boundary. Self-check: Is there an automated check that blocks bad data before it lands?
  • Mistake: Hidden ownership. Self-check: Is there a named producer owner and escalation channel?
  • Mistake: Instant deprecations. Self-check: Is there a published deprecation window and overlap period?

Practical projects

  • Project 1: Create a reusable JSON template for data contracts covering structure, semantics, SLOs, quality, ownership, and change policy.
  • Project 2: Build a CI step that validates sample payloads/files against the contract (e.g., JSON Schema/CSV schema) and fails on violations.
  • Project 3: Implement a contract change proposal form with automatic diff detection and a deprecation timeline generator.
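For Project 2, one approach is a contract-driven check that reads type and nullability rules straight from the contract document, so a single script serves many contracts. A simplified sketch, assuming the contract shape from Example 1; a real CI step would load contract and payload from files and exit nonzero when the error list is non-empty:

```python
# Contract-driven payload check (simplified sketch).

TYPE_MAP = {"string": str, "number": (int, float), "integer": int,
            "array": list, "object": dict}

def check_payload(contract: dict, payload: dict) -> list[str]:
    errors = []
    for field, rules in contract["schema"].items():
        value = payload.get(field)
        if value is None:
            if not rules.get("nullable", False):
                errors.append(f"{field}: missing or null but not nullable")
            continue
        if not isinstance(value, TYPE_MAP[rules["type"]]):
            errors.append(f"{field}: expected {rules['type']}, "
                          f"got {type(value).__name__}")
    return errors

contract = {"schema": {"order_id": {"type": "string", "nullable": False},
                       "coupon_code": {"type": "string", "nullable": True}}}
print(check_payload(contract, {"order_id": "o-1"}))  # []
print(check_payload(contract, {}))
# ['order_id: missing or null but not nullable']
```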

Learning path

  1. Start: Learn schema fundamentals (types, nullability, constraints).
  2. Add semantics: Write concise field definitions and units.
  3. Operationalize: Define SLOs and quality checks; set enforcement points (ingest validators).
  4. Evolve safely: Adopt semantic versioning and a deprecation playbook.
  5. Automate: Templates, validators, CI hooks, and contract catalogs/registries.

Mini challenge

Your consumer needs item_discount per line. You can either (A) add optional item_discount to items, or (B) modify total_amount to exclude discounts. Which do you choose and why?

Suggested answer

Choose (A): add optional item_discount. It is non-breaking and preserves existing semantics. Option (B) changes the meaning of total_amount, which is breaking and requires a major version plus a coordinated migration.

Next steps

  • Apply the checklist to one upstream you depend on.
  • Draft a 1-page contract and share it with producer and consumer teams for feedback.
  • Automate validation for at least one rule (e.g., uniqueness or schema conformity).

Quick Test

Take the quick test below to check understanding. Everyone can take it; logged-in learners get saved progress.

Practice Exercises

2 exercises to complete

Instructions

Create a minimal data contract for a daily Payments CSV delivered to object storage by 02:00 UTC with columns: payment_id, order_id, amount_usd, paid_at. Include:

  • Structure: types, nullability, constraints
  • Semantics: brief field meanings
  • SLOs: delivery deadline, completeness target
  • Quality rules: uniqueness, valid ranges
  • Change policy: non-breaking vs breaking examples
  • Ownership: producer and consumers

Expected Output

A concise JSON/YAML-like contract including schema, semantics, SLOs, quality rules, versioning policy, and ownership.

Data Contracts Concepts — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

