Defining And Tracking Data Contracts

Learn to define and track data contracts through explanations, worked examples, and exercises (for Analytics Engineers).

Published: December 23, 2025 | Updated: December 23, 2025

Why this matters

Analytics Engineers are often surprised by silent upstream changes: a column disappears, enums expand, or timestamps arrive late. Data contracts prevent surprises by agreeing on what data will look like, how reliable it will be, and what happens when change is needed. With contracts, you can:

  • Stop schema-breaking deploys before they hit production.
  • Set clear freshness and completeness expectations with producers.
  • Automate monitors that map directly to those expectations.
  • Reduce incident time-to-detect and time-to-recover.

Concept explained simply

A data contract is a handshake between data producers and data consumers. It contains three things:

  • The spec: schema, definitions, acceptable values, and SLAs (freshness, volume, uniqueness).
  • The monitors: tests and alerts that verify the spec continuously.
  • The change process: versioning rules, deprecation windows, and who to contact.

Mental model

Think of it like a service-level agreement for data. The producer promises: "We will deliver this shape of data, at this cadence, with these quality guarantees." The consumer promises: "We will test and alert, give feedback early, and follow the agreed change process."

Definitions cheat sheet
  • Freshness SLA: Maximum allowed delay between event occurrence and availability in analytics.
  • Completeness SLA: Minimum percentage of expected rows delivered in a time window.
  • Schema stability: Rules for adding, updating, or removing fields.
  • Semantics: What a field means in the business context (not just its type).
  • Breaking change: Any change that would break downstream logic or assumptions.
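
The freshness and completeness definitions above reduce to simple arithmetic. The sketch below is illustrative only: the function names and sample numbers are assumptions, and it uses the age of the newest available row as a proxy for the delay between event occurrence and availability.

```python
from datetime import datetime, timedelta, timezone


def freshness_minutes(latest_event_at: datetime, now: datetime) -> float:
    """Minutes between the newest available event and the current time."""
    return (now - latest_event_at).total_seconds() / 60


def completeness_pct(delivered_rows: int, expected_rows: int) -> float:
    """Share of expected rows that actually arrived, as a percentage."""
    if expected_rows <= 0:
        raise ValueError("expected_rows must be positive")
    return 100.0 * delivered_rows / expected_rows


# Hypothetical observations for one check run.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
latest = now - timedelta(minutes=12)

fresh_ok = freshness_minutes(latest, now) <= 15        # 15-minute freshness SLA
complete_ok = completeness_pct(9_950, 10_000) >= 99.0  # 99% completeness SLA
```

In practice the expected row count has to come from somewhere the producer and consumer both trust, such as a source-side count or a trailing average.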

Core components of a data contract

  • Scope & ownership: Producer team, consumer teams, business purpose, single owner contact.
  • Entities & fields: Name, type, description, nullability, constraints, PII classification, enumerations.
  • Quality SLAs: Freshness, completeness, uniqueness, allowed duplicates, volume bands, referential integrity.
  • Backfills & historical corrections: If/when backfills occur and how they are communicated.
  • Versioning & change management: SemVer, deprecation periods, migration windows, change notice timelines.
  • Monitoring & alerts: Tests mapped to each SLA, severity levels, run frequency, alert routing.
  • Incident handling: How to raise, triage, rollback, and communicate status.

Worked examples

Example 1: Product signup event

entity: user_signup_event
owner: growth-eng@company
purpose: Measure signup funnel and activation
schema:
  - name: event_id
    type: string
    constraints: [not_null, unique]
  - name: user_id
    type: string
    constraints: [not_null]
  - name: occurred_at
    type: timestamp
    constraints: [not_null]
  - name: signup_method
    type: string
    allowed_values: [email, google, apple]
  - name: country
    type: string
    nullable: true
quality_sla:
  freshness_minutes: 15
  completeness_last_24h_pct: ">= 99"
  duplicate_rate_pct: "<= 0.1"
change_mgmt:
  version: 1.2.0
  add_field: allowed with 7-day notice
  breaking_change: 30-day deprecation + parallel fields
monitoring:
  tests: [not_null(event_id), unique(event_id), enum(signup_method), freshness(15m)]
  severity: {freshness: high, uniqueness: high, not_null: high, enum: medium}
  alerts: ["#growth-eng-slack", "oncall pager"]

Notes: adding a new signup_method requires notice; a missing event_id triggers a high-severity alert.
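
The row-level tests in this contract (not_null, unique, enum) can be sketched as a plain Python check over a batch of events. This is a minimal illustration, not a replacement for a testing tool: the function name and sample rows are hypothetical, and treating a null signup_method as passing the enum test is an assumption, since the contract does not state its nullability.

```python
ALLOWED_SIGNUP_METHODS = {"email", "google", "apple"}   # from the contract
REQUIRED_FIELDS = ("event_id", "user_id", "occurred_at")  # not_null fields


def validate_signup_events(rows):
    """Return a list of (test_name, row_index) failures for a batch of events."""
    failures = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # not_null checks
        for field in REQUIRED_FIELDS:
            if row.get(field) is None:
                failures.append((f"not_null({field})", i))
        # uniqueness check on event_id (nulls are reported by not_null only)
        event_id = row.get("event_id")
        if event_id is not None:
            if event_id in seen_ids:
                failures.append(("unique(event_id)", i))
            seen_ids.add(event_id)
        # enum check; a null method is allowed here (assumption)
        method = row.get("signup_method")
        if method is not None and method not in ALLOWED_SIGNUP_METHODS:
            failures.append(("enum(signup_method)", i))
    return failures


# Hypothetical batch: one duplicate id, one missing id, one unknown method.
rows = [
    {"event_id": "e1", "user_id": "u1", "occurred_at": "2025-01-01T00:00:00Z", "signup_method": "email"},
    {"event_id": "e1", "user_id": "u2", "occurred_at": "2025-01-01T00:01:00Z", "signup_method": "google"},
    {"event_id": None, "user_id": "u3", "occurred_at": "2025-01-01T00:02:00Z", "signup_method": "sms"},
]
failures = validate_signup_events(rows)
```

Each failure name matches a test in the contract, so it can be looked up in the severity map to decide how loudly to alert.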

Example 2: Payments table

entity: payments
owner: billing-platform@company
purpose: Revenue reporting and refunds
schema:
  - name: payment_id
    type: string
    constraints: [primary_key]
  - name: order_id
    type: string
    constraints: [not_null, foreign_key(orders.order_id)]
  - name: amount_cents
    type: integer
    constraints: [not_null, ">= 0"]
  - name: currency
    type: string
    allowed_values: [USD, EUR, GBP]
  - name: status
    type: string
    allowed_values: [pending, captured, refunded, failed]
  - name: processed_at
    type: timestamp
    constraints: [not_null]
quality_sla:
  freshness_minutes: 30
  completeness_last_hour_pct: ">= 99.5"
  referential_integrity_errors_per_day: 0
change_mgmt:
  version: 2.0.0
  breaking_change: requires ADR and 45-day window
monitoring:
  tests: [pk_unique(payment_id), not_null(order_id), fk_orders, enum(status), freshness(30m), volume(0.5x..2x 7d avg)]

Notes: The volume band test guards against both duplicate ingestion (a spike above the band) and missing batches (a drop below it).
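
The volume(0.5x..2x 7d avg) test can be sketched as a comparison of today's row count against the trailing average. The function name and the sample counts are hypothetical; only the 0.5x and 2x multipliers come from the contract.

```python
def volume_in_band(today_count, trailing_counts, low=0.5, high=2.0):
    """True if today's row count is within [low, high] times the trailing average."""
    if not trailing_counts:
        raise ValueError("need at least one trailing day to compute an average")
    average = sum(trailing_counts) / len(trailing_counts)
    return low * average <= today_count <= high * average


# Hypothetical daily row counts for the previous 7 days (average is 1000).
trailing_7d = [1000, 1100, 900, 1050, 950, 1000, 1000]

normal_day = volume_in_band(1200, trailing_7d)    # inside the band
duplicate_load = volume_in_band(2500, trailing_7d)  # spike above 2x
missing_batch = volume_in_band(400, trailing_7d)    # drop below 0.5x
```

A fixed multiplier band is easy to explain to producers, but it will misfire on datasets with strong weekly seasonality; comparing against the same weekday is one common adjustment.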

Example 3: Vendor CSV drop

entity: ad_spend_daily
owner: marketing-ops@company
source: s3://vendor-bucket/spend/YYYY-MM-DD.csv
schema:
  - date: date not_null
  - channel: string enum[search, social, display]
  - spend_usd: numeric >= 0
quality_sla:
  delivery_deadline_utc: "06:00"
  backfill_policy: vendor may correct last 7 days; notification required
monitoring:
  tests: [freshness(before 06:15 UTC), enum(channel), spend_nonnegative, completeness(daily row per channel)]

Notes: Delivery time is the key SLA in file-based feeds. Here the freshness test runs at 06:15 UTC, 15 minutes after the 06:00 delivery deadline.
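
The four tests for this feed can be sketched as one function that inspects a single daily file. This is an illustration under stated assumptions: the file is assumed to have a header row with the contract's column names, and the arrival and deadline timestamps are passed in by the caller, since the contract does not say which calendar day a given file is due.

```python
import csv
import io
from datetime import datetime, timezone

EXPECTED_CHANNELS = {"search", "social", "display"}  # from the contract's enum


def check_vendor_file(csv_text, arrived_at, deadline):
    """Return the names of the contract tests that one daily file fails."""
    failed = []
    if arrived_at > deadline:
        failed.append("freshness")
    seen_channels = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["channel"] not in EXPECTED_CHANNELS:
            failed.append("enum(channel)")
        else:
            seen_channels.add(row["channel"])
        if float(row["spend_usd"]) < 0:
            failed.append("spend_nonnegative")
    # completeness: every expected channel must have a row for the day
    if seen_channels != EXPECTED_CHANNELS:
        failed.append("completeness")
    return failed


# Hypothetical file: late, one negative spend, and no row for "display".
sample = "date,channel,spend_usd\n2025-01-01,search,120.50\n2025-01-01,social,-5\n"
deadline = datetime(2025, 1, 2, 6, 15, tzinfo=timezone.utc)
late_arrival = datetime(2025, 1, 2, 6, 20, tzinfo=timezone.utc)
vendor_failures = check_vendor_file(sample, late_arrival, deadline)
```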

Tracking and monitoring

  • Translate each SLA to a test: freshness check, volume band, uniqueness, not_null, enum/regex, referential integrity.
  • Automate tests in your pipeline tool (for example: warehouse SQL tests, dbt tests, Great Expectations, Soda). Pick one and standardize.
  • Classify severities: high (halt the pipeline), medium (alert without halting), low (log only).
  • Add pre-production gates: validate samples in staging; block deploys on failing contract tests.
  • Route alerts to owners with clear runbooks: when to rollback, re-run, or escalate.
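
The severity classes and the pre-production gate above can be combined in a small routing function. This is a sketch with hypothetical names; defaulting tests without a declared severity to "high" is an assumption chosen to fail safe, not something the contract format requires.

```python
# Map each severity to what the pipeline does when a test at that level fails.
SEVERITY_ACTIONS = {"high": "block", "medium": "alert", "low": "log"}


def gate_deploy(test_results, severities):
    """Decide whether a deploy may proceed.

    test_results: {test_name: passed (bool)}
    severities:   {test_name: "high" | "medium" | "low"}
    Returns (deploy_allowed, failed test names grouped by action).
    """
    actions = {"block": [], "alert": [], "log": []}
    for name, passed in test_results.items():
        if not passed:
            severity = severities.get(name, "high")  # undeclared tests fail safe
            actions[SEVERITY_ACTIONS[severity]].append(name)
    return (not actions["block"]), actions


# Severities as declared in the signup event contract; results are hypothetical.
severities = {"freshness": "high", "uniqueness": "high", "enum": "medium"}
results = {"freshness": True, "uniqueness": False, "enum": False}
deploy_allowed, actions = gate_deploy(results, severities)
```

Only the "block" bucket stops the deploy; the other buckets feed alert routing and logs.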

Sample SQL monitors

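-- Uniqueness: any row returned is a duplicated payment_id
select payment_id from payments
group by 1 having count(*) > 1;

-- Freshness: minutes since the newest row (PostgreSQL-style syntax); compare against the 30-minute SLA
select extract(epoch from (now() - max(processed_at)))/60 as freshness_min from payments;

-- Enum guard: any row returned is a status outside the contract (NULL statuses are not caught by NOT IN)
select status from payments
where status not in ('pending','captured','refunded','failed')
limit 1;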
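
To try monitors like these without a warehouse, the sketch below runs them against an in-memory SQLite table using Python's standard library. SQLite is swapped in purely for convenience: its freshness query uses strftime('%s', ...) instead of the PostgreSQL-style extract(epoch ...), and the sample rows and the fixed "current time" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table payments (payment_id text, status text, processed_at text)")
conn.executemany(
    "insert into payments values (?, ?, ?)",
    [
        ("p1", "captured", "2025-01-01 11:30:00"),
        ("p2", "refunded", "2025-01-01 11:45:00"),
        ("p2", "captured", "2025-01-01 11:45:00"),    # duplicate payment_id
        ("p3", "chargeback", "2025-01-01 11:40:00"),  # status outside the contract
    ],
)

# Uniqueness: an empty list means the test passes.
duplicate_ids = [
    row[0]
    for row in conn.execute(
        "select payment_id from payments group by payment_id having count(*) > 1"
    )
]

# Enum guard: an empty list means the test passes.
unexpected_statuses = [
    row[0]
    for row in conn.execute(
        "select distinct status from payments "
        "where status not in ('pending','captured','refunded','failed')"
    )
]

# Freshness: minutes between a fixed 'now' and the newest processed_at.
freshness_min = conn.execute(
    "select (cast(strftime('%s', ?) as integer) "
    "- cast(strftime('%s', max(processed_at)) as integer)) / 60.0 from payments",
    ("2025-01-01 12:00:00",),
).fetchone()[0]
freshness_ok = freshness_min <= 30  # 30-minute SLA from the payments contract
```

Passing the current time as a parameter keeps the check deterministic, which makes it easier to unit test than a query that calls the database clock.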

Step-by-step: create a contract

  1. Identify critical questions. What decisions rely on this dataset? What breaks if shape or timeliness changes?
  2. Draft the spec. List fields, definitions, types, nullability, enums, constraints. Propose freshness and completeness SLAs.
  3. Negotiate with producers. Confirm feasibility, choose owners, and agree on change windows.
  4. Implement tests. Map each SLA to a monitor. Add pre-prod checks.
  5. Alerting & runbooks. Define severity thresholds and who responds.
  6. Versioning. Use semantic versioning; schedule deprecations; document migrations.
  7. Review cadence. Review SLAs and incidents quarterly; adjust as needed.
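
For the versioning step, one way to decide the size of a version bump is to diff the old and new schemas. The sketch below assumes a common convention that is not spelled out in the lesson: removing or retyping a field is breaking (major bump) and adding a field is additive (minor bump). Function names and sample schemas are hypothetical.

```python
def classify_change(old_fields, new_fields):
    """Compare {field_name: type} mappings and return 'major', 'minor', or 'none'."""
    removed = old_fields.keys() - new_fields.keys()
    added = new_fields.keys() - old_fields.keys()
    retyped = {
        name
        for name in old_fields.keys() & new_fields.keys()
        if old_fields[name] != new_fields[name]
    }
    if removed or retyped:
        return "major"  # consumers selecting or casting these fields would break
    if added:
        return "minor"  # additive; existing queries keep working
    return "none"


def bump(version, change):
    """Apply a 'major', 'minor', or 'none' change to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    return version


v1 = {"event_id": "string", "signup_method": "string"}
v_added = {**v1, "referrer": "string"}                      # new optional field
v_renamed = {"event_id": "string", "method": "string"}      # rename = remove + add

added_version = bump("1.2.0", classify_change(v1, v_added))
renamed_version = bump("1.2.0", classify_change(v1, v_renamed))
```

A rename shows up as one removal plus one addition, which is why it is classified as breaking and should go through the deprecation window.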

Mini task: pick SLAs fast

Start with defaults: freshness 30m, completeness 99% daily, uniqueness per primary key, allowed enums documented. Adjust after a week of observations.

Exercises (practice)

Do these in your notes or editor.

Exercise 1 — Draft a contract for an orders table

You ingest an orders table used by Finance and BI. Define a contract with: purpose, owner, schema (at least: order_id, user_id, created_at, status, total_cents, currency), SLAs (freshness, completeness, uniqueness), change policy, and monitoring tests.

Checklist
  • Owner and purpose included
  • Primary key uniqueness
  • Enum for status
  • Currency allowed values
  • Freshness SLA
  • Completeness metric
  • Change management rules
  • Tests mapped to SLAs

Exercise 2 — Write monitors for the contract

Based on your orders contract, write SQL (or pseudo-config) for: uniqueness on order_id, enum guard on status, freshness check (max allowed 20 minutes), and a 7-day volume band test (0.6x–1.6x of trailing average).

Checklist
  • Uniqueness query returns zero rows on success
  • Enum guard catches any unexpected status
  • Freshness expressed in minutes
  • Volume band compares today vs. trailing 7 days

Common mistakes (and self-check)

  • Only schema, no semantics. Self-check: does each field have a clear business definition?
  • Overly strict SLAs. Self-check: do SLAs reflect realistic producer capabilities?
  • No owner or pager. Self-check: can someone respond to alerts within minutes?
  • Ignoring backfills. Self-check: do you document how historical corrections are handled?
  • Alert fatigue. Self-check: are low-severity issues logged but not paged?
  • No versioning policy. Self-check: do breaking changes have a deprecation window?

Practical projects

  • Create a contract for your top-3 critical datasets and implement monitors for each SLA.
  • Set up a pre-production contract test that blocks deploys when enums are violated.
  • Simulate a breaking change (rename a column) and walk through your deprecation plan.
  • Build a simple dashboard that shows SLA compliance: freshness, volume, uniqueness over time.

Who this is for

  • Analytics Engineers who own modeling and testing in the warehouse.
  • Data Engineers building ingestion and transformation pipelines.
  • BI Developers who rely on stable, well-defined datasets.

Prerequisites

  • Comfortable with SQL and warehouse concepts.
  • Basic understanding of data modeling and dimensional design.
  • Familiarity with testing in your stack (for example, dbt tests, Great Expectations, or SQL-based checks).

Learning path

  • Before this: data modeling fundamentals, basic data quality tests.
  • This lesson: define contracts, map SLAs to monitors, plan change management.
  • Next: incident response, SLA dashboarding, and producer-validated schemas.

Next steps

  • Pick one critical dataset and ship a minimal contract this week.
  • Add two monitors tied to real SLAs (freshness + uniqueness).
  • Schedule a 30-minute review with the producer team to agree on change windows.

Mini challenge (15–20 min)

Choose any event stream you use (e.g., page_view or checkout). Draft a one-page contract: five fields with definitions, freshness 10–20 minutes, completeness 99%, and a change policy. Then write two SQL checks to enforce it.

Practice Exercises

Instructions

Write a concise contract for an orders dataset consumed by Finance and BI. Include:

  • owner and purpose
  • schema for: order_id (PK), user_id, created_at, status, total_cents, currency
  • SLAs: freshness (in minutes), completeness (last 24h), uniqueness (order_id)
  • change policy with versioning
  • monitoring tests mapped to each SLA

Expected Output

A YAML-like or structured snippet including owner, schema with constraints and enums, SLAs (freshness/completeness/uniqueness), change policy, and a list of tests.
