Reconciliation Patterns

Learn Reconciliation Patterns for free with explanations, exercises, and a quick test (part of the Data Architect track).

Published: January 18, 2026 | Updated: January 18, 2026

Why this matters

When data moves between systems, rows can be dropped, duplicated, or silently altered. Reconciliation patterns are cheap, repeatable checks (counts, sums, hashes, watermarks, balances) that prove each load is complete and correct before consumers depend on it.

Example 3: Watermark and streaming offsets

Goal: Ensure all events up to 10:00 have been processed into the lake table events_raw.

-- Watermark completeness
SELECT MAX(event_time) AS max_event_time, COUNT(*) AS cnt
FROM lake.events_raw
WHERE event_time <= '2026-01-18T10:00:00Z';

-- Expected watermark = 10:00; raise alert if max_event_time < 10:00.
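
-- The same rule as an explicit pass/fail query (a sketch; the 5-minute
-- allowed lag is an assumed lateness policy, per the design notes below):
SELECT CASE WHEN MAX(event_time) >= '2026-01-18T09:55:00Z'
            THEN 'PASS' ELSE 'FAIL' END AS watermark_status
FROM lake.events_raw
WHERE event_time <= '2026-01-18T10:00:00Z';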

-- Kafka offset coverage (pseudo)
-- Source: topic X partition 0..N committed offsets
-- Target: persisted last_ingested_offset per partition
-- Check: target_offset >= source_committed_offset for all partitions
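
If committed source offsets and ingested target offsets are persisted in control tables, the pseudo-check above can run as plain SQL. A minimal sketch, assuming hypothetical control tables source_offsets and target_offsets keyed by topic and partition:

-- Partitions where ingestion lags the committed source offset
SELECT s.topic, s.partition_id,
       s.committed_offset, t.last_ingested_offset
FROM source_offsets s
LEFT JOIN target_offsets t
  ON t.topic = s.topic AND t.partition_id = s.partition_id
WHERE t.last_ingested_offset IS NULL
   OR t.last_ingested_offset < s.committed_offset;
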
Example 4: Balance check for ledger

Goal: Verify debits equal credits per day and currency in the warehouse.

SELECT txn_date, currency,
       ROUND(SUM(CASE WHEN side = 'DEBIT'  THEN amount ELSE 0 END), 2) AS debits,
       ROUND(SUM(CASE WHEN side = 'CREDIT' THEN amount ELSE 0 END), 2) AS credits
FROM dwh.ledger
WHERE txn_date = CURRENT_DATE - INTERVAL '1 day'
GROUP BY txn_date, currency
-- Repeat the sums in HAVING: most dialects do not allow SELECT aliases there
HAVING ABS(SUM(CASE WHEN side = 'DEBIT'  THEN amount ELSE 0 END)
         - SUM(CASE WHEN side = 'CREDIT' THEN amount ELSE 0 END)) > 0.01;
-- Rows returned indicate debit/credit mismatches

Designing reconciliation in your architecture

  1. Pick invariants per dataset. Start with counts + key sums, then add hashes or domain checks.
  2. Define windows. Daily for batch; minute/hour for streaming. Include lateness policy.
  3. Capture controls at source and target. Store them in a reconciliation_log with fields: dataset, window_start, window_end, record_count, numeric_sums, window_hash, watermark, offsets, status, run_id, computed_at (see the DDL sketch after this list).
  4. Compare and alert. Tolerances for rounding and expected lateness. Escalate only after X consecutive failures.
  5. Remediate. Support idempotent replay and backfill; track retries and final state.
  6. Observe over time. Trend counts and sums to spot drift, not just point failures.
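
A minimal DDL sketch for the control log in step 3, assuming a Postgres-style dialect (the types and JSON columns are illustrative choices, not requirements):

CREATE TABLE reconciliation_log (
    dataset       TEXT        NOT NULL,
    window_start  TIMESTAMPTZ NOT NULL,
    window_end    TIMESTAMPTZ NOT NULL,
    record_count  BIGINT,
    numeric_sums  JSONB,        -- e.g. {"total_amount": 3487912.57}
    window_hash   TEXT,
    watermark     TIMESTAMPTZ,  -- streaming completeness marker
    offsets       JSONB,        -- per-partition offsets for streaming sources
    status        TEXT,         -- e.g. PASS / FAIL / PENDING
    run_id        TEXT        NOT NULL,
    computed_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (dataset, window_start, window_end, run_id)
);
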
What good looks like
  • Automated controls for every critical dataset.
  • Fast, cheap checks run on every load.
  • Human-readable failure messages with next actions.
  • Re-run is safe and deterministic.

Exercises

These match the tasks listed below. You can complete them with any SQL-capable environment or by writing pseudo-SQL.

Exercise 1 (ex1): Control totals

You ingested orders for 2026-01-15. Source landing shows 125,000 rows and total_amount sum of 3,487,912.57. Your warehouse table orders_fact for the same date shows 124,998 rows and total_amount sum of 3,487,912.57.

  • Compute differences and decide if the load passes or fails.
  • Write the minimal SQL or pseudo-logic to compare and set a status.
  • Propose a remediation step if it fails.
Checklist
  • Compared counts
  • Compared sums with tolerance
  • Clear pass/fail rule
  • Remediation idea

Exercise 2 (ex2): Hash-based window check

For customer_dim on 2026-01-15, compute a stable window hash using customer_id, normalized name, and status. The source and target hashes differ.

  • List two likely causes.
  • Write a key-based diff query outline to find mismatched rows.
  • Suggest a guard to make the hash stable.
Checklist
  • Two plausible causes
  • Diff outline by key
  • Stability guard noted

Common mistakes and how to self-check

  • Relying on counts only. Add at least one numeric sum or hash.
  • Ignoring null/whitespace normalization in hashes. Always standardize.
  • Full table row-by-row compares on large data. Use windows and partitions.
  • No lateness policy. Watermark checks will false-alarm without it.
  • Mixing event time and processing time. Reconcile on event time.
  • Unbounded tolerances. Define small, explicit thresholds per metric.
  • Not checking deletes. Include tombstones or compare key sets (see the sketch after this list).
  • One-off reconciliation scripts. Centralize into a reusable control log.
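
A key-set comparison sketch for the deletes case, assuming source and target share the business key (the schema and column names are illustrative):

-- Keys present on only one side of the reconciled window
SELECT COALESCE(s.customer_id, t.customer_id) AS customer_id,
       CASE WHEN t.customer_id IS NULL THEN 'MISSING_IN_TARGET'
            ELSE 'DELETED_IN_SOURCE' END      AS anomaly
FROM src.customer_dim s
FULL OUTER JOIN dwh.customer_dim t
  ON t.customer_id = s.customer_id
WHERE s.customer_id IS NULL OR t.customer_id IS NULL;
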
Self-check prompts
  • Can I explain which invariant catches which failure mode?
  • Do I have windowed checks for both batch and streaming?
  • Are my checks cheap enough to run every load?
  • Can I re-run a failed window safely?

Practical projects

  • Build a reconciliation_log table and instrument two critical datasets with counts, sums, and hashes.
  • Add a watermark check for one streaming topic and store committed offsets vs ingested offsets.
  • Implement a daily balance check for a ledger-like dataset and auto-create a remediation backfill job.
Mini tasks
  • Create a stable hash UDF that lowercases strings and replaces nulls (one possible shape is sketched after these tasks).
  • Write a query that lists top 5 days with largest count deltas.
  • Design an alert message template with precise next actions.
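
For the first mini task, one possible shape in Postgres-style SQL (md5 and string_agg are built-ins there; the normalization rules and the table are assumptions to adapt):

-- Stable row hash: trim + lowercase strings, map NULL to a sentinel
WITH hashed AS (
  SELECT customer_id,
         md5(lower(trim(coalesce(name,   '<null>'))) || '|' ||
             lower(trim(coalesce(status, '<null>')))) AS row_hash
  FROM dwh.customer_dim
)
-- Window hash: deterministic because the aggregation is ordered by key
SELECT md5(string_agg(row_hash, '' ORDER BY customer_id)) AS window_hash
FROM hashed;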

Who this is for

  • Data Architects defining quality standards.
  • Data/Analytics Engineers building pipelines.
  • Platform teams offering observability as a service.

Prerequisites

  • Intermediate SQL (aggregation, joins, windowing).
  • Familiarity with batch/streaming concepts.
  • Basic understanding of CDC and partitions.

Learning path

  1. Start with control totals on 1–2 datasets.
  2. Add stable window hashes and key-based diffs for critical tables.
  3. Introduce watermark and offset checks for streaming.
  4. Centralize into a reconciliation_log with alert rules.
  5. Automate remediation (safe replays, backfills).
  6. Review thresholds quarterly with domain owners.

Next steps

  • Extend to data contracts and schema evolution checks.
  • Add referential coverage checks to star schemas.
  • Trend metrics to detect silent drift early.

Mini challenge

You have a daily inventory snapshot (by product_id, warehouse_id) and a real-time stream of adjustments. Pick two reconciliation patterns to guarantee accuracy at end-of-day and explain how you would implement them with a window of the last 24 hours.

About the quick test

The quick test is available to everyone. Only logged-in users will see saved progress in their account.

Practice Exercises

Instructions (Exercise 1)

Using the order-load scenario from Exercise 1 above (125,000 source rows vs 124,998 warehouse rows; identical total_amount sums):

  • Compute the differences (counts and sums).
  • Decide pass/fail with a currency tolerance of 0.01.
  • Write pseudo-SQL to compare and set status.
  • Propose a remediation step if it fails.
Expected Output
Fail due to count mismatch; sums equal within tolerance. Pseudo-SQL comparison and a remediation step (replay/backfill the specific window).
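
One possible pseudo-SQL shape for that comparison (the control tables and column names are hypothetical):

-- Compare control totals for the 2026-01-15 orders window (sketch)
SELECT s.record_count - t.record_count AS count_delta,
       s.amount_sum   - t.amount_sum   AS sum_delta,
       CASE WHEN s.record_count = t.record_count
             AND ABS(s.amount_sum - t.amount_sum) <= 0.01
            THEN 'PASS' ELSE 'FAIL' END AS status
FROM source_controls s
JOIN target_controls t
  ON t.dataset = s.dataset AND t.load_date = s.load_date
WHERE s.dataset = 'orders' AND s.load_date = DATE '2026-01-15';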

Reconciliation Patterns — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.
