Exercises you can run
You can complete these exercises in your own warehouse.
Exercise 1 — Table freshness snapshot
Write a query that, for a list of tables, emits freshness in minutes and a PASS/FAIL based on an SLA column.
- Inputs: table_name, event_ts_column, expected_interval_hours
- Output: table_name, max_event_ts, freshness_minutes, status
Hint: take MAX() of the event timestamp per table and compare the elapsed minutes against expected_interval_hours * 60.
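One possible shape for a single table, as a minimal sketch: it assumes a hypothetical analytics.orders table with an event_ts column, a 2-hour SLA, and Postgres-style date functions. In practice you would drive this from a config table holding table_name, event_ts_column, and expected_interval_hours, emitting one row per monitored table.

```sql
-- Minimal freshness check for one table (hypothetical analytics.orders,
-- 2-hour SLA); repeat or UNION ALL one block per monitored table.
SELECT
  'analytics.orders'                                            AS table_name,
  MAX(event_ts)                                                 AS max_event_ts,
  EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - MAX(event_ts))) / 60  AS freshness_minutes,
  CASE
    WHEN EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - MAX(event_ts))) / 60 <= 2 * 60
      THEN 'PASS'
    ELSE 'FAIL'
  END                                                           AS status
FROM analytics.orders;
```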
Exercise 2 — Volume anomaly detector
Compute daily row counts for the last 30 days and flag the most recent day as ALERT if it is outside a ±35% band around a 7-day rolling median.
- Inputs: dataset and date column
- Output: dt, cnt, baseline_median, status
Hint: build daily counts in a CTE, compute the median of the preceding seven days, then compare the latest day against baseline_median * 0.65 and baseline_median * 1.35.
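A minimal sketch, assuming a hypothetical events table with an event_ts column and Postgres-style PERCENTILE_CONT for the median; it flags only the most recent day against the median of the seven days before it.

```sql
WITH daily AS (
  -- daily row counts over the lookback window
  SELECT CAST(event_ts AS DATE) AS dt, COUNT(*) AS cnt
  FROM events
  WHERE event_ts >= CURRENT_DATE - INTERVAL '30 days'
  GROUP BY 1
),
latest AS (
  SELECT dt, cnt FROM daily ORDER BY dt DESC LIMIT 1
),
baseline AS (
  -- median of the 7 days preceding the latest day
  SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY cnt) AS baseline_median
  FROM daily
  WHERE dt < (SELECT dt FROM latest)
    AND dt >= (SELECT dt FROM latest) - INTERVAL '7 days'
)
SELECT
  l.dt,
  l.cnt,
  b.baseline_median,
  CASE
    WHEN l.cnt BETWEEN b.baseline_median * 0.65 AND b.baseline_median * 1.35
      THEN 'OK'
    ELSE 'ALERT'
  END AS status
FROM latest l
CROSS JOIN baseline b;
```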
Operational checklist
- Each critical dataset has freshness, completeness, and volume checks
- Partition-level checks for daily/hourly tables
- Metrics land in a centralized table with dataset ownership
- Alerts include runbook links and suppression rules for known windows
- Dynamic thresholds for seasonality; static thresholds for strict SLAs
Common mistakes and self-check
- Using load_time when event_time is available. Self-check: compare both and quantify skew.
- Only table-level checks. Self-check: do partitions exist for the last N days?
- Static thresholds on seasonal data. Self-check: examine weekly cycles before setting bounds.
- Alert storms. Self-check: group alerts per dataset and use cool-down periods.
- Ignoring small null rates. Self-check: track null rates over time; small drifts often precede larger breaks.
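The first self-check can be a one-off query. A minimal sketch, assuming a hypothetical orders table that carries both an event_ts and a load_ts column (Postgres-style date functions); it quantifies the typical lag so you know which timestamp your freshness SLA should really use.

```sql
-- Skew between event time and load time over the last week
SELECT
  CAST(load_ts AS DATE)                                AS load_date,
  COUNT(*)                                             AS rows_loaded,
  AVG(EXTRACT(EPOCH FROM (load_ts - event_ts)) / 60)   AS avg_skew_minutes,
  MAX(EXTRACT(EPOCH FROM (load_ts - event_ts)) / 60)   AS max_skew_minutes
FROM orders
WHERE load_ts >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY 1
ORDER BY 1;
```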
Practical projects
- Build a metrics table and a daily job that records freshness, completeness (PK null rate), and volume for your top 10 tables.
- Create a dashboard showing last 14 days of metrics with red/amber/green status.
- Implement dynamic volume bands using rolling medians and compare the results against a static-threshold approach.
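For the first project, one possible shape for the centralized metrics table plus a daily insert; the schema, names, and the 0.1% null-rate threshold are illustrative assumptions, not a standard.

```sql
-- Centralized data-quality metrics table (illustrative schema)
CREATE TABLE IF NOT EXISTS dq_metrics (
  dataset_name  VARCHAR NOT NULL,          -- e.g. 'analytics.orders'
  owner         VARCHAR,                   -- team or person accountable
  metric_name   VARCHAR NOT NULL,          -- 'freshness_minutes', 'pk_null_rate', 'row_count'
  metric_value  DOUBLE PRECISION,
  status        VARCHAR,                   -- 'PASS' / 'WARN' / 'FAIL'
  measured_at   TIMESTAMP NOT NULL
);

-- Example daily job step: record the PK null rate for one dataset
INSERT INTO dq_metrics (dataset_name, owner, metric_name, metric_value, status, measured_at)
SELECT
  'analytics.orders',
  'data-platform',
  'pk_null_rate',
  AVG(CASE WHEN order_id IS NULL THEN 1.0 ELSE 0.0 END),
  CASE WHEN AVG(CASE WHEN order_id IS NULL THEN 1.0 ELSE 0.0 END) <= 0.001
       THEN 'PASS' ELSE 'FAIL' END,
  CURRENT_TIMESTAMP
FROM analytics.orders;
```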
Learning path
- Start: Implement basic freshness checks for your ingestion layer.
- Next: Add completeness metrics (PK nulls, partition presence).
- Then: Add volume checks with dynamic baselines and alerting.
- Advanced: Per-dimension completeness and schema drift monitoring.
Next steps
- Automate metrics collection after each pipeline run
- Adopt runbooks for top failure modes
- Roll out to all critical datasets and review thresholds monthly
Mini challenge
Your marketing_events table is loaded hourly. Yesterday, 18:00–21:00 show normal counts, but the 22:00 hour has zero rows. Draft the minimal queries and thresholds to catch this with freshness, completeness, and volume checks. Hint: partition-level freshness and a per-hour volume band will expose the anomaly; a sketch follows.
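One way to approach it, as a sketch assuming marketing_events has an event_ts column and Postgres-style generate_series and PERCENTILE_CONT. Generating the expected hours and left-joining the actual counts is what exposes the empty 22:00 hour; a plain GROUP BY would simply omit it.

```sql
WITH expected AS (
  -- one row per hour we expect data for (last 24 full hours)
  SELECT generate_series(
           DATE_TRUNC('hour', CURRENT_TIMESTAMP - INTERVAL '24 hours'),
           DATE_TRUNC('hour', CURRENT_TIMESTAMP - INTERVAL '1 hour'),
           INTERVAL '1 hour') AS hr
),
actual AS (
  SELECT DATE_TRUNC('hour', event_ts) AS hr, COUNT(*) AS cnt
  FROM marketing_events
  WHERE event_ts >= CURRENT_TIMESTAMP - INTERVAL '8 days'
  GROUP BY 1
),
baseline AS (
  -- per hour-of-day median over the week preceding the checked window
  SELECT EXTRACT(HOUR FROM hr) AS hour_of_day,
         PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY cnt) AS median_cnt
  FROM actual
  WHERE hr < DATE_TRUNC('hour', CURRENT_TIMESTAMP - INTERVAL '24 hours')
  GROUP BY 1
)
SELECT
  e.hr,
  COALESCE(a.cnt, 0) AS cnt,
  b.median_cnt,
  CASE
    WHEN COALESCE(a.cnt, 0) = 0 THEN 'ALERT'                 -- missing hourly partition
    WHEN COALESCE(a.cnt, 0) NOT BETWEEN b.median_cnt * 0.65
                                    AND b.median_cnt * 1.35
      THEN 'ALERT'                                           -- outside the per-hour band
    ELSE 'OK'
  END AS status
FROM expected e
LEFT JOIN actual a USING (hr)
LEFT JOIN baseline b ON b.hour_of_day = EXTRACT(HOUR FROM e.hr)
ORDER BY e.hr;
```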