Why this matters
As a Data Architect, you define data contracts and ensure downstream analytics, ML features, and dashboards are trustworthy. Freshness, completeness, and accuracy (F-C-A) are the core signals stakeholders rely on to decide: “Can I use this data right now?”
- Set and enforce SLAs/SLOs for pipelines and tables.
- Detect and triage incidents before business users do.
- Design platform patterns so teams implement quality checks consistently.
- Communicate data reliability in simple, objective terms.
Concept explained simply
- Freshness: Is the data up-to-date? Measured as the delay between now (or expected arrival time) and the most recent data timestamp. Example metric: now() - max(event_time).
- Completeness: Do we have all the data we expect? Measured as a percentage of expected records, partitions, or non-null fields that actually arrived. Example metric: received_rows / expected_rows.
- Accuracy: Are the values correct? Checked against rules, trusted references, or reconciliations. Example: revenue_total in warehouse matches payments provider within 0.5% tolerance.
Mental model
Think of a three-light status panel:
- Freshness: green if data arrived on time.
- Completeness: green if expected scope is present.
- Accuracy: green if values conform to truth or rules.
All three should be green for decision-grade data. If one is red, communicate which and why.
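The three-light panel can be sketched in a few lines. This is a minimal illustration with hypothetical metric values and thresholds, not a production monitor:

```python
# Three-light status panel: evaluate freshness, completeness, and accuracy
# against thresholds and report which signal is red.
# All threshold defaults here are illustrative, not universal.

def status_panel(lag_minutes, completeness_ratio, accuracy_rel_diff,
                 max_lag=60, min_completeness=0.99, max_rel_diff=0.005):
    lights = {
        "freshness": lag_minutes <= max_lag,
        "completeness": completeness_ratio >= min_completeness,
        "accuracy": accuracy_rel_diff <= max_rel_diff,
    }
    reds = [name for name, ok in lights.items() if not ok]
    return "decision-grade" if not reds else f"red: {', '.join(reds)}"

print(status_panel(35, 0.995, 0.002))  # -> decision-grade
print(status_panel(90, 0.97, 0.002))   # -> red: freshness, completeness
```

Returning the names of the red signals supports the "communicate which and why" rule directly.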
Set targets and thresholds
Ask consumers how recent the data must be (freshness), how much of it they need to trust a decision (completeness), and what error tolerance is acceptable (accuracy).
- Freshness SLO example: 95% of days, table is updated by 02:00 UTC.
- Completeness SLO example: 99% of required partitions present by 03:00 UTC.
- Accuracy SLO example: warehouse revenue within 0.3% of payments ledger daily.
Pair each SLO with alert thresholds:
- Warning near the boundary (e.g., at 80–90% of the maximum allowed delay).
- Critical beyond the boundary (e.g., the SLA is exceeded).
Specify timezones, late-arriving data policies, expected partitions, and reference sources for accuracy.
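The warn/critical banding above can be sketched as a small classifier. The 80% warning fraction is one point in the 80–90% band mentioned above; treat it as an assumption to tune:

```python
# Classify a freshness lag into ok / warning / critical using an SLA
# boundary and a warning band starting at 80% of the maximum allowed delay.

def classify_lag(lag_minutes, sla_minutes, warn_fraction=0.8):
    if lag_minutes > sla_minutes:
        return "critical"
    if lag_minutes >= warn_fraction * sla_minutes:
        return "warning"
    return "ok"

print(classify_lag(30, 45))  # -> ok
print(classify_lag(40, 45))  # -> warning (>= 36 min)
print(classify_lag(50, 45))  # -> critical (SLA exceeded)
```

The same pattern applies to completeness ratios and accuracy differences; only the comparison direction changes.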
Edge cases to plan for
- Late-arriving events: allow a grace window or run backfills.
- Sparse tables: use arrival cadence (e.g., weekly) instead of minutes.
- Clock skew/timezones: standardize on UTC and consistent timestamp columns.
- Changing business volumes: define completeness as a ratio, not an absolute count.
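A ratio against an expected partition list handles changing volumes naturally. A minimal sketch, with hypothetical partition names:

```python
# Completeness as a ratio against an expected partition list rather than a
# fixed row count, so changing business volumes don't trigger false alarms.

def partition_completeness(expected, present):
    missing = sorted(set(expected) - set(present))
    ratio = (len(expected) - len(missing)) / len(expected)
    return ratio, missing

expected = [f"2025-04-09/{h:02d}" for h in range(24)]
present = [p for p in expected if p.split("/")[1] not in ("11", "12")]
ratio, missing = partition_completeness(expected, present)
print(round(ratio, 4), missing)  # -> 0.9167 ['2025-04-09/11', '2025-04-09/12']
```

Reporting the missing keys alongside the ratio makes triage and targeted backfills much faster.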
Worked examples
Example 1 — Daily sales table
Context: Retail sales are ingested nightly to fact_sales.
- Freshness: Expect update by 02:00 UTC. Metric: minutes by which max(sale_date_time) lags the 02:00 deadline. If delay > 30 min, warning; > 60 min, critical.
- Completeness: Expected rows from POS logs vs rows loaded. Metric: loaded_rows / expected_rows. Warning at 98–99%; critical below 98%.
- Accuracy: Reconcile revenue. Metric: |sum(fact_sales.amount) - payments_total| / payments_total. Critical if > 0.5%.
Sample SQL (conceptual)
-- Freshness (minutes)
SELECT EXTRACT(EPOCH FROM (NOW() - MAX(sale_date_time)))/60 AS lag_min FROM fact_sales WHERE sale_date = CURRENT_DATE;
-- Completeness
WITH expected AS (
SELECT COUNT(*) AS c FROM pos_log WHERE sale_date = CURRENT_DATE
), loaded AS (
SELECT COUNT(*) AS c FROM fact_sales WHERE sale_date = CURRENT_DATE
)
SELECT loaded.c::float/NULLIF(expected.c,0) AS completeness_ratio FROM loaded, expected;
-- Accuracy
WITH w AS (SELECT SUM(amount) AS amt FROM fact_sales WHERE sale_date = CURRENT_DATE),
p AS (SELECT SUM(total) AS amt FROM payments_ledger WHERE sale_date = CURRENT_DATE)
SELECT ABS(w.amt - p.amt)/NULLIF(p.amt,0) AS rel_diff FROM w, p;
Example 2 — Streaming events topic
Context: Kafka topic for clickstream, consumed into a bronze table.
- Freshness: last_event_ts within 5 minutes of now. If > 5 min, critical.
- Completeness: Expect N partitions per hour; verify each partition contains at least one file/message batch.
- Accuracy: Enforce schema rules (e.g., country_code in known list); drop or quarantine violations.
Notes
- Use watermarks to handle out-of-order events.
- Track both ingestion_time and event_time to diagnose source vs pipeline delay.
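Tracking both timestamps lets you split the total lag into a source component and a pipeline component. A sketch with hypothetical timestamps, everything in UTC:

```python
# Split total lag: event_time -> ingestion_time is source delay;
# ingestion_time -> now is pipeline delay. All timestamps are UTC.
from datetime import datetime, timezone

def delay_breakdown(event_ts, ingest_ts, now=None):
    now = now or datetime.now(timezone.utc)
    source_lag = (ingest_ts - event_ts).total_seconds() / 60
    pipeline_lag = (now - ingest_ts).total_seconds() / 60
    return round(source_lag, 1), round(pipeline_lag, 1)

event = datetime(2025, 4, 10, 1, 50, tzinfo=timezone.utc)
ingest = datetime(2025, 4, 10, 1, 58, tzinfo=timezone.utc)
now = datetime(2025, 4, 10, 2, 0, tzinfo=timezone.utc)
print(delay_breakdown(event, ingest, now))  # -> (8.0, 2.0)
```

Here the 10-minute total lag is mostly source delay (8 min), which points the investigation upstream rather than at the pipeline.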
Example 3 — Customer addresses
Context: Dimensions used by marketing and shipping.
- Freshness: updated_at not older than 24 hours.
- Completeness: required fields not null (street, city, postal_code, country). Threshold: 99.5% non-null.
- Accuracy: postal_code format by country and city-country geocode consistency. Tolerance: reject mismatches; quarantine offending rows.
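A format rule like the postal-code check can be sketched with per-country regexes. The patterns and sample rows below are illustrative, not a complete ruleset:

```python
# Rule check sketch: validate postal_code format per country with regexes,
# quarantining violations instead of silently dropping them.
import re

# Illustrative patterns only; real postal formats need a fuller ruleset.
POSTAL_PATTERNS = {
    "US": r"^\d{5}(-\d{4})?$",
    "GB": r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$",
    "DE": r"^\d{5}$",
}

def check_postal(rows):
    ok, quarantine = [], []
    for row in rows:
        pattern = POSTAL_PATTERNS.get(row["country"])
        if pattern and re.match(pattern, row["postal_code"]):
            ok.append(row)
        else:
            quarantine.append(row)
    return ok, quarantine

rows = [
    {"country": "US", "postal_code": "90210"},
    {"country": "DE", "postal_code": "1023"},  # too short -> quarantined
]
ok, quarantine = check_postal(rows)
print(len(ok), len(quarantine))  # -> 1 1
```

Quarantining keeps the offending rows inspectable for root-cause analysis instead of losing them.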
Practical tip
For accuracy without a single truth source, combine rule checks (formats, ranges) with reconciliation (counts, totals) and anomaly detection on distributions.
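The anomaly-detection leg of that combination can be as simple as a z-score against recent history. A sketch with illustrative data and a conventional three-sigma threshold:

```python
# Flag today's total as anomalous if it deviates more than k standard
# deviations from recent history (simple z-score; data is illustrative).
import statistics

def is_anomalous(history, today, k=3.0):
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    z = abs(today - mean) / stdev if stdev else 0.0
    return z > k, round(z, 2)

history = [100_000, 102_500, 98_700, 101_200, 99_800, 100_900, 101_700]
print(is_anomalous(history, 101_000))  # normal day, not flagged
print(is_anomalous(history, 140_000))  # spike, flagged
```

This catches shifts no fixed rule anticipated, at the cost of needing a stable history window; combine it with the rule checks and reconciliations above rather than relying on it alone.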
How to implement checks consistently
- Where: close to data boundaries (ingestion), at key transformations, and before serving layers.
- When: on schedule (batch) and on arrival windows (streaming) with watermarks.
- What to store: raw metrics (lag minutes, ratio), thresholds, final status (pass/warn/fail), and run metadata (dataset, partition, timestamp).
- How to alert: route critical to on-call; warnings to owners or daily digest. Include context and suggested first steps.
Recommended metrics to log per run
- Freshness: max_timestamp, lag_minutes, window_start/end.
- Completeness: expected, observed, ratio, keys checked (e.g., dates/partitions).
- Accuracy: rule name, sample size, violations count, relative difference when reconciling.
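A per-run record covering those fields might look like the following sketch; the field names are an assumption, not a standard schema:

```python
# Per-run metrics record: raw metric, thresholds, final status, and run
# metadata, serializable as JSON for auditing and dashboards.
import json
from dataclasses import dataclass, asdict

@dataclass
class CheckRun:
    dataset: str
    partition: str
    check_type: str        # freshness | completeness | accuracy
    metric_value: float
    warn_threshold: float
    crit_threshold: float
    status: str            # pass | warn | fail
    run_ts: str            # UTC, ISO-8601

run = CheckRun(
    dataset="fact_sales", partition="2025-04-10", check_type="freshness",
    metric_value=47.0, warn_threshold=36.0, crit_threshold=45.0,
    status="fail", run_ts="2025-04-10T02:47:00Z",
)
print(json.dumps(asdict(run)))
```

Storing thresholds alongside the metric means a dashboard can replay status decisions even after thresholds change.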
Exercises
Try these before viewing solutions.
1. Given max(event_time) = 2025-04-10 01:13 UTC and a current time of 2025-04-10 02:00 UTC, with an SLA of data updated within 45 minutes and a warning threshold at 80% of the SLA, compute the lag and the status.
2. You expect 24 hourly partitions for 2025-04-09 in table sensor_readings. Present partitions: 00–10 and 13–23 (missing 11 and 12). Thresholds: warning if completeness < 99%, critical if < 95%. Compute completeness and status, and suggest one remediation step.
Checklist — Did you cover essentials?
- Used UTC consistently.
- Defined both SLO and alert thresholds.
- Computed ratios, not just absolute counts.
- Considered late data policy and grace windows.
- Captured metrics for auditing and dashboards.
Common mistakes and self-check
- Using load_time instead of business event_time for freshness. Self-check: compare both; track both to diagnose delay source.
- Defining completeness as a fixed row count. Self-check: make it a ratio against an expectation source or partition list.
- “Accuracy” equals “no nulls.” Self-check: add value rules and reconciliations against reference totals.
- No tolerance bands. Self-check: define warn/crit to avoid alert fatigue.
- Ignoring timezones. Self-check: normalize to UTC end-to-end.
Practical projects
- Project A: Build a freshness dashboard for top 10 tables (lag minutes, trend, SLO attainment).
- Project B: Implement partition completeness for one daily fact table, including expected partitions, ratio, and alerts.
- Project C: Create an accuracy reconciliation job between warehouse revenue and an external ledger with 0.5% tolerance and nightly run.
Mini challenge
It is 03:10 UTC. For table orders_daily on 2025-05-01: max(order_ts)=02:35, expected rows=120k, loaded rows=116.4k, payments_total=$5,020,000, warehouse_total=$4,995,100. SLAs: freshness 45 min, completeness critical < 98%, accuracy critical > 0.5% diff. Classify each and name the likely cause for any failures.
Hint
- Compute lag: now - max_ts.
- Completeness: loaded/expected.
- Accuracy: abs(diff)/reference.
Who this is for
- Data Architects defining reliability standards.
- Data Product Owners needing trust signals for stakeholders.
Prerequisites
- Comfort with SQL and dataset schemas.
- Basic understanding of batch and streaming pipelines.
- Familiarity with SLAs/SLOs and monitoring concepts.
Learning path
- Start: Freshness, Completeness, Accuracy checks (this page).
- Next: Data contracts and schema evolution.
- Then: Incident response playbooks and root cause analysis.
- Finally: Reliability SLO dashboards and governance reviews.
Next steps
- Implement one freshness and one completeness check on a critical table.
- Add a single high-impact accuracy reconciliation.
- Review thresholds after one week using observed distributions.