Why this matters
As a Data Architect, you define data contracts and ensure downstream analytics, ML features, and dashboards are trustworthy. Freshness, completeness, and accuracy (F-C-A) are the core signals stakeholders rely on to decide: “Can I use this data right now?”
- Set and enforce SLAs/SLOs for pipelines and tables.
- Detect and triage incidents before business users do.
- Design platform patterns so teams implement quality checks consistently.
- Communicate data reliability in simple, objective terms.
Concept explained simply
- Freshness: Is the data up-to-date? Measured as the delay between now (or expected arrival time) and the most recent data timestamp. Example metric: now() - max(event_time).
- Completeness: Do we have all the data we expect? Measured as a percentage of expected records, partitions, or non-null fields that actually arrived. Example metric: received_rows / expected_rows.
- Accuracy: Are the values correct? Checked against rules, trusted references, or reconciliations. Example: revenue_total in warehouse matches payments provider within 0.5% tolerance.
Mental model
Think of a three-light status panel:
- Freshness: green if data arrived on time.
- Completeness: green if expected scope is present.
- Accuracy: green if values conform to truth or rules.
All three should be green for decision-grade data. If one is red, communicate which and why.
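The three-light panel can be sketched in a few lines. This is a minimal illustration with hypothetical metric values and thresholds, not a production monitor:

```python
# Three-light status panel: evaluate freshness, completeness, and accuracy
# against thresholds and report which signal is red.
# All threshold defaults here are illustrative, not universal.

def status_panel(lag_minutes, completeness_ratio, accuracy_rel_diff,
                 max_lag=60, min_completeness=0.99, max_rel_diff=0.005):
    lights = {
        "freshness": lag_minutes <= max_lag,
        "completeness": completeness_ratio >= min_completeness,
        "accuracy": accuracy_rel_diff <= max_rel_diff,
    }
    reds = [name for name, ok in lights.items() if not ok]
    return "decision-grade" if not reds else f"red: {', '.join(reds)}"

print(status_panel(35, 0.995, 0.002))  # -> decision-grade
print(status_panel(90, 0.97, 0.002))   # -> red: freshness, completeness
```

Returning the names of the red signals supports the "communicate which and why" rule directly.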
Set targets and thresholds
Ask consumers how recent the data must be (freshness), how much of it they need to trust a decision (completeness), and what error tolerance is acceptable (accuracy).
- Freshness SLO example: 95% of days, table is updated by 02:00 UTC.
- Completeness SLO example: 99% of required partitions present by 03:00 UTC.
- Accuracy SLO example: warehouse revenue within 0.3% of payments ledger daily.
Pair each SLO with alert thresholds:
- Warning near the boundary (e.g., at 80–90% of the maximum allowed delay).
- Critical beyond the boundary (e.g., the SLA is exceeded).
Specify timezones, late-arriving data policies, expected partitions, and reference sources for accuracy.
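The warn/critical banding above can be sketched as a small classifier. The 80% warning fraction is one point in the 80–90% band mentioned above; treat it as an assumption to tune:

```python
# Classify a freshness lag into ok / warning / critical using an SLA
# boundary and a warning band starting at 80% of the maximum allowed delay.

def classify_lag(lag_minutes, sla_minutes, warn_fraction=0.8):
    if lag_minutes > sla_minutes:
        return "critical"
    if lag_minutes >= warn_fraction * sla_minutes:
        return "warning"
    return "ok"

print(classify_lag(30, 45))  # -> ok
print(classify_lag(40, 45))  # -> warning (>= 36 min)
print(classify_lag(50, 45))  # -> critical (SLA exceeded)
```

The same pattern applies to completeness ratios and accuracy differences; only the comparison direction changes.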
Edge cases to plan for
- Late-arriving events: allow a grace window or run backfills.
- Sparse tables: use arrival cadence (e.g., weekly) instead of minutes.
- Clock skew/timezones: standardize on UTC and consistent timestamp columns.
- Changing business volumes: define completeness as a ratio, not an absolute count.
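A ratio against an expected partition list handles changing volumes naturally. A minimal sketch, with hypothetical partition names:

```python
# Completeness as a ratio against an expected partition list rather than a
# fixed row count, so changing business volumes don't trigger false alarms.

def partition_completeness(expected, present):
    missing = sorted(set(expected) - set(present))
    ratio = (len(expected) - len(missing)) / len(expected)
    return ratio, missing

expected = [f"2025-04-09/{h:02d}" for h in range(24)]
present = [p for p in expected if p.split("/")[1] not in ("11", "12")]
ratio, missing = partition_completeness(expected, present)
print(round(ratio, 4), missing)  # -> 0.9167 ['2025-04-09/11', '2025-04-09/12']
```

Reporting the missing keys alongside the ratio makes triage and targeted backfills much faster.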
Worked examples
Example 1 — Daily sales table
Context: Retail sales are ingested nightly to fact_sales.
- Freshness: Expect update by 02:00 UTC. Metric: minutes by which max(sale_date_time) lags the 02:00 deadline. If delay > 30 min, warning; > 60 min, critical.
- Completeness: Expected rows from POS logs vs rows loaded. Metric: loaded_rows / expected_rows. Warning at 98–99%; critical below 98%.
- Accuracy: Reconcile revenue. Metric: |sum(fact_sales.amount) - payments_total| / payments_total. Critical if > 0.5%.
Sample SQL (conceptual)
-- Freshness (minutes)
SELECT EXTRACT(EPOCH FROM (NOW() - MAX(sale_date_time)))/60 AS lag_min FROM fact_sales WHERE sale_date = CURRENT_DATE;
-- Completeness
WITH expected AS (
SELECT COUNT(*) AS c FROM pos_log WHERE sale_date = CURRENT_DATE
), loaded AS (
SELECT COUNT(*) AS c FROM fact_sales WHERE sale_date = CURRENT_DATE
)
SELECT loaded.c::float/NULLIF(expected.c,0) AS completeness_ratio FROM loaded, expected;
-- Accuracy
WITH w AS (SELECT SUM(amount) AS amt FROM fact_sales WHERE sale_date = CURRENT_DATE),
p AS (SELECT SUM(total) AS amt FROM payments_ledger WHERE sale_date = CURRENT_DATE)
SELECT ABS(w.amt - p.amt)/NULLIF(p.amt,0) AS rel_diff FROM w, p;
Example 2 — Streaming events topic
Context: Kafka topic for clickstream, consumed into a bronze table.
- Freshness: last_event_ts within 5 minutes of now. If > 5 min, critical.
- Completeness: Expect N partitions per hour; verify each partition contains at least one file/message batch.
- Accuracy: Enforce schema rules (e.g., country_code in known list); drop or quarantine violations.
Notes
- Use watermarks to handle out-of-order events.
- Track both ingestion_time and event_time to diagnose source vs pipeline delay.
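Tracking both timestamps lets you split the total lag into a source component and a pipeline component. A sketch with hypothetical timestamps, everything in UTC:

```python
# Split total lag: event_time -> ingestion_time is source delay;
# ingestion_time -> now is pipeline delay. All timestamps are UTC.
from datetime import datetime, timezone

def delay_breakdown(event_ts, ingest_ts, now=None):
    now = now or datetime.now(timezone.utc)
    source_lag = (ingest_ts - event_ts).total_seconds() / 60
    pipeline_lag = (now - ingest_ts).total_seconds() / 60
    return round(source_lag, 1), round(pipeline_lag, 1)

event = datetime(2025, 4, 10, 1, 50, tzinfo=timezone.utc)
ingest = datetime(2025, 4, 10, 1, 58, tzinfo=timezone.utc)
now = datetime(2025, 4, 10, 2, 0, tzinfo=timezone.utc)
print(delay_breakdown(event, ingest, now))  # -> (8.0, 2.0)
```

Here the 10-minute total lag is mostly source delay (8 min), which points the investigation upstream rather than at the pipeline.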
Example 3 — Customer addresses
Context: Dimensions used by marketing and shipping.
- Freshness: updated_at not older than 24 hours.
- Completeness: required fields not null (street, city, postal_code, country). Threshold: 99.5% non-null.
- Accuracy: postal_code format by country and city-country geocode consistency. Tolerance: reject mismatches; quarantine offending rows.
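A format rule like the postal-code check can be sketched with per-country regexes. The patterns and sample rows below are illustrative, not a complete ruleset:

```python
# Rule check sketch: validate postal_code format per country with regexes,
# quarantining violations instead of silently dropping them.
import re

# Illustrative patterns only; real postal formats need a fuller ruleset.
POSTAL_PATTERNS = {
    "US": r"^\d{5}(-\d{4})?$",
    "GB": r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$",
    "DE": r"^\d{5}$",
}

def check_postal(rows):
    ok, quarantine = [], []
    for row in rows:
        pattern = POSTAL_PATTERNS.get(row["country"])
        if pattern and re.match(pattern, row["postal_code"]):
            ok.append(row)
        else:
            quarantine.append(row)
    return ok, quarantine

rows = [
    {"country": "US", "postal_code": "90210"},
    {"country": "DE", "postal_code": "1023"},  # too short -> quarantined
]
ok, quarantine = check_postal(rows)
print(len(ok), len(quarantine))  # -> 1 1
```

Quarantining keeps the offending rows inspectable for root-cause analysis instead of losing them.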
Practical tip
For accuracy without a single truth source, combine rule checks (formats, ranges) with reconciliation (counts, totals) and anomaly detection on distributions.
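The anomaly-detection leg of that combination can be as simple as a z-score against recent history. A sketch with illustrative data and a conventional three-sigma threshold:

```python
# Flag today's total as anomalous if it deviates more than k standard
# deviations from recent history (simple z-score; data is illustrative).
import statistics

def is_anomalous(history, today, k=3.0):
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    z = abs(today - mean) / stdev if stdev else 0.0
    return z > k, round(z, 2)

history = [100_000, 102_500, 98_700, 101_200, 99_800, 100_900, 101_700]
print(is_anomalous(history, 101_000))  # normal day, not flagged
print(is_anomalous(history, 140_000))  # spike, flagged
```

This catches shifts no fixed rule anticipated, at the cost of needing a stable history window; combine it with the rule checks and reconciliations above rather than relying on it alone.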
How to implement checks consistently
- Where: close to data boundaries (ingestion), at key transformations, and before serving layers.
- When: on schedule (batch) and on arrival windows (streaming) with watermarks.
- What to store: raw metrics (lag minutes, ratio), thresholds, final status (pass/warn/fail), and run metadata (dataset, partition, timestamp).
- How to alert: route critical to on-call; warnings to owners or daily digest. Include context and suggested first steps.
Recommended metrics to log per run
- Freshness: max_timestamp, lag_minutes, window_start/end.
- Completeness: expected, observed, ratio, keys checked (e.g., dates/partitions).
- Accuracy: rule name, sample size, violations count, relative difference when reconciling.
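A per-run record covering those fields might look like the following sketch; the field names are an assumption, not a standard schema:

```python
# Per-run metrics record: raw metric, thresholds, final status, and run
# metadata, serializable as JSON for auditing and dashboards.
import json
from dataclasses import dataclass, asdict

@dataclass
class CheckRun:
    dataset: str
    partition: str
    check_type: str        # freshness | completeness | accuracy
    metric_value: float
    warn_threshold: float
    crit_threshold: float
    status: str            # pass | warn | fail
    run_ts: str            # UTC, ISO-8601

run = CheckRun(
    dataset="fact_sales", partition="2025-04-10", check_type="freshness",
    metric_value=47.0, warn_threshold=36.0, crit_threshold=45.0,
    status="fail", run_ts="2025-04-10T02:47:00Z",
)
print(json.dumps(asdict(run)))
```

Storing thresholds alongside the metric means a dashboard can replay status decisions even after thresholds change.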
Exercises
Try these before viewing solutions.
1. Given max(event_time) = 2025-04-10 01:13 UTC and a current time of 2025-04-10 02:00 UTC, with an SLA of data updated within 45 minutes and a warning threshold at 80% of the SLA, compute the lag and the status.
2. You expect 24 hourly partitions for 2025-04-09 in table sensor_readings. Present partitions: 00–10 and 13–23 (missing 11 and 12). Thresholds: warning if completeness < 99%, critical if < 95%. Compute completeness and status, and suggest one remediation step.
Checklist — Did you cover essentials?
- Used UTC consistently.
- Defined both SLO and alert thresholds.
- Computed ratios, not just absolute counts.
- Considered late data policy and grace windows.
- Captured metrics for auditing and dashboards.
Common mistakes and self-check
- Using load_time instead of business event_time for freshness. Self-check: compare both; track both to diagnose delay source.
- Defining completeness as a fixed row count. Self-check: make it a ratio against an expectation source or partition list.
- “Accuracy” equals “no nulls.” Self-check: add value rules and reconciliations against reference totals.
- No tolerance bands. Self-check: define warn/crit to avoid alert fatigue.
- Ignoring timezones. Self-check: normalize to UTC end-to-end.
Practical projects
- Project A: Build a freshness dashboard for top 10 tables (lag minutes, trend, SLO attainment).
- Project B: Implement partition completeness for one daily fact table, including expected partitions, ratio, and alerts.
- Project C: Create an accuracy reconciliation job between warehouse revenue and an external ledger with 0.5% tolerance and nightly run.
Mini challenge
It is 03:10 UTC. For table orders_daily on 2025-05-01: max(order_ts)=02:35, expected rows=120k, loaded rows=116.4k, payments_total=$5,020,000, warehouse_total=$4,995,100. SLAs: freshness 45 min, completeness critical < 98%, accuracy critical > 0.5% diff. Classify each and name the likely cause for any failures.
Hint
- Compute lag: now - max_ts.
- Completeness: loaded/expected.
- Accuracy: abs(diff)/reference.
Who this is for
- Data Architects defining reliability standards.
- Data Product Owners needing trust signals for stakeholders.
Prerequisites
- Comfort with SQL and dataset schemas.
- Basic understanding of batch and streaming pipelines.
- Familiarity with SLAs/SLOs and monitoring concepts.
Learning path
- Start: Freshness, Completeness, Accuracy checks (this page).
- Next: Data contracts and schema evolution.
- Then: Incident response playbooks and root cause analysis.
- Finally: Reliability SLO dashboards and governance reviews.
Next steps
- Implement one freshness and one completeness check on a critical table.
- Add a single high-impact accuracy reconciliation.
- Review thresholds after one week using observed distributions.