Data Quality Frameworks

Learn Data Quality Frameworks for Analytics Engineers for free: roadmap, examples, subskills, and a skill exam.

Published: December 23, 2025 | Updated: December 23, 2025

Why Data Quality Frameworks matter for Analytics Engineers

Analytics Engineers turn raw data into trustworthy, modeled layers for reporting and machine learning. A Data Quality (DQ) framework makes that trust explicit with tests, SLAs, ownership, and alerting. It reduces fire drills, protects decision-making, and keeps stakeholders confident in dashboards and metrics.

  • Catch freshness issues before executives ask, “Why is today’s data missing?”
  • Prevent silent errors like duplicates, null spikes, or schema drift from corrupting KPIs.
  • Standardize incident triage so issues are resolved quickly with clear ownership.

What this unlocks in your day-to-day

  • Automated checks for freshness, volume, schema, uniqueness, and referential integrity
  • Quality dashboards that show health by domain
  • Data contracts so producers and consumers agree on expectations
  • SLAs and alerts to align reliability with business needs

Practical roadmap

Step 1 — Baseline checks

Implement freshness and volume checks on critical sources and models. Start with daily tables and top KPIs.

Step 2 — Validations

Add unique, not_null, and accepted_values tests on primary keys and critical dimensions. Cover join keys and metric denominators (a minimal sketch follows below).
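
The not_null and unique tests appear in worked example 3 below; for accepted values, a minimal dbt schema.yml sketch looks like this (the status column and its value set are illustrative):

version: 2
models:
  - name: orders
    columns:
      - name: status
        tests:
          - accepted_values:
              values: ['pending', 'paid', 'refunded', 'cancelled'] # illustrative value set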

Step 3 — Anomaly detection

Layer simple seasonality-aware rules (day-of-week, rolling windows). Use robust stats (median/MAD) to handle outliers.
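
For the day-of-week piece, here is a minimal SQL sketch (it reuses the raw.events table from the volume example; the 30% threshold and four-week lookback are assumptions to tune):

-- Baseline each day against the median of the same weekday over the prior 4 weeks.
with counts as (
  select date_trunc('day', event_time)::date as d,
         count(*) as n
  from raw.events
  group by 1
),
baseline as (
  select cur.d,
         cur.n,
         (select percentile_disc(0.5) within group (order by prior.n)
          from counts prior
          where prior.d >= cur.d - 28
            and prior.d < cur.d
            and extract(dow from prior.d) = extract(dow from cur.d)) as dow_median
  from counts cur
)
select *
from baseline
where dow_median > 0
  and n < 0.7 * dow_median; -- more than 30% below the weekday baseline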

Step 4 — Reconciliation

Compare source vs warehouse row counts and totals on key pipelines. Add tolerances (e.g., <=0.5% drift).

Step 5 — SLAs, ownership, contracts

Define who owns what, expected delivery times, schema guarantees, backfill policies, and change management.
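
Contracts work best as versioned config reviewed like code. A hypothetical YAML sketch (the field names are illustrative, not a specific tool's format):

# Hypothetical contract format; adapt the fields to your tooling.
contract: app.events
version: 2
owner: data-platform@company
consumers: [analytics, finance]
schema:
  - {name: event_id, type: string, nullable: false}
  - {name: event_time, type: timestamp, nullable: false}
  - {name: user_id, type: string, nullable: true}
sla:
  freshness: "2h during business hours"
  delivery: "daily by 06:00 local"
change_management:
  breaking_changes: "30-day deprecation window with versioned releases"
  notice_channel: "#data-contracts"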

Step 6 — Dashboards and alerts

Publish a DQ dashboard by domain. Route alerts to owners with clear runbooks and escalation rules.
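
If every check run is logged to a table, the dashboard reduces to a simple aggregate. A sketch assuming a hypothetical dq_check_results log table (domain, status, ran_at, resolved_at), one row per run:

-- Pass rate and mean time to resolution (MTTR) per domain, last 14 days.
-- dq_check_results is a hypothetical log table; adapt to your runner's schema.
select
  domain,
  count(*) filter (where status = 'pass')::float / count(*) as pass_rate,
  avg(resolved_at - ran_at) filter (where status = 'fail') as mttr
from dq_check_results
where ran_at >= current_date - 14
group by domain
order by pass_rate;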

Worked examples

1) Freshness check (dbt sources)
version: 2
sources:
  - name: app
    tables:
      - name: events
        loaded_at_field: _loaded_at
        freshness:
          warn_after: {count: 2, period: hour}
          error_after: {count: 4, period: hour}

Explanation: If events are older than 2 hours, warn; older than 4 hours, fail the job. Tune by business criticality.

2) Volume check with percent change
-- Daily counts
with counts as (
  select
    date_trunc('day', event_time) as d,
    count(*) as n
  from raw.events
  group by 1
),
chg as (
  select
    d,
    n,
    lag(n) over (order by d) as prev_n
  from counts
),
pct as (
  select
    d,
    n,
    prev_n,
    (n - prev_n)::float / nullif(prev_n, 0) as pct_change
  from chg
)
select *
from pct
where pct_change < -0.3; -- flag a >30% day-over-day drop

Explanation: Flags sudden drops; pair with day-of-week baselines to avoid false positives.

3) Duplicates and null spikes
-- SQL: find duplicates on order_id
select order_id, count(*)
from mart.orders
group by 1
having count(*) > 1;

-- dbt tests (schema.yml)
version: 2
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique

Explanation: Enforce uniqueness and non-nullness on primary keys. Investigate upstream aggregation if duplicates appear.

4) Robust anomaly detection (rolling median + MAD)
-- Rolling 15-day median and MAD via scalar subqueries (ordered-set
-- aggregates such as percentile_disc cannot be used as window functions
-- in Postgres, so we aggregate over an explicit date range instead).
with daily as (
  select g.d::date as d,
         coalesce(s.n, 0) as n
  from generate_series(date '2025-01-01', current_date, interval '1 day') as g(d)
  left join (
    select date_trunc('day', created_at)::date as d, count(*) as n
    from mart.signups
    group by 1
  ) s on s.d = g.d::date
), med as (
  select cur.d, cur.n,
         (select percentile_disc(0.5) within group (order by w.n)
          from daily w
          where w.d between cur.d - 14 and cur.d) as med
  from daily cur
), mad as (
  -- median absolute deviation from the rolling median, scaled to ~sigma
  select m.d, m.n, m.med,
         1.4826 * (select percentile_disc(0.5) within group (order by abs(w.n - m.med))
                   from daily w
                   where w.d between m.d - 14 and m.d) as mad
  from med m
)
select *
from mad
where mad > 0 and abs(n - med) / mad > 3; -- robust z-score > 3

Explanation: Median/MAD is more stable than simple z-scores for skewed or small samples. The scalar subqueries sidestep the fact that most warehouses do not support ordered-set aggregates over window frames.

5) Reconciliation with tolerance
with src as (
  select count(*) as c, sum(amount) as s
  from landing.payments
  where processed_at::date = current_date - 1
), wh as (
  select count(*) as c, sum(amount) as s
  from mart.payments_clean
  where processed_at::date = current_date - 1
)
select
  abs(wh.c - src.c) as row_delta,
  abs(wh.s - src.s) / greatest(abs(src.s), 1) as amt_pct_diff
from src, wh
where abs(wh.c - src.c) > 0
   or abs(wh.s - src.s) / greatest(abs(src.s), 1) > 0.005; -- >0.5% drift

Explanation: Flags row count mismatches and total drift beyond 0.5%.

6) Alert payload example
{
  "pipeline": "orders_daily",
  "check": "freshness",
  "status": "error",
  "observed_lag_minutes": 187,
  "threshold_minutes": 120,
  "owner": "data-orders@company",
  "runbook": "Confluence: Orders Freshness Runbook",
  "fingerprint": "orders_daily_freshness_v2"
}

Explanation: Include owner, thresholds, and a stable fingerprint for de-duplication.

Drills and exercises

  • Create a freshness check on your top 3 source tables with warn/error thresholds.
  • Add not_null and unique tests on 2 primary keys used in joins.
  • Implement a daily volume check with a 30% drop alert and a day-of-week baseline.
  • Build one reconciliation query comparing source vs warehouse counts and sums.
  • Define an SLA: “Events table freshness <= 2h during 6am–10pm local.”
  • Draft an ownership map: table → owner team → escalation channel.
  • Create a one-pager runbook for “Freshness incident on events”.
  • Publish a simple DQ dashboard: freshness, volume, test pass rate, and incident MTTR.

Common mistakes and debugging tips

Over-alerting with static thresholds

Symptom: Alerts every Monday due to weekend patterns. Fix: Use seasonality-aware baselines (day-of-week, rolling windows) and min sample sizes.

Ignoring null spikes in dimensions

Symptom: KPI denominators collapse. Fix: Add not_null on keys and accepted_values on critical dimensions; fail fast in staging.

No ownership or runbooks

Symptom: Incidents bounce between teams. Fix: Assign clear owners per domain/table and link to a short, actionable runbook.

Reconciliation without tolerances

Symptom: False positives from rounding or late-arriving data. Fix: Use percentage tolerances and time windows; allow small drift.

Missing change management

Symptom: Upstream schema changes break models. Fix: Data contracts with deprecation windows, versioning, and communication rules.

Mini project: Stand up a DQ framework for a core revenue model

Goal: Protect a mart.revenue_daily model consumed by Finance.

Requirements
  • Freshness: error if source payments freshness > 2 hours (business hours).
  • Volume: alert if day-over-day row count drops > 25% (use day-of-week baselines).
  • Validation: unique+not_null on payment_id; accepted_values on status.
  • Reconciliation: compare source vs mart totals within 0.5%.
  • Anomaly: robust median+MAD for revenue amount anomalies.
  • Ownership: assign owner, escalation, on-call hours; add runbook.
  • Dashboard: freshness, volume, pass rate, MTTR; last 14 days.
Acceptance criteria
  • All checks automated and scheduled.
  • Failing checks page an owner during on-call hours; otherwise post a ticket.
  • Dashboard shows green/yellow/red status with trends.

Subskills

  • Freshness and Volume Checks — Detect lateness and volume shifts early; set warn/error thresholds and business-hour windows.
  • Anomaly Detection Basics — Use rolling windows, seasonality, and robust statistics to reduce false positives.
  • Validations for Duplicates and Null Spikes — Enforce unique and not_null constraints on keys and critical dimensions.
  • Row Count and Reconciliation Checks — Compare source vs warehouse counts and sums with sensible tolerances.
  • Incident Triage and Root Cause — Standardize routing, diagnostics, and post-incident notes for faster resolution.
  • Data Quality SLAs and Ownership — Define who owns data, expectations for freshness, and escalation policies.
  • Quality Dashboards and Alerts — Visualize health, pass rates, and MTTR; route alerts with useful context.
  • Defining and Tracking Data Contracts — Specify schema, semantics, SLAs, and change management between producers and consumers.

Who this is for

  • Analytics Engineers responsible for reliable models and metrics
  • Data Engineers building ingestion and transformation pipelines
  • BI Developers maintaining dashboards used in decisions

Prerequisites

  • Comfortable with SQL (window functions, aggregations)
  • Familiar with your orchestration and testing tooling (e.g., dbt, Airflow)
  • Basic understanding of warehouse tables, marts, and KPI definitions

Learning path

  1. Start with Freshness and Volume Checks.
  2. Add Validations for Duplicates and Null Spikes.
  3. Introduce Anomaly Detection Basics.
  4. Implement Row Count and Reconciliation Checks.
  5. Define SLAs and Ownership; create an incident playbook.
  6. Publish Quality Dashboards and set Alerts.
  7. Finalize with Data Contracts to prevent future issues.

Practical projects

  • Protect a signup funnel mart: freshness, volume, uniqueness, and a DQ dashboard.
  • Reconcile finance payments across source, staging, and mart with tolerances.
  • Draft and adopt a data contract for the events stream with SLAs and change control.

Next steps

  • Expand checks to more domains; prioritize by business impact.
  • Tune thresholds using recent history and seasonality.
  • Run monthly post-incident reviews to refine SLAs and tests.

Data Quality Frameworks — Skill Exam

This exam checks your grasp of Data Quality Frameworks in analytics engineering. It includes 12 questions (single-choice, multi-select, and short input). You can retake it anytime. Everyone can take the exam; only logged-in learners will have progress saved to their profile. Passing score: 70%.
