
Monitoring Null Rates And Broken Joins

Learn Monitoring Null Rates And Broken Joins for free with explanations, exercises, and a quick test (for BI Analysts).

Published: December 22, 2025 | Updated: December 22, 2025

Who this is for

BI Analysts, Analytics Engineers, and Data Ops folks who build dashboards, write SQL, and need to keep reports trustworthy in production.

Prerequisites

  • Comfort with SQL SELECT, GROUP BY, JOIN
  • Basic understanding of star/snowflake schemas
  • Ability to read daily/partitioned tables (e.g., date columns)

Why this matters

Executives rely on your dashboards for decisions. Two silent data killers are:

  • Rising null rates: a critical column (e.g., customer_id, email) starts getting more NULLs or blank values.
  • Broken joins: keys stop matching between fact and dimension tables, creating orphan rows or row fan-out from duplicate dimension keys.

Real BI tasks where this shows up:

  • Marketing funnel drops to zero because a join to campaign data broke yesterday.
  • Customer segmentation counts explode due to duplicate dimension keys.
  • New ETL release accidentally converts blanks to NULL (or vice versa) and metrics drift.

Concept explained simply

Monitoring null rates means tracking the percentage of missing values for important columns over time. Monitoring broken joins means checking if keys still match across tables and if join coverage and duplication behave as expected.

Mental model

Think of your data model like a Lego build:

  • Bricks with holes = NULLs. Too many holes and the build collapses.
  • Connectors that don’t fit = broken joins. Pieces don’t snap, so the structure can’t hold.

What counts as "null" in practice?

  • SQL NULL values
  • Empty strings and whitespace-only strings (use TRIM)
  • Sentinel values like 'UNKNOWN', 'N/A', '000000' (define your own list)

Normalize before measuring, e.g., NULLIF(TRIM(col), '') to treat blanks as NULL.
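
For example, a minimal normalization expression might look like this (the sentinel list is illustrative; substitute your own):

SELECT CASE
         WHEN NULLIF(TRIM(customer_email), '') IS NULL THEN NULL                 -- SQL NULL, empty, or whitespace-only
         WHEN UPPER(TRIM(customer_email)) IN ('UNKNOWN', 'N/A') THEN NULL        -- agreed sentinel values
         ELSE TRIM(customer_email)
       END AS customer_email_clean
FROM users_daily;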

Worked examples

Example 1: Daily null rate for a required column

Goal: Track daily null rate for customer_email in table users_daily.

SELECT  dt,
        COUNT(*) AS total_rows,
        SUM(CASE WHEN NULLIF(TRIM(customer_email), '') IS NULL THEN 1 ELSE 0 END) AS null_rows,
        ROUND(100.0 * SUM(CASE WHEN NULLIF(TRIM(customer_email), '') IS NULL THEN 1 ELSE 0 END) / COUNT(*), 2) AS null_rate_pct
FROM users_daily
GROUP BY dt
ORDER BY dt;

Alert idea: Flag when null_rate_pct exceeds a rolling 7-day average by more than, say, 3 standard deviations.
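
A minimal sketch of that alert, assuming an engine with window functions (STDDEV_SAMP and bare boolean expressions vary by dialect; the 7-day window and 3σ threshold are just the example values):

WITH daily AS (
  SELECT dt,
         ROUND(100.0 * SUM(CASE WHEN NULLIF(TRIM(customer_email), '') IS NULL THEN 1 ELSE 0 END) / COUNT(*), 2) AS null_rate_pct
  FROM users_daily
  GROUP BY dt
),
stats AS (
  SELECT dt,
         null_rate_pct,
         AVG(null_rate_pct)         OVER (ORDER BY dt ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS rolling_mean_7d,
         STDDEV_SAMP(null_rate_pct) OVER (ORDER BY dt ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS rolling_stddev_7d
  FROM daily
)
SELECT dt,
       null_rate_pct,
       rolling_mean_7d,
       rolling_stddev_7d,
       (null_rate_pct > rolling_mean_7d + 3 * rolling_stddev_7d) AS is_alert   -- NULL on the first days with too little history
FROM stats
ORDER BY dt;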

Example 2: Detect orphan facts after a left join

Goal: Identify orders that no longer match a customer dimension row.

SELECT  o.dt,
        COUNT(*) AS orders_total,
        SUM(CASE WHEN c.customer_id IS NULL THEN 1 ELSE 0 END) AS orphan_orders
FROM orders o
LEFT JOIN dim_customers c
  ON o.customer_id = c.customer_id
GROUP BY o.dt
ORDER BY o.dt;

If orphan_orders spikes suddenly, your join is breaking (e.g., missing late-arriving dimension rows or key format changes).
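
To localize a spike like that, drill into the orphans and rank the worst offenders. The source_system column below is an assumption for illustration; group by whatever lineage attribute your orders table actually carries:

SELECT o.source_system,                 -- assumed column; replace with your own lineage attribute
       COUNT(*) AS orphan_orders
FROM orders o
LEFT JOIN dim_customers c
  ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL
GROUP BY o.source_system
ORDER BY orphan_orders DESC
LIMIT 5;                                -- LIMIT / TOP / FETCH FIRST varies by engine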

Example 3: Join coverage and duplication health

Goal: Monitor how much of the fact table finds exactly one match in the dimension (no missing, no duplicates).

WITH dim AS (
  SELECT customer_id, COUNT(*) AS dim_dups
  FROM dim_customers
  GROUP BY customer_id
),
base AS (
  SELECT o.order_id,
         CASE WHEN d.customer_id IS NULL THEN 'orphan'
              WHEN d.dim_dups > 1 THEN 'duplicate_dim_match'
              ELSE 'single_match' END AS join_status,
         o.dt
  FROM orders o
  LEFT JOIN dim d ON o.customer_id = d.customer_id
)
SELECT dt,
       COUNT(*) AS orders_total,
       SUM(CASE WHEN join_status='single_match' THEN 1 ELSE 0 END) AS single_match_cnt,
       SUM(CASE WHEN join_status='orphan' THEN 1 ELSE 0 END) AS orphan_cnt,
       SUM(CASE WHEN join_status='duplicate_dim_match' THEN 1 ELSE 0 END) AS dup_match_cnt,
       ROUND(100.0 * SUM(CASE WHEN join_status='single_match' THEN 1 ELSE 0 END) / COUNT(*), 2) AS single_match_pct
FROM base
GROUP BY dt
ORDER BY dt;

A healthy model has a high single_match_pct, with orphan_cnt and dup_match_cnt near zero.

Checklists

Null monitoring checklist

  • Normalize blanks to NULL (NULLIF(TRIM(col), ''))
  • Track null rate daily for key columns (IDs, emails, amounts)
  • Segment by important dimensions (e.g., source_system) to localize issues
  • Use rolling baselines and thresholds (e.g., 7-day mean ± 3σ)
  • Store results in an audit table for trend charts

Join health checklist

  • Monitor orphan rate (facts without a dim match)
  • Monitor duplicate matches (dim key duplicates); see the sketch after this list
  • Track single_match_pct over time
  • Validate key format and casing consistency
  • Watch for sudden row count changes in dimensions
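
A minimal sketch for the duplicate-match and key-format checks above, assuming string keys (the UPPER/TRIM normalization is illustrative; match whatever your keys actually need):

-- Dimension keys that appear more than once (each one fans out joined facts)
SELECT customer_id, COUNT(*) AS dup_rows
FROM dim_customers
GROUP BY customer_id
HAVING COUNT(*) > 1
ORDER BY dup_rows DESC;

-- Rough signal for format/casing drift: fact rows that match only after normalization
SELECT COUNT(*) AS match_only_after_normalization
FROM orders o
JOIN dim_customers c
  ON UPPER(TRIM(o.customer_id)) = UPPER(TRIM(c.customer_id))
LEFT JOIN dim_customers c2
  ON o.customer_id = c2.customer_id
WHERE c2.customer_id IS NULL;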

Exercises

These mirror the graded exercises below. Try first, then open the solutions.

  1. Exercise 1: Compute a daily null rate for a chosen column and flag days above a dynamic threshold.
  2. Exercise 2: Find orphan facts after a left join and list top 5 sources causing orphans.
  • Tip: Use CTEs to keep queries readable.
  • Tip: Use TRIM and NULLIF to normalize values before counting NULLs.

Common mistakes and self-checks

  • Treating blanks as non-null: Self-check by counting TRIM(col) = '' occurrences.
  • Ignoring duplicates in dimensions: Self-check with COUNT(*) OVER (PARTITION BY dim_key).
  • Static thresholds only: Self-check by comparing to rolling averages; temporary spikes may be normal.
  • Not segmenting by source: Self-check by grouping nulls/orphans by source_system or ingestion pipeline.
  • Forgetting late-arriving dims: Self-check by comparing yesterday’s vs today’s orphan counts; if they “heal” on their own, the dimension is arriving late; consider delayed joins or snapshot logic.

Practical projects

Project A: Build a data quality audit table

  1. Create a table dq_null_rates(dt, table_name, column_name, null_rate_pct, total_rows).
  2. Write a SQL job that inserts daily metrics for 3–5 critical columns (a sketch of steps 1–2 follows this list).
  3. Add a status column (OK/WARN/ALERT) based on rolling thresholds.
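
A minimal sketch of steps 1 and 2, using the users_daily example from earlier (the types and the exact date filter are illustrative and engine-dependent):

CREATE TABLE dq_null_rates (
  dt            DATE,
  table_name    VARCHAR(128),
  column_name   VARCHAR(128),
  null_rate_pct DECIMAL(5,2),
  total_rows    BIGINT
);

-- Daily job: one INSERT per monitored column (repeat or parameterize per table/column)
INSERT INTO dq_null_rates (dt, table_name, column_name, null_rate_pct, total_rows)
SELECT dt,
       'users_daily',
       'customer_email',
       ROUND(100.0 * SUM(CASE WHEN NULLIF(TRIM(customer_email), '') IS NULL THEN 1 ELSE 0 END) / COUNT(*), 2),
       COUNT(*)
FROM users_daily
WHERE dt = CURRENT_DATE - 1             -- yesterday's partition; date arithmetic varies by engine
GROUP BY dt;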

Project B: Join health score

  1. Create a table dq_join_health(dt, fact, dim, orphan_pct, dup_match_pct, single_match_pct).
  2. Populate daily from your core fact-to-dimension pairs (a sketch follows this list).
  3. Visualize trends in a lightweight dashboard to spot regressions.
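
A minimal sketch of step 2, reusing the join-status logic from Example 3 (the fact and dimension names mirror the earlier examples; some dialects want the WITH clause before the INSERT):

INSERT INTO dq_join_health (dt, fact, dim, orphan_pct, dup_match_pct, single_match_pct)
WITH dim AS (
  SELECT customer_id, COUNT(*) AS dim_dups
  FROM dim_customers
  GROUP BY customer_id
),
base AS (
  SELECT o.dt,
         CASE WHEN d.customer_id IS NULL THEN 'orphan'
              WHEN d.dim_dups > 1 THEN 'duplicate_dim_match'
              ELSE 'single_match' END AS join_status
  FROM orders o
  LEFT JOIN dim d ON o.customer_id = d.customer_id
)
SELECT dt,
       'orders',
       'dim_customers',
       ROUND(100.0 * SUM(CASE WHEN join_status = 'orphan' THEN 1 ELSE 0 END) / COUNT(*), 2),
       ROUND(100.0 * SUM(CASE WHEN join_status = 'duplicate_dim_match' THEN 1 ELSE 0 END) / COUNT(*), 2),
       ROUND(100.0 * SUM(CASE WHEN join_status = 'single_match' THEN 1 ELSE 0 END) / COUNT(*), 2)
FROM base
GROUP BY dt;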

Learning path

  1. Review your data model: identify critical keys and required columns.
  2. Implement null normalization and daily null rate checks.
  3. Add join coverage checks (orphans, dup matches, single_match_pct).
  4. Introduce rolling baselines with thresholds.
  5. Automate inserts into an audit table; chart trends.

Mini challenge

Yesterday your single_match_pct dropped from 99.5% to 92%. In one paragraph, outline the 3 quickest checks you would run and the SQL you’d start with.

Note: The Quick Test is available to everyone; only logged-in users will have their progress saved.

Practice Exercises

2 exercises to complete

Instructions

You have users_daily(dt, customer_email). Compute daily null_rate_pct and flag days where null_rate_pct exceeds the 7-day rolling average by more than 3 standard deviations.

Output columns: dt, null_rate_pct, rolling_mean_7d, rolling_stddev_7d, is_alert.

Assume your SQL engine supports window functions and that dates have no gaps; if gaps exist, treat missing days as zero rows.

Expected Output
A result set where is_alert is TRUE only on anomalous days, for example: dt=2025-05-10 null_rate_pct=8.2 rolling_mean_7d=2.1 rolling_stddev_7d=1.5 is_alert=TRUE

Monitoring Null Rates And Broken Joins — Quick Test

Test your knowledge with 6 questions. Pass with 70% or higher.

