
Monitoring Null Rates And Broken Joins

Learn Monitoring Null Rates And Broken Joins for free with explanations, exercises, and a quick test (for BI Analysts).

Published: December 22, 2025 | Updated: December 22, 2025

Who this is for

BI Analysts, Analytics Engineers, and Data Ops folks who build dashboards, write SQL, and need to keep reports trustworthy in production.

Prerequisites

  • Comfort with SQL SELECT, GROUP BY, JOIN
  • Basic understanding of star/snowflake schemas
  • Ability to read daily/partitioned tables (e.g., date columns)

Why this matters

Executives rely on your dashboards for decisions. Two silent data killers are:

  • Rising null rates: a critical column (e.g., customer_id, email) starts getting more NULLs or blank values.
  • Broken joins: keys stop matching between fact and dimension tables, creating orphan rows or row fan-out from duplicate dimension keys.

Real BI tasks where this shows up:

  • Marketing funnel drops to zero because a join to campaign data broke yesterday.
  • Customer segmentation counts explode due to duplicate dimension keys.
  • New ETL release accidentally converts blanks to NULL (or vice versa) and metrics drift.

Concept explained simply

Monitoring null rates means tracking the percentage of missing values for important columns over time. Monitoring broken joins means checking if keys still match across tables and if join coverage and duplication behave as expected.

Mental model

Think of your data model like a Lego build:

  • Bricks with holes = NULLs. Too many holes and the build collapses.
  • Connectors that don’t fit = broken joins. Pieces don’t snap, so the structure can’t hold.

What counts as "null" in practice?

  • SQL NULL values
  • Empty strings and whitespace-only strings (use TRIM)
  • Sentinel values like 'UNKNOWN', 'N/A', '000000' (define your own list)

Normalize before measuring, e.g., NULLIF(TRIM(col), '') to treat blanks as NULL.
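
For example, a minimal normalization expression might look like this (the sentinel list is illustrative; substitute your own):

SELECT CASE
         WHEN NULLIF(TRIM(customer_email), '') IS NULL THEN NULL                 -- SQL NULL, empty, or whitespace-only
         WHEN UPPER(TRIM(customer_email)) IN ('UNKNOWN', 'N/A') THEN NULL        -- agreed sentinel values
         ELSE TRIM(customer_email)
       END AS customer_email_clean
FROM users_daily;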

Worked examples

Example 1: Daily null rate for a required column

Goal: Track daily null rate for customer_email in table users_daily.

SELECT  dt,
        COUNT(*) AS total_rows,
        SUM(CASE WHEN NULLIF(TRIM(customer_email), '') IS NULL THEN 1 ELSE 0 END) AS null_rows,
        ROUND(100.0 * SUM(CASE WHEN NULLIF(TRIM(customer_email), '') IS NULL THEN 1 ELSE 0 END) / COUNT(*), 2) AS null_rate_pct
FROM users_daily
GROUP BY dt
ORDER BY dt;

Alert idea: Flag when null_rate_pct exceeds a rolling 7-day average by more than, say, 3 standard deviations.
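
A minimal sketch of that alert, assuming an engine with window functions (STDDEV_SAMP and bare boolean expressions vary by dialect; the 7-day window and 3σ threshold are just the example values):

WITH daily AS (
  SELECT dt,
         ROUND(100.0 * SUM(CASE WHEN NULLIF(TRIM(customer_email), '') IS NULL THEN 1 ELSE 0 END) / COUNT(*), 2) AS null_rate_pct
  FROM users_daily
  GROUP BY dt
),
stats AS (
  SELECT dt,
         null_rate_pct,
         AVG(null_rate_pct)         OVER (ORDER BY dt ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS rolling_mean_7d,
         STDDEV_SAMP(null_rate_pct) OVER (ORDER BY dt ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS rolling_stddev_7d
  FROM daily
)
SELECT dt,
       null_rate_pct,
       rolling_mean_7d,
       rolling_stddev_7d,
       (null_rate_pct > rolling_mean_7d + 3 * rolling_stddev_7d) AS is_alert   -- NULL on the first days with too little history
FROM stats
ORDER BY dt;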

Example 2: Detect orphan facts after a left join

Goal: Identify orders that no longer match a customer dimension row.

SELECT  o.dt,
        COUNT(*) AS orders_total,
        SUM(CASE WHEN c.customer_id IS NULL THEN 1 ELSE 0 END) AS orphan_orders
FROM orders o
LEFT JOIN dim_customers c
  ON o.customer_id = c.customer_id
GROUP BY o.dt
ORDER BY o.dt;

If orphan_orders spikes suddenly, your join is breaking (e.g., missing late-arriving dimension rows or key format changes).
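
To localize a spike like that, drill into the orphans and rank the worst offenders. The source_system column below is an assumption for illustration; group by whatever lineage attribute your orders table actually carries:

SELECT o.source_system,                 -- assumed column; replace with your own lineage attribute
       COUNT(*) AS orphan_orders
FROM orders o
LEFT JOIN dim_customers c
  ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL
GROUP BY o.source_system
ORDER BY orphan_orders DESC
LIMIT 5;                                -- LIMIT / TOP / FETCH FIRST varies by engine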

Example 3: Join coverage and duplication health

Goal: Monitor how much of the fact table finds exactly one match in the dimension (no missing, no duplicates).

WITH dim AS (
  SELECT customer_id, COUNT(*) AS dim_dups
  FROM dim_customers
  GROUP BY customer_id
),
base AS (
  SELECT o.order_id,
         CASE WHEN d.customer_id IS NULL THEN 'orphan'
              WHEN d.dim_dups > 1 THEN 'duplicate_dim_match'
              ELSE 'single_match' END AS join_status,
         o.dt
  FROM orders o
  LEFT JOIN dim d ON o.customer_id = d.customer_id
)
SELECT dt,
       COUNT(*) AS orders_total,
       SUM(CASE WHEN join_status='single_match' THEN 1 ELSE 0 END) AS single_match_cnt,
       SUM(CASE WHEN join_status='orphan' THEN 1 ELSE 0 END) AS orphan_cnt,
       SUM(CASE WHEN join_status='duplicate_dim_match' THEN 1 ELSE 0 END) AS dup_match_cnt,
       ROUND(100.0 * SUM(CASE WHEN join_status='single_match' THEN 1 ELSE 0 END) / COUNT(*), 2) AS single_match_pct
FROM base
GROUP BY dt
ORDER BY dt;

A healthy model has a high single_match_pct, with orphan_cnt and dup_match_cnt near zero.

Checklists

Null monitoring checklist

  • Normalize blanks to NULL (NULLIF(TRIM(col), ''))
  • Track null rate daily for key columns (IDs, emails, amounts)
  • Segment by important dimensions (e.g., source_system) to localize issues
  • Use rolling baselines and thresholds (e.g., 7-day mean ± 3σ)
  • Store results in an audit table for trend charts

Join health checklist

  • Monitor orphan rate (facts without a dim match)
  • Monitor duplicate matches (dim key duplicates); see the sketch after this list
  • Track single_match_pct over time
  • Validate key format and casing consistency
  • Watch for sudden row count changes in dimensions
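
A minimal sketch for the duplicate-match and key-format checks above, assuming string keys (the UPPER/TRIM normalization is illustrative; match whatever your keys actually need):

-- Dimension keys that appear more than once (each one fans out joined facts)
SELECT customer_id, COUNT(*) AS dup_rows
FROM dim_customers
GROUP BY customer_id
HAVING COUNT(*) > 1
ORDER BY dup_rows DESC;

-- Rough signal for format/casing drift: fact rows that match only after normalization
SELECT COUNT(*) AS match_only_after_normalization
FROM orders o
JOIN dim_customers c
  ON UPPER(TRIM(o.customer_id)) = UPPER(TRIM(c.customer_id))
LEFT JOIN dim_customers c2
  ON o.customer_id = c2.customer_id
WHERE c2.customer_id IS NULL;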

Exercises

These mirror the graded exercises below. Try first, then open the solutions.

  1. Exercise 1: Compute a daily null rate for a chosen column and flag days above a dynamic threshold.
  2. Exercise 2: Find orphan facts after a left join and list top 5 sources causing orphans.
  • Tip: Use CTEs to keep queries readable.
  • Tip: Use TRIM and NULLIF to normalize values before counting NULLs.

Common mistakes and self-checks

  • Treating blanks as non-null: Self-check by counting TRIM(col) = '' occurrences.
  • Ignoring duplicates in dimensions: Self-check with COUNT(*) OVER (PARTITION BY dim_key).
  • Static thresholds only: Self-check by comparing to rolling averages; temporary spikes may be normal.
  • Not segmenting by source: Self-check by grouping nulls/orphans by source_system or ingestion pipeline.
  • Forgetting late-arriving dims: Self-check by comparing yesterday’s vs today’s orphan counts; if they “heal” on their own, the dimension is arriving late; consider delayed joins or snapshot logic.

Practical projects

Project A: Build a data quality audit table

  1. Create a table dq_null_rates(dt, table_name, column_name, null_rate_pct, total_rows).
  2. Write a SQL job that inserts daily metrics for 3–5 critical columns (a sketch of steps 1–2 follows this list).
  3. Add a status column (OK/WARN/ALERT) based on rolling thresholds.
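
A minimal sketch of steps 1 and 2, using the users_daily example from earlier (the types and the exact date filter are illustrative and engine-dependent):

CREATE TABLE dq_null_rates (
  dt            DATE,
  table_name    VARCHAR(128),
  column_name   VARCHAR(128),
  null_rate_pct DECIMAL(5,2),
  total_rows    BIGINT
);

-- Daily job: one INSERT per monitored column (repeat or parameterize per table/column)
INSERT INTO dq_null_rates (dt, table_name, column_name, null_rate_pct, total_rows)
SELECT dt,
       'users_daily',
       'customer_email',
       ROUND(100.0 * SUM(CASE WHEN NULLIF(TRIM(customer_email), '') IS NULL THEN 1 ELSE 0 END) / COUNT(*), 2),
       COUNT(*)
FROM users_daily
WHERE dt = CURRENT_DATE - 1             -- yesterday's partition; date arithmetic varies by engine
GROUP BY dt;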

Project B: Join health score

  1. Create a table dq_join_health(dt, fact, dim, orphan_pct, dup_match_pct, single_match_pct).
  2. Populate daily from your core fact-to-dimension pairs (a sketch follows this list).
  3. Visualize trends in a lightweight dashboard to spot regressions.
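
A minimal sketch of step 2, reusing the join-status logic from Example 3 (the fact and dimension names mirror the earlier examples; some dialects want the WITH clause before the INSERT):

INSERT INTO dq_join_health (dt, fact, dim, orphan_pct, dup_match_pct, single_match_pct)
WITH dim AS (
  SELECT customer_id, COUNT(*) AS dim_dups
  FROM dim_customers
  GROUP BY customer_id
),
base AS (
  SELECT o.dt,
         CASE WHEN d.customer_id IS NULL THEN 'orphan'
              WHEN d.dim_dups > 1 THEN 'duplicate_dim_match'
              ELSE 'single_match' END AS join_status
  FROM orders o
  LEFT JOIN dim d ON o.customer_id = d.customer_id
)
SELECT dt,
       'orders',
       'dim_customers',
       ROUND(100.0 * SUM(CASE WHEN join_status = 'orphan' THEN 1 ELSE 0 END) / COUNT(*), 2),
       ROUND(100.0 * SUM(CASE WHEN join_status = 'duplicate_dim_match' THEN 1 ELSE 0 END) / COUNT(*), 2),
       ROUND(100.0 * SUM(CASE WHEN join_status = 'single_match' THEN 1 ELSE 0 END) / COUNT(*), 2)
FROM base
GROUP BY dt;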

Learning path

  1. Review your data model: identify critical keys and required columns.
  2. Implement null normalization and daily null rate checks.
  3. Add join coverage checks (orphans, dup matches, single_match_pct).
  4. Introduce rolling baselines with thresholds.
  5. Automate inserts into an audit table; chart trends.

Mini challenge

Yesterday your single_match_pct dropped from 99.5% to 92%. In one paragraph, outline the 3 quickest checks you would run and the SQL you’d start with.

Note: The Quick Test is available to everyone; only logged-in users will have their progress saved.

Practice Exercises

2 exercises to complete

Instructions

You have users_daily(dt, customer_email). Compute daily null_rate_pct and flag days where null_rate_pct exceeds the 7-day rolling average by more than 3 standard deviations.

Output columns: dt, null_rate_pct, rolling_mean_7d, rolling_stddev_7d, is_alert.

Assume your SQL engine supports window functions and that dates have no gaps; if gaps exist, treat missing days as zero rows.

Expected Output
A result set where is_alert is TRUE only on anomalous days, for example: dt=2025-05-10 null_rate_pct=8.2 rolling_mean_7d=2.1 rolling_stddev_7d=1.5 is_alert=TRUE

Monitoring Null Rates And Broken Joins — Quick Test

Test your knowledge with 6 questions. Pass with 70% or higher.

