
Data Validation Checks

Learn Data Validation Checks for free with explanations, exercises, and a quick test (for Data Visualization Engineers).

Published: December 28, 2025 | Updated: December 28, 2025

Why this matters

Data Visualization Engineers rely on trustworthy data to build accurate dashboards. Data validation checks in SQL catch issues early: missing rows, broken joins, wrong aggregations, and unexpected values. With a few repeatable queries, you can prevent misleading charts, reduce rework, and build stakeholder confidence.

Who this is for

  • Data Visualization Engineers and BI developers building dashboards and metrics.
  • Analysts who prepare SQL datasets for reporting.
  • Anyone needing a practical checklist to verify data correctness before publishing visuals.

Prerequisites

  • Basic SQL: SELECT, WHERE, GROUP BY, JOIN.
  • Comfort with reading table schemas and primary/foreign keys.

Concept explained simply

Data validation checks are small SQL queries that test assumptions about your dataset. Think of them as unit tests for your tables: do counts match expectations, are keys unique, and are values within valid ranges?

Mental model

  • Shape: Does the dataset have the expected number of rows and columns?
  • Keys: Are identifiers unique and do foreign keys match dimensions?
  • Ranges: Are dates, numbers, and categories within valid bounds?
  • Aggregates: Do totals, averages, and proportions reconcile with sources?
  • Freshness: Is the latest data present and on time?

Practical validation checklist

  • Row count comparison (source vs. report table)
  • NULL and empties check for critical fields
  • Uniqueness of primary keys
  • Referential integrity (no orphaned foreign keys)
  • Range checks (dates, amounts, percentages)
  • Distribution shifts (mean/median vs. last period)
  • Aggregation reconciliation (totals vs. authoritative source)
  • Freshness (max date within expected window)

Worked examples

1) Row count and total reconciliation

Compare yesterday's orders loaded to your reporting table against the source.

-- Expected: counts and totals match, or differ only by known, documented amounts
WITH src AS (
  SELECT COUNT(*) AS c, SUM(order_amount) AS s
  FROM raw.orders
  WHERE order_date = CURRENT_DATE - INTERVAL '1 day'
), rpt AS (
  SELECT COUNT(*) AS c, SUM(order_amount) AS s
  FROM mart.orders_report
  WHERE order_date = CURRENT_DATE - INTERVAL '1 day'
)
SELECT src.c AS src_count, rpt.c AS rpt_count,
       src.s AS src_sum,   rpt.s AS rpt_sum,
       (rpt.c - src.c) AS diff_count,
       (rpt.s - src.s) AS diff_sum
FROM src, rpt;
Why it works

Row count checks catch missing or duplicated rows; total reconciliation reveals partial loads or aggregation errors.

2) Referential integrity (orphan keys)

Ensure each fact order references an existing customer.

SELECT f.customer_id, COUNT(*) AS orphan_rows
FROM mart.orders_report f
LEFT JOIN dim.customers d ON f.customer_id = d.customer_id
WHERE d.customer_id IS NULL
GROUP BY f.customer_id
ORDER BY orphan_rows DESC;
Why it works

LEFT JOIN + NULL filter surfaces foreign keys with no matching dimension row.
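
The same check can also be written with NOT EXISTS; the result is identical, and some teams find the intent easier to read. A sketch against the same tables:

-- Equivalent orphan check using NOT EXISTS
SELECT f.customer_id, COUNT(*) AS orphan_rows
FROM mart.orders_report f
WHERE NOT EXISTS (
  SELECT 1
  FROM dim.customers d
  WHERE d.customer_id = f.customer_id
)
GROUP BY f.customer_id
ORDER BY orphan_rows DESC;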

3) Uniqueness and range checks

Verify primary key uniqueness and reasonable value ranges.

-- Duplicate primary keys
SELECT order_id, COUNT(*) AS n
FROM mart.orders_report
GROUP BY order_id
HAVING COUNT(*) > 1;

-- Numeric ranges (no negatives, sane upper bound)
SELECT
  SUM(CASE WHEN order_amount < 0 THEN 1 ELSE 0 END) AS negatives,
  SUM(CASE WHEN order_amount > 100000 THEN 1 ELSE 0 END) AS extreme_high
FROM mart.orders_report;

-- Freshness (latest record date)
SELECT MAX(order_date) AS max_date FROM mart.orders_report;
Why it works

Uniqueness protects against double-counting, range checks catch data entry or pipeline errors, and freshness flags late or missing loads.
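
If the duplicate-key query returns rows, a natural follow-up is to pull the full duplicated rows for inspection. A minimal sketch against the same report table:

-- Inspect all rows for any duplicated order_id
SELECT *
FROM mart.orders_report
WHERE order_id IN (
  SELECT order_id
  FROM mart.orders_report
  GROUP BY order_id
  HAVING COUNT(*) > 1
)
ORDER BY order_id;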

Step-by-step: add checks to your workflow

  1. List critical metrics, keys, and expected ranges for your dashboard.
  2. Write small SQL queries for each assumption (count, sum, uniqueness, FK joins).
  3. Run checks on a known-good day to baseline expected results.
  4. Automate: save queries and run them before publishing or scheduling refreshes (see the sketch after this list).
  5. Document known differences and thresholds for alerts.
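
As a sketch of step 4, assuming PostgreSQL-style syntax and the same example tables (raw.orders, mart.orders_report), the individual checks can be combined into one saved query that returns a single row you can eyeball or alert on:

-- One-row validation summary for yesterday's load (sketch)
WITH src AS (
  SELECT COUNT(*) AS c, SUM(order_amount) AS s
  FROM raw.orders
  WHERE order_date = CURRENT_DATE - INTERVAL '1 day'
), rpt AS (
  SELECT COUNT(*) AS c, SUM(order_amount) AS s
  FROM mart.orders_report
  WHERE order_date = CURRENT_DATE - INTERVAL '1 day'
), dupes AS (
  SELECT COUNT(*) AS n
  FROM (
    SELECT order_id
    FROM mart.orders_report
    GROUP BY order_id
    HAVING COUNT(*) > 1
  ) d
)
SELECT (rpt.c - src.c) AS diff_count,
       (rpt.s - src.s) AS diff_sum,
       dupes.n AS duplicate_keys,
       (SELECT MAX(order_date) FROM mart.orders_report) AS freshness_max_date
FROM src, rpt, dupes;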

Exercises

Do these in your SQL editor. Compare your results with the expected outputs. Use the hints if you get stuck.

Exercise 1 — Daily validation across source and report (mirrors EX1)

You maintain two tables: raw.orders (source) and mart.orders_report (dashboard-ready). Validate yesterday's load.

  • Compare row counts and total order_amount for order_date = CURRENT_DATE - 1.
  • Return differences for both count and sum in one result row.

Exercise 2 — Find orphaned orders (mirrors EX2)

Check that all orders in mart.orders_report have a matching customer_id in dim.customers.

  • Write a query to list customer_id values in facts that do not exist in the dimension.
  • Include a count of orphan rows per customer_id, sorted by count desc.

Self-check checklist

  • I can validate row counts and totals between source and report tables.
  • I can detect orphaned foreign keys using a LEFT JOIN.
  • I can check uniqueness of primary keys and investigate duplicates.
  • I can verify ranges and freshness for key fields.

Common mistakes and how to catch them

Comparing different filters

Ensure both sides use the same date filter and status flags. Mismatched WHERE clauses create false alarms.

Counting after JOINs

A JOIN can multiply rows when the relationship is one-to-many. For reconciliation, compare row counts on the base tables first, then validate foreign keys with a separate join check.
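
A minimal illustration of the failure mode, using a hypothetical line-items table (mart.order_items, not part of the examples above) that holds several rows per order:

-- Base-table count: one row per order (the number to reconcile)
SELECT COUNT(*) AS order_rows
FROM mart.orders_report;

-- Post-join count: inflated when an order has several items
SELECT COUNT(*) AS joined_rows
FROM mart.orders_report o
JOIN mart.order_items i ON o.order_id = i.order_id;  -- hypothetical line-item table

-- If you must count after the join, count distinct keys instead
SELECT COUNT(DISTINCT o.order_id) AS distinct_orders
FROM mart.orders_report o
JOIN mart.order_items i ON o.order_id = i.order_id;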

Ignoring NULL semantics

Use COALESCE where appropriate and explicitly count NULLs in critical fields to spot missing data.
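
A minimal sketch of the NULL and empties check from the checklist, against the same report table; order_status is a hypothetical column, so substitute your own critical fields:

-- Count NULLs and empty strings in critical fields (sketch)
SELECT
  SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) AS null_customer_id,
  SUM(CASE WHEN order_amount IS NULL THEN 1 ELSE 0 END) AS null_order_amount,
  SUM(CASE WHEN COALESCE(TRIM(order_status), '') = '' THEN 1 ELSE 0 END) AS blank_status  -- order_status is a hypothetical column
FROM mart.orders_report;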

Forgetting distinctness

When checking uniqueness, use GROUP BY with HAVING rather than SELECT DISTINCT, which hides duplicates instead of revealing them.

No baseline or thresholds

Record a baseline from a known-good day and define acceptable deltas (e.g., ±1%). Not all differences indicate errors.
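
One way to make a baseline concrete is to compare yesterday against the average of the prior seven days. A sketch assuming PostgreSQL-style date arithmetic and an illustrative 20% threshold (tune it to your data):

-- Flag a revenue shift beyond an illustrative threshold (sketch)
WITH yesterday AS (
  SELECT SUM(order_amount) AS s
  FROM mart.orders_report
  WHERE order_date = CURRENT_DATE - INTERVAL '1 day'
), baseline AS (
  SELECT SUM(order_amount) / 7.0 AS avg_daily
  FROM mart.orders_report
  WHERE order_date >= CURRENT_DATE - INTERVAL '8 days'
    AND order_date <  CURRENT_DATE - INTERVAL '1 day'
)
SELECT yesterday.s AS yesterday_sum,
       baseline.avg_daily,
       (yesterday.s - baseline.avg_daily) / NULLIF(baseline.avg_daily, 0) AS pct_change,
       CASE WHEN ABS(yesterday.s - baseline.avg_daily) > 0.20 * baseline.avg_daily
            THEN 'investigate' ELSE 'ok' END AS status  -- 0.20 is an illustrative threshold
FROM yesterday, baseline;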

Practical projects

  • Build a reusable SQL validation script that outputs a single row of key checks (counts, sums, max date, duplicate count).
  • Create a small "data quality" table where you insert daily check results with timestamps for easy monitoring (see the sketch after this list).
  • Design a dashboard tile that displays validation KPIs (row diff, sum diff, freshness) from your quality table.
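
A minimal sketch for the second project; the table mart.data_quality_log and its columns are illustrative names, not part of the examples above:

-- Illustrative data quality log (adapt names and types to your warehouse)
CREATE TABLE IF NOT EXISTS mart.data_quality_log (
  check_date     date,
  diff_count     bigint,
  diff_sum       numeric,
  duplicate_keys bigint,
  max_order_date date,
  checked_at     timestamp DEFAULT CURRENT_TIMESTAMP
);

-- Insert one row of results per run
INSERT INTO mart.data_quality_log (check_date, diff_count, diff_sum, duplicate_keys, max_order_date)
SELECT CURRENT_DATE - 1,
       (SELECT COUNT(*) FROM mart.orders_report WHERE order_date = CURRENT_DATE - 1)
         - (SELECT COUNT(*) FROM raw.orders WHERE order_date = CURRENT_DATE - 1),
       (SELECT COALESCE(SUM(order_amount), 0) FROM mart.orders_report WHERE order_date = CURRENT_DATE - 1)
         - (SELECT COALESCE(SUM(order_amount), 0) FROM raw.orders WHERE order_date = CURRENT_DATE - 1),
       (SELECT COUNT(*) FROM (
          SELECT order_id FROM mart.orders_report GROUP BY order_id HAVING COUNT(*) > 1
        ) d),
       (SELECT MAX(order_date) FROM mart.orders_report);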

Learning path

  • Start: Row counts and sums → Keys and referential integrity → Ranges and freshness → Distribution checks (optional) → Automate.
  • Next skill: Aggregations and window functions to deepen reconciliation techniques.

Mini challenge

Your product manager reports a sudden drop in daily revenue on your dashboard. In 10 minutes, run three checks to isolate the issue:

  • Row count and sum reconciliation (source vs. report) for yesterday.
  • Orphan check on customer_id and product_id.
  • Freshness check on both dimension and fact tables.

Suggested approach

Start with base counts and sums. If they match, check joins for orphans. If joins are OK, verify max dates to confirm that all tables have refreshed.
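
For the freshness step, a sketch that reports the latest date per table in one result; dim.customers.updated_at is a hypothetical column, since the examples above only show order_date on the order tables:

-- Freshness across fact and dimension tables in one result (sketch)
SELECT 'mart.orders_report' AS table_name, MAX(order_date) AS max_date FROM mart.orders_report
UNION ALL
SELECT 'raw.orders', MAX(order_date) FROM raw.orders
UNION ALL
SELECT 'dim.customers', MAX(updated_at)::date FROM dim.customers  -- updated_at is a hypothetical column
ORDER BY table_name;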

Next steps

  • Turn your best checks into saved queries or views and run before publishing dashboards.
  • Document known differences and acceptable thresholds in your team wiki or dataset description.
  • Practice on a new dataset each week to build speed and reliability.

Quick Test

Take the quick test to check your understanding. It is free for everyone; results are saved only for logged-in users.

Practice Exercises

2 exercises to complete

Instructions

You maintain two tables: raw.orders (source) and mart.orders_report (report). Compare yesterday's data.

  • Return source and report counts and sums of order_amount for order_date = CURRENT_DATE - 1.
  • Include diff_count and diff_sum columns.

Expected Output

One row with src_count, rpt_count, src_sum, rpt_sum, diff_count, diff_sum. Differences should be 0 on a clean load.

Data Validation Checks — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

