Why this matters
As a Business Analyst, your insights drive decisions. A query can look solid and still produce wrong results: mis-joined tables, double-counted rows, incorrect filters, or a simple oversight. Sanity checks are fast, lightweight tests you run after writing SQL to ensure results are believable and safe to present.
- Verify that revenue totals match expectations after joins.
- Confirm that joins or filters didn’t multiply rows by comparing counts before and after.
- Ensure percentages sum to ~100% and no negatives appear in amounts that should be positive.
- Catch missing dates, unexpected spikes/drops, and odd null rates.
Concept explained simply
Sanity checks are quick, approximate validations you run right after a query. They don’t prove the query is perfect; they prove it’s not obviously wrong. Think of them as smoke alarms: simple but powerful.
Mental model
- Three-lens review: Input (source tables) → Transform (joins, filters, group-bys) → Output (numbers and shape). Run checks at each lens.
- Invariants: Things that should always hold true (e.g., order_id uniqueness, non-negative paid amounts).
- Reconciliation: Compare two ways of computing the same number, or compare to a known baseline.
Core sanity checks you can apply today
- Row count continuity: Compare counts before/after joins and filters.
Expect joined_cnt ≥ orders_cnt. If it is much larger, check for duplicate inflation.
-- Before join
SELECT COUNT(*) AS orders_cnt
FROM orders
WHERE status = 'paid';
-- After join
SELECT COUNT(*) AS joined_cnt
FROM orders o
JOIN order_items i ON o.order_id = i.order_id
WHERE o.status = 'paid';
- Primary key uniqueness: No rows should return if order_id is a true key.
SELECT order_id, COUNT(*) AS c
FROM orders
GROUP BY order_id
HAVING COUNT(*) > 1;
- Join duplication trap: If joining to a many-side table, aggregate first.
WITH items AS (
  SELECT order_id, SUM(item_amount) AS items_total
  FROM order_items
  GROUP BY order_id
)
SELECT o.order_id, o.total_amount, items.items_total
FROM orders o
LEFT JOIN items ON o.order_id = items.order_id;
- Range and sign checks:
-- Negative amounts that shouldn't be negative
SELECT *
FROM orders
WHERE status = 'paid' AND total_amount < 0;
- Percent-of-total sanity: Expect percent_sum ≈ 100 (a small rounding tolerance is okay).
WITH by_country AS (
  SELECT u.country, SUM(o.total_amount) AS amt
  FROM orders o
  JOIN users u ON o.user_id = u.user_id
  WHERE o.status = 'paid'
  GROUP BY u.country
), total AS (
  SELECT SUM(amt) AS t FROM by_country
)
SELECT SUM(ROUND(100.0 * amt / t, 2)) AS percent_sum
FROM by_country, total;
- Time continuity: Look for missing dates or extreme spikes; scan the output for gaps or shocking changes.
SELECT order_date::date AS d, SUM(total_amount) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY d
ORDER BY d;
- Null-rate inspection: Sudden jumps in null_rate can signal pipeline issues.
SELECT SUM(CASE WHEN coupon_code IS NULL THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS null_rate
FROM orders;
- Two-method reconciliation: The two sums should match (allow small rounding differences).
-- Method A: sum of order totals
SELECT SUM(total_amount) FROM orders WHERE status = 'paid';
-- Method B: sum of item amounts
SELECT SUM(i.item_amount)
FROM orders o
JOIN order_items i ON o.order_id = i.order_id
WHERE o.status = 'paid';
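The time-continuity scan above still relies on eyeballing the output for gaps. On PostgreSQL, missing dates can be surfaced explicitly by joining a generated calendar against daily revenue (generate_series is PostgreSQL-specific; the table and column names follow the lesson's schema):

```sql
-- Sketch, assuming PostgreSQL. Rows returned = dates with no paid revenue.
WITH calendar AS (
  SELECT d::date AS d
  FROM generate_series(
    (SELECT MIN(order_date)::date FROM orders),
    (SELECT MAX(order_date)::date FROM orders),
    INTERVAL '1 day'
  ) AS d
),
daily AS (
  SELECT order_date::date AS d, SUM(total_amount) AS revenue
  FROM orders
  WHERE status = 'paid'
  GROUP BY order_date::date
)
SELECT c.d AS missing_date
FROM calendar c
LEFT JOIN daily ON daily.d = c.d
WHERE daily.revenue IS NULL
ORDER BY c.d;
```

An empty result means the series is continuous; any rows returned are the exact dates to investigate.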
Tip: A quick review checklist
- Row counts before vs after joins look reasonable
- Keys unique where expected
- No unexpected negatives or out-of-range values
- Percentages sum to ~100%
- Time series has no gaps, spikes are explainable
- Recomputed totals match via a second method
- Null rates plausible; sudden shifts investigated
Worked examples
Example 1: Detect duplicate inflation after a join
Goal: Revenue by day from orders joined to order_items.
-- 1) Baseline count
SELECT COUNT(*) AS paid_orders
FROM orders
WHERE status = 'paid';
-- 2) After naive join (likely duplicates)
SELECT COUNT(*) AS rows_after_join
FROM orders o
JOIN order_items i ON o.order_id = i.order_id
WHERE o.status = 'paid';
-- 3) Diagnose order-level duplication
SELECT o.order_id, COUNT(*) AS row_c
FROM orders o JOIN order_items i ON o.order_id = i.order_id
WHERE o.status = 'paid'
GROUP BY o.order_id
HAVING COUNT(*) > 1
ORDER BY row_c DESC
LIMIT 10;
If many order_ids have row_c > 1, aggregate items first:
WITH items AS (
SELECT order_id, SUM(item_amount) AS items_total
FROM order_items
GROUP BY order_id
)
SELECT o.order_date::date AS d, SUM(items.items_total) AS revenue
FROM orders o
JOIN items ON o.order_id = items.order_id
WHERE o.status = 'paid'
GROUP BY d
ORDER BY d;
Example 2: Percent-of-total sanity
Goal: Country revenue shares sum to ~100%.
WITH by_country AS (
SELECT u.country, SUM(o.total_amount) AS amt
FROM orders o
JOIN users u ON o.user_id = u.user_id
WHERE o.status = 'paid'
GROUP BY u.country
), total AS (
SELECT SUM(amt) AS t FROM by_country
)
SELECT country,
ROUND(100.0 * amt / t, 2) AS pct
FROM by_country, total
ORDER BY pct DESC;
-- Sanity: do the parts sum to ~100?
WITH by_country AS (
SELECT u.country, SUM(o.total_amount) AS amt
FROM orders o
JOIN users u ON o.user_id = u.user_id
WHERE o.status = 'paid'
GROUP BY u.country
), total AS (
SELECT SUM(amt) AS t FROM by_country
)
SELECT SUM(ROUND(100.0 * amt / t, 2)) AS percent_sum
FROM by_country, total;
Investigate if percent_sum is far from 100 due to rounding or missing categories.
Example 3: Two-method revenue reconciliation
Goal: Sum by orders vs sum of items.
-- Method A
SELECT SUM(total_amount) AS a
FROM orders
WHERE status = 'paid';
-- Method B
SELECT SUM(i.item_amount) AS b
FROM orders o
JOIN order_items i ON o.order_id = i.order_id
WHERE o.status = 'paid';
-- Compare difference
WITH a AS (
SELECT SUM(total_amount) AS v
FROM orders WHERE status = 'paid'
), b AS (
SELECT SUM(i.item_amount) AS v
FROM orders o JOIN order_items i ON o.order_id = i.order_id
WHERE o.status = 'paid'
)
SELECT (SELECT v FROM a) - (SELECT v FROM b) AS diff;
Expect diff ≈ 0. If not, check missing items, returns, or rounding.
Common mistakes and how to self-check
- Counting rows after joining to a many-side table. Fix: Aggregate first or count DISTINCT keys.
- Filtering on the wrong side of a LEFT JOIN (turns it into INNER). Fix: Put child-table filters in the ON clause.
- Ignoring rounding tolerances. Fix: Allow a small epsilon when comparing sums.
- Overlooking time gaps. Fix: Always scan daily counts for missing dates or sharp anomalies.
- Assuming keys are unique. Fix: Explicitly test for duplicates with GROUP BY ... HAVING COUNT(*) > 1.
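The LEFT JOIN pitfall above is easiest to see side by side. A sketch using the lesson's orders and order_items tables (the i.item_amount > 0 condition is just an illustrative child-table filter):

```sql
-- Wrong: a WHERE filter on the child table turns the LEFT JOIN into an INNER JOIN,
-- because unmatched orders have i.item_amount = NULL and fail the predicate.
SELECT o.order_id, SUM(i.item_amount) AS items_total
FROM orders o
LEFT JOIN order_items i ON o.order_id = i.order_id
WHERE i.item_amount > 0
GROUP BY o.order_id;

-- Right: the child-table filter lives in the ON clause, so orders without
-- qualifying items survive with items_total = NULL instead of disappearing.
SELECT o.order_id, SUM(i.item_amount) AS items_total
FROM orders o
LEFT JOIN order_items i
  ON o.order_id = i.order_id AND i.item_amount > 0
GROUP BY o.order_id;
```

A quick sanity check: the second query's row count should equal SELECT COUNT(*) FROM orders; the first query's usually will not.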
Quick self-check routine
- Run key uniqueness checks.
- Compare pre- and post-join counts.
- Check negative/out-of-range values.
- Scan time series for gaps/spikes.
- Reconcile a main KPI via two methods.
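For the reconciliation step, the tolerance rule can be encoded in the query itself rather than compared by eye. A sketch reusing the lesson's two-method revenue queries (the 0.5% threshold is an arbitrary choice, not a standard; pick one that fits your data):

```sql
WITH a AS (
  SELECT SUM(total_amount) AS v
  FROM orders WHERE status = 'paid'
), b AS (
  SELECT SUM(i.item_amount) AS v
  FROM orders o
  JOIN order_items i ON o.order_id = i.order_id
  WHERE o.status = 'paid'
)
SELECT a.v AS order_total,
       b.v AS item_total,
       ABS(a.v - b.v) AS diff,
       -- Boolean flag instead of eyeballing: totals agree within 0.5%.
       ABS(a.v - b.v) <= 0.005 * a.v AS within_tolerance
FROM a, b;
```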
Exercises
Use tables: orders(order_id, user_id, order_date, status, total_amount), order_items(order_id, product_id, quantity, item_amount), users(user_id, country).
- Exercise ex1 — Stop duplicate inflation after a join
Write queries to: (a) count paid orders; (b) count rows after joining to order_items; (c) find order_ids with row duplication; (d) fix by aggregating items before the join.
- Exercise ex2 — Range, gaps, and reconciliation
For the last 30 days of paid orders: (a) find negative totals; (b) detect missing dates in daily revenue; (c) compare order totals vs item totals and return the difference in cents.
Completion checklist:
- I verified row counts pre/post join
- I checked key uniqueness
- I validated percentages and ranges
- I reconciled a KPI via two methods
Practical projects
- Sales dashboard QA: Build daily revenue and units dashboards and embed a query cell that prints sanity flags (e.g., gaps, negative totals, duplicates found).
- Cohort retention validator: Compute weekly retention and auto-calc percent sum to 100% per cohort with anomaly notes.
- Marketing attribution double-check: Compare attribution sums to total paid orders revenue; alert if mismatch exceeds 1%.
Who this is for
Business Analysts and aspiring analysts who run SQL and need to trust their numbers before sharing.
Prerequisites
- Basic SELECT, WHERE, GROUP BY, JOIN
- Comfort reading result sets and simple aggregates
Learning path
- Master core SELECT/JOIN (done or in progress)
- Apply sanity checks on every query (this lesson)
- Introduce re-usable QA CTEs/macros in your queries
- Learn window functions to compare against baselines
- Automate a daily “QA panel” query for key dashboards
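As a first step toward reusable QA queries, several of the lesson's checks can be bundled into one query that returns a single row of counters. A minimal sketch against the lesson's schema (the output column names are illustrative, not a convention):

```sql
-- One-row QA panel: all three counters should be 0 or plausible.
SELECT
  (SELECT COUNT(*)
   FROM (SELECT order_id FROM orders
         GROUP BY order_id HAVING COUNT(*) > 1) dup) AS duplicate_order_ids,
  (SELECT COUNT(*)
   FROM orders
   WHERE status = 'paid' AND total_amount < 0)      AS negative_paid_amounts,
  (SELECT SUM(CASE WHEN coupon_code IS NULL THEN 1 ELSE 0 END) * 1.0 / COUNT(*)
   FROM orders)                                     AS coupon_null_rate;
```

Running a panel like this after every refresh turns ad hoc sanity checks into a repeatable habit.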
Next steps
- Repeat today’s checks on another KPI (e.g., orders, active users, refunds).
- Add a small tolerance rule you’ll reuse (e.g., sums must match within 0.5%).
- Document two anomalies you found and how you resolved them.
Mini challenge
In one query, return yesterday’s paid revenue along with flags: duplicated_orders (bool), negative_amounts (bool), missing_yesterday (bool if no orders). Keep it compact and readable.
Quick Test
Take the quick test to check your understanding.