Topic 12 of 12

Validating Results With Sanity Checks

Learn how to validate SQL query results with sanity checks, with explanations, exercises, and a quick test for Business Analysts.

Published: December 20, 2025 | Updated: December 20, 2025

Why this matters

As a Business Analyst, your insights drive decisions. A solid query can still be wrong if the result is mis-joined, double-counted, filtered incorrectly, or off by a simple oversight. Sanity checks are fast, lightweight tests you run after writing SQL to ensure results are believable and safe to present.

  • Verify that revenue totals match expectations after joins.
  • Confirm counts before/after filters or joins didn’t multiply rows.
  • Ensure percentages sum to ~100% and no negatives appear in amounts that should be positive.
  • Catch missing dates, unexpected spikes/drops, and odd null rates.

Concept explained simply

Sanity checks are quick, approximate validations you run right after a query. They don’t prove the query is perfect; they prove it’s not obviously wrong. Think of them as smoke alarms: simple but powerful.

Mental model

  • Three-lens review: Input (source tables) → Transform (joins, filters, group-bys) → Output (numbers and shape). Run checks at each lens.
  • Invariants: Things that should always hold true (e.g., order_id uniqueness, non-negative paid amounts).
  • Reconciliation: Compare two ways of computing the same number, or compare to a known baseline.

Core sanity checks you can apply today

  • Row count continuity: Compare counts before/after joins and filters.
    -- Before join
    SELECT COUNT(*) AS orders_cnt FROM orders WHERE status = 'paid';
    -- After join
    SELECT COUNT(*) AS joined_cnt
    FROM orders o
    JOIN order_items i ON o.order_id = i.order_id
    WHERE o.status = 'paid';
    Expect joined_cnt ≥ orders_cnt as long as every paid order has at least one item. If joined_cnt is much larger, suspect duplicate inflation; if it is smaller, some paid orders have no items and the inner join dropped them.
  • Primary key uniqueness:
    SELECT order_id, COUNT(*) AS c
    FROM orders
    GROUP BY order_id
    HAVING COUNT(*) > 1;
    No rows should return if order_id is a true key.
  • Join duplication trap: If joining to a many-side table, aggregate first:
    WITH items AS (
      SELECT order_id, SUM(item_amount) AS items_total
      FROM order_items
      GROUP BY order_id
    )
    SELECT o.order_id, o.total_amount, items.items_total
    FROM orders o
    LEFT JOIN items ON o.order_id = items.order_id;
  • Range and sign checks:
    -- Negative amounts that shouldn't be negative
    SELECT * FROM orders WHERE status = 'paid' AND total_amount < 0;
  • Percent-of-total sanity:
    WITH by_country AS (
      SELECT country, SUM(total_amount) AS amt
      FROM orders o JOIN users u ON o.user_id = u.user_id
      WHERE o.status = 'paid'
      GROUP BY country
    ), total AS (
      SELECT SUM(amt) AS t FROM by_country
    )
    SELECT SUM(ROUND(100.0 * amt / t, 2)) AS percent_sum
    FROM by_country, total;
    Expect percent_sum ≈ 100 (small rounding tolerance is okay).
  • Time continuity: look for missing dates or extreme spikes.
    SELECT order_date::date AS d, SUM(total_amount) AS revenue
    FROM orders
    WHERE status = 'paid'
    GROUP BY d
    ORDER BY d;
    Scan for gaps or sharp, unexplained changes.
  • Null-rate inspection:
    SELECT
      SUM(CASE WHEN coupon_code IS NULL THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS null_rate
    FROM orders;
    Sudden jumps in null_rate can signal pipeline issues.
  • Two-method reconciliation:
    -- Method A: sum of order totals
    SELECT SUM(total_amount) FROM orders WHERE status = 'paid';
    -- Method B: sum of item_amounts
    SELECT SUM(i.item_amount)
    FROM orders o JOIN order_items i ON o.order_id = i.order_id
    WHERE o.status = 'paid';
    The two sums should match (allow small rounding differences).
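The time-continuity check above relies on scanning the output by eye. In PostgreSQL (which the `::date` cast syntax suggests), a calendar join can list the missing dates directly. A sketch against the same hypothetical orders table:

```sql
-- Build a full calendar for the data's date range, then left-join
-- daily paid revenue. Days with no matching row are gaps (PostgreSQL).
WITH calendar AS (
  SELECT generate_series(
    (SELECT MIN(order_date::date) FROM orders),
    (SELECT MAX(order_date::date) FROM orders),
    INTERVAL '1 day'
  )::date AS d
), daily AS (
  SELECT order_date::date AS d, SUM(total_amount) AS revenue
  FROM orders
  WHERE status = 'paid'
  GROUP BY order_date::date
)
SELECT c.d AS missing_date
FROM calendar c
LEFT JOIN daily ON daily.d = c.d
WHERE daily.d IS NULL
ORDER BY c.d;
```

Any row returned is a date with no paid orders: either a genuinely quiet day or a pipeline gap worth investigating.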

Tip: A quick review checklist
  • Row counts before vs after joins look reasonable
  • Keys unique where expected
  • No unexpected negatives or out-of-range values
  • Percentages sum to ~100%
  • Time series has no gaps, spikes are explainable
  • Recomputed totals match via a second method
  • Null rates plausible; sudden shifts investigated

Worked examples

Example 1: Detect duplicate inflation after a join

Goal: Revenue by day from orders joined to order_items.

-- 1) Baseline count
SELECT COUNT(*) AS paid_orders
FROM orders
WHERE status = 'paid';

-- 2) After naive join (likely duplicates)
SELECT COUNT(*) AS rows_after_join
FROM orders o
JOIN order_items i ON o.order_id = i.order_id
WHERE o.status = 'paid';

-- 3) Diagnose order-level duplication
SELECT o.order_id, COUNT(*) AS row_c
FROM orders o JOIN order_items i ON o.order_id = i.order_id
WHERE o.status = 'paid'
GROUP BY o.order_id
HAVING COUNT(*) > 1
ORDER BY row_c DESC
LIMIT 10;

If many order_ids have row_c > 1, aggregate items first:

WITH items AS (
  SELECT order_id, SUM(item_amount) AS items_total
  FROM order_items
  GROUP BY order_id
)
SELECT o.order_date::date AS d, SUM(items.items_total) AS revenue
FROM orders o
JOIN items ON o.order_id = items.order_id
WHERE o.status = 'paid'
GROUP BY d
ORDER BY d;

Example 2: Percent-of-total sanity

Goal: Country revenue shares sum to ~100%.

WITH by_country AS (
  SELECT u.country, SUM(o.total_amount) AS amt
  FROM orders o
  JOIN users u ON o.user_id = u.user_id
  WHERE o.status = 'paid'
  GROUP BY u.country
), total AS (
  SELECT SUM(amt) AS t FROM by_country
)
SELECT country,
       ROUND(100.0 * amt / t, 2) AS pct
FROM by_country, total
ORDER BY pct DESC;

-- Sanity: do the parts sum to ~100?
WITH by_country AS (
  SELECT u.country, SUM(o.total_amount) AS amt
  FROM orders o
  JOIN users u ON o.user_id = u.user_id
  WHERE o.status = 'paid'
  GROUP BY u.country
), total AS (
  SELECT SUM(amt) AS t FROM by_country
)
SELECT SUM(ROUND(100.0 * amt / t, 2)) AS percent_sum
FROM by_country, total;

If percent_sum is far from 100, investigate missing categories or NULL countries; rounding alone should cause only tiny deviations.

Example 3: Two-method revenue reconciliation

Goal: Sum by orders vs sum of items.

-- Method A
SELECT SUM(total_amount) AS a
FROM orders
WHERE status = 'paid';

-- Method B
SELECT SUM(i.item_amount) AS b
FROM orders o
JOIN order_items i ON o.order_id = i.order_id
WHERE o.status = 'paid';

-- Compare difference
WITH a AS (
  SELECT SUM(total_amount) AS v
  FROM orders WHERE status = 'paid'
), b AS (
  SELECT SUM(i.item_amount) AS v
  FROM orders o JOIN order_items i ON o.order_id = i.order_id
  WHERE o.status = 'paid'
)
SELECT (SELECT v FROM a) - (SELECT v FROM b) AS diff;

Expect diff ≈ 0. If not, check missing items, returns, or rounding.
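Rather than eyeballing diff, you can encode the tolerance up front as a pass/fail flag. A sketch reusing the same CTEs, with an illustrative 0.5% threshold (pick whatever tolerance fits your data):

```sql
-- Reconcile two revenue computations and flag whether the gap
-- is within a relative tolerance of 0.5% of the order-level total.
WITH a AS (
  SELECT SUM(total_amount) AS v
  FROM orders WHERE status = 'paid'
), b AS (
  SELECT SUM(i.item_amount) AS v
  FROM orders o JOIN order_items i ON o.order_id = i.order_id
  WHERE o.status = 'paid'
)
SELECT
  (SELECT v FROM a) - (SELECT v FROM b) AS diff,
  ABS((SELECT v FROM a) - (SELECT v FROM b))
    <= 0.005 * (SELECT v FROM a) AS within_tolerance;
```

A boolean flag like this is easy to drop into a dashboard or scheduled QA query, so the check runs every time instead of only when you remember it.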

Common mistakes and how to self-check

  • Counting rows after joining to a many-side table. Fix: Aggregate first or count DISTINCT keys.
  • Filtering on the wrong side of a LEFT JOIN (turns it into INNER). Fix: Put child-table filters in the ON clause.
  • Ignoring rounding tolerances. Fix: Allow a small epsilon when comparing sums.
  • Overlooking time gaps. Fix: Always scan daily counts for missing dates or sharp anomalies.
  • Assuming keys are unique. Fix: Explicitly test for duplicates with GROUP BY ... HAVING COUNT(*) > 1.
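The LEFT JOIN pitfall above is easiest to see side by side. A sketch using a hypothetical refunds child table (not part of the schemas in this lesson):

```sql
-- WRONG: the WHERE filter on the child table discards orders with no
-- refund row (NULL fails the comparison), so the LEFT JOIN silently
-- behaves like an INNER JOIN.
SELECT o.order_id, r.refund_amount
FROM orders o
LEFT JOIN refunds r ON o.order_id = r.order_id
WHERE r.refund_amount > 0;

-- RIGHT: keep child-table conditions in the ON clause so every order
-- survives, with NULL refund_amount where no matching refund exists.
SELECT o.order_id, r.refund_amount
FROM orders o
LEFT JOIN refunds r
  ON o.order_id = r.order_id
 AND r.refund_amount > 0;
```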

Quick self-check routine
  1. Run key uniqueness checks.
  2. Compare pre- and post-join counts.
  3. Check negative/out-of-range values.
  4. Scan time series for gaps/spikes.
  5. Reconcile a main KPI via two methods.
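Steps of the routine can be collapsed into a single "QA panel" query that returns pass/fail flags in one row. A sketch covering the first and third checks (PostgreSQL boolean expressions assumed):

```sql
-- One-row QA panel: each column is TRUE when the corresponding
-- sanity check fails and needs investigation.
SELECT
  -- TRUE if any order_id appears more than once
  EXISTS (
    SELECT 1 FROM orders
    GROUP BY order_id
    HAVING COUNT(*) > 1
  ) AS duplicated_orders,
  -- TRUE if any paid order has a negative total
  EXISTS (
    SELECT 1 FROM orders
    WHERE status = 'paid' AND total_amount < 0
  ) AS negative_amounts;
```

Running one compact query like this before every refresh is cheaper than re-deriving each check by hand.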

Exercises

Use tables: orders(order_id, user_id, order_date, status, total_amount), order_items(order_id, product_id, quantity, item_amount), users(user_id, country).

  1. Exercise ex1 — Stop duplicate inflation after a join
    Write queries to: (a) count paid orders; (b) count rows after joining to order_items; (c) find order_ids with row duplication; (d) fix by aggregating items before the join.
    See the detailed task in the Exercises section below.
  2. Exercise ex2 — Range, gaps, and reconciliation
    For the last 30 days of paid orders: (a) find negative totals; (b) detect missing dates in daily revenue; (c) compare order totals vs item totals and return the difference in cents.

Completion checklist:
  • I verified row counts pre/post join
  • I checked key uniqueness
  • I validated percentages and ranges
  • I reconciled a KPI via two methods

Practical projects

  • Sales dashboard QA: Build daily revenue and units dashboards and embed a query cell that prints sanity flags (e.g., gaps, negative totals, duplicates found).
  • Cohort retention validator: Compute weekly retention and auto-calc percent sum to 100% per cohort with anomaly notes.
  • Marketing attribution double-check: Compare attribution sums to total paid orders revenue; alert if mismatch exceeds 1%.

Who this is for

Business Analysts and aspiring analysts who run SQL and need to trust their numbers before sharing.

Prerequisites

  • Basic SELECT, WHERE, GROUP BY, JOIN
  • Comfort reading result sets and simple aggregates

Learning path

  1. Master core SELECT/JOIN (done or in progress)
  2. Apply sanity checks on every query (this lesson)
  3. Introduce reusable QA CTEs/macros in your queries
  4. Learn window functions to compare against baselines
  5. Automate a daily “QA panel” query for key dashboards

Next steps

  • Repeat today’s checks on another KPI (e.g., orders, active users, refunds).
  • Add a small tolerance rule you’ll reuse (e.g., sums must match within 0.5%).
  • Document two anomalies you found and how you resolved them.

Mini challenge

In one query, return yesterday’s paid revenue along with flags: duplicated_orders (bool), negative_amounts (bool), missing_yesterday (bool if no orders). Keep it compact and readable.

Quick Test

Take the quick test to check your understanding.

Practice Exercises

2 exercises to complete

Instructions

Goal: Build reliable revenue by day without duplicate inflation.

  1. Count paid orders (baseline).
  2. Join orders to order_items and count rows. Compare with baseline.
  3. List order_ids with duplicates after the join.
  4. Fix it by aggregating items first, then joining. Return revenue by order_date (date) with sum of item_amounts.

Tables: orders(order_id, user_id, order_date, status, total_amount), order_items(order_id, product_id, quantity, item_amount)

Expected Output
  1. A baseline count of paid orders.
  2. After the naive join, a row count greater than or equal to the baseline, with duplicates identified.
  3. The duplicate order_ids listed.
  4. Final daily revenue with duplicates eliminated and totals matching order-level amounts within rounding.

Validating Results With Sanity Checks — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

