Why this matters
As a Data Analyst, you will merge datasets, calculate rates, and build dashboards. Real tables often contain missing values. If you don’t handle NULLs correctly you may get wrong counts, divide-by-zero errors, or misleading text. Mastering IS NULL, COALESCE, and NULLIF lets you write safe, predictable queries and clear reports.
- Quality checks: count missing values in critical columns.
- Reporting: replace NULLs with readable labels like "Unknown".
- Metrics: avoid division by zero in conversion rates.
- Joins: distinguish between no match vs. real zero values.
Who this is for
- Beginner to intermediate SQL users preparing for analyst roles.
- Anyone building dashboards where missing values can mislead users.
Prerequisites
- Basic SELECT, WHERE, GROUP BY, ORDER BY.
- Familiarity with NULL semantics (NULL means "unknown/absent").
Concept explained simply
NULL is a special marker for "no value". It is not zero and not an empty string. Comparisons with NULL don’t return true or false; they become unknown. That’s why you cannot write = NULL; you must use IS NULL.
- IS NULL / IS NOT NULL: test whether a value is missing.
- COALESCE(a, b, c): return the first non-NULL value. Great for defaults and readable outputs.
- NULLIF(x, y): returns NULL when x equals y; otherwise returns x. Great for avoiding divide-by-zero.
Mental model
Think of a pipeline: IS NULL checks if the pipe is empty; COALESCE supplies a fallback if empty; NULLIF temporarily empties the pipe when a problematic value appears (e.g., 0 in a denominator).
Dialect notes (works broadly)
- Syntax is standard across major SQL engines (PostgreSQL, MySQL, SQL Server, SQLite, BigQuery).
- Type precedence: COALESCE will try to align argument types; cast explicitly if needed.
- String concatenation: use CONCAT(...) if || is not supported.
Syntax cheat-sheet
-- Check missing values
col IS NULL
col IS NOT NULL
-- First non-NULL
COALESCE(col, 'Unknown')
COALESCE(email, alt_email, 'no-email')
-- Make a value disappear when equal
NULLIF(numerator, 0) -- often for division
-- Safe division example
metric = numerator * 1.0 / NULLIF(denominator, 0)
Worked examples
Example 1: Clean labels for reporting
-- Show customer name or 'Unknown'
SELECT COALESCE(customer_name, 'Unknown') AS customer_name_clean,
COUNT(*) AS orders
FROM orders
GROUP BY COALESCE(customer_name, 'Unknown')
ORDER BY orders DESC;
What this fixes
Prevents NULL from creating a separate blank label in charts; groups them as "Unknown" so stakeholders understand.
Example 2: Safe rate calculation
-- Avoid divide-by-zero when impressions are 0
SELECT campaign_id,
clicks * 1.0 / NULLIF(impressions, 0) AS ctr
FROM campaign_stats;
What this fixes
When impressions = 0, NULLIF makes denominator NULL, so ctr becomes NULL instead of error/Infinity.
Example 3: Distinguish 0 vs NULL
-- Find rows where discount is truly missing, not zero
SELECT *
FROM sales
WHERE discount IS NULL;
Why it matters
Zero means "no discount applied"; NULL may mean "not recorded". They tell different stories in analysis.
Example 4: Fallback across multiple sources
-- Prefer primary phone, else secondary, else a label
SELECT COALESCE(primary_phone, secondary_phone, 'no-phone') AS phone
FROM customers;
Tip
Order arguments by reliability. Cast if types differ, e.g., COALESCE(CAST(primary_phone AS TEXT), ...).
Common mistakes and self-check
- Using = NULL instead of IS NULL. Self-check: search your query for "= NULL" and replace with IS NULL.
- Forgetting COALESCE before grouping/visualizing, causing blank labels. Self-check: run GROUP BY with and without COALESCE and compare category counts.
- Division by zero. Self-check: scan denominators; wrap with NULLIF(den, 0).
- Mixing types in COALESCE. Self-check: explicitly CAST arguments to a single type.
- Replacing NULL with 0 when 0 has meaning. Self-check: confirm with domain owners whether 0 and NULL are equivalent.
Learning path
- Review NULL semantics and three-valued logic (true/false/unknown).
- Learn IS NULL / IS NOT NULL for filtering.
- Use COALESCE for default values in SELECT and GROUP BY.
- Apply NULLIF for safe division and conditional nulling.
- Combine with CASE for nuanced data cleaning.
Practical projects
- Dashboard-ready customers table: create a view that COALESCEs missing names, emails, and regions to standard labels.
- Marketing metrics: compute CTR, CVR, and CPA with NULLIF to avoid divide-by-zero; flag rows where metrics are NULL.
- Data quality report: output counts of NULL by column for 5 key tables, sorted by highest NULL rate.
Exercises
Try these before viewing solutions. The same tasks are in the Exercises panel below.
Exercise 1: Group with COALESCE
Table: orders(order_id, customer_name, order_total). Task: Group orders by cleaned customer_name using COALESCE(customer_name, 'Unknown') and return name + count, sorted by count desc.
Show solution
SELECT COALESCE(customer_name, 'Unknown') AS customer_name_clean,
COUNT(*) AS order_count
FROM orders
GROUP BY COALESCE(customer_name, 'Unknown')
ORDER BY order_count DESC;
Exercise 2: Safe rate with NULLIF
Table: web_metrics(day, sessions, transactions). Task: Compute conversion_rate = transactions * 1.0 / NULLIF(sessions, 0). Return day and conversion_rate.
Show solution
SELECT day,
transactions * 1.0 / NULLIF(sessions, 0) AS conversion_rate
FROM web_metrics;
Self-check checklist
- I used IS NULL/IS NOT NULL instead of = NULL.
- I applied COALESCE before grouping to stabilize labels.
- I wrapped denominators with NULLIF(..., 0).
- I casted COALESCE arguments to the same type when needed.
Mini challenge
Table: inventory(product_id, supplier_name, units_in_stock, reorder_point). Produce: product_id, supplier_label (COALESCE(supplier_name,'Unknown Supplier')), stock_gap = CASE WHEN units_in_stock IS NULL THEN NULL WHEN reorder_point IS NULL THEN NULL ELSE reorder_point - units_in_stock END. Sort by stock_gap desc, putting NULLs last.
Hint
Use ORDER BY stock_gap IS NULL, stock_gap DESC.
Quick Test
Everyone can take the test; only logged-in users get saved progress.
When ready, open the Quick Test below.
Next steps
- Audit a recent query and add defensive NULL handling where needed.
- Create a reusable view with consistent NULL defaults for key dimensions.
- Move on to CASE expressions to complement COALESCE and NULLIF.