Why this matters
As a Data Visualization Engineer, your charts must be accurate and readable. NULLs (missing/unknown values) can break totals, distort rates (division by zero), hide categories, and produce confusing labels like blank strings in legends. Knowing how to detect, fill, and safely compute with NULLs keeps dashboards trustworthy.
- Remove blank labels by replacing NULLs with friendly text.
- Prevent division errors when a denominator is NULL or zero.
- Find unmapped items after joins to fix data quality issues.
- Compute correct counts and averages when data is incomplete.
Who this is for
- Data Visualization Engineers building BI dashboards and reports.
- Analysts preparing datasets for Looker, Tableau, Power BI, or custom charts.
- Analytics Engineers ensuring semantic layer correctness.
Prerequisites
- Basic SELECT, WHERE, GROUP BY, ORDER BY.
- JOIN basics (especially LEFT JOIN).
- Simple CASE expressions.
Concept explained simply
NULL means "unknown" or "not applicable". It is not zero, not an empty string, and not false. Comparisons with NULL return unknown, not true or false. This affects filtering and calculations.
Mental model
- Think of NULL as a covered box: you don’t know what’s inside. Any comparison with a covered box stays unknown.
- Aggregates like SUM/AVG skip covered boxes. COUNT(col) counts only opened boxes; COUNT(*) counts all rows.
- To open the box or provide a fallback, use COALESCE.
Key facts in one glance
- Check NULL: IS NULL, IS NOT NULL
- Fallback: COALESCE(x, y, z)
- Avoid divide-by-zero: denominator -> NULLIF(den, 0)
- Anti-join check after LEFT JOIN: WHERE right.id IS NULL
- Filters with NULLs: status <> 'active' won’t match NULL; add OR status IS NULL if needed
Core patterns and syntax
Detect NULLs
-- Correct
WHERE city IS NULL
WHERE city IS NOT NULL
-- Incorrect: this never matches NULL
WHERE city = NULL
Fill or label NULLs
SELECT COALESCE(city, 'Unknown') AS city_label FROM customers;
-- COALESCE returns the first non-NULL argument
Safe math (avoid errors)
-- Safe rates: if denominator is 0, make it NULL (rate becomes NULL)
SELECT revenue / NULLIF(orders, 0) AS avg_order_value FROM daily_stats;
Counts and averages
-- COUNT(*) counts rows (including NULL values in columns)
-- COUNT(col) counts non-NULL values only
-- SUM/AVG ignore NULL inputs
Conditional logic
CASE WHEN status IS NULL THEN 'Unknown' ELSE status END AS status_label
Joins and missing mappings
-- Find products without a category mapping
SELECT p.id
FROM products p
LEFT JOIN product_categories pc ON p.id = pc.product_id
WHERE pc.product_id IS NULL;
Worked examples
Example 1: Friendly labels for charts
Goal: show a market share donut by city, labeling missing city as "Unknown".
SELECT COALESCE(city, 'Unknown') AS city_label, COUNT(*) AS users
FROM users
GROUP BY COALESCE(city, 'Unknown')
ORDER BY users DESC;
Why it works: COALESCE groups NULLs under a readable label, avoiding empty legend entries.
Example 2: Safe conversion rate
Goal: conversion_rate = conversions / visits, but visits can be zero or NULL.
SELECT date,
conversions,
visits,
conversions / NULLIF(visits, 0) AS conversion_rate
FROM daily_web;
Why it works: NULLIF(visits, 0) returns NULL when visits = 0, preventing divide-by-zero. The rate becomes NULL for those days.
Example 3: Find unmapped items after a LEFT JOIN
Goal: list products that lack a category mapping.
SELECT p.product_id, p.product_name
FROM products p
LEFT JOIN product_category_map m ON p.product_id = m.product_id
WHERE m.category_id IS NULL;
Why it works: Unmatched right-side rows yield NULLs; filtering on IS NULL isolates missing mappings.
Example 4: COUNT differences
Goal: count emails vs. total users.
SELECT COUNT(*) AS total_users, COUNT(email) AS users_with_email
FROM users;
Why it works: COUNT(email) excludes NULL emails, while COUNT(*) includes all rows.
Exercises
Try these hands-on tasks. Then verify with the solutions. A checklist is included to self-review.
- ex1 — Label NULLs for dimension clarity: In customers(id, name, city), return id, name, and city_label where city_label = COALESCE(city, 'Unknown').
- ex2 — Safe rate computation: In sales_by_day(d, total_revenue, total_orders), compute avg_order_value = total_revenue / NULLIF(total_orders, 0).
- ex3 — Missing mapping audit: Using products(p_id, brand) and product_categories(p_id, category_id), list p_id, brand where no category exists, and return a total count.
Checklist: Did you cover these?
- Used IS NULL / IS NOT NULL instead of = NULL.
- Used COALESCE to create readable labels.
- Protected divisions with NULLIF.
- Used LEFT JOIN + WHERE right.col IS NULL for unmapped records.
- Verified COUNT(col) vs COUNT(*).
Common mistakes and self-check
- Using = NULL instead of IS NULL. Self-check: Search your query for "= NULL" — replace it.
- Forgetting to handle zero denominators. Self-check: Every division has a NULLIF? If not, add it.
- Empty strings vs NULL. Self-check: If you see '' values, consider NULLIF(field, '') before COALESCE. Note: Behavior can vary by database; treat this as a design choice in your pipeline.
- Incorrect anti-join filter. Self-check: After LEFT JOIN, your WHERE must check the right-side column IS NULL.
- Unexpected filters with NULL. Self-check: If status <> 'active' is used, add OR status IS NULL if you intend to include unknowns.
Mini challenge
Dataset:
support_tickets(id, created_at, resolved_at, channel, agent_id)
-- resolved_at can be NULL if not resolved
- Task A: resolution_rate per channel = resolved_tickets / total_tickets (safe division).
- Task B: bucket resolution_time_days into 'Under 1 day', '1-3 days', '3+ days', 'Unresolved'.
- Task C: list channels with zero resolved tickets.
Hints
- Resolved flag: CASE WHEN resolved_at IS NOT NULL THEN 1 END, or COUNT(resolved_at)
- Rate: SUM(resolved_flag) / NULLIF(COUNT(*), 0)
- Buckets: CASE on DATE_DIFF/EXTRACT function appropriate to your SQL dialect
Practical projects
- Build a "Data Quality Watch" dashboard showing counts of NULLs across key dimensions and trends over time.
- Create a KPI dataset where all rates are computed with NULLIF protection and all labels use COALESCE; feed it to a chart tool.
- Design a join-audit report listing unmapped products or users after dimension joins, with counts by source system.
Learning path
- Before: SELECT and WHERE basics; JOIN fundamentals.
- Now: NULL handling patterns (this lesson).
- Next: Window functions with NULLs (e.g., COALESCE in partitions), date handling, and data type casting safety.
Next steps
Practice the exercises, then take the Quick Test below to confirm understanding. The test is available to everyone; only logged-in users get saved progress.