
SQL Basics

Learn SQL basics for Data Visualization Engineers for free: roadmap, examples, subskills, and a skill exam.

Published: December 28, 2025 | Updated: December 28, 2025

Why SQL matters for Data Visualization Engineers

SQL is how you turn raw tables into clean, chart-ready datasets. As a Data Visualization Engineer, you use SQL to pick the right rows, shape columns, join sources, aggregate safely, and validate that what you show is correct. Good SQL means faster dashboards, reliable KPIs, and fewer surprises during demos.

What you’ll use SQL for day to day
  • Build compact, reusable datasets for charts and dashboards.
  • Join facts to dimensions without double counting.
  • Aggregate metrics correctly at the needed grain.
  • Rank, filter, and segment results for top-N and cohort visuals.
  • Handle dates, time windows, and missing data (NULLs).
  • Run quick validation checks to catch data issues early.

Who this is for

  • New or transitioning Data Visualization Engineers.
  • BI developers who build charts but want stronger SQL hygiene.
  • Analysts who need reliable, visual-ready datasets.

Prerequisites

  • Basic data concepts: tables, rows, columns, primary/foreign keys.
  • Comfort reading simple SELECT queries.
  • No prior experience with joins or window functions required.

Learning path

  1. Step 1 — Select and filter with purpose

    Practice SELECT, WHERE (including IN, BETWEEN, LIKE), ORDER BY, LIMIT. Focus on the columns and rows your visual needs, nothing extra.

  2. Step 2 — Join facts and dimensions safely

    Use INNER and LEFT JOIN. Confirm keys and the expected row counts before and after joins to avoid duplicates.
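
A join sanity check can be sketched against the fact_orders and dim_product tables used in the worked examples below; the exact counts to compare depend on your schema:

```sql
-- Rows before the join
SELECT COUNT(*) AS fact_rows FROM fact_orders;

-- Rows after the join; this should match fact_rows
-- if dim_product has exactly one row per product_id
SELECT COUNT(*) AS joined_rows
FROM fact_orders f
LEFT JOIN dim_product d ON f.product_id = d.product_id;

-- Confirm the join key is unique on the dimension side
SELECT product_id, COUNT(*) AS copies
FROM dim_product
GROUP BY product_id
HAVING COUNT(*) > 1;
```

If joined_rows exceeds fact_rows, the dimension has duplicate keys and the join is multiplying rows.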

  3. Step 3 — Aggregate cleanly

    Use GROUP BY with SUM, COUNT, AVG, MIN, MAX. Filter post-aggregation with HAVING, and ensure the result grain matches your chart.

  4. Step 4 — Rank with window functions

    Use ROW_NUMBER, RANK, or DENSE_RANK with PARTITION BY to get top-N lists within categories.

  5. Step 5 — Tame dates for time series

    Truncate dates to week/month, build period-over-period comparisons, and keep date columns consistent.
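
One way to sketch a period-over-period comparison is with LAG over a monthly rollup (column names follow the fact_orders table used in the examples below; DATE_TRUNC syntax varies by dialect):

```sql
-- Month-over-month revenue change
WITH monthly_revenue AS (
  SELECT
    DATE_TRUNC('month', order_date) AS month_start,
    SUM(revenue) AS revenue
  FROM fact_orders
  GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
  month_start,
  revenue,
  revenue - LAG(revenue) OVER (ORDER BY month_start) AS change_vs_prior_month
FROM monthly_revenue
ORDER BY month_start;
```

The first month's change is NULL because there is no prior row; COALESCE it to 0 only if your chart expects a number there.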

  6. Step 6 — Handle NULLs explicitly

    Use COALESCE/IFNULL/NVL, and remember that NULLs don’t equal anything (not even other NULLs); they must be checked with IS NULL.
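
A quick illustration of the NULL rules, using the fact_orders table from the examples below:

```sql
-- Returns zero rows: NULL = NULL evaluates to unknown, not true
SELECT * FROM fact_orders WHERE channel = NULL;

-- Correct: an explicit NULL check
SELECT * FROM fact_orders WHERE channel IS NULL;

-- Default a NULL metric to 0 so charts don't show gaps
SELECT order_date, COALESCE(revenue, 0) AS revenue
FROM fact_orders;
```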

  7. Step 7 — Validate before you publish

    Run quick counts and distinct checks, verify totals, and confirm the final dataset has exactly the columns and grain your visual expects.

Worked examples

1) Select and filter the right rows

-- Daily online revenue in January 2024
SELECT
  order_date,
  SUM(revenue) AS daily_revenue
FROM fact_orders
WHERE channel = 'Online'
  AND order_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'
GROUP BY order_date
ORDER BY order_date;

Tip: return only columns you will chart. Here, one date column and one metric is ideal for a line chart.

2) Join facts to dimensions

-- Revenue by product category (ensure one category per product)
SELECT
  d.category,
  SUM(f.revenue) AS revenue
FROM fact_orders f
LEFT JOIN dim_product d
  ON f.product_id = d.product_id
GROUP BY d.category
ORDER BY revenue DESC;

Why LEFT JOIN?

To keep orders even if a product is missing in the dimension table. An INNER JOIN would drop them.
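
To see which orders an INNER JOIN would have dropped, an anti-join sketch:

```sql
-- Orders whose product_id has no match in dim_product
SELECT f.*
FROM fact_orders f
LEFT JOIN dim_product d ON f.product_id = d.product_id
WHERE d.product_id IS NULL;
```

If this returns rows, decide whether to fix the dimension or bucket them as an explicit 'Unknown' category before charting.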

3) Aggregate and filter groups with HAVING

-- Keep only categories with revenue above 1,000,000
SELECT
  d.category,
  SUM(f.revenue) AS revenue
FROM fact_orders f
JOIN dim_product d ON f.product_id = d.product_id
GROUP BY d.category
HAVING SUM(f.revenue) > 1000000
ORDER BY revenue DESC;

Use WHERE for row-level filters, HAVING for aggregated filters.

4) Rank products within category

-- Top 3 products by revenue per category
WITH product_rev AS (
  SELECT d.category, f.product_id, SUM(f.revenue) AS revenue
  FROM fact_orders f
  JOIN dim_product d ON f.product_id = d.product_id
  GROUP BY d.category, f.product_id
),
ranked AS (
  SELECT
    category,
    product_id,
    revenue,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
  FROM product_rev
)
SELECT category, product_id, revenue, rnk
FROM ranked
WHERE rnk <= 3
ORDER BY category, rnk;

ROW_NUMBER assigns exactly one rank per row. Note that most databases won’t let you filter on a window function in the same SELECT’s WHERE clause, so compute the rank in a CTE or subquery first, then filter on it. Add a tiebreaker (e.g., product_id) to the ORDER BY if revenues can tie, so the top-N stays deterministic.

5) Time series with month truncation

-- Monthly revenue by channel
SELECT
  DATE_TRUNC('month', order_date) AS month_start,
  channel,
  SUM(revenue) AS revenue
FROM fact_orders
GROUP BY DATE_TRUNC('month', order_date), channel
ORDER BY month_start, channel;

Dialect note
  • Some systems use DATETRUNC('month', order_date), others DATE_TRUNC('month', order_date).
  • In systems without date truncation, compute year and month components and rebuild a first-of-month date.
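
The component rebuild described in the second bullet might look like this (a sketch: MAKE_DATE is PostgreSQL-style and function names vary by system, so check your dialect's documentation):

```sql
-- Rebuild a first-of-month date from year/month components
SELECT
  MAKE_DATE(
    CAST(EXTRACT(YEAR FROM order_date) AS INT),
    CAST(EXTRACT(MONTH FROM order_date) AS INT),
    1
  ) AS month_start,
  SUM(revenue) AS revenue
FROM fact_orders
GROUP BY 1   -- positional reference to month_start; repeat the expression if your dialect requires it
ORDER BY 1;
```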

6) Build a clean dataset for a stacked bar chart

-- Dataset grain: month x channel, one metric column
WITH base AS (
  SELECT
    DATE_TRUNC('month', order_date) AS month_start,
    channel,
    COALESCE(revenue, 0) AS revenue
  FROM fact_orders
)
SELECT month_start, channel, SUM(revenue) AS revenue
FROM base
GROUP BY month_start, channel
ORDER BY month_start, channel;

Only include the dimensions used in the chart (month, channel) and a single metric (revenue). Use COALESCE to avoid NULL metrics.

Drills / exercises

  • Write a query that returns top 10 products by total revenue in the last 90 days.
  • Return weekly active users: one row per week with COUNT(DISTINCT user_id).
  • Join orders to customers and compute revenue by customer segment; ensure segments with no orders still show as 0 revenue.
  • Produce a dataset for a line chart: date (daily) and cumulative revenue using a window SUM.
  • Find duplicate order_ids in your orders table (show the ids and their counts).
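
For the cumulative-revenue drill, the window SUM pattern looks roughly like this (a starting sketch against fact_orders; adapt it before relying on it):

```sql
-- Daily cumulative revenue for a line chart
WITH daily AS (
  SELECT order_date, SUM(revenue) AS revenue
  FROM fact_orders
  GROUP BY order_date
)
SELECT
  order_date,
  SUM(revenue) OVER (ORDER BY order_date) AS cumulative_revenue
FROM daily
ORDER BY order_date;
```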

Common mistakes (and how to fix them)

  • Using WHERE on aggregated conditions. Fix: move aggregated filters to HAVING.
  • Unexpected row multiplication after joins. Fix: verify key uniqueness; pre-aggregate the many-side to the join grain.
  • Double counting revenue when joining facts to details (e.g., order lines + events). Fix: pick a single fact grain or aggregate details first.
  • Wrong top-N per group. Fix: use ROW_NUMBER with PARTITION BY and filter the resulting rank.
  • Inconsistent date grains. Fix: always truncate to the visual’s intended period and use consistent time zones.
  • NULL comparisons with =. Fix: use IS NULL / IS NOT NULL; use COALESCE to set defaults for metrics.

Debugging checklist

Quick validation before publishing
  • Row count sanity: is the number of rows equal to expected unique combinations of your dimensions?
  • Totals: does SUM(metric) match a known control total for the same filter period?
  • Duplicates: compare COUNT(id) vs COUNT(DISTINCT id) where relevant.
  • Join quality: check how many rows have NULL dimension keys after the join.
  • Date grain: confirm only one date unit (e.g., month_start) exists in the final dataset.
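
The checklist items above translate into short queries; a sketch using the document's example tables (monthly_dataset is an illustrative name for your final output):

```sql
-- Duplicates: any repeated (month_start, channel) pairs?
SELECT month_start, channel, COUNT(*) AS copies
FROM monthly_dataset
GROUP BY month_start, channel
HAVING COUNT(*) > 1;

-- Join quality: rows that lost their dimension key
SELECT COUNT(*) AS unmatched_rows
FROM fact_orders f
LEFT JOIN dim_product d ON f.product_id = d.product_id
WHERE d.product_id IS NULL;

-- Control total for a known filter period
SELECT SUM(revenue) AS jan_total
FROM fact_orders
WHERE order_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31';
```

All three should return either zero rows or a number you can reconcile against a trusted source before publishing.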

Mini project: Executive KPI dataset

  1. Define the visuals

    Line: monthly revenue; Bar: revenue by channel for the latest month; Table: top 10 products by revenue this quarter.

  2. Build source queries

    Create one query per visual with only the needed columns and correct grain. Use DATE_TRUNC for monthly, and window RANK for top 10.

  3. Handle NULLs and naming

    COALESCE numeric metrics to 0; set readable column names like month_start, channel, revenue.

  4. Validate

    Check duplicates, totals per month, and that counts match business expectations.

Acceptance checks to pass
  • No duplicate (month_start, channel) pairs in the monthly dataset.
  • Top 10 table has at most 10 rows per category/overall as intended.
  • Revenue totals match finance exports for the same period (within expected tolerance).

Practical projects

  • Marketing funnel dataset: sessions → signups → purchases by week and channel.
  • Support SLA dashboard: ticket volume and median resolution time by priority and week.
  • Inventory health: days-of-supply by warehouse and product category, monthly.

Subskills

  • Selecting and Filtering Data — Return only what the visual needs using WHERE, IN, BETWEEN, LIKE, ORDER BY, LIMIT.
  • Joining Tables — Combine facts and dimensions with INNER/LEFT JOIN, avoiding row inflation.
  • Aggregations, GROUP BY, HAVING — Summarize metrics to the chart grain and filter groups correctly.
  • Basic Window Functions for Ranking — Rank within partitions to get top-N lists per category.
  • Date Handling for Time Series — Truncate to week/month and keep time columns consistent.
  • Null Handling — Use COALESCE/IFNULL/NVL and proper NULL checks.
  • Building Dataset Queries for Visuals — Design outputs with clean columns and stable grain.
  • Data Validation Checks — Quick tests for duplicates, totals, and join quality.

Next steps

  • Practice on real datasets; favor small, composable queries over one giant query.
  • Create a personal snippet library for date truncation, window ranks, and group filters.
  • Review your last dashboard’s queries and add validation checks before each publish.

SQL Basics — Skill Exam

This timed quiz checks your grasp of SQL essentials for visualization work. You can take it as many times as you like, for free. Everyone can take the exam; only logged-in users will have their progress and scores saved. Scoring: questions may be single- or multiple-select. Aim for 70% or higher to pass.

12 questions, 70% to pass.
