
SQL Basics

Learn SQL basics for Data Visualization Engineers for free: roadmap, examples, subskills, and a skill exam.

Published: December 28, 2025 | Updated: December 28, 2025

Why SQL matters for Data Visualization Engineers

SQL is how you turn raw tables into clean, chart-ready datasets. As a Data Visualization Engineer, you use SQL to pick the right rows, shape columns, join sources, aggregate safely, and validate that what you show is correct. Good SQL means faster dashboards, reliable KPIs, and fewer surprises during demos.

What you’ll use SQL for day to day
  • Build compact, reusable datasets for charts and dashboards.
  • Join facts to dimensions without double counting.
  • Aggregate metrics correctly at the needed grain.
  • Rank, filter, and segment results for top-N and cohort visuals.
  • Handle dates, time windows, and missing data (NULLs).
  • Run quick validation checks to catch data issues early.

Who this is for

  • New or transitioning Data Visualization Engineers.
  • BI developers who build charts but want stronger SQL hygiene.
  • Analysts who need reliable, visual-ready datasets.

Prerequisites

  • Basic data concepts: tables, rows, columns, primary/foreign keys.
  • Comfort reading simple SELECT queries.
  • No prior experience with joins or window functions required.

Learning path

  1. Step 1 — Select and filter with purpose

    Practice SELECT, WHERE (including IN, BETWEEN, LIKE), ORDER BY, LIMIT. Focus on the columns and rows your visual needs, nothing extra.

  2. Step 2 — Join facts and dimensions safely

    Use INNER and LEFT JOIN. Confirm keys and the expected row counts before and after joins to avoid duplicates.
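
A join sanity check can be sketched against the fact_orders and dim_product tables used in the worked examples below; the exact counts to compare depend on your schema:

```sql
-- Rows before the join
SELECT COUNT(*) AS fact_rows FROM fact_orders;

-- Rows after the join; this should match fact_rows
-- if dim_product has exactly one row per product_id
SELECT COUNT(*) AS joined_rows
FROM fact_orders f
LEFT JOIN dim_product d ON f.product_id = d.product_id;

-- Confirm the join key is unique on the dimension side
SELECT product_id, COUNT(*) AS copies
FROM dim_product
GROUP BY product_id
HAVING COUNT(*) > 1;
```

If joined_rows exceeds fact_rows, the dimension has duplicate keys and the join is multiplying rows.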

  3. Step 3 — Aggregate cleanly

    Use GROUP BY with SUM, COUNT, AVG, MIN, MAX. Filter post-aggregation with HAVING, and ensure the result grain matches your chart.

  4. Step 4 — Rank with window functions

    Use ROW_NUMBER, RANK, or DENSE_RANK with PARTITION BY to get top-N lists within categories.

  5. Step 5 — Tame dates for time series

    Truncate dates to week/month, build period-over-period comparisons, and keep date columns consistent.
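
One way to sketch a period-over-period comparison is with LAG over a monthly rollup (column names follow the fact_orders table used in the examples below; DATE_TRUNC syntax varies by dialect):

```sql
-- Month-over-month revenue change
WITH monthly_revenue AS (
  SELECT
    DATE_TRUNC('month', order_date) AS month_start,
    SUM(revenue) AS revenue
  FROM fact_orders
  GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
  month_start,
  revenue,
  revenue - LAG(revenue) OVER (ORDER BY month_start) AS change_vs_prior_month
FROM monthly_revenue
ORDER BY month_start;
```

The first month's change is NULL because there is no prior row; COALESCE it to 0 only if your chart expects a number there.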

  6. Step 6 — Handle NULLs explicitly

    Use COALESCE/IFNULL/NVL, and remember that NULLs don’t equal anything (not even other NULLs); they must be checked with IS NULL.
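
A quick illustration of the NULL rules, using the fact_orders table from the examples below:

```sql
-- Returns zero rows: NULL = NULL evaluates to unknown, not true
SELECT * FROM fact_orders WHERE channel = NULL;

-- Correct: an explicit NULL check
SELECT * FROM fact_orders WHERE channel IS NULL;

-- Default a NULL metric to 0 so charts don't show gaps
SELECT order_date, COALESCE(revenue, 0) AS revenue
FROM fact_orders;
```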

  7. Step 7 — Validate before you publish

    Run quick counts and distinct checks, verify totals, and confirm the final dataset has exactly the columns and grain your visual expects.

Worked examples

1) Select and filter the right rows

-- Daily online revenue in January 2024
SELECT
  order_date,
  SUM(revenue) AS daily_revenue
FROM fact_orders
WHERE channel = 'Online'
  AND order_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'
GROUP BY order_date
ORDER BY order_date;

Tip: return only columns you will chart. Here, one date column and one metric is ideal for a line chart.

2) Join facts to dimensions

-- Revenue by product category (ensure one category per product)
SELECT
  d.category,
  SUM(f.revenue) AS revenue
FROM fact_orders f
LEFT JOIN dim_product d
  ON f.product_id = d.product_id
GROUP BY d.category
ORDER BY revenue DESC;

Why LEFT JOIN?

To keep orders even if a product is missing in the dimension table. An INNER JOIN would drop them.
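
To see which orders an INNER JOIN would have dropped, an anti-join sketch:

```sql
-- Orders whose product_id has no match in dim_product
SELECT f.*
FROM fact_orders f
LEFT JOIN dim_product d ON f.product_id = d.product_id
WHERE d.product_id IS NULL;
```

If this returns rows, decide whether to fix the dimension or bucket them as an explicit 'Unknown' category before charting.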

3) Aggregate and filter groups with HAVING

-- Keep only categories with revenue above 1,000,000
SELECT
  d.category,
  SUM(f.revenue) AS revenue
FROM fact_orders f
JOIN dim_product d ON f.product_id = d.product_id
GROUP BY d.category
HAVING SUM(f.revenue) > 1000000
ORDER BY revenue DESC;

Use WHERE for row-level filters, HAVING for aggregated filters.

4) Rank products within category

-- Top 3 products by revenue per category
WITH product_rev AS (
  SELECT d.category, f.product_id, SUM(f.revenue) AS revenue
  FROM fact_orders f
  JOIN dim_product d ON f.product_id = d.product_id
  GROUP BY d.category, f.product_id
),
ranked AS (
  SELECT
    category,
    product_id,
    revenue,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
  FROM product_rev
)
SELECT category, product_id, revenue, rnk
FROM ranked
WHERE rnk <= 3
ORDER BY category, rnk;

ROW_NUMBER assigns exactly one rank per row. Note that most databases won’t let you filter on a window function in the same SELECT’s WHERE clause, so compute the rank in a CTE or subquery first, then filter on it. Add a tiebreaker (e.g., product_id) to the ORDER BY if revenues can tie, so the top-N stays deterministic.

5) Time series with month truncation

-- Monthly revenue by channel
SELECT
  DATE_TRUNC('month', order_date) AS month_start,
  channel,
  SUM(revenue) AS revenue
FROM fact_orders
GROUP BY DATE_TRUNC('month', order_date), channel
ORDER BY month_start, channel;

Dialect note
  • Some systems use DATETRUNC('month', order_date), others DATE_TRUNC('month', order_date).
  • In systems without date truncation, compute year and month components and rebuild a first-of-month date.
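
The component rebuild described in the second bullet might look like this (a sketch: MAKE_DATE is PostgreSQL-style and function names vary by system, so check your dialect's documentation):

```sql
-- Rebuild a first-of-month date from year/month components
SELECT
  MAKE_DATE(
    CAST(EXTRACT(YEAR FROM order_date) AS INT),
    CAST(EXTRACT(MONTH FROM order_date) AS INT),
    1
  ) AS month_start,
  SUM(revenue) AS revenue
FROM fact_orders
GROUP BY 1   -- positional reference to month_start; repeat the expression if your dialect requires it
ORDER BY 1;
```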

6) Build a clean dataset for a stacked bar chart

-- Dataset grain: month x channel, one metric column
WITH base AS (
  SELECT
    DATE_TRUNC('month', order_date) AS month_start,
    channel,
    COALESCE(revenue, 0) AS revenue
  FROM fact_orders
)
SELECT month_start, channel, SUM(revenue) AS revenue
FROM base
GROUP BY month_start, channel
ORDER BY month_start, channel;

Only include the dimensions used in the chart (month, channel) and a single metric (revenue). Use COALESCE to avoid NULL metrics.

Drills / exercises

  • Write a query that returns top 10 products by total revenue in the last 90 days.
  • Return weekly active users: one row per week with COUNT(DISTINCT user_id).
  • Join orders to customers and compute revenue by customer segment; ensure segments with no orders still show as 0 revenue.
  • Produce a dataset for a line chart: date (daily) and cumulative revenue using a window SUM.
  • Find duplicate order_ids in your orders table (show the ids and their counts).
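
For the cumulative-revenue drill, the window SUM pattern looks roughly like this (a starting sketch against fact_orders; adapt it before relying on it):

```sql
-- Daily cumulative revenue for a line chart
WITH daily AS (
  SELECT order_date, SUM(revenue) AS revenue
  FROM fact_orders
  GROUP BY order_date
)
SELECT
  order_date,
  SUM(revenue) OVER (ORDER BY order_date) AS cumulative_revenue
FROM daily
ORDER BY order_date;
```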

Common mistakes (and how to fix them)

  • Using WHERE on aggregated conditions. Fix: move aggregated filters to HAVING.
  • Unexpected row multiplication after joins. Fix: verify key uniqueness; pre-aggregate the many-side to the join grain.
  • Double counting revenue when joining facts to details (e.g., order lines + events). Fix: pick a single fact grain or aggregate details first.
  • Wrong top-N per group. Fix: use ROW_NUMBER with PARTITION BY and filter the resulting rank.
  • Inconsistent date grains. Fix: always truncate to the visual’s intended period and use consistent time zones.
  • NULL comparisons with =. Fix: use IS NULL / IS NOT NULL; use COALESCE to set defaults for metrics.

Debugging checklist

Quick validation before publishing
  • Row count sanity: is the number of rows equal to expected unique combinations of your dimensions?
  • Totals: does SUM(metric) match a known control total for the same filter period?
  • Duplicates: compare COUNT(id) vs COUNT(DISTINCT id) where relevant.
  • Join quality: check how many rows have NULL dimension keys after the join.
  • Date grain: confirm only one date unit (e.g., month_start) exists in the final dataset.
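
The checklist items above translate into short queries; a sketch using the document's example tables (monthly_dataset is an illustrative name for your final output):

```sql
-- Duplicates: any repeated (month_start, channel) pairs?
SELECT month_start, channel, COUNT(*) AS copies
FROM monthly_dataset
GROUP BY month_start, channel
HAVING COUNT(*) > 1;

-- Join quality: rows that lost their dimension key
SELECT COUNT(*) AS unmatched_rows
FROM fact_orders f
LEFT JOIN dim_product d ON f.product_id = d.product_id
WHERE d.product_id IS NULL;

-- Control total for a known filter period
SELECT SUM(revenue) AS jan_total
FROM fact_orders
WHERE order_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31';
```

All three should return either zero rows or a number you can reconcile against a trusted source before publishing.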

Mini project: Executive KPI dataset

  1. Define the visuals

    Line: monthly revenue; Bar: revenue by channel for the latest month; Table: top 10 products by revenue this quarter.

  2. Build source queries

    Create one query per visual with only the needed columns and correct grain. Use DATE_TRUNC for monthly, and window RANK for top 10.

  3. Handle NULLs and naming

    COALESCE numeric metrics to 0; set readable column names like month_start, channel, revenue.

  4. Validate

    Check duplicates, totals per month, and that counts match business expectations.

Acceptance checks to pass
  • No duplicate (month_start, channel) pairs in the monthly dataset.
  • Top 10 table has at most 10 rows per category/overall as intended.
  • Revenue totals match finance exports for the same period (within expected tolerance).

Practical projects

  • Marketing funnel dataset: sessions → signups → purchases by week and channel.
  • Support SLA dashboard: ticket volume and median resolution time by priority and week.
  • Inventory health: days-of-supply by warehouse and product category, monthly.

Subskills

  • Selecting and Filtering Data — Return only what the visual needs using WHERE, IN, BETWEEN, LIKE, ORDER BY, LIMIT.
  • Joining Tables — Combine facts and dimensions with INNER/LEFT JOIN, avoiding row inflation.
  • Aggregations, GROUP BY, HAVING — Summarize metrics to the chart grain and filter groups correctly.
  • Basic Window Functions for Ranking — Rank within partitions to get top-N lists per category.
  • Date Handling for Time Series — Truncate to week/month and keep time columns consistent.
  • Null Handling — Use COALESCE/IFNULL/NVL and proper NULL checks.
  • Building Dataset Queries for Visuals — Design outputs with clean columns and stable grain.
  • Data Validation Checks — Quick tests for duplicates, totals, and join quality.

Next steps

  • Practice on real datasets; favor small, composable queries over one giant query.
  • Create a personal snippet library for date truncation, window ranks, and group filters.
  • Review your last dashboard’s queries and add validation checks before each publish.

SQL Basics — Skill Exam

This timed quiz checks your grasp of SQL essentials for visualization work. You can take it as many times as you like, for free. Everyone can take the exam; only logged-in users will have their progress and scores saved. Scoring: questions may be single- or multiple-select. Aim for 70% or higher to pass.

12 questions, 70% to pass.
