How to learn Building Dataset Queries For Visuals for SQL Basics in Data Visualization Engineer for free

Why this matters

As a Data Visualization Engineer, you turn raw tables into visuals people trust. Your SQL must deliver clean, visual-ready datasets that:

Match the visual grain (e.g., one row per month for a time series)
Aggregate correctly (SUM/COUNT/AVG with proper GROUP BY)
Filter and parametrize safely (date range, segments, top-N)
Perform well (pre-aggregation, minimal columns, appropriate joins)

Real tasks you’ll do

Build a monthly revenue time series with optional product/region filters
Create a bar chart of top categories with an “Others” group
Prepare a funnel dataset (visited → added_to_cart → purchased)
Produce KPI tiles with current value, target, and delta vs last period

Concept explained simply

Visuals consume tables. Your job is to produce a table where each row equals one visual mark. If the visual is a monthly line chart, each row should represent one month. If it’s a bar chart by category, each row is one category.

SQL building blocks:

SELECT: choose only the columns the visual needs
FROM/JOIN: bring together necessary tables
WHERE: filter to the relevant subset (use parameters for interactivity)
GROUP BY: set row grain (what each row represents)
ORDER BY/LIMIT: control visual order and top-N logic

Mental model: “One row per mark”

Pick the mark grain first: time bucket, category, segment, or a single KPI. Then ensure the query emits exactly one row per mark. Everything else (joins, filters, aggregations) must serve that grain.

Query patterns for common visuals

Time series: DATE_TRUNC on a timestamp, GROUP BY the trunc, aggregate measures
Bar chart (top-N): aggregate by category, rank with window functions, limit or bucket into Others
KPIs: aggregate to a single row; optionally union with prior period for delta
Funnel: compute counts at each step on the same population; left joins to preserve earlier steps
Tables: minimal transformations, clear column names, stable sorting

Visual-friendly dataset checklist

Row grain matches the visual
Column names are readable and final (no cryptic aliases)
Only needed columns are returned
Filters are parameterized and safe
Aggregations are correct and non-duplicative
Performance: pre-aggregate where possible

Worked examples

Assume a simple commerce schema:

orders(order_id, order_date, customer_id, status, total_amount)
order_items(order_id, product_id, quantity, price)
products(product_id, category)
refunds(refund_id, order_id, refund_amount, refund_date)

Example 1: Monthly net revenue time series

SELECT
  DATE_TRUNC('month', o.order_date) AS month,
  SUM(o.total_amount) - COALESCE(SUM(r.refund_amount), 0) AS net_revenue
FROM orders o
LEFT JOIN refunds r
  ON r.order_id = o.order_id
WHERE o.status = 'completed'
  AND o.order_date >= :start_date
  AND o.order_date < :end_date
GROUP BY 1
ORDER BY 1;

Notes: DATE_TRUNC sets the grain; LEFT JOIN keeps months with zero refunds; parameters bound in your BI tool.

Example 2: Top 5 categories with Others

WITH cat_rev AS (
  SELECT p.category,
         SUM(oi.quantity * oi.price) AS revenue
  FROM order_items oi
  JOIN orders o ON o.order_id = oi.order_id AND o.status = 'completed'
  JOIN products p ON p.product_id = oi.product_id
  WHERE o.order_date >= CURRENT_DATE - INTERVAL '90 days'
  GROUP BY p.category
), ranked AS (
  SELECT category, revenue,
         DENSE_RANK() OVER (ORDER BY revenue DESC) AS rnk
  FROM cat_rev
), split AS (
  SELECT category, revenue FROM ranked WHERE rnk <= 5
  UNION ALL
  SELECT 'Others' AS category,
         (SELECT SUM(revenue) FROM ranked WHERE rnk > 5) AS revenue
)
SELECT category, revenue
FROM split
WHERE revenue IS NOT NULL
ORDER BY revenue DESC;

Notes: Window function ranks; a UNION builds the Others bucket.

Example 3: Funnel counts

Assume events(user_id, event_name, event_time). Steps: visited_site, add_to_cart, purchase.

WITH base AS (
  SELECT DISTINCT user_id
  FROM events
  WHERE event_time BETWEEN :start_date AND :end_date
), s1 AS (
  SELECT DISTINCT user_id FROM events
  WHERE event_name = 'visited_site' AND event_time BETWEEN :start_date AND :end_date
), s2 AS (
  SELECT DISTINCT user_id FROM events
  WHERE event_name = 'add_to_cart' AND event_time BETWEEN :start_date AND :end_date
), s3 AS (
  SELECT DISTINCT user_id FROM events
  WHERE event_name = 'purchase' AND event_time BETWEEN :start_date AND :end_date
)
SELECT 'visited_site' AS step, COUNT(b.user_id) AS users
FROM base b LEFT JOIN s1 ON b.user_id = s1.user_id
WHERE s1.user_id IS NOT NULL
UNION ALL
SELECT 'add_to_cart', COUNT(b.user_id)
FROM base b LEFT JOIN s2 ON b.user_id = s2.user_id
WHERE s2.user_id IS NOT NULL
UNION ALL
SELECT 'purchase', COUNT(b.user_id)
FROM base b LEFT JOIN s3 ON b.user_id = s3.user_id;

Notes: The funnel is on the same population window; steps are counted distinctly.

Design a dataset query in 6 steps

Define the visual grain (time bucket, category, segment, or single KPI).
List required fields: dimensions, measures, filters, sort order.
Sketch joins and filters on paper (avoid accidental fan-out).
Write SELECT with clear aliases and minimal columns.
Group and aggregate to the grain; test with small date ranges.
Optimize: predicates on indexed columns, pre-aggregation, limit rows.

Performance tips that matter

Pre-aggregate for dashboards; avoid aggregating millions of rows per view
Filter early by date/status; avoid functions wrapping indexed columns
Return only needed columns; avoid SELECT *
Use appropriate time buckets; don’t over-granularize
Consider materialized views for heavy reusable datasets

Quality, filters, and parameters

Use explicit WHERE for data window; set defaults for :start_date/:end_date
Handle NULLs with COALESCE where appropriate
Name columns user-friendly: month, category, revenue
Validate against known totals before publishing

Common mistakes and how to self-check

Fan-out joins causing inflated sums
- Self-check: sum at detail level, then aggregate; compare to expected totals
Mismatched grain for the visual
- Self-check: one row per mark? If not, adjust GROUP BY
Forgetting filters (status, canceled orders)
- Self-check: explicit WHERE status filters, reviewed with domain owners
Top-N without deterministic ordering
- Self-check: add tie-breakers and document them
Returning too many columns
- Self-check: remove unused columns; keep the dataset lean

Exercises

Do these to practice. The Quick Test at the bottom is available to everyone; logged-in learners will have progress saved automatically.

Exercise 1: Monthly Net Revenue Time Series

Goal: Build a monthly net revenue dataset for a line chart.

Tables: orders, refunds (as described above)
Include only completed orders
Parameters: :start_date, :end_date (end exclusive)
Columns: month, net_revenue

Expected output (example):

month        | net_revenue
2024-01-01   | 120000.00
2024-02-01   | 135500.00
...

Hints

Use DATE_TRUNC('month', order_date)
LEFT JOIN refunds on order_id; aggregate after the join
Filter orders by date and status before grouping

Show solution

SELECT
  DATE_TRUNC('month', o.order_date) AS month,
  SUM(o.total_amount) - COALESCE(SUM(r.refund_amount), 0) AS net_revenue
FROM orders o
LEFT JOIN refunds r ON r.order_id = o.order_id
WHERE o.status = 'completed'
  AND o.order_date >= :start_date
  AND o.order_date < :end_date
GROUP BY 1
ORDER BY 1;

Exercise 2: Top 5 Categories with Others

Goal: Produce a bar chart dataset of revenue by category for the last 90 days.

Tables: orders, order_items, products
Include completed orders only
Columns: category, revenue (with an Others row if needed)

Expected output (example):

category     | revenue
Electronics  | 250000.00
Home         | 210000.00
Apparel      | 175000.00
Beauty       | 120000.00
Toys         | 90000.00
Others       | 60000.00

Hints

SUM(quantity * price) for revenue
Use DENSE_RANK over revenue desc to find top 5
UNION ALL an Others row for the rest

Show solution

WITH cat_rev AS (
  SELECT p.category,
         SUM(oi.quantity * oi.price) AS revenue
  FROM order_items oi
  JOIN orders o ON o.order_id = oi.order_id AND o.status = 'completed'
  JOIN products p ON p.product_id = oi.product_id
  WHERE o.order_date >= CURRENT_DATE - INTERVAL '90 days'
  GROUP BY p.category
), ranked AS (
  SELECT category, revenue,
         DENSE_RANK() OVER (ORDER BY revenue DESC) AS rnk
  FROM cat_rev
), split AS (
  SELECT category, revenue FROM ranked WHERE rnk <= 5
  UNION ALL
  SELECT 'Others' AS category,
         (SELECT SUM(revenue) FROM ranked WHERE rnk > 5)
)
SELECT category, revenue
FROM split
WHERE revenue IS NOT NULL
ORDER BY revenue DESC;

Checklist before running:
- Does the query emit exactly the required columns?
- Is the grain correct for the target visual?
- Are filters parameterized and safe?
- Can you explain each join and why it won’t duplicate rows?

Mini challenge

Create a KPI dataset that returns three rows: metric, current_value, prior_value, delta for total revenue over :start_date to :end_date vs the previous equal-length period. Keep it fast and single-query if possible.

Practical projects

Build a daily active users (DAU) time series with weekday/weekly rollups
Design a sales dashboard dataset pack: KPIs, time series, top-N by product and region
Create a retention cohort dataset (monthly signup cohort vs active in month N)

Who this is for

Aspiring and junior Data Visualization Engineers
Data analysts building dashboard-ready datasets
Engineers connecting SQL outputs to BI tools

Prerequisites

Basic SQL: SELECT, WHERE, JOIN, GROUP BY, ORDER BY
Comfort reading simple schemas and understanding visual requirements

Learning path

Start: aggregate functions and grouping
Next: window functions and top-N patterns
Then: parameterization and performance basics
Finally: robust datasets for multi-visual dashboards

Next steps

Complete the exercises above
Take the Quick Test below to check understanding
Apply the patterns to one real dataset in your environment

Menu

Building Dataset Queries For Visuals

Table of Contents

Why this matters

Concept explained simply

Query patterns for common visuals

Worked examples

Example 1: Monthly net revenue time series

Example 2: Top 5 categories with Others

Example 3: Funnel counts

Design a dataset query in 6 steps

Performance tips that matter

Quality, filters, and parameters

Common mistakes and how to self-check

Exercises

Mini challenge

Practical projects

Who this is for

Prerequisites

Learning path

Next steps

Practice Exercises

Monthly Net Revenue Time Series

Instructions

Expected Output

Top 5 Categories with Others Bucket

Building Dataset Queries For Visuals — Quick Test

Have questions about Building Dataset Queries For Visuals?

AI Assistant