luvv to helpDiscover the Best Free Online Tools
Topic 7 of 8

Building Dataset Queries For Visuals

Learn Building Dataset Queries For Visuals for free with explanations, exercises, and a quick test (for Data Visualization Engineer).

Published: December 28, 2025 | Updated: December 28, 2025

Why this matters

As a Data Visualization Engineer, you turn raw tables into visuals people trust. Your SQL must deliver clean, visual-ready datasets that:

  • Match the visual grain (e.g., one row per month for a time series)
  • Aggregate correctly (SUM/COUNT/AVG with proper GROUP BY)
  • Filter and parametrize safely (date range, segments, top-N)
  • Perform well (pre-aggregation, minimal columns, appropriate joins)
Real tasks you’ll do
  • Build a monthly revenue time series with optional product/region filters
  • Create a bar chart of top categories with an “Others” group
  • Prepare a funnel dataset (visited → added_to_cart → purchased)
  • Produce KPI tiles with current value, target, and delta vs last period

Concept explained simply

Visuals consume tables. Your job is to produce a table where each row equals one visual mark. If the visual is a monthly line chart, each row should represent one month. If it’s a bar chart by category, each row is one category.

SQL building blocks:

  • SELECT: choose only the columns the visual needs
  • FROM/JOIN: bring together necessary tables
  • WHERE: filter to the relevant subset (use parameters for interactivity)
  • GROUP BY: set row grain (what each row represents)
  • ORDER BY/LIMIT: control visual order and top-N logic
Mental model: “One row per mark”

Pick the mark grain first: time bucket, category, segment, or a single KPI. Then ensure the query emits exactly one row per mark. Everything else (joins, filters, aggregations) must serve that grain.

Query patterns for common visuals

  • Time series: DATE_TRUNC on a timestamp, GROUP BY the trunc, aggregate measures
  • Bar chart (top-N): aggregate by category, rank with window functions, limit or bucket into Others
  • KPIs: aggregate to a single row; optionally union with prior period for delta
  • Funnel: compute counts at each step on the same population; left joins to preserve earlier steps
  • Tables: minimal transformations, clear column names, stable sorting
Visual-friendly dataset checklist
  • Row grain matches the visual
  • Column names are readable and final (no cryptic aliases)
  • Only needed columns are returned
  • Filters are parameterized and safe
  • Aggregations are correct and non-duplicative
  • Performance: pre-aggregate where possible

Worked examples

Assume a simple commerce schema:

  • orders(order_id, order_date, customer_id, status, total_amount)
  • order_items(order_id, product_id, quantity, price)
  • products(product_id, category)
  • refunds(refund_id, order_id, refund_amount, refund_date)

Example 1: Monthly net revenue time series

SELECT
  DATE_TRUNC('month', o.order_date) AS month,
  SUM(o.total_amount) - COALESCE(SUM(r.refund_amount), 0) AS net_revenue
FROM orders o
LEFT JOIN refunds r
  ON r.order_id = o.order_id
WHERE o.status = 'completed'
  AND o.order_date >= :start_date
  AND o.order_date < :end_date
GROUP BY 1
ORDER BY 1;

Notes: DATE_TRUNC sets the grain; LEFT JOIN keeps months with zero refunds; parameters bound in your BI tool.

Example 2: Top 5 categories with Others

WITH cat_rev AS (
  SELECT p.category,
         SUM(oi.quantity * oi.price) AS revenue
  FROM order_items oi
  JOIN orders o ON o.order_id = oi.order_id AND o.status = 'completed'
  JOIN products p ON p.product_id = oi.product_id
  WHERE o.order_date >= CURRENT_DATE - INTERVAL '90 days'
  GROUP BY p.category
), ranked AS (
  SELECT category, revenue,
         DENSE_RANK() OVER (ORDER BY revenue DESC) AS rnk
  FROM cat_rev
), split AS (
  SELECT category, revenue FROM ranked WHERE rnk <= 5
  UNION ALL
  SELECT 'Others' AS category,
         (SELECT SUM(revenue) FROM ranked WHERE rnk > 5) AS revenue
)
SELECT category, revenue
FROM split
WHERE revenue IS NOT NULL
ORDER BY revenue DESC;

Notes: Window function ranks; a UNION builds the Others bucket.

Example 3: Funnel counts

Assume events(user_id, event_name, event_time). Steps: visited_site, add_to_cart, purchase.

WITH base AS (
  SELECT DISTINCT user_id
  FROM events
  WHERE event_time BETWEEN :start_date AND :end_date
), s1 AS (
  SELECT DISTINCT user_id FROM events
  WHERE event_name = 'visited_site' AND event_time BETWEEN :start_date AND :end_date
), s2 AS (
  SELECT DISTINCT user_id FROM events
  WHERE event_name = 'add_to_cart' AND event_time BETWEEN :start_date AND :end_date
), s3 AS (
  SELECT DISTINCT user_id FROM events
  WHERE event_name = 'purchase' AND event_time BETWEEN :start_date AND :end_date
)
SELECT 'visited_site' AS step, COUNT(b.user_id) AS users
FROM base b LEFT JOIN s1 ON b.user_id = s1.user_id
WHERE s1.user_id IS NOT NULL
UNION ALL
SELECT 'add_to_cart', COUNT(b.user_id)
FROM base b LEFT JOIN s2 ON b.user_id = s2.user_id
WHERE s2.user_id IS NOT NULL
UNION ALL
SELECT 'purchase', COUNT(b.user_id)
FROM base b LEFT JOIN s3 ON b.user_id = s3.user_id;

Notes: The funnel is on the same population window; steps are counted distinctly.

Design a dataset query in 6 steps

  1. Define the visual grain (time bucket, category, segment, or single KPI).
  2. List required fields: dimensions, measures, filters, sort order.
  3. Sketch joins and filters on paper (avoid accidental fan-out).
  4. Write SELECT with clear aliases and minimal columns.
  5. Group and aggregate to the grain; test with small date ranges.
  6. Optimize: predicates on indexed columns, pre-aggregation, limit rows.

Performance tips that matter

  • Pre-aggregate for dashboards; avoid aggregating millions of rows per view
  • Filter early by date/status; avoid functions wrapping indexed columns
  • Return only needed columns; avoid SELECT *
  • Use appropriate time buckets; don’t over-granularize
  • Consider materialized views for heavy reusable datasets

Quality, filters, and parameters

  • Use explicit WHERE for data window; set defaults for :start_date/:end_date
  • Handle NULLs with COALESCE where appropriate
  • Name columns user-friendly: month, category, revenue
  • Validate against known totals before publishing

Common mistakes and how to self-check

  • Fan-out joins causing inflated sums
    • Self-check: sum at detail level, then aggregate; compare to expected totals
  • Mismatched grain for the visual
    • Self-check: one row per mark? If not, adjust GROUP BY
  • Forgetting filters (status, canceled orders)
    • Self-check: explicit WHERE status filters, reviewed with domain owners
  • Top-N without deterministic ordering
    • Self-check: add tie-breakers and document them
  • Returning too many columns
    • Self-check: remove unused columns; keep the dataset lean

Exercises

Do these to practice. The Quick Test at the bottom is available to everyone; logged-in learners will have progress saved automatically.

Exercise 1: Monthly Net Revenue Time Series

Goal: Build a monthly net revenue dataset for a line chart.

  • Tables: orders, refunds (as described above)
  • Include only completed orders
  • Parameters: :start_date, :end_date (end exclusive)
  • Columns: month, net_revenue

Expected output (example):

month        | net_revenue
2024-01-01   | 120000.00
2024-02-01   | 135500.00
...
Hints
  • Use DATE_TRUNC('month', order_date)
  • LEFT JOIN refunds on order_id; aggregate after the join
  • Filter orders by date and status before grouping
Show solution
SELECT
  DATE_TRUNC('month', o.order_date) AS month,
  SUM(o.total_amount) - COALESCE(SUM(r.refund_amount), 0) AS net_revenue
FROM orders o
LEFT JOIN refunds r ON r.order_id = o.order_id
WHERE o.status = 'completed'
  AND o.order_date >= :start_date
  AND o.order_date < :end_date
GROUP BY 1
ORDER BY 1;
Exercise 2: Top 5 Categories with Others

Goal: Produce a bar chart dataset of revenue by category for the last 90 days.

  • Tables: orders, order_items, products
  • Include completed orders only
  • Columns: category, revenue (with an Others row if needed)

Expected output (example):

category     | revenue
Electronics  | 250000.00
Home         | 210000.00
Apparel      | 175000.00
Beauty       | 120000.00
Toys         | 90000.00
Others       | 60000.00
Hints
  • SUM(quantity * price) for revenue
  • Use DENSE_RANK over revenue desc to find top 5
  • UNION ALL an Others row for the rest
Show solution
WITH cat_rev AS (
  SELECT p.category,
         SUM(oi.quantity * oi.price) AS revenue
  FROM order_items oi
  JOIN orders o ON o.order_id = oi.order_id AND o.status = 'completed'
  JOIN products p ON p.product_id = oi.product_id
  WHERE o.order_date >= CURRENT_DATE - INTERVAL '90 days'
  GROUP BY p.category
), ranked AS (
  SELECT category, revenue,
         DENSE_RANK() OVER (ORDER BY revenue DESC) AS rnk
  FROM cat_rev
), split AS (
  SELECT category, revenue FROM ranked WHERE rnk <= 5
  UNION ALL
  SELECT 'Others' AS category,
         (SELECT SUM(revenue) FROM ranked WHERE rnk > 5)
)
SELECT category, revenue
FROM split
WHERE revenue IS NOT NULL
ORDER BY revenue DESC;
  • Checklist before running:
    • Does the query emit exactly the required columns?
    • Is the grain correct for the target visual?
    • Are filters parameterized and safe?
    • Can you explain each join and why it won’t duplicate rows?

Mini challenge

Create a KPI dataset that returns three rows: metric, current_value, prior_value, delta for total revenue over :start_date to :end_date vs the previous equal-length period. Keep it fast and single-query if possible.

Practical projects

  • Build a daily active users (DAU) time series with weekday/weekly rollups
  • Design a sales dashboard dataset pack: KPIs, time series, top-N by product and region
  • Create a retention cohort dataset (monthly signup cohort vs active in month N)

Who this is for

  • Aspiring and junior Data Visualization Engineers
  • Data analysts building dashboard-ready datasets
  • Engineers connecting SQL outputs to BI tools

Prerequisites

  • Basic SQL: SELECT, WHERE, JOIN, GROUP BY, ORDER BY
  • Comfort reading simple schemas and understanding visual requirements

Learning path

  • Start: aggregate functions and grouping
  • Next: window functions and top-N patterns
  • Then: parameterization and performance basics
  • Finally: robust datasets for multi-visual dashboards

Next steps

  • Complete the exercises above
  • Take the Quick Test below to check understanding
  • Apply the patterns to one real dataset in your environment

Practice Exercises

2 exercises to complete

Instructions

Build a dataset that returns month and net_revenue for completed orders between :start_date and :end_date (end exclusive). Net revenue = SUM(total_amount) - SUM(refund_amount). Use DATE_TRUNC('month', order_date). Return only the two columns, ordered by month ascending.
Expected Output
month | net_revenue 2024-01-01 | 120000.00 2024-02-01 | 135500.00 ...

Building Dataset Queries For Visuals — Quick Test

Test your knowledge with 6 questions. Pass with 70% or higher.

6 questions70% to pass

Have questions about Building Dataset Queries For Visuals?

AI Assistant

Ask questions about this tool