
Summary Tables

Learn summary tables for free with explanations, worked examples, exercises, and a quick test (for Data Analysts).

Published: December 19, 2025 | Updated: December 19, 2025

Why this matters

Summary tables turn raw rows into a clear picture. As a Data Analyst, you will use them to size problems, spot anomalies, and guide stakeholders quickly. Typical tasks include:

  • Profiling a new dataset (counts, missing rates, distributions).
  • Measuring performance by segment (e.g., signups by channel, revenue by region).
  • Monitoring week-over-week changes and flagging spikes or drops.
  • Preparing inputs for dashboards and deeper modeling.

Concept explained simply

A summary table groups rows and calculates metrics (counts, sums, averages, percentages). It can be flat (grouped table) or pivoted (values spread across columns).

Mental model

Think of each row as a small item and your summary table as shelves in a store:

  • Group By = which shelf you place items on (e.g., by country).
  • Aggregations = what you read on the shelf label (count, total, average).
  • Pivot = turning some labels into columns (e.g., one column per device type).

Core building blocks

  • Grouping keys: 1–3 categorical columns (e.g., date, region, channel).
  • Aggregations: count, distinct count, sum, mean/median, min/max, percent.
  • Pivot/crosstab: put a grouping key on columns to compare categories side-by-side.
  • Binning: bucket numeric values (e.g., ages 0–17, 18–24, etc.).
  • Missingness: add a column to count or rate nulls.
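The building blocks above (grouping keys, aggregations, pivot) can be sketched in pandas. This is a minimal illustration with made-up order rows; the column names are assumptions, not a real dataset:

```python
import pandas as pd

# Hypothetical order rows (illustrative data only).
df = pd.DataFrame({
    "country": ["US", "US", "CA", "CA", "MX"],
    "device":  ["mobile", "desktop", "mobile", "mobile", "desktop"],
    "order_value": [120, 80, 50, 75, 110],
})

# Flat grouped table: one row per country, metrics as columns.
flat = (df.groupby("country", as_index=False)
          .agg(orders=("order_value", "count"),
               total_value=("order_value", "sum")))

# Pivot: spread one grouping key (device) across columns.
pivot = df.pivot_table(index="country", columns="device",
                       values="order_value", aggfunc="sum", fill_value=0)

print(flat)
print(pivot)
```

The flat table keeps one row per group; the pivot trades rows for columns so categories sit side-by-side.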

Worked examples

Example 1 — E-commerce orders by device and month

Goal: See order volume and average order value (AOV) by month, split by device.

SQL sketch:
SELECT DATE_TRUNC('month', order_date) AS month,
       device,
       COUNT(*) AS orders,
       ROUND(AVG(order_value), 2) AS aov
FROM orders
GROUP BY 1, 2
ORDER BY 1, 2;

Interpretation: Compare month-to-month growth and whether mobile AOV is consistently lower than desktop.

Example 2 — Support tickets by severity with row and column percentages

Goal: Understand distribution by severity and team.

Steps:
1) Build ticket_count = COUNT(*) by team, severity.
2) Row % = ticket_count / SUM(ticket_count) OVER (PARTITION BY team).
3) Col % = ticket_count / SUM(ticket_count) OVER (PARTITION BY severity).

Interpretation: Teams with high row % in "Critical" need help; column % highlights which team dominates a severity type.
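Steps 2 and 3 can be sketched in pandas with group-wise transforms standing in for the SQL window functions; the team/severity counts below are made up for illustration:

```python
import pandas as pd

# Hypothetical ticket counts by team and severity (illustrative numbers).
t = pd.DataFrame({
    "team":         ["A", "A", "B", "B"],
    "severity":     ["Critical", "Low", "Critical", "Low"],
    "ticket_count": [30, 70, 10, 90],
})

# Row %: share of each severity within a team (sums to 100% per team).
t["row_pct"] = t["ticket_count"] / t.groupby("team")["ticket_count"].transform("sum") * 100

# Col %: share of each team within a severity (sums to 100% per severity).
t["col_pct"] = t["ticket_count"] / t.groupby("severity")["ticket_count"].transform("sum") * 100

print(t)
```

Note how the same count gets two different denominators; naming the columns row_pct and col_pct keeps the direction unambiguous.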

Example 3 — Churn rate by customer tenure buckets

Goal: Identify at-risk tenure cohorts.

Tenure bucket rules: 0-1m, 1-3m, 3-6m, 6-12m, 12m+.
For each bucket: churn_rate = churned_customers / total_customers.

Interpretation: Spikes early in tenure may signal onboarding gaps.
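The bucket rules above can be applied with pd.cut. A minimal sketch with invented customer rows (tenure_months and churned are illustrative names):

```python
import pandas as pd

# Hypothetical customers: tenure in months and a 0/1 churn flag (made-up data).
cust = pd.DataFrame({
    "tenure_months": [0.5, 2, 2.5, 4, 8, 8, 15, 20],
    "churned":       [1,   1, 0,   0, 1, 0, 0,  0],
})

# Bucket rules from the example: 0-1m, 1-3m, 3-6m, 6-12m, 12m+.
bins = [0, 1, 3, 6, 12, float("inf")]
labels = ["0-1m", "1-3m", "3-6m", "6-12m", "12m+"]
cust["tenure_bucket"] = pd.cut(cust["tenure_months"], bins=bins,
                               labels=labels, include_lowest=True)

# churn_rate = churned_customers / total_customers per bucket.
summary = (cust.groupby("tenure_bucket", observed=True)
               .agg(total_customers=("churned", "size"),
                    churned_customers=("churned", "sum")))
summary["churn_rate"] = summary["churned_customers"] / summary["total_customers"]
print(summary)
```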

How to build a summary table (step-by-step)

  1. Define the question: What decision should this table inform?
  2. Pick grouping keys: Time, segment, product lines, etc.
  3. Select metrics: Count, distinct count, sums, averages, medians, percentages.
  4. Handle data quality: Treat missing values and outliers; decide on filters.
  5. Construct table: Group, aggregate, optionally pivot.
  6. Validate: Totals match? Percentages sum to ~100%? Sample raw rows.
  7. Annotate: Clear metric names and time context (e.g., last 30 days).
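The validation step (step 6) is easy to automate. A sketch with a toy signups table, assuming a one-row-per-event raw frame:

```python
import pandas as pd

# Hypothetical raw events and a summary built from them (illustrative).
raw = pd.DataFrame({"channel": ["Organic", "Paid", "Organic", "Referral"],
                    "signups": [1, 1, 1, 1]})
summary = raw.groupby("channel", as_index=False).agg(signups=("signups", "sum"))
summary["pct"] = summary["signups"] / summary["signups"].sum() * 100

# Validate: grouped totals reconcile with raw row counts.
assert summary["signups"].sum() == len(raw)

# Validate: percentages sum to ~100% (allow rounding slack).
assert abs(summary["pct"].sum() - 100) < 0.01
print("checks passed")
```

Keeping these assertions next to the query catches silent filter or join mistakes before the table reaches a stakeholder.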

Quick recipes

SQL grouped table
SELECT channel,
       COUNT(*) AS signups,
       COUNT(DISTINCT user_id) AS unique_users,
       ROUND(AVG(session_duration), 1) AS avg_secs
FROM sessions
WHERE event_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY channel
ORDER BY signups DESC;

Pandas grouped table

import pandas as pd

# df is assumed to hold one row per session; define the 30-day cutoff first, e.g.:
cutoff = pd.Timestamp.today().normalize() - pd.Timedelta(days=30)

agg = (df
  .query("event_date >= @cutoff")
  .groupby("channel")
  .agg(signups=("user_id", "count"),
       unique_users=("user_id", "nunique"),
       avg_secs=("session_duration", "mean"))
  .reset_index())

Excel quick pivot

  • Insert > PivotTable > Rows: channel; Values: Count of user_id, Average of session_duration.
  • Add Slicers for date ranges if needed.

Quality checklist

  • Totals reconcile with raw counts.
  • Percentages sum to ~100% (allow for rounding).
  • Distinct counts used where appropriate (e.g., users vs. events).
  • Clear time window (e.g., last 7 days vs. all-time).
  • Outliers and missing values handled intentionally.
  • Column naming is unambiguous (e.g., aov_usd_30d).

Common mistakes and how to self-check

  • Using mean on skewed data: Prefer median or trimmed mean for heavy tails. Self-check: Compare mean vs. median; if far apart, report both.
  • Mixing events and users: Use COUNT(DISTINCT user_id) for users. Self-check: sanity-check ratios (events per user).
  • Wrong denominator in percentages: Define row% vs. col% upfront. Self-check: Sum across the intended direction and verify ~100%.
  • Hidden filters: Document all filters. Self-check: Re-run without filters and compare deltas.
  • Over-pivoting: Too many columns hurt readability. Self-check: Keep 2–6 columns per pivot dimension, else regroup or bin.
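The first self-check (compare mean vs. median) takes only a few lines. The values below are made up to show a heavy right tail:

```python
import pandas as pd

# Illustrative skewed values: one large outlier drags the mean far above the median.
s = pd.Series([10, 12, 11, 13, 500])
mean, median = s.mean(), s.median()

# Self-check: if mean and median diverge a lot, report both (or use a trimmed mean).
skewed = abs(mean - median) > 0.5 * median
print(round(mean, 1), median, skewed)
```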

Who this is for

  • Early-career analysts who want fast, reliable EDA.
  • Professionals switching to analytics from ops, marketing, or finance.
  • Anyone preparing data for dashboards or experiments.

Prerequisites

  • Basic comfort with spreadsheets or SQL.
  • Knowing data types (numeric vs. categorical) and nulls.
  • Basic understanding of averages and percentages.

Learning path

  1. Practice basic grouped tables (count, distinct, sum).
  2. Add central tendency and dispersion (median, IQR).
  3. Introduce percentages (row, column, overall).
  4. Pivot and bin for comparisons.
  5. Automate with your main tool (SQL views, Pivot caches, pandas scripts).

Exercises

These mirror the exercises below. Try them before opening the solutions.

Exercise 1 — Orders by country with AOV and missing rate

Dataset (CSV snippet):

order_id,order_value,country
1,120,US
2,80,US
3, ,CA
4,50,CA
5,200,US
6,110,MX
7, ,US
8,75,CA

Task: Create a summary table by country with: orders (count of rows), orders_with_value (non-null), missing_value_rate, total_value (sum), aov (average of non-null).

Exercise 2 — Pivot: Signups by week and channel with row and column %

Dataset (CSV snippet):

week_start,channel,signups
2025-01-06,Organic,140
2025-01-06,Paid,220
2025-01-06,Referral,40
2025-01-13,Organic,160
2025-01-13,Paid,180
2025-01-13,Referral,60

Task: Create a pivot with Rows=week_start, Columns=channel, Values=sum(signups). Add row% per week and col% per channel across weeks.

Self-checks for both exercises:

  • Check that your totals reconcile with raw data.
  • Confirm your missing_value_rate = 1 - (non_null / total) per group.
  • Ensure row% sums to 100% across channels; col% sums to 100% down weeks.

Practical projects

  • Acquisition funnel table: visits, signups, activations by channel with conversion rates.
  • Revenue table: monthly revenue by product line with MoM deltas and median order value.
  • Quality table: defects by plant and shift with Pareto 80/20 highlighting.

Mini challenge

Pick a dataset you use weekly. Build one flat summary and one pivot that answers a question your team keeps asking. Add a quality checklist and one insight in one sentence. Share with a teammate for feedback.

Next steps

  • Automate your summary as a saved query, pivot template, or script.
  • Add clear definitions and time windows to make it re-usable.
  • Move to the Quick Test below to check your understanding.


Practice Exercises


Instructions

Dataset (CSV):

order_id,order_value,country
1,120,US
2,80,US
3, ,CA
4,50,CA
5,200,US
6,110,MX
7, ,US
8,75,CA

Build a grouped summary by country with these columns:

  • orders: COUNT of rows
  • orders_with_value: COUNT of non-null order_value
  • missing_value_rate: 1 - orders_with_value / orders
  • total_value: SUM(order_value) over non-null rows
  • aov: AVG(order_value) over non-null rows

Expected Output

A table with one row per country (US, CA, MX) including orders, orders_with_value, missing_value_rate, total_value, aov. US should have the highest total and a non-zero missing_value_rate.

Summary Tables — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.

