Why this matters
Readable SQL saves time, prevents bugs, and helps teams share insights quickly. As a Business Analyst, you will:
- Explain how a number was calculated to non-technical stakeholders.
- Hand off queries to engineers or analysts for automation.
- Debug and update older queries under time pressure.
- Review others' work and ensure consistency across dashboards and reports.
Concept explained simply
Readable queries are easy to scan, explain, and change. They follow a consistent structure, use clear names, and separate logic into steps. Think of a query as a short story: who (tables), what (columns), how (joins, filters), and summary (aggregations).
Mental model
Use the "Story Flow" mental model:
- Start with a clear purpose comment.
- Collect data in small steps (CTEs/subqueries) with descriptive names.
- Make each step do one job (filter, join, aggregate).
- Name outputs so a teammate understands them without opening the query.
Golden rules checklist
- One clause per line. Align keywords (SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY).
- Avoid SELECT *. List only needed columns.
- Use clear aliases (e.g., customer_total_amount, not amt).
- Prefer Common Table Expressions (CTEs) to break complex logic.
- Comment why, not what. The code shows what; your note explains intent.
- Never rely on GROUP BY ordinals (1,2). Group by column names.
- Consistent casing: keywords upper-case, identifiers lower_snake_case (or your team standard).
Formatting patterns (with quick references)
Indentation style
SELECT
o.order_id,
o.customer_id,
SUM(o.amount) AS total_amount
FROM orders AS o
WHERE o.status = 'paid'
GROUP BY o.order_id, o.customer_id
ORDER BY total_amount DESC;
Keep 2–4 spaces indentation. Line up columns under SELECT. One clause per line.
Naming and aliases
- Tables: short aliases that are memorable (orders o, customers c).
- Columns: descriptive result names (monthly_revenue, is_repeat_buyer).
CTEs vs. nested subqueries
WITH paid_orders AS (
SELECT *
FROM orders
WHERE status = 'paid'
)
SELECT COUNT(*) AS paid_count
FROM paid_orders;
CTEs improve readability and make each step testable on its own.
Comments that help
-- Purpose: Weekly revenue by channel for performance review
-- Note: Excludes refunds to match Finance policy
WITH paid AS (
SELECT order_id, channel, amount, created_at
FROM orders
WHERE status = 'paid'
)
SELECT channel, DATE_TRUNC('week', created_at) AS week_start, SUM(amount) AS revenue
FROM paid
GROUP BY channel, DATE_TRUNC('week', created_at);
Comment why choices were made (exclusions, business rules, caveats).
Worked examples
Example 1: Clean up a sales summary query
Messy:
select c.id, c.name, sum(o.amount) amt from orders o join customers c on c.id=o.customer_id where o.status='paid' and o.created_at >= current_date-30 group by 1,2 order by 3 desc;
Readable:
-- Purpose: Top customers by paid amount in last 30 days
WITH recent_paid_orders AS (
SELECT order_id, customer_id, amount, created_at
FROM orders
WHERE status = 'paid'
AND created_at >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT
c.id AS customer_id,
c.name AS customer_name,
SUM(rpo.amount) AS total_amount
FROM recent_paid_orders AS rpo
JOIN customers AS c
ON c.id = rpo.customer_id
GROUP BY c.id, c.name
ORDER BY total_amount DESC;
Example 2: Split complex logic into CTE steps
-- Purpose: Monthly revenue (paid only), including month label
WITH paid AS (
SELECT order_id, amount, created_at
FROM orders
WHERE status = 'paid'
),
monthly AS (
SELECT DATE_TRUNC('month', created_at) AS month_start, SUM(amount) AS revenue
FROM paid
GROUP BY DATE_TRUNC('month', created_at)
)
SELECT month_start, revenue
FROM monthly
ORDER BY month_start;
Each CTE does one thing: filter paid orders, then aggregate monthly.
Example 3: Clear CASE expressions and labels
-- Purpose: Classify customers by spend level
SELECT
customer_id,
CASE
WHEN total_spend >= 1000 THEN 'gold'
WHEN total_spend >= 300 THEN 'silver'
ELSE 'bronze'
END AS spend_tier
FROM customer_spend;
Short, ordered conditions with readable labels.
Step-by-step: make any query readable
- Write a purpose comment (one line describing the outcome).
- Upper-case SQL keywords; use consistent lower_snake_case for identifiers.
- Break into steps (CTEs) if more than ~10 lines or mixed responsibilities.
- Indent 2–4 spaces; one clause per line.
- Rename outputs with descriptive aliases.
- Remove SELECT * and list only needed columns.
- Group by names, not ordinals; align ORDER BY with selected aliases.
Exercises
These match the exercises below so your work is auto-checkable. Tip: Run each CTE on its own during drafting.
- Exercise 1: Reformat and clarify a messy query (aliases, grouping, date math).
- Exercise 2: Convert a nested subquery into CTEs with purpose comments.
Checklist before you compare with solutions
- I added a purpose comment describing business intent.
- No SELECT *. Only necessary columns are listed.
- Aliases and output names are descriptive.
- GROUP BY and ORDER BY use names, not ordinals.
- Clauses are on separate lines with consistent indentation.
Common mistakes and self-check
- Using ordinals in GROUP BY (e.g., GROUP BY 1,2). Self-check: replace with column names; run again to ensure identical results.
- Over-nesting instead of CTEs. Self-check: can you name each step? If not, split it.
- Cryptic aliases like amt or t1. Self-check: will a new teammate understand in 30 seconds?
- Commenting what the code does rather than why. Self-check: remove a comment; if nothing is lost, it was redundant.
- Inconsistent date filters. Self-check: standardize INTERVAL usage and document timezone assumptions.
Practical projects
- Cleanup sprint: pick 5 legacy queries and standardize formatting, aliases, and comments. Measure diff in lines changed.
- Team style guide draft: create a one-page SQL formatting guide with examples used in your org.
- Snippet library: build reusable CTE templates (date spine, paid_orders, active_users) for faster, consistent analysis.
Mini challenge
Improve the snippet below for readability without changing results:
select date_trunc('week',created_at) wk, count(distinct user_id) du
from events where event_name='login' and created_at>now()-interval '90 days'
group by 1 order by 1;
Show a possible improvement
-- Purpose: Weekly distinct logins in last 90 days
WITH recent_logins AS (
SELECT user_id, created_at
FROM events
WHERE event_name = 'login'
AND created_at > NOW() - INTERVAL '90 days'
)
SELECT
DATE_TRUNC('week', created_at) AS week_start,
COUNT(DISTINCT user_id) AS distinct_users
FROM recent_logins
GROUP BY DATE_TRUNC('week', created_at)
ORDER BY week_start;
Who this is for
- Business Analysts who write and share SQL for reporting and ad-hoc insights.
- Analysts transitioning from spreadsheets to SQL-based analytics.
Prerequisites
- Basic SQL: SELECT, WHERE, JOIN, GROUP BY, ORDER BY.
- Comfort reading simple queries and running them against a database.
Learning path
- Learn SQL basics (selection, filtering, joins).
- Practice formatting and naming conventions.
- Introduce CTEs to break complex logic.
- Adopt a team style guide and review others' queries for consistency.
Next steps
- Refactor 2–3 of your recent queries using the checklist.
- Create a personal snippet for date filters and paid order filters.
- Take the Quick Test below to confirm mastery.
Note on progress saving
The quick test is available to everyone. If you are logged in, your progress will be saved.
Ready for the Quick Test?
When you feel confident, open the Quick Test to check your understanding.