Why SQL matters for Data Analysts
SQL is the primary way data analysts access, clean, and aggregate data stored in relational databases and data warehouses. Mastering SQL unlocks faster insights, reliable dashboards, ad-hoc analyses, and collaboration with engineers. You will query tables, join datasets, build metrics, audit data quality, and deliver trustworthy answers.
Who this is for
- Aspiring data analysts needing a practical, project-ready SQL foundation.
- Analysts who can write simple queries and want to advance to window functions, optimization, and modeling.
- Professionals in adjacent roles (product, marketing, operations) who need reliable data-pulling and aggregation skills.
Prerequisites
- Comfort with basic math and spreadsheets.
- Familiarity with rows, columns, and CSVs.
- Optional but helpful: a local or cloud SQL environment (any vendor is fine; syntax differences are small for core topics).
Learning path
Quick study plan (2–4 weeks)
- Days 1–3: SELECT, WHERE, ORDER BY, NULLs, strings, dates.
- Days 4–6: CASE, GROUP BY, HAVING, joins.
- Days 7–9: Subqueries, CTEs, set operations; mini drills daily.
- Days 10–12: Window functions; build a summary dashboard query.
- Days 13–14: Indexes, EXPLAIN, import/export; finish a mini-project.
Worked examples
Assume an e-commerce schema:
customers(id, name, country)
orders(id, customer_id, order_date, status, total_amount)
order_items(order_id, product_id, quantity, unit_price)
products(id, name, category, price)
1) Basic filtering and sorting: completed orders from the last 7 days
SELECT id, customer_id, order_date, total_amount
FROM orders
WHERE status = 'completed'
AND order_date >= CURRENT_DATE - INTERVAL '7 days'
ORDER BY order_date DESC;
Why it matters: fast ad-hoc pulls for recent performance.
2) Aggregate revenue by country with a threshold
SELECT c.country, SUM(o.total_amount) AS revenue
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.country
HAVING SUM(o.total_amount) >= 50000
ORDER BY revenue DESC;
Why it matters: focus on meaningful segments; HAVING filters groups.
3) Monthly cohorts with CASE and date functions
SELECT DATE_TRUNC('month', o.order_date) AS month,
CASE WHEN o.total_amount >= 100 THEN 'AOV 100+' ELSE 'AOV < 100' END AS aov_bucket,
COUNT(*) AS orders,
SUM(o.total_amount) AS revenue
FROM orders o
WHERE o.status = 'completed'
GROUP BY 1, 2
ORDER BY 1, 2;
Why it matters: quick cohort and KPI breakdowns.
4) Top 3 products per category using window functions
WITH sales AS (
SELECT oi.product_id,
p.category,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN products p ON p.id = oi.product_id
GROUP BY 1, 2
)
SELECT category, product_id, revenue
FROM (
SELECT category, product_id, revenue,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS rn
FROM sales
) t
WHERE rn <= 3
ORDER BY category, revenue DESC;
Why it matters: rank within groups without writing a subquery for every category.
5) Detect potential duplicates with window functions
SELECT id, name, country,
COUNT(*) OVER (PARTITION BY LOWER(TRIM(name)), country) AS dup_count
FROM customers
ORDER BY dup_count DESC, name;
Why it matters: data-quality checks before reporting.
Hands-on drills
- Return the 20 most recent completed orders with customer names.
- Create a CASE expression that buckets order totals into Low (<50), Medium (50–199.99), High (200+).
- Find the top 5 countries by total revenue this quarter.
- List products never ordered (anti-join via NOT EXISTS, or LEFT JOIN ... WHERE the joined key IS NULL).
- Use a CTE to compute daily revenue, then a second query to compute a 7-day moving average with window functions.
- Compare UNION vs UNION ALL on two small derived tables; explain the difference in row counts.
- Run EXPLAIN (or your dialect’s equivalent) on a slow join; add an index and compare plans.
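For the never-ordered-products drill, two equivalent shapes are worth knowing (a sketch against the schema above; NOT EXISTS often optimizes at least as well as the LEFT JOIN form):

```sql
-- Anti-join with LEFT JOIN: unmatched products have a NULL order_id
SELECT p.id, p.name
FROM products p
LEFT JOIN order_items oi ON oi.product_id = p.id
WHERE oi.order_id IS NULL;

-- Equivalent with NOT EXISTS, often clearer about intent
SELECT p.id, p.name
FROM products p
WHERE NOT EXISTS (
  SELECT 1 FROM order_items oi WHERE oi.product_id = p.id
);
```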
Common mistakes
- Filtering aggregated results with WHERE instead of HAVING.
- Forgetting that COUNT(column) ignores NULLs while COUNT(*) counts rows.
- Using INNER JOIN when you need to keep unmatched rows (use LEFT JOIN).
- Grouping by a derived expression from SELECT without repeating it in GROUP BY (or without using positional references, if your dialect supports them).
- Assuming ORDER BY in subqueries is preserved; only the outermost ORDER BY guarantees ordering.
- Confusing UNION (deduplicates) with UNION ALL (keeps duplicates).
- Expecting indexes to help on expressions not covered by the index; consider functional or composite indexes where supported.
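Two of these mistakes are easy to see on the orders table (a sketch using the schema above):

```sql
-- COUNT(*) counts rows; COUNT(total_amount) skips NULL totals,
-- so the two can disagree when data is incomplete.
SELECT COUNT(*) AS all_rows,
       COUNT(total_amount) AS rows_with_amount
FROM orders;

-- WHERE filters rows before grouping; HAVING filters the groups.
-- Putting SUM(...) in WHERE is a syntax error in most dialects.
SELECT customer_id, SUM(total_amount) AS revenue
FROM orders
WHERE status = 'completed'          -- row-level filter
GROUP BY customer_id
HAVING SUM(total_amount) >= 1000;   -- group-level filter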
Debugging tips
- Build queries incrementally: SELECT a few columns → add WHERE → add JOINs → add GROUP BY → add HAVING → add window functions.
- Validate join cardinalities by counting rows before and after joins.
- Check NULL behavior explicitly with COALESCE or IS NULL to avoid surprises.
- Use EXPLAIN/EXPLAIN ANALYZE to see scans vs. index usage; watch for huge row estimates.
- Simplify complex logic into CTEs; name each step by its intent.
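A minimal illustration of the plan-reading tip (PostgreSQL syntax; other systems expose similar tools, e.g. EXPLAIN in MySQL or SHOWPLAN in SQL Server):

```sql
-- EXPLAIN shows the plan without running the query;
-- EXPLAIN ANALYZE runs it and reports actual row counts and timings.
EXPLAIN ANALYZE
SELECT c.country, SUM(o.total_amount) AS revenue
FROM orders o
JOIN customers c ON c.id = o.customer_id
GROUP BY c.country;

-- If the plan shows a sequential scan for the customer_id lookup,
-- an index may help; re-run EXPLAIN ANALYZE to compare plans.
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
```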
Mini project: Product performance dashboard query
Goal: Produce a single query that powers a simple dashboard with KPIs by category and month.
Requirements
- Inputs: orders, order_items, products.
- Outputs per category and month: orders, units, revenue, average order value, top product (by revenue), 7-day moving average of revenue (computed at daily granularity then aggregated).
- Use: CTEs, joins, CASE, GROUP BY, window functions.
Suggested approach
- CTE daily_sales: sum revenue per day and category.
- CTE moving_avg: compute 7-day moving average per category.
- Main select: aggregate by month and category; join the top product per category via a window function ranking.
Acceptance checklist
- Uses at least two CTEs with clear names.
- Includes one window function and one CASE expression.
- Stays reproducible when the underlying tables refresh (no hard-coded dates or IDs).
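One possible skeleton for the first two CTEs (a sketch, not the only valid shape; PostgreSQL date functions assumed — the remaining KPIs and the top-product ranking join in via further CTEs):

```sql
WITH daily_sales AS (
  -- Revenue per category per day
  SELECT p.category,
         o.order_date::date AS day,
         SUM(oi.quantity * oi.unit_price) AS revenue
  FROM orders o
  JOIN order_items oi ON oi.order_id = o.id
  JOIN products p ON p.id = oi.product_id
  WHERE o.status = 'completed'
  GROUP BY 1, 2
),
moving_avg AS (
  -- 7-day moving average at daily granularity, per category
  SELECT category, day, revenue,
         AVG(revenue) OVER (
           PARTITION BY category
           ORDER BY day
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
         ) AS revenue_7d_avg
  FROM daily_sales
)
SELECT category,
       DATE_TRUNC('month', day) AS month,
       SUM(revenue) AS revenue,
       AVG(revenue_7d_avg) AS avg_7d_revenue
FROM moving_avg
GROUP BY 1, 2
ORDER BY 1, 2;
```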
Subskills
- Data Types and Casting — Choose correct column types and convert safely to avoid truncation and precision issues.
- Filtering and Sorting (WHERE, ORDER BY) — Retrieve just the rows you need in the sequence you expect.
- Null Handling (IS NULL, COALESCE, NULLIF) — Control missing values to keep metrics correct.
- String Functions and Pattern Matching (LIKE, regex) — Clean, parse, and search text data.
- Date and Time Functions — Work with truncation, extraction, and intervals for time-based analytics.
- Conditional Logic (CASE WHEN THEN ELSE) — Create labeled segments and KPI buckets.
- Aggregations (GROUP BY, HAVING) — Summarize data into meaningful metrics.
- DML (INSERT, UPDATE, DELETE, MERGE) — Modify data carefully when needed.
- Joins (INNER/LEFT/RIGHT/FULL/CROSS) — Combine tables correctly without duplications or losses.
- Constraints and Keys (PK, FK, UNIQUE, CHECK) — Keep datasets consistent and trustworthy.
- DDL (CREATE, ALTER, DROP) — Define and evolve schemas safely.
- Normalization and Relational Modeling — Structure data for clarity and efficiency.
- Set Operations (UNION, INTERSECT, EXCEPT) — Combine result sets precisely.
- Subqueries (correlated/uncorrelated) — Express complex filters and comparisons.
- CTEs (WITH) — Organize complex queries into readable steps.
- Views and Materialized Views — Package logic for reuse and speed up reads.
- Window Functions (OVER, PARTITION BY, ORDER BY) — Compute rankings, moving averages, and shares.
- Indexes (B-tree, hash, composite, covering) — Accelerate lookups and joins.
- Transactions and Isolation Levels (ACID, locks) — Keep writes safe and consistent.
- Security and Permissions (GRANT, REVOKE, roles) — Enforce least-privilege access.
- JSON and Semi-Structured Data — Query nested attributes when schemas vary.
- Full Text Search Basics — Search human language fields effectively.
- Query Optimization (EXPLAIN/ANALYZE) — Diagnose and fix slow queries.
- Stored Procedures and Functions — Encapsulate logic close to the data.
- Triggers — Automate actions on data changes; use sparingly.
- Data Import and Export (COPY/LOAD/UNLOAD) — Move data reliably in and out.
- Backup, Restore, and Migrations — Protect and evolve production data.
- Database Monitoring and Maintenance (VACUUM/ANALYZE, stats) — Keep performance healthy.
- Partitioning Basics — Scale large tables and improve pruning.
- Sharding Basics — Distribute data across nodes; trade-offs apply.
- Vendor Dialects (PostgreSQL, MySQL, SQL Server) — Recognize syntax differences and features.
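As a taste of dialect differences, here is "truncate a date to the start of its month" in three systems (verify against your version's documentation):

```sql
-- PostgreSQL: DATE_TRUNC returns a timestamp at the month boundary
SELECT DATE_TRUNC('month', order_date) FROM orders;

-- MySQL: common idiom via formatting (returns a string)
SELECT DATE_FORMAT(order_date, '%Y-%m-01') FROM orders;

-- SQL Server (2012+): rebuild the date from its parts
SELECT DATEFROMPARTS(YEAR(order_date), MONTH(order_date), 1) FROM orders;
```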
Next steps
- Rebuild one of your regular reports using CTEs and window functions to simplify logic.
- Create a personal SQL notebook of patterns: date bucketing, top-N per group, dedup strategies.
- Move on to complementary analyst skills: spreadsheets for quick checks, Python for advanced analysis, and visualization tools for storytelling.