
SQL

Learn SQL for data analysts for free: roadmap, examples, subskills, and a skill exam.

Published: December 19, 2025 | Updated: December 19, 2025

Why SQL matters for Data Analysts

SQL is the primary way data analysts access, clean, and aggregate data stored in relational databases and data warehouses. Mastering SQL unlocks faster insights, reliable dashboards, ad-hoc analyses, and collaboration with engineers. You will query tables, join datasets, build metrics, audit data quality, and deliver trustworthy answers.

Who this is for

  • Aspiring data analysts needing a practical, project-ready SQL foundation.
  • Analysts who can write simple queries and want to advance to window functions, optimization, and modeling.
  • Professionals in adjacent roles (product, marketing, operations) who need to pull and aggregate data reliably.

Prerequisites

  • Comfort with basic math and spreadsheets.
  • Familiarity with rows, columns, and CSVs.
  • Optional but helpful: a local or cloud SQL environment (any vendor is fine; syntax differences are small for core topics).

Learning path

Milestone 1: Read data reliably — selection, filtering, sorting, handling NULLs, and basic functions.
Milestone 2: Summarize data — GROUP BY, HAVING, CASE, date functions.
Milestone 3: Combine data — joins, subqueries, CTEs, set operations.
Milestone 4: Analytics power-ups — window functions, views, materialized views.
Milestone 5: Make it fast and safe — indexes, EXPLAIN/ANALYZE, transactions, permissions.
Milestone 6: Production readiness — imports/exports, backups, monitoring, partitioning, vendor-specific nuances.

Quick study plan (2–4 weeks)
  • Days 1–3: SELECT, WHERE, ORDER BY, NULLs, strings, dates.
  • Days 4–6: CASE, GROUP BY, HAVING, joins.
  • Days 7–9: Subqueries, CTEs, set operations; mini drills daily.
  • Days 10–12: Window functions; build a summary dashboard query.
  • Days 13–14: Indexes, EXPLAIN, import/export; finish a mini-project.

Worked examples

Assume an e-commerce schema:

customers(id, name, country)
orders(id, customer_id, order_date, status, total_amount)
order_items(order_id, product_id, quantity, unit_price)
products(id, name, category, price)

1) Basic filtering and sorting: completed orders from the last 7 days
SELECT id, customer_id, order_date, total_amount
FROM orders
WHERE status = 'completed'
  AND order_date >= CURRENT_DATE - INTERVAL '7 days'
ORDER BY order_date DESC;

Why it matters: fast ad-hoc pulls for recent performance.

2) Aggregate revenue by country with a threshold
SELECT c.country, SUM(o.total_amount) AS revenue
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.country
HAVING SUM(o.total_amount) >= 50000
ORDER BY revenue DESC;

Why it matters: focus on meaningful segments; HAVING filters groups.

3) Monthly cohorts with CASE and date functions
SELECT DATE_TRUNC('month', o.order_date) AS month,
       CASE WHEN o.total_amount >= 100 THEN 'AOV 100+' ELSE 'AOV < 100' END AS aov_bucket,
       COUNT(*) AS orders,
       SUM(o.total_amount) AS revenue
FROM orders o
WHERE o.status = 'completed'
GROUP BY 1, 2
ORDER BY 1, 2;

Why it matters: quick cohort and KPI breakdowns.

4) Top 3 products per category using window functions
WITH sales AS (
  SELECT oi.product_id,
         p.category,
         SUM(oi.quantity * oi.unit_price) AS revenue
  FROM order_items oi
  JOIN products p ON p.id = oi.product_id
  GROUP BY 1, 2
)
SELECT category, product_id, revenue
FROM (
  SELECT category, product_id, revenue,
         ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS rn
  FROM sales
) t
WHERE rn <= 3
ORDER BY category, revenue DESC;

Why it matters: rank within groups without subqueries for every category.

5) Detect potential duplicates with window functions
SELECT id, name, country,
       COUNT(*) OVER (PARTITION BY LOWER(TRIM(name)), country) AS dup_count
FROM customers
ORDER BY dup_count DESC, name;

Why it matters: data-quality checks before reporting.

Hands-on drills

  • Return the 20 most recent completed orders with customer names.
  • Create a CASE expression that buckets order totals into Low (<50), Medium (50–199.99), High (200+).
  • Find the top 5 countries by total revenue this quarter.
  • List products never ordered (anti-join, or LEFT JOIN ... WHERE ... IS NULL).
  • Use a CTE to compute daily revenue, then a second query to compute a 7-day moving average with window functions.
  • Compare UNION vs UNION ALL on two small derived tables; explain the difference in row counts.
  • Run EXPLAIN (or your dialect’s equivalent) on a slow join; add an index and compare plans.
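For the moving-average drill, one possible sketch against the orders table above (PostgreSQL-style syntax; the `::date` cast and frame clause may need adjusting in other dialects):

```sql
-- CTE: collapse completed orders to one revenue row per day.
WITH daily AS (
  SELECT o.order_date::date AS day,
         SUM(o.total_amount) AS revenue
  FROM orders o
  WHERE o.status = 'completed'
  GROUP BY 1
)
-- Window: average the current day plus the 6 preceding days.
SELECT day,
       revenue,
       AVG(revenue) OVER (
         ORDER BY day
         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS revenue_7d_avg
FROM daily
ORDER BY day;
```

Note the explicit ROWS frame: the default RANGE frame behaves differently when days are missing, so days with no sales simply shrink the window rather than contributing zeros.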

Common mistakes

  • Filtering aggregated results with WHERE instead of HAVING.
  • Forgetting that COUNT(column) ignores NULLs while COUNT(*) counts rows.
  • Using INNER JOIN when you need to keep unmatched rows (use LEFT JOIN).
  • Referencing a derived expression in SELECT without adding it to GROUP BY (or without using positional references, where your dialect supports them).
  • Assuming ORDER BY in subqueries is preserved; only the outermost ORDER BY guarantees ordering.
  • Confusing UNION (deduplicates) with UNION ALL (keeps duplicates).
  • Expecting indexes to help on expressions not covered by the index; consider functional or composite indexes where supported.
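The COUNT pitfall above is easy to see side by side. A minimal sketch against the orders table, assuming for illustration that total_amount can be NULL (the schema above does not say either way):

```sql
-- COUNT(*) counts rows; COUNT(column) skips rows where that column is NULL.
SELECT COUNT(*)                         AS all_orders,
       COUNT(total_amount)              AS orders_with_amount,  -- NULLs excluded
       COUNT(*) - COUNT(total_amount)   AS missing_amounts
FROM orders;
```

If missing_amounts is nonzero, averages computed as SUM/COUNT will differ depending on which COUNT you use.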

Debugging tips

  • Build queries incrementally: SELECT few columns → add WHERE → add JOINs → add GROUP BY → add HAVING → add windows.
  • Validate join cardinalities by counting rows before and after joins.
  • Check NULL behavior explicitly with COALESCE or IS NULL to avoid surprises.
  • Use EXPLAIN/EXPLAIN ANALYZE to see scans vs. index usage; watch for huge row estimates.
  • Simplify complex logic into CTEs; name each step by its intent.
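The cardinality check in the second tip can be as simple as two counts, run against the schema above:

```sql
-- Baseline row count before the join.
SELECT COUNT(*) AS orders_before
FROM orders;

-- Row count after joining to a one-to-many table.
SELECT COUNT(*) AS rows_after_join
FROM orders o
JOIN order_items oi ON oi.order_id = o.id;
```

Some fan-out is expected here (each order has multiple items), but an unexpectedly large rows_after_join on a join you believed was one-to-one signals duplicate keys, and any SUMs taken after that join will be inflated.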

Mini project: Product performance dashboard query

Goal: Produce a single query that powers a simple dashboard with KPIs by category and month.

Requirements
  • Inputs: orders, order_items, products.
  • Outputs per category and month: orders, units, revenue, average order value, top product (by revenue), 7-day moving average of revenue (computed at daily granularity then aggregated).
  • Use: CTEs, joins, CASE, GROUP BY, window functions.

Suggested approach
  1. CTE daily_sales: sum revenue per day and category.
  2. CTE moving_avg: compute 7-day moving average per category.
  3. Main select: aggregate by month and category; join the top product per category via a window function ranking.

Acceptance checklist
  • Uses at least two CTEs with clear names.
  • Includes one window function and one CASE expression.
  • Remains reproducible when the underlying tables refresh.
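One possible skeleton for the first two CTEs (PostgreSQL-style syntax; orders, units, average order value, the CASE bucket, and the top-product ranking are left for you to add):

```sql
-- CTE 1: revenue per category per day from completed orders.
WITH daily_sales AS (
  SELECT p.category,
         o.order_date::date AS day,
         SUM(oi.quantity * oi.unit_price) AS revenue
  FROM orders o
  JOIN order_items oi ON oi.order_id = o.id
  JOIN products p     ON p.id = oi.product_id
  WHERE o.status = 'completed'
  GROUP BY 1, 2
),
-- CTE 2: 7-day moving average of revenue within each category.
moving_avg AS (
  SELECT category, day, revenue,
         AVG(revenue) OVER (
           PARTITION BY category
           ORDER BY day
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
         ) AS revenue_7d_avg
  FROM daily_sales
)
-- Main select: roll daily rows up to category x month.
SELECT category,
       DATE_TRUNC('month', day) AS month,
       SUM(revenue)             AS revenue,
       AVG(revenue_7d_avg)      AS avg_7d_revenue
FROM moving_avg
GROUP BY 1, 2
ORDER BY 1, 2;
```

Computing the moving average at daily granularity first, then aggregating to month, matches the requirement above; averaging monthly totals directly would give a different (wrong) number.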

Subskills

  • Data Types and Casting — Choose correct column types and convert safely to avoid truncation and precision issues.
  • Filtering and Sorting (WHERE, ORDER BY) — Retrieve just the rows you need in the sequence you expect.
  • Null Handling (IS NULL, COALESCE, NULLIF) — Control missing values to keep metrics correct.
  • String Functions and Pattern Matching (LIKE, regex) — Clean, parse, and search text data.
  • Date and Time Functions — Work with truncation, extraction, and intervals for time-based analytics.
  • Conditional Logic (CASE WHEN THEN ELSE) — Create labeled segments and KPI buckets.
  • Aggregations (GROUP BY, HAVING) — Summarize data into meaningful metrics.
  • DML (INSERT, UPDATE, DELETE, MERGE) — Modify data carefully when needed.
  • Joins (INNER/LEFT/RIGHT/FULL/CROSS) — Combine tables correctly without duplications or losses.
  • Constraints and Keys (PK, FK, UNIQUE, CHECK) — Keep datasets consistent and trustworthy.
  • DDL (CREATE, ALTER, DROP) — Define and evolve schemas safely.
  • Normalization and Relational Modeling — Structure data for clarity and efficiency.
  • Set Operations (UNION, INTERSECT, EXCEPT) — Combine result sets precisely.
  • Subqueries (correlated/uncorrelated) — Express complex filters and comparisons.
  • CTEs (WITH) — Organize complex queries into readable steps.
  • Views and Materialized Views — Package logic for reuse and speed up reads.
  • Window Functions (OVER, PARTITION BY, ORDER BY) — Compute rankings, moving averages, and shares.
  • Indexes (B-tree, hash, composite, covering) — Accelerate lookups and joins.
  • Transactions and Isolation Levels (ACID, locks) — Keep writes safe and consistent.
  • Security and Permissions (GRANT, REVOKE, roles) — Enforce least-privilege access.
  • JSON and Semi-Structured Data — Query nested attributes when schemas vary.
  • Full Text Search Basics — Search human language fields effectively.
  • Query Optimization (EXPLAIN/ANALYZE) — Diagnose and fix slow queries.
  • Stored Procedures and Functions — Encapsulate logic close to the data.
  • Triggers — Automate actions on data changes; use sparingly.
  • Data Import and Export (COPY/LOAD/UNLOAD) — Move data reliably in and out.
  • Backup, Restore, and Migrations — Protect and evolve production data.
  • Database Monitoring and Maintenance (VACUUM/ANALYZE, stats) — Keep performance healthy.
  • Partitioning Basics — Scale large tables and improve pruning.
  • Sharding Basics — Distribute data across nodes; trade-offs apply.
  • Vendor Dialects (PostgreSQL, MySQL, SQL Server) — Recognize syntax differences and features.

Next steps

  • Rebuild one of your regular reports using CTEs and window functions to simplify logic.
  • Create a personal SQL notebook of patterns: date bucketing, top-N per group, dedup strategies.
  • Move on to complementary analyst skills: spreadsheets for quick checks, Python for advanced analysis, and visualization tools for storytelling.

SQL — Skill Exam

This exam checks practical SQL for data analysis: reading data, aggregations, joins, window functions, and performance basics. You can take it for free. Progress and results are saved for logged-in users; guests can still complete the exam and see their score at the end. Rules: open notes, no time limit, focus on accuracy and reasoning. For code items, write valid SQL in any common dialect; avoid vendor-only features unless stated.

12 questions · 70% to pass
