Why this matters
One possible approach
-- Parameters: :start_date (7 days ago)
WITH clicks_7d AS (
  SELECT url, revenue
  FROM clicks
  WHERE ts >= :start_date -- enables partition pruning and index range scan
)
SELECT url, SUM(revenue) AS rev
FROM clicks_7d
GROUP BY url
ORDER BY rev DESC
LIMIT 10;
-- Index/partition notes: partition by day on ts; index on (ts, url) if needed.
Who this is for
- Data Scientists who query large analytics tables.
- Data Analysts building dashboards and reports on big data.
- ML Engineers assembling training datasets from SQL sources.
Prerequisites
- Comfort with SELECT, WHERE, JOIN, GROUP BY, ORDER BY.
- Basic understanding of indexes and query plans (EXPLAIN).
- Familiarity with parameters (e.g., :start_date) in notebooks or apps.
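As a minimal sketch of how a named parameter such as :start_date might be bound from a notebook, the snippet below uses Python's built-in sqlite3 module. The in-memory clicks table, its sample rows, and the fixed "now" date are assumptions made for illustration; other drivers and databases may use a different placeholder syntax.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical in-memory stand-in for the `clicks` table used in this lesson.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (url TEXT, revenue REAL, ts TEXT)")

now = datetime(2024, 1, 15)  # fixed "now" so the example is reproducible
rows = [
    ("/a", 5.0, (now - timedelta(days=1)).isoformat()),
    ("/b", 3.0, (now - timedelta(days=2)).isoformat()),
    ("/a", 2.0, (now - timedelta(days=30)).isoformat()),  # outside the 7-day window
]
conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", rows)

# Compute the boundary once in the application and pass it as a bound value,
# so the WHERE clause compares the raw column against a constant.
start_date = (now - timedelta(days=7)).isoformat()

top_urls = conn.execute(
    """
    SELECT url, SUM(revenue) AS rev
    FROM clicks
    WHERE ts >= :start_date
    GROUP BY url
    ORDER BY rev DESC
    LIMIT 10
    """,
    {"start_date": start_date},  # dict keys match the :name placeholders
).fetchall()

print(top_urls)
```

Binding the value instead of formatting it into the SQL string keeps the query text stable across runs and avoids quoting mistakes.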
Learning path
- Read plans with EXPLAIN; identify scans, sorts, and cardinality estimates.
- Practice writing SARGable filters: keep indexed columns free of functions in WHERE clauses so the planner can use an index.
- Adopt keyset pagination for ordered feeds.
- Pre-aggregate before joining; confirm reduced row counts with EXPLAIN.
- Experiment with composite indexes on common filters and join keys.
- Try partition pruning on date-ranged queries.
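The first two steps of this path can be tried together in a small sketch. It uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module to compare a SARGable filter with one that wraps the column in a function. The table, the index name idx_clicks_ts_url, and the helper plan() are hypothetical, and the exact plan wording varies by SQLite version and by database engine.

```python
import sqlite3

# Hypothetical schema: a clicks table with a composite index on (ts, url).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (url TEXT, revenue REAL, ts TEXT)")
conn.execute("CREATE INDEX idx_clicks_ts_url ON clicks (ts, url)")


def plan(sql, params):
    """Return the human-readable 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


params = {"start_date": "2024-01-08"}

# SARGable: the raw column is compared to a constant, so a range search is possible.
sargable_plan = plan("SELECT url FROM clicks WHERE ts >= :start_date", params)

# Not SARGable: date(ts) must be computed for every row before comparing.
wrapped_plan = plan("SELECT url FROM clicks WHERE date(ts) >= :start_date", params)

print("sargable:", sargable_plan)
print("wrapped: ", wrapped_plan)
```

In SQLite, a plan line beginning with SEARCH indicates an index range lookup, while SCAN means every row (or every index entry) is visited. Other engines report the same distinction with different vocabulary, for example index scan versus sequential scan.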
Practical projects
- Build a fast KPI query pack (daily revenue, active users, conversion rate) and document the EXPLAIN plan for each query.
- Refactor a dashboard’s slowest query using pre-aggregation and keyset pagination where applicable.
- Create a benchmarking notebook that compares naive vs optimized queries on synthetic 100M-row tables.
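For the dashboard refactor, keyset pagination could look roughly like the sketch below. Instead of OFFSET, each page request carries the sort key of the last row already seen. The clicks table with an id column, the index name, and the fetch_page() helper are assumptions for illustration, run here against SQLite.

```python
import sqlite3

# Hypothetical feed table: ten rows spread over five days.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (id INTEGER PRIMARY KEY, url TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO clicks (id, url, ts) VALUES (?, ?, ?)",
    [(i, f"/page/{i}", f"2024-01-{(i % 5) + 1:02d}") for i in range(1, 11)],
)
# The index matches the sort order, with id as a tiebreaker for equal timestamps.
conn.execute("CREATE INDEX idx_clicks_ts_id ON clicks (ts, id)")


def fetch_page(last_key, page_size):
    """Return the next page ordered by (ts DESC, id DESC), strictly after last_key."""
    if last_key is None:
        sql = "SELECT ts, id, url FROM clicks ORDER BY ts DESC, id DESC LIMIT :n"
        params = {"n": page_size}
    else:
        # Expanded form of the row comparison (ts, id) < (:ts, :id).
        sql = """
            SELECT ts, id, url FROM clicks
            WHERE ts < :ts OR (ts = :ts AND id < :id)
            ORDER BY ts DESC, id DESC
            LIMIT :n
        """
        params = {"ts": last_key[0], "id": last_key[1], "n": page_size}
    return conn.execute(sql, params).fetchall()


# Walk the whole feed four rows at a time.
pages = []
last_key = None
while True:
    page = fetch_page(last_key, 4)
    if not page:
        break
    pages.append(page)
    last_key = (page[-1][0], page[-1][1])  # (ts, id) of the last row seen

all_ids = [row[1] for page in pages for row in page]
print(all_ids)
```

The tiebreaker column matters: paging on ts alone would skip or repeat rows that share a timestamp at a page boundary.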
Next steps
- Learn partition strategies and maintenance (rolling windows, VACUUM/ANALYZE where applicable).
- Try approximate techniques for exploratory analysis (sampling, HyperLogLog where available).
- Automate query regression checks: compare plans, timings, and result equality after refactors.
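A result-equality check for a refactor might start as small as the sketch below. It compares a join-then-aggregate query with a pre-aggregated version on a tiny fixture. The pages and clicks tables, the revenue_cents column, and the helper names are hypothetical. Integer cents are used so that a different summation order cannot introduce floating-point differences.

```python
import sqlite3

# Hypothetical fixture: pages.url is unique, so pre-aggregating clicks by url is safe.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (url TEXT PRIMARY KEY, category TEXT);
    CREATE TABLE clicks (url TEXT, revenue_cents INTEGER);
    INSERT INTO pages VALUES ('/a', 'blog'), ('/b', 'blog'), ('/c', 'shop');
    INSERT INTO clicks VALUES
        ('/a', 500), ('/a', 250), ('/b', 100), ('/c', 900), ('/c', 50);
""")

# Before: join every click row to its page, then aggregate.
NAIVE = """
    SELECT p.category, SUM(c.revenue_cents) AS rev
    FROM clicks c
    JOIN pages p ON p.url = c.url
    GROUP BY p.category
"""

# After: collapse clicks to one row per url first, then join the smaller set.
REFACTORED = """
    WITH rev_by_url AS (
        SELECT url, SUM(revenue_cents) AS rev
        FROM clicks
        GROUP BY url
    )
    SELECT p.category, SUM(r.rev) AS rev
    FROM rev_by_url r
    JOIN pages p ON p.url = r.url
    GROUP BY p.category
"""


def normalized(sql):
    """Run a query and sort its rows so comparison ignores output order."""
    return sorted(conn.execute(sql).fetchall())


def results_match(before_sql, after_sql):
    """Return True when both queries produce the same set of rows."""
    return normalized(before_sql) == normalized(after_sql)


naive_rows = normalized(NAIVE)
check_passed = results_match(NAIVE, REFACTORED)
print(naive_rows, check_passed)
```

The same harness could be extended to record EXPLAIN output and wall-clock timings for each version, which covers the plan and timing comparisons as well.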