Why this matters
As a Data Analyst, you often run queries on large tables: finding recent orders, filtering by customer segments, building dashboards, and preparing A/B reports. The difference between a 50 ms query and a 50 s query is usually the right index. Knowing when to use B-tree, Hash, Composite, and Covering indexes helps you:
- Speed up filters, joins, and ORDER BY/LIMIT queries.
- Cut compute costs by avoiding full table scans.
- Make dashboards responsive and ad-hoc analysis snappy.
Concept explained simply
An index is a compact structure that lets the database find rows quickly without scanning the whole table, much like a book's index. Different index types excel at different query patterns (a minimal DDL sketch follows this list):
- B-tree index: The default in many systems. Great for equality (=) and range (>, <, BETWEEN), and helps with ORDER BY on the indexed columns.
- Hash index: Optimized for equality lookups only (e.g., WHERE token = ...). Not for range, sorting, or prefix searches. Availability and behavior vary by database engine.
- Composite index: An index on multiple columns (e.g., (customer_id, order_date)). The leftmost (leading) column matters for how the index is used.
- Covering index: An index that contains all columns needed by a query, so the database can answer from the index alone (index-only scan) without touching the table. In some systems you can add non-key columns to cover a query (e.g., INCLUDE).
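To make these concrete, here is a minimal DDL sketch in PostgreSQL-flavored syntax. It reuses the orders and sessions tables from the worked examples below; the index names are arbitrary, and Hash and INCLUDE support varies by engine.
-- B-tree (the default access method when none is specified)
CREATE INDEX idx_orders_date ON orders (order_date);
-- Hash (equality-only lookups)
CREATE INDEX idx_sessions_token_hash ON sessions USING HASH (session_token);
-- Composite (the leftmost column, customer_id, leads)
CREATE INDEX idx_orders_cust_date ON orders (customer_id, order_date);
-- Covering (a non-key column added so matching queries can be answered from the index alone)
CREATE INDEX idx_orders_cust_date_amt ON orders (customer_id, order_date) INCLUDE (total_amount);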
Mental model
- Seek vs Scan: Without an index, the database scans every row. With the right index, it seeks directly to matching entries, reads fewer pages, and returns results faster.
- Selectivity: Indexes are most useful when a predicate filters to a small subset of rows (high selectivity); see the quick check after this list.
- Leading column rule: For composite indexes, the order of columns matters. The index is most effective when your WHERE and ORDER BY use the leftmost columns.
- Trade-offs: Indexes accelerate reads but slow down writes (INSERT/UPDATE/DELETE) and use storage. Add only those that deliver real benefit.
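A quick way to gauge selectivity before adding an index is to compare distinct values with total rows. A minimal sketch, assuming an orders table with a status column (the 1.0 factor just forces decimal division):
-- Average selectivity: values close to 1.0 suggest a highly selective column
SELECT COUNT(DISTINCT status) * 1.0 / COUNT(*) AS avg_selectivity,
       COUNT(*) AS total_rows
FROM orders;
-- Check skew: one dominant value (e.g., 95% 'completed') makes the column a poor
-- candidate for a standalone index
SELECT status, COUNT(*) AS rows_with_value
FROM orders
GROUP BY status
ORDER BY rows_with_value DESC;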
Worked examples
Example 1 — B-tree index for range + sort
Task: Show the latest 20 orders for a given customer quickly.
-- Query
SELECT order_id, order_date, total_amount
FROM orders
WHERE customer_id = 123
AND order_date >= DATE '2024-01-01'
ORDER BY order_date DESC
LIMIT 20;
-- Index (works in most engines; the DESC modifier is optional, since a B-tree can also be scanned backward)
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date DESC);
Why this works: The composite B-tree lets the engine find the subset for customer_id, then read the date range in order, making ORDER BY + LIMIT efficient.
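To confirm the planner actually uses the index, check the plan. A sketch in PostgreSQL syntax (option names and output wording differ in other engines):
EXPLAIN (ANALYZE, BUFFERS)
SELECT order_id, order_date, total_amount
FROM orders
WHERE customer_id = 123
  AND order_date >= DATE '2024-01-01'
ORDER BY order_date DESC
LIMIT 20;
-- Expect an Index Scan on idx_orders_customer_date feeding the Limit,
-- with no separate Sort node and far fewer pages read than a sequential scan.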
Example 2 — Hash index for equality-only lookups
Task: Fetch a user session by exact token.
-- Query
SELECT user_id, last_seen
FROM sessions
WHERE session_token = 'abc123';
-- Option A (engines that support hash indexes, e.g., PostgreSQL):
CREATE INDEX idx_sessions_token_hash ON sessions USING HASH (session_token);
-- Option B (portable fallback):
CREATE INDEX idx_sessions_token ON sessions (session_token);
Use a Hash index only for equality comparisons, and only when your engine actually benefits from it; otherwise, a B-tree on the column is the safe choice.
Example 3 — Covering index for index-only scans
Task: A dashboard lists order_id and total_amount for a customer’s last 50 orders by date.
-- Query
SELECT order_id, total_amount
FROM orders
WHERE customer_id = 123
ORDER BY order_date DESC
LIMIT 50;
-- PostgreSQL (cover with INCLUDE; list order_id too, because PostgreSQL does not
-- add the primary key to secondary indexes automatically)
CREATE INDEX idx_orders_cover ON orders (customer_id, order_date DESC) INCLUDE (order_id, total_amount);
-- MySQL/InnoDB (no INCLUDE keyword)
-- (customer_id, order_date, total_amount) covers the query because InnoDB secondary
-- indexes implicitly carry the primary key (here assumed to be order_id)
CREATE INDEX idx_orders_cover ON orders (customer_id, order_date, total_amount);
-- SQL Server (INCLUDE non-key columns; order_id is carried implicitly if it is the
-- clustered key, otherwise add it to the INCLUDE list)
CREATE INDEX idx_orders_cover ON dbo.orders (customer_id, order_date DESC) INCLUDE (total_amount);
When fully covered, the engine can avoid visiting the base table, reducing I/O and latency.
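You can confirm coverage in the plan; the wording differs by engine, so treat this as a sketch:
-- PostgreSQL: expect "Index Only Scan using idx_orders_cover"
-- (index-only scans also rely on the visibility map, so a recently vacuumed table helps)
EXPLAIN
SELECT order_id, total_amount
FROM orders
WHERE customer_id = 123
ORDER BY order_date DESC
LIMIT 50;
-- MySQL: the same EXPLAIN shows "Using index" in the Extra column when covered.
-- SQL Server: the actual plan shows an Index Seek with no Key Lookup.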
Example 4 — Composite index order matters
Task: Two queries on events(user_id, occurred_at, event_type):
-- Q1: filter by user, then time range
SELECT * FROM events
WHERE user_id = 7 AND occurred_at BETWEEN NOW() - INTERVAL '7 days' AND NOW();
-- Q2: filter by time range only
SELECT count(*) FROM events
WHERE occurred_at >= NOW() - INTERVAL '1 day';
-- Index choice
CREATE INDEX idx_events_user_time ON events (user_id, occurred_at);
This index is great for Q1 (matches leading user_id then uses occurred_at). For Q2 (time-only filter), you may also want an index on occurred_at alone if it’s a frequent query.
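If Q2 runs often, a separate single-column index serves the time-only filter. A minimal sketch (the index name is arbitrary), to be weighed against the extra write cost:
-- Serves range filters on occurred_at alone (Q2)
CREATE INDEX idx_events_time ON events (occurred_at);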
How to choose the right index (quick steps)
- Identify the WHERE columns and the required sort order.
- Choose B-tree or Hash deliberately: B-tree for ranges and sorting, Hash for equality only.
- Order composite columns to match how the query filters and sorts.
- Cover the query to avoid table lookups when it helps.
- Verify with EXPLAIN and row/page reads.
Exercises you can do now
These mirror the graded exercises below. Run them on a realistic dataset if possible.
- Composite + covering. Speed up: SELECT order_id, total_amount FROM sales WHERE customer_id = ? AND order_date BETWEEN ? AND ? ORDER BY order_date DESC LIMIT 50; Create a composite B-tree and make it covering.
- Equality-heavy lookup. Speed up: SELECT user_id FROM api_keys WHERE api_key = ?; Try a Hash index if supported, otherwise a B-tree, and compare EXPLAIN before/after (a sketch follows this list).
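A minimal before/after workflow for the equality-heavy exercise might look like this; api_keys comes from the exercise, the literal key and index names are placeholders, and USING HASH is PostgreSQL syntax:
-- Before: capture the baseline plan (typically a sequential / full table scan)
EXPLAIN ANALYZE
SELECT user_id FROM api_keys WHERE api_key = 'example-key';
-- Add the index (Hash where supported, otherwise the portable B-tree fallback)
CREATE INDEX idx_api_keys_key_hash ON api_keys USING HASH (api_key);
-- CREATE INDEX idx_api_keys_key ON api_keys (api_key);
-- After: re-run the same EXPLAIN ANALYZE and compare scan type, rows read, and timing
EXPLAIN ANALYZE
SELECT user_id FROM api_keys WHERE api_key = 'example-key';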
Common mistakes and how to self-check
- Indexing low-selectivity columns alone (e.g., status with 95% of rows the same). Self-check: Does the predicate filter to a small fraction?
- Wrong composite order. Self-check: Do WHERE clauses use the leftmost columns? Does the ORDER BY align with the index order?
- Overlapping/duplicate indexes. Self-check: Can one composite index serve multiple queries? Remove redundant ones (the catalog query after this list can help).
- Expecting Hash to help ranges or sorting. Self-check: Does the query use only equality? If not, prefer B-tree.
- Ignoring write cost. Self-check: Are inserts/updates slowing down after adding indexes?
- Not validating with EXPLAIN. Self-check: Confirm Index Seek/Index Scan and row estimates match expectations.
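To back up the redundant-index and write-cost checks, the database's own statistics help. A PostgreSQL-flavored sketch (SQL Server offers sys.dm_db_index_usage_stats for the same purpose):
-- Indexes that are never (or rarely) scanned are candidates for review
SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan, relname;
-- Compare definitions to spot overlapping indexes before dropping anything
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexname;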
Self-check mini task
Pick 2 of your slowest queries. For each, list WHERE columns, joins, ORDER BY. Propose one index. Use EXPLAIN to confirm usage. If not used, ask: wrong order? low selectivity? already covered by another index?
Practical projects
- Dashboard turbocharge: Take a BI dashboard with 3 slow widgets. Add/adjust indexes so all load under 500 ms. Document each index, expected benefit, and EXPLAIN before/after.
- Query coverage audit: For a reporting schema, list top 10 queries and map them to existing indexes. Remove two redundant indexes safely after verifying no regressions.
- A/B analysis accelerator: Create composite indexes that speed up filtering by experiment_id and event timestamps across two large tables; show join improvements.
Mini challenge
You have a query: SELECT order_id, order_date FROM orders WHERE store_id = 77 AND order_date >= CURRENT_DATE - INTERVAL '30 days' ORDER BY order_date DESC LIMIT 100; Propose one index that helps both filtering and sorting, and say whether it could be covering if the SELECT also needs total_amount.
Suggested answer
CREATE INDEX idx_orders_store_date ON orders (store_id, order_date DESC); To cover total_amount as well, add INCLUDE (order_id, total_amount) where supported, or append the extra columns to the key in engines without INCLUDE; engines whose secondary indexes carry the primary key (such as InnoDB) may not need order_id listed explicitly.
Who this is for
- Data Analysts who run frequent filters/aggregations on large tables.
- Anyone tuning SQL for dashboards, reports, or ad-hoc analysis.
Prerequisites
- Comfort with SELECT, WHERE, JOIN, GROUP BY, ORDER BY.
- Know how to run EXPLAIN/EXPLAIN ANALYZE in your database.
Learning path
- Review slow queries and their predicates.
- Learn B-tree basics and the leading column rule.
- Practice composite indexes with ORDER BY + LIMIT.
- Explore covering indexes and measure index-only scans.
- Compare equality-only patterns with Hash vs B-tree.
- Clean up redundant indexes and measure write overhead.
Next steps
- Do the exercises below and verify with EXPLAIN.
- Take the quick test to check your understanding.
- Apply indexes on a real dataset and record latency improvements.