Menu

Topic 7 of 8

Indexing Concepts

Learn Indexing Concepts for free with explanations, exercises, and a quick test (for Backend Engineer).

Published: January 20, 2026 | Updated: January 20, 2026

Who this is for

Backend Engineers who write queries, design schemas, and need fast, predictable performance in production APIs, jobs, and analytics read paths.

Prerequisites

  • Basic SQL (SELECT, WHERE, ORDER BY, JOIN)
  • Understanding of primary keys and foreign keys
  • Familiarity with your database’s query plan viewer (e.g., EXPLAIN)

Why this matters

Indexes are the difference between millisecond responses and timeouts. In real work you will:

  • Speed up API endpoints that filter by email, status, or date ranges.
  • Scale background jobs that scan large tables (orders, events, logs).
  • Cut cloud costs by reducing full-table scans and CPU spikes.
  • Keep SLAs during peak traffic while supporting new features.

Concept explained simply

An index is a data structure that helps the database find rows faster, like a book’s index. Instead of scanning every row, the DB jumps to the right area and narrows down quickly.

Mental model

  • B-tree index: a sorted map of keys to row locations. Good for equality, ranges, and sorting.
  • Composite index (a, b, c): effectively sorted by a, then b, then c. The leftmost prefix rule applies.
  • Every index speeds up reads but adds write cost (inserts/updates/deletes must maintain it).
Index types at a glance (when to use)
  • B-tree: default. Equality, ranges, ordering.
  • Hash: equality only (engine-specific usefulness).
  • GIN: membership/search for arrays, JSON, full-text.
  • GiST: geometric, ranges, nearest-neighbor.
  • BRIN: large, append-only tables with natural ordering (time-series).

Core concepts that guide index design

  • Selectivity and cardinality: High selectivity (few rows match) makes indexes more effective.
  • Composite indexes and leftmost prefix: (a, b, c) can use a; or a and b; it cannot start on b without a.
  • Order support: B-tree can satisfy ORDER BY if it matches index order and filters.
  • Covering index: If the index holds all needed columns, the DB can avoid visiting the table (index-only scan).
  • Write overhead: Each additional index slows writes and increases storage; only add what you need.
  • Partial index: Index just the hot subset (e.g., WHERE deleted_at IS NULL).
  • Unique index: Enforces uniqueness and speeds up lookups.
  • Functional/expressions index: Index on lower(email) or date(order_date) if queries use that expression.
  • Nulls behavior: Some engines treat NULLs specially; check how your DB handles them in indexes.

Worked examples

1) User lookup by email (equality)

-- Query
SELECT id FROM users WHERE email = 'a@b.com';
-- Index
CREATE UNIQUE INDEX idx_users_email ON users(email);

Why: High selectivity; unique guarantee; speeds logins and account lookups.

2) Orders by customer with status and recent date (filter + sort)

-- Query
SELECT *
FROM orders
WHERE customer_id = $1 AND status = 'paid' AND order_date >= now() - interval '30 days'
ORDER BY order_date DESC
LIMIT 50;

-- Candidate composite index (PostgreSQL-like)
CREATE INDEX idx_orders_customer_status_date
  ON orders(customer_id, status, order_date DESC);

Why: Matches leftmost prefix (customer_id, status) and supports the DESC ordering on order_date to avoid a sort.

3) Search by tags membership (arrays/JSON)

-- Query (PostgreSQL-style arrays)
SELECT id FROM tickets WHERE tags @> ARRAY['urgent'];

-- Index (GIN for membership)
CREATE INDEX idx_tickets_tags_gin ON tickets USING GIN(tags);

Why: GIN accelerates contains/overlap on arrays and JSON paths.

4) Partial index on active rows (soft delete)

-- Query
SELECT id FROM posts WHERE deleted_at IS NULL AND slug = $1;

-- Partial unique index only for active rows
CREATE UNIQUE INDEX idx_posts_active_slug
  ON posts(slug)
  WHERE deleted_at IS NULL;

Why: Keeps index small and enforces uniqueness only among active posts.

How to choose an index (practical steps)

  1. Identify top queries: High-traffic endpoints and slow queries.
  2. Extract patterns: Predicates in WHERE, JOIN keys, ORDER BY, LIMIT.
  3. Estimate selectivity: Prefer indexing columns that sharply reduce rows.
  4. Pick index type: B-tree for most; GIN/GiST/BRIN for special cases.
  5. Choose column order: Most selective and consistently used predicates first; then sort columns.
  6. Cover if needed: Add included columns for index-only scans when beneficial.
  7. Validate: Use your DB’s EXPLAIN/plan viewer; confirm index is chosen.
  8. Monitor: Track latency, CPU, and index bloat; prune unused indexes.
Quick self-estimation trick

Ask: Does this index let the DB jump directly to a tiny fraction of rows and return them already sorted? If yes, it’s likely a win.

Exercises

Try these. Solutions are hidden under each exercise. The quick test is available to everyone; log in to save your progress.

Exercise 1 — Feed page for published posts

Query pattern:

SELECT id, title, published_at
FROM posts
WHERE published = true AND category_id = $1
ORDER BY published_at DESC
LIMIT 20;
  • Design an index to make this fast without an extra sort.
  • Assume many categories, and only ~10% of posts are published.
Show solution
CREATE INDEX idx_posts_feed
  ON posts(category_id, published, published_at DESC);

-- Why
-- 1) Filters on category_id and published are satisfied by the leftmost prefix.
-- 2) DESC order on published_at matches the index to avoid a sort.
-- 3) Consider a partial index if queries always use published = true:
--    CREATE INDEX idx_posts_feed_active
--      ON posts(category_id, published_at DESC)
--      WHERE published = true;

Exercise 2 — Top purchasers last 7 days

Query pattern (PostgreSQL-like):

SELECT user_id, count(*) AS purchases
FROM events
WHERE type = 'purchase' AND created_at >= now() - interval '7 days'
GROUP BY user_id
ORDER BY purchases DESC
LIMIT 50;
  • Propose an index that narrows scanning and helps an index-only scan.
  • Table is append-only and large.
Show solution
-- Focus on filtering first: type and recent created_at.
-- Cover user_id to avoid table visits when possible.

CREATE INDEX idx_events_purchase_recent
  ON events(type, created_at DESC)
  INCLUDE (user_id);

-- Optional: make it partial to shrink the index (if type filter is always used):
-- CREATE INDEX idx_events_purchase_recent_partial
--   ON events(created_at DESC)
--   INCLUDE (user_id)
--   WHERE type = 'purchase';

-- Notes
-- - The dynamic 7-day boundary still benefits from the created_at ordering.
-- - INCLUDE (user_id) supports index-only scans for counting per user.
-- - For very large time-series tables, consider BRIN on created_at in addition to the above.
  • [Checklist] Does your index match the WHERE predicates’ leftmost prefix?
  • [Checklist] If you ORDER BY, does index order align to avoid a sort?
  • [Checklist] Could a partial index reduce size without breaking queries?
  • [Checklist] Can the query be satisfied by an index-only scan?
  • [Checklist] Are you okay with the extra write/storage cost?

Common mistakes and how to self-check

  • Mistake: Indexing low-selectivity columns alone (e.g., boolean). Fix: Combine with a selective column or use partial index.
  • Mistake: Wrong column order in composite indexes. Fix: Put the most selective and frequently filtered first, then sort keys.
  • Mistake: Expecting ORDER BY to use index when WHERE doesn’t match leftmost prefix. Fix: Align predicates with index order.
  • Mistake: Too many indexes slowing writes. Fix: Remove unused ones; measure write latency and size.
  • Mistake: Function on column breaks index usage (WHERE lower(email)=...). Fix: Create functional index aligned with the expression.
Self-check method
  1. Run EXPLAIN (or your DB’s plan tool) before/after creating the index.
  2. Confirm index scan is used and sorting is removed.
  3. Measure latency over realistic data volumes.

Practical projects

  • Audit your top 5 slow queries. For each, propose one index, create it in a staging database, and compare plans and timings.
  • Implement a partial index for soft-deleted rows and measure index size difference.
  • Add a GIN index for a JSON or array filter and compare performance with/without it.

Learning path

  • Before this: Basic SQL and schema design.
  • This subskill: Choosing and validating indexes for common workloads.
  • Next: Query optimization (joins, window functions), transactions and locking, and database observability.

Next steps

  • Run EXPLAIN on two of your queries and verify whether existing indexes are used.
  • Pick one partial or covering index to implement in a safe environment.
  • Take the quick test below to confirm understanding. Anyone can take it; log in to save progress.

Mini challenge

Your analytics API runs this query:

SELECT order_id
FROM line_items
WHERE product_id = $1 AND created_at BETWEEN $2 AND $3
ORDER BY created_at ASC
LIMIT 200;
  • Propose an index that avoids a sort and minimizes scanned rows.
  • How would your answer change if most products appear in >50% of rows?
See a possible approach
CREATE INDEX idx_line_items_product_time
  ON line_items(product_id, created_at ASC);

-- If product_id is not selective, consider additional filters or BRIN on created_at
-- (for very large, time-ordered data), plus caching at the application layer.

Practice Exercises

2 exercises to complete

Instructions

Make the following query fast without extra sorting:

SELECT id, title, published_at
FROM posts
WHERE published = true AND category_id = $1
ORDER BY published_at DESC
LIMIT 20;
  • Propose an index that matches filters and ordering.
  • Assume many categories; only ~10% of posts are published.
Expected Output
An index on (category_id, published, published_at DESC) or a partial index for published=true with (category_id, published_at DESC).

Indexing Concepts — Quick Test

Test your knowledge with 9 questions. Pass with 70% or higher.

9 questions70% to pass

Have questions about Indexing Concepts?

AI Assistant

Ask questions about this tool