Who this is for
Backend engineers who write queries, design schemas, and need fast, predictable performance in production APIs, jobs, and analytics read paths.
Prerequisites
- Basic SQL (SELECT, WHERE, ORDER BY, JOIN)
- Understanding of primary keys and foreign keys
- Familiarity with your database’s query plan viewer (e.g., EXPLAIN)
Why this matters
Indexes are the difference between millisecond responses and timeouts. In real work you will:
- Speed up API endpoints that filter by email, status, or date ranges.
- Scale background jobs that scan large tables (orders, events, logs).
- Cut cloud costs by reducing full-table scans and CPU spikes.
- Keep SLAs during peak traffic while supporting new features.
Concept explained simply
An index is a data structure that helps the database find rows faster, like a book’s index. Instead of scanning every row, the DB jumps to the right area and narrows down quickly.
Mental model
- B-tree index: a sorted map of keys to row locations. Good for equality, ranges, and sorting.
- Composite index (a, b, c): effectively sorted by a, then b, then c. The leftmost prefix rule applies.
- Every index speeds up reads but adds write cost (inserts/updates/deletes must maintain it).
Index types at a glance (when to use)
- B-tree: default. Equality, ranges, ordering.
- Hash: equality only (engine-specific usefulness).
- GIN: membership/search for arrays, JSON, full-text.
- GiST: geometric, ranges, nearest-neighbor.
- BRIN: large, append-only tables with natural ordering (time-series).
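For reference, a rough sketch of the creation syntax in PostgreSQL; the table and column names are hypothetical, and other engines differ:
CREATE INDEX idx_users_email_btree ON users (email);                    -- B-tree (the default access method)
CREATE INDEX idx_sessions_token_hash ON sessions USING hash (token);    -- Hash: equality lookups only
CREATE INDEX idx_tickets_tags_gin ON tickets USING gin (tags);          -- GIN: array/JSON/full-text membership
CREATE INDEX idx_shops_location_gist ON shops USING gist (location);    -- GiST: geometric/nearest-neighbor (location assumed to be a point)
CREATE INDEX idx_events_created_brin ON events USING brin (created_at); -- BRIN: huge, naturally ordered tables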
Core concepts that guide index design
- Selectivity and cardinality: High selectivity (few rows match) makes indexes more effective.
- Composite indexes and leftmost prefix: an index on (a, b, c) can serve filters on a, or on a and b, or on all three; it cannot seek on b or c without a (see the sketch after this list).
- Order support: a B-tree can satisfy ORDER BY without a sort when the ORDER BY matches the index order and the earlier index columns are fixed by equality filters.
- Covering index: If the index holds all needed columns, the DB can avoid visiting the table (index-only scan).
- Write overhead: Each additional index slows writes and increases storage; only add what you need.
- Partial index: Index just the hot subset (e.g., WHERE deleted_at IS NULL).
- Unique index: Enforces uniqueness and speeds up lookups.
- Functional/expression index: index on lower(email) or date(order_date) if queries filter on that expression.
- Nulls behavior: Some engines treat NULLs specially; check how your DB handles them in indexes.
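To make the leftmost prefix rule concrete, here is a minimal sketch on a hypothetical table t with a composite index on (a, b, c):
-- Hypothetical composite index
CREATE INDEX idx_t_abc ON t (a, b, c);
-- Predicates that can seek via the index (leftmost prefix present):
--   WHERE a = 1
--   WHERE a = 1 AND b = 2
--   WHERE a = 1 AND b = 2 AND c > 10
-- Predicates that cannot seek efficiently (no leading filter on a):
--   WHERE b = 2
--   WHERE c > 10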
Worked examples
1) User lookup by email (equality)
-- Query
SELECT id FROM users WHERE email = 'a@b.com';
-- Index
CREATE UNIQUE INDEX idx_users_email ON users(email);
Why: High selectivity; unique guarantee; speeds logins and account lookups.
2) Orders by customer with status and recent date (filter + sort)
-- Query
SELECT *
FROM orders
WHERE customer_id = $1 AND status = 'paid' AND order_date >= now() - interval '30 days'
ORDER BY order_date DESC
LIMIT 50;
-- Candidate composite index (PostgreSQL-like)
CREATE INDEX idx_orders_customer_status_date
ON orders(customer_id, status, order_date DESC);
Why: Matches leftmost prefix (customer_id, status) and supports the DESC ordering on order_date to avoid a sort.
3) Search by tags membership (arrays/JSON)
-- Query (PostgreSQL-style arrays)
SELECT id FROM tickets WHERE tags @> ARRAY['urgent'];
-- Index (GIN for membership)
CREATE INDEX idx_tickets_tags_gin ON tickets USING GIN(tags);
Why: GIN accelerates contains/overlap on arrays and JSON paths.
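The same idea extends to JSON containment; a minimal sketch assuming a hypothetical documents table with a JSONB payload column:
-- Query (PostgreSQL JSONB containment)
SELECT id FROM documents WHERE payload @> '{"status": "open"}';
-- Index
CREATE INDEX idx_documents_payload_gin ON documents USING GIN(payload);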
4) Partial index on active rows (soft delete)
-- Query
SELECT id FROM posts WHERE deleted_at IS NULL AND slug = $1;
-- Partial unique index only for active rows
CREATE UNIQUE INDEX idx_posts_active_slug
ON posts(slug)
WHERE deleted_at IS NULL;
Why: Keeps index small and enforces uniqueness only among active posts.
How to choose an index (practical steps)
- Identify top queries: High-traffic endpoints and slow queries.
- Extract patterns: Predicates in WHERE, JOIN keys, ORDER BY, LIMIT.
- Estimate selectivity: Prefer indexing columns that sharply reduce rows.
- Pick index type: B-tree for most; GIN/GiST/BRIN for special cases.
- Choose column order: Most selective and consistently used predicates first; then sort columns.
- Cover if needed: Add included columns for index-only scans when beneficial.
- Validate: Use your DB’s EXPLAIN/plan viewer; confirm the index is chosen.
- Monitor: Track latency, CPU, and index bloat; prune unused indexes.
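For the monitoring step, a sketch of one way to spot never-used indexes in PostgreSQL (via pg_stat_user_indexes); treat matches as candidates to review, not automatic drops:
-- Indexes never scanned since statistics were last reset, largest first
SELECT relname AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;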
Quick self-estimation trick
Ask: Does this index let the DB jump directly to a tiny fraction of rows and return them already sorted? If yes, it’s likely a win.
Exercises
Try these. A solution follows each exercise.
Exercise 1 — Feed page for published posts
Query pattern:
SELECT id, title, published_at
FROM posts
WHERE published = true AND category_id = $1
ORDER BY published_at DESC
LIMIT 20;
- Design an index to make this fast without an extra sort.
- Assume many categories, and only ~10% of posts are published.
Show solution
CREATE INDEX idx_posts_feed
ON posts(category_id, published, published_at DESC);
-- Why
-- 1) Filters on category_id and published are satisfied by the leftmost prefix.
-- 2) DESC order on published_at matches the index to avoid a sort.
-- 3) Consider a partial index if queries always use published = true:
-- CREATE INDEX idx_posts_feed_active
-- ON posts(category_id, published_at DESC)
-- WHERE published = true;
Exercise 2 — Top purchasers last 7 days
Query pattern (PostgreSQL-like):
SELECT user_id, count(*) AS purchases
FROM events
WHERE type = 'purchase' AND created_at >= now() - interval '7 days'
GROUP BY user_id
ORDER BY purchases DESC
LIMIT 50;
- Propose an index that narrows scanning and helps an index-only scan.
- Table is append-only and large.
Show solution
-- Focus on filtering first: type and recent created_at.
-- Cover user_id to avoid table visits when possible.
CREATE INDEX idx_events_purchase_recent
ON events(type, created_at DESC)
INCLUDE (user_id);
-- Optional: make it partial to shrink the index (if type filter is always used):
-- CREATE INDEX idx_events_purchase_recent_partial
-- ON events(created_at DESC)
-- INCLUDE (user_id)
-- WHERE type = 'purchase';
-- Notes
-- - The dynamic 7-day boundary still benefits from the created_at ordering.
-- - INCLUDE (user_id) supports index-only scans for counting per user.
-- - For very large time-series tables, consider BRIN on created_at in addition to the above.
Checklist
- Does your index match the WHERE predicates’ leftmost prefix?
- If you ORDER BY, does index order align to avoid a sort?
- Could a partial index reduce size without breaking queries?
- Can the query be satisfied by an index-only scan?
- Are you okay with the extra write/storage cost?
Common mistakes and how to self-check
- Mistake: Indexing low-selectivity columns alone (e.g., a boolean). Fix: Combine with a selective column or use a partial index.
- Mistake: Wrong column order in composite indexes. Fix: Put the most selective and frequently filtered first, then sort keys.
- Mistake: Expecting ORDER BY to use index when WHERE doesn’t match leftmost prefix. Fix: Align predicates with index order.
- Mistake: Too many indexes slowing writes. Fix: Remove unused ones; measure write latency and size.
- Mistake: Wrapping a column in a function defeats the index (e.g., WHERE lower(email) = ...). Fix: Create an expression index that matches the function, as sketched below.
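A minimal sketch of that last fix, assuming a users table with a text email column:
-- Query that wraps the column in a function
SELECT id FROM users WHERE lower(email) = lower($1);
-- A plain index on email cannot serve this predicate; index the expression instead
CREATE INDEX idx_users_email_lower ON users (lower(email));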
Self-check method
- Run EXPLAIN (or your DB’s plan tool) before/after creating the index.
- Confirm index scan is used and sorting is removed.
- Measure latency over realistic data volumes.
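As a sketch of the before/after check in PostgreSQL, using the query and index from worked example 2 (exact plans vary with data size and statistics):
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
WHERE customer_id = 42 AND status = 'paid' AND order_date >= now() - interval '30 days'
ORDER BY order_date DESC
LIMIT 50;
-- Before the index: expect a sequential scan feeding an explicit Sort node.
-- After creating idx_orders_customer_status_date: expect an index scan on that index,
-- with the Sort node gone and far fewer rows touched.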
Practical projects
- Audit your top 5 slow queries. For each, propose one index, create it in a staging database, and compare plans and timings.
- Implement a partial index for soft-deleted rows and measure index size difference.
- Add a GIN index for a JSON or array filter and compare performance with/without it.
Learning path
- Before this: Basic SQL and schema design.
- This subskill: Choosing and validating indexes for common workloads.
- Next: Query optimization (joins, window functions), transactions and locking, and database observability.
Next steps
- Run EXPLAIN on two of your queries and verify whether existing indexes are used.
- Pick one partial or covering index to implement in a safe environment.
- Take the quick test below to confirm understanding.
Mini challenge
Your analytics API runs this query:
SELECT order_id
FROM line_items
WHERE product_id = $1 AND created_at BETWEEN $2 AND $3
ORDER BY created_at ASC
LIMIT 200;
- Propose an index that avoids a sort and minimizes scanned rows.
- How would your answer change if most products appear in >50% of rows?
See a possible approach
CREATE INDEX idx_line_items_product_time
ON line_items(product_id, created_at ASC);
-- If product_id is not selective, consider additional filters or BRIN on created_at
-- (for very large, time-ordered data), plus caching at the application layer.