Database Design And Access

Learn Database Design and Access for backend engineers, for free: a roadmap, worked examples, subskills, and a skill exam.

Published: January 20, 2026 | Updated: January 20, 2026

Why this matters for a Backend Engineer

Reliable applications start with reliable data. Database Design and Access covers how you model entities, query efficiently, keep data consistent, and evolve schemas safely. Mastering this lets you ship features faster, handle growth, and avoid costly outages.

  • Translate product requirements into relational models that are easy to query and change.
  • Write SQL that is correct, predictable, and fast under load.
  • Use transactions, isolation, indexing, connection pooling, and caching to meet performance and consistency goals.

Who this is for

  • Backend engineers building APIs, services, or jobs that read/write relational databases.
  • Developers moving from scripts to production-grade services.
  • Anyone preparing for backend interviews or on-call duties.

Prerequisites

  • Comfort with one backend language (e.g., Java, Go, Python, Node.js) and basic I/O.
  • Basic command-line skills.
  • Familiarity with the fundamental SQL statements: SELECT, INSERT, UPDATE, and DELETE.

Learning path

  1. Modeling: Entities, relationships, normalization, keys.
  2. Querying: Joins, filtering, aggregations, pagination.
  3. Correctness: Transactions, isolation, consistency patterns.
  4. Performance & Ops: Indexes, pooling, migrations, caching.

Practical roadmap with milestones

  1. Model core entities
    • Sketch ER diagrams for 3–5 core entities and their relationships.
    • Pick primary keys and choose when to use surrogate vs natural keys.
  2. Write baseline queries
    • Implement common joins and aggregations used by your API endpoints.
    • Add safe pagination with keyset or cursor-based approaches for large datasets.
  3. Enforce correctness
    • Wrap multi-step writes in transactions. Pick isolation that prevents your specific anomalies.
    • Add constraints (NOT NULL, UNIQUE, FOREIGN KEY, CHECK) to shift errors left.
  4. Speed it up
    • Create targeted indexes from actual WHERE/ORDER BY clauses.
    • Introduce a connection pool with sane limits. Add a read-through cache for hot reads.
  5. Evolve safely
    • Adopt a migration workflow (up/down or forwards-only).
    • Roll out schema changes in backward-compatible steps.

Key concept refreshers

  • Normalization: Reduce duplication (1NF, 2NF, 3NF) to prevent update anomalies. Denormalize only for measured performance gains.
  • ACID: Atomicity, Consistency, Isolation, Durability. Choose isolation per use case.
  • Index types: B-tree for equality/range; hash for equality (engine-dependent); partial and composite indexes for targeted speedups (sketched below).
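
To make these concrete, here is a small PostgreSQL sketch; the table and column names are illustrative assumptions, not part of the worked examples below.

-- Composite B-tree index: supports equality on user_id plus range/order on created_at
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);

-- Partial index: covers only the rows a hot query touches, keeping the index small
CREATE INDEX idx_orders_pending ON orders (created_at) WHERE status = 'pending';

-- Hash index (PostgreSQL): equality lookups only, no range scans
CREATE INDEX idx_orders_external_ref ON orders USING hash (external_ref);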

Worked examples (with code)

1) Relational modeling: posts and comments

-- Entities: users, posts, comments
CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE posts (
  id BIGSERIAL PRIMARY KEY,
  user_id BIGINT NOT NULL REFERENCES users(id),
  title TEXT NOT NULL,
  body TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE comments (
  id BIGSERIAL PRIMARY KEY,
  post_id BIGINT NOT NULL REFERENCES posts(id),
  user_id BIGINT NOT NULL REFERENCES users(id),
  body TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  CONSTRAINT comment_body_not_empty CHECK (length(body) > 0)
);

Rationale: Foreign keys maintain referential integrity; CHECK prevents empty comments. Add indexes where queries need them (e.g., comments by post).
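
For the "comments by post" pattern, a composite index is the usual starting point; a minimal sketch:

-- Supports fetching a post's comments, newest first
CREATE INDEX idx_comments_post_created ON comments (post_id, created_at DESC);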

2) Query with join and aggregation

-- Top 5 posts by comment count in the last 7 days
SELECT p.id, p.title, COUNT(c.id) AS comment_count
FROM posts p
LEFT JOIN comments c ON c.post_id = p.id AND c.created_at >= now() - interval '7 days'
GROUP BY p.id
ORDER BY comment_count DESC
LIMIT 5;

-- Helpful index:
-- CREATE INDEX idx_comments_post_recent ON comments(post_id, created_at);

Why this index?

The join matches on post_id and filters on created_at; a composite (post_id, created_at) index supports both the lookup and the range condition efficiently.
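
To confirm the planner actually uses it, run the query under EXPLAIN on a realistically sized dataset (the plan depends on your data and PostgreSQL version):

EXPLAIN (ANALYZE, BUFFERS)
SELECT p.id, p.title, COUNT(c.id) AS comment_count
FROM posts p
LEFT JOIN comments c ON c.post_id = p.id AND c.created_at >= now() - interval '7 days'
GROUP BY p.id
ORDER BY comment_count DESC
LIMIT 5;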

3) Safe money transfer with transactions

BEGIN;
  -- lock both rows in a consistent order to reduce deadlocks
  SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
  SELECT balance FROM accounts WHERE id = 2 FOR UPDATE;

  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;

  -- ensure invariant
  DO $$ BEGIN
    IF (SELECT balance FROM accounts WHERE id = 1) < 0 THEN
      RAISE EXCEPTION 'Insufficient funds';
    END IF;
  END $$;
COMMIT;

Isolation choice

READ COMMITTED usually suffices. Use REPEATABLE READ or SERIALIZABLE when concurrent reads must see a stable snapshot or to prevent write skew.
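
Two refinements worth knowing, sketched against the accounts table above: a CHECK constraint lets the database enforce the balance invariant declaratively, and the isolation level can be set per transaction (on serialization failure, SQLSTATE 40001, the application rolls back and retries).

-- Declarative alternative to re-checking the invariant inside the transaction
ALTER TABLE accounts ADD CONSTRAINT accounts_balance_non_negative CHECK (balance >= 0);

-- Run the transfer under stricter isolation when write skew is a concern
BEGIN ISOLATION LEVEL SERIALIZABLE;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT; -- if this fails with SQLSTATE 40001, retry the whole transaction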

4) Indexing for pagination

-- Endpoint uses: WHERE user_id = $1 AND created_at >= $since
-- ORDER BY created_at DESC LIMIT 20
-- Best index (PostgreSQL):
CREATE INDEX idx_events_user_created_at ON events(user_id, created_at DESC);

-- Keyset pagination example
SELECT * FROM events
WHERE user_id = $1
  AND created_at < $last_seen_created_at
ORDER BY created_at DESC
LIMIT 20;

Why not a separate index per column?

The planner can combine indexes, but a composite index aligned to the query predicate and order is typically faster and smaller.
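
One refinement: if created_at is not unique, page boundaries can skip or repeat rows. A common fix is a composite cursor that uses the primary key as a tie-breaker; a sketch against the same assumed events table:

-- Index matches the composite ordering
CREATE INDEX idx_events_user_created_id ON events (user_id, created_at DESC, id DESC);

-- The cursor carries both columns of the last row seen
SELECT * FROM events
WHERE user_id = $1
  AND (created_at, id) < ($last_seen_created_at, $last_seen_id)
ORDER BY created_at DESC, id DESC
LIMIT 20;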

5) Connection pooling in Node.js (pg)

const { Pool } = require('pg');
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,             // concurrent connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000
});

async function getUser(id) {
  const client = await pool.connect();
  try {
    const { rows } = await client.query('SELECT id, email FROM users WHERE id = $1', [id]);
    return rows[0];
  } finally {
    client.release();
  }
}

Why pooling?

Creating a new DB connection per request is slow and exhausts database limits. A pool reuses connections, stabilizing latency and throughput.
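
To pick a sensible max, compare pool sizes against the server's ceiling and its current usage; a PostgreSQL sketch:

-- Server-side connection limit (the sum of all application pools must stay below this)
SHOW max_connections;

-- Current connections by state; many idle entries suggest pools are oversized
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;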

6) Zero-downtime migration: add NOT NULL column with default

-- Step 1: add the column as nullable (no table rewrite)
ALTER TABLE orders ADD COLUMN status TEXT;

-- Step 2: set a default so new rows are populated from now on
ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'pending';

-- Step 3: backfill existing rows in batches
-- (PostgreSQL UPDATE has no LIMIT clause, so bound each batch with a subquery)
UPDATE orders SET status = 'pending'
WHERE id IN (SELECT id FROM orders WHERE status IS NULL LIMIT 10000);
-- repeat the batch job until no rows are updated

-- Step 4: enforce the constraint once every row has a value
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;

Why in steps?

Large table rewrites or immediate NOT NULL with default can lock tables and spike load. Stepwise changes keep the app running.
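
On large PostgreSQL tables, even the final SET NOT NULL briefly takes an exclusive lock while it scans for NULLs. A common mitigation (assuming PostgreSQL 12 or newer) is to validate a CHECK constraint first so the scan happens without blocking writes; a sketch:

-- Add the constraint without scanning the table (brief lock only)
ALTER TABLE orders ADD CONSTRAINT orders_status_not_null CHECK (status IS NOT NULL) NOT VALID;

-- Validation scans the table but does not block normal reads and writes
ALTER TABLE orders VALIDATE CONSTRAINT orders_status_not_null;

-- With a validated CHECK in place, SET NOT NULL can skip its own full-table scan
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
ALTER TABLE orders DROP CONSTRAINT orders_status_not_null; -- now redundant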

7) Cache-aside read with TTL

// Cache-aside read (assumes an ioredis-style `redis` client and the pg `pool` from example 5)
async function getUserProfile(userId) {
  const key = `user:profile:${userId}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const { rows } = await pool.query('SELECT id, email, name FROM users WHERE id = $1', [userId]);
  if (rows[0]) await redis.setex(key, 300, JSON.stringify(rows[0])); // 5-minute TTL
  return rows[0];
}

Consistency note

On writes, update the DB first, then invalidate the cache key. For critical reads, consider a short TTL or a versioned key to reduce stale data windows.

Drills and exercises

  • Draw an ER diagram for an e-commerce subset: users, products, orders, order_items, payments. Mark primary and foreign keys.
  • Write a query to find the top 10 customers by revenue in the last 30 days. Add the minimal index to make it fast.
  • Implement keyset pagination for a feed endpoint using a created_at cursor.
  • Wrap a multi-step write (create order, add items, decrement stock) in a transaction with error handling.
  • Create two composite indexes and measure impact with EXPLAIN ANALYZE on a realistic dataset.
  • Perform a safe migration that adds a non-nullable column to a large table without blocking writes.
  • Add a connection pool to your service and set limits that prevent database overload.
  • Implement cache-aside for a hot read and verify correctness after updates by invalidating keys.

Common mistakes and debugging tips

1) N+1 queries in list endpoints

Symptom: Many small queries per item. Fix: Fetch needed rows with a single JOIN or batched IN query; then map results in memory.
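
For example, instead of one comments query per post in a list endpoint, fetch everything in a single batched query and group the rows in memory (a sketch against the schema from example 1, where $1 is an array of post ids):

SELECT post_id, id, user_id, body, created_at
FROM comments
WHERE post_id = ANY($1::bigint[])
ORDER BY post_id, created_at DESC;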

2) Missing or wrong indexes

Symptom: Slow queries and high CPU. Fix: Inspect WHERE and ORDER BY; create composite indexes aligned to them; drop unused indexes to save write cost.
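
PostgreSQL's statistics views help with the "drop unused indexes" part; a sketch (counters reset when statistics are reset, so interpret with care):

-- Indexes never used for a scan since statistics were last reset
SELECT schemaname, relname AS table_name, indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname;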

3) Overusing SERIALIZABLE isolation

Symptom: Serialization failures under load. Fix: Use the weakest isolation that prevents your anomalies (often READ COMMITTED or REPEATABLE READ) and retry when required.

4) Locking the table during big migrations

Symptom: Timeouts and outages. Fix: Break into smaller steps, backfill in batches, consider online index builds when supported.
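
In PostgreSQL, the online index build mentioned above is CREATE INDEX CONCURRENTLY: it is slower and cannot run inside a transaction block, but it avoids the long write-blocking lock.

-- Build the index without blocking writes on a large, busy table
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);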

5) Stale cache after writes

Symptom: Users see old data. Fix: Write DB first, invalidate cache key, optionally write-through on next read. Keep TTLs reasonable.

6) Oversized connection pools

Symptom: DB saturation and timeouts. Fix: Size each pool conservatively (a small multiple of the database server's CPU cores is a common starting point), keep the total across all app instances below the database's connection limit, monitor pool wait times, and prefer queueing over unlimited concurrency.

Mini project: Feature flag service (DB-first)

Goal: Build a small service that serves feature flags per environment and user segment with fast reads and safe writes.

  1. Model
    • Tables: flags, environments, variants, rules, flag_evaluations (audit).
    • Constraints: UNIQUE(flag_key, environment), FK relationships. Add a CHECK for percentage between 0 and 100 (see the schema sketch after this list).
  2. Queries
    • Get all active flags for a user: join flags + rules by environment and segment.
    • Add keyset pagination for audits.
  3. Transactions
    • Updating a flag: update rule set and insert audit row atomically.
  4. Indexes
    • flags(environment, active), rules(flag_id, priority), flag_evaluations(environment, created_at DESC).
  5. Pooling & caching
    • Use a connection pool. Cache read results per environment for 60s; invalidate on updates.
  6. Migrations
    • Ship an initial migration; later add a nullable column rollout_strategy, backfill, then enforce NOT NULL.
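
A minimal schema sketch for step 1, under assumed column names (variants and flag_evaluations follow the same pattern):

CREATE TABLE environments (
  id BIGSERIAL PRIMARY KEY,
  name TEXT NOT NULL UNIQUE
);

CREATE TABLE flags (
  id BIGSERIAL PRIMARY KEY,
  flag_key TEXT NOT NULL,
  environment_id BIGINT NOT NULL REFERENCES environments(id),
  active BOOLEAN NOT NULL DEFAULT false,
  UNIQUE (flag_key, environment_id)
);

CREATE TABLE rules (
  id BIGSERIAL PRIMARY KEY,
  flag_id BIGINT NOT NULL REFERENCES flags(id),
  priority INT NOT NULL,
  segment TEXT NOT NULL,
  percentage INT NOT NULL DEFAULT 100 CHECK (percentage BETWEEN 0 AND 100)
);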

Stretch goals
  • Add a constraint or trigger to prevent circular rule dependencies.
  • Implement optimistic concurrency with a version integer.

Additional practical project ideas

  • Analytics event store: ingest, query by user and time windows, keyset paginate.
  • Order management API: transactional order creation, stock reservation, payment status updates.
  • Content platform: posts, tags, search by relevance with covering indexes.

Subskills

  • Relational Modeling Basics
  • Writing Efficient SQL Queries
  • Transactions And Isolation Basics
  • Indexing Concepts
  • Connection Pooling
  • Migrations And Schema Management
  • Caching Strategies Basics
  • Handling Data Consistency

Next steps

  • Apply at least one indexing improvement to a real endpoint and measure with EXPLAIN ANALYZE.
  • Introduce a structured migration workflow in your project and document rollback strategies.
  • Add a small cache layer to one hot read and instrument hit rate and latency.

Database Design And Access — Skill Exam

This exam checks practical understanding of modeling, SQL, transactions, indexing, pooling, migrations, caching, and consistency. Open-notes and untimed. Aim for clear reasoning. Passing score: 70%. Everyone can take the exam; only logged-in users have their progress saved.

10 questions · 70% to pass
