Why this matters
Schema changes break production if done carelessly. As a Backend Engineer you will:
- Ship features that need new columns, tables, or indexes.
- Fix performance with new indexes and constraints.
- Rename or remove legacy fields without downtime.
- Migrate data types and backfill safely on large tables.
Good migration habits prevent outages, long lock waits, and data loss.
Concept explained simply
A migration is a small, versioned step that changes the database structure or data. You store steps as code so the schema can be reproduced and reviewed.
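The "versioned steps as code" idea can be sketched as a minimal migration runner. This is an illustrative sketch, not any particular framework's API: it uses SQLite in memory, and the `MIGRATIONS` list and `schema_migrations` table name are assumptions (most tools use something similar).

```python
# Minimal sketch of versioned migrations, assuming SQLite for the demo.
# Each migration is a (version, SQL) pair; applied versions are recorded
# in a schema_migrations table so every step runs exactly once.
import sqlite3

MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)"),
    (2, "ALTER TABLE users ADD COLUMN timezone TEXT"),
]

def migrate(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_migrations")}
    for version, sql in MIGRATIONS:
        if version not in applied:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # safe to re-run: already-applied versions are skipped
```

Because each step is recorded, the same runner brings a fresh database and a stale one to the same schema, which is what makes migrations reviewable and reproducible.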
Mental model
Think of your schema as a public API contract. Clients (your app, jobs, analytics) depend on it. Safe changes follow this pattern:
- Additive first (add new pieces alongside old).
- Backfill data slowly.
- Switch reads/writes to the new shape.
- Remove old pieces later.
Why transactions alone are not enough
Wrapping DDL in a transaction does not remove the risk of long locks, table rewrites, or replication lag. Some operations still block writes or reads. Safe sequencing and online operations matter more than just BEGIN/COMMIT.
Deep dive: locks and online operations
- Regular index creation can block writes; concurrent/online index creation reduces blocking.
- Adding NOT NULL with a default may rewrite large tables; prefer add column nullable, backfill, then add constraint.
- Long-running DDL can delay replication; deploy in small steps.
Production-safe rules
- Make changes additive-first, in small steps that each deploy independently.
- Backfill in batches, never in one giant UPDATE.
- Validate with COUNT checks before enforcing a constraint.
- Keep the switch-over (code reads the new shape) separate from enforcement (NOT NULL, unique).
- Roll back by rolling forward: write a new migration rather than undoing in place.
Worked examples
Example 1: Add a non-nullable column to a large table
Goal: Add users.timezone TEXT NOT NULL DEFAULT 'UTC' without downtime.
- Add column as nullable (fast):
ALTER TABLE users ADD COLUMN timezone TEXT;
- Set default for future inserts (fast):
ALTER TABLE users ALTER COLUMN timezone SET DEFAULT 'UTC';
- Backfill in batches (application job or scheduled task):
-- Pseudocode SQL loop; implement in app/job
UPDATE users SET timezone = 'UTC' WHERE timezone IS NULL AND id BETWEEN :min AND :max;
-- Repeat for ranges, sleep between batches
- Ensure all rows are filled (sanity check):
SELECT COUNT(*) FROM users WHERE timezone IS NULL;
- Add NOT NULL constraint (now safe):
ALTER TABLE users ALTER COLUMN timezone SET NOT NULL;
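The batched backfill step can be sketched as an application job. This uses SQLite in memory as a stand-in for the real `users` table; the batch size and sleep interval are illustrative assumptions to tune for your table and database.

```python
# A batched backfill sketch (SQLite stand-in for the users table above).
# Updates rows in small id ranges so each statement holds locks briefly.
import sqlite3
import time

def backfill_timezone(conn, batch_size=1000, pause=0.0):
    max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM users").fetchone()[0]
    for start in range(1, max_id + 1, batch_size):
        conn.execute(
            "UPDATE users SET timezone = 'UTC' "
            "WHERE timezone IS NULL AND id BETWEEN ? AND ?",
            (start, start + batch_size - 1),
        )
        conn.commit()      # commit per batch to release locks
        time.sleep(pause)  # throttle to limit load and replication lag

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, timezone TEXT)")
conn.executemany("INSERT INTO users (id) VALUES (?)", [(i,) for i in range(1, 5001)])
backfill_timezone(conn, batch_size=1000)
```

Committing per batch is the key design choice: one huge UPDATE holds locks for its entire runtime, while many small transactions let concurrent writes interleave.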
Why not set NOT NULL with default immediately?
On some databases this rewrites the full table and can lock it for a long time. The additive-then-backfill approach avoids that.
Example 2: Rename a column without breaking reads
Goal: Rename customers.full_name to customers.name with zero downtime.
- Add new column:
ALTER TABLE customers ADD COLUMN name TEXT;
- Dual-write in app: when saving, write to both full_name and name.
- Backfill existing rows:
UPDATE customers SET name = full_name WHERE name IS NULL;
- Switch reads to name in code. Monitor errors and metrics.
- Stop writing full_name. Keep dual-read fallback briefly if needed.
- After a safe window, drop old column:
ALTER TABLE customers DROP COLUMN full_name;
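The dual-write step can be sketched in application code. `save_customer` is a hypothetical helper, and SQLite's upsert syntax stands in for your database's; the point is that while both columns exist, every write hits full_name and name so the backfill never falls behind.

```python
# Dual-write sketch for the rename above (SQLite demo, hypothetical helper).
import sqlite3

def save_customer(conn, customer_id, display_name):
    # During the transition, keep old and new columns in sync on every save.
    conn.execute(
        "INSERT INTO customers (id, full_name, name) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET full_name = excluded.full_name, "
        "name = excluded.name",
        (customer_id, display_name, display_name),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT, name TEXT)"
)
save_customer(conn, 1, "Ada Lovelace")
```

Once reads have switched to name and metrics are quiet, the dual-write line is removed and the old column becomes safe to drop.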
Example 3: Drop a column safely
Goal: Remove orders.legacy_code used by old code paths.
- Verify no reads/writes in code. Remove usages and deploy.
- Deploy a guard: log or alert if any reference occurs.
- Take a backup/snapshot if required by policy.
- Drop the column:
ALTER TABLE orders DROP COLUMN legacy_code;
Rollback plan
If needed, add the column back and rehydrate from backups or logs. This is a forward rollback (new migration), not an implicit undo.
Example 4: Create an index without long locks
Goal: Add index on events(user_id, created_at) used by a hot query.
-- Prefer online/concurrent index creation where supported
-- Example syntax (database-specific):
CREATE INDEX CONCURRENTLY idx_events_user_created
ON events (user_id, created_at);
Monitor impact and query plans. Avoid building multiple heavy indexes simultaneously.
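Comparing plans before and after is the verification step. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in for your database's plan inspector (CONCURRENTLY is PostgreSQL-specific and omitted here); the table and index names match the example.

```python
# Before/after query-plan check (SQLite demo; use EXPLAIN in your database).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)"
)
query = "SELECT * FROM events WHERE user_id = ? ORDER BY created_at"

# Without the index, the planner falls back to a full table scan.
before = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()

conn.execute("CREATE INDEX idx_events_user_created ON events (user_id, created_at)")

# With it, the plan should switch to a search using the new index.
after = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()
```

A plan check like this belongs in the deploy checklist: an index that the planner never picks up is pure write overhead.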
Exercises you will do here
These match the exercise tasks below. Do them in a dev database or a small local instance.
- Exercise 1: Plan and write a safe migration to add a unique email to users.
- Exercise 2: Safely change a column type from INTEGER to BIGINT.
Tip: Use small batches for backfills and measure execution time.
Common mistakes and self-check
- Adding NOT NULL with default on huge tables directly. Self-check: Will this rewrite the table? Can I add nullable + backfill first?
- Renaming columns directly. Self-check: Do all services use the new name? Did I dual-write and backfill?
- Building non-concurrent indexes on hot tables. Self-check: Does my index creation block writes?
- Skipping validations. Self-check: Did I run COUNT checks before enforcing constraints?
- Dropping too early. Self-check: Are logs, dashboards, and jobs updated and quiet?
Quick self-audit before running a migration
- Is it additive-first and small?
- Do I have a backfill plan with batching?
- Is there a switch-over step distinct from enforcement?
- Is monitoring in place during and after?
- Is the rollback plan forward-only?
Practical projects
- Blue-green column rename: Pick a busy table, add a new column, dual-write, backfill, switch reads, and drop the old column after a safe window.
- Online indexing: Identify a slow query and add an index using an online/concurrent method. Compare query timings before/after.
- Constraint hardening: Start with a nullable column, backfill, add NOT NULL and a CHECK constraint after validation.
Who this is for
- Backend Engineers shipping features on relational databases.
- Platform/Infra engineers building deployment pipelines for schema changes.
- Data-minded developers maintaining long-lived systems.
Prerequisites
- Comfortable with basic SQL (CREATE TABLE, ALTER TABLE, INSERT/UPDATE/DELETE).
- Familiar with your app framework's migration tool (any is fine).
- Basic understanding of transactions and locks.
Learning path
- Read the concepts and worked examples above.
- Complete Exercises 1–2 (below).
- Run the Quick Test at the end.
- Do one Practical project this week to reinforce habits.
Next steps
- Adopt an additive-first checklist for every schema change.
- Automate backfills with batch jobs and clear metrics.
- Document a company playbook for zero-downtime patterns.
Mini challenge
Your product needs to enforce uniqueness on accounts.handle, but duplicates exist. Sketch a three-step plan to clean data, add a unique index safely, and prevent regressions. Keep each step deployable within minutes.