Why this matters
As a Data Analyst, you often work with production-like datasets. Knowing how to back up, restore, and migrate data means you can safely:
- Clone production data into a staging environment for analysis or experimentation.
- Restore data after accidental deletes or bad releases.
- Change schemas (add columns, indexes) without breaking dashboards or ETL pipelines.
- Verify and communicate data cutovers during releases.
Real tasks you might face
- Restore yesterday’s snapshot to a sandbox to debug a KPI drop.
- Add a new not-null column to a 50M-row table without locking ingestion.
- Migrate a MySQL dataset to PostgreSQL for advanced analytics.
- Recover a table after a faulty ETL job deleted rows.
Concept explained simply
Backups are copies of your data you can restore later. Migrations are controlled changes to the database structure or data.
- Backup types: Full (everything), Differential (changes since last full), Incremental (changes since last backup). Some systems also use log/WAL backups for point-in-time recovery (PITR).
- Backup methods: Logical (exports SQL or data files; portable, slower) vs Physical (copies of data files; fast, engine-specific).
- Restore: bring the copy back. Verify with checksums, counts, and sample queries (a small verification sketch follows this list).
- Migrations: versioned steps that change schema/data. Aim for safe, reversible, and observable changes.
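To make the "verify" step concrete, here is a minimal SQL sketch against the orders table used in the worked examples below; the md5/string_agg checksum is one illustrative approach (PostgreSQL syntax), not the only option:
-- Row count: compare source vs restored copy
SELECT COUNT(*) FROM public.orders;
-- Simple checksum over a stable ordering of key columns
SELECT md5(string_agg(id::text || ':' || amount::text, ',' ORDER BY id)) AS checksum
FROM public.orders;
-- Sample query: spot-check a business metric
SELECT date_trunc('day', created_at) AS day, SUM(amount) AS revenue
FROM public.orders
GROUP BY 1
ORDER BY 1;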
Mental model
Think of your database like a book being edited while people read it.
- Backups: periodic photocopies (full) and highlight notes of changes (logs/incrementals).
- Restore: choosing a photocopy and optionally re-applying the highlights to get to a specific moment (PITR).
- Migrations: carefully planned edits with version numbers, so you always know which edition of the book is in use.
Worked examples
PostgreSQL — Logical backup and restore with pg_dump/pg_restore
- Create a schema to back up:
CREATE TABLE public.orders (id SERIAL PRIMARY KEY, amount numeric(10,2), created_at timestamptz DEFAULT now());
INSERT INTO public.orders (amount) VALUES (49.90), (12.00), (5.50);
- Backup (custom format for parallel restore):
# Shell
pg_dump -Fc -d mydb -f mydb.dump
- Restore into a new database:
# Shell
createdb mydb_restore
pg_restore -d mydb_restore mydb.dump
- Verify:
SELECT count(*) FROM public.orders; -- expect 3
MySQL — Consistent dump with mysqldump
- Dump with consistent snapshot (InnoDB):
# Shell
mysqldump --single-transaction --routines --triggers mydb > mydb.sql
- Restore:
# Shell
mysql -e "CREATE DATABASE mydb_restore;"
mysql mydb_restore < mydb.sql
- Verify:
SELECT COUNT(*) FROM orders;
SQL Server — Full backup and restore
- Backup:
BACKUP DATABASE [MyDb] TO DISK = N'C:\backups\MyDb_full.bak' WITH INIT, COMPRESSION;
- Restore to a new name (paths may differ across servers):
RESTORE DATABASE [MyDb_Restore] FROM DISK = N'C:\backups\MyDb_full.bak'
WITH MOVE N'MyDb' TO N'C:\data\MyDb_Restore.mdf',
     MOVE N'MyDb_log' TO N'C:\data\MyDb_Restore_log.ldf',
     RECOVERY;
- Verify:
SELECT COUNT(*) FROM [MyDb_Restore].dbo.orders;
SQLite — Quick file backup
- Use the backup API via sqlite3 shell:
.backup main backup.sqlite
- Restore by opening the backup file as the database.
- Verify with a row count.
Safe migration playbook (zero/low downtime)
- Plan the change
  - Define the goal, affected tables, expected runtime, and rollback plan.
  - Decide between an online change and a maintenance window.
- Take a backup or snapshot
  - Ensure you can roll back if needed.
- Break the change into small steps
  - Prefer additive changes (add column, backfill, switch the application, then remove the old artifact).
  - Use concurrent/online index builds where supported.
- Deploy in phases
  - Schema change (non-breaking) → backfill in batches → switch app reads/writes → drop old artifacts.
- Observe
  - Monitor locks, replication lag, error rates, and query times.
- Validate
  - Counts, checksums/hashes, sample queries, and constraint validation (see the sketch after this list).
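As a concrete sketch of the Validate step (PostgreSQL syntax, using the currency backfill example that follows), compare totals and check for unfinished work:
-- All rows present, and no rows missed by the backfill
SELECT COUNT(*) AS total_rows,
       COUNT(*) FILTER (WHERE currency IS NULL) AS missing_currency
FROM public.orders;
-- Spot-check an invariant that should not change during the migration
SELECT COUNT(*) FROM public.orders WHERE amount < 0; -- expect 0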
Example: Add NOT NULL column to large table safely (PostgreSQL)
- Add nullable column:
ALTER TABLE public.orders ADD COLUMN currency text; -- nullable
- Backfill in batches:
-- repeat until done
UPDATE public.orders
SET currency = 'USD'
WHERE currency IS NULL
  AND id >= $start AND id < $end;
- Add default (future inserts):
ALTER TABLE public.orders ALTER COLUMN currency SET DEFAULT 'USD';
- Enforce NOT NULL when ready:
ALTER TABLE public.orders ALTER COLUMN currency SET NOT NULL;
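If tracking id ranges by hand is awkward, here is a hedged alternative for one batch (assuming id is the indexed primary key); run it repeatedly, one transaction per batch, until it updates 0 rows:
UPDATE public.orders
SET currency = 'USD'
WHERE id IN (
  SELECT id FROM public.orders
  WHERE currency IS NULL
  ORDER BY id
  LIMIT 10000
);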
Note: In PostgreSQL 11+, adding a column with a constant DEFAULT does not rewrite the table, but the phased approach is still safest across versions.
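For PostgreSQL 11+ specifically, the whole change can be a single statement, since a constant DEFAULT avoids a table rewrite. Treat this as a sketch rather than a recommendation for every workload: it still takes a brief ACCESS EXCLUSIVE lock, so schedule it outside peak traffic.
ALTER TABLE public.orders
ADD COLUMN currency text NOT NULL DEFAULT 'USD'; -- no table rewrite on PostgreSQL 11+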
Exercises
Try these hands-on tasks, then compare your work with the solution shown after each one.
Exercise 1: PostgreSQL — Full backup and restore
Goal: Create a small table, back it up, drop it, restore it, and verify counts match.
- Create database demo, table orders, and insert 3 rows.
- Run pg_dump to create a custom-format dump.
- Drop and recreate the database, then restore with pg_restore.
- Verify row count equals 3.
Solution
# 1) Setup
createdb demo
psql demo -c "CREATE TABLE orders (id SERIAL PRIMARY KEY, amount numeric(10,2));"
psql demo -c "INSERT INTO orders (amount) VALUES (49.90), (12.00), (5.50);"
# 2) Backup
pg_dump -Fc -d demo -f demo.dump
# 3) Drop and restore
dropdb demo
createdb demo
pg_restore -d demo demo.dump
# 4) Verify
psql demo -c "SELECT COUNT(*) FROM orders;"   # expect 3
Exercise 2: Safe migration — Add NOT NULL column with backfill
Goal: Add a non-null column currency with default 'USD' to a large table without long locks.
- Add column as nullable.
- Backfill in batches using id ranges.
- Set default for new inserts.
- Set NOT NULL.
Solution
-- Step 1: Add column
ALTER TABLE public.orders ADD COLUMN currency text;
-- Step 2: Backfill in batches (repeat until done)
-- Example: batch by id ranges
UPDATE public.orders
SET currency = 'USD'
WHERE currency IS NULL
AND id BETWEEN 1 AND 10000;
-- Step 3: Default for new rows
ALTER TABLE public.orders ALTER COLUMN currency SET DEFAULT 'USD';
-- Step 4: Enforce not null
ALTER TABLE public.orders ALTER COLUMN currency SET NOT NULL;
MySQL note: Use ONLINE DDL where supported, and backfill with small transactions.
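For reference, a hedged MySQL sketch of the same phased idea (MySQL 5.6+/8.0 syntax; the ALGORITHM/LOCK clauses make the server fail fast instead of silently falling back to a blocking table copy, and exact online behavior varies by version, so test on a copy first):
-- Add the column as nullable, online
ALTER TABLE orders ADD COLUMN currency VARCHAR(3) NULL, ALGORITHM=INPLACE, LOCK=NONE;
-- Backfill in small transactions (MySQL allows LIMIT on UPDATE); repeat until 0 rows change
UPDATE orders SET currency = 'USD' WHERE currency IS NULL LIMIT 10000;
-- Enforce NOT NULL once the backfill is complete; confirm this runs online on your version
ALTER TABLE orders MODIFY COLUMN currency VARCHAR(3) NOT NULL DEFAULT 'USD';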
Self-check checklist
- I can create and restore a logical backup.
- I can explain full vs differential vs incremental.
- I can run a safe, phased schema migration.
- I verify restores with counts and sample queries.
Common mistakes and how to self-check
- Relying on backups you’ve never restored. Self-check: Schedule a monthly restore drill and document steps.
- Online dumps without --single-transaction in MySQL. Self-check: Confirm your dump command uses a consistent snapshot.
- Adding NOT NULL immediately on large tables. Self-check: Deploy nullable → backfill → enforce.
- Building blocking indexes in peak hours. Self-check: Use CONCURRENTLY/ONLINE options where supported.
- No verification after restore. Self-check: Always run counts, basic aggregates, and a few business-critical queries.
- Skipping permissions and extensions on restore. Self-check: Recreate roles, grants, and extensions before handing to users.
Practical projects
- Staging Clone: Automate a nightly logical backup from production and restore to a staging database, masking sensitive columns.
- PITR Playground: In PostgreSQL, enable WAL archiving, insert sample data over time, then practice restoring to a timestamp between two inserts.
- Cross-Engine Migration: Export a small MySQL dataset and import into PostgreSQL, mapping types (e.g., TINYINT → smallint, DATETIME → timestamptz) and validating row counts.
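For the cross-engine project, a hedged sketch of a PostgreSQL target table and a row-count check; the column names and source types are assumptions for illustration, not part of any particular dataset:
-- PostgreSQL target, mapping MySQL types
CREATE TABLE orders_pg (
  id         bigint PRIMARY KEY,   -- MySQL BIGINT
  status     smallint,             -- MySQL TINYINT
  created_at timestamptz           -- MySQL DATETIME (confirm the source time zone)
);
-- After loading the exported data (e.g., via CSV and \copy), validate against the source
SELECT COUNT(*) FROM orders_pg;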
Who this is for
Data Analysts who need reliable copies of data, safe schema changes, and the ability to recover or reproduce datasets for analysis and reporting.
Prerequisites
- Basic SQL (SELECT, INSERT, UPDATE, CREATE TABLE).
- Comfort with the command line for running backup tools.
- Access to a local or test database instance.
Learning path
- Learn backup concepts (logical vs physical, PITR).
- Practice a full logical backup and restore.
- Run a small schema migration with backfill.
- Learn verification methods (counts, constraints, sample queries).
- Automate: script backups and integrity checks.
Mini challenge
Scenario: You must add a unique, non-null user_email to a 20M-row table without downtime. Plan the steps. Include: backup/snapshot, add nullable column, backfill with dedup strategy, create unique index concurrently/online, enforce NOT NULL, switch application writes, clean up. Write your ordered plan in 8–12 steps.
Next steps
- Document your team’s backup/restore runbook with exact commands for your engine.
- Set up a recurring restore drill in a sandbox and a verification script.
- Adopt a migration tool (e.g., migration scripts with versioning) and a naming convention.
Quick test
Ready to check your understanding? Take the quick test below. Note: The test is available to everyone; only logged-in users get saved progress.