Why this matters
As a Data Analyst, you often work with production-like datasets. Knowing how to back up, restore, and migrate data means you can safely:
- Clone production data into a staging environment for analysis or experimentation.
- Restore data after accidental deletes or bad releases.
- Change schemas (add columns, indexes) without breaking dashboards or ETL pipelines.
- Verify and communicate data cutovers during releases.
Real tasks you might face
- Restore yesterday’s snapshot to a sandbox to debug a KPI drop.
- Add a new not-null column to a 50M-row table without locking ingestion.
- Migrate a MySQL dataset to PostgreSQL for advanced analytics.
- Recover a table after a faulty ETL job deleted rows.
Concept explained simply
Backups are copies of your data you can restore later. Migrations are controlled changes to the database structure or data.
- Backup types: Full (everything), Differential (changes since last full), Incremental (changes since last backup). Some systems also use log/WAL backups for point-in-time recovery (PITR).
- Backup methods: Logical (exports SQL or data files; portable, slower) vs Physical (copies of data files; fast, engine-specific).
- Restore: bring the copy back. Verify with checksums, counts, and sample queries (a small verification sketch follows this list).
- Migrations: versioned steps that change schema/data. Aim for safe, reversible, and observable changes.
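To make the "verify" step concrete, here is a minimal SQL sketch against the orders table used in the worked examples below; the md5/string_agg checksum is one illustrative approach (PostgreSQL syntax), not the only option:
-- Row count: compare source vs restored copy
SELECT COUNT(*) FROM public.orders;
-- Simple checksum over a stable ordering of key columns
SELECT md5(string_agg(id::text || ':' || amount::text, ',' ORDER BY id)) AS checksum
FROM public.orders;
-- Sample query: spot-check a business metric
SELECT date_trunc('day', created_at) AS day, SUM(amount) AS revenue
FROM public.orders
GROUP BY 1
ORDER BY 1;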
Mental model
Think of your database like a book being edited while people read it.
- Backups: periodic photocopies (full) and highlight notes of changes (logs/incrementals).
- Restore: choosing a photocopy and optionally re-applying the highlights to get to a specific moment (PITR).
- Migrations: carefully planned edits with version numbers, so you always know which edition of the book is in use.
Worked examples
PostgreSQL — Logical backup and restore with pg_dump/pg_restore
- Create a schema to back up:
CREATE TABLE public.orders (id SERIAL PRIMARY KEY, amount numeric(10,2), created_at timestamptz DEFAULT now());
INSERT INTO public.orders (amount) VALUES (49.90), (12.00), (5.50);
- Backup (custom format for parallel restore):
# Shell
pg_dump -Fc -d mydb -f mydb.dump
- Restore into a new database:
# Shell
createdb mydb_restore
pg_restore -d mydb_restore mydb.dump
- Verify:
SELECT count(*) FROM public.orders; -- expect 3
MySQL — Consistent dump with mysqldump
- Dump with consistent snapshot (InnoDB):
# Shell
mysqldump --single-transaction --routines --triggers mydb > mydb.sql
- Restore:
# Shell
mysql -e "CREATE DATABASE mydb_restore;"
mysql mydb_restore < mydb.sql
- Verify:
SELECT COUNT(*) FROM orders;
SQL Server — Full backup and restore
- Backup:
BACKUP DATABASE [MyDb] TO DISK = N'C:\backups\MyDb_full.bak' WITH INIT, COMPRESSION;
- Restore to a new name (paths may differ across servers):
RESTORE DATABASE [MyDb_Restore] FROM DISK = N'C:\backups\MyDb_full.bak'
WITH MOVE N'MyDb' TO N'C:\data\MyDb_Restore.mdf',
     MOVE N'MyDb_log' TO N'C:\data\MyDb_Restore_log.ldf',
     RECOVERY;
- Verify:
SELECT COUNT(*) FROM [MyDb_Restore].dbo.orders;
SQLite — Quick file backup
- Use the backup API via sqlite3 shell:
.backup main backup.sqlite
- Restore by opening the backup file as the database.
- Verify with a row count.
Safe migration playbook (zero/low downtime)
- Plan the change
  - Define the goal, affected tables, expected runtime, and rollback plan.
  - Decide between an online change and a maintenance window.
- Take a backup or snapshot
  - Ensure you can roll back if needed.
- Break the change into small steps
  - Prefer additive changes (add column, backfill, switch the application, then remove the old artifact).
  - Use concurrent/online index builds where supported.
- Deploy in phases
  - Schema change (non-breaking) → backfill in batches → switch app reads/writes → drop old artifacts.
- Observe
  - Monitor locks, replication lag, error rates, and query times.
- Validate
  - Counts, checksums/hashes, sample queries, and constraint validation (see the sketch after this list).
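As a concrete sketch of the Validate step (PostgreSQL syntax, using the currency backfill example that follows), compare totals and check for unfinished work:
-- All rows present, and no rows missed by the backfill
SELECT COUNT(*) AS total_rows,
       COUNT(*) FILTER (WHERE currency IS NULL) AS missing_currency
FROM public.orders;
-- Spot-check an invariant that should not change during the migration
SELECT COUNT(*) FROM public.orders WHERE amount < 0; -- expect 0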
Example: Add NOT NULL column to large table safely (PostgreSQL)
- Add nullable column:
ALTER TABLE public.orders ADD COLUMN currency text; -- nullable
- Backfill in batches:
-- repeat until done
UPDATE public.orders
SET currency = 'USD'
WHERE currency IS NULL
  AND id >= $start AND id < $end;
- Add default (future inserts):
ALTER TABLE public.orders ALTER COLUMN currency SET DEFAULT 'USD';
- Enforce NOT NULL when ready:
ALTER TABLE public.orders ALTER COLUMN currency SET NOT NULL;
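If tracking id ranges by hand is awkward, here is a hedged alternative for one batch (assuming id is the indexed primary key); run it repeatedly, one transaction per batch, until it updates 0 rows:
UPDATE public.orders
SET currency = 'USD'
WHERE id IN (
  SELECT id FROM public.orders
  WHERE currency IS NULL
  ORDER BY id
  LIMIT 10000
);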
Note: In PostgreSQL 11+, adding a column with a constant DEFAULT does not rewrite the table, but the phased approach is still safest across versions.
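For PostgreSQL 11+ specifically, the whole change can be a single statement, since a constant DEFAULT avoids a table rewrite. Treat this as a sketch rather than a recommendation for every workload: it still takes a brief ACCESS EXCLUSIVE lock, so schedule it outside peak traffic.
ALTER TABLE public.orders
ADD COLUMN currency text NOT NULL DEFAULT 'USD'; -- no table rewrite on PostgreSQL 11+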
Exercises
Try these hands-on tasks, then compare your work with the solution shown after each one.
Exercise 1: PostgreSQL — Full backup and restore
Goal: Create a small table, back it up, drop it, restore it, and verify counts match.
- Create database demo, table orders, and insert 3 rows.
- Run pg_dump to create a custom-format dump.
- Drop and recreate the database, then restore with pg_restore.
- Verify row count equals 3.
Solution
# 1) Setup
createdb demo
psql demo -c "CREATE TABLE orders (id SERIAL PRIMARY KEY, amount numeric(10,2));"
psql demo -c "INSERT INTO orders (amount) VALUES (49.90), (12.00), (5.50);"
# 2) Backup
pg_dump -Fc -d demo -f demo.dump
# 3) Drop and restore
dropdb demo
createdb demo
pg_restore -d demo demo.dump
# 4) Verify
psql demo -c "SELECT COUNT(*) FROM orders;"   # expect 3
Exercise 2: Safe migration — Add NOT NULL column with backfill
Goal: Add a non-null column currency with default 'USD' to a large table without long locks.
- Add column as nullable.
- Backfill in batches using id ranges.
- Set default for new inserts.
- Set NOT NULL.
Solution
-- Step 1: Add column
ALTER TABLE public.orders ADD COLUMN currency text;
-- Step 2: Backfill in batches (repeat until done)
-- Example: batch by id ranges
UPDATE public.orders
SET currency = 'USD'
WHERE currency IS NULL
AND id BETWEEN 1 AND 10000;
-- Step 3: Default for new rows
ALTER TABLE public.orders ALTER COLUMN currency SET DEFAULT 'USD';
-- Step 4: Enforce not null
ALTER TABLE public.orders ALTER COLUMN currency SET NOT NULL;
MySQL note: Use ONLINE DDL where supported, and backfill with small transactions.
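For reference, a hedged MySQL sketch of the same phased idea (MySQL 5.6+/8.0 syntax; the ALGORITHM/LOCK clauses make the server fail fast instead of silently falling back to a blocking table copy, and exact online behavior varies by version, so test on a copy first):
-- Add the column as nullable, online
ALTER TABLE orders ADD COLUMN currency VARCHAR(3) NULL, ALGORITHM=INPLACE, LOCK=NONE;
-- Backfill in small transactions (MySQL allows LIMIT on UPDATE); repeat until 0 rows change
UPDATE orders SET currency = 'USD' WHERE currency IS NULL LIMIT 10000;
-- Enforce NOT NULL once the backfill is complete; confirm this runs online on your version
ALTER TABLE orders MODIFY COLUMN currency VARCHAR(3) NOT NULL DEFAULT 'USD';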
Self-check checklist
- I can create and restore a logical backup.
- I can explain full vs differential vs incremental.
- I can run a safe, phased schema migration.
- I verify restores with counts and sample queries.
Common mistakes and how to self-check
- Relying on backups you’ve never restored. Self-check: Schedule a monthly restore drill and document steps.
- Online dumps without --single-transaction in MySQL. Self-check: Confirm your dump command uses a consistent snapshot.
- Adding NOT NULL immediately on large tables. Self-check: Deploy nullable → backfill → enforce.
- Building blocking indexes in peak hours. Self-check: Use CONCURRENTLY/ONLINE options where supported.
- No verification after restore. Self-check: Always run counts, basic aggregates, and a few business-critical queries.
- Skipping permissions and extensions on restore. Self-check: Recreate roles, grants, and extensions before handing to users.
Practical projects
- Staging Clone: Automate a nightly logical backup from production and restore to a staging database, masking sensitive columns.
- PITR Playground: In PostgreSQL, enable WAL archiving, insert sample data over time, then practice restoring to a timestamp between two inserts.
- Cross-Engine Migration: Export a small MySQL dataset and import into PostgreSQL, mapping types (e.g., TINYINT → smallint, DATETIME → timestamptz) and validating row counts.
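For the cross-engine project, a hedged sketch of a PostgreSQL target table and a row-count check; the column names and source types are assumptions for illustration, not part of any particular dataset:
-- PostgreSQL target, mapping MySQL types
CREATE TABLE orders_pg (
  id         bigint PRIMARY KEY,   -- MySQL BIGINT
  status     smallint,             -- MySQL TINYINT
  created_at timestamptz           -- MySQL DATETIME (confirm the source time zone)
);
-- After loading the exported data (e.g., via CSV and \copy), validate against the source
SELECT COUNT(*) FROM orders_pg;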
Who this is for
Data Analysts who need reliable copies of data, safe schema changes, and the ability to recover or reproduce datasets for analysis and reporting.
Prerequisites
- Basic SQL (SELECT, INSERT, UPDATE, CREATE TABLE).
- Comfort with the command line for running backup tools.
- Access to a local or test database instance.
Learning path
- Learn backup concepts (logical vs physical, PITR).
- Practice a full logical backup and restore.
- Run a small schema migration with backfill.
- Learn verification methods (counts, constraints, sample queries).
- Automate: script backups and integrity checks.
Mini challenge
Scenario: You must add a unique, non-null user_email to a 20M-row table without downtime. Plan the steps. Include: backup/snapshot, add nullable column, backfill with dedup strategy, create unique index concurrently/online, enforce NOT NULL, switch application writes, clean up. Write your ordered plan in 8–12 steps.
Next steps
- Document your team’s backup/restore runbook with exact commands for your engine.
- Set up a recurring restore drill in a sandbox and a verification script.
- Adopt a migration tool (e.g., migration scripts with versioning) and a naming convention.
Quick test
Ready to check your understanding? Take the quick test below. Note: The test is available to everyone; only logged-in users get saved progress.