Why this matters
As a Data Analyst, your queries must be fast and reliable. Tables grow, rows get updated/deleted, and the optimizer relies on statistics to pick efficient plans. Routine maintenance keeps your dashboards responsive and your analyses correct.
- Speed up slow dashboards by removing dead rows that bloat tables.
- Keep query plans accurate with fresh column statistics.
- Prevent outages caused by transaction ID wraparound (in systems like PostgreSQL).
Who this is for
- Data Analysts and Analytics Engineers running SQL on shared warehouses or OLTP replicas.
- Anyone who owns analytical tables and needs predictable query performance.
Prerequisites
- Comfortable with basic SQL (SELECT, WHERE, joins) and reading EXPLAIN output.
- Access to a PostgreSQL database where you can run maintenance commands.
Concept explained simply
VACUUM: Reclaims space from deleted/updated rows (dead tuples) and keeps transaction IDs healthy. Normal VACUUM is online and concurrent; VACUUM FULL rewrites the table, frees more space, but requires an exclusive lock.
ANALYZE: Scans a sample of your table's rows to estimate value distributions, null fractions, and distinct counts. The optimizer uses these statistics to choose join orders and index usage efficiently.
Mental model
- Imagine a library: VACUUM removes discarded books taking shelf space, so aisles stay clear.
- ANALYZE updates the catalog’s knowledge of which books are popular or rare, so the librarian can fetch them quickly.
- Autovacuum (in PostgreSQL) is the nightly cleaning crew. You still need to spot-check busy sections and occasionally do a deeper clean.
Key commands and views
- VACUUM; or VACUUM (VERBOSE); reclaims space from dead tuples while the table stays online.
- VACUUM ANALYZE; reclaims space and refreshes stats together after large changes.
- VACUUM FULL; deep compaction via a full table rewrite; frees more space but requires an exclusive lock (plan for downtime).
- ANALYZE; or ANALYZE schema.table; or ANALYZE table (column1, column2); refreshes statistics.
- ALTER TABLE ... SET (autovacuum_vacuum_scale_factor = ..., autovacuum_vacuum_threshold = ...); tunes when autovacuum triggers for one table.
- ALTER TABLE ... ALTER COLUMN col SET STATISTICS 100-1000; increases sampling for skewed columns.
Helpful monitoring queries (PostgreSQL)
-- Dead tuples and last maintenance
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
-- Current autovacuum/vacuum activity
SELECT * FROM pg_stat_progress_vacuum;
-- Check settings relevant to vacuum
SELECT name, setting FROM pg_settings
WHERE name IN ('autovacuum', 'autovacuum_vacuum_scale_factor', 'autovacuum_vacuum_threshold',
'autovacuum_analyze_scale_factor', 'autovacuum_analyze_threshold', 'default_statistics_target')
ORDER BY name;
-- Table bloat hint (approx; use with caution)
-- Strategy: Compare relpages (table) vs index pages, and track growth over time.
SELECT relname, n_live_tup, n_dead_tup FROM pg_stat_user_tables ORDER BY n_dead_tup DESC;
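Dead-tuple counts alone do not show on-disk growth. A rough complement (a sketch using the same pg_stat_user_tables view plus the standard size functions) is to snapshot table and total relation sizes periodically and watch for growth that outpaces live rows:

```sql
-- Rough size tracking: table size vs total size (table + indexes + TOAST).
-- Snapshot this periodically; growth outpacing n_live_tup suggests bloat.
SELECT relname,
       pg_size_pretty(pg_relation_size(relid))       AS table_size,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       n_live_tup,
       n_dead_tup
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```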
Worked examples
1) Remove dead tuples and refresh stats after a batch delete
- Inspect dead tuples:
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'sales';
- Run maintenance:
VACUUM ANALYZE sales;
- Confirm improvement:
SELECT relname, n_live_tup, n_dead_tup, last_analyze
FROM pg_stat_user_tables
WHERE relname = 'sales';
Expected: n_dead_tup drops close to zero; last_analyze updates.
2) Improve plans on a skewed column with higher statistics
-- Column 'user_id' is heavily skewed (a few users generate most events)
ALTER TABLE web_events ALTER COLUMN user_id SET STATISTICS 500;
ANALYZE web_events(user_id);
-- Compare estimates vs actuals on a known filter
EXPLAIN ANALYZE
SELECT * FROM web_events WHERE user_id = 42;
Goal: The planner better estimates selectivity and may pick a more efficient index plan.
3) Watch autovacuum progress on a large table
-- In another session, a large update/delete runs on 'orders'
-- Here, you observe progress:
SELECT relid::regclass AS table_name, phase, heap_blks_total, heap_blks_scanned, heap_blks_vacuumed
FROM pg_stat_progress_vacuum
WHERE relid::regclass::text = 'orders';
Use this to time manual VACUUMs and avoid clashes with peak hours.
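To make the progress view easier to read at a glance, the raw block counts can be turned into a percentage (a small sketch over the same view):

```sql
-- Percentage of heap blocks scanned so far; NULLIF guards against
-- division by zero before heap_blks_total is populated.
SELECT relid::regclass AS table_name,
       phase,
       round(100.0 * heap_blks_scanned / NULLIF(heap_blks_total, 0), 1) AS pct_scanned
FROM pg_stat_progress_vacuum;
```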
Practice: try it yourself
Complete the exercises below. They match the Exercises section at the bottom of this page.
Common mistakes and how to self-check
- Confusing VACUUM with VACUUM FULL. Use FULL only when you plan downtime and need compaction via rewrite. Self-check: Does your task require an exclusive lock?
- Skipping ANALYZE after big changes. Fresh data without fresh stats leads to bad plans. Self-check: Did row counts change by millions? Run ANALYZE.
- Disabling autovacuum globally. This can cause table bloat and transaction ID wraparound. Self-check: Keep autovacuum on; tune per-table if needed.
- Over-tuning stats on every column. Higher targets increase ANALYZE time. Self-check: Only raise for columns used in WHERE/JOIN with skew.
- Vacuuming during peak traffic. Regular VACUUM runs concurrently, but it still consumes I/O and CPU. Self-check: Schedule during low-load windows.
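Before raising more statistics targets, it helps to audit which columns already have one set. The per-column setting lives in pg_attribute (web_events here is a placeholder table name):

```sql
-- Per-column statistics targets; -1 (NULL on newer PostgreSQL versions)
-- means the column falls back to default_statistics_target.
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'web_events'::regclass
  AND attnum > 0
  AND NOT attisdropped;
```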
Practical projects
- Maintenance playbook: Create a weekly script with queries that list top 20 tables by dead tuples, run targeted VACUUM ANALYZE, and log before/after metrics.
- Stats tuning pilot: Pick 3 slow queries. Identify one selective column each, raise its statistics target, ANALYZE, and capture EXPLAIN ANALYZE before/after.
- Autovacuum tuning: For a hot table, set autovacuum_vacuum_scale_factor and autovacuum_vacuum_threshold to trigger earlier, then validate with pg_stat_user_tables over a week.
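As a sketch of the per-table settings (orders is a placeholder; the numbers are illustrative and should be sized to your table's churn rate):

```sql
-- Trigger autovacuum after 1000 dead rows plus 1% of the table,
-- instead of the global default of 50 rows plus 20%.
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000
);

-- Revert to the global defaults later if needed:
-- ALTER TABLE orders RESET (autovacuum_vacuum_scale_factor,
--                           autovacuum_vacuum_threshold);
```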
Learning path
- Refresh SQL basics: SELECT, joins, indexes, and EXPLAIN.
- Learn VACUUM, ANALYZE, and when to use FULL.
- Practice monitoring: pg_stat_user_tables, pg_stat_progress_vacuum, and relevant settings.
- Tune per-table autovacuum and column statistics for skewed data.
- Automate: build a simple weekly checklist or script.
Next steps
- Take the Quick Test to confirm understanding.
- Automate a weekly maintenance report for your team.
- Document table-specific tuning choices and revisit monthly.
Exercises
Exercise 1: Targeted VACUUM ANALYZE
Goal: Reduce dead tuples and refresh stats for one table you own.
- Find a candidate with many dead tuples.
- Run VACUUM ANALYZE.
- Confirm improvement.
-- 1) Identify a table (example: orders)
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
-- 2) Maintain it
VACUUM ANALYZE orders;
-- 3) Verify
SELECT relname, n_dead_tup, last_analyze
FROM pg_stat_user_tables
WHERE relname = 'orders';
Exercise 2: Stats tuning for a skewed column
Goal: Improve plan accuracy for a frequently filtered column.
- Pick a table and column commonly used in WHERE/JOIN (example: events.country_code).
- Raise column statistics and ANALYZE just that column.
- Compare EXPLAIN ANALYZE before/after on a representative filter.
-- Before
EXPLAIN ANALYZE SELECT * FROM events WHERE country_code = 'US';
-- Tune
ALTER TABLE events ALTER COLUMN country_code SET STATISTICS 400;
ANALYZE events(country_code);
-- After
EXPLAIN ANALYZE SELECT * FROM events WHERE country_code = 'US';
Mini challenge
You have a 200M-row table with frequent deletes. Autovacuum keeps it usable but queries are slowly degrading. Propose a plan with timing that includes: monitoring dead tuples daily, targeted VACUUM ANALYZE windows, and a rare, scheduled VACUUM FULL with expected downtime. Write the plan as 3-5 steps.