Database Monitoring and Maintenance: VACUUM, ANALYZE, and Statistics

Learn database monitoring and maintenance with VACUUM, ANALYZE, and statistics for free, with explanations, exercises, and a quick test (for Data Analysts).

Published: December 19, 2025 | Updated: December 19, 2025

Heads-up: The quick test is available to everyone. If you are logged in, your progress will be saved automatically.

Why this matters

As a Data Analyst, your queries must be fast and reliable. Tables grow, rows get updated/deleted, and the optimizer relies on statistics to pick efficient plans. Routine maintenance keeps your dashboards responsive and your analyses correct.

  • Speed up slow dashboards by removing dead rows that bloat tables.
  • Keep query plans accurate with fresh column statistics.
  • Prevent outages caused by transaction ID wraparound (in systems like PostgreSQL).

Who this is for

  • Data Analysts and Analytics Engineers running SQL on shared warehouses or OLTP replicas.
  • Anyone who owns analytical tables and needs predictable query performance.

Prerequisites

  • Basic SQL: SELECT, filtering, and joins.
  • Familiarity with indexes and reading EXPLAIN output.

Concept explained simply

VACUUM: Reclaims space from deleted/updated rows (dead tuples) and freezes old transaction IDs to prevent wraparound. Plain VACUUM runs online without blocking reads or writes; VACUUM FULL rewrites the table and returns space to the operating system, but requires an exclusive lock.
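The dead-tuple mechanics behind VACUUM can be sketched as a toy model (illustrative Python only, not PostgreSQL internals): under MVCC, an UPDATE writes a new row version and a DELETE only marks the old one, so dead versions accumulate until a vacuum reclaims them.

```python
# Toy model of MVCC row versioning (illustration only, not real PostgreSQL
# internals): UPDATE/DELETE leave dead tuple versions behind; VACUUM reclaims them.

class ToyTable:
    def __init__(self):
        self.live = {}   # key -> current row version
        self.dead = 0    # old versions awaiting vacuum

    def insert(self, key, value):
        self.live[key] = value

    def update(self, key, value):
        self.dead += 1          # old version is not overwritten in place
        self.live[key] = value

    def delete(self, key):
        del self.live[key]
        self.dead += 1          # deleted row stays on disk until vacuumed

    def vacuum(self):
        reclaimed, self.dead = self.dead, 0
        return reclaimed

t = ToyTable()
t.insert("a", 1)
t.update("a", 2)   # 1 dead version
t.update("a", 3)   # 2 dead versions
t.delete("a")      # 3 dead versions
print(t.dead)      # 3
print(t.vacuum())  # 3
print(t.dead)      # 0
```

This is why a heavily updated table keeps growing even if its row count is stable: the old versions' space only becomes reusable after a vacuum.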

ANALYZE: Scans a random sample of your table to estimate value distributions, null fractions, and distinct counts. The optimizer uses these statistics to choose join orders and decide when an index is worth using.
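A rough sketch of why those statistics matter (simplified Python; PostgreSQL's real estimator also consults histograms and most-common-value lists): for an equality filter, a planner commonly assumes values are spread evenly over the distinct count, so stale figures shift the row estimate and can flip the chosen plan.

```python
# Simplified row-count estimate for "WHERE col = value", assuming values are
# spread evenly over n_distinct distinct values (PostgreSQL's real estimator
# also uses histograms and most-common-value lists).

def estimate_rows(total_rows: int, n_distinct: int) -> float:
    return total_rows / n_distinct  # selectivity = 1 / n_distinct

# Fresh stats: 1,000,000 rows, 10,000 distinct user_ids -> expect ~100 matches.
print(estimate_rows(1_000_000, 10_000))  # 100.0

# Stale stats: the table grew 10x since the last ANALYZE, but the planner
# still works from the old 100,000-row figure and under-provisions the plan.
print(estimate_rows(100_000, 10_000))    # 10.0
```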

Mental model

  • Imagine a library: VACUUM removes discarded books taking shelf space, so aisles stay clear.
  • ANALYZE updates the catalog’s knowledge of which books are popular or rare, so the librarian can fetch them quickly.
  • Autovacuum (in PostgreSQL) is the nightly cleaning crew. You still need to spot-check busy sections and occasionally do a deeper clean.

Key commands and views

  • VACUUM; or VACUUM (VERBOSE);
  • VACUUM ANALYZE; to reclaim space and refresh stats together after large changes.
  • VACUUM FULL; deep compaction; requires exclusive lock (plan for downtime).
  • ANALYZE; or ANALYZE schema.table; or ANALYZE table(column1, column2);
  • ALTER TABLE ... SET (autovacuum_vacuum_scale_factor=..., autovacuum_vacuum_threshold=...);
  • ALTER TABLE ... ALTER COLUMN col SET STATISTICS n; with n around 100–1000 (the default target is 100) to increase sampling for skewed columns.
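The two per-table autovacuum settings above combine into a simple trigger rule: a table is queued for autovacuum once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × (estimated row count). A Python sketch of that arithmetic, using the defaults of 50 and 0.2:

```python
def autovacuum_fires(n_dead_tup: int, reltuples: int,
                     threshold: int = 50, scale_factor: float = 0.2) -> bool:
    """Autovacuum queues a table once dead tuples exceed
    threshold + scale_factor * reltuples (defaults: 50 and 0.2)."""
    return n_dead_tup > threshold + scale_factor * reltuples

# On a 10M-row table, the defaults wait for ~2,000,050 dead tuples:
print(autovacuum_fires(1_500_000, 10_000_000))                     # False
# Lowering the per-table scale factor makes it fire much earlier:
print(autovacuum_fires(1_500_000, 10_000_000, scale_factor=0.01))  # True
```

This is why large, hot tables often get a smaller per-table scale factor: with the default 0.2, a billion-row table accumulates roughly 200 million dead tuples before autovacuum reacts.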
Helpful monitoring queries (PostgreSQL)
-- Dead tuples and last maintenance
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;

-- Current autovacuum/vacuum activity
SELECT * FROM pg_stat_progress_vacuum;

-- Check settings relevant to vacuum
SELECT name, setting FROM pg_settings
WHERE name IN ('autovacuum', 'autovacuum_vacuum_scale_factor', 'autovacuum_vacuum_threshold',
               'autovacuum_analyze_scale_factor', 'autovacuum_analyze_threshold', 'default_statistics_target')
ORDER BY name;

-- Rough bloat signal: dead tuples per table (for precise bloat estimates,
-- use the pgstattuple extension and track growth over time)
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Worked examples

1) Remove dead tuples and refresh stats after a batch delete

  1. Inspect dead tuples:
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'sales';
  2. Run maintenance:
VACUUM ANALYZE sales;
  3. Confirm improvement:
SELECT relname, n_live_tup, n_dead_tup, last_analyze
FROM pg_stat_user_tables
WHERE relname = 'sales';

Expected: n_dead_tup drops close to zero; last_analyze updates.

2) Improve plans on a skewed column with higher statistics

-- Column 'user_id' is heavily skewed (a few users generate most events)
ALTER TABLE web_events ALTER COLUMN user_id SET STATISTICS 500;
ANALYZE web_events(user_id);

-- Compare estimates vs actuals on a known filter
EXPLAIN ANALYZE
SELECT * FROM web_events WHERE user_id = 42;

Goal: The planner better estimates selectivity and may pick a more efficient index plan.
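What the higher statistics target buys you can be sketched in Python (an illustration of the idea, not PostgreSQL's exact estimator): a larger target lets ANALYZE keep a longer most-common-values (MCV) list, so heavy hitters like the hypothetical user 42 are estimated from observed frequency instead of the uniform 1/n_distinct guess.

```python
# Sketch of MCV-based selectivity (illustration, not PostgreSQL's exact logic):
# values on the most-common-values list use their observed frequency; all other
# values share the remaining probability mass uniformly.

def estimate_with_mcv(value, mcv: dict, total_rows: int, n_distinct: int) -> float:
    if value in mcv:                     # heavy hitter: use observed frequency
        return mcv[value] * total_rows
    remaining = 1.0 - sum(mcv.values())  # mass left for non-MCV values
    return remaining * total_rows / max(n_distinct - len(mcv), 1)

# 1M events, 10,000 distinct users; user 42 produces 25% of all events.
mcv = {42: 0.25}
print(estimate_with_mcv(42, mcv, 1_000_000, 10_000))  # 250000.0, not 100
print(estimate_with_mcv(7, mcv, 1_000_000, 10_000))   # ~75 for a typical user
```

Without the MCV entry, the uniform guess (1,000,000 / 10,000 = 100 rows) could push the planner toward an index scan for user 42 when a sequential scan would actually win.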

3) Watch autovacuum progress on a large table

-- In another session, a large update/delete runs on 'orders'
-- Here, you observe progress:
SELECT relid::regclass AS table_name, phase, heap_blks_total, heap_blks_scanned, heap_blks_vacuumed
FROM pg_stat_progress_vacuum
WHERE relid::regclass::text = 'orders';

Use this to time manual VACUUMs and avoid clashes with peak hours.

Practice: try it yourself

Complete the exercises below. They match the Exercises section at the bottom of this page.

Common mistakes and how to self-check

  • Confusing VACUUM with VACUUM FULL. Use FULL only when you plan downtime and need compaction via rewrite. Self-check: Does your task require an exclusive lock?
  • Skipping ANALYZE after big changes. Fresh data without fresh stats leads to bad plans. Self-check: Did row counts change by millions? Run ANALYZE.
  • Disabling autovacuum globally. This can cause table bloat and transaction ID wraparound. Self-check: Keep autovacuum on; tune per-table if needed.
  • Over-tuning stats on every column. Higher targets increase ANALYZE time. Self-check: Only raise for columns used in WHERE/JOIN with skew.
  • Vacuuming during peak traffic without awareness. While regular VACUUM is concurrent, it still uses resources. Self-check: Schedule during low-load windows.

Practical projects

  1. Maintenance playbook: Create a weekly script with queries that list top 20 tables by dead tuples, run targeted VACUUM ANALYZE, and log before/after metrics.
  2. Stats tuning pilot: Pick 3 slow queries. Identify one selective column each, raise its statistics target, ANALYZE, and capture EXPLAIN ANALYZE before/after.
  3. Autovacuum tuning: For a hot table, set autovacuum_vacuum_scale_factor and autovacuum_vacuum_threshold to trigger earlier, then validate with pg_stat_user_tables over a week.
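The playbook logic from project 1 can be sketched as a small Python helper (the function name, threshold, and table names are made up for the demo; in practice you would feed it rows pulled from pg_stat_user_tables via psql or a database driver):

```python
# Hypothetical weekly-playbook helper: given (table, dead_tuples) pairs from
# pg_stat_user_tables, emit targeted maintenance statements, worst table first.

def maintenance_plan(stats, min_dead=10_000, top_n=20):
    worst = sorted((row for row in stats if row[1] > min_dead),
                   key=lambda row: row[1], reverse=True)[:top_n]
    return [f"VACUUM (ANALYZE, VERBOSE) {name};" for name, _ in worst]

stats = [("orders", 250_000), ("web_events", 90_000), ("lookup_codes", 120)]
for stmt in maintenance_plan(stats):
    print(stmt)
# VACUUM (ANALYZE, VERBOSE) orders;
# VACUUM (ANALYZE, VERBOSE) web_events;
```

Log n_dead_tup and last_analyze before and after each run so the playbook accumulates before/after metrics over time.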

Learning path

  1. Refresh SQL basics: SELECT, joins, indexes, and EXPLAIN.
  2. Learn VACUUM, ANALYZE, and when to use FULL.
  3. Practice monitoring: pg_stat_user_tables, pg_stat_progress_vacuum, and relevant settings.
  4. Tune per-table autovacuum and column statistics for skewed data.
  5. Automate: build a simple weekly checklist or script.

Next steps

  • Take the Quick Test to confirm understanding.
  • Automate a weekly maintenance report for your team.
  • Document table-specific tuning choices and revisit monthly.

Exercises

Exercise 1: Targeted VACUUM ANALYZE

Goal: Reduce dead tuples and refresh stats for one table you own.

  1. Find a candidate with many dead tuples.
  2. Run VACUUM ANALYZE.
  3. Confirm improvement.
-- 1) Identify a table (example: orders)
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- 2) Maintain it
VACUUM ANALYZE orders;

-- 3) Verify
SELECT relname, n_dead_tup, last_analyze
FROM pg_stat_user_tables
WHERE relname = 'orders';

Exercise 2: Stats tuning for a skewed column

Goal: Improve plan accuracy for a frequently filtered column.

  1. Pick a table and column commonly used in WHERE/JOIN (example: events.country_code).
  2. Raise column statistics and ANALYZE just that column.
  3. Compare EXPLAIN ANALYZE before/after on a representative filter.
-- Before
EXPLAIN ANALYZE SELECT * FROM events WHERE country_code = 'US';

-- Tune
ALTER TABLE events ALTER COLUMN country_code SET STATISTICS 400;
ANALYZE events(country_code);

-- After
EXPLAIN ANALYZE SELECT * FROM events WHERE country_code = 'US';

Mini challenge

You have a 200M-row table with frequent deletes. Autovacuum keeps it usable but queries are slowly degrading. Propose a plan with timing that includes: monitoring dead tuples daily, targeted VACUUM ANALYZE windows, and a rare, scheduled VACUUM FULL with expected downtime. Write the plan as 3-5 steps.


Database Monitoring and Maintenance: VACUUM, ANALYZE, and Statistics — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

