Who this is for
You build or maintain analytics pipelines and dashboards, and you want practical ways to catch spikes, drops, delays, and data drift before stakeholders do. Suitable for analytics engineers, BI developers, and data-savvy analysts.
Prerequisites
- Basic SQL (GROUP BY, window functions)
- Understanding of KPIs/metrics (counts, sums, rates)
- Familiarity with time series concepts (daily/weekly granularity)
Why this matters
In real analytics engineering work, you will:
- Detect broken pipelines when daily row counts drop to near zero.
- Catch revenue spikes caused by duplicate loads before they land in executive reports.
- Spot freshness issues when tables stop updating after a failed job.
- Identify schema or distribution drift (e.g., suddenly 40% null emails) to prevent silent data quality degradation.
Anomaly detection is a safety net that triggers timely, actionable alerts with minimal noise.
Concept explained simply
An anomaly is a data point or pattern that departs from what’s normal for your metric. You define “normal” by a baseline (rules, statistics, or history). If a new value falls outside a tolerance band, you flag it.
Mental model
Think of each metric as a heartbeat. Healthy heartbeats vary but follow a rhythm (seasonality). You learn the rhythm from recent history and raise a flag when beats are too fast, too slow, or missing.
Core methods you can use today
1) Simple thresholds (fastest)
Define fixed min/max bounds from domain knowledge.
- Pros: Easy, interpretable, no history needed.
- Cons: Not adaptive to growth or seasonality.
- Use when: You know safe ranges (e.g., email open_rate between 0 and 0.8).
-- Absolute threshold example
SELECT date, order_count,
CASE WHEN order_count < 100 THEN 'ANOMALY' END AS flag
FROM daily_orders;
2) Z-score / standard deviation bands
Compare the value to a rolling mean and standard deviation. Flag when |z| ≥ k (commonly k = 3).
- Pros: Adapts to level changes, simple math.
- Cons: Sensitive to outliers; assumes roughly normal noise.
- Use when: Metric is fairly symmetric and stable in variance.
-- 28-day rolling baseline with z-score
WITH b AS (
SELECT
date,
metric,
AVG(metric) OVER (ORDER BY date ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING) AS mu,
STDDEV_POP(metric) OVER (ORDER BY date ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING) AS sigma
FROM daily_metric
)
SELECT *,
CASE WHEN sigma > 0 AND ABS((metric - mu)/sigma) >= 3 THEN 'ANOMALY' END AS flag
FROM b;
3) IQR / robust bounds (for outliers)
Use median and interquartile range (IQR). Flag if value < Q1 - 1.5*IQR or > Q3 + 1.5*IQR.
- Pros: Robust to extreme values.
- Cons: Needs a history window; less sensitive to subtle shifts.
- Use when: Data has outliers or is non-normal.
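A minimal sketch of the IQR check, assuming a daily_metric(date, metric) table and a warehouse that supports PERCENTILE_CONT (some engines offer APPROX_PERCENTILE instead):
-- IQR bounds from the last 30 historical days, excluding today
WITH hist AS (
  SELECT
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY metric) AS q1,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY metric) AS q3
  FROM daily_metric
  WHERE date >= CURRENT_DATE - INTERVAL '30 day'
    AND date < CURRENT_DATE
),
today AS (
  SELECT date, metric FROM daily_metric WHERE date = CURRENT_DATE
)
SELECT t.date, t.metric,
  h.q1, h.q3, (h.q3 - h.q1) AS iqr,
  CASE WHEN t.metric < h.q1 - 1.5 * (h.q3 - h.q1)
         OR t.metric > h.q3 + 1.5 * (h.q3 - h.q1)
       THEN 'ANOMALY' END AS flag
FROM today t
CROSS JOIN hist h;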
4) Rolling seasonal baselines
Compare today to the same weekday or to a rolling 7/14/28-day window to respect weekly patterns.
- Pros: Handles seasonality; reduces false positives.
- Cons: Needs enough history; window choices matter.
- Use when: You have day-of-week effects (traffic lower on weekends).
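A minimal weekday-aware sketch, assuming a daily_metric(date, metric) table and Postgres-style EXTRACT(DOW ...); each day is compared to the average of the previous eight same-weekday values (Example 3 below applies the same idea to a percent-change alert):
-- Baseline: average of the previous 8 observations for the same weekday
SELECT
  date,
  metric,
  AVG(metric) OVER (
    PARTITION BY EXTRACT(DOW FROM date)
    ORDER BY date
    ROWS BETWEEN 8 PRECEDING AND 1 PRECEDING
  ) AS dow_baseline
FROM daily_metric;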
5) Percent change monitors
Flag if value changes by more than X% vs baseline (e.g., previous day or rolling mean).
- Pros: Unit-free, business-friendly.
- Cons: Can over-trigger on low volumes.
- Use when: Stakeholders care about relative moves.
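A minimal day-over-day sketch, again assuming daily_metric(date, metric); the 30% cutoff and the low-volume guard of 50 are placeholders to tune for your metric:
-- Flag moves larger than 30% vs the previous day, skipping very low volumes
WITH d AS (
  SELECT date, metric,
         LAG(metric) OVER (ORDER BY date) AS prev_metric
  FROM daily_metric
)
SELECT date, metric, prev_metric,
  ROUND(100.0 * (metric - prev_metric) / NULLIF(prev_metric, 0), 1) AS pct_change,
  CASE WHEN prev_metric >= 50  -- low-volume guard (placeholder)
         AND ABS((metric - prev_metric) / NULLIF(prev_metric, 0)) >= 0.30
       THEN 'ANOMALY' END AS flag
FROM d;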
6) Freshness and volume checks
Monitor time since last update and daily row counts.
- Pros: Essential for pipelines; very interpretable.
- Cons: Doesn’t capture subtle data drift.
- Use when: Ensuring SLAs and job success.
7) Distribution drift (PSI/K-S) — simple view
Compare current distribution to historical. Flag if drift score exceeds threshold.
- Pros: Catches silent schema/data changes.
- Cons: More setup; choose bins and reference period carefully.
- Use when: Monitoring critical categorical/continuous fields.
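A simplified PSI sketch for a categorical field, assuming an illustrative signups(country, signup_date) table; PSI sums (current share - reference share) * ln(current share / reference share) across categories, and 0.25 is a common alert cutoff:
-- PSI for country share: last 7 days vs the prior 8 weeks
WITH ref AS (
  SELECT country, COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS pct
  FROM signups
  WHERE signup_date >= CURRENT_DATE - INTERVAL '63 day'
    AND signup_date <  CURRENT_DATE - INTERVAL '7 day'
  GROUP BY country
),
cur AS (
  SELECT country, COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS pct
  FROM signups
  WHERE signup_date >= CURRENT_DATE - INTERVAL '7 day'
  GROUP BY country
)
SELECT
  SUM((cur.pct - ref.pct) * LN(cur.pct / ref.pct)) AS psi,
  CASE WHEN SUM((cur.pct - ref.pct) * LN(cur.pct / ref.pct)) >= 0.25
       THEN 'DRIFT' END AS flag
-- the inner join drops countries seen in only one period; handle those separately if they matter
FROM ref
JOIN cur USING (country);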
Worked examples
Example 1 — Daily orders z-score monitor
Goal: Flag extreme spikes/drops in daily order_count.
WITH hist AS (
SELECT
date, order_count,
AVG(order_count) OVER (ORDER BY date ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING) AS mu,
STDDEV_POP(order_count) OVER (ORDER BY date ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING) AS sigma
FROM daily_orders
)
SELECT date, order_count,
ROUND((order_count - mu)/NULLIF(sigma,0), 2) AS z,
CASE WHEN sigma > 0 AND ABS((order_count - mu)/sigma) >= 3 THEN 'ANOMALY' END AS flag
FROM hist
ORDER BY date DESC
LIMIT 14;
Interpretation: If z ≤ -3, investigate missing loads or upstream filters. If z ≥ 3, check for duplicate loads or unexpected backfills.
Example 2 — Freshness SLA
Goal: Ensure table updated within 2 hours of schedule.
SELECT table_name,
TIMESTAMPDIFF(MINUTE, last_updated_at, CURRENT_TIMESTAMP) AS minutes_since_update,  -- date-diff syntax varies by warehouse (TIMESTAMPDIFF, DATEDIFF, TIMESTAMP_DIFF)
CASE WHEN TIMESTAMPDIFF(MINUTE, last_updated_at, CURRENT_TIMESTAMP) > 120 THEN 'STALE' END AS flag
FROM table_freshness_status;
Interpretation: A STALE flag triggers a pipeline check and a rollback/reload if needed.
Example 3 — Revenue percent-change with weekday baseline
Goal: Alert if today’s revenue differs by more than 35% vs the average of the same weekday over the last 8 weeks.
WITH w AS (
SELECT *, EXTRACT(DOW FROM date) AS dow FROM daily_revenue
), ref AS (
SELECT w1.date, w1.revenue,
(SELECT AVG(w2.revenue) FROM w w2
WHERE w2.dow = w1.dow AND w2.date BETWEEN w1.date - INTERVAL '56 day' AND w1.date - INTERVAL '7 day') AS dow_avg
FROM w w1
)
SELECT date, revenue,
ROUND(100.0 * (revenue - dow_avg)/NULLIF(dow_avg,0), 1) AS pct_diff,
CASE WHEN ABS((revenue - dow_avg)/NULLIF(dow_avg,0)) >= 0.35 THEN 'ANOMALY' END AS flag
FROM ref
ORDER BY date DESC
LIMIT 8;
Interpretation: Reduces false alarms by respecting weekly seasonality.
How to implement safely (step-by-step)
- Choose metrics: counts, sums, rates, null_rate, distinct_rate, freshness minutes.
- Pick baselines: thresholds for SLAs; rolling mean/std or IQR for values; weekday averages for seasonal metrics.
- Build a monitoring table: one row per metric-date with baseline, bounds, and flag (a minimal schema sketch follows this list).
- Add context: include pipeline/job ids, source table, and last code deploy hash if available.
- Alert wisely: group anomalies, add severity (high when zero rows or stale), and include a short "how to triage" runbook snippet.
- Review and tune: track precision/recall of alerts; adjust windows and thresholds to reduce noise.
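A minimal sketch of such a monitoring table; the column names and types are illustrative placeholders, not a fixed schema:
-- One row per metric per day, written by each monitor query
CREATE TABLE metric_monitor (
  metric_date   DATE,
  metric_name   VARCHAR,           -- e.g. 'order_count', 'null_rate_email', 'freshness_min'
  metric_value  DOUBLE PRECISION,
  baseline      DOUBLE PRECISION,  -- rolling mean, weekday average, or fixed threshold
  lower_bound   DOUBLE PRECISION,
  upper_bound   DOUBLE PRECISION,
  flag          VARCHAR,           -- 'ANOMALY', 'STALE', or NULL
  severity      VARCHAR,           -- e.g. 'high' for zero rows or staleness
  source_table  VARCHAR,
  job_id        VARCHAR,
  deploy_hash   VARCHAR
);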
Exercises
Try these and compare with the solutions.
Exercise 1 — Rolling z-score alert
Create SQL to compute a 21-day rolling mean and std for daily_sessions and flag anomalies when |z| ≥ 3. Output: date, daily_sessions, mu, sigma, z, flag.
Hints
- Use a window frame: ROWS BETWEEN 21 PRECEDING AND 1 PRECEDING.
- Guard division by zero with NULLIF(sigma, 0).
Expected output
Recent dates with a column flag = 'ANOMALY' for extreme spikes/drops.
Exercise 2 — IQR bounds for null_rate
For a daily null_rate_email metric (0..1), compute Q1, Q3, and IQR over the last 30 historical days (excluding today). Flag today as anomalous if it falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
Hints
- Use APPROX_PERCENTILE or PERCENTILE_CONT if your warehouse supports it.
- Exclude the current date from the reference window.
Expected output
A single row for today with columns: q1, q3, iqr, lower_bound, upper_bound, flag.
Exercise checklist
- Window functions used correctly
- Reference window excludes the current row
- Division-by-zero guarded
- Bounds and flags readable by non-technical teammates
Common mistakes and self-check
- Too little history: Using 3–5 days leads to unstable baselines. Self-check: Do alerts swing wildly? Use ≥ 21–28 days where possible.
- Ignoring seasonality: Comparing weekends to weekdays. Self-check: Plot by weekday averages and compare.
- Alerting on noise: Using |z| ≥ 2 on highly volatile metrics. Self-check: Track false positives and raise the cutoff to 3 or switch to IQR bounds.
- No freshness guardrails: Value checks won’t help if the table didn’t load. Self-check: Always include a freshness SLA.
- Single-point spikes only: Missing level shifts. Self-check: Require 2+ consecutive anomalies or use a rolling mean comparison.
Practical projects (portfolio-ready)
- Metric monitor mart: Build a table that tracks 5 metrics (volume, revenue, null_rate, dedupe_rate, freshness_min). Add rolling baselines and flags.
- Seasonal alerting: Implement weekday-aware revenue alerts and compare false positive counts vs non-seasonal alerts over one month.
- Drift watch: For a signup funnel, monitor country distribution weekly. Compute a simple PSI and alert when PSI ≥ 0.25.
Mini challenge
You have a daily active_users metric with strong weekend dips and occasional marketing spikes. Design a detection approach that balances sensitivity and noise. Specify:
- Baseline choice (e.g., weekday-average last 8 weeks)
- Alert rule (e.g., ±30% vs baseline OR |z| ≥ 3)
- Noise control (e.g., require 2 consecutive days or severity tiers)
Suggested approach
Use a weekday average over 8–12 weeks; alert when |pct_diff| ≥ 30% or |z| ≥ 3; require two consecutive anomalies for low-severity alerts, and alert immediately on high-severity cases (a drop of more than 60%, or zero).
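A minimal sketch of the noise-control step, assuming an upstream query has already produced dau_with_baseline(date, active_users, dow_avg) in the style of Example 3; the thresholds mirror the rule above:
-- High severity fires immediately; low severity needs two consecutive breaches
WITH diffs AS (
  SELECT date, active_users, dow_avg,
         (active_users - dow_avg) / NULLIF(dow_avg, 0) AS rel_diff
  FROM dau_with_baseline  -- assumed table: date, active_users, dow_avg
)
SELECT date, active_users,
  ROUND(100.0 * rel_diff, 1) AS pct_diff,
  CASE
    WHEN active_users = 0 OR rel_diff <= -0.60 THEN 'HIGH'
    WHEN ABS(rel_diff) >= 0.30
         AND ABS(LAG(rel_diff) OVER (ORDER BY date)) >= 0.30 THEN 'LOW'
  END AS alert_severity
FROM diffs;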
Learning path
- Start: Freshness and volume monitors for each core table.
- Next: Rolling baselines (z-score/IQR) for key business metrics.
- Then: Seasonality-aware comparisons (weekday or 7/28-day).
- Advanced: Distribution drift for critical columns; control charts for processes.
Next steps
- Implement 1 freshness and 2 value monitors today.
- Set alert severities and add runbook steps to each alert.
- Review alert precision weekly and tune thresholds/windows.