Who this is for
You build or maintain analytics pipelines and dashboards, and you want practical ways to catch spikes, drops, delays, and data drift before stakeholders do. Suitable for analytics engineers, BI developers, and data-savvy analysts.
Prerequisites
- Basic SQL (GROUP BY, window functions)
- Understanding of KPIs/metrics (counts, sums, rates)
- Familiarity with time series concepts (daily/weekly granularity)
Why this matters
In real analytics engineering work, you will:
- Detect broken pipelines when daily row counts drop to near zero.
- Catch revenue spikes caused by duplicate loads before they land in executive reports.
- Spot freshness issues when tables stop updating after a failed job.
- Identify schema or distribution drift (e.g., suddenly 40% null emails) to prevent silent data quality degradation.
Anomaly detection is a safety net that triggers timely, actionable alerts with minimal noise.
Concept explained simply
An anomaly is a data point or pattern that departs from what’s normal for your metric. You define “normal” by a baseline (rules, statistics, or history). If a new value falls outside a tolerance band, you flag it.
Mental model
Think of each metric as a heartbeat. Healthy heartbeats vary but follow a rhythm (seasonality). You learn the rhythm from recent history and raise a flag when beats are too fast, too slow, or missing.
Core methods you can use today
1) Simple thresholds (fastest)
Define fixed min/max bounds from domain knowledge.
- Pros: Easy, interpretable, no history needed.
- Cons: Not adaptive to growth or seasonality.
- Use when: You know safe ranges (e.g., email open_rate between 0 and 0.8).
-- Absolute threshold example
SELECT date, order_count,
CASE WHEN order_count < 100 THEN 'ANOMALY' END AS flag
FROM daily_orders;
2) Z-score / standard deviation bands
Compare the value to a rolling mean and standard deviation. Flag when |z| ≥ k (commonly k = 3).
- Pros: Adapts to level changes, simple math.
- Cons: Sensitive to outliers; assumes roughly normal noise.
- Use when: Metric is fairly symmetric and stable in variance.
-- 28-day rolling baseline with z-score
WITH b AS (
SELECT
date,
metric,
AVG(metric) OVER (ORDER BY date ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING) AS mu,
STDDEV_POP(metric) OVER (ORDER BY date ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING) AS sigma
FROM daily_metric
)
SELECT *,
CASE WHEN sigma > 0 AND ABS((metric - mu)/sigma) >= 3 THEN 'ANOMALY' END AS flag
FROM b;
3) IQR / robust bounds (for outliers)
Use median and interquartile range (IQR). Flag if value < Q1 - 1.5*IQR or > Q3 + 1.5*IQR.
- Pros: Robust to extreme values.
- Cons: Needs a history window; less sensitive to subtle shifts.
- Use when: Data has outliers or is non-normal.
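A minimal sketch of the IQR check, assuming a daily_metric(date, metric) table and a warehouse that supports PERCENTILE_CONT (some engines offer APPROX_PERCENTILE instead):
-- IQR bounds from the last 30 historical days, excluding today
WITH hist AS (
  SELECT
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY metric) AS q1,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY metric) AS q3
  FROM daily_metric
  WHERE date >= CURRENT_DATE - INTERVAL '30 day'
    AND date < CURRENT_DATE
),
today AS (
  SELECT date, metric FROM daily_metric WHERE date = CURRENT_DATE
)
SELECT t.date, t.metric,
  h.q1, h.q3, (h.q3 - h.q1) AS iqr,
  CASE WHEN t.metric < h.q1 - 1.5 * (h.q3 - h.q1)
         OR t.metric > h.q3 + 1.5 * (h.q3 - h.q1)
       THEN 'ANOMALY' END AS flag
FROM today t
CROSS JOIN hist h;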
4) Rolling seasonal baselines
Compare today to the same weekday or to a rolling 7/14/28-day window to respect weekly patterns.
- Pros: Handles seasonality; reduces false positives.
- Cons: Needs enough history; window choices matter.
- Use when: You have day-of-week effects (traffic lower on weekends).
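A minimal weekday-aware sketch, assuming a daily_metric(date, metric) table and Postgres-style EXTRACT(DOW ...); each day is compared to the average of the previous eight same-weekday values (Example 3 below applies the same idea to a percent-change alert):
-- Baseline: average of the previous 8 observations for the same weekday
SELECT
  date,
  metric,
  AVG(metric) OVER (
    PARTITION BY EXTRACT(DOW FROM date)
    ORDER BY date
    ROWS BETWEEN 8 PRECEDING AND 1 PRECEDING
  ) AS dow_baseline
FROM daily_metric;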
5) Percent change monitors
Flag if value changes by more than X% vs baseline (e.g., previous day or rolling mean).
- Pros: Unit-free, business-friendly.
- Cons: Can over-trigger on low volumes.
- Use when: Stakeholders care about relative moves.
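A minimal day-over-day sketch, again assuming daily_metric(date, metric); the 30% cutoff and the low-volume guard of 50 are placeholders to tune for your metric:
-- Flag moves larger than 30% vs the previous day, skipping very low volumes
WITH d AS (
  SELECT date, metric,
         LAG(metric) OVER (ORDER BY date) AS prev_metric
  FROM daily_metric
)
SELECT date, metric, prev_metric,
  ROUND(100.0 * (metric - prev_metric) / NULLIF(prev_metric, 0), 1) AS pct_change,
  CASE WHEN prev_metric >= 50  -- low-volume guard (placeholder)
         AND ABS((metric - prev_metric) / NULLIF(prev_metric, 0)) >= 0.30
       THEN 'ANOMALY' END AS flag
FROM d;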
6) Freshness and volume checks
Monitor time since last update and daily row counts.
- Pros: Essential for pipelines; very interpretable.
- Cons: Doesn’t capture subtle data drift.
- Use when: Ensuring SLAs and job success.
7) Distribution drift (PSI/K-S) — simple view
Compare current distribution to historical. Flag if drift score exceeds threshold.
- Pros: Catches silent schema/data changes.
- Cons: More setup; choose bins and reference period carefully.
- Use when: Monitoring critical categorical/continuous fields.
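A simplified PSI sketch for a categorical field, assuming an illustrative signups(country, signup_date) table; PSI sums (current share - reference share) * ln(current share / reference share) across categories, and 0.25 is a common alert cutoff:
-- PSI for country share: last 7 days vs the prior 8 weeks
WITH ref AS (
  SELECT country, COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS pct
  FROM signups
  WHERE signup_date >= CURRENT_DATE - INTERVAL '63 day'
    AND signup_date <  CURRENT_DATE - INTERVAL '7 day'
  GROUP BY country
),
cur AS (
  SELECT country, COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS pct
  FROM signups
  WHERE signup_date >= CURRENT_DATE - INTERVAL '7 day'
  GROUP BY country
)
SELECT
  SUM((cur.pct - ref.pct) * LN(cur.pct / ref.pct)) AS psi,
  CASE WHEN SUM((cur.pct - ref.pct) * LN(cur.pct / ref.pct)) >= 0.25
       THEN 'DRIFT' END AS flag
-- the inner join drops countries seen in only one period; handle those separately if they matter
FROM ref
JOIN cur USING (country);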
Worked examples
Example 1 — Daily orders z-score monitor
Goal: Flag extreme spikes/drops in daily order_count.
WITH hist AS (
SELECT
date, order_count,
AVG(order_count) OVER (ORDER BY date ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING) AS mu,
STDDEV_POP(order_count) OVER (ORDER BY date ROWS BETWEEN 28 PRECEDING AND 1 PRECEDING) AS sigma
FROM daily_orders
)
SELECT date, order_count,
ROUND((order_count - mu)/NULLIF(sigma,0), 2) AS z,
CASE WHEN sigma > 0 AND ABS((order_count - mu)/sigma) >= 3 THEN 'ANOMALY' END AS flag
FROM hist
ORDER BY date DESC
LIMIT 14;
Interpretation: If z ≤ -3, investigate missing loads or upstream filters. If z ≥ 3, check for duplicate loads or unexpected backfills.
Example 2 — Freshness SLA
Goal: Ensure table updated within 2 hours of schedule.
SELECT table_name,
TIMESTAMPDIFF(MINUTE, last_updated_at, CURRENT_TIMESTAMP) AS minutes_since_update,  -- date-diff syntax varies by warehouse (TIMESTAMPDIFF, DATEDIFF, TIMESTAMP_DIFF)
CASE WHEN TIMESTAMPDIFF(MINUTE, last_updated_at, CURRENT_TIMESTAMP) > 120 THEN 'STALE' END AS flag
FROM table_freshness_status;
Interpretation: A STALE flag triggers a pipeline check and a rollback/reload if needed.
Example 3 — Revenue percent-change with weekday baseline
Goal: Alert if today’s revenue differs by more than 35% vs the average of the same weekday over the last 8 weeks.
WITH w AS (
SELECT *, EXTRACT(DOW FROM date) AS dow FROM daily_revenue
), ref AS (
SELECT w1.date, w1.revenue,
(SELECT AVG(w2.revenue) FROM w w2
WHERE w2.dow = w1.dow AND w2.date BETWEEN w1.date - INTERVAL '56 day' AND w1.date - INTERVAL '7 day') AS dow_avg
FROM w w1
)
SELECT date, revenue,
ROUND(100.0 * (revenue - dow_avg)/NULLIF(dow_avg,0), 1) AS pct_diff,
CASE WHEN ABS((revenue - dow_avg)/NULLIF(dow_avg,0)) >= 0.35 THEN 'ANOMALY' END AS flag
FROM ref
ORDER BY date DESC
LIMIT 8;
Interpretation: Reduces false alarms by respecting weekly seasonality.
How to implement safely (step-by-step)
- Choose metrics: counts, sums, rates, null_rate, distinct_rate, freshness minutes.
- Pick baselines: thresholds for SLAs; rolling mean/std or IQR for values; weekday averages for seasonal metrics.
- Build a monitoring table: one row per metric-date with baseline, bounds, and flag (a minimal schema sketch follows this list).
- Add context: include pipeline/job ids, source table, and last code deploy hash if available.
- Alert wisely: group anomalies, add severity (high when zero rows or stale), and include a short "how to triage" runbook snippet.
- Review and tune: track precision/recall of alerts; adjust windows and thresholds to reduce noise.
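A minimal sketch of such a monitoring table; the column names and types are illustrative placeholders, not a fixed schema:
-- One row per metric per day, written by each monitor query
CREATE TABLE metric_monitor (
  metric_date   DATE,
  metric_name   VARCHAR,           -- e.g. 'order_count', 'null_rate_email', 'freshness_min'
  metric_value  DOUBLE PRECISION,
  baseline      DOUBLE PRECISION,  -- rolling mean, weekday average, or fixed threshold
  lower_bound   DOUBLE PRECISION,
  upper_bound   DOUBLE PRECISION,
  flag          VARCHAR,           -- 'ANOMALY', 'STALE', or NULL
  severity      VARCHAR,           -- e.g. 'high' for zero rows or staleness
  source_table  VARCHAR,
  job_id        VARCHAR,
  deploy_hash   VARCHAR
);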
Exercises
Try these and compare with the solutions.
Exercise 1 — Rolling z-score alert
Create SQL to compute a 21-day rolling mean and std for daily_sessions and flag anomalies when |z| ≥ 3. Output: date, daily_sessions, mu, sigma, z, flag.
Hints
- Use a window frame: ROWS BETWEEN 21 PRECEDING AND 1 PRECEDING.
- Guard division by zero with NULLIF(sigma, 0).
Expected output
Recent dates with a column flag = 'ANOMALY' for extreme spikes/drops.
Exercise 2 — IQR bounds for null_rate
For a daily null_rate_email metric (0..1), compute Q1, Q3, and IQR over the last 30 historical days (excluding today). Flag today as anomalous if it falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
Hints
- Use APPROX_PERCENTILE or PERCENTILE_CONT if your warehouse supports it.
- Exclude the current date from the reference window.
Expected output
A single row for today with columns: q1, q3, iqr, lower_bound, upper_bound, flag.
Exercise checklist
- Window functions used correctly
- Reference window excludes the current row
- Division-by-zero guarded
- Bounds and flags readable by non-technical teammates
Common mistakes and self-check
- Too little history: Using 3–5 days leads to unstable baselines. Self-check: Do alerts swing wildly? Use ≥ 21–28 days where possible.
- Ignoring seasonality: Comparing weekends to weekdays. Self-check: Plot by weekday averages and compare.
- Alerting on noise: Using |z| ≥ 2 on highly volatile metrics. Self-check: Track false positives and raise the cutoff to 3 or switch to IQR bounds.
- No freshness guardrails: Value checks won’t help if the table didn’t load. Self-check: Always include a freshness SLA.
- Single-point spikes only: Missing level shifts. Self-check: Require 2+ consecutive anomalies or use a rolling mean comparison.
Practical projects (portfolio-ready)
- Metric monitor mart: Build a table that tracks 5 metrics (volume, revenue, null_rate, dedupe_rate, freshness_min). Add rolling baselines and flags.
- Seasonal alerting: Implement weekday-aware revenue alerts and compare false positive counts vs non-seasonal alerts over one month.
- Drift watch: For a signup funnel, monitor country distribution weekly. Compute a simple PSI and alert when PSI ≥ 0.25.
Mini challenge
You have a daily active_users metric with strong weekend dips and occasional marketing spikes. Design a detection approach that balances sensitivity and noise. Specify:
- Baseline choice (e.g., weekday-average last 8 weeks)
- Alert rule (e.g., ±30% vs baseline OR |z| ≥ 3)
- Noise control (e.g., require 2 consecutive days or severity tiers)
Suggested approach
Use a weekday average over 8–12 weeks; alert when |pct_diff| ≥ 30% or |z| ≥ 3; require two consecutive anomalies for low-severity alerts, and alert immediately on high-severity cases (a drop of more than 60%, or zero).
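A minimal sketch of the noise-control step, assuming an upstream query has already produced dau_with_baseline(date, active_users, dow_avg) in the style of Example 3; the thresholds mirror the rule above:
-- High severity fires immediately; low severity needs two consecutive breaches
WITH diffs AS (
  SELECT date, active_users, dow_avg,
         (active_users - dow_avg) / NULLIF(dow_avg, 0) AS rel_diff
  FROM dau_with_baseline  -- assumed table: date, active_users, dow_avg
)
SELECT date, active_users,
  ROUND(100.0 * rel_diff, 1) AS pct_diff,
  CASE
    WHEN active_users = 0 OR rel_diff <= -0.60 THEN 'HIGH'
    WHEN ABS(rel_diff) >= 0.30
         AND ABS(LAG(rel_diff) OVER (ORDER BY date)) >= 0.30 THEN 'LOW'
  END AS alert_severity
FROM diffs;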
Learning path
- Start: Freshness and volume monitors for each core table.
- Next: Rolling baselines (z-score/IQR) for key business metrics.
- Then: Seasonality-aware comparisons (weekday or 7/28-day).
- Advanced: Distribution drift for critical columns; control charts for processes.
Next steps
- Implement 1 freshness and 2 value monitors today.
- Set alert severities and add runbook steps to each alert.
- Review alert precision weekly and tune thresholds/windows.