
Anomaly Spotting

Learn Anomaly Spotting for free with explanations, exercises, and a quick test (for data analysts).

Published: December 19, 2025 | Updated: December 19, 2025

Why this matters

Anomaly spotting is a core skill in exploratory analysis. You will use it to:

  • Monitor product and business metrics (e.g., sudden drop in conversions).
  • Catch data quality issues early (e.g., missing batches, duplicated events).
  • Detect fraud or abuse (e.g., unusual spike in refund rate).
  • Find operational problems (e.g., API latency surge after a deploy).

Done well, it saves time, prevents bad decisions, and guides targeted investigation.

Concept explained simply

An anomaly is a data point or pattern that deviates from what you would reasonably expect given the usual behavior and context.

Think of your data as a daily commute. Some days are a bit faster or slower (normal variation). A closed bridge is a clear anomaly. Context matters: a rainy Monday may be slower than a sunny Sunday.

Mental model

  • Baseline: What is normal? Estimate it with statistics (median/IQR, mean/std) and context (season, weekday, campaign).
  • Surprise score: How far is the new point from the baseline? Use Z-score, robust Z (median/MAD), or IQR fences.
  • Context gates: Compare apples to apples (e.g., Mondays vs Mondays, region A vs region A).
  • Confirm and explain: Visualize, segment, and rule out data issues before declaring a true anomaly.

Robust vs classical measures

  • Classical: mean and standard deviation. Familiar and fast, but sensitive to outliers.
  • Robust: median and MAD (Median Absolute Deviation) or IQR (Interquartile Range). Resistant to extreme values.

Prefer robust methods when you expect outliers or skewed data.

Common thresholds
  • Z-score: |z| > 3 (classical), or robust |z| > 3.5.
  • IQR rule: below Q1 − 1.5×IQR or above Q3 + 1.5×IQR.

Thresholds are heuristics. Tune them to balance misses vs false alarms.
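
Both rules fit in a few lines of plain Python (standard library only; the helper names here are our own, not from any package):

import statistics

def robust_z(values):
    """Robust z-scores: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    # Note: MAD can be 0 for heavily tied data; guard against that in real use.
    return [0.6745 * (x - med) / mad for x in values]

def iqr_fences(values):
    """Tukey fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [10, 12, 11, 13, 12, 40]
print([round(z, 1) for z in robust_z(data)])        # [-1.3, 0.0, -0.7, 0.7, 0.0, 18.9]
low, high = iqr_fences(data)
print([x for x in data if x < low or x > high])     # [40]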

Types of anomalies

  • Point anomaly: a single data point is extreme.
  • Contextual anomaly: unusual relative to context (e.g., low sales for Black Friday).
  • Collective anomaly: a sequence deviates (e.g., gradual drift or plateau).
  • Change-point: the underlying level/variance changes (e.g., after a release).
  • Data-quality anomaly: missing/duplicated data, schema change, delayed loads.

How to spot anomalies step-by-step

  1. Define the metric and grain: what are you measuring and at what frequency (hourly/daily) or segment (country, device)?
  2. Visualize: line plot or histogram. Look for sudden spikes/drops, flatlines, or variance changes.
  3. Pick a baseline: use recent history, same weekday, or the same season. For skewed data, start with median/IQR or MAD.
  4. Compute a surprise score: Z-score, robust Z, or IQR fences. For time series with seasonality, decompose or compare within the same context (e.g., Mondays).
  5. Flag candidates: apply a threshold (e.g., |z| > 3, or outside IQR fences).
  6. Segment to confirm: split by channel, region, device. True anomalies often appear in some segments but not all (see the sketch after this list).
  7. Rule out data issues: check missing data, duplicates, delayed ingestion, tracking changes, recent ETL changes.
  8. Document and act: write what happened, suspected cause, and follow-up checks. Share plots.
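
A toy illustration of step 6, with made-up per-segment baselines for a conversion-rate metric:

# Today's conversion rate (%) per segment, with hypothetical (mean, std) baselines
today = {"organic": 3.1, "paid": 1.8, "email": 3.3}
baseline = {"organic": (3.2, 0.2), "paid": (3.0, 0.2), "email": (3.2, 0.2)}

for segment, rate in today.items():
    mean, std = baseline[segment]
    z = (rate - mean) / std
    print(f"{segment}: z = {z:+.1f}" + ("  <-- anomaly" if abs(z) > 3 else ""))
# Only "paid" crosses |z| > 3: the drop is concentrated in one segment.
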
Time series tip

Remove seasonality (e.g., 7-day rolling median or STL decomposition). Then apply anomaly rules to residuals.
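
A minimal sketch with pandas (assumed available), using a 7-day rolling median as the baseline on a made-up weekly pattern:

import pandas as pd

# Four weeks of a made-up weekly pattern, with one injected spike
values = [100, 120, 130, 125, 118, 90, 80] * 4
values[17] = 200
s = pd.Series(values)

baseline = s.rolling(window=7, center=True, min_periods=4).median()
residuals = s - baseline

# Apply a robust rule to the residuals
med = residuals.median()
mad = (residuals - med).abs().median()
robust_z = 0.6745 * (residuals - med) / mad
print(robust_z[robust_z.abs() > 3.5])  # only index 17, the injected spike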

Worked examples

Example 1: Daily orders spike (robust Z with MAD)

Data (orders over 14 days): 98, 102, 101, 99, 100, 97, 250, 103, 96, 102, 99, 101, 98, 20

  1. Median = 99.5
  2. MAD = 2.0 (median of absolute deviations)
  3. Robust Z = 0.6745 × (x − 99.5) / 2.0
  4. Flags: Day 7 (250) and Day 14 (20) have |robust Z| ≫ 3.5 → anomalies.
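
The same computation in plain Python, to check the arithmetic:

import statistics

orders = [98, 102, 101, 99, 100, 97, 250, 103, 96, 102, 99, 101, 98, 20]
med = statistics.median(orders)                          # 99.5
mad = statistics.median([abs(x - med) for x in orders])  # 2.0

for day, x in enumerate(orders, start=1):
    rz = 0.6745 * (x - med) / mad
    if abs(rz) > 3.5:
        print(f"Day {day}: {x} (robust z = {rz:.1f})")
# Day 7: 250 (robust z = 50.8)
# Day 14: 20 (robust z = -26.8)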

Example 2: Conversion rate drop (contextual)

Weekday conversion rate baseline (past 8 Mondays): mean 3.2%, std 0.2%. Today, also a Monday, the rate is 2.5%.

  • Z = (2.5 − 3.2) / 0.2 = −3.5 → candidate anomaly.
  • Segment by traffic source: drop concentrated in Paid Search. Check ad changes and landing page.

Example 3: IQR for session duration

Sample durations (min): 1.2, 1.3, 1.4, 1.5, 1.6, 8.0

  • Lower half {1.2, 1.3, 1.4} → Q1 = 1.3; upper half {1.5, 1.6, 8.0} → Q3 = 1.6 → IQR = 0.3. (Quartile conventions vary slightly by tool.)
  • Upper fence = 1.6 + 1.5×0.3 = 2.05 → 8.0 is an outlier.
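
Checked with the standard library (statistics.quantiles uses a different quartile convention, so Q1 and Q3 differ, but the same point is flagged):

import statistics

durations = [1.2, 1.3, 1.4, 1.5, 1.6, 8.0]
q1, _, q3 = statistics.quantiles(durations, n=4)  # 1.275 and 3.2 with this method
upper_fence = q3 + 1.5 * (q3 - q1)
print([x for x in durations if x > upper_fence])  # [8.0]
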
Why robust methods shine

One extreme point can inflate mean and std, hiding true anomalies. Median/MAD and IQR resist this influence.
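
A quick demonstration of this masking effect, using only the standard library:

import statistics

data = [10, 11, 10, 12, 11, 10, 500]  # one extreme point

mean, std = statistics.mean(data), statistics.stdev(data)
print(f"classical z = {(500 - mean) / std:.2f}")  # ~2.27: the outlier inflates
                                                  # the std, masking itself

med = statistics.median(data)
mad = statistics.median([abs(x - med) for x in data])
print(f"robust z = {0.6745 * (500 - med) / mad:.1f}")  # ~329.8: flagged at once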

Practical checklist

  • I plotted the series and noted context (weekday, season, release, campaign).
  • I chose a baseline that matches context (same weekday/segment).
  • I used robust stats (median/MAD or IQR) when data looked skewed.
  • I set a clear threshold and kept it consistent for the analysis.
  • I segmented to validate the anomaly and reduce false positives.
  • I checked for data-quality issues before root cause analysis.
  • I documented findings and next steps with plots.

Exercises


Exercise 1 — Flag anomalies with median and MAD

Dataset (daily orders over 14 days):

98, 102, 101, 99, 100, 97, 250, 103, 96, 102, 99, 101, 98, 20
  1. Compute the median and MAD (Median Absolute Deviation) for the full series.
  2. Compute robust Z for each point: rz = 0.6745 × (x − median) / MAD.
  3. Flag anomalies where |rz| > 3.5.

Need a nudge?
  • Median is the middle of the sorted list; with even N, average the two middle values.
  • MAD is the median of the absolute deviations from the median.

Common mistakes and self-check

  • Using global mean/std on seasonal data: leads to false positives. Self-check: compare within same weekday/season.
  • Declaring anomalies without segmentation: you may miss the real source. Self-check: slice by channel/region/device.
  • Ignoring data issues: a late ETL can look like a drop. Self-check: verify counts, nulls, pipeline logs, and recent schema changes.
  • Threshold hopping: changing thresholds until you get the answer you want. Self-check: predefine and justify thresholds.
  • Overfitting the window: using too small a baseline window. Self-check: test stability across adjacent windows.

Self-audit mini list
  • Did I choose the right context for baseline?
  • Did I visualize raw and residual (de-seasonalized) series?
  • Did I verify data completeness and timeliness?

Mini challenge

You monitor three metrics: daily signups, activation rate, and support tickets. Signups are normal, activation rate drops by 20% on mobile iOS only, and support tickets spike for “payment fail”. Outline a 5-step plan to confirm the anomalies and find the cause using segmentation and robust baselines. Write down your steps and checks.

Who this is for

  • Data analysts who explore and monitor metrics.
  • Anyone owning dashboards, alerts, or experiment monitoring.

Prerequisites

  • Comfort with basic statistics (mean, median, variance, percentiles).
  • Ability to plot time series and distributions.
  • Basic spreadsheet or Python/R skills for simple calculations.

Learning path

  1. Descriptive stats refresh (center, spread, percentiles).
  2. Visual EDA for time series and distributions.
  3. Robust anomaly rules (IQR, MAD-based Z).
  4. Contextual analysis (seasonality, segmentation).
  5. Root cause routines and documentation.

Practical projects

  • Build a weekly anomaly review: pick 3 KPIs, define baselines, apply robust detection, summarize findings.
  • Create a one-pager playbook: checklist, thresholds, and data quality checks your team can reuse.
  • Simulate anomalies by injecting spikes/drops into sample data and verify your method catches them.
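
For the third project, a minimal starting point (made-up data; seed chosen arbitrarily):

import random
import statistics

random.seed(7)
series = [100 + random.gauss(0, 3) for _ in range(60)]  # stable synthetic KPI
series[42] *= 1.5  # inject a 50% spike

med = statistics.median(series)
mad = statistics.median([abs(x - med) for x in series])
flags = [i for i, x in enumerate(series) if abs(0.6745 * (x - med) / mad) > 3.5]
print(flags)  # should contain 42 (rarely, a random point may also cross the cutoff)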

Next steps

  • Apply robust baselines to your top KPI and set a consistent threshold.
  • Introduce one segmentation cut to validate anomalies (e.g., device type).
  • Design a simple anomalies log with date, metric, method, threshold, segments, and outcome.

Practice Exercises

1 exercise to complete

Instructions

Use the dataset (daily orders over 14 days):

98, 102, 101, 99, 100, 97, 250, 103, 96, 102, 99, 101, 98, 20
  1. Compute the median and MAD for the series.
  2. Compute robust Z for each point: rz = 0.6745 × (x − median) / MAD.
  3. Flag anomalies where |rz| > 3.5 and list their day indices (1-based).

Expected Output
Anomalies flagged at days 7 and 14.

Anomaly Spotting — Quick Test

Test your knowledge with 7 questions. Pass with 70% or higher.

