Why this matters
As a Product Analyst, you are the early warning system for product health. Well-designed alerts catch issues and opportunities before customers or executives do.
- Detect conversion drops after a release before revenue takes a hit.
- Catch data pipeline breaks by monitoring freshness and record counts.
- Spot anomalous spikes indicating fraud, promo effects, or tracking errors.
- Route clear, actionable notifications to the right people without alert fatigue.
Concept explained simply
An alert is a rule that watches a metric and notifies people when something unusual or important happens. Its standard parts, sketched in code after this list, are:
- Metric: What you track (e.g., DAU, signup conversion, revenue).
- Condition: When to fire (e.g., down 15% vs baseline, above 95th percentile).
- Baseline: What is normal (e.g., last 7 days median, same weekday average).
- Window: Time grain and period checked (e.g., hourly, daily, rolling 7 days).
- Gating: Minimum data volume or consecutive breaches to reduce noise.
- Routing: Who gets notified and where (Slack, email, pager).
- Runbook: What to do when it fires (first checks, owners, escalation).
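To see how these parts fit together, here is a minimal sketch of one alert written down as a Python dataclass. Every name and value is illustrative rather than tied to a specific alerting tool.

```python
from dataclasses import dataclass

@dataclass
class AlertRule:
    """One alert definition; every name and value here is illustrative."""
    metric: str                # what you track, e.g. signup conversion rate
    baseline: str              # what "normal" means, e.g. same-weekday median
    condition: str             # when to fire, e.g. 20% or more below baseline
    window: str                # time grain checked, e.g. daily
    min_volume: int            # gating: minimum events before evaluating
    consecutive_breaches: int  # gating: breaches in a row required to fire
    route_to: list             # who gets notified and where
    runbook: str               # first checks, owners, escalation

signup_drop = AlertRule(
    metric="signup_conversion_rate",
    baseline="median of the last 7 occurrences of the same weekday",
    condition="at least 20% below baseline",
    window="daily",
    min_volume=5000,
    consecutive_breaches=2,
    route_to=["#product-growth", "on-call analyst"],
    runbook="link to the signup conversion runbook",
)
```

Most BI and monitoring tools capture these same fields through a form or a YAML file rather than code; the point is that a complete alert has an explicit answer for every field above.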
Mental model: A good smoke detector
- Sensitive to real smoke, not to steam from the shower (use baselines and gates).
- Loud enough for the right people, not the entire building (targeted routing).
- Tested regularly with a checklist (review false positives and missed incidents).
Core setup steps
- Pick the metric and owner: Define exactly how it is calculated and who maintains it.
- Build a reliable query: Use the correct filters, time zone, and time grain. Exclude test traffic.
- Establish a baseline: Start with a 7-day median. For weekday seasonality, compare to the same weekday.
- Choose a condition: Relative change (e.g., down 20% vs baseline) is more robust than absolute thresholds. Add gates: minimum N events and 2 consecutive breaches (see the sketch after these steps).
- Set schedule and latency: Align alert frequency with data availability and pipeline delays.
- Route and escalate: Send low-severity to a channel; high-severity to an on-call role. Add quiet hours if needed.
- Document a runbook: Include top 3 checks, likely causes, dashboards to open, and escalation contacts.
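As a sketch of how the baseline, condition, and gating steps above combine, the following Python (with pandas) evaluates the most recent day of a daily metric. The DataFrame layout and the 'metric' and 'volume' column names are assumptions; adapt them to your own tables.

```python
import pandas as pd

def should_fire(daily: pd.DataFrame,
                drop_threshold: float = 0.20,
                min_volume: int = 500,
                consecutive: int = 2) -> bool:
    """Evaluate the most recent day of `daily`, assumed to have a
    DatetimeIndex and placeholder columns 'metric' and 'volume'."""
    # Baseline: median of the previous 7 days (shift excludes the current day).
    baseline = daily["metric"].shift(1).rolling(7).median()

    # Condition: relative drop versus the baseline.
    relative_drop = (baseline - daily["metric"]) / baseline
    breach = relative_drop >= drop_threshold

    # Gate 1: ignore days with too little volume; tiny denominators are noisy.
    breach &= daily["volume"] >= min_volume

    # Gate 2: require N breaches in a row to filter single-point noise.
    recent = breach.tail(consecutive)
    return bool(len(recent) == consecutive and recent.all())
```

Scheduling the check after the pipeline lands, routing the result, and linking the runbook are handled by whatever orchestration and chat tooling you already use.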
Tips to reduce false positives
- Use medians or trimmed means to limit outlier impact (compared in the sketch after these tips).
- Compare to same weekday for daily metrics with strong seasonality.
- Require a minimum sample size (e.g., at least 500 sessions).
- Use two-sided bands when spikes are as important as drops.
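A quick illustration of the first tip, with made-up numbers: one anomalous day drags a plain mean far from typical values, while a trimmed mean or the median barely moves.

```python
import numpy as np

# 14 days of a metric with one anomalous spike (values are made up).
history = np.array([1200, 1180, 1230, 1150, 9000, 1210, 1190,
                    1175, 1220, 1205, 1185, 1198, 1215, 1192])

trimmed = np.sort(history)[1:-1]   # drop the single lowest and highest day

print(history.mean())      # about 1753.6: pulled up by the one spike
print(trimmed.mean())      # 1200.0: trimmed mean stays near typical values
print(np.median(history))  # 1199.0: median ignores the spike entirely
```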
Worked examples
Example 1: Signup conversion rate drop
- Metric: Signup conversion rate = signups / product sessions (daily).
- Baseline: Median of the last 7 occurrences of the same weekday.
- Condition: Today's rate is at least 20% lower than baseline.
- Gates: At least 5,000 sessions today; 2 consecutive days below threshold.
- Routing: Product Growth channel and on-call analyst during business hours.
- Runbook: Check tracking events version, release notes, and funnel step drop.
Why this works
Relative change plus weekday baseline catches true drops while ignoring typical Monday vs Sunday differences.
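Example 1 could look roughly like this in Python with pandas. The 'signups' and 'sessions' columns and the frame itself are assumptions; the logic mirrors the bullets above.

```python
import pandas as pd

def signup_conversion_alert(daily: pd.DataFrame) -> bool:
    """Example 1 as a sketch. `daily` is assumed to have a DatetimeIndex
    and placeholder columns 'signups' and 'sessions'."""
    rate = daily["signups"] / daily["sessions"]

    # Baseline: median of the last 7 occurrences of the same weekday,
    # excluding today (shift keeps the current day out of its own baseline).
    baseline = rate.groupby(daily.index.dayofweek).transform(
        lambda s: s.shift(1).rolling(7, min_periods=4).median()
    )

    # Condition: today is at least 20% below baseline.
    breach = rate <= baseline * 0.80

    # Gates: at least 5,000 sessions and 2 consecutive breaching days.
    breach &= daily["sessions"] >= 5000
    return bool(len(breach) >= 2 and breach.tail(2).all())
```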
Example 2: DAU anomaly with weekday seasonality
- Metric: DAU by UTC day.
- Baseline: Average of the last 4 occurrences of the same weekday.
- Condition: DAU is at least 12% below baseline.
- Gates: At least 80% of expected events ingested by noon; de-duplicate to 1 alert per day.
- Routing: Platform channel; escalate to SRE on persistent breach.
Extra guardrail
Add a data freshness alert on the raw events table to distinguish real DAU drops from ingestion delays.
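The freshness guardrail works well as its own small rule. A sketch, assuming the raw events are available as a DataFrame with a placeholder 'ingested_at' column of UTC timestamps:

```python
import pandas as pd

def freshness_alert(events: pd.DataFrame,
                    expected_rows: int,
                    max_lag: pd.Timedelta = pd.Timedelta(hours=2)) -> bool:
    """Fire when the raw events table looks stale or under-filled.

    Assumes a placeholder 'ingested_at' column of UTC-aware timestamps;
    `expected_rows` would be, e.g., 80% of a typical day's volume by noon.
    """
    lag = pd.Timestamp.now(tz="UTC") - events["ingested_at"].max()
    too_stale = lag > max_lag
    too_few = len(events) < expected_rows
    return bool(too_stale or too_few)
```

Checking this rule before reading the DAU alert tells you whether a drop is real or just late data.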
Example 3: Revenue spike or drop (two-sided)
- Metric: Gross revenue (hourly).
- Baseline: Rolling 14-day hourly median for the same hour of day.
- Condition: Outside ±25% band vs baseline for 2 consecutive hours.
- Segments: Alert separately for top 3 countries by revenue share.
- Routing: Sales ops and finance; high-severity if impact exceeds a set dollar threshold.
Why two-sided
Spikes can indicate promo wins or fraud. Both deserve fast attention.
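Here is a sketch of the two-sided check for a single country segment, assuming an hourly revenue Series with a DatetimeIndex; the per-country loop and the dollar-impact severity rule are left out.

```python
import pandas as pd

def revenue_band_alert(hourly: pd.Series, band: float = 0.25) -> bool:
    """Two-sided check on an hourly revenue Series with a DatetimeIndex."""
    # Baseline: rolling 14-day median for the same hour of day, so 3 pm is
    # always compared against previous 3 pm values.
    baseline = hourly.groupby(hourly.index.hour).transform(
        lambda s: s.shift(1).rolling(14, min_periods=7).median()
    )

    # Condition: outside the ±band in either direction.
    deviation = (hourly - baseline).abs() / baseline
    outside = deviation > band

    # Gate: 2 consecutive breaching hours before firing.
    return bool(len(outside) >= 2 and outside.tail(2).all())
```

Running one instance per top market keeps a problem in a single country from being averaged away in the global number.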
Quality checks and self-check
- Does the alert ignore expected seasonality and focus on unusual change?
- Is there a clear owner and a documented runbook?
- Can you reproduce the metric manually and match the alert logic?
- After two weeks, what is the false positive rate? Target less than 10%.
Exercises
Complete the exercise below. Then use the checklist to validate your setup.
- Create an alert for signup conversion rate with a weekday baseline and relative-change condition.
- Add volume gating and consecutive-breach logic.
- Write a 5-line runbook for when it fires.
Exercise checklist
- Baseline uses the same weekday or a seasonality-aware method.
- Condition is relative (percentage change) with a practical threshold.
- Gates: minimum events and consecutive breaches included.
- Routing is to a specific channel and role, not a large group.
- Runbook lists first checks and an escalation contact.
Common mistakes
- Using absolute thresholds for seasonal metrics. Fix: compare to a seasonality-aware baseline.
- Alerting on ratios with tiny denominators. Fix: add minimum volume gates.
- Firing on single-point noise. Fix: require 2 consecutive breaches.
- Ignoring data latency. Fix: align schedule with ingestion and add freshness alerts.
- Sending alerts to everyone. Fix: route to a small, accountable group.
Mini challenge
Your mobile funnel step-2 completion rate is naturally lower on weekends. Design an alert that avoids weekend false alarms but still catches true drops. Write your baseline, condition, gates, and who gets notified.
Hint
Use a same-weekday or same-day-of-week hourly baseline and a relative threshold with minimum session counts.
Practical projects
- Product health pack: Build a dashboard with 5 alerts (DAU, conversion, revenue, error rate, data freshness) and a shared runbook.
- Release guardrails: Create temporary high-sensitivity alerts for 2 weeks after major releases; auto-disable after the window.
- Segmented alerts: Clone a revenue alert per top market and compare false-positive rates vs a global alert.
Who this is for
- Product Analysts who monitor product health and feature impacts.
- Data and BI Analysts responsible for metric quality and reporting.
Prerequisites
- Comfort with core metrics, ratios, and time-series basics.
- Ability to build reliable queries or dashboard tiles for your BI tool.
- Familiarity with seasonality and basic statistics (median, percent change).
Learning path
- Before this: Metric definitions and governance; building dashboards.
- Alongside: Data freshness monitoring; incident response basics.
- After this: Anomaly detection tuning; automated post-incident reviews.
Next steps
- Run a two-week pilot of your alerts and track false positives and misses.
- Refine thresholds and gates based on observed behavior.
- Share the runbook and do a 10-minute team drill.
Quick Test
Take the quick test to check your understanding. Everyone can take it; log in to save your progress.