Cohort Analysis for Product Analysts
Cohort analysis groups users by a shared starting event (such as signup, first purchase, or first feature use) and tracks their behavior over time. For a Product Analyst, it is the backbone of retention, monetization, and feature adoption insights. It helps you answer: Are newer users sticking around? Which acquisition channels bring higher LTV? Did a feature release improve activation for recent cohorts?
Who this is for
- Product Analysts and Data Analysts working with event data and product KPIs
- Growth/PM professionals who need evidence for decisions on activation, retention, and monetization
- Engineers or marketers collaborating on data-driven product changes
Prerequisites
- Basic SQL (SELECT, WHERE, GROUP BY, JOIN)
- Comfort with time functions (DATE_TRUNC, DATE_DIFF) and window functions (LAG, MIN)
- Familiarity with your product's key events: signup, activation, retention activity, purchase
Why it matters
- Retention: See if product changes improve how long users stay active
- Feature adoption: Track which cohorts adopt features and how quickly
- LTV and payback: Understand revenue accumulation by cohort and acquisition channel
- Quality of growth: Separate growth volume from user quality by channel and campaign
Learning path
- Define cohorts by key event: signup, first activation, first purchase, or first feature use
- Build retention tables: day/week/month retention and survival curves
- Rolling retention: compute and interpret rolling vs classic retention
- Adoption & monetization: feature adoption cohorts, revenue per cohort, LTV curves
- Channel cohorts: acquisition quality and payback by channel
- Interpret shifts: isolate product changes vs audience mix or seasonality
- Communicate insights: crisp narrative, caveats, and next steps
Milestone checklist
- Identify and document your key cohort-defining event
- Create a cohort_id such as signup_month for each user
- Build a retention pivot (cohort rows x age-in-days columns)
- Compute week 1, week 4, and month 3 retention
- Build rolling retention for comparison
- Chart LTV per cohort and note payback time per channel
- Write a 5-sentence insight summary with an action recommendation
Worked examples
Example 1: Signup cohort retention (SQL)
Tables: users(user_id, signup_at, channel), events(user_id, event_name, event_time).
-- 1) Derive each user's signup cohort (month)
WITH u AS (
SELECT
user_id,
DATE_TRUNC('month', signup_at) AS cohort_month,
signup_at
FROM users
),
-- 2) Daily activity after signup
activity AS (
SELECT
u.cohort_month,
e.user_id,
DATE_DIFF('day', u.signup_at, e.event_time) AS day_age
FROM u
JOIN events e ON e.user_id = u.user_id
WHERE e.event_time >= u.signup_at
AND e.event_name = 'active'
),
-- 3) Cohort size (denominator)
cohort_size AS (
SELECT cohort_month, COUNT(DISTINCT user_id) AS users
FROM u
GROUP BY 1
),
-- 4) Retention per day_age
retention AS (
SELECT cohort_month, day_age, COUNT(DISTINCT user_id) AS retained
FROM activity
WHERE day_age BETWEEN 0 AND 30
GROUP BY 1,2
)
SELECT r.cohort_month, r.day_age,
ROUND(100.0 * r.retained / cs.users, 2) AS retention_pct
FROM retention r
JOIN cohort_size cs USING (cohort_month)
ORDER BY 1,2;
Tip: day_age 0 is D0 retention (users active on signup day). Use consistent timezones for signup_at and event_time.
Example 2: Rolling retention (SQL)
Rolling retention at day N means the user was active on or after day N (not exactly on day N). This often appears higher than classic retention.
WITH u AS (
SELECT user_id, signup_at, DATE_TRUNC('month', signup_at) AS cohort_month
FROM users
),
last_seen AS (
SELECT u.user_id, u.cohort_month,
MAX(DATE_DIFF('day', u.signup_at, e.event_time)) AS max_day_age
FROM u
LEFT JOIN events e ON e.user_id = u.user_id AND e.event_name='active'
GROUP BY 1,2
),
cohort_size AS (
SELECT cohort_month, COUNT(*) AS users FROM u GROUP BY 1
)
SELECT l.cohort_month, n AS day_age,
ROUND(100.0 * COUNT_IF(l.max_day_age >= n) / cs.users, 2) AS rolling_retention_pct
FROM last_seen l
JOIN cohort_size cs USING (cohort_month)
CROSS JOIN UNNEST(GENERATE_ARRAY(0, 30)) AS n
GROUP BY 1,2
ORDER BY 1,2;
Example 3: Feature adoption cohorts
Define a cohort by users' first use of a specific feature and measure subsequent retention or usage intensity.
WITH first_feature AS (
SELECT user_id, MIN(event_time) AS first_use_at
FROM events
WHERE event_name = 'use_ai_editor'
GROUP BY 1
),
cohorts AS (
SELECT user_id, DATE_TRUNC('week', first_use_at) AS cohort_week, first_use_at
FROM first_feature
),
next_activity AS (
SELECT c.cohort_week,
DATE_DIFF('day', c.first_use_at, e.event_time) AS day_age,
e.user_id
FROM cohorts c
JOIN events e ON e.user_id = c.user_id
WHERE e.event_time >= c.first_use_at AND e.event_name='active'
)
SELECT cohort_week, day_age, COUNT(DISTINCT user_id) AS users_active
FROM next_activity
WHERE day_age BETWEEN 0 AND 14
GROUP BY 1,2
ORDER BY 1,2;
Adoption rate baseline: distinct users who ever triggered the feature / eligible user base (during period).
Example 4: Monetization cohorts and LTV
Compute cumulative revenue per cohort and LTV curve. Tables: payments(user_id, amount, paid_at).
WITH u AS (
SELECT user_id, DATE_TRUNC('month', signup_at) AS cohort_month, signup_at
FROM users
),
rev AS (
SELECT u.cohort_month,
DATE_DIFF('day', u.signup_at, p.paid_at) AS day_age,
p.amount
FROM payments p
JOIN u ON u.user_id = p.user_id
WHERE p.paid_at >= u.signup_at
),
cohort_size AS (
SELECT cohort_month, COUNT(DISTINCT user_id) AS users FROM u GROUP BY 1
),
ltv AS (
SELECT cohort_month, day_age, SUM(amount) AS revenue
FROM rev
WHERE day_age BETWEEN 0 AND 180
GROUP BY 1,2
)
SELECT l.cohort_month, l.day_age,
ROUND(SUM(l.revenue) OVER (PARTITION BY l.cohort_month ORDER BY l.day_age
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) / cs.users, 2) AS ltv_per_user
FROM ltv l
JOIN cohort_size cs USING (cohort_month)
ORDER BY 1,2;
Interpretation: payback occurs when cumulative gross margin per user exceeds acquisition cost per user. If you only have revenue, consider an estimated margin rate for approximation.
Example 5: Channel-based cohorts
Compare retention and LTV for cohorts split by acquisition channel.
WITH base AS (
SELECT user_id,
DATE_TRUNC('month', signup_at) AS cohort_month,
channel,
signup_at
FROM users
),
active AS (
SELECT b.cohort_month, b.channel,
DATE_DIFF('day', b.signup_at, e.event_time) AS day_age,
e.user_id
FROM base b
JOIN events e ON e.user_id = b.user_id AND e.event_name='active'
WHERE e.event_time >= b.signup_at
),
cohort_size AS (
SELECT cohort_month, channel, COUNT(DISTINCT user_id) AS users
FROM base
GROUP BY 1,2
),
ret AS (
SELECT cohort_month, channel, day_age, COUNT(DISTINCT user_id) AS retained
FROM active
WHERE day_age IN (0,7,28)
GROUP BY 1,2,3
)
SELECT r.cohort_month, r.channel, r.day_age,
ROUND(100.0 * r.retained / cs.users, 2) AS retention_pct
FROM ret r
JOIN cohort_size cs USING (cohort_month, channel)
ORDER BY 1,2,3;
Readout: channels with higher D28 retention or higher 90-day LTV merit budget scaling; validate attribution consistency before decisions.
Example 6: Interpreting cohort shifts
Scenario: D7 retention improved for the last two signup cohorts, but LTV is flat. Hypotheses:
- Activation UX improved (higher early activity), but monetization step unchanged
- Acquisition mix shifted to a less monetizing audience
- Pricing test increased trial starts (retention proxy) but decreased conversion
How to validate: slice retention and conversion by channel and device; compare funnel step-through for old vs new cohorts; check seasonality and promo calendars using previous-year cohorts.
Build a retention chart: step-by-step
- Pick a key event (e.g., signup) and time grain (day or week).
- Create cohort_id by truncating the key event timestamp to your reporting grain (week or month).
- Compute age as event_time minus key event time.
- Count distinct active users by cohort_id and age.
- Divide by cohort size (distinct users in that cohort).
- Visualize as a heatmap or line per cohort.
Quick data checks before plotting
- Timezones consistent for signup_at and event_time
- No duplicate users in cohort size
- Filter future ages beyond current max age to avoid downward bias
- Stable denominator: exclude test users and internal traffic
Drills and exercises
- Write a query that produces D0, D1, D7, D28 retention for monthly signup cohorts
- Convert classic retention to rolling retention for the same dataset and compare the curves
- Build a feature adoption cohort for a chosen event and compute 7-day adoption rate
- Create a 90-day LTV curve by signup cohort and identify payback day using an assumed margin rate
- Separate cohorts by channel and device; note which combination underperforms by D28
- Draft a 5-line narrative explaining a cohort shift and one recommended experiment
Common mistakes and debugging tips
Mixing timezones or calendar grains
Always align to a single timezone. If product logs UTC but dashboards show local time, cohort edges will misalign. Use explicit conversions and document the standard.
Leaking future users into earlier cohorts
Derive cohort_id only from the key event timestamp for each user. Do not use first activity after signup to set cohort boundaries.
Denominator drift
Retention denominators should be the original cohort population. Do not subtract churned users or add late signups. Exclude known test/internal users up front.
Age censoring
For the most recent cohorts, later ages are incomplete. Either hide ages beyond available exposure or annotate that values are censored.
Attribution inconsistency in channel cohorts
Confirm that channel is measured at the same moment (e.g., signup attribution) across cohorts. Shifts in attribution models can masquerade as performance changes.
Mini project: Activation-to-Revenue cohort dashboard
Goal: Build a single view that tracks signup cohorts from activation through revenue and highlights channel differences.
- Define signup cohort_month and cohort_size
- Pick an activation event (e.g., completed_onboarding) and compute activation rate per cohort
- Compute D7 and D28 classic retention from 'active' events
- Compute 90-day LTV per cohort; estimate payback using assumed margin and CPA
- Slice all metrics by acquisition channel
- Write a short narrative: biggest positive/negative cohort shifts and your next experiment
Deliverables checklist
- Cohort table with columns: cohort_month, size, activation_rate, D7_ret, D28_ret, LTV_90d
- One line chart for retention and one for LTV by cohort
- Bullet summary with one action per insight
Communicating cohort insights
Use this simple structure:
- Headline: what changed and by how much (e.g., D28 retention up +3.1 pp for Feb cohort)
- Drivers: which segments or channels drove the change
- Confidence: known caveats (censoring, attribution shifts)
- Action: one recommended experiment or rollout decision
- Follow-up metric: what you will monitor next
Subskills
- Cohort Definition By Key Event — Define cohorts by signup, activation, purchase, or first feature use; choose grains and IDs wisely. Time: 45–90 min
- Retention Curve Analysis — Build classic retention tables and read decay patterns and curves. Time: 45–90 min
- Rolling Retention — Compute rolling retention and interpret differences vs classic. Time: 40–70 min
- Feature Adoption Cohorts — Measure adoption over time and compare cohorts. Time: 45–90 min
- Monetization Cohorts — Build LTV curves and estimate payback. Time: 60–120 min
- Channel Based Cohorts — Compare cohort quality by acquisition channel. Time: 45–90 min
- Interpreting Cohort Shifts — Separate product effects from audience and seasonality. Time: 45–90 min
- Communicating Cohort Insights — Turn findings into a concise narrative and action plan. Time: 30–60 min
Next steps
- Apply cohorts to your last 6 months of signups; present a one-page readout
- Pair cohort analysis with funnels to find where retention breaks
- Learn segmentation and experimentation to validate cohort hypotheses
- Add survival analysis for advanced retention modeling when needed