How to learn Cohort Analysis for Product Analyst for free

Cohort Analysis for Product Analysts

Cohort analysis groups users by a shared starting event (such as signup, first purchase, or first feature use) and tracks their behavior over time. For a Product Analyst, it is the backbone of retention, monetization, and feature adoption insights. It helps you answer: Are newer users sticking around? Which acquisition channels bring higher LTV? Did a feature release improve activation for recent cohorts?

Who this is for

Product Analysts and Data Analysts working with event data and product KPIs
Growth/PM professionals who need evidence for decisions on activation, retention, and monetization
Engineers or marketers collaborating on data-driven product changes

Prerequisites

Basic SQL (SELECT, WHERE, GROUP BY, JOIN)
Comfort with time functions (DATE_TRUNC, DATE_DIFF) and window functions (LAG, MIN)
Familiarity with your product's key events: signup, activation, retention activity, purchase

Why it matters

Retention: See if product changes improve how long users stay active
Feature adoption: Track which cohorts adopt features and how quickly
LTV and payback: Understand revenue accumulation by cohort and acquisition channel
Quality of growth: Separate growth volume from user quality by channel and campaign

Learning path

Define cohorts by key event: signup, first activation, first purchase, or first feature use
Build retention tables: day/week/month retention and survival curves
Rolling retention: compute and interpret rolling vs classic retention
Adoption & monetization: feature adoption cohorts, revenue per cohort, LTV curves
Channel cohorts: acquisition quality and payback by channel
Interpret shifts: isolate product changes vs audience mix or seasonality
Communicate insights: crisp narrative, caveats, and next steps

Milestone checklist

Identify and document your key cohort-defining event
Create a cohort_id such as signup_month for each user
Build a retention pivot (cohort rows x age-in-days columns)
Compute week 1, week 4, and month 3 retention
Build rolling retention for comparison
Chart LTV per cohort and note payback time per channel
Write a 5-sentence insight summary with an action recommendation

Worked examples

Example 1: Signup cohort retention (SQL)

Tables: users(user_id, signup_at, channel), events(user_id, event_name, event_time).

-- 1) Derive each user's signup cohort (month)
WITH u AS (
  SELECT
    user_id,
    DATE_TRUNC('month', signup_at) AS cohort_month,
    signup_at
  FROM users
),
-- 2) Daily activity after signup
activity AS (
  SELECT
    u.cohort_month,
    e.user_id,
    DATE_DIFF('day', u.signup_at, e.event_time) AS day_age
  FROM u
  JOIN events e ON e.user_id = u.user_id
  WHERE e.event_time >= u.signup_at
    AND e.event_name = 'active'
),
-- 3) Cohort size (denominator)
cohort_size AS (
  SELECT cohort_month, COUNT(DISTINCT user_id) AS users
  FROM u
  GROUP BY 1
),
-- 4) Retention per day_age
retention AS (
  SELECT cohort_month, day_age, COUNT(DISTINCT user_id) AS retained
  FROM activity
  WHERE day_age BETWEEN 0 AND 30
  GROUP BY 1,2
)
SELECT r.cohort_month, r.day_age,
       ROUND(100.0 * r.retained / cs.users, 2) AS retention_pct
FROM retention r
JOIN cohort_size cs USING (cohort_month)
ORDER BY 1,2;

Tip: day_age 0 is D0 retention (users active on signup day). Use consistent timezones for signup_at and event_time.

Example 2: Rolling retention (SQL)

Rolling retention at day N means the user was active on or after day N (not exactly on day N). This often appears higher than classic retention.

WITH u AS (
  SELECT user_id, signup_at, DATE_TRUNC('month', signup_at) AS cohort_month
  FROM users
),
last_seen AS (
  SELECT u.user_id, u.cohort_month,
         MAX(DATE_DIFF('day', u.signup_at, e.event_time)) AS max_day_age
  FROM u
  LEFT JOIN events e ON e.user_id = u.user_id AND e.event_name='active'
  GROUP BY 1,2
),
cohort_size AS (
  SELECT cohort_month, COUNT(*) AS users FROM u GROUP BY 1
)
SELECT l.cohort_month, n AS day_age,
       ROUND(100.0 * COUNT_IF(l.max_day_age >= n) / cs.users, 2) AS rolling_retention_pct
FROM last_seen l
JOIN cohort_size cs USING (cohort_month)
CROSS JOIN UNNEST(GENERATE_ARRAY(0, 30)) AS n
GROUP BY 1,2
ORDER BY 1,2;

Example 3: Feature adoption cohorts

Define a cohort by users' first use of a specific feature and measure subsequent retention or usage intensity.

WITH first_feature AS (
  SELECT user_id, MIN(event_time) AS first_use_at
  FROM events
  WHERE event_name = 'use_ai_editor'
  GROUP BY 1
),
cohorts AS (
  SELECT user_id, DATE_TRUNC('week', first_use_at) AS cohort_week, first_use_at
  FROM first_feature
),
next_activity AS (
  SELECT c.cohort_week,
         DATE_DIFF('day', c.first_use_at, e.event_time) AS day_age,
         e.user_id
  FROM cohorts c
  JOIN events e ON e.user_id = c.user_id
  WHERE e.event_time >= c.first_use_at AND e.event_name='active'
)
SELECT cohort_week, day_age, COUNT(DISTINCT user_id) AS users_active
FROM next_activity
WHERE day_age BETWEEN 0 AND 14
GROUP BY 1,2
ORDER BY 1,2;

Adoption rate baseline: distinct users who ever triggered the feature / eligible user base (during period).

Example 4: Monetization cohorts and LTV

Compute cumulative revenue per cohort and LTV curve. Tables: payments(user_id, amount, paid_at).

WITH u AS (
  SELECT user_id, DATE_TRUNC('month', signup_at) AS cohort_month, signup_at
  FROM users
),
rev AS (
  SELECT u.cohort_month,
         DATE_DIFF('day', u.signup_at, p.paid_at) AS day_age,
         p.amount
  FROM payments p
  JOIN u ON u.user_id = p.user_id
  WHERE p.paid_at >= u.signup_at
),
cohort_size AS (
  SELECT cohort_month, COUNT(DISTINCT user_id) AS users FROM u GROUP BY 1
),
ltv AS (
  SELECT cohort_month, day_age, SUM(amount) AS revenue
  FROM rev
  WHERE day_age BETWEEN 0 AND 180
  GROUP BY 1,2
)
SELECT l.cohort_month, l.day_age,
       ROUND(SUM(l.revenue) OVER (PARTITION BY l.cohort_month ORDER BY l.day_age
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) / cs.users, 2) AS ltv_per_user
FROM ltv l
JOIN cohort_size cs USING (cohort_month)
ORDER BY 1,2;

Interpretation: payback occurs when cumulative gross margin per user exceeds acquisition cost per user. If you only have revenue, consider an estimated margin rate for approximation.

Example 5: Channel-based cohorts

Compare retention and LTV for cohorts split by acquisition channel.

WITH base AS (
  SELECT user_id,
         DATE_TRUNC('month', signup_at) AS cohort_month,
         channel,
         signup_at
  FROM users
),
active AS (
  SELECT b.cohort_month, b.channel,
         DATE_DIFF('day', b.signup_at, e.event_time) AS day_age,
         e.user_id
  FROM base b
  JOIN events e ON e.user_id = b.user_id AND e.event_name='active'
  WHERE e.event_time >= b.signup_at
),
cohort_size AS (
  SELECT cohort_month, channel, COUNT(DISTINCT user_id) AS users
  FROM base
  GROUP BY 1,2
),
ret AS (
  SELECT cohort_month, channel, day_age, COUNT(DISTINCT user_id) AS retained
  FROM active
  WHERE day_age IN (0,7,28)
  GROUP BY 1,2,3
)
SELECT r.cohort_month, r.channel, r.day_age,
       ROUND(100.0 * r.retained / cs.users, 2) AS retention_pct
FROM ret r
JOIN cohort_size cs USING (cohort_month, channel)
ORDER BY 1,2,3;

Readout: channels with higher D28 retention or higher 90-day LTV merit budget scaling; validate attribution consistency before decisions.

Example 6: Interpreting cohort shifts

Scenario: D7 retention improved for the last two signup cohorts, but LTV is flat. Hypotheses:

Activation UX improved (higher early activity), but monetization step unchanged
Acquisition mix shifted to a less monetizing audience
Pricing test increased trial starts (retention proxy) but decreased conversion

How to validate: slice retention and conversion by channel and device; compare funnel step-through for old vs new cohorts; check seasonality and promo calendars using previous-year cohorts.

Build a retention chart: step-by-step

Pick a key event (e.g., signup) and time grain (day or week).
Create cohort_id by truncating the key event timestamp to your reporting grain (week or month).
Compute age as event_time minus key event time.
Count distinct active users by cohort_id and age.
Divide by cohort size (distinct users in that cohort).
Visualize as a heatmap or line per cohort.

Quick data checks before plotting

Timezones consistent for signup_at and event_time
No duplicate users in cohort size
Filter future ages beyond current max age to avoid downward bias
Stable denominator: exclude test users and internal traffic

Drills and exercises

Write a query that produces D0, D1, D7, D28 retention for monthly signup cohorts
Convert classic retention to rolling retention for the same dataset and compare the curves
Build a feature adoption cohort for a chosen event and compute 7-day adoption rate
Create a 90-day LTV curve by signup cohort and identify payback day using an assumed margin rate
Separate cohorts by channel and device; note which combination underperforms by D28
Draft a 5-line narrative explaining a cohort shift and one recommended experiment

Common mistakes and debugging tips

Mixing timezones or calendar grains

Always align to a single timezone. If product logs UTC but dashboards show local time, cohort edges will misalign. Use explicit conversions and document the standard.

Leaking future users into earlier cohorts

Derive cohort_id only from the key event timestamp for each user. Do not use first activity after signup to set cohort boundaries.

Denominator drift

Retention denominators should be the original cohort population. Do not subtract churned users or add late signups. Exclude known test/internal users up front.

Age censoring

For the most recent cohorts, later ages are incomplete. Either hide ages beyond available exposure or annotate that values are censored.

Attribution inconsistency in channel cohorts

Confirm that channel is measured at the same moment (e.g., signup attribution) across cohorts. Shifts in attribution models can masquerade as performance changes.

Mini project: Activation-to-Revenue cohort dashboard

Goal: Build a single view that tracks signup cohorts from activation through revenue and highlights channel differences.

Define signup cohort_month and cohort_size
Pick an activation event (e.g., completed_onboarding) and compute activation rate per cohort
Compute D7 and D28 classic retention from 'active' events
Compute 90-day LTV per cohort; estimate payback using assumed margin and CPA
Slice all metrics by acquisition channel
Write a short narrative: biggest positive/negative cohort shifts and your next experiment

Deliverables checklist

Cohort table with columns: cohort_month, size, activation_rate, D7_ret, D28_ret, LTV_90d
One line chart for retention and one for LTV by cohort
Bullet summary with one action per insight

Communicating cohort insights

Use this simple structure:

Headline: what changed and by how much (e.g., D28 retention up +3.1 pp for Feb cohort)
Drivers: which segments or channels drove the change
Confidence: known caveats (censoring, attribution shifts)
Action: one recommended experiment or rollout decision
Follow-up metric: what you will monitor next

Subskills

Cohort Definition By Key Event — Define cohorts by signup, activation, purchase, or first feature use; choose grains and IDs wisely. Time: 45–90 min
Retention Curve Analysis — Build classic retention tables and read decay patterns and curves. Time: 45–90 min
Rolling Retention — Compute rolling retention and interpret differences vs classic. Time: 40–70 min
Feature Adoption Cohorts — Measure adoption over time and compare cohorts. Time: 45–90 min
Monetization Cohorts — Build LTV curves and estimate payback. Time: 60–120 min
Channel Based Cohorts — Compare cohort quality by acquisition channel. Time: 45–90 min
Interpreting Cohort Shifts — Separate product effects from audience and seasonality. Time: 45–90 min
Communicating Cohort Insights — Turn findings into a concise narrative and action plan. Time: 30–60 min

Next steps

Apply cohorts to your last 6 months of signups; present a one-page readout
Pair cohort analysis with funnels to find where retention breaks
Learn segmentation and experimentation to validate cohort hypotheses
Add survival analysis for advanced retention modeling when needed

Menu

Cohort Analysis

Table of Contents

Cohort Analysis for Product Analysts

Who this is for

Prerequisites

Why it matters

Learning path

Worked examples

Example 1: Signup cohort retention (SQL)

Example 2: Rolling retention (SQL)

Example 3: Feature adoption cohorts

Example 4: Monetization cohorts and LTV

Example 5: Channel-based cohorts

Example 6: Interpreting cohort shifts

Build a retention chart: step-by-step

Drills and exercises

Common mistakes and debugging tips

Mini project: Activation-to-Revenue cohort dashboard

Communicating cohort insights

Subskills

Next steps

Cohort Analysis — Skill Exam

Topics

Monetization Cohorts

Cohort Definition By Key Event

Retention Curve Analysis

Rolling Retention

Feature Adoption Cohorts

Channel Based Cohorts

Interpreting Cohort Shifts

Communicating Cohort Insights

Have questions about Cohort Analysis?

AI Assistant