Why this matters
As a Product Analyst, you will routinely answer questions like: Do users who sign up in June come back a week later? How does week-4 retention look after a new onboarding flow? Which cohorts are churning fastest? Retention and cohort queries let you quantify customer stickiness, compare launches, and prioritize product bets.
- Monitor Day-1 and Day-7 retention after a feature release.
- Compare monthly cohorts to detect seasonality or channel quality.
- Spot churn early and flag segments for lifecycle campaigns.
Concept explained simply
A cohort is a group of users who share a starting event and time bucket (for example, users who signed up in the same week). Retention measures how many of those users return (perform any qualifying activity) at a later time bucket.
Mental model
- Pick the start: usually user signup (or first purchase).
- Bucket into cohorts: by day/week/month (using DATE_TRUNC).
- Define a return window: exact day (D7), week window (W4), or rolling window (any activity in days 1–30).
- Count unique users who return in that window and divide by the cohort size.
Data model used in examples
We assume two tables (rename columns to your schema as needed):
users(user_id, signup_at TIMESTAMP)
events(user_id, event_at TIMESTAMP, event_name TEXT)
"Activity" usually means any meaningful product event (session, open_app, view_item, purchase). Adjust filters to match your product.
Worked examples
Example 1 — Monthly cohorts, Day-7 retention
Question: For each signup month, what percent of users return exactly 7 days after signup?
WITH base AS (
SELECT
u.user_id,
(DATE_TRUNC('month', u.signup_at))::date AS cohort_month,
u.signup_at::date AS signup_date
FROM users u
),
acts AS (
SELECT e.user_id, e.event_at::date AS activity_date
FROM events e
-- optionally filter: WHERE e.event_name IN ('open_app','session','purchase')
)
SELECT
b.cohort_month,
COUNT(DISTINCT b.user_id) AS cohort_size,
COUNT(DISTINCT CASE WHEN (a.activity_date - b.signup_date) = 7 THEN b.user_id END) AS retained_d7,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN (a.activity_date - b.signup_date) = 7 THEN b.user_id END)
/ NULLIF(COUNT(DISTINCT b.user_id), 0), 1) AS d7_retention_pct
FROM base b
LEFT JOIN acts a ON a.user_id = b.user_id
GROUP BY 1
ORDER BY 1;
Notes: In some SQL dialects use DATEDIFF(day, b.signup_date, a.activity_date) = 7.
Example 2 — Weekly cohorts, Week-4 retention (days 21–27)
Question: What share of each weekly cohort is active in week 4 after signup (days 21–27 inclusive)?
WITH base AS (
SELECT
u.user_id,
DATE_TRUNC('week', u.signup_at)::date AS cohort_week,
u.signup_at::date AS signup_date
FROM users u
),
acts AS (
SELECT e.user_id, e.event_at::date AS activity_date
FROM events e
)
SELECT
b.cohort_week,
COUNT(DISTINCT b.user_id) AS cohort_size,
COUNT(DISTINCT CASE
WHEN (a.activity_date - b.signup_date) BETWEEN 21 AND 27 THEN b.user_id END
) AS retained_w4,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN (a.activity_date - b.signup_date) BETWEEN 21 AND 27 THEN b.user_id END)
/ NULLIF(COUNT(DISTINCT b.user_id), 0), 1) AS w4_retention_pct
FROM base b
LEFT JOIN acts a ON a.user_id = b.user_id
GROUP BY 1
ORDER BY 1;
Example 3 — Rolling 30-day retention (any activity days 1–30)
Question: For each signup month, what percent of users return at least once in the first 30 days?
WITH base AS (
SELECT
u.user_id,
DATE_TRUNC('month', u.signup_at)::date AS cohort_month,
u.signup_at::date AS signup_date
FROM users u
),
acts AS (
SELECT e.user_id, e.event_at::date AS activity_date
FROM events e
)
SELECT
b.cohort_month,
COUNT(DISTINCT b.user_id) AS cohort_size,
COUNT(DISTINCT CASE
WHEN (a.activity_date - b.signup_date) BETWEEN 1 AND 30 THEN b.user_id END
) AS retained_30d,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN (a.activity_date - b.signup_date) BETWEEN 1 AND 30 THEN b.user_id END)
/ NULLIF(COUNT(DISTINCT b.user_id), 0), 1) AS d30_rolling_retention_pct
FROM base b
LEFT JOIN acts a ON a.user_id = b.user_id
GROUP BY 1
ORDER BY 1;
Query patterns and templates
- Cohort key: DATE_TRUNC('week'|'month', signup_at)
- Day index: activity_date::date - signup_date::date (use DATEDIFF in some dialects)
- Avoid double counting: COUNT(DISTINCT user_id) in both numerator and denominator
- Filter meaningful events only to reduce noise
Template — daily grid (cohort x day)
WITH base AS (
SELECT u.user_id,
DATE_TRUNC('month', u.signup_at)::date AS cohort_month,
u.signup_at::date AS signup_date
FROM users u
), acts AS (
SELECT e.user_id, e.event_at::date AS activity_date
FROM events e
), joined AS (
SELECT b.cohort_month, b.user_id,
(a.activity_date - b.signup_date) AS day_n
FROM base b
JOIN acts a ON a.user_id = b.user_id
WHERE a.activity_date >= b.signup_date
)
SELECT cohort_month,
day_n,
COUNT(DISTINCT user_id) AS active_users
FROM joined
WHERE day_n BETWEEN 0 AND 30
GROUP BY 1,2
ORDER BY 1,2;
Use this to build heatmaps (retention curves) in your BI tool.
Exercises
Try these in your environment. The quick test is at the end of the page. Test is available to everyone; only logged-in users get saved progress.
Exercise 1 — Day-7 retention by signup month
Compute D7 retention for each signup month using users and events. Treat any event as activity. Output: cohort_month, cohort_size, retained_d7, d7_retention_pct.
Show solution
WITH base AS (
SELECT u.user_id,
DATE_TRUNC('month', u.signup_at)::date AS cohort_month,
u.signup_at::date AS signup_date
FROM users u
), acts AS (
SELECT e.user_id, e.event_at::date AS activity_date
FROM events e
)
SELECT b.cohort_month,
COUNT(DISTINCT b.user_id) AS cohort_size,
COUNT(DISTINCT CASE WHEN (a.activity_date - b.signup_date) = 7 THEN b.user_id END) AS retained_d7,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN (a.activity_date - b.signup_date) = 7 THEN b.user_id END)
/ NULLIF(COUNT(DISTINCT b.user_id), 0), 1) AS d7_retention_pct
FROM base b
LEFT JOIN acts a ON a.user_id = b.user_id
GROUP BY 1
ORDER BY 1;
Expected shape (example):
cohort_month | cohort_size | retained_d7 | d7_retention_pct
2024-06-01 | 1200 | 420 | 35.0
2024-07-01 | 950 | 320 | 33.7
2024-08-01 | 1100 | 385 | 35.0
- Checklist: used DATE_TRUNC for cohorts
- Used COUNT(DISTINCT) for users
- Compared day index to 7 exactly
- Handled divide-by-zero with NULLIF
Exercise 2 — Rolling 14-day retention
For each signup week, compute the percent of users who return at least once in the first 14 days (days 1–14). Output: cohort_week, cohort_size, retained_14d, d14_rolling_retention_pct.
Show solution
WITH base AS (
SELECT u.user_id,
DATE_TRUNC('week', u.signup_at)::date AS cohort_week,
u.signup_at::date AS signup_date
FROM users u
), acts AS (
SELECT e.user_id, e.event_at::date AS activity_date
FROM events e
)
SELECT b.cohort_week,
COUNT(DISTINCT b.user_id) AS cohort_size,
COUNT(DISTINCT CASE WHEN (a.activity_date - b.signup_date) BETWEEN 1 AND 14 THEN b.user_id END) AS retained_14d,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN (a.activity_date - b.signup_date) BETWEEN 1 AND 14 THEN b.user_id END)
/ NULLIF(COUNT(DISTINCT b.user_id), 0), 1) AS d14_rolling_retention_pct
FROM base b
LEFT JOIN acts a ON a.user_id = b.user_id
GROUP BY 1
ORDER BY 1;
- Checklist: cohort by week
- Rolling window 1–14 days
- Distinct users in numerator and denominator
Common mistakes
- Counting events, not users. Fix: COUNT(DISTINCT user_id) for retention.
- Using the wrong window. Be explicit: exact day (== 7) vs rolling window (BETWEEN 1 AND 7).
- Timezone drift. Align timestamps to a single timezone and cast to ::date after.
- Leaky cohorts. Ensure each user has one cohort (based on first signup). If multiple signups exist, use MIN(signup_at) per user.
- Including pre-signup events. Filter activity_date >= signup_date.
Self-check tips
- Spot-check a few users manually to verify day indices.
- For fresh cohorts, retention on late days should be near zero (insufficient time). If not, windows are wrong.
- Cohort size should equal DISTINCT users starting in that period, regardless of events.
Practical projects
- Build a monthly cohort heatmap (0–60 days) and annotate feature launch dates.
- Split D7 retention by acquisition channel (add channel to users) and compare.
- Create a dashboard: D1, D7, D30 rolling retention for total and by platform (web/mobile).
Who this is for
- Product Analysts who need to quantify user stickiness and compare cohorts.
- Data/BI Analysts supporting growth, lifecycle, or product teams.
Prerequisites
- Comfort with SQL SELECT, GROUP BY, JOIN.
- Familiarity with date functions (DATE_TRUNC, casting to date, DATEDIFF equivalents).
- Basic understanding of user-event schemas.
Learning path
- Before this: Event schema basics, window and date functions.
- This lesson: Cohorts, fixed-day and rolling retention.
- Next: Segment-level retention (channel, geo), survival curves, and funnel-to-retention analyses.
Next steps
- Parameterize your retention window (1, 7, 30) for easy comparisons.
- Add segments (platform, plan, country) and watch how cohorts diverge.
- Publish a weekly retention report with a short narrative of changes.
Mini challenge
Find the first cohort where D7 retention drops by more than 5 percentage points versus the previous cohort. List the cohort pair and the top 3 segments contributing to the drop.
Quick Test
Take the quick test below to check your understanding. Anyone can take it; only logged-in users will have progress saved.