Why this matters
Defining cohorts by a key event is the foundation of retention, activation, and growth analysis. Product Analysts use this to answer questions like:
- Are users who complete onboarding retaining better than those who only sign up?
- How do first purchasers from last month behave over the next 12 weeks?
- Which accounts adopted a new feature and stayed active?
Who this is for
- Product Analysts and Data Analysts setting up retention/adoption dashboards
- Growth/Marketing Analysts measuring activation and conversion cohorts
- PMs validating success metrics for new features
Prerequisites
- Basic SQL (SELECT, GROUP BY, WHERE, JOIN)
- Understanding of events data (event_name, event_time, user_id/account_id)
- Comfort with time bucketing (daily/weekly/monthly)
Concept explained simply
A cohort is a group of users or accounts who share a starting moment. In key-event cohorts, that moment is when they first do a specific event (for example: Signup, First Purchase, Invite Sent).
Mental model
- Pick the key event that represents the moment you want to track from.
- Stamp each user/account with the timestamp of their first qualifying occurrence.
- Bucket that timestamp into daily/weekly/monthly cohorts.
- Use that cohort label consistently in downstream analysis.
Why “first” occurrence?
Using the first occurrence anchors the cohort at a single, stable moment, preventing users/accounts from jumping between cohorts over time. If you need cohorts for repeat behaviors, define a separate cohort per cycle with clear rules, but keep each definition stable.
Key decisions when defining cohorts
- Key event: Which event truly marks the start (signup, activation, first purchase, feature adoption)?
- Level: User-level or account-level (B2C vs B2B)?
- Occurrence: First qualifying event only (typical) vs any/most recent (rare).
- Filters: Production-only, success statuses, plan types, traffic sources.
- Time bucketing: Daily, weekly, or monthly (depends on volume and decision cadence).
- Timezone: Align to business timezone to avoid off-by-one-day issues.
- Late data: Decide whether late-arriving events can backfill cohorts.
Good defaults
- First occurrence
- User-level for consumer products, account-level for B2B
- Monthly cohorts for strategic views, weekly for experiments
- Filter to production events; exclude sandbox/test/bots
Worked examples
Example 1: Signup cohort (user-level, monthly)
Goal: Group users by the month they first completed Signup.
-- users' first signup timestamp
WITH first_signup AS (
SELECT
user_id,
MIN(event_time) AS cohort_ts
FROM events
WHERE event_name = 'Signup'
AND COALESCE(properties:env, properties->>'env') NOT IN ('sandbox','test')
GROUP BY user_id
)
SELECT
DATE_TRUNC('month', cohort_ts) AS cohort_month,
COUNT(*) AS users_in_cohort
FROM first_signup
GROUP BY 1
ORDER BY 1;
Notes:
- Use business timezone if available (convert event_time before bucketing).
- Exclude non-production events.
Example 2: First purchase cohort (user-level, weekly)
Goal: Group users by the week they first had a successful paid order.
WITH paid_orders AS (
SELECT user_id, event_time
FROM events
WHERE event_name = 'Purchase'
AND COALESCE(properties:status, properties->>'status') = 'success'
), first_paid AS (
SELECT user_id, MIN(event_time) AS cohort_ts
FROM paid_orders
GROUP BY user_id
)
SELECT
DATE_TRUNC('week', cohort_ts) AS cohort_week,
COUNT(*) AS users_in_cohort
FROM first_paid
GROUP BY 1
ORDER BY 1;
Notes: Ensures refunds or failed attempts don’t create misleading cohorts.
Example 3: Feature adoption cohort (account-level, daily)
Goal: Group B2B accounts by the first time any member used the feature "InviteSent".
WITH feature_events AS (
SELECT account_id, event_time
FROM events
WHERE event_name = 'InviteSent'
AND COALESCE(properties:env, properties->>'env') = 'prod'
AND account_id IS NOT NULL
), first_feature AS (
SELECT account_id, MIN(event_time) AS cohort_ts
FROM feature_events
GROUP BY account_id
)
SELECT
DATE_TRUNC('day', cohort_ts) AS cohort_day,
COUNT(*) AS accounts_in_cohort
FROM first_feature
GROUP BY 1
ORDER BY 1;
Notes: Use account_id to avoid double-counting across users in the same company.
Edge cases and QA ideas
- Users with multiple signups: still only one cohort (first time).
- Accounts merged: re-compute first event for the merged ID to avoid drift.
- QA: Randomly sample 10 IDs and manually verify their first event times.
How to build your cohort step-by-step
- Choose the key event and write down the business definition.
- Add necessary filters (production-only, success states).
- Pick the entity level (user_id or account_id) and the business timezone.
- Compute first occurrence per entity.
- Bucket into daily/weekly/monthly.
- Validate counts and a few sampled entities.
- Publish the cohort definition as a view or table for reuse.
Template SQL you can adapt
WITH base AS (
SELECT
entity_id, -- user_id or account_id
event_time,
event_name,
properties
FROM events
WHERE event_name = '{{KEY_EVENT}}'
AND {{FILTERS}}
), first_event AS (
SELECT entity_id, MIN(event_time) AS cohort_ts
FROM base
GROUP BY entity_id
)
SELECT
DATE_TRUNC('{{BUCKET}}', cohort_ts) AS cohort_{{BUCKET}},
COUNT(*) AS entities_in_cohort
FROM first_event
GROUP BY 1
ORDER BY 1;
Exercises
Do these to make the concepts stick.
Exercise 1 — Define a monthly Signup cohort (user-level)
Using an events table with columns (user_id, event_name, event_time, properties), produce a table of monthly cohorts showing how many users first signed up each month. Exclude sandbox/test events (properties.env).
- Output columns: cohort_month, users_in_cohort
Show solution
WITH signups AS (
SELECT user_id, event_time
FROM events
WHERE event_name = 'Signup'
AND COALESCE(properties:env, properties->>'env') NOT IN ('sandbox','test')
), first_signup AS (
SELECT user_id, MIN(event_time) AS cohort_ts
FROM signups
GROUP BY user_id
)
SELECT
DATE_TRUNC('month', cohort_ts) AS cohort_month,
COUNT(*) AS users_in_cohort
FROM first_signup
GROUP BY 1
ORDER BY 1;
Exercise 2 — Define a feature adoption cohort (account-level)
For a B2B product, define cohorts by first use of event "ExportReport" at the account level. Exclude accounts where properties.account_type = 'internal'. Use weekly buckets.
- Output columns: cohort_week, accounts_in_cohort
Show solution
WITH feature AS (
SELECT account_id, event_time
FROM events
WHERE event_name = 'ExportReport'
AND COALESCE(properties:account_type, properties->>'account_type') != 'internal'
AND account_id IS NOT NULL
), first_use AS (
SELECT account_id, MIN(event_time) AS cohort_ts
FROM feature
GROUP BY account_id
)
SELECT
DATE_TRUNC('week', cohort_ts) AS cohort_week,
COUNT(*) AS accounts_in_cohort
FROM first_use
GROUP BY 1
ORDER BY 1;
Exercise checklist
- You used first occurrence per entity
- You applied the required filters (production/internal/test)
- You chose the correct entity level (user vs account)
- You bucketed time correctly (week or month)
- You validated a few random IDs manually
Common mistakes and how to self-check
- Using any occurrence instead of first: users drift between cohorts. Fix: MIN(event_time).
- Mixing user_id and account_id: double-counting. Fix: pick one entity per cohort.
- Ignoring timezone: off-by-one-day buckets. Fix: convert event_time to business timezone before DATE_TRUNC.
- Including non-production traffic: inflates cohorts. Fix: filter env/test flags.
- Using ingestion time instead of event_time: wrong cohort dates. Fix: always use event_time.
- Not freezing definition: dashboards change silently. Fix: publish a stable view and document it.
Quick self-check queries
- Compare total entities in cohort table to distinct entities in base events after filters — they should match.
- Spot-check 10 entities: confirm their cohort date equals their earliest qualifying event.
- Check that no entity appears in more than one cohort bucket.
Practical projects
- Activation dashboard: Cohort users by first "CompletedOnboarding" and track 8-week activity.
- E-commerce growth: Cohort by first successful purchase; segment by acquisition channel.
- SaaS adoption: Account-level cohort by first "InviteSent"; track seats added over 12 weeks.
Learning path
- Define cohort by key event (this lesson)
- Retention and re-engagement metrics
- Revenue and contribution cohorts
- Segmentation by attributes and experiments
- Communicate and document cohort definitions for the org
Next steps
- Turn your SQL into a reusable view or table.
- Share a one-paragraph definition of your cohort in team docs.
- Build a chart showing cohort sizes over time and investigate anomalies.
Mini challenge
Scenario: Your product has a "DocumentUpload" feature. Marketing wants to measure adoption and retention from this moment.
- Pick the entity (user or account) and justify it.
- Define the cohort date using first qualifying event and any needed filters.
- Choose the bucket size and timezone.
- Write a one-sentence business definition and outline the SQL CTEs you would create.
Quick Test
You can take the quick test for free. If you are logged in, your progress will be saved automatically.