Why this matters
Defining acquisition cohorts correctly is the foundation of trustworthy retention, LTV, and payback analyses. Marketing Analysts use cohorts to answer real questions like:
- Which channels bring users who retain after 3 months?
- How long until a cohort pays back its CAC?
- Did our onboarding change improve activation for new cohorts?
Get the definition wrong, and every downstream metric can mislead decisions and budgets.
Concept explained simply
An acquisition cohort is a group of users bucketed by when they first became your users, as determined by a chosen “acquisition event” (for example, first signup or first purchase). The cohort label is typically a date bucket such as 2025-01-15 (daily), 2025-W02 (weekly), or 2025-01 (monthly).
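As a rough illustration of the three label formats, here is a minimal Python sketch. The function name cohort_label is hypothetical, and the weekly label assumes ISO-8601 week numbering, which the lesson does not specify; your warehouse may number weeks differently.

```python
from datetime import date


def cohort_label(d: date, grain: str) -> str:
    """Return the cohort bucket label for an acquisition date."""
    if grain == "daily":
        return d.isoformat()  # e.g. 2025-01-15
    if grain == "weekly":
        # Assumption: ISO-8601 weeks (Monday start). Use the ISO year,
        # not d.year, so dates around New Year get the right label.
        iso_year, iso_week, _ = d.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if grain == "monthly":
        return f"{d.year}-{d.month:02d}"
    raise ValueError(f"unknown grain: {grain}")


acquired = date(2025, 1, 15)
labels = {g: cohort_label(acquired, g) for g in ("daily", "weekly", "monthly")}
```

Under ISO numbering, a date such as 2024-12-30 belongs to week 1 of 2025, which is why the sketch takes the year from isocalendar() rather than from the calendar date.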
Mental model
Think of school classes: the “Class of 2025” is everyone who started in the same year. A cohort is the “class” of users who started using your product in the same time bucket.
Core elements of a cohort definition
- Acquisition event: the first qualifying event that makes someone “yours” (e.g., first signup, first purchase).
- Identity key: user_id or customer_id used to group events to the same person/entity.
- Date grain: day, week, or month of acquisition.
- Timestamp source and timezone: which timestamp field and which timezone standard to use.
- Channel attribution snapshot: the channel/tag you freeze at the time of acquisition.
- De-duplication rule: first qualifying event only; ignore later ones.
- Exclusions: remove test, internal, bots, or incomplete signups.
- Backfill policy: how to handle late-arriving data or merged identities.
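One way to keep these elements together is to write them down as a single, immutable definition object. The sketch below is only an illustration: the class name, field names, and example values (including the backfill policy string) are assumptions, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CohortDefinition:
    """One documented cohort definition; all field names are illustrative."""
    acq_event: str          # first qualifying event, e.g. "first_signup"
    identity: str           # key used to group events, e.g. "user_id"
    grain: str              # "daily", "weekly", or "monthly"
    timezone: str           # canonical timezone used for bucketing
    channel_snapshot: str   # attribution field frozen at acquisition
    dedup_rule: str = "first_qualifying_event"
    exclusions: tuple = ()  # e.g. ("internal", "bot")
    backfill_policy: str = "unspecified"


# Hypothetical definition for a freemium mobile app.
mobile_app = CohortDefinition(
    acq_event="first_signup",
    identity="user_id",
    grain="weekly",
    timezone="UTC",
    channel_snapshot="install_source",
    exclusions=("internal_testers", "suspected_bots"),
    backfill_policy="recompute_nightly",  # assumed value for illustration
)
```

Freezing the dataclass means a definition cannot be changed silently in the middle of a pipeline; a new definition has to be created explicitly.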
Typical choices and when to use them
- Use signup as acquisition if your product is free or a trial and value starts at account creation.
- Use first purchase as acquisition if you care about paying customers and CAC payback.
- Use monthly grain for small volumes; weekly or daily for larger volumes or fast experiments.
- Use a single canonical timezone (often product timezone or UTC) and document it.
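To see why the timezone choice needs documenting, the sketch below buckets the same instant under two timezones. It uses a fixed UTC+9 offset as a stand-in for a hypothetical store timezone; real zones with daylight saving would need a proper timezone database rather than a fixed offset.

```python
from datetime import datetime, timedelta, timezone


def cohort_month(ts: datetime, tz: timezone) -> str:
    """Convert an aware timestamp to tz, then bucket by month."""
    local = ts.astimezone(tz)
    return f"{local.year}-{local.month:02d}"


signup = datetime(2025, 1, 31, 23, 50, tzinfo=timezone.utc)
store_tz = timezone(timedelta(hours=9))  # assumed fixed offset, no DST

utc_bucket = cohort_month(signup, timezone.utc)
store_bucket = cohort_month(signup, store_tz)
```

The same signup lands in the January cohort under UTC and in the February cohort under UTC+9, so mixing conventions moves users between cohorts near bucket boundaries.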
Worked examples
Example 1: Mobile app (freemium)
Goal: Understand retention from first signup.
- Acquisition event: first_signup
- Identity: user_id
- Grain: weekly (YYYY-Www)
- Timezone: product default (e.g., UTC)
- Channel: install_source at signup
- Exclusions: internal testers; device_limit > 10 in 1 day (likely bots)
Interpretation: Cohort 2025-W02 includes all users whose first signup happened during week 2 of 2025.
Example 2: E-commerce (conversion-first)
Goal: LTV and CAC payback on paying customers.
- Acquisition event: first_purchase (excluding purchases canceled or refunded within 24h)
- Identity: customer_id
- Grain: monthly
- Timezone: store timezone
- Channel: last_non_direct_click at first purchase
- Exclusions: employees, fraud-flagged orders
Interpretation: Cohort 2025-01 includes customers whose first valid purchase occurred in Jan 2025.
Example 3: B2B SaaS (lead to opportunity)
Goal: Activation and expansion from marketing-sourced signups.
- Acquisition event: first_verified_signup (email verified + workspace created)
- Identity: account_id (primary), user_id (secondary)
- Grain: weekly
- Timezone: UTC standardized
- Channel: UTM_source captured at signup, frozen
- Exclusions: free trial signups without verification
Interpretation: Cohort 2025-W03 = accounts whose first verified signup happened that week; users who join an account later still belong to that account's signup cohort.
Steps to define acquisition cohorts
- Clarify the business question. Is the focus on adoption (signup) or monetization (purchase)?
- Pick the acquisition event. Choose the earliest meaningful event; define filters (e.g., verified, non-refund).
- Choose identity. Decide user_id vs customer_id/account_id; specify merge rules.
- Set cohort grain and timezone. Monthly for low volume; weekly/daily for high volume or experiments.
- Freeze channel at acquisition. Decide the attribution model and exactly which fields to snapshot.
- Define exclusions. Internal/test users, bots, invalid events; document detection rules.
- Document backfill and late data handling. For late events and id merges, specify how cohorts are recomputed.
- QA with samples. Manually verify random users to ensure correct cohort assignment.
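The steps above can be sketched as a small assignment function: filter to the acquisition event, drop excluded users, keep the earliest qualifying event per user, and snapshot the channel from that event. The event shape (dicts with user_id, event, ts, channel), the function name, and the monthly UTC grain are all assumptions for illustration.

```python
from datetime import datetime, timezone


def assign_cohorts(events, acq_event, excluded_users):
    """Return {user_id: cohort row} using the first qualifying event only."""
    first = {}
    for e in events:
        if e["event"] != acq_event or e["user_id"] in excluded_users:
            continue
        prev = first.get(e["user_id"])
        if prev is None or e["ts"] < prev["ts"]:
            first[e["user_id"]] = e  # earliest qualifying event wins
    return {
        uid: {
            "cohort_month": e["ts"].astimezone(timezone.utc).strftime("%Y-%m"),
            "channel_at_acq": e["channel"],  # frozen at acquisition
            "acquired_at": e["ts"],
        }
        for uid, e in first.items()
    }


utc = timezone.utc
events = [
    {"user_id": "a", "event": "purchase", "ts": datetime(2025, 2, 5, tzinfo=utc), "channel": "email"},
    {"user_id": "a", "event": "purchase", "ts": datetime(2025, 1, 20, tzinfo=utc), "channel": "ads"},
    {"user_id": "b", "event": "purchase", "ts": datetime(2025, 1, 3, tzinfo=utc), "channel": "organic"},
]
cohorts = assign_cohorts(events, "purchase", excluded_users={"b"})
```

User a is assigned to 2025-01 with channel ads even though a later purchase came through email, and the excluded user b gets no cohort row at all.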
Data you need (checklist)
- Identity: user_id/customer_id/account_id
- Event table with timestamps (signup, purchase, etc.)
- Attribution fields at acquisition (source, medium, campaign)
- Time zone decision and reproducible conversion logic
- Exclusion flags (internal, bot, fraud)
- Merge history (id_links) if identities can change
Edge cases and rules
- Merged identities: If two user_ids later merge, use the earliest qualifying event across both; keep a reproducible merge rule.
- Refunded first purchase: If your acquisition event is purchase, exclude or re-evaluate when a first purchase is fully refunded.
- Backfilled events: Document whether historical cohorts will be recomputed nightly.
- Time zone drift: If source systems store mixed timezones, standardize before bucketing.
- Internal/bot traffic: Maintain a list of domains, IPs, or device heuristics to exclude.
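For the merged-identity rule, a minimal sketch might look like the following. It assumes id_links maps each merged-away id to the id that survived the merge, which is only one possible shape for a merge history table.

```python
from datetime import datetime, timezone


def canonical_id(user_id, id_links):
    """Follow merge links to the surviving id; stop if a cycle is detected."""
    seen = set()
    while user_id in id_links and user_id not in seen:
        seen.add(user_id)
        user_id = id_links[user_id]
    return user_id


def earliest_acquisition(first_events, id_links):
    """first_events: {user_id: acquisition datetime} before merging."""
    merged = {}
    for uid, ts in first_events.items():
        root = canonical_id(uid, id_links)
        if root not in merged or ts < merged[root]:
            merged[root] = ts  # earliest event across merged ids anchors the cohort
    return merged


utc = timezone.utc
first_events = {
    "u7": datetime(2025, 3, 3, tzinfo=utc),
    "u9": datetime(2025, 1, 20, tzinfo=utc),
}
id_links = {"u9": "u7"}  # hypothetical: u9 was merged into u7
anchors = earliest_acquisition(first_events, id_links)
```

Here the surviving id u7 inherits the January acquisition date from u9, so its cohort moves earlier once the merge is known; this is the kind of recomputation a backfill policy should spell out.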
Common mistakes and how to self-check
- Mistake: Using any purchase as acquisition, not the first. Self-check: Verify that each user appears in exactly one cohort.
- Mistake: Channel changing after acquisition. Self-check: Confirm channel fields are frozen at the first qualifying event.
- Mistake: Timezone inconsistencies. Self-check: For a sample of users near midnight, confirm cohort buckets match the documented timezone.
- Mistake: Missing exclusions. Self-check: Confirm internal and bot flags are filtered out before cohort assignment.
- Mistake: Identity collisions. Self-check: For merged accounts, ensure the earlier event is the cohort anchor.
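The first self-check (each user appears in exactly one cohort) is easy to automate. The sketch below assumes the cohort assignment output is available as (user_id, cohort_label) rows; the function name is hypothetical.

```python
from collections import Counter


def duplicated_users(assignments):
    """Return user_ids that appear in more than one assignment row."""
    rows_per_user = Counter(uid for uid, _ in assignments)
    return sorted(uid for uid, n in rows_per_user.items() if n > 1)


assignments = [
    ("u1", "2025-01"),
    ("u2", "2025-01"),
    ("u1", "2025-02"),  # u1 was assigned twice: a definition bug
]
problems = duplicated_users(assignments)
```

An empty result is what a correct assignment should produce; any returned id points to a de-duplication or identity problem worth inspecting by hand.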
Exercises
Do these exercises before the quick test.
Exercise 1: Define cohorts for three scenarios
For each scenario, choose acquisition event, identity, grain, timezone, exclusions, and channel snapshot. Write a short config and one-sentence rationale.
- Streaming app trial: users start on a 7-day free trial; value starts at watching content.
- Marketplace: sellers matter, not buyers; measure seller retention and LTV.
- Consumer fintech: focus on funded accounts (first successful deposit), not just signups.
Hints
- Pick the earliest event that represents real value for your business model.
- Monthly grain often suffices at small scale; weekly for rapid experimentation.
- Freeze attribution at the first qualifying event.
Expected output shape
{
  "scenario": "...",
  "acq_event": "...",
  "identity": "...",
  "grain": "daily|weekly|monthly",
  "timezone": "UTC|ProductTZ",
  "channel_snapshot": "...",
  "exclusions": ["..."],
  "rationale": "..."
}
Exercise 2: Assign cohorts from raw events
Given raw events below, assign each user to a cohort (monthly) using first_verified_signup. Timezone: UTC. Exclude internal emails (@company.test).
Raw rows
user_id, event, ts, email, channel
u1, signup, 2025-01-31T23:50:00Z, a@x.com, ads
u1, email_verified, 2025-02-01T00:02:00Z, a@x.com, ads
u2, signup, 2025-02-03T10:00:00Z, b@company.test, organic
u3, signup, 2025-02-10T09:00:00Z, c@y.com, referral
u3, email_verified, 2025-02-12T11:00:00Z, c@y.com, referral
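Rule: first_verified_signup = first signup followed by email_verified within 72h. The cohort is anchored on the email_verified timestamp, so a signup late on the last day of a month can fall into the next month's cohort.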
Produce: user_id, cohort_month (YYYY-MM), channel_at_acq.
Expected output
u1, 2025-02, ads
u3, 2025-02, referral
- Checklist: Did you pick one clear acquisition event?
- Checklist: Did you ensure each user appears in only one cohort?
- Checklist: Did you freeze channel at acquisition?
- Checklist: Did you apply timezone and exclusions consistently?
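If you want to check your Exercise 2 answer programmatically, here is one possible sketch. It assumes the cohort is anchored on the email_verified timestamp (which is what the expected output implies) and that the channel is taken from the signup row; both are interpretations of the rule rather than something the exercise states outright. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

RAW = """\
u1, signup, 2025-01-31T23:50:00Z, a@x.com, ads
u1, email_verified, 2025-02-01T00:02:00Z, a@x.com, ads
u2, signup, 2025-02-03T10:00:00Z, b@company.test, organic
u3, signup, 2025-02-10T09:00:00Z, c@y.com, referral
u3, email_verified, 2025-02-12T11:00:00Z, c@y.com, referral
"""


def parse(line):
    user_id, event, ts, email, channel = [f.strip() for f in line.split(",")]
    # Replace the trailing Z so fromisoformat works on older Python versions.
    when = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return {"user_id": user_id, "event": event, "ts": when,
            "email": email, "channel": channel}


def first_verified_signups(rows, window=timedelta(hours=72),
                           internal_domain="@company.test"):
    kept = [r for r in rows if not r["email"].endswith(internal_domain)]
    kept.sort(key=lambda r: r["ts"])
    first_signup, result = {}, {}
    for r in kept:
        uid = r["user_id"]
        if r["event"] == "signup":
            first_signup.setdefault(uid, r)  # keep only the first signup
        elif r["event"] == "email_verified" and uid in first_signup and uid not in result:
            signup = first_signup[uid]
            if r["ts"] - signup["ts"] <= window:
                # Assumption: anchor on verification time, channel from signup.
                result[uid] = (r["ts"].strftime("%Y-%m"), signup["channel"])
    return [(uid, month, ch) for uid, (month, ch) in sorted(result.items())]


rows = [parse(line) for line in RAW.strip().splitlines()]
output = first_verified_signups(rows)
```

With these assumptions, u1 lands in 2025-02 because verification happened just after midnight UTC on 1 February, u2 is dropped as internal, and u3 qualifies because verification came about 50 hours after signup.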
Practical projects
- Build a cohort assignment query: Create a reproducible SQL query (or notebook) that outputs one row per user with cohort label, channel, and acquisition timestamp.
- Cohort QA dashboard: For the latest three cohorts, show user counts by channel and flag outliers (e.g., sudden spikes).
- Compare definitions: Run retention curves for signup vs first purchase cohorts and summarize differences in a one-pager.
Who this is for
- Marketing Analysts measuring channel performance, retention, and LTV.
- Product Analysts validating onboarding and activation.
- Data-savvy marketers running acquisition experiments.
Prerequisites
- Basic SQL or spreadsheet skills.
- Understanding of marketing channels and attribution concepts.
- Comfort with timestamps and timezones.
Learning path
- Start here: Define acquisition cohorts (this lesson).
- Next: Retention metrics by cohort (D1/D7/D30, churn rate).
- Then: Revenue/LTV by cohort and CAC payback.
- Advanced: Channel-mix modeling and cohort forecasting.
Next steps
- Finalize a written cohort definition for your team (one-pager with rules and examples).
- Implement the cohort assignment pipeline and schedule QA checks.
- Share early insights and get feedback from marketing and product stakeholders.
Mini challenge
Your signup cohorts show a sudden +40% spike in a single day. In one paragraph, list 3 possible causes and 3 checks you would run to confirm or rule out each cause.