Why this matters
As a Product Analyst, you will find that most core metrics (activation, conversion, retention, engagement) originate from product event tables. Understanding how these tables are structured, how to deduplicate them, and how to join them correctly is essential for trustworthy dashboards, experiments, and product decisions. Concretely, working with event tables means being able to:
- Define the correct grain: one row per event.
- Count DAU/WAU/MAU without double-counting.
- Build funnels with time windows (e.g., sign up → activate within 7 days).
- Compute retention from first-touch cohorts.
- Handle duplicates, late events, and time zones safely.
Concept explained simply
A product event table is a chronological log of what users did in your product. Each row answers: who did what, when, and sometimes where/how.
Mental model
Think of an event table as a timestamped diary:
- Who: user_id (and sometimes anonymous_id or device_id).
- Did what: event_name (e.g., 'view_item', 'signup', 'purchase').
- When: event_time (UTC timestamp).
- Where/how: properties (JSON with page, plan, amount, etc.), device, country, session_id.
Everything you calculate later (funnels, retention, sessions) is built from this diary. If the diary is messy (duplicates, wrong time zone), your metrics will be off.
Core fields in product event tables
- event_id: unique per event (may be missing or unreliable in some pipelines).
- user_id (or anonymous_id): the actor; for logged-out traffic you may have device_id or anonymous_id.
- event_name: the action (signup, activate, view_item, add_to_cart, purchase, click_cta, etc.).
- event_time: UTC timestamp; convert to business time zone for daily/weekly grouping.
- session_id: identifier for a session; if absent, you can derive sessions using time gaps (e.g., 30 minutes).
- properties: semi-structured JSON (e.g., {"plan":"Pro","source":"email","amount":49.99}).
- context fields: platform, device, country, page_url, referrer (if available).
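Putting these fields together, a minimal sketch of such a table might look like the following. Column names and types vary by pipeline and warehouse; this example uses Postgres-style types purely for illustration.
-- Illustrative only: a minimal events table (Postgres-style types; adapt to your warehouse)
CREATE TABLE events (
event_id TEXT, -- unique per event when the pipeline provides it
user_id TEXT, -- NULL for logged-out traffic
anonymous_id TEXT, -- device/browser identifier before login
event_name TEXT NOT NULL, -- 'signup', 'view_item', 'add_to_cart', 'purchase', ...
event_time TIMESTAMPTZ NOT NULL, -- stored in UTC
session_id TEXT, -- may be missing; sessions can be derived from time gaps
properties JSONB, -- e.g., {"plan":"Pro","source":"email","amount":49.99}
platform TEXT,
country TEXT
);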
Important data quality considerations
- Grain: one row per event. Keep the base table at this grain and aggregate in queries, never by permanently rolling it up.
- Duplicates: event ingestion can retry; deduplicate with window functions.
- Late events: some events arrive well after their event_time; use a freshness watermark and re-run a trailing backfill window (see the sketch after this list).
- Time zones: store in UTC; convert for reporting with a consistent business time zone.
- Sessionization: define a gap (e.g., 30 minutes) and rebuild sessions if session_id is missing.
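A minimal sketch of the watermark idea, assuming your pipeline records when each row landed (the ingested_at column is a hypothetical name; use whatever your loader provides):
-- Ignore still-arriving data and recompute only a recent window
WITH freshness AS (
SELECT MAX(ingested_at) - INTERVAL '2 hours' AS watermark -- hypothetical ingestion-time column
FROM events
)
SELECT
DATE(e.event_time AT TIME ZONE 'UTC') AS activity_date,
COUNT(DISTINCT e.user_id) AS dau
FROM events e
CROSS JOIN freshness f
WHERE e.ingested_at <= f.watermark -- hold back the freshest, still-settling rows
AND e.event_time >= f.watermark - INTERVAL '7 days' -- re-run only the trailing window
GROUP BY 1
ORDER BY 1;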
Worked examples
Example 1 — DAU, WAU, MAU from events
Goal: Count distinct active users by day/week/month based on any qualifying event.
-- Daily Active Users (DAU)
SELECT
DATE(event_time AT TIME ZONE 'UTC') AS activity_date,
COUNT(DISTINCT user_id) AS dau
FROM events
WHERE user_id IS NOT NULL
GROUP BY 1
ORDER BY 1;
-- Weekly Active Users (WAU) using calendar weeks
SELECT
DATE_TRUNC('week', event_time AT TIME ZONE 'UTC') AS week_start,
COUNT(DISTINCT user_id) AS wau
FROM events
WHERE user_id IS NOT NULL
GROUP BY 1
ORDER BY 1;
-- Rolling 30-day MAU snapshot per day
WITH per_day AS (
SELECT DATE(event_time AT TIME ZONE 'UTC') AS d, user_id
FROM events
WHERE user_id IS NOT NULL
GROUP BY 1, 2
), days AS (
SELECT DISTINCT d FROM per_day
)
SELECT
days.d,
COUNT(DISTINCT p.user_id) AS mau_rolling_30d
FROM days
JOIN per_day p
ON p.d BETWEEN days.d - INTERVAL '29 days' AND days.d
GROUP BY 1
ORDER BY 1;
Tip: Most warehouses do not support COUNT(DISTINCT ...) over a window frame, so this version counts distinct users by joining each day to its trailing 30-day window; restrict the date range on very large event tables.
Example 2 — 3-step funnel within 7 days
Goal: Users who view_item → add_to_cart → purchase, in order, with each step completed within 7 days of the previous one.
WITH first_view AS (
SELECT user_id, MIN(event_time) AS view_ts
FROM events
WHERE event_name = 'view_item'
GROUP BY 1
), add_cart AS (
SELECT e.user_id, MIN(e.event_time) AS cart_ts
FROM events e
JOIN first_view v USING (user_id)
WHERE e.event_name = 'add_to_cart'
AND e.event_time BETWEEN v.view_ts AND v.view_ts + INTERVAL '7 days'
GROUP BY 1
), purchase AS (
SELECT e.user_id, MIN(e.event_time) AS purchase_ts
FROM events e
JOIN add_cart c USING (user_id)
WHERE e.event_name = 'purchase'
AND e.event_time BETWEEN c.cart_ts AND c.cart_ts + INTERVAL '7 days'
GROUP BY 1
)
SELECT
COUNT(*) AS step1_viewers,
(SELECT COUNT(*) FROM add_cart) AS step2_added,
(SELECT COUNT(*) FROM purchase) AS step3_purchased,
ROUND(100.0 * (SELECT COUNT(*) FROM add_cart) / NULLIF(COUNT(*),0), 2) AS view_to_cart_pct,
ROUND(100.0 * (SELECT COUNT(*) FROM purchase) / NULLIF((SELECT COUNT(*) FROM add_cart),0), 2) AS cart_to_purchase_pct
FROM first_view;
Example 3 — D1 retention by signup cohort
Goal: Of users who signed up on a date, what percent returned on day 1 (defined here as 24-48 hours after their signup timestamp)?
WITH first_signup AS (
SELECT user_id, DATE(MIN(event_time)) AS signup_date, MIN(event_time) AS signup_ts
FROM events
WHERE event_name = 'signup'
GROUP BY 1
), d1_return AS (
-- only user_id: the 24-48h window can span two calendar dates, and a second row per user would double-count below
SELECT DISTINCT e.user_id
FROM events e
JOIN first_signup s USING (user_id)
WHERE e.event_time >= s.signup_ts + INTERVAL '1 day'
AND e.event_time < s.signup_ts + INTERVAL '2 days'
)
SELECT
s.signup_date,
COUNT(*) AS signups,
COUNT(dr.user_id) AS d1_returned,
ROUND(100.0 * COUNT(dr.user_id) / NULLIF(COUNT(*),0), 2) AS d1_retention_pct
FROM first_signup s
LEFT JOIN d1_return dr ON dr.user_id = s.user_id
GROUP BY 1
ORDER BY 1;
Patterns and recipes
Deduplicating events
-- Example: dedup by event_id if present; if not, use a surrogate key
WITH ranked AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY COALESCE(event_id, CONCAT(user_id, ':', event_name, ':', DATE_TRUNC('second', event_time)))
ORDER BY event_time
) AS rn
FROM events
)
SELECT * FROM ranked WHERE rn = 1;
Deriving sessions with a 30-minute gap
WITH sequenced AS (
SELECT
user_id,
event_time,
LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time
FROM events
), sessionized AS (
SELECT
*,
CASE WHEN prev_time IS NULL OR event_time - prev_time > INTERVAL '30 minutes' THEN 1 ELSE 0 END AS new_session_flag
FROM sequenced
)
SELECT
user_id,
event_time,
SUM(new_session_flag) OVER (PARTITION BY user_id ORDER BY event_time) AS session_number
FROM sessionized;
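As a follow-up, here is a sketch of session-level metrics built on the same 30-minute-gap logic; the first three CTEs repeat the sessionization above so the query runs on its own.
-- Session length and events per session from the derived session numbers
WITH sequenced AS (
SELECT user_id, event_time,
LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time
FROM events
), sessionized AS (
SELECT *,
CASE WHEN prev_time IS NULL OR event_time - prev_time > INTERVAL '30 minutes' THEN 1 ELSE 0 END AS new_session_flag
FROM sequenced
), numbered AS (
SELECT user_id, event_time,
SUM(new_session_flag) OVER (PARTITION BY user_id ORDER BY event_time) AS session_number
FROM sessionized
)
SELECT
user_id,
session_number,
MIN(event_time) AS session_start,
MAX(event_time) - MIN(event_time) AS session_length,
COUNT(*) AS events_in_session
FROM numbered
GROUP BY 1, 2;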
Extracting JSON properties safely
-- Adapt function names to your warehouse
SELECT
user_id,
event_time,
event_name,
-- Example approaches (pick one your warehouse supports):
/* BigQuery */ JSON_EXTRACT_SCALAR(properties, '$.plan') AS plan,
/* Snowflake */ PARSE_JSON(properties):plan::STRING AS plan_sf,
/* Postgres */ (properties::jsonb ->> 'plan') AS plan_pg
FROM events;
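A related step you will often need is casting numeric properties. Here is a Postgres-style sketch (other warehouses have equivalent casts); the 'amount' key is only an example property:
-- Cast a numeric JSON property; missing keys simply yield NULL
SELECT
user_id,
event_time,
(properties::jsonb ->> 'amount')::numeric AS amount
FROM events
WHERE event_name = 'purchase';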
Ordering events and comparing steps
SELECT
user_id,
event_time,
event_name,
LAG(event_name) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_event,
LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time
FROM events;
Exercises you can try now
These mirror the exercises below. Try them before opening solutions.
Exercise 1 — Weekly signup-to-activation funnel
Use a single events table with columns: event_id, user_id, event_name, event_time (UTC), session_id, properties (JSON), country.
- Define the first signup per user (deduplicate by taking the earliest event_time for event_name='signup').
- Find if the same user has event_name='activate' within 7 days after their first signup.
- Aggregate by the signup week (DATE_TRUNC('week', signup_time)).
- Output: week_start, signups, activations_within_7d, activation_rate_pct.
Hints
- Two CTEs: first_signup and activation.
- Join activation back to first_signup on user_id.
- Use NULLIF in the rate denominator to avoid division by zero.
Checklist before you run it:
- Did you convert timestamps consistently (UTC or your business time zone)?
- Is each user counted once for signup?
- Is activation strictly after signup and within 7 days?
Common mistakes and how to self-check
- Mixing grains: joining event-level rows to user-level aggregates without grouping first. Self-check: Is your output at user, cohort, or event level? Ensure GROUP BY matches.
- Timezone drift: grouping by UTC date when your business day is in another time zone. Self-check: Compare counts in UTC vs your business time zone for a sample day (see the self-check sketches after this list).
- Double-counted signups: counting multiple signup events for one user. Self-check: Count distinct users vs total signup events.
- Unordered funnels: allowing purchase before view_item to count. Self-check: Enforce time windows and ordering with MIN timestamps.
- Ignoring late events: computing recent metrics without a watermark. Self-check: Track a "data freshness" timestamp and re-run the last few days/weeks.
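Two of these self-checks as quick sketches. The sample date and the 'America/New_York' zone are placeholders; substitute a recent day and your own business time zone.
-- Self-check: total signup events vs. distinct signup users (a gap means duplicate or repeat signups)
SELECT
COUNT(*) AS signup_events,
COUNT(DISTINCT user_id) AS signup_users
FROM events
WHERE event_name = 'signup';
-- Self-check: DAU for one sample day counted in UTC vs. a business time zone
SELECT
COUNT(DISTINCT CASE WHEN DATE(event_time AT TIME ZONE 'UTC') = DATE '2024-03-01' THEN user_id END) AS dau_utc,
COUNT(DISTINCT CASE WHEN DATE(event_time AT TIME ZONE 'America/New_York') = DATE '2024-03-01' THEN user_id END) AS dau_local
FROM events;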
Practical projects
- Event mart: Build a cleaned events view with deduplication, standardized time zone, and extracted common JSON fields (plan, amount, source).
- Activation dashboard: Weekly signup → activation within 7 days by channel and plan.
- Retention cohorts: D1, W1, and D7 retention by acquisition source and platform.
- Session analytics: Derive sessions (30-minute gap) and report average session length and events per session.
Mini challenge
In your events table, a user can have multiple 'activate' events. Modify the Example 2 funnel, swapping add_to_cart for activate, so that for each user you use only their first 'activate' after the first view and their first 'purchase' after activation. Then compute view → activate and activate → purchase conversion. Write it as one query with CTEs.
Learning path
- Understand event grain, keys, and time zones.
- Master window functions (ROW_NUMBER, LAG/LEAD) for deduping and ordering.
- Extract and normalize JSON properties you use often.
- Build funnels with time windows and proper ordering.
- Compute first-touch cohorts and retention.
- Derive sessions and compute engagement metrics.
- QA checks: duplicates, late data, and reconciliation vs. source counts.
Who this is for
- Product Analysts and Data Analysts who report product metrics.
- New analysts learning event-driven analytics.
- PMs with SQL basics who want reliable self-serve analysis.
Prerequisites
- Basic SQL (SELECT, WHERE, GROUP BY, JOIN).
- Comfort with timestamps and date functions.
- Optional: JSON extraction functions in your warehouse.
Next steps
- Do the exercise above, then take the Quick Test below.
- Apply the patterns to your own product events and publish a short metrics note (assumptions, definitions, and caveats).