
Product Event Tables Understanding

Learn Product Event Tables Understanding for free with explanations, exercises, and a quick test (for Product Analysts).

Published: December 22, 2025 | Updated: December 22, 2025

Why this matters

As a Product Analyst, most core metrics (activation, conversion, retention, engagement) originate from product event tables. Understanding how these tables are structured, how to deduplicate, and how to join them correctly is essential for trustworthy dashboards, experiments, and product decisions.

  • Define the correct grain: one row per event.
  • Count DAU/WAU/MAU without double-counting.
  • Build funnels with time windows (e.g., sign up → activate within 7 days).
  • Compute retention from first-touch cohorts.
  • Handle duplicates, late events, and time zones safely.

Concept explained simply

A product event table is a chronological log of what users did in your product. Each row answers: who did what, when, and sometimes where/how.

Mental model

Think of an event table as a timestamped diary:

  • Who: user_id (and sometimes anonymous_id or device_id).
  • Did what: event_name (e.g., 'view_item', 'signup', 'purchase').
  • When: event_time (UTC timestamp).
  • Where/how: properties (JSON with page, plan, amount, etc.), device, country, session_id.

Everything you calculate later (funnels, retention, sessions) is built from this diary. If the diary is messy (duplicates, wrong time zone), your metrics will be off.

Core fields in product event tables

  • event_id: unique per event (may be missing or unreliable in some pipelines).
  • user_id (or anonymous_id): the actor; for logged-out traffic you may have device_id or anonymous_id.
  • event_name: the action (signup, activate, view_item, add_to_cart, purchase, click_cta, etc.).
  • event_time: UTC timestamp; convert to business time zone for daily/weekly grouping.
  • session_id: identifier for a session; if absent, you can derive sessions using time gaps (e.g., 30 minutes).
  • properties: semi-structured JSON (e.g., {"plan":"Pro","source":"email","amount":49.99}).
  • context fields: platform, device, country, page_url, referrer (if available).
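
A minimal sketch of such a table using the fields above (Postgres-style syntax; the column names and types are assumptions to adapt to your own warehouse):

-- Illustrative schema only; names and types are assumptions, not a standard
CREATE TABLE events (
  event_id     TEXT,               -- unique per event when the pipeline provides it
  user_id      TEXT,               -- NULL for logged-out traffic
  anonymous_id TEXT,               -- pre-login identifier, if captured
  event_name   TEXT NOT NULL,      -- e.g., 'signup', 'view_item', 'purchase'
  event_time   TIMESTAMP NOT NULL, -- stored in UTC
  session_id   TEXT,
  properties   TEXT,               -- JSON stored as text here; some warehouses use a native JSON/VARIANT type
  platform     TEXT,
  device       TEXT,
  country      TEXT
);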

Important data quality considerations

  • Grain: one row per event. Keep the base table at this grain and do aggregation in queries or downstream views, never by collapsing the raw data.
  • Duplicates: event ingestion can retry; deduplicate with window functions.
  • Late events: some events arrive late; use watermarks or partition backfill windows.
  • Time zones: store in UTC; convert for reporting with a consistent business time zone.
  • Sessionization: define a gap (e.g., 30 minutes) and rebuild sessions if session_id is missing.
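
For example, the time-zone rule above in Postgres-style SQL, assuming event_time is stored as a naive UTC timestamp and the business time zone is, say, 'America/New_York':

-- Group daily metrics by the business day, not the UTC day
SELECT
  DATE(event_time AT TIME ZONE 'UTC' AT TIME ZONE 'America/New_York') AS business_date,
  COUNT(DISTINCT user_id) AS dau_business_tz
FROM events
WHERE user_id IS NOT NULL
GROUP BY 1
ORDER BY 1;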

Worked examples

Example 1 — DAU, WAU, MAU from events

Goal: Count distinct active users by day/week/month based on any qualifying event.

-- Daily Active Users (DAU)
SELECT
  DATE(event_time AT TIME ZONE 'UTC') AS activity_date,
  COUNT(DISTINCT user_id) AS dau
FROM events
WHERE user_id IS NOT NULL
GROUP BY 1
ORDER BY 1;

-- Weekly Active Users (WAU) using calendar weeks
SELECT
  DATE_TRUNC('week', event_time AT TIME ZONE 'UTC') AS week_start,
  COUNT(DISTINCT user_id) AS wau
FROM events
WHERE user_id IS NOT NULL
GROUP BY 1
ORDER BY 1;

-- Rolling 30-day MAU snapshot per day (date-window join)
WITH per_day AS (
  SELECT DATE(event_time AT TIME ZONE 'UTC') AS d, user_id
  FROM events
  WHERE user_id IS NOT NULL
  GROUP BY 1, 2
), days AS (
  SELECT DISTINCT d FROM per_day
)
SELECT
  days.d,
  COUNT(DISTINCT p.user_id) AS mau_rolling_30d
FROM days
JOIN per_day p
  ON p.d BETWEEN days.d - INTERVAL '29 days' AND days.d
GROUP BY 1
ORDER BY 1;

Tip: Most warehouses do not allow COUNT(DISTINCT ...) as a window function, so a date-window join like the one above is the portable way to compute rolling distinct counts.

Example 2 — 3-step funnel with 7-day step windows

Goal: Users who view_item → add_to_cart → purchase, in order, with each step happening within 7 days of the previous step.

WITH first_view AS (
  SELECT user_id, MIN(event_time) AS view_ts
  FROM events
  WHERE event_name = 'view_item'
  GROUP BY 1
), add_cart AS (
  SELECT e.user_id, MIN(e.event_time) AS cart_ts
  FROM events e
  JOIN first_view v USING (user_id)
  WHERE e.event_name = 'add_to_cart'
    AND e.event_time BETWEEN v.view_ts AND v.view_ts + INTERVAL '7 days'
  GROUP BY 1
), purchase AS (
  SELECT e.user_id, MIN(e.event_time) AS purchase_ts
  FROM events e
  JOIN add_cart c USING (user_id)
  WHERE e.event_name = 'purchase'
    AND e.event_time BETWEEN c.cart_ts AND c.cart_ts + INTERVAL '7 days'
  GROUP BY 1
)
SELECT
  COUNT(*) AS step1_viewers,
  (SELECT COUNT(*) FROM add_cart) AS step2_added,
  (SELECT COUNT(*) FROM purchase) AS step3_purchased,
  ROUND(100.0 * (SELECT COUNT(*) FROM add_cart) / NULLIF(COUNT(*),0), 2) AS view_to_cart_pct,
  ROUND(100.0 * (SELECT COUNT(*) FROM purchase) / NULLIF((SELECT COUNT(*) FROM add_cart),0), 2) AS cart_to_purchase_pct
FROM first_view;

Example 3 — D1 retention by signup cohort

Goal: Of users who signed up on a given date, what percent came back with any event between 24 and 48 hours after their first signup (D1)?

WITH first_signup AS (
  SELECT user_id, DATE(MIN(event_time)) AS signup_date, MIN(event_time) AS signup_ts
  FROM events
  WHERE event_name = 'signup'
  GROUP BY 1
), d1_return AS (
  SELECT DISTINCT e.user_id  -- one row per returning user so the join below does not fan out
  FROM events e
  JOIN first_signup s USING (user_id)
  WHERE e.event_time >= s.signup_ts + INTERVAL '1 day'
    AND e.event_time < s.signup_ts + INTERVAL '2 days'
)
SELECT
  s.signup_date,
  COUNT(*) AS signups,
  COUNT(dr.user_id) AS d1_returned,
  ROUND(100.0 * COUNT(dr.user_id) / NULLIF(COUNT(*),0), 2) AS d1_retention_pct
FROM first_signup s
LEFT JOIN d1_return dr ON dr.user_id = s.user_id
GROUP BY 1
ORDER BY 1;

Patterns and recipes

Deduplicating events
-- Example: dedup by event_id if present; if not, use a surrogate key
WITH ranked AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY COALESCE(event_id, CONCAT(user_id, ':', event_name, ':', DATE_TRUNC('second', event_time)))
      ORDER BY event_time
    ) AS rn
  FROM events
)
SELECT * FROM ranked WHERE rn = 1;

Deriving sessions with a 30-minute gap
WITH sequenced AS (
  SELECT
    user_id,
    event_time,
    LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time
  FROM events
), sessionized AS (
  SELECT
    *,
    CASE WHEN prev_time IS NULL OR event_time - prev_time > INTERVAL '30 minutes' THEN 1 ELSE 0 END AS new_session_flag
  FROM sequenced
)
SELECT
  user_id,
  event_time,
  SUM(new_session_flag) OVER (PARTITION BY user_id ORDER BY event_time) AS session_number
FROM sessionized;
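
Building on the query above, session-level metrics are one GROUP BY away; a sketch that assumes its output has been saved as a view or CTE named user_sessions (the name is illustrative):

-- Events per session and session duration, from the sessionized rows above
-- (assumes they are available as user_sessions(user_id, event_time, session_number))
SELECT
  user_id,
  session_number,
  COUNT(*) AS events_in_session,
  MAX(event_time) - MIN(event_time) AS session_duration
FROM user_sessions
GROUP BY 1, 2;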

Extracting JSON properties safely
-- Adapt function names to your warehouse
SELECT
  user_id,
  event_time,
  event_name,
  -- Example approaches (pick one your warehouse supports):
  /* BigQuery */ JSON_EXTRACT_SCALAR(properties, '$.plan') AS plan,
  /* Snowflake */ PARSE_JSON(properties):plan::VARCHAR AS plan_sf,
  /* Postgres */ (properties::jsonb ->> 'plan') AS plan_pg
FROM events;

Ordering events and comparing steps
SELECT
  user_id,
  event_time,
  event_name,
  LAG(event_name) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_event,
  LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time
FROM events;

Exercises you can try now

These mirror the exercises below. Try them before opening solutions.

Exercise 1 — Weekly signup-to-activation funnel

Use a single events table with columns: event_id, user_id, event_name, event_time (UTC), session_id, properties (JSON), country.

  • Define the first signup per user (deduplicate by taking the earliest event_time for event_name='signup').
  • Find if the same user has event_name='activate' within 7 days after their first signup.
  • Aggregate by the signup week (DATE_TRUNC('week', signup_time)).
  • Output: week_start, signups, activations_within_7d, activation_rate_pct.

Hints
  • Two CTEs: first_signup and activation.
  • Join activation back to first_signup on user_id.
  • Use NULLIF in the rate denominator to avoid division by zero.
  • Checklist before you run it:
    • Did you convert timestamps consistently (UTC or your business time zone)?
    • Is each user counted once for signup?
    • Is activation strictly after signup and within 7 days?

Common mistakes and how to self-check

  • Mixing grains: joining event-level rows to user-level aggregates without grouping first. Self-check: Is your output at user, cohort, or event level? Ensure GROUP BY matches.
  • Timezone drift: grouping by UTC date when your business day is another time zone. Self-check: Compare counts in UTC vs business time zone for a sample day.
  • Double-counted signups: counting multiple signup events for one user. Self-check: Count distinct users vs total signup events.
  • Unordered funnels: allowing purchase before view_item to count. Self-check: Enforce time windows and ordering with MIN timestamps.
  • Ignoring late events: computing recent metrics without a watermark. Self-check: Track a "data freshness" timestamp and re-run the last few days/weeks.
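
As a concrete self-check for the double-counted-signups point above, a query along these lines (using the same events columns as the examples) compares raw signup events to distinct signing-up users; a large gap usually means duplicate or repeated signup events per user:

-- Self-check: total signup events vs. distinct users who signed up
SELECT
  COUNT(*) AS signup_events,
  COUNT(DISTINCT user_id) AS signup_users
FROM events
WHERE event_name = 'signup';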

Practical projects

  • Event mart: Build a cleaned events view with deduplication, standardized time zone, and extracted common JSON fields (plan, amount, source).
  • Activation dashboard: Weekly signup → activation within 7 days by channel and plan.
  • Retention cohorts: D1, W1, and D7 retention by acquisition source and platform.
  • Session analytics: Derive sessions (30-minute gap) and report average session length and events per session.

Mini challenge

In your events table, a user can have multiple 'activate' events. Modify the Example 2 funnel so that for each user you use only their first 'activate' and first 'purchase' after activation. Then compute view → activate and activate → purchase conversion. Write it in one query with CTEs.

Learning path

  1. Understand event grain, keys, and time zones.
  2. Master window functions (ROW_NUMBER, LAG/LEAD) for deduping and ordering.
  3. Extract and normalize JSON properties you use often.
  4. Build funnels with time windows and proper ordering.
  5. Compute first-touch cohorts and retention.
  6. Sessionization and engagement metrics.
  7. QA checks: duplicates, late data, and reconciliation vs. source counts.

Who this is for

  • Product Analysts and Data Analysts who report product metrics.
  • New analysts learning event-driven analytics.
  • PMs with SQL basics who want reliable self-serve analysis.

Prerequisites

  • Basic SQL (SELECT, WHERE, GROUP BY, JOIN).
  • Comfort with timestamps and date functions.
  • Optional: JSON extraction functions in your warehouse.

Next steps

  • Do the exercise above, then take the Quick Test below.
  • The test is available to everyone. If you log in, your progress will be saved.
  • Apply the patterns to your own product events and publish a short metrics note (assumptions, definitions, and caveats).

Practice Exercises

1 exercise to complete

Instructions

Your warehouse has a single events table with columns: event_id, user_id, event_name, event_time (UTC), session_id, properties (JSON), country.

  1. Compute the first signup per user (event_name='signup').
  2. Find users who activated (event_name='activate') within 7 days after that first signup.
  3. Aggregate by signup week (DATE_TRUNC('week', signup_time)).
  4. Return: week_start, signups, activations_within_7d, activation_rate_pct.

Data quality rules
  • One signup per user: use MIN(event_time) for 'signup'.
  • Activation must be strictly after signup and < 7 days later.
  • Use consistent time zone (assume UTC here).
Expected Output
A weekly time series with columns: week_start (date), signups (int), activations_within_7d (int), activation_rate_pct (numeric, e.g., 42.35). Weeks with zero signups may be absent or show signups=0 depending on your calendar table.

Product Event Tables Understanding — Quick Test

Test your knowledge with 6 questions. Pass with 70% or higher.
