Why this matters
Product Analysts constantly merge data from different levels: users (who someone is) and sessions (what happened during a visit). Clean joins prevent inflated metrics, wrong funnels, and misleading attribution.
- Attribution: add session_source to events to see which channels drive activation.
- Conversion rates: join user properties (plan, country) to sessions to segment results.
- Retention: connect events to the right session and user to measure meaningful cohorts.
Who this is for
- Product Analysts who work with web/app analytics and event data.
- Anyone writing SQL over user, session, and event tables.
Prerequisites
- Comfort with SELECT, WHERE, GROUP BY, JOIN basics.
- Know primary keys like user_id and session_id in your dataset.
- Understand timestamps and simple window functions (ROW_NUMBER).
Concept explained simply
User-level tables describe people (one row per user). Session-level tables describe visits (many sessions per user). Event-level tables describe actions (many events per session).
- User-level join: join by user_id. Safe when bringing user attributes to sessions or events.
- Session-level join: join by session_id. Use this to attach session attributes (source, device) to events.
- Time-aware join: when an attribute changes over time (plan), choose the version valid at the event timestamp.
Mental model
Think in levels and directions:
- Top-down: start at the lowest granularity you need for the metric. For conversion rate by session_source, start at sessions; for button CTR, start at events.
- Bring attributes down, not aggregates up. Join descriptive columns to the level you analyze, then aggregate at that level. If you must bring aggregates, pre-aggregate first.
- Protect counts. After joins, count distinct keys of the level you care about.
Worked examples
Example 1 — Add user attributes to sessions (user-level to session-level)
Goal: Add user country to each session.
Tables
users(user_id, signup_dt, country)
sessions(session_id, user_id, session_start, session_end, session_source)
SELECT
s.session_id,
s.user_id,
s.session_start,
s.session_source,
u.country
FROM sessions s
LEFT JOIN users u
ON s.user_id = u.user_id;
Notes: LEFT JOIN keeps all sessions, even if a user record is missing. Safe because it is many sessions to one user row.
Example 2 — Attach session_source to events (session-level to event-level)
Goal: See which session_source generated events. Avoid inflating session counts.
Tables
events(event_id, session_id, user_id, event_name, ts)
sessions(session_id, user_id, session_source)
-- Correctly attribute events while protecting session counts later
WITH events_with_src AS (
SELECT e.*, s.session_source
FROM events e
LEFT JOIN sessions s
ON e.session_id = s.session_id
)
SELECT session_source, COUNT(*) AS events
FROM events_with_src
GROUP BY session_source;
-- If you later need sessions by source AFTER joining events, protect with DISTINCT
SELECT session_source, COUNT(DISTINCT session_id) AS sessions
FROM events_with_src
GROUP BY session_source;
Key idea: joining events to sessions multiplies rows (one session to many events). Use COUNT(DISTINCT session_id) when counting sessions after such joins.
Example 3 — Time-aware join: plan at the moment of the event
Goal: For each event, attach the user plan that was active at event time.
Tables
plans(user_id, plan, valid_from) -- changes over time
events(event_id, user_id, ts)
WITH plans_versioned AS (
SELECT
p.user_id,
p.plan,
p.valid_from,
LEAD(p.valid_from) OVER (PARTITION BY p.user_id ORDER BY p.valid_from) AS valid_to
FROM plans p
),
plan_at_event AS (
SELECT
e.event_id,
e.user_id,
e.ts,
pv.plan
FROM events e
LEFT JOIN plans_versioned pv
ON e.user_id = pv.user_id
AND e.ts >= pv.valid_from
AND (pv.valid_to IS NULL OR e.ts < pv.valid_to)
)
SELECT * FROM plan_at_event;
Alternative: use ROW_NUMBER with valid_from <= ts and keep the latest row at or before ts.
Common mistakes and self-checks
- Counting sessions after joining to events without DISTINCT. Self-check: compare counts before and after join; they should match when using COUNT(DISTINCT session_id).
- Joining events to users directly and assuming session attributes are preserved. Self-check: to bring session attributes, join through session_id.
- Ignoring time validity for changing attributes (plan, pricing). Self-check: verify that the chosen plan existed at event timestamp.
- Using INNER JOIN and losing data. Self-check: for coverage metrics, use LEFT JOIN from the base level (sessions or users) to optional data (events).
- Duplicating user rows by joining to multi-row user attributes without pre-aggregation. Self-check: pre-aggregate or window to one row per user before joining.
Learning path
- Review your data model: identify primary keys and grain for users, sessions, events.
- Practice safe joins: user-to-session, session-to-event, and time-aware joins.
- Protect metrics: use DISTINCT and pre-aggregations where needed.
- Validate with spot-checks: compare pre/post join counts.
Practical projects
- Build a session-to-signup funnel with session_source attribution.
- Create an events dashboard where each event includes user country and current plan at event time.
- Compute conversion rate by device and plan, ensuring correct grains.
Exercises
Complete these in your SQL environment. A Quick Test is at the end; test is available to everyone. Only logged-in users get saved progress.
- Exercise 1: Add user country to sessions and compute sessions by country.
- Exercise 2: Attribute purchases to session_source without inflating session counts.
Checklist before you run your queries
- Did you choose the correct base grain (users, sessions, or events)?
- Are you using LEFT JOIN from the base table to optional tables?
- If counting sessions after joining to events, are you using COUNT(DISTINCT session_id)?
- If attributes change over time, did you ensure time-valid selection?
Mini challenge
Create a query that returns, for each session, whether the user made their first-ever purchase in that session and what the session_source was. Hint: identify the user’s first purchase event with ROW_NUMBER over user_id by timestamp, then compare its session_id to each session.
Next steps
- Apply these joins to your product metrics and validate results against trusted dashboards.
- Share a snippet with teammates and ask for a grain and counts review.
- Proceed to more advanced topics: windowed attribution, cohort retention joins.