Why this matters
User identity resolution stitches events from the same person across devices, browsers, and sessions. As a Product Analyst, you need trustworthy funnels, retention, and attribution. Without correct identity, you will miscount users, overstate drop-offs, and miss cohorts (e.g., a user browses on web and purchases in app).
Real tasks you will face:
- Designing login/logout tracking so anonymous and known identities merge correctly.
- Auditing a drop in conversion that is actually caused by duplicate user counting.
- Joining CRM users (email) to product analytics (device IDs) for lifecycle and LTV analysis.
Who this is for
Product Analysts and adjacent roles (Product Managers, Data Engineers) who work with event data and need accurate user-level insights.
Prerequisites
- Basic event tracking concepts (events, properties, user properties).
- Familiarity with login flows and session concepts.
- Beginner SQL (SELECT, JOIN, GROUP BY).
Concept explained simply
Identity resolution is the process of saying: these records belong to the same person. We do this by connecting identifiers.
- Deterministic: exact matches (same user_id, same email hash). Highest accuracy.
- Probabilistic: likely matches (same IP/device fingerprint/time overlap). Use carefully; prefer deterministic for product analytics.
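A deterministic match comes down to exact equality on a shared key. The minimal Python sketch below assumes one normalization rule (trim whitespace, then lowercase) applied before SHA-256 hashing. Your tracking plan may normalize differently, and both sources must use the same rule or nothing will match. The function names are hypothetical.

```python
import hashlib
from typing import Dict, Set


def email_hash(email: str) -> str:
    """Trim and lowercase an email (an assumed normalization rule), then SHA-256 it."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deterministic_matches(crm_emails: Dict[str, str], analytics_hashes: Set[str]) -> Dict[str, str]:
    """Return {email_hash: user_id} for CRM users whose hash appears exactly in analytics."""
    matches: Dict[str, str] = {}
    for user_id, email in crm_emails.items():
        h = email_hash(email)
        if h in analytics_hashes:  # exact equality only; no similarity scoring
            matches[h] = user_id
    return matches


crm = {"U42": "Ana@Example.com", "U43": "bo@example.com"}
analytics_hashes = {email_hash(" ana@example.com ")}
print(list(deterministic_matches(crm, analytics_hashes).values()))  # ['U42']
```

A probabilistic approach would replace the `in` check with a similarity score and a threshold, which is exactly where false merges creep in.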
Jargon decoder
- anonymous_id: Temporary ID for a device/browser before login.
- device_id: Stable ID per device or install (e.g., mobile app installation ID).
- user_id: Stable ID for a logged-in account (comes from your auth system).
- identify / alias / setUser: Vendor-specific calls to attach traits and link anonymous to known identity.
- reset / signOut: Clears local identity context so the next user doesn’t inherit someone else’s data.
Mental model
Imagine an identity graph. Each node is an identifier (anonymous_id, device_id, user_id, email hash). Edges form when events show two identifiers used by the same person in a trusted moment (e.g., login). The graph lets you traverse from any identifier to a canonical user.
Practical rule: A login event with both anonymous_id (or device_id) and user_id is a trusted “merge moment”.
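The graph can be sketched in Python as an adjacency map with a breadth-first search from any identifier. The class and method names are hypothetical. One design choice here is an assumption, not a universal rule: the search never walks *through* a user_id node, so a device shared by two accounts does not fuse those accounts into one.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Set, Tuple

# A node is (identifier_type, value), e.g. ("anonymous_id", "A123").
Node = Tuple[str, str]


class IdentityGraph:
    """Undirected graph: an edge means two identifiers were seen together at a trusted moment."""

    def __init__(self) -> None:
        self.adj: DefaultDict[Node, Set[Node]] = defaultdict(set)

    def add_edge(self, a: Node, b: Node) -> None:
        self.adj[a].add(b)
        self.adj[b].add(a)

    def user_ids_for(self, start: Node) -> Set[str]:
        """Breadth-first search from any identifier; return every user_id reached.

        The search never expands through a user_id node, so one shared device
        linked to two accounts does not fuse those accounts together.
        """
        if start[0] == "user_id":
            return {start[1]}
        seen = {start}
        queue = deque([start])
        found: Set[str] = set()
        while queue:
            node = queue.popleft()
            if node[0] == "user_id":
                found.add(node[1])
                continue  # stop: do not hop from one account to another
            for nxt in self.adj.get(node, ()):  # .get avoids creating empty entries
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return found


g = IdentityGraph()
g.add_edge(("anonymous_id", "A123"), ("user_id", "U777"))  # web login
g.add_edge(("device_id", "D999"), ("user_id", "U777"))     # app login
g.add_edge(("device_id", "T1"), ("user_id", "U1"))         # shared tablet, account 1
g.add_edge(("device_id", "T1"), ("user_id", "U2"))         # shared tablet, account 2
print(g.user_ids_for(("anonymous_id", "A123")))  # {'U777'}
print(g.user_ids_for(("device_id", "D555")))     # set()
```

When more than one user_id comes back (the tablet T1 case), the identifier alone cannot decide; resolve each event using edge timestamps instead of merging the accounts.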
Instrumentation essentials
- On first visit/open: generate anonymous_id (web) or device_id (mobile). Track events with it.
- On successful login/sign-up:
  - Send an identify/alias event that links the current anonymous_id/device_id to user_id.
  - Future events use user_id as the primary identifier; keep device_id too.
- On logout/switch account:
  - Send a signOut/reset event and clear local identity so a new anonymous_id is created.
- On reinstall or cookie clear: expect a new anonymous_id/device_id and merge again upon next login.
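The lifecycle above can be sketched as a small client-side state machine. `IdentityClient`, its methods, and the `sign_out` event name are hypothetical and not any vendor's SDK; real SDKs expose their own identify/alias/reset calls. The sketch only shows which IDs each event carries at each step.

```python
import uuid
from typing import Dict, List, Optional


class IdentityClient:
    """Hypothetical client-side identity state; not any vendor's SDK."""

    def __init__(self) -> None:
        self.anonymous_id = str(uuid.uuid4())  # generated on first visit/open
        self.user_id: Optional[str] = None
        self.outbox: List[Dict[str, Optional[str]]] = []  # events waiting to be sent

    def track(self, event_name: str) -> None:
        self.outbox.append({
            "event_name": event_name,
            "anonymous_id": self.anonymous_id,
            "user_id": self.user_id,
        })

    def login(self, user_id: str) -> None:
        """Merge moment: one event carries both the pre-login anonymous_id and the user_id."""
        self.user_id = user_id
        self.track("login")

    def logout(self) -> None:
        """Emit a sign-out event, then forget the user and start a fresh anonymous_id."""
        self.track("sign_out")
        self.user_id = None
        self.anonymous_id = str(uuid.uuid4())


client = IdentityClient()
client.track("page_view")
client.login("U777")
client.track("purchase")
client.logout()
client.track("page_view")  # next person on this browser starts anonymous again
for event in client.outbox:
    print(event)
```

Rotating the anonymous_id on logout is what keeps the next person on a shared browser from inheriting U777's history.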
Implementation notes to avoid pain later
- Always include timestamps, user_id (if known), device_id/anonymous_id, and event_id (a unique UUID) in each event.
- Emit a dedicated login event that includes both user_id and the pre-login anonymous_id/device_id to create a deterministic edge.
- Keep a user_traits object (email hash, signup date, plan) updated via identify calls.
- Do not merge different user_ids unless your business rules clearly allow account linking.
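A minimal Python sketch of an event envelope that follows these notes. The field layout, the `properties` dictionary, and the `method` property are assumptions for illustration. `login_edge` shows how downstream code could recognize a merge moment.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple


def build_event(event_name: str, anonymous_id: str, user_id: Optional[str] = None,
                device_id: Optional[str] = None, **properties: Any) -> Dict[str, Any]:
    """Assemble one event carrying the identity fields listed above."""
    return {
        "event_id": str(uuid.uuid4()),  # create once per event; resend the same value on retry
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_name": event_name,
        "anonymous_id": anonymous_id,
        "device_id": device_id,
        "user_id": user_id,  # None until login
        "properties": properties,
    }


def login_edge(event: Dict[str, Any]) -> Optional[Tuple[str, str]]:
    """Return (anonymous_id, user_id) if this event is a merge moment, else None."""
    if event["event_name"] == "login" and event["user_id"] and event["anonymous_id"]:
        return (event["anonymous_id"], event["user_id"])
    return None


login = build_event("login", anonymous_id="A123", user_id="U777", method="password")
print(login_edge(login))  # ('A123', 'U777')
```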
Worked examples
Example 1: Web anon to logged-in merge
- User visits site (no login). Events carry anonymous_id=A123.
- User signs up and is assigned user_id=U777.
- Login event includes anonymous_id=A123 and user_id=U777.
- Identify call links A123 to U777. Future events use user_id=U777 (and still carry device_id/anonymous_id for context).
Result: Pre-login events and post-login events are counted as one user U777.
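To see the miscounting risk in numbers, Example 1 can be replayed as a small Python sketch. The event names `page_view` and `add_to_cart` are invented for illustration. Counting whichever ID each event carries yields two "users"; resolving through the login edge yields one.

```python
# Example 1 as data: two pre-login events, the login (merge) event, one post-login event.
events = [
    {"event_name": "page_view", "anonymous_id": "A123", "user_id": None},
    {"event_name": "add_to_cart", "anonymous_id": "A123", "user_id": None},
    {"event_name": "login", "anonymous_id": "A123", "user_id": "U777"},
    {"event_name": "purchase", "anonymous_id": "A123", "user_id": "U777"},
]

# Naive counting: user_id if present, otherwise anonymous_id.
naive_ids = {e["user_id"] or e["anonymous_id"] for e in events}

# Edges come from events that carry both IDs (here, the login).
edges = {e["anonymous_id"]: e["user_id"] for e in events if e["user_id"]}
resolved_ids = {e["user_id"] or edges.get(e["anonymous_id"], e["anonymous_id"]) for e in events}

print(len(naive_ids), len(resolved_ids))  # 2 1
```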
Example 2: Mobile app reinstall
- Before reinstall: device_id=D555, user not logged in; events tracked with D555.
- Reinstall: new device_id=D999 appears.
- User logs in with user_id=U777. Login event contains user_id=U777 and device_id=D999.
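- System links D999 to U777. D555's pre-reinstall events attach to U777 only if U777 had logged in on D555 at some earlier point, because that login created a D555-to-U777 edge; otherwise they stay with an unresolved anonymous device.
Result: When that earlier edge exists, two device_ids (D555 and D999) resolve to one user U777.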
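Example 2 hinges on whether a D555 edge already exists. A tiny Python sketch with hypothetical edge dictionaries (device_id to user_id, built from login events) shows both cases side by side.

```python
from typing import Dict, List, Optional


def resolve_devices(edges: Dict[str, str], device_ids: List[str]) -> Dict[str, Optional[str]]:
    """Map each device_id to its user_id via login edges; None means unresolved."""
    return {d: edges.get(d) for d in device_ids}


# Case 1: U777 never logged in on D555 before the reinstall.
edges_without_history = {"D999": "U777"}
# Case 2: an earlier login on D555 left a D555-to-U777 edge.
edges_with_history = {"D555": "U777", "D999": "U777"}

print(resolve_devices(edges_without_history, ["D555", "D999"]))  # {'D555': None, 'D999': 'U777'}
print(resolve_devices(edges_with_history, ["D555", "D999"]))     # {'D555': 'U777', 'D999': 'U777'}
```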
Example 3: CRM import to analytics
- CRM provides user_id=U42, email hash=Eabc, signup_date.
- Analytics has anonymous events, including a purchase event that captured the email and sent it hashed as Eabc.
- Deterministic join on email hash links Eabc to U42.
- A backfill then attaches historical anonymous events linked to Eabc (directly, or through the same anonymous_id as the purchase event) to U42.
Result: Marketing and product events unify for a single user profile.
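A minimal Python sketch of the Example 3 backfill. The event IDs and anonymous_ids (`e1` to `e3`, `A9`, `A5`) are hypothetical. The sketch assumes the backfill is scoped by anonymous_id: once the purchase event ties `A9` to `U42`, every historical event on `A9` is attached.

```python
# CRM rows and analytics events from Example 3 (email hashes computed upstream).
crm_users = [{"user_id": "U42", "email_hash": "Eabc"}]
events = [
    {"event_id": "e1", "anonymous_id": "A9", "email_hash": None, "event_name": "page_view"},
    {"event_id": "e2", "anonymous_id": "A9", "email_hash": "Eabc", "event_name": "purchase"},
    {"event_id": "e3", "anonymous_id": "A5", "email_hash": None, "event_name": "page_view"},
]

# Step 1: deterministic join on email hash gives edges anonymous_id -> CRM user_id.
user_by_hash = {u["email_hash"]: u["user_id"] for u in crm_users}
new_edges = {
    e["anonymous_id"]: user_by_hash[e["email_hash"]]
    for e in events
    if e["email_hash"] in user_by_hash
}

# Step 2: backfill resolved_user_id on every event that shares an edged anonymous_id.
for e in events:
    e["resolved_user_id"] = new_edges.get(e["anonymous_id"])

print([(e["event_id"], e["resolved_user_id"]) for e in events])
# [('e1', 'U42'), ('e2', 'U42'), ('e3', None)]
```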
Data modeling and queries
Keep two structures:
- events table: one row per event with user_id (nullable), device_id/anonymous_id, event_id, event_name, timestamp.
- identity_edges table: each row links an identifier to a user_id at a specific time (e.g., from login, alias, CRM load).
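A simple unification query (standard SQL; your warehouse must support window functions). Joining each identifier type separately keeps exactly one output row per event, even when an event carries both an anonymous_id and a device_id that have edges:
  -- Build the latest mapping of identifier -> user_id
  WITH latest_edges AS (
    SELECT identifier_type,
           identifier_value,
           user_id,
           ROW_NUMBER() OVER (
             PARTITION BY identifier_type, identifier_value
             ORDER BY edge_time DESC
           ) AS rn
    FROM identity_edges
  ),
  resolved_edges AS (
    SELECT identifier_type, identifier_value, user_id
    FROM latest_edges
    WHERE rn = 1
  )
  SELECT e.event_id,
         -- Prefer the event's own user_id, then the anonymous_id edge, then the device_id edge
         COALESCE(e.user_id, anon.user_id, dev.user_id) AS resolved_user_id,
         e.device_id,
         e.anonymous_id,
         e.event_name,
         e.timestamp
  FROM events e
  LEFT JOIN resolved_edges anon
    ON anon.identifier_type = 'anonymous_id'
   AND anon.identifier_value = e.anonymous_id
  LEFT JOIN resolved_edges dev
    ON dev.identifier_type = 'device_id'
   AND dev.identifier_value = e.device_id;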
Then aggregate by resolved_user_id for funnels, retention, and LTV.
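To sanity-check the unification logic before running it on real tables, you can load a few hypothetical rows into an in-memory SQLite database from Python. This assumes the bundled SQLite is 3.25 or newer (needed for `ROW_NUMBER()`), which recent Python releases usually ship. Event `e3` carries both an anonymous_id and a device_id with edges, and the output still has one row per event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_id TEXT, user_id TEXT, device_id TEXT,
                     anonymous_id TEXT, event_name TEXT, timestamp TEXT);
CREATE TABLE identity_edges (identifier_type TEXT, identifier_value TEXT,
                             user_id TEXT, edge_time TEXT);
INSERT INTO events VALUES
  ('e1', NULL,   NULL,   'A123', 'page_view', '2024-05-01T10:00'),
  ('e2', 'U777', NULL,   'A123', 'login',     '2024-05-01T10:05'),
  ('e3', NULL,   'D999', 'A123', 'app_open',  '2024-05-02T09:00'),
  ('e4', NULL,   NULL,   'A555', 'page_view', '2024-05-02T11:00');
INSERT INTO identity_edges VALUES
  ('anonymous_id', 'A123', 'U777', '2024-05-01T10:05'),
  ('device_id',    'D999', 'U777', '2024-05-01T12:00');
""")

QUERY = """
WITH latest_edges AS (
  SELECT identifier_type, identifier_value, user_id,
         ROW_NUMBER() OVER (PARTITION BY identifier_type, identifier_value
                            ORDER BY edge_time DESC) AS rn
  FROM identity_edges
),
resolved_edges AS (SELECT * FROM latest_edges WHERE rn = 1)
SELECT e.event_id,
       COALESCE(e.user_id, anon.user_id, dev.user_id) AS resolved_user_id
FROM events e
LEFT JOIN resolved_edges anon
  ON anon.identifier_type = 'anonymous_id' AND anon.identifier_value = e.anonymous_id
LEFT JOIN resolved_edges dev
  ON dev.identifier_type = 'device_id' AND dev.identifier_value = e.device_id
ORDER BY e.event_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('e1', 'U777'), ('e2', 'U777'), ('e3', 'U777'), ('e4', None)]
```

Event `e4` stays unresolved (NULL) because `A555` never logged in; decide explicitly whether such rows count as separate anonymous visitors in your aggregates.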
Privacy and consent safeguards
- Respect consent: do not link identifiers until the user grants consent where required.
- Hash emails before sending to analytics tools that shouldn’t store PII in plain text.
- Provide a user deletion path: when a user requests deletion, remove edges and events referencing them.
- Avoid probabilistic matching for decision-critical metrics unless it is legally permitted and has been validated.
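A minimal Python sketch of two of these safeguards: consent gating on edge creation and a deletion path. `EdgeStore` is hypothetical, and the default-deny rule (no recorded consent means no link) is an assumption; where consent is actually required depends on your jurisdiction and policy.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (identifier_type, identifier_value, user_id)


class EdgeStore:
    """Hypothetical edge store that enforces consent and supports deletion requests."""

    def __init__(self) -> None:
        self.edges: List[Edge] = []
        self.consent: Dict[str, bool] = {}  # user_id -> consented to identity linking

    def record_consent(self, user_id: str, granted: bool) -> None:
        self.consent[user_id] = granted

    def add_edge(self, identifier_type: str, identifier_value: str, user_id: str) -> bool:
        """Store the edge only when consent is on record; return whether it was stored."""
        if not self.consent.get(user_id, False):
            return False  # default deny: no consent recorded, so do not link
        self.edges.append((identifier_type, identifier_value, user_id))
        return True

    def delete_user(self, user_id: str) -> List[Tuple[str, str]]:
        """Remove the user's edges; return the linked identifiers so their events can be purged."""
        linked = [(t, v) for t, v, uid in self.edges if uid == user_id]
        self.edges = [e for e in self.edges if e[2] != user_id]
        return linked


store = EdgeStore()
store.record_consent("U777", True)
stored = store.add_edge("anonymous_id", "A123", "U777")
blocked = store.add_edge("anonymous_id", "A999", "U888")  # no consent on record
removed = store.delete_user("U777")
print(stored, blocked, removed, store.edges)  # True False [('anonymous_id', 'A123')] []
```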
Common mistakes and how to self-check
- Mistake: Not sending a dedicated login/alias event. Outcome: pre- and post-login events remain split. Fix: add a merge event that includes both IDs.
- Mistake: Forgetting to reset on logout. Outcome: sessions from multiple people on shared devices merge incorrectly. Fix: call reset/signOut and clear cookies/storage.
- Mistake: Over-merging different accounts. Outcome: corrupted user history. Fix: merge only on deterministic keys and clear rules.
- Mistake: Missing event_id. Outcome: duplicate events after retries. Fix: generate a unique event_id once per event and resend the same value on retries so ingestion can deduplicate.
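Two of these mistakes leave a detectable footprint, so a quick audit can be scripted. The helper names below are hypothetical. The first removes retried duplicates by event_id. The second flags identifiers linked to more than one user_id: on an anonymous_id that usually means a missing reset, while on a mobile device_id it may be a genuinely shared device worth inspecting rather than merging.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def dedupe_by_event_id(events: Iterable[dict]) -> List[dict]:
    """Keep the first copy of each event_id; retried sends reuse the same event_id."""
    seen: Set[str] = set()
    unique = []
    for e in events:
        if e["event_id"] not in seen:
            seen.add(e["event_id"])
            unique.append(e)
    return unique


def identifiers_with_many_users(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Flag identifiers linked to more than one user_id (missing reset or over-merge)."""
    users_by_identifier: Dict[str, Set[str]] = defaultdict(set)
    for identifier, user_id in edges:
        users_by_identifier[identifier].add(user_id)
    return {i: u for i, u in users_by_identifier.items() if len(u) > 1}


events = [{"event_id": "e1"}, {"event_id": "e1"}, {"event_id": "e2"}]
edges = [("A123", "U777"), ("A123", "U777"), ("D1", "U1"), ("D1", "U2")]
print(len(dedupe_by_event_id(events)))  # 2
print({k: sorted(v) for k, v in identifiers_with_many_users(edges).items()})  # {'D1': ['U1', 'U2']}
```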
Self-check routine
- Pick a user with a recent purchase. Can you trace all of their events across web and app to one resolved_user_id?
- Log out/in on the same device with two different accounts. Do you see two distinct resolved_user_ids?
- Clear cookies, browse, then log in. Do anonymous events attach to the logged-in user in reporting within 24 hours?
Exercises
Do these to cement the skill.
- Exercise 1: Design an identity map for your product
  List all identifiers you have (anonymous_id, device_id, user_id, email hash), when they appear, and which events create edges. Include login, signup, password reset, and logout.
- Exercise 2: Write a unification query (SQL-like)
  Given example events, write a query that produces one resolved_user_id per event using identity_edges (from login/alias) and fallbacks.
- Exercise 3: Instrumentation plan for login/logout
  Define the exact sequence of events and fields for login, signup, logout, and app reinstall. Include when to call identify/alias and when to reset.
Exercise checklist
- [ ] Each identifier has a clear source and format.
- [ ] Login flow shows the pre-login and post-login IDs in one event.
- [ ] Logout flow resets identifiers.
- [ ] SQL handles duplicates and picks the latest edge.
Practical projects
- Audit an existing product: map current identity behavior, find gaps, and propose fixes.
- Build a backfill job: link historical anonymous events with known users using a trusted key (e.g., email hash).
- Create a QA playbook: steps for engineers and analysts to validate merges across environments.
Learning path
- Before: Event schemas, naming conventions, and client/server tracking basics.
- Now: Identity resolution flows (this page).
- Next: Cohort building, retention analysis, revenue attribution using resolved_user_id.
Mini challenge
Your web analytics shows a 20% drop in conversion after a cookie policy update. Hypothesis: identity breaks between pre-login browsing and checkout. In one paragraph, propose three checks and a quick fix that could restore correct merging without changing business logic.
Next steps
- Implement the login/alias event with both IDs in your staging environment.
- Run the unification query against the last 7 days of events and compare the results to your current dashboards.