How to learn Funnel And Retention Computation for Python pandas in Product Analyst for free

Why this matters

As a Product Analyst, you will routinely answer questions like: Where do users drop off in the onboarding funnel? Which acquisition cohort retains better on Day 1 and Day 7? With pandas, you can compute funnels and retention quickly from raw event logs to support product decisions, prioritization, and experiment readouts.

Pinpoint friction in onboarding or checkout.
Compare retention across cohorts, features, or experiments.
Track progress after UI or policy changes.

Who this is for and prerequisites

Who this is for: Product Analysts and data-minded PMs who need reliable funnel and retention metrics from event logs.

Prerequisites:

Basic Python and pandas (DataFrame, groupby, pivot_table).
Date/time handling in pandas.
Comfort with unique user counting and event logs.

Learning path

Clean and normalize events (timestamps, user_id, event names).
Build strict sequential funnels (per-user first timestamps per step).
Compute cohort-based retention (D0/D1/D7… matrices).
Add rolling retention and session windows.
Segment by channel, country, or experiment variant.

Concept explained simply

Funnel: A sequence of steps users should complete (e.g., visit → signup → view → add_to_cart → purchase). We count how many users reach each step in order, and compute step-to-step conversion rates.

Retention: For each cohort (usually by first_seen or signup date), we measure what fraction of users return and do any activity on subsequent days/weeks (D0, D1, D7, etc.).

Mental model

Think of a funnel as a set of doors in a hallway. A user can proceed only if they pass each prior door in order.
Think of retention as how many users come back to the building after 1, 7, 30 days from their first visit.

Worked examples

Example 1: Strict sequential funnel counts and conversion

Events schema assumption: user_id, event, ts (UTC).

import pandas as pd
from datetime import datetime

data = [
    ("u1","landing","2024-01-01 09:00"),
    ("u1","signup","2024-01-01 09:01"),
    ("u1","view_product","2024-01-01 09:05"),
    ("u1","add_to_cart","2024-01-01 09:06"),
    ("u1","purchase","2024-01-01 09:10"),
    ("u2","landing","2024-01-01 09:02"),
    ("u2","view_product","2024-01-01 09:06"),
    ("u2","add_to_cart","2024-01-01 09:09"),
    ("u3","landing","2024-01-01 10:10"),
    ("u3","signup","2024-01-01 10:12"),
    ("u3","view_product","2024-01-01 10:20"),
    ("u4","landing","2024-01-01 11:00"),
    ("u4","signup","2024-01-01 11:02"),
    ("u4","view_product","2024-01-01 11:03"),
    ("u4","add_to_cart","2024-01-01 11:06"),
    ("u5","landing","2024-01-01 12:00"),
    ("u5","view_product","2024-01-01 12:04"),
    ("u5","purchase","2024-01-01 12:06"),  # skips signup and add_to_cart
    ("u6","landing","2024-01-01 12:10"),
]

df = pd.DataFrame(data, columns=["user_id","event","ts"])
df["ts"] = pd.to_datetime(df["ts"])

steps = ["landing","signup","view_product","add_to_cart","purchase"]
# Keep first occurrence per user per event
first_hits = (df.sort_values(["user_id","event","ts"]) 
               .drop_duplicates(["user_id","event"], keep="first"))
# Pivot to wide: one column per step with first timestamp
wide = first_hits.pivot_table(index="user_id", columns="event", values="ts", aggfunc="min")[steps]

# Build sequential masks: user counts for each step require previous steps in order
mask = wide.notna().copy()
for i in range(1, len(steps)):
    prev, cur = steps[i-1], steps[i]
    mask[cur] = mask[cur] & mask[prev] & (wide[cur] >= wide[prev])

counts = {s: int(mask[s].sum()) for s in steps}

# Step-to-step conversion
conv = {}
for i in range(len(steps)-1):
    a, b = steps[i], steps[i+1]
    conv[f"{a}→{b}"] = (counts[b] / counts[a]) if counts[a] else 0.0

overall = counts[steps[-1]] / counts[steps[0]]

out = pd.DataFrame({
    "step": steps,
    "users": [counts[s] for s in steps]
})
print(out)
print(pd.Series(conv))
print({"overall_conversion": overall})

Key points:

We count each user once per step (first timestamp).
Sequential constraint ensures nobody “jumps” steps.

What to watch for

Timezone normalization before grouping.
Consistent event naming (e.g., add_to_cart vs add).

Example 2: Cohort retention matrix (D0/D1/D2)

import pandas as pd

events = [
    ("u1","signup","2024-01-01"),
    ("u1","open","2024-01-01"),
    ("u1","open","2024-01-02"),
    ("u1","open","2024-01-03"),
    ("u2","signup","2024-01-01"),
    ("u2","open","2024-01-01"),
    ("u2","open","2024-01-03"),
    ("u3","signup","2024-01-02"),
    ("u3","open","2024-01-02"),
    ("u3","open","2024-01-03"),
    ("u4","signup","2024-01-02"),
    ("u4","open","2024-01-02"),
]

df = pd.DataFrame(events, columns=["user_id","event","date"])
df["date"] = pd.to_datetime(df["date"]).dt.normalize()

first_seen = (df[df.event=="signup"]
              .sort_values(["user_id","date"]) 
              .drop_duplicates("user_id")
              .rename(columns={"date":"cohort_date"})
              .loc[:, ["user_id","cohort_date"]])

activity = df.merge(first_seen, on="user_id", how="left")
activity["age_days"] = (activity["date"] - activity["cohort_date"]).dt.days

# Keep activity events only (including D0) for retention
act = activity[activity["event"] != "signup"].copy()

ret_counts = (act.groupby(["cohort_date","age_days"]).user_id.nunique()
                .unstack("age_days", fill_value=0))

# Convert to rates by dividing by cohort size
cohort_size = first_seen.groupby("cohort_date").user_id.nunique()
ret_rate = ret_counts.div(cohort_size, axis=0).round(3)

print(ret_counts[[0,1,2]])
print(ret_rate[[0,1,2]])

Tip: D0 is activity on the signup day. Many teams report both counts and rates.

Example 3: 7-day rolling retention

Definition used here: user is retained on Day 7 if they had any activity between day 1 and day 7 after first_seen (inclusive).

import pandas as pd

df = pd.DataFrame({
    "user_id": ["u1","u1","u2","u2","u3"],
    "event":   ["signup","open","signup","open","signup"],
    "date":    pd.to_datetime(["2024-01-01","2024-01-04","2024-01-01","2024-01-10","2024-01-02"]).normalize()
})

first_seen = (df[df.event=="signup"].sort_values(["user_id","date"]) 
                                .drop_duplicates("user_id")
                                .rename(columns={"date":"cohort_date"})
                                [["user_id","cohort_date"]])

act = df.merge(first_seen, on="user_id", how="left")
act["age_days"] = (act["date"] - act["cohort_date"]).dt.days

# Retained within (1..7) days after first_seen
retained = (act[(act.event!="signup") & (act.age_days.between(1,7))]
              .user_id.drop_duplicates())

cohort_users = first_seen.user_id.drop_duplicates()
r7_rate = round(len(set(retained)) / len(set(cohort_users)), 3)
print({"R7_rolling": r7_rate})

Exercises (practice inside your notebook)

These mirror the exercises listed below. Aim to write clean, reusable code.

Sequential funnel. Build a strict funnel counts table and step conversions.
Retention matrix. Produce D0/D1/D2 counts and rates for two cohorts.

Checklist before you finish:
- Events normalized to UTC dates.
- First occurrence per user per step used.
- Sequential constraint enforced for funnels.
- Retention computed from signup cohorts with D0 included.
- nunique used to avoid double-counting users per cell.

Common mistakes and how to self-check

Counting events instead of users. Fix: use nunique(user_id) or drop_duplicates([user_id, step]).
Ignoring sequence. Fix: compare first timestamps per step and enforce nondecreasing order.
Mixed timezones or dates. Fix: normalize to UTC and to midnight where appropriate.
Missing D0 in retention. Fix: include activity on signup day for D0.
Denominator drift. Fix: in step conversion, denominator is the previous step count under sequential rules.

Self-check prompts

If you shuffle events randomly, do your counts change? They should not.
If a user performs purchase without prior steps, do they appear in earlier steps? They should not.
Do cohort sizes match the number of unique signups per cohort_date?

Practical projects

Onboarding funnel review: build weekly funnel for visit → signup → complete_profile → first_action; ship a one-page summary with insights and proposed fixes.
Retention deep-dive: compute D0/D1/D7 retention by acquisition channel; identify two hypotheses to improve D1.
Experiment readout template: parameterize your funnel and retention functions to compare control vs variant.

Quick Test

Anyone can take the Quick Test for free. If you are logged in, your score and progress will be saved.

Next steps

Parameterize steps and windows so the same code works for multiple products.
Add segmentation (country, device, campaign) to funnels and retention.
Automate weekly reporting and anomaly watch (e.g., sudden funnel step drops).

Mini challenge

You have events for a “start_trial → add_payment → convert_paid” funnel. Build a function that accepts an ordered list of steps and returns a DataFrame with users per step and step-to-step conversion. Then extend it to filter by acquisition_channel and compare step conversions across channels. Keep the same sequential rule: users must pass each prior step in order.

Instructions

Create the DataFrame below, enforce the sequential order visit → signup → view → add → purchase, and output users per step and step-to-step conversion rates.

import pandas as pd

events = [
    ("u1","2024-01-01 09:00","visit"),
    ("u1","2024-01-01 09:01","signup"),
    ("u1","2024-01-01 09:05","view"),
    ("u1","2024-01-01 09:06","add"),
    ("u1","2024-01-01 09:10","purchase"),
    ("u2","2024-01-01 09:02","visit"),
    ("u2","2024-01-01 09:06","view"),
    ("u2","2024-01-01 09:09","add"),
    ("u3","2024-01-01 10:10","visit"),
    ("u3","2024-01-01 10:12","signup"),
    ("u3","2024-01-01 10:20","view"),
    ("u4","2024-01-01 11:00","visit"),
    ("u4","2024-01-01 11:02","signup"),
    ("u4","2024-01-01 11:03","view"),
    ("u4","2024-01-01 11:06","add"),
    ("u5","2024-01-01 12:00","visit"),
    ("u5","2024-01-01 12:04","view"),
    ("u5","2024-01-01 12:06","purchase"),
    ("u6","2024-01-01 12:10","visit"),
]

df = pd.DataFrame(events, columns=["user_id","ts","event"])

Output a small summary DataFrame and a dictionary of conversion rates.

Menu

Funnel And Retention Computation

Table of Contents

Why this matters

Who this is for and prerequisites

Learning path

Concept explained simply

Worked examples

Example 1: Strict sequential funnel counts and conversion

Example 2: Cohort retention matrix (D0/D1/D2)

Example 3: 7-day rolling retention

Exercises (practice inside your notebook)

Common mistakes and how to self-check

Practical projects

Quick Test

Next steps

Mini challenge

Practice Exercises

Build a strict sequential funnel and compute conversions

Instructions

Expected Output

Compute a D0/D1/D2 retention matrix

Funnel And Retention Computation — Quick Test

Have questions about Funnel And Retention Computation?

AI Assistant