Why this matters
As a Product Analyst, you often need to deliver daily or weekly dashboards: signups, activation, conversion, retention, revenue by channel. Automating these with pandas saves hours, reduces manual errors, and makes results reproducible. Typical automations include:
- Daily metrics rollups (DAU, events, conversion).
- Weekly channel performance and CAC proxy tracking.
- Cohort summaries and retention tables for stakeholders.
- Automated anomaly flags to catch data issues early.
Concept explained simply
Think of automation as a small factory:
- Inputs: raw files or queries (CSV, Parquet, API, SQL extract).
- Transforms: clean, validate, aggregate in pandas.
- Outputs: CSV/Excel files or charts saved to a folder.
- Schedule: run daily/weekly with consistent parameters.
Mental model
Make the pipeline deterministic: the same input produces the same output. Parameterize the date and period, keep functions small, and add checks to fail fast when data looks wrong.
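For instance, a small pure function that takes the date as an explicit parameter and checks its input behaves the same on every rerun. A minimal sketch, assuming the event columns used in the worked examples below (user_id, event, ts):
import pandas as pd

def daily_signups(events: pd.DataFrame, run_date: str) -> int:
    # Deterministic: the date is a parameter, not "whatever today happens to be"
    required = {"user_id", "event", "ts"}
    missing = required - set(events.columns)
    if missing:
        raise ValueError(f"Input is missing columns: {missing}")  # fail fast
    ts = pd.to_datetime(events["ts"], errors="coerce")
    on_day = ts.dt.date == pd.Timestamp(run_date).date()
    return int(events.loc[on_day & (events["event"] == "signup"), "user_id"].nunique())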
A practical workflow
- Define KPIs and columns needed (inputs).
- Write a small loader that picks the correct files (by date or latest).
- Transform with pure pandas (groupby, pivot_table, joins).
- Validate (row counts, unique keys, non-null rates).
- Export tidy outputs with clear filenames that include the date.
- Schedule and log basic run info (start/end, counts); a minimal skeleton of this flow is sketched below.
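To make the workflow concrete, here is a minimal skeleton of that flow. Everything in it is an assumption borrowed from the worked examples below (the date-stamped data/events_DATE.csv layout, the user_id/event/ts columns, the out/ folder); adapt it to your own schema.
import sys
from datetime import date
from pathlib import Path

import pandas as pd

def load(run_date: str) -> pd.DataFrame:
    # Assumed layout: one date-stamped CSV per day with user_id, event, ts columns
    return pd.read_csv(f"data/events_{run_date}.csv")

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    raw = raw.copy()
    raw["ts"] = pd.to_datetime(raw["ts"], errors="coerce")
    raw = raw.dropna(subset=["user_id", "event", "ts"])
    raw["date"] = raw["ts"].dt.date
    return raw.groupby("date").agg(dau=("user_id", "nunique")).reset_index()

def validate(df: pd.DataFrame) -> None:
    assert not df.empty, "No rows left after cleaning"
    assert df["date"].is_unique, "Duplicate dates in the aggregate"

def export(df: pd.DataFrame, run_date: str) -> str:
    Path("out").mkdir(parents=True, exist_ok=True)
    out_path = f"out/daily_metrics_{run_date}.csv"
    df.to_csv(out_path, index=False)
    return out_path

def run(run_date: str) -> None:
    df = transform(load(run_date))
    validate(df)
    print("Wrote", export(df, run_date))

if __name__ == "__main__":
    # Pass a date as YYYY-MM-DD, or default to today
    run(sys.argv[1] if len(sys.argv) > 1 else date.today().isoformat())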
Worked examples
Example 1 — Daily metrics report (DAU, signups, conversion)
Goal: Generate a daily CSV/Excel with DAU, events, signups, and conversion.
import pandas as pd
from pathlib import Path
from datetime import date
# 1) Load latest events file
# Expect columns: user_id, event, ts (timestamp ISO)
files = sorted(Path("data").glob("events_*.csv"))
assert files, "No events files found in data/"
raw = pd.read_csv(files[-1])
# 2) Basic cleanup and typing
raw["ts"] = pd.to_datetime(raw["ts"], errors="coerce")
raw = raw.dropna(subset=["user_id", "event", "ts"])  # drop rows with missing keys or unparseable timestamps
raw["date"] = raw["ts"].dt.date
# 3) Aggregate
metrics = (
    raw.groupby("date").agg(
        dau=("user_id", "nunique"),
        events=("event", "count"),
        signups=("event", lambda s: (s == "signup").sum()),
    ).reset_index()
)
metrics["conv_rate"] = (metrics["signups"] / metrics["dau"]).fillna(0).round(4)
# 4) By-event shape (optional breakdown)
by_event = (
    raw.pivot_table(
        index="date",
        columns="event",
        values="user_id",
        aggfunc="nunique",
        fill_value=0,
    ).reset_index()
)
# 5) Validations
assert metrics["dau"].ge(0).all(), "DAU must be non-negative"
assert metrics["date"].is_unique, "Duplicate date aggregation detected"
# 6) Export
run_date = date.today().isoformat()
Path("out").mkdir(parents=True, exist_ok=True)  # ensure the output folder exists
metrics.to_csv(f"out/daily_metrics_{run_date}.csv", index=False)
by_event.to_csv(f"out/daily_by_event_{run_date}.csv", index=False)
print("Saved daily reports for", run_date)
Result: two files in out/ with today’s date. Idempotent: re-running the same day rewrites the same files.
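Since the goal mentions CSV/Excel, you can also bundle both tables into one workbook. A sketch that continues the script above, assuming an Excel engine such as openpyxl is installed:
# Optional: one Excel workbook with both tables as separate sheets
with pd.ExcelWriter(f"out/daily_report_{run_date}.xlsx") as writer:
    metrics.to_excel(writer, sheet_name="daily_metrics", index=False)
    by_event.to_excel(writer, sheet_name="by_event", index=False)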
Example 2 — Weekly channel performance
Goal: Combine sessions with campaign UTM tags to summarize weekly users and signup rate by channel.
import pandas as pd
from pathlib import Path
# Load (examples; adapt paths)
sessions = pd.read_csv("data/sessions.csv") # user_id, started_at, utm_source
users = pd.read_csv("data/users.csv") # user_id, signup_at
# Types
sessions["started_at"] = pd.to_datetime(sessions["started_at"], errors="coerce")
users["signup_at"] = pd.to_datetime(users["signup_at"], errors="coerce")
# Join to know who signed up
df = sessions.merge(users[["user_id", "signup_at"]], on="user_id", how="left")
df["week"] = df["started_at"].dt.to_period("W-SUN").dt.start_time
# Weekly by channel: count each user once, and count a user as a signup only if
# they have a signup_at, so the rate stays within [0, 1]
df["signup_user"] = df["user_id"].where(df["signup_at"].notna())
weekly = (
    df.groupby(["week", "utm_source"]).agg(
        unique_users=("user_id", "nunique"),
        signups=("signup_user", "nunique"),
    ).reset_index()
)
weekly["signup_rate"] = (weekly["signups"] / weekly["unique_users"]).fillna(0).round(4)
# Basic checks
assert set(["week", "utm_source", "unique_users", "signups", "signup_rate"]).issubset(weekly.columns)
Path("out").mkdir(parents=True, exist_ok=True)  # ensure the output folder exists
weekly.to_csv("out/weekly_channel.csv", index=False)
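Stakeholders often prefer one column per channel. The tidy table above can be reshaped into a wide week-by-channel grid; a small sketch continuing the script above (the output filename is arbitrary):
# Optional wide view: one row per week, one column per channel
wide_rate = weekly.pivot(index="week", columns="utm_source", values="signup_rate").fillna(0)
wide_rate.to_csv("out/weekly_signup_rate_wide.csv")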
Example 3 — Simple anomaly flags (day-over-day)
Goal: Flag suspicious drops or spikes in DAU beyond a threshold.
import pandas as pd
metrics = pd.read_csv("out/daily_metrics_2025-01-01.csv") # or the latest produced
metrics["date"] = pd.to_datetime(metrics["date"])
metrics = metrics.sort_values("date")
metrics["dau_dod_pct"] = metrics["dau"].pct_change().round(4)
threshold = 0.3 # 30% change
metrics["flag_anomaly"] = metrics["dau_dod_pct"].abs().gt(threshold)
metrics.to_csv("out/daily_metrics_with_flags.csv", index=False)
This adds a boolean column flagging days to review.
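Day-over-day changes are also noisy around weekends and holidays. A common alternative, sketched below as a continuation of the script above, compares each day to its trailing 7-day average; the window and threshold are arbitrary choices, not rules:
# Alternative flag: compare each day to its trailing 7-day average (excluding the day itself)
baseline = metrics["dau"].rolling(window=7, min_periods=3).mean().shift(1)
metrics["dau_vs_7d_pct"] = ((metrics["dau"] - baseline) / baseline).round(4)
metrics["flag_vs_baseline"] = metrics["dau_vs_7d_pct"].abs().gt(threshold)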
Reusable snippets
- Load the latest file in a folder: use Path.glob and pick the last file after sorting (see the helper sketch below).
- Parameterize dates: accept a date string or default to today/yesterday.
- Consistent exports: include date in filenames and keep tidy column names.
- Validations: assert expected columns exist; assert unique keys; check row counts.
from datetime import date, timedelta

def default_run_date():
    return (date.today() - timedelta(days=1)).isoformat()
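And for the "load latest file" snippet, a minimal sketch that assumes date-stamped filenames which sort chronologically when sorted lexicographically:
from pathlib import Path

def latest_file(folder: str = "data", pattern: str = "events_*.csv") -> Path:
    files = sorted(Path(folder).glob(pattern))
    if not files:
        raise FileNotFoundError(f"No files matching {pattern} in {folder}/")
    return files[-1]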
Quality checks that save you
- Schema check: expected_cols ⊆ df.columns; if missing, stop.
- Key uniqueness: per-day aggregates should have unique dates.
- Range checks: metrics not negative; rates in [0,1].
- Row-volume sanity: a sudden 90% drop is more likely a data issue than real behavior.
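A small helper that bundles these checks might look like the sketch below; the column names (date, dau, conv_rate) follow Example 1, and the minimum row count is a placeholder:
import pandas as pd

def validate_metrics(df: pd.DataFrame, expected_cols: set, min_rows: int = 1) -> None:
    missing = expected_cols - set(df.columns)
    assert not missing, f"Missing columns: {missing}"                  # schema check
    assert df["date"].is_unique, "Duplicate dates"                     # key uniqueness
    assert df["dau"].ge(0).all(), "Negative DAU"                       # range check
    assert df["conv_rate"].between(0, 1).all(), "Rate outside [0, 1]"  # range check
    assert len(df) >= min_rows, "Suspiciously few rows"                # row-volume sanity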
Scheduling options (lightweight)
- Run the script daily/weekly with your OS scheduler (cron on Linux/macOS, Task Scheduler on Windows).
- Log start/end time and written filenames to a simple text log.
- Keep scripts idempotent: same input, same output filenames.
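For the text log, a minimal sketch that appends one line per run; the log path, format, and the example filename are arbitrary:
from datetime import datetime

def log_run(logfile: str, started: datetime, written_files: list) -> None:
    finished = datetime.now()
    line = f"{started.isoformat()} -> {finished.isoformat()} wrote: {', '.join(written_files)}\n"
    with open(logfile, "a") as f:
        f.write(line)

# Usage: capture started = datetime.now() before the pipeline runs, then
# log_run("out/run_log.txt", started, ["daily_metrics_2025-01-01.csv"])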
Exercises
Complete these tasks to practice. The same tasks appear below in the Exercises section with expected outputs and solutions.
- Exercise 1: Build a daily metrics automation that produces two CSVs: overall daily metrics and by-event breakdown, with clear date-stamped filenames.
- Exercise 2: Turn your code into a reusable function with input validations and a date parameter (default to yesterday). Write one CSV named using the provided date.
- Checklist:
  - Uses pandas groupby or pivot_table.
  - Includes at least two validation checks.
  - Outputs have deterministic, date-stamped filenames.
  - Script is re-runnable without manual edits.
Common mistakes and self-check
- Mixing local time and UTC: parse and convert timestamps once, consistently (see the sketch after this list).
- Appending day after day to one file without dedupe: prefer one file per date or a deduplicated append.
- Silent schema drift: always assert expected columns.
- Non-idempotent filenames like report_final.csv: include the run date instead.
- Forgetting null handling before groupby: drop or fill sensibly.
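For the first point, one consistent approach is to parse everything as UTC once and convert to a single reporting timezone before taking calendar dates. A minimal sketch (the timezone and sample values are only illustrations):
import pandas as pd

raw = pd.DataFrame({"ts": ["2025-01-01T23:30:00Z", "2025-01-02T00:15:00+01:00"]})
raw["ts"] = pd.to_datetime(raw["ts"], errors="coerce", utc=True)   # everything becomes UTC
raw["date"] = raw["ts"].dt.tz_convert("Europe/Berlin").dt.date     # one reporting timezone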
Self-check: If you run your script twice with the same input, do you get identical outputs and counts? If not, fix parameters or filenames.
Practical projects
- Automated daily activation report (signup to first key action within 7 days), exported to CSV.
- Weekly channel dashboard with CAC proxy (spend CSV + attributed signups) and rates.
- Retention snapshot: cohort by signup month, week 1/2/4 retention table with basic checks.
Mini challenge
Create a single command that, given a date (YYYY-MM-DD), produces three outputs: daily_metrics_DATE.csv, daily_by_event_DATE.csv, and daily_metrics_with_flags_DATE.csv. Include at least three validations. Keep it idempotent.
Hint path
- Write a function run(date_str) and call it under if __name__ == "__main__".
- Separate load, transform, validate, and export into small functions.
- Use try/except to print a friendly message and exit non-zero on validation failure.
Who this is for
- Product Analysts who need reliable, repeatable reporting.
- Data-savvy PMs or analysts transitioning from spreadsheets to code.
Prerequisites
- Comfort with Python basics (functions, files, dates).
- Pandas essentials (DataFrame, groupby, merge, pivot_table).
- Basic understanding of your product’s events and KPIs.
Learning path
- Parameterize and validate: make scripts deterministic.
- Aggregate and reshape: groupby and pivot_table fluency.
- Export and schedule: produce tidy outputs on a cadence.
- Add anomaly flags and basic run logs.
Next steps
- Add more validations (duplicate user_id-date checks, min/max thresholds).
- Bundle scripts into a small CLI (argparse) for teammates to use.
- Later, consider orchestration tools once your logic stabilizes.