
Automation Of Product Reports

Learn Automation Of Product Reports for free with explanations, exercises, and a quick test (for Product Analysts).

Published: December 22, 2025 | Updated: December 22, 2025

Why this matters

As a Product Analyst, you often need to deliver daily or weekly dashboards: signups, activation, conversion, retention, revenue by channel. Automating these with pandas saves hours, reduces manual errors, and makes results reproducible.

  • Daily metrics rollups (DAU, events, conversion).
  • Weekly channel performance and CAC proxy tracking.
  • Cohort summaries and retention tables for stakeholders.
  • Automated anomaly flags to catch data issues early.

Note: Everyone can take the exercises and quick test for free. Only logged-in users have their progress saved.

Concept explained simply

Think of automation as a small factory:

  • Inputs: raw files or queries (CSV, Parquet, API, SQL extract).
  • Transforms: clean, validate, aggregate in pandas.
  • Outputs: CSV/Excel files or charts saved to a folder.
  • Schedule: run daily/weekly with consistent parameters.

Mental model

Make the pipeline deterministic: the same input produces the same output. Parameterize the date and period, keep functions small, and add checks to fail fast when data looks wrong.
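
A minimal sketch of that idea, assuming the event schema used later on this page (user_id, event, ts):

import pandas as pd

REQUIRED_COLS = {"user_id", "event", "ts"}

def load_events(path: str, run_date: str) -> pd.DataFrame:
    """Load events for one date; fail fast if the data looks wrong."""
    df = pd.read_csv(path)
    missing = REQUIRED_COLS - set(df.columns)
    if missing:
        raise ValueError(f"{path}: missing columns {missing}")
    df["ts"] = pd.to_datetime(df["ts"], errors="coerce")
    df = df[df["ts"].dt.date.astype(str) == run_date]  # parameterized, deterministic slice
    if df.empty:
        raise ValueError(f"{path}: no rows for {run_date}")
    return df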

A practical workflow

  1. Define KPIs and columns needed (inputs).
  2. Write a small loader that picks the correct files (by date or latest).
  3. Transform with pure pandas (groupby, pivot_table, joins).
  4. Validate (row counts, unique keys, non-null rates).
  5. Export tidy outputs with clear filenames that include the date.
  6. Schedule and log basic run info (start/end, counts) — see the logging sketch below.
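
Step 6 rarely needs more than a line per run. A minimal sketch (the log filename is an illustrative choice):

from datetime import datetime

def log_run(message: str, logfile: str = "run_log.txt") -> None:
    """Append one timestamped line describing this run."""
    with open(logfile, "a") as f:
        f.write(f"{datetime.now().isoformat()} {message}\n")

# Usage: log_run("daily_metrics start") ... log_run("daily_metrics done rows=365")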

Worked examples

Example 1 — Daily metrics report (DAU, signups, conversion)

Goal: Generate a daily CSV/Excel with DAU, events, signups, and conversion.

import pandas as pd
from pathlib import Path
from datetime import date

# 1) Load latest events file
# Expect columns: user_id, event, ts (timestamp ISO)
files = sorted(Path("data").glob("events_*.csv"))
assert files, "No events files found in data/"
raw = pd.read_csv(files[-1])

# 2) Basic cleanup and typing
raw["ts"] = pd.to_datetime(raw["ts"], errors="coerce")
raw = raw.dropna(subset=["user_id", "event", "ts"])  # drop malformed rows rather than aggregating them
raw["date"] = raw["ts"].dt.date

# 3) Aggregate
metrics = (
    raw.groupby("date").agg(
        dau=("user_id", "nunique"),
        events=("event", "count"),
        signups=("event", lambda s: (s == "signup").sum())
    ).reset_index()
)
metrics["conv_rate"] = (metrics["signups"] / metrics["dau"]).fillna(0).round(4)

# 4) By-event shape (optional breakdown)
by_event = (
    raw.pivot_table(
        index="date",
        columns="event",
        values="user_id",
        aggfunc=pd.Series.nunique,
        fill_value=0
    ).reset_index()
)

# 5) Validations
assert metrics["dau"].ge(0).all(), "DAU must be non-negative"
assert metrics["date"].is_unique, "Duplicate date aggregation detected"

# 6) Export (create out/ first so to_csv does not fail)
Path("out").mkdir(exist_ok=True)
run_date = date.today().isoformat()
metrics.to_csv(f"out/daily_metrics_{run_date}.csv", index=False)
by_event.to_csv(f"out/daily_by_event_{run_date}.csv", index=False)
print("Saved daily reports for", run_date)

Result: two files in out/ with today’s date. Idempotent: re-running the same day rewrites the same files.

Example 2 — Weekly channel performance

Goal: Combine sessions with campaign UTM tags to summarize weekly users and signup rate by channel.

import pandas as pd
from pathlib import Path

# Load (examples; adapt paths)
sessions = pd.read_csv("data/sessions.csv")  # user_id, started_at, utm_source
users = pd.read_csv("data/users.csv")        # user_id, signup_at

# Types
sessions["started_at"] = pd.to_datetime(sessions["started_at"], errors="coerce")
users["signup_at"] = pd.to_datetime(users["signup_at"], errors="coerce")

# Join to know who signed up
df = sessions.merge(users[["user_id", "signup_at"]], on="user_id", how="left")
df = df.dropna(subset=["started_at"])  # drop sessions with unparseable timestamps
df["week"] = df["started_at"].dt.to_period("W-SUN").dt.start_time

# Count each user once per week and channel so signup_rate stays in [0, 1]
dedup = df.drop_duplicates(subset=["week", "utm_source", "user_id"])

# Weekly by channel
weekly = (
    dedup.groupby(["week", "utm_source"]).agg(
        unique_users=("user_id", "nunique"),
        signups=("signup_at", lambda s: s.notna().sum())
    ).reset_index()
)
weekly["signup_rate"] = (weekly["signups"] / weekly["unique_users"]).fillna(0).round(4)

# Basic checks
assert set(["week", "utm_source", "unique_users", "signups", "signup_rate"]).issubset(weekly.columns)

Path("out").mkdir(exist_ok=True)
weekly.to_csv("out/weekly_channel.csv", index=False)

Example 3 — Simple anomaly flags (day-over-day)

Goal: Flag suspicious drops or spikes in DAU beyond a threshold.

import pandas as pd

metrics = pd.read_csv("out/daily_metrics_2025-01-01.csv")  # or the latest produced
metrics["date"] = pd.to_datetime(metrics["date"]) 
metrics = metrics.sort_values("date")

metrics["dau_dod_pct"] = metrics["dau"].pct_change().round(4)
threshold = 0.3  # 30% change
metrics["flag_anomaly"] = metrics["dau_dod_pct"].abs().gt(threshold)

metrics.to_csv("out/daily_metrics_with_flags.csv", index=False)

This adds a boolean column flagging days to review.

Reusable snippets

  • Load latest file in a folder: use Path.glob and pick the last after sort.
  • Parameterize dates: accept a date string or default to today/yesterday.
  • Consistent exports: include date in filenames and keep tidy column names.
  • Validations: assert expected columns exist; assert unique keys; check row counts.

For example, defaulting the run date to yesterday:

from datetime import date, timedelta

def default_run_date():
    return (date.today() - timedelta(days=1)).isoformat()
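
For the first bullet, a latest-file loader in the same spirit as Example 1 (the glob pattern is an assumption; adapt it to your naming):

from pathlib import Path

def latest_file(folder: str = "data", pattern: str = "events_*.csv") -> Path:
    """Return the last match after sorting; date-stamped names sort chronologically."""
    files = sorted(Path(folder).glob(pattern))
    if not files:
        raise FileNotFoundError(f"No files matching {pattern} in {folder}/")
    return files[-1]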

Quality checks that save you

  • Schema check: expected_cols ⊆ df.columns; if missing, stop.
  • Key uniqueness: per-day aggregates should have unique dates.
  • Range checks: metrics not negative; rates in [0,1].
  • Row-volume sanity: sudden 90% drop likely a data issue.
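
Bundled into one helper, those checks might look like this sketch (column names assume the daily metrics table from Example 1):

import pandas as pd

def validate_daily_metrics(df: pd.DataFrame, min_rows: int = 1) -> None:
    expected = {"date", "dau", "events", "signups", "conv_rate"}
    missing = expected - set(df.columns)
    assert not missing, f"Schema check failed; missing columns: {missing}"
    assert df["date"].is_unique, "Duplicate dates in per-day aggregate"
    assert df[["dau", "events", "signups"]].ge(0).all().all(), "Negative metric values"
    assert df["conv_rate"].between(0, 1).all(), "conv_rate outside [0, 1]"
    assert len(df) >= min_rows, f"Suspiciously few rows: {len(df)}"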

Scheduling options (lightweight)

  • Run the script daily/weekly using your OS scheduler.
  • Log start/end time and written filenames to a simple text log.
  • Keep scripts idempotent: same input, same output filenames.
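
On Linux or macOS the OS scheduler is usually cron. One possible crontab entry (paths and script name are illustrative) runs the script daily at 07:00 and appends output to a log:

0 7 * * * cd /path/to/project && python daily_report.py >> run_log.txt 2>&1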

Exercises

Complete these tasks to practice. The same tasks appear below in the Practice Exercises section with expected outputs and solutions.

  1. Exercise 1: Build a daily metrics automation that produces two CSVs: overall daily metrics and by-event breakdown, with clear date-stamped filenames.
  2. Exercise 2: Turn your code into a reusable function with input validations and a date parameter (default to yesterday). Write one CSV named using the provided date.
  • Checklist:
    • Uses pandas groupby or pivot_table.
    • Includes at least two validation checks.
    • Outputs have deterministic, date-stamped filenames.
    • Script is re-runnable without manual edits.

Common mistakes and self-check

  • Mixing local time and UTC: convert timestamps once, consistently.
  • Appending day after day to one file without dedupe: prefer one file per date or a deduplicated append.
  • Silent schema drift: always assert expected columns.
  • Non-idempotent filenames like report_final.csv: include the run date instead.
  • Forgetting null handling before groupby: drop or fill sensibly.
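
For the first pitfall, one approach is to parse every timestamp to UTC once and convert to a display timezone only at the reporting edge (the timezone name is an example):

import pandas as pd

raw_ts = pd.Series(["2025-01-01T08:30:00Z", "2025-01-01T23:15:00+02:00"])
ts_utc = pd.to_datetime(raw_ts, utc=True)         # one canonical timezone everywhere
ts_local = ts_utc.dt.tz_convert("Europe/Berlin")  # convert once, for display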

Self-check: If you run your script twice with the same input, do you get identical outputs and counts? If not, fix parameters or filenames.

Practical projects

  • Automated daily activation report (signup to first key action within 7 days), exported to CSV.
  • Weekly channel dashboard with CAC proxy (spend CSV + attributed signups) and rates.
  • Retention snapshot: cohort by signup month, week 1/2/4 retention table with basic checks.

Mini challenge

Create a single command that, given a date (YYYY-MM-DD), produces three outputs: daily_metrics_DATE.csv, daily_by_event_DATE.csv, and daily_metrics_with_flags_DATE.csv. Include at least three validations. Keep it idempotent.

Hint path
  • Write a function run(date_str) and use it in if __name__ == "__main__".
  • Separate load, transform, validate, and export into small functions.
  • Use try/except to print a friendly message and exit non-zero on validation failure.
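
A skeleton following those hints (a sketch; run's body is left to you, built from the worked examples above):

import sys
from datetime import date, timedelta

def run(date_str: str) -> None:
    # load -> transform -> validate -> export, each as a small function
    print(f"Running report for {date_str}")

if __name__ == "__main__":
    default = (date.today() - timedelta(days=1)).isoformat()
    date_str = sys.argv[1] if len(sys.argv) > 1 else default
    try:
        run(date_str)
    except (AssertionError, ValueError) as exc:
        print(f"Report failed: {exc}")
        sys.exit(1)  # non-zero exit so schedulers notice the failure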

Who this is for

  • Product Analysts who need reliable, repeatable reporting.
  • Data-savvy PMs or analysts transitioning from spreadsheets to code.

Prerequisites

  • Comfort with Python basics (functions, files, dates).
  • Pandas essentials (DataFrame, groupby, merge, pivot_table).
  • Basic understanding of your product’s events and KPIs.

Learning path

  1. Parameterize and validate: make scripts deterministic.
  2. Aggregate and reshape: groupby and pivot_table fluency.
  3. Export and schedule: produce tidy outputs on a cadence.
  4. Add anomaly flags and basic run logs.

Next steps

  • Add more validations (duplicate user_id-date checks, min/max thresholds).
  • Bundle scripts into a small CLI (argparse) for teammates to use.
  • Later, consider orchestration tools once your logic stabilizes.

Practice Exercises

2 exercises to complete

Instructions

Build a script that reads the latest events_*.csv from data/, computes daily DAU, events, signups, conversion rate, and a by-event breakdown. Export:

  • out/daily_metrics_YYYY-MM-DD.csv
  • out/daily_by_event_YYYY-MM-DD.csv

Assume columns in events CSV: user_id, event, ts.

Include at least two validations (e.g., expected columns, non-negative counts, unique days).

Expected Output
Two CSV files in out/ with the same YYYY-MM-DD in the filename. daily_metrics has columns: date, dau, events, signups, conv_rate. daily_by_event has date plus one column per event.

Automation Of Product Reports — Quick Test

Test your knowledge with 6 questions. Pass with 70% or higher.

