Why this matters
BI runs on facts. Every dashboard KPI comes from events in your business: orders placed, tickets resolved, invoices issued, ad clicks recorded. Mapping those events to the right fact tables is how you get accurate, fast analytics.
- Design a star schema so product, customer, and time dimensions slice the same metrics.
- Decide the grain: per order line, per day, per month, or per ticket lifecycle.
- Choose the right fact type: transaction, periodic snapshot, or accumulating snapshot.
- Prevent double counting and missing data before it hits reports.
Concept explained simply
Mapping events to facts means turning a real-world business event into a row (or rows) in a fact table with a clear level of detail (grain), foreign keys to dimensions, and numeric measures you can sum or average.
Mental model
Think of a camera taking snapshots of business activities. Each snapshot has a fixed resolution (grain). You attach labels (dimensions) like who, what, when, where, and record the numbers (measures). Different lenses create different kinds of snapshots:
- Transaction fact: each event as it happens (e.g., order line).
- Periodic snapshot fact: regular intervals (e.g., daily or monthly balances).
- Accumulating snapshot fact: one row that updates as stages complete (e.g., order fulfillment lifecycle).
Key terms at a glance
- Grain: the precise level of detail of one fact row.
- Measure: a numeric value you aggregate (e.g., quantity, amount).
- Dimension: descriptive context (e.g., date, customer, product).
- Degenerate dimension (DD): an identifier (e.g., order number) stored in the fact without a separate dimension table.
- Conformed dimension: the same dimension used across facts so metrics align.
Step-by-step: Map an event to a fact
- Name the business process (e.g., ecommerce ordering, subscription billing, support resolution).
- Choose the grain: the one-sentence rule for one row (e.g., "one row per order line at time of purchase").
- Pick the fact type: transaction, periodic snapshot, or accumulating snapshot.
- List dimensions: Date/Time, Customer, Product, Channel, Store, etc. Note any role-playing (Order Date vs Ship Date).
- Define measures: additive metrics (quantity, revenue), semi-additive (inventory), and non-additive ratios (store as numerators/denominators).
- Add identifiers: degenerate dimensions like order_id, ticket_id.
- Handle edge cases: multiple coupons (bridge), late-arriving dimensions (unknown member then update), corrections/voids (separate rows with negative amounts or status flags).
Mini checklist: Is your grain unambiguous?
- Exact statement of “one row = ?”.
- No attribute violates the grain (e.g., order-level attribute in order-line grain is allowed via conformed dimension or roll-up).
- All measures make sense at this grain.
Worked examples
1) Ecommerce order line (Transaction fact)
- Grain: one row per order line item at purchase time.
- Dimensions: Date, Time, Customer, Product, Store/Channel, Promotion (optional via bridge if multiple).
- Measures: quantity, unit_price, discount_amount, net_sales (quantity * unit_price - discount).
- Degenerate: order_number, line_number.
Why this works
Every metric sums cleanly by any dimension. Order-level fields are either replicated per line or placed in conformed dimensions. Multiple promotions require a bridge to avoid double counting.
2) Subscription MRR (Periodic snapshot)
- Grain: one row per subscription per month-end.
- Dimensions: Month, Customer, Plan, Region.
- Measures: starting_MRR, new_MRR, expansion_MRR, contraction_MRR, churned_MRR, ending_MRR.
- Degenerate: subscription_id.
Why this works
MRR components are additive across customers and regions per month. Month is a role of Date. This model supports cohort and trend analysis.
3) Support ticket lifecycle (Accumulating snapshot)
- Grain: one row per ticket across its lifecycle.
- Dimensions: Created Date, Assigned Date, Resolved Date (role-playing Date), Customer, Agent, Priority.
- Measures: response_time_minutes, resolution_time_minutes, reopen_count.
- Degenerate: ticket_number.
Why this works
Each milestone date is a foreign key to the Date dimension. As the ticket progresses, measures update. Useful for SLA metrics.
Patterns and edge cases
- Multiple promotions/coupons: use a bridge table between fact and promotion dimension; compute net amounts in the fact to avoid double counting.
- Corrections and refunds: record as separate rows with negative amounts or a correction flag to keep an auditable trail.
- Late-arriving dimensions: load fact with an “Unknown” dimension surrogate key, then backfill when the dimension arrives.
- Role-playing dimensions: the Date dimension can appear as order_date_key, ship_date_key, delivery_date_key.
- Semi-additive measures: inventory levels or account balances are additive across dimensions but not across time; use periodic snapshots and time-aware aggregations.
Exercises
Try the scenario below. Then compare with the solution.
Exercise ex1: Ad impressions to a fact
You have an ad platform with raw events: impression, click, conversion. Design a transaction fact for impressions.
- Write a one-sentence grain.
- List 4–6 dimensions.
- List 4–6 measures (include costs and derived rates inputs).
- Call out any degenerate dimensions.
- Mention how you will handle multiple targeting attributes (e.g., interest, placement).
Hints
- Impressions are high-volume and point-in-time.
- Rates like CTR are not stored; store numerators and denominators.
- Targeting can be many-to-many.
Show solution
Grain: one row per ad impression event.
Dimensions: Date, Time, Campaign, Ad, Device, Geo, Publisher/Placement; optional User segment via bridge.
Measures: impressions=1, media_cost, viewability_flag (0/1), billed_impression (0/1), click_count (0/1 for immediate click-through if same event stream joins), conversion_count (0/1 via later join not stored here), revenue=0 (if only spend side). Store numerators/denominators for later rates.
Degenerate: event_id, request_id.
Many-to-many: use a bridge from fact to a Targeting dimension (interest, audience) with weighting or distinct counts handled in reporting.
Self-check checklist
- Grain is unambiguous and event-level.
- Dimensions cover who/what/when/where.
- Measures are additive; rates are not stored directly.
- Many-to-many handled via a bridge or separate factless table.
Common mistakes and how to self-check
- Vague grain: If you can’t say “one row equals …”, you’ll double-count. Fix: write the grain sentence first.
- Storing ratios: CTR, conversion rate stored as numbers cause bias when aggregated. Fix: store counts and compute rates in queries.
- Mixing event types: Impressions and clicks in one table without a clear grain. Fix: separate facts or use a wide event fact with flags and a precise grain.
- Ignoring late dimensions: Dropping rows or nulls. Fix: use an Unknown member key and backfill later.
- Missing degenerate IDs: Can’t trace a row back to source. Fix: include order_id, ticket_id as DDs.
Quick self-audit
- Can two analysts independently design the same row from your grain statement?
- Do all measures aggregate correctly by every dimension?
- Do you have a plan for corrections/voids?
Who this is for
- BI Analysts and Analytics Engineers designing star schemas.
- Data Analysts turning raw events into trusted KPIs.
Prerequisites
- Basic SQL: SELECT, JOIN, GROUP BY.
- Familiarity with dimensions and facts at a high level.
Learning path
- Understand grains and fact types.
- Practice mapping common business processes.
- Handle edge cases (bridges, late-arriving data, corrections).
- Build a small star schema and validate with queries.
Practical projects
Project 1: Order-to-cash mini warehouse
- Design transaction fact for order lines.
- Add accumulating snapshot for fulfillment stages.
- Write queries: daily revenue, average fulfillment time, on-time delivery rate.
Project 2: Subscription health dashboard
- Design monthly MRR snapshot fact.
- Create conformed Customer and Plan dimensions.
- Write queries: net MRR, churn rate, expansion vs contraction.
Project 3: Marketing funnel
- Design separate facts for impressions and clicks; bridge for audience segments.
- Join to conversions to compute funnel metrics.
- Validate additive properties across Date and Campaign.
Quick Test
Ready to check your understanding? Take the Quick Test below. Anyone can take it for free; only logged-in users will have their progress saved.
Next steps
- Extend your model with conformed Date and Customer dimensions across multiple facts.
- Add SCD handling in dimensions and verify surrogate keys in facts.
- Document grain and fact types for each business process you support.
Mini challenge
You receive events for refunds issued after shipment. Describe how you would represent refunds in the sales transaction fact to keep totals accurate and auditable. Write the grain, measures (including signs), and any flags you would add.