Why this matters
As a Data Architect, you set the blueprint for analytics. Declaring the grain and choosing the right facts prevents double counting, unstable metrics, and slow queries. You will use this when:
- Designing star schemas for sales, subscriptions, or operations.
- Standardizing core metrics (revenue, churn, conversion).
- Planning fact table types (transaction, snapshot, accumulating).
- Reviewing PRDs with product/finance to ensure a single source of truth.
Concept explained simply
Grain answers: “Exactly what does one row in this fact table represent?” Facts are the numeric measures recorded at that grain.
- Grain examples: one row per order line, one row per day per product, one row per customer lifecycle.
- Fact types:
- Additive: sums across all dimensions (e.g., quantity, cost).
- Semi-additive: sums across some dimensions but not time (e.g., balance, inventory level).
- Non-additive: cannot sum (e.g., ratios like conversion rate). Compute at query time.
- Common fact table patterns:
- Transaction fact: one row per event (e.g., order line).
- Periodic snapshot: one row per time interval per entity (e.g., daily inventory).
- Accruing/accumulating snapshot: one row per lifecycle, updated as milestones occur (e.g., order fulfillment).
- Factless fact: records coverage or events without numeric facts (e.g., student attendance).
- Degenerate dimension: an identifier (like order_number) stored in the fact when there’s no natural dimension table.
Mental model
Think of your model like a camera and a measurement kit:
- The camera’s zoom level is the grain. Set it once and keep it fixed.
- The measurement kit are your facts. Only measure what makes sense at that zoom.
- Dimensions are labels on the photo (who, what, when, where, how).
Worked examples
Example 1: E-commerce sales (transaction fact)
- Business questions: What is daily revenue? Which products and channels drive sales?
- Grain: One row per order line item (order_id + product_id).
- Dimensions: date_key, customer_key, product_key, store_key/channel_key, promo_key; degenerate: order_number.
- Facts: unit_price, quantity, extended_amount (unit_price * quantity), discount_amount, tax_amount, shipping_amount.
- Additivity: quantity, extended_amount are additive; discount_amount additive; ratios like discount_rate derive in queries.
Example 2: Subscription lifecycle (accumulating snapshot)
- Business questions: How long to activate? What is churn risk by cohort?
- Grain: One row per subscription lifecycle (subscription_id).
- Dimensions: customer_key, plan_key, salesperson_key, cohort_month_key.
- Milestone dates: signup_date, trial_start_date, trial_end_date, paid_start_date, cancel_date.
- Facts: initial_mrr, current_mrr, trials_to_paid_days, tenure_days (updated as milestones occur).
- Additivity: mrr is additive by customer/plan but not across time snapshots; tenure_days derived.
Example 3: Inventory (periodic snapshot)
- Business questions: What was end-of-day stock? Days of supply by warehouse?
- Grain: One row per day per product per warehouse.
- Dimensions: date_key, product_key, warehouse_key, supplier_key.
- Facts: on_hand_qty (semi-additive), allocated_qty (semi-additive), on_order_qty (semi-additive), safety_stock (semi-additive).
- Additivity: Sum across products/warehouses; do not sum across time. Use time-aware logic for period averages.
Step-by-step method
- Choose the business process (e.g., order fulfillment, billing, inventory).
- Declare the grain in one sentence. Test with sample rows.
- List all relevant dimensions at that grain (who/what/when/where/how).
- List facts that are naturally measured at that grain. Mark additive/semi/non-additive.
- Pick the fact table type (transaction, periodic snapshot, accumulating, factless).
- Validate with queries: can you answer key metrics without awkward workarounds?
- Check conformance: do shared dimensions align with other stars?
Tip: Quick grain validation
- Can you say “one row equals …” without using “sometimes” or “except”?
- Can you list 5 example rows immediately?
- Do all facts make sense for each row without nulls or duplicates?
Common mistakes and self-check
- Mixing grains in one fact table (e.g., some rows per order, others per order line).
- Self-check: Group by keys—does count explode inconsistently?
- Storing ratios as facts (e.g., conversion_rate). Store components; calculate ratio in queries.
- Self-check: Does summing this fact produce nonsense?
- Using daily snapshots when you need events (or vice versa).
- Self-check: Do you need to explain changes or just report state?
- Forgetting degenerate dimensions (order_number) and then losing drill-through.
- Self-check: Can an analyst tie back to the operational record?
- Not labeling semi-additive facts, leading to incorrect time aggregation.
- Self-check: Does summing across days overstate balances?
Exercises
Do these before the quick test. Aim for concise, explicit grain statements.
Exercise 1 — Declare grain for a Ride-Hailing Trips fact
Scenario: A ride-hailing app tracks requested rides, driver assignments, completed trips, fares, tips, and promotions. Analysts ask for completion rate, average fare, and revenue by city and hour.
- Write a one-sentence grain statement.
- List dimensions at that grain.
- List facts and mark additive/semi/non-additive.
Show solution
Grain: One row per completed trip (trip_id).
Dimensions: date_key (trip_end_time), hour_key, rider_key, driver_key, city_key, product_tier_key, payment_method_key, promo_key; degenerate: trip_number.
Facts: base_fare (additive), distance_km (additive), duration_minutes (additive), surge_multiplier (store inputs if needed; average in queries), tip_amount (additive), promo_discount_amount (additive), total_collected (additive). Completion rate is derived from counts of requested vs completed events, not stored.
Exercise 2 — Design a Monthly Subscription Snapshot
Scenario: A SaaS business invoices monthly. Finance wants MRR, churned MRR, and expansion MRR by month and plan.
- Choose fact table type and declare the grain.
- List dimensions at that grain.
- List facts with additivity notes.
Show solution
Type & Grain: Periodic snapshot; one row per month per subscription.
Dimensions: month_key, customer_key, subscription_key, plan_key, region_key, salesperson_key.
Facts: starting_mrr (semi-additive across time), new_mrr (additive), expansion_mrr (additive), contraction_mrr (additive), churned_mrr (additive), ending_mrr (semi-additive). Do not sum starting/ending across months; use last non-null or time-aware logic.
- [Checklist] Is your grain one specific unit with no exceptions?
- [Checklist] Do all facts make sense at that grain?
- [Checklist] Did you avoid storing ratios?
- [Checklist] Are semi-additive facts clearly identified?
- [Checklist] Did you include necessary degenerate dimensions?
Mini challenge
Pick a process you know (support tickets or warehouse shipments). Draft two fact tables:
- Transaction fact: declare grain and 5 facts.
- Periodic snapshot: declare grain and 5 facts. Mark semi-additive ones.
Hint
For support: transaction = one row per ticket event; snapshot = one row per day per queue with open/closed counts.
Who this is for
- Data Architects defining analytic data models.
- Analytics Engineers standardizing metrics.
- Data Engineers implementing star schemas.
- Analysts validating metric correctness.
Prerequisites
- Basic SQL (GROUP BY, JOINs, aggregates).
- Understanding of dimensions vs facts.
- Familiarity with the business process you are modeling.
Learning path
- Define the business process and key decisions it supports.
- Declare grain before listing columns.
- Choose fact table type; list dimensions.
- Enumerate facts and classify additivity.
- Prototype with sample rows and test queries.
- Review with stakeholders; adjust as needed.
Practical projects
- E-commerce mini warehouse: transaction sales fact + monthly customer snapshot.
- Operations: shipment transaction fact + daily warehouse capacity snapshot.
- Product analytics: event fact + weekly retention snapshot; compute derived KPIs.
Next steps
- Model one real process end-to-end (grain first, then facts/dims).
- Create test queries for each KPI to validate additivity and grain.
- Document grain statements alongside each fact table.
Quick test
Note: The quick test is available to everyone; only logged-in users will have their progress saved.