How to learn Defining Grain And Facts for Dimensional Modeling For Analytics in Data Architect for free

Why this matters

As a Data Architect, you set the blueprint for analytics. Declaring the grain and choosing the right facts prevents double counting, unstable metrics, and slow queries. You will use this when:

Designing star schemas for sales, subscriptions, or operations.
Standardizing core metrics (revenue, churn, conversion).
Planning fact table types (transaction, snapshot, accumulating).
Reviewing PRDs with product/finance to ensure a single source of truth.

Concept explained simply

Grain answers: “Exactly what does one row in this fact table represent?” Facts are the numeric measures recorded at that grain.

Grain examples: one row per order line, one row per day per product, one row per customer lifecycle.
Fact types:
- Additive: sums across all dimensions (e.g., quantity, cost).
- Semi-additive: sums across some dimensions but not time (e.g., balance, inventory level).
- Non-additive: cannot sum (e.g., ratios like conversion rate). Compute at query time.
Common fact table patterns:
- Transaction fact: one row per event (e.g., order line).
- Periodic snapshot: one row per time interval per entity (e.g., daily inventory).
- Accruing/accumulating snapshot: one row per lifecycle, updated as milestones occur (e.g., order fulfillment).
- Factless fact: records coverage or events without numeric facts (e.g., student attendance).
Degenerate dimension: an identifier (like order_number) stored in the fact when there’s no natural dimension table.

Mental model

Think of your model like a camera and a measurement kit:

The camera’s zoom level is the grain. Set it once and keep it fixed.
The measurement kit are your facts. Only measure what makes sense at that zoom.
Dimensions are labels on the photo (who, what, when, where, how).

Worked examples

Example 1: E-commerce sales (transaction fact)

Business questions: What is daily revenue? Which products and channels drive sales?
Grain: One row per order line item (order_id + product_id).
Dimensions: date_key, customer_key, product_key, store_key/channel_key, promo_key; degenerate: order_number.
Facts: unit_price, quantity, extended_amount (unit_price * quantity), discount_amount, tax_amount, shipping_amount.
Additivity: quantity, extended_amount are additive; discount_amount additive; ratios like discount_rate derive in queries.

Example 2: Subscription lifecycle (accumulating snapshot)

Business questions: How long to activate? What is churn risk by cohort?
Grain: One row per subscription lifecycle (subscription_id).
Dimensions: customer_key, plan_key, salesperson_key, cohort_month_key.
Milestone dates: signup_date, trial_start_date, trial_end_date, paid_start_date, cancel_date.
Facts: initial_mrr, current_mrr, trials_to_paid_days, tenure_days (updated as milestones occur).
Additivity: mrr is additive by customer/plan but not across time snapshots; tenure_days derived.

Example 3: Inventory (periodic snapshot)

Business questions: What was end-of-day stock? Days of supply by warehouse?
Grain: One row per day per product per warehouse.
Dimensions: date_key, product_key, warehouse_key, supplier_key.
Facts: on_hand_qty (semi-additive), allocated_qty (semi-additive), on_order_qty (semi-additive), safety_stock (semi-additive).
Additivity: Sum across products/warehouses; do not sum across time. Use time-aware logic for period averages.

Step-by-step method

Choose the business process (e.g., order fulfillment, billing, inventory).
Declare the grain in one sentence. Test with sample rows.
List all relevant dimensions at that grain (who/what/when/where/how).
List facts that are naturally measured at that grain. Mark additive/semi/non-additive.
Pick the fact table type (transaction, periodic snapshot, accumulating, factless).
Validate with queries: can you answer key metrics without awkward workarounds?
Check conformance: do shared dimensions align with other stars?

Tip: Quick grain validation

Can you say “one row equals …” without using “sometimes” or “except”?
Can you list 5 example rows immediately?
Do all facts make sense for each row without nulls or duplicates?

Common mistakes and self-check

Mixing grains in one fact table (e.g., some rows per order, others per order line).
- Self-check: Group by keys—does count explode inconsistently?
Storing ratios as facts (e.g., conversion_rate). Store components; calculate ratio in queries.
- Self-check: Does summing this fact produce nonsense?
Using daily snapshots when you need events (or vice versa).
- Self-check: Do you need to explain changes or just report state?
Forgetting degenerate dimensions (order_number) and then losing drill-through.
- Self-check: Can an analyst tie back to the operational record?
Not labeling semi-additive facts, leading to incorrect time aggregation.
- Self-check: Does summing across days overstate balances?

Exercises

Do these before the quick test. Aim for concise, explicit grain statements.

Exercise 1 — Declare grain for a Ride-Hailing Trips fact

Scenario: A ride-hailing app tracks requested rides, driver assignments, completed trips, fares, tips, and promotions. Analysts ask for completion rate, average fare, and revenue by city and hour.

Write a one-sentence grain statement.
List dimensions at that grain.
List facts and mark additive/semi/non-additive.

Show solution

Grain: One row per completed trip (trip_id).

Dimensions: date_key (trip_end_time), hour_key, rider_key, driver_key, city_key, product_tier_key, payment_method_key, promo_key; degenerate: trip_number.

Facts: base_fare (additive), distance_km (additive), duration_minutes (additive), surge_multiplier (store inputs if needed; average in queries), tip_amount (additive), promo_discount_amount (additive), total_collected (additive). Completion rate is derived from counts of requested vs completed events, not stored.

Exercise 2 — Design a Monthly Subscription Snapshot

Scenario: A SaaS business invoices monthly. Finance wants MRR, churned MRR, and expansion MRR by month and plan.

Choose fact table type and declare the grain.
List dimensions at that grain.
List facts with additivity notes.

Show solution

Type & Grain: Periodic snapshot; one row per month per subscription.

Dimensions: month_key, customer_key, subscription_key, plan_key, region_key, salesperson_key.

Facts: starting_mrr (semi-additive across time), new_mrr (additive), expansion_mrr (additive), contraction_mrr (additive), churned_mrr (additive), ending_mrr (semi-additive). Do not sum starting/ending across months; use last non-null or time-aware logic.

[Checklist] Is your grain one specific unit with no exceptions?
[Checklist] Do all facts make sense at that grain?
[Checklist] Did you avoid storing ratios?
[Checklist] Are semi-additive facts clearly identified?
[Checklist] Did you include necessary degenerate dimensions?

Mini challenge

Pick a process you know (support tickets or warehouse shipments). Draft two fact tables:

Transaction fact: declare grain and 5 facts.
Periodic snapshot: declare grain and 5 facts. Mark semi-additive ones.

Hint

For support: transaction = one row per ticket event; snapshot = one row per day per queue with open/closed counts.

Who this is for

Data Architects defining analytic data models.
Analytics Engineers standardizing metrics.
Data Engineers implementing star schemas.
Analysts validating metric correctness.

Prerequisites

Basic SQL (GROUP BY, JOINs, aggregates).
Understanding of dimensions vs facts.
Familiarity with the business process you are modeling.

Learning path

Define the business process and key decisions it supports.
Declare grain before listing columns.
Choose fact table type; list dimensions.
Enumerate facts and classify additivity.
Prototype with sample rows and test queries.
Review with stakeholders; adjust as needed.

Practical projects

E-commerce mini warehouse: transaction sales fact + monthly customer snapshot.
Operations: shipment transaction fact + daily warehouse capacity snapshot.
Product analytics: event fact + weekly retention snapshot; compute derived KPIs.

Next steps

Model one real process end-to-end (grain first, then facts/dims).
Create test queries for each KPI to validate additivity and grain.
Document grain statements alongside each fact table.

Quick test

Note: The quick test is available to everyone; only logged-in users will have their progress saved.

Menu

Defining Grain And Facts

Table of Contents