Who this is for
- Aspiring Analytics Engineers who turn raw data into reliable BI models.
- BI Developers and Data Analysts moving from ad‑hoc SQL to dimensional modeling.
- Data Engineers who need clear patterns for analytics-friendly schemas.
Prerequisites
- Core SQL (joins, group by, window functions).
- Understanding of star schemas: facts vs. dimensions.
- Basics of primary keys, foreign keys, and data types.
Why this matters
Fact tables are the backbone of analytics. They store measurable events (orders, payments, sessions) at a clearly defined grain. Well‑designed facts make BI fast, intuitive, and consistent across teams.
Real tasks you will handle:
- Choosing the correct fact type (transaction, periodic snapshot, accumulating snapshot, factless).
- Defining the grain (“one row per …”) so metrics don’t double‑count.
- Selecting additive/semi‑additive/non‑additive measures and handling nulls.
- Linking to conformed dimensions with surrogate keys and role‑playing dates.
- Designing for late‑arriving facts and change history (SCD in dims).
Concept explained simply
A fact table captures measurements about an event at a chosen grain and links to descriptive dimensions. If dimensions answer “who/what/where/when,” the fact answers “how much/how many.”
Mental model
Imagine a ledger: each line is one event at exactly one level of detail. Every line carries numbers you can sum or analyze and has pointers (foreign keys) to look up rich context in dimensions.
Common fact types
- Transaction fact: one row per event (order, payment, click). Fully additive across all dimensions.
- Periodic snapshot: one row per entity per period (daily inventory, monthly MRR). Often semi‑additive.
- Accruing/accumulating snapshot: one row per process tracked across milestones (funnel, loan, onboarding). Dates update as the process advances.
- Factless fact: events without natural measures (login, sign‑up) or coverage facts (which products were on promotion). Useful for counts and distincts.
Measure additivity
- Additive: sums across all dimensions (amount, units).
- Semi‑additive: sums across some dimensions but not time (inventory level, balance).
- Non‑additive: can’t be summed (rates, ratios). Aggregate at query time.
Design steps (reliable pattern)
- Clarify the business question. What decisions will this power?
- Pick the grain: finish the sentence “one row per …”. If you can’t, you’re not ready to design.
- Choose fact type: transaction, periodic snapshot, accumulating, or factless.
- List measures: name, definition, data type, null/zero rules, rounding/precision.
- Identify dimensions: conformed, role‑playing dates (order_date, ship_date), degenerate dimension keys (e.g., order_id).
- Define keys: surrogate keys for dimensions; foreign keys in the fact. Add audit columns (load_date, source).
- Handle tricky cases: late‑arriving facts, updates, de‑dupe, cancelled events.
- Plan quality tests: not null, uniqueness of grain, foreign key integrity, accepted values.
- Consider performance: partitioning/clustering by date, numeric types, avoiding wide strings.
Worked examples
1) Orders transaction fact
- Grain: one row per submitted order.
- Type: Transaction fact.
- Measures: order_amount (decimal), items_count (integer), discount_amount (decimal), tax_amount (decimal).
- Dimensions (FKs): customer, store, product? (No — product varies by line. Put products in a separate order_line fact).
- Role‑playing dates: order_date_key, ship_date_key (nullable until shipped).
- Degenerate dimension: order_number in the fact (no separate dim needed).
Why not include product here?
At order header grain, product is multi‑valued. To analyze products, use a different fact: one row per order line.
2) Daily inventory snapshot
- Grain: one row per product per warehouse per day.
- Type: Periodic snapshot.
- Measures: on_hand_qty (semi‑additive), reserved_qty.
- Dimensions: product, warehouse, date.
- Note: Don’t sum on_hand across days. Use last_value or AVG for trends.
3) Sales funnel accumulating snapshot
- Grain: one row per lead opportunity.
- Type: Accumulating snapshot (milestone dates updated as lead progresses).
- Measures: cycle_days (derived), deal_amount (nullable until won).
- Dimensions: customer, sales_rep, source_campaign, calendar dates for each milestone.
- Milestones: created_date_key, qualified_date_key, proposal_date_key, closed_won_date_key, closed_lost_date_key.
Implementation notes
- Use integer surrogate keys for dimensions; store them in the fact. Keep string codes only if truly needed for reporting.
- Maintain precision: monetary amounts as decimal with sufficient scale; avoid float for financials.
- Missing dimensions: point to an explicit “Unknown” or “Not applicable” dimension row instead of null FKs.
- Late‑arriving facts: stage them; load with available FKs; backfill once dimensions arrive.
- Cancellations/returns: represent as separate facts or negative measures with a status flag.
- Testing: enforce one row per grain via uniqueness on business keys + date (or surrogate), and validate FK coverage.
Exercises
Do these on paper or in a notes doc. Then compare with the solutions below. Tip: write the grain as a single clear sentence.
Exercise 1 — Food delivery orders
Design a transaction fact for a food delivery app. You have orders with multiple items, a restaurant, a courier, and customer. Some orders are canceled before pickup.
- Define the grain.
- Choose the fact type.
- List measures and data types.
- List dimension FKs, including role‑playing dates.
- Describe how to represent cancellations.
Exercise 2 — Loan application pipeline
Design an accumulating snapshot for loan applications moving through stages: submitted → verified → approved → funded or rejected.
- Define the grain.
- List milestone date keys and any measures.
- Identify dimensions.
- Explain how to handle applications that never reach a milestone.
Check your work (solutions)
- Grain: one row per submitted order (order header). Separate order_line fact for items.
- Type: Transaction fact.
- Measures: order_amount DECIMAL(12,2), delivery_fee DECIMAL, tip_amount DECIMAL, discount_amount DECIMAL, items_count INT, tax_amount DECIMAL.
- Dimensions: customer_key, restaurant_key, courier_key (nullable until assigned), order_date_key, pickup_date_key (nullable), delivery_date_key (nullable), status_key.
- Cancellation: keep the row with status = canceled and zero amounts beyond refund; optionally negative adjustment fact if refunds are separate.
- Grain: one row per loan application.
- Milestone date keys: submitted_date_key, verified_date_key, approved_date_key, funded_date_key, rejected_date_key.
- Measures: requested_amount DECIMAL, approved_amount DECIMAL (nullable), processing_days INT (derived).
- Dimensions: applicant_key, product_key, branch_key, channel_key.
- Missing milestones: leave the date keys null; the row remains open until an end state (funded/rejected).
Self‑check checklist
- I wrote a clear grain sentence (“one row per …”).
- My chosen fact type matches the business process.
- Measures have data types and additivity clarified.
- Dimensions are conformed and linked via surrogate keys.
- I accounted for late data, nulls, and cancellations.
- I know the minimal tests: uniqueness at grain, FK integrity, not‑null for required fields.
Common mistakes and how to self‑check
Mixing grains
Symptom: double counting revenue. Fix: split order header and order line facts; never include multi‑valued dimensions at header grain.
Missing date roles
Symptom: using one date for multiple meanings. Fix: create role‑playing date keys (order_date_key, ship_date_key, etc.).
Using natural keys as FKs
Symptom: broken joins after source changes. Fix: use surrogate keys in dimensions; retain natural keys as attributes.
Non‑additive metrics in the fact
Symptom: wrong sums on rates. Fix: store numerator/denominator or compute rates at query time.
Null foreign keys
Symptom: orphaned facts. Fix: use explicit Unknown/Not applicable dimension rows to maintain referential integrity.
Mini challenge
Design a fact for marketing emails: sends, opens, clicks. Some users open multiple times.
- Pick the fact type and grain.
- List measures and dimensions.
- How will you avoid double‑counting unique opens?
Suggested approach
Use a transaction fact at event grain: one row per email event (send/open/click). Measures are mostly counts (1 per row). Dimensions: user, campaign, message, date_time (role‑playing or timestamp). For unique opens, aggregate distinct user per message, or maintain a separate periodic snapshot of daily unique opens per message if needed for performance.
Learning path
- Review dimensional modeling basics (facts, dimensions, conformed dims).
- Practice writing clear grain statements.
- Design transaction facts, then periodic snapshots, then accumulating snapshots.
- Add role‑playing date dimensions and degenerate dimensions.
- Implement data quality tests and handle late‑arriving data.
- Optimize for BI performance (partitioning, clustering, summary marts).
Practical projects
- E‑commerce: order header fact, order line fact, payments fact; conformed customer and date dims.
- SaaS: monthly MRR periodic snapshot and a customer lifecycle accumulating snapshot.
- Product analytics: event fact for app interactions; build daily active users metrics.
Next steps
- Revisit your current SQL dashboards; identify where a fact table could simplify logic.
- Create a small star schema and add not‑null, unique, and foreign key tests.
- Take the quick test below. Note: anyone can take the test; only logged‑in learners have their progress saved.