Why this matters
Star schema governance keeps your analytics consistent, trusted, and efficient across teams. As a Data Architect, you decide how facts and dimensions are defined, named, versioned, and evolved so analysts don’t argue over numbers and engineers don’t rebuild the wheel.
- Ensure consistent KPIs across dashboards (e.g., “Net Sales” means the same in Sales and Finance).
- Prevent schema drift and breaking BI models when source systems change.
- Speed up onboarding and delivery with clear standards and reusable conformed dimensions.
- Enable reliable joins and performance in warehouses and lakehouses.
Concept explained simply
Star schema governance is a set of decisions and standards that define how you build and maintain fact and dimension tables so they are consistent and reusable across the organization.
Mental model
Think of it as building codes for your analytics city. Architects (you) define the rules for materials (keys, data types), layouts (grain, relationships), signage (column names), and inspections (data quality checks). Builders (engineers) follow them; residents (analysts) benefit from safety and predictability.
Governance touchpoints across lifecycle
- Intake: Approve new star proposals (grain, business process, consumers).
- Design: Conform dimensions, name measures, set SCD types, keys, and null policies.
- Build: Enforce constraints, data tests, and lineage documentation.
- Release: Version changes, communicate deprecations, update semantic layer.
- Operate: Monitor data quality SLAs, handle incident reviews, evolve standards.
Core governance decisions
- Grain: The exact level of detail of each fact row (e.g., one line item per order). No mixed grains.
- Conformed dimensions: Shared dimensions (Date, Product, Customer, Geography) with stable surrogate keys used across facts.
- Keys and constraints: Surrogate keys for dimensions; natural keys stored as attributes; referential integrity enforced where feasible.
- SCD policy: Dimension change handling (Type 1, 2, or hybrid) and effective/expiry date rules.
- Naming conventions: Verb-noun for measures (e.g., sales_amount), singular dimension names, consistent role-playing names (order_date_key, ship_date_key).
- Null and default rules: When to allow nulls, use defaults (e.g., Unknown, Not Applicable), and special members (e.g., -1 Unknown).
- Measure definitions: Clear formulas, units, currency, and aggregation types (sum, avg, distinct count).
- Role-playing and junk dimensions: Approach for multiple dates, flags, and low-cardinality attributes.
- Data quality: Required checks (freshness, completeness, uniqueness, referential integrity, reasonability).
- Versioning and change management: Semantic versioning for breaking changes; deprecation windows and communication.
Worked examples
Example 1 — Sales Orders star
- Business process: Order line items.
- Grain: One row per order_line_id.
- Dimensions: Date (order_date, ship_date via role-playing), Customer (SCD2), Product (SCD2), Salesperson (SCD2), Store (Type 1), Promotion (Type 2), Payment Method (junk or small dimension).
- Facts/measures: sales_amount (currency, sum), quantity (integer, sum), discount_amount (currency, sum), return_flag (boolean via junk dimension or degenerate column for quick filter).
- Keys: Surrogate keys on all dimensions; degenerate order_id stored on fact for drill-back.
- Null/defaults: Unknown (-1) for all dimension FKs when late-arriving; 0 for amounts only when true zero exists; else null.
- SCD policy: Customer, Product, Salesperson as Type 2 with effective/expiry timestamps.
- Naming: fact_sales_order_line; dim_product; order_date_key, ship_date_key.
Example 2 — Returns star aligned to Sales
- Business process: Product returns at line level.
- Grain: One row per return_line_id.
- Conformance: Reuse dim_product, dim_customer, dim_date. Ensure same surrogate keys as Sales.
- Measure conventions: return_amount as positive value; net_sales derived in semantic layer = sales_amount - return_amount.
- Integrity: Foreign keys must match conformed dimension SKs; unknown/late returns map to -1 Unknown.
- Policy note: Never store negative sales; store positive returns for clarity and aggregation safety.
Example 3 — App Events star
- Business process: Application events (e.g., button_click).
- Grain: One row per event_id.
- Dimensions: Date (event_date), Time (optional), User (SCD2 for plan tier changes), Device, App Version (Type 1 or mini-dimension), Geography.
- Measures: event_count = 1 per row (sum), session_duration_seconds (on session fact if needed), revenue_amount only for monetization events.
- Junk dimension: Flags (is_test_user, is_bot_suspected).
- Keys: Degenerate session_id on fact for drill-back; surrogate user_key and product_key for joins.
Policy starter templates
Star approval template
- Business process and consumers:
- Fact grain (exactly one):
- Dimensions (conformed reuse flagged):
- SCD types per dimension + rationale:
- Measures with units, aggregation, formula owner:
- Null/default rules:
- Data quality checks (freshness, RI, duplicates, thresholds):
- Performance notes (partition, clustering):
- Versioning and rollout plan:
Naming quick rules
- Tables: fact_
_ , dim_ . - Columns:
_key for SKs, natural keys suffixed _nk. - Dates:
_date_key (int yyyymmdd) or timestamp; be consistent. - Measures: noun_or_verb_noun in lowercase snake_case.
Exercises
Do these two exercises. Solutions are hidden under each prompt.
Exercise 1 — Approve a Sales Orders star
Draft a one-page approval note covering: grain, dimensions and their SCD types, keys, measures and units, null/default policies, and role-playing dates. See the Exercises section below for full prompt and solution.
Exercise 2 — Conform Sales and Returns
Propose a conformance plan to align Product and measure names across Sales and Returns. See the Exercises section below for full prompt and solution.
Self-check checklist
- Is the fact grain unambiguous and single-grain?
- Are all dimensions either reused conformed ones or clearly justified as new?
- Are SCD policies explicit and testable?
- Do measures have clear units and aggregation rules?
- Are null/defaults defined for each foreign key and measure?
- Is versioning/rollout documented?
Common mistakes and how to self-check
- Mixed grain in a single fact table. Self-check: Can you describe each row in one sentence? If not, split into multiple facts.
- Natural keys as foreign keys. Self-check: If the source rekeys, will joins break? Use surrogate keys.
- Undefined SCD behavior. Self-check: For a changing attribute, can you simulate yesterday vs today’s row? If not, define SCD type.
- Ambiguous measures. Self-check: Can two teams get different numbers from the same table? If yes, standardize definitions.
- Overusing nulls. Self-check: Are unknowns represented consistently with special members (e.g., -1)?
- Silent breaking changes. Self-check: Is there a version bump and deprecation notice for changes that alter meaning?
Practical projects
- Conform a Date and Customer dimension and apply them to two existing facts.
- Refactor a mixed-grain fact into two facts (e.g., orders and daily inventory snapshot) with a shared Product dimension.
- Write a data quality test suite for referential integrity and duplicate detection on one star schema.
- Create a semantic layer spec mapping measures and role-playing dates with business definitions.
Who this is for
- Data Architects who define analytical data models.
- Analytics Engineers standardizing facts/dimensions.
- BI Developers maintaining a semantic layer.
Prerequisites
- Foundational dimensional modeling (facts, dimensions, grain).
- Basic SQL and warehouse concepts (surrogate keys, constraints).
- Understanding of SCD Types 1 and 2.
Learning path
- Refresh dimensional modeling basics.
- Apply governance decisions (this lesson) to one new star.
- Conform dimensions across two existing facts.
- Add data quality checks and versioning workflow.
- Publish documentation and semantic layer mappings.
Next steps
- Adopt the approval template for new stars.
- Schedule a monthly governance review for proposed changes.
- Backfill conformed dimensions where feasible.
Mini challenge
Your team proposes a “Payments” fact that mixes one row per transaction and daily rollups. In two sentences, state the governance decision and next action.
Possible answer
Reject the mixed-grain design and split into two facts: fact_payment_transaction (one row per transaction) and fact_payment_daily (one row per account per day). Reuse conformed Date, Customer, and Payment Method dimensions.
Quick Test
Take the quick test to check your understanding. Everyone can take it for free. Note: only logged-in users get saved progress.