Why this matters
Self-serve BI rises or falls on the quality of your semantic model. A clean and consistent business layer lets non-technical users answer questions without writing SQL or breaking logic. As a BI Developer, your real tasks include:
- Defining standard metrics (e.g., Revenue, AOV, Retention) once and making them reusable everywhere.
- Modeling dimensions and hierarchies so slicing and drilling feels natural and safe.
- Controlling data relationships and grain to avoid double counting.
- Building in guardrails: data types, formats, default filters, and row-level security when needed.
Concept explained simply
A semantic model is a business-friendly layer on top of your warehouse. It exposes dimensions, measures, and relationships with names people understand. Think of it as the “menu” of facts and slices users can confidently combine.
Mental model
Picture a grocery store:
- Facts = items sold (transactions at a specific grain)
- Dimensions = aisles and labels (date, customer, product, region)
- Measures = totals and ratios at the checkout (Revenue, Orders, Conversion Rate)
Your job: decide the right level of detail (grain), connect aisles to items (relationships), and predefine the totals/ratios so anyone can shop for insights without getting lost.
Checklist: What a good semantic model includes
- Clear business names and descriptions for all fields
- One-hinge grain per fact (no mixed granularity)
- Conformed dimensions shared across facts (e.g., Date, Customer)
- Curated measures with consistent logic and formatting
- Well-defined relationships (cardinality, direction) and role-playing dates if needed
- Time intelligence patterns (e.g., MoM, YoY) ready to use
- Security rules (row-level) where required
- Performance aids (aggregations, summaries) for large data
Design steps (fast path)
- Collect questions: Write down top 10 questions users ask. Group by subject area.
- Choose fact grain: Decide the most atomic, stable event (e.g., order line, daily campaign).
- Select conformed dimensions: Date, Customer, Product, Geography, Channel.
- Define measures: Sum/Count first, then ratios (AOV, Conversion Rate) with clear denominators.
- Model relationships: Star schema preferred. Avoid many-to-many unless justified and controlled.
- Name and document: Business names, tooltips/descriptions, number formats, units.
- Add guardrails: Default filters, hidden helper columns, role-playing dates, RLS if needed.
- Test with real user slices: Try typical filter combos and edge cases (empty slices, late-arriving data).
Worked examples
Example 1: Ecommerce Sales
Goal: Self-serve questions like “Revenue by week by channel” and “AOV by country.”
- Fact: fact_sales at order_line grain (order_id, line_id, product_id, customer_id, quantity, unit_price, discounts, order_timestamp)
- Dimensions: dim_date, dim_customer, dim_product, dim_geography, dim_channel
- Measures:
- Total Sales = SUM(quantity * unit_price) - SUM(discounts)
- Orders = DISTINCTCOUNT(order_id)
- AOV = [Total Sales] / [Orders]
- Units = SUM(quantity)
- Relationships: One-to-many from each dimension to fact. Role-playing dates: order_date, ship_date
- Guardrails: Format currency with symbol; hide unit_price raw if exposing Total Sales; default filter to last 12 months
Why this works
Order-line grain avoids double counting when slicing by product. AOV depends on distinct orders, not lines, so it stays consistent across slices. Role-playing dates support both order and fulfillment analyses.
Example 2: Marketing Performance
Goal: “What’s CAC by channel?” and “Which campaigns drive highest CTR over time?”
- Fact: fact_campaign_performance at daily-campaign grain (date, campaign_id, channel, impressions, clicks, cost, signups)
- Dimensions: dim_date, dim_campaign, dim_channel
- Measures:
- CTR = DIVIDE(SUM(clicks), SUM(impressions))
- CPC = DIVIDE(SUM(cost), SUM(clicks))
- CAC = DIVIDE(SUM(cost), SUM(signups))
- Guardrails: Use DIVIDE-like safe division to avoid divide-by-zero; format CTR as percentage with 2 decimals; ensure channel is a dimension, not free text
Why this works
Daily-campaign grain allows trending and avoids mixing daily and weekly data. Ratios are consistent since both numerator and denominator aggregate at the same grain.
Example 3: Product Usage (Events)
Goal: “DAU/WAU/MAU by plan” and “Feature adoption by cohort.”
- Fact: fact_events at event grain (event_id, user_id, event_name, event_timestamp, product_id)
- Dimensions: dim_date, dim_user (with plan), dim_product, dim_feature
- Measures:
- DAU = DISTINCTCOUNT(user_id filtered to last 1 day)
- WAU = DISTINCTCOUNT(user_id in last 7 days)
- MAU = DISTINCTCOUNT(user_id in last 30 days)
- Feature Adoption = DISTINCTCOUNT(users who triggered feature) / DISTINCTCOUNT(active users)
- Guardrails: Use a calendar table to define rolling windows; ensure event timestamps map to date keys; hide raw event_id from end users
Key patterns and guardrails
- Star schema: One fact, many dimensions. Prefer this for self-serve.
- Conformed dimensions: Reuse Date/Customer/Product across facts to enable cross-subject analysis.
- Single source of truth for measures: Define once, reference everywhere; avoid calculated columns for aggregations if your tool supports measures.
- Time intelligence: Predefine YoY, MoM, rolling windows with clear labels and consistent filters.
- Role-playing dimensions: Multiple date roles (order, ship, signup) from the same Date table.
- Security: Apply row-level security on dimensions (e.g., region) and test with role personas.
- Performance: Summaries/aggregations for large facts, hide high-cardinality columns from visuals unless needed, enable incremental refresh where available.
Who this is for
BI Developers, Analytics Engineers, and Data Analysts who need to make reliable, reusable metrics and user-friendly data models for business stakeholders.
Prerequisites
- Comfort with star schemas (facts/dimensions) and data types
- Basic SQL
- Familiarity with at least one BI tool’s modeling concepts (measures, relationships)
Learning path
- Define business questions and metrics catalog.
- Design star schemas with clear grain and conformed dimensions.
- Create measures and time intelligence logic.
- Add relationships, role-playing dimensions, and formats.
- Implement security and performance features.
- User test with typical slices; iterate based on feedback.
Exercises
Do these in your preferred BI tool or on paper. Aim for clarity and consistency.
Exercise 1: Model a Subscription Business
Design a semantic model to answer: “MRR by plan by month,” “Churn rate by cohort,” and “ARPA by region.”
- Identify facts, dimensions, and grain.
- List measures and their exact formulas.
- Describe relationships and any role-playing dates.
When done, compare with the solution in the Exercises panel below.
Exercise 2: Fix a Risky Model
You inherit a model with a single giant table combining orders, customers, and campaigns. Users report double-counted revenue when slicing by product and campaign.
- Rewrite the model as facts/dimensions.
- Propose at least five concrete improvements that prevent double counting and improve usability.
Open: Solutions and hints for Exercises 1–2
Exercise 1 solution outline
- Facts: fact_subscriptions (subscription_id, customer_id, plan_id, start_date, end_date, status), fact_invoices (invoice_id, customer_id, amount, invoice_date), fact_mrr_snapshots at month-end grain
- Dimensions: dim_date, dim_customer (region), dim_plan (price, tier)
- Measures:
- MRR = SUM(mrr_amount in fact_mrr_snapshots)
- Churned Customers = COUNT of customers with status = churned in month
- Churn Rate = Churned Customers / Customers at start of month
- ARPA = SUM(invoice amount) / DISTINCTCOUNT(active paying accounts)
- Relationships: One-to-many from dimensions to facts; role-playing dates for start_date and end_date from dim_date
- Guardrails: Use snapshot fact for MRR to avoid re-computing from events; format currency; use safe division
Exercise 2 solution outline
- Split into fact_sales (order_line grain), fact_campaign_performance (daily), and dimensions: dim_customer, dim_product, dim_channel, dim_date
- Measures defined on facts (Total Sales, Orders, AOV)
- Conformed dim_date connects both facts for aligned time analysis
- Improvements:
- Eliminate many-to-many by using proper fact-dim joins
- Protect totals with distinct counts and line-grain sales
- Make channel a dimension with controlled values
- Add role-playing dates (order vs ship)
- Hide raw columns that confuse users; expose curated measures
- Set default filter to relevant time window
Self-check checklist
- Every fact has a single, documented grain.
- Dimensions are conformed and reused; no duplicated Date tables without purpose.
- Measures have clear denominators and safe division.
- Names are business-friendly; technical helpers are hidden.
- Time intelligence is correct when slicing by any dimension.
- No double counting when combining facts; if needed, use separate visuals or bridge tables intentionally.
Common mistakes and how to self-check
- Mixed granularity in a single fact: Fix by choosing the most atomic stable grain; move aggregates to separate summary tables.
- Ratios on calculated columns: Prefer measures so filters recalculate correctly.
- Ambiguous relationships: Avoid bi-directional or many-to-many unless you fully understand the impact; document and test edge cases.
- Inconsistent definitions: Maintain a metric catalog; one definition per metric.
- No role-playing dates: Use distinct relationships for order/ship/signup dates via the same Date table.
- Exposing everything: Hide helper columns; keep the user-facing set minimal and labeled.
Practical projects
- Sales Self-Serve Pack: Build a star schema and a semantic model with 8–10 measures (Revenue, Orders, AOV, YoY Revenue, Units). Include role-playing order/ship dates and a 12-month default filter.
- Marketing Efficiency Model: Daily campaign fact with CTR, CPC, CAC. Add plan-level RLS and verify with two persona test accounts (Manager vs Analyst).
- Product Adoption Dashboard Model: Event-grain fact, DAU/WAU/MAU measures, feature adoption ratio. Add formatted measures and hide raw IDs.
Mini challenge
In one paragraph, define a new measure “Contribution Margin” for ecommerce. Include formula components, where they live (fact/dimension), and formatting. Then list 3 slices a business user should try first to validate it.
Quick test
Short quiz to check your understanding. Note: The quick test is available to everyone; if you log in, your progress will be saved.
Next steps
- Refine your metric catalog and add time intelligence variants.
- Introduce row-level security for one dimension and test with sample roles.
- Create a short user guide describing available measures and recommended slicing paths.