luvv to helpDiscover the Best Free Online Tools
Topic 1 of 8

Warehouse Concepts And Architectures

Learn Warehouse Concepts And Architectures for free with explanations, exercises, and a quick test (for Data Engineer).

Published: January 8, 2026 | Updated: January 8, 2026

Why this matters

As a Data Engineer, you will be asked to design data models, pick warehouse architectures, and build pipelines that keep analytics fast, reliable, and cost-effective. You’ll decide how facts and dimensions are shaped, how slowly changing attributes are tracked, and how data flows from raw to analytics-ready layers. Solid warehouse concepts prevent rework, broken dashboards, and unscalable costs.

  • Real tasks: choose star vs snowflake for a new product analytics mart
  • Real tasks: plan staging, core, and presentation layers for a cloud warehouse
  • Real tasks: implement slowly changing dimensions (SCD) and conformed dimensions
  • Real tasks: balance cost, performance, and freshness SLAs

Who this is for

  • Aspiring/early-career Data Engineers
  • Analysts transitioning to data modeling
  • Software engineers supporting analytics platforms

Prerequisites

  • Basic SQL (SELECT, JOIN, GROUP BY)
  • Understanding of files and tables (CSV/Parquet, row vs column stores)
  • High-level ETL/ELT familiarity

Learning path

  1. Understand OLTP vs OLAP and warehouse layers
  2. Learn star/snowflake, grain, facts, dimensions, SCD
  3. Compare Kimball, Inmon, Data Vault, and lakehouse/medallion approaches
  4. Model a small domain and choose an architecture
  5. Optimize for partitioning, clustering, and cost/freshness SLAs

Concept explained simply

A data warehouse is a curated place where business events (facts) and reference entities (dimensions) are modeled so analysts can ask questions quickly and consistently. You shape raw data into clear, joinable tables, keep history when needed, and publish stable data marts for BI.

Mental model

The city analogy

Think of the warehouse like a city:

  • Staging = warehouses on the docks; everything arrives as-is
  • Core = factories; raw goods are standardized and quality-checked
  • Presentation (data marts) = storefronts; products are arranged for easy shopping
  • Facts = receipts of transactions; Dimensions = catalogs describing products/customers
  • Conformed dimensions = shared catalogs used by many stores

Core concepts

OLTP vs OLAP

  • OLTP: frequent small writes, normalized, supports apps (orders, payments)
  • OLAP: fewer large reads/aggregations, denormalized for analytics

Warehouse layers

  • Staging (raw/bronze): minimal transforms, schema-on-read acceptable
  • Core (standardized/silver): conformed entities, business keys, data quality
  • Presentation (gold/marts): star schemas optimized for BI

Schemas and modeling

  • Star schema: central fact table with dimension tables; simplest for BI
  • Snowflake: normalized dimensions; reduces duplication, more joins
  • Data Vault: Hubs (business keys), Links (relationships), Satellites (context/history); scalable, audit-friendly
  • 3NF (Inmon): enterprise integrated model, then marts for consumption

Grain, facts, and dimensions

  • Grain: the exact level a fact table represents (e.g., one line item per order)
  • Fact table: numeric measures at a set grain, with foreign keys to dimensions
  • Dimension table: descriptive attributes; often with surrogate keys
  • Conformed dimensions: reused across marts for consistent metrics

Slowly Changing Dimensions (SCD)

  • Type 1: overwrite; no history
  • Type 2: new row per change with effective dates and current flag
  • Type 3: limited history via prior columns for selective attributes

Processing styles

  • ETL: transform before load; classic on-prem style
  • ELT: load first, transform in-warehouse; common in cloud
  • Batch vs streaming: choose based on SLA and cost

Modern architectures

  • Lakehouse/Medallion: data lake + warehouse features; bronze/silver/gold layers
  • Warehouse-centric: curated warehouse plus external staging
  • Hybrid: vault core + star marts

Performance and cost levers

  • Columnar storage for scans; partitioning and clustering for pruning
  • Materialized views for common aggregates
  • Auditability and lineage for trust
  • Cost controls: data pruning, incremental loads, right-sizing compute

Worked examples

Example 1: E-commerce product analytics (Kimball/star)

  1. Business questions: daily revenue by product, conversion by campaign
  2. Choose star: FactOrderLine (grain: one line item), DimProduct, DimCustomer, DimDate, DimChannel
  3. Keys: surrogate keys in dims, foreign keys in fact
  4. SCD: DimProduct Type 2 for category changes
  5. Performance: partition FactOrderLine by order_date, cluster by product_id
Why star here?

Analysts need fast aggregates and simple joins. Conformed DimDate and DimChannel enable cross-domain reporting.

Example 2: Finance consolidation (Inmon core + marts)

  1. Business need: unified chart-of-accounts across subsidiaries
  2. Core (3NF): Account, Entity, LedgerEntry with harmonized business keys
  3. Downstream: build star marts for P&L and Balance Sheet
  4. Governance: lineage and data quality rules at core
Why Inmon-like?

Heavy standardization and integration across systems, strict governance, and multiple downstream marts benefit from a centralized normalized core.

Example 3: Regulated healthcare (Data Vault core + marts)

  1. Need: auditability, late-arriving data, flexible history
  2. Vault: Hubs for Patient, Provider; Links for Encounters; Satellites for attributes
  3. Presentation: derive star marts for claims analytics
  4. Benefits: scalable change management and traceability
Why Vault?

Vault separates keys, relationships, and context. It simplifies handling schema drift and regulatory audits.

Architecture selection shortcuts

Quick guidance
  • Fast BI with stable domains: Star/snowflake marts (Kimball)
  • Enterprise integration and governance first: Inmon (3NF core) then marts
  • High change rate, mixed sources, strong audit: Data Vault core + marts
  • Data lake first, broad file ingestion: Lakehouse medallion with gold marts

It’s common to mix: Vault or 3NF in core; star in presentation.

Exercises

Complete the exercise below. You can take the quick test at the end. Everyone can take the test for free; logged-in users will have their progress saved.

Exercise 1: Design a simple analytics warehouse for an online store

Goal: choose an architecture and model for sales analytics.

  1. Pick an overall approach (Kimball star, Inmon core, Data Vault, or lakehouse) and justify in 2-3 sentences.
  2. Define the grain of your main fact table.
  3. List at least 4 dimensions with key attributes and identify which ones need SCD Type 2.
  4. Specify partitioning/clustering strategy for the fact table.
  5. Write 2 sample queries analysts can run on your model.
Acceptance checklist
  • Clear architecture choice and rationale
  • Fact grain stated precisely
  • Dimensions listed with surrogate keys and SCD decisions
  • Partitioning/clustering described
  • Two realistic queries included

Common mistakes and self-check

  • Unclear grain: leads to double counting. Self-check: can you state a single sentence “one row = …”?
  • Missing surrogate keys in dimensions: breaks SCD Type 2 joins. Self-check: do dims have stable surrogate keys?
  • Too many snowflake levels: hurts performance. Self-check: are joins minimal for core dashboards?
  • Overusing Type 2: bloats tables. Self-check: which attributes truly matter historically?
  • No conformed dimensions: inconsistent metrics. Self-check: are Date, Customer, Product reused across marts?
  • Ignoring pruning: no partitioning/clustering. Self-check: do the largest facts partition by date or ingestion time?

Practical projects

  • Sales mart: build a star schema from public retail-like data; include DimDate, DimProduct, DimCustomer; implement SCD2 for product category.
  • Vault-to-star pipeline: model a small Data Vault (Hubs, Links, Satellites), then derive a star mart and compare row counts and lineage.
  • Lakehouse medallion: ingest CSV to bronze, standardize to silver with quality checks, and publish a gold star mart with two materialized views.

Mini challenge

You inherit a mart where FactOrders has mixed grains (order-level and line-level). Propose a two-step remediation plan that fixes grain and preserves existing dashboards with minimal disruption.

Hint

Create a new FactOrderLine with correct grain and a view for backward compatibility while migrating dashboards.

Next steps

  • Model another domain (e.g., marketing) and ensure conformed dimensions with sales
  • Add SCD2 to one more dimension and validate joins
  • Benchmark queries with different partition/cluster strategies

Reference notes

  • It’s normal to combine approaches: a governed core (Vault/3NF) feeding star marts
  • Choose ETL vs ELT by tooling, cost, and skills; cloud favors ELT
  • Keep models explainable to stakeholders; clarity beats cleverness

Practice Exercises

1 exercises to complete

Instructions

Choose an architecture and produce a minimal design for sales analytics.

  1. Select an approach (Kimball star, Inmon, Data Vault, or lakehouse) and justify it in 2–3 sentences.
  2. Define the grain of the central fact table.
  3. List at least 4 dimensions with key attributes and identify SCD Type 2 candidates.
  4. Propose partitioning/clustering for the fact.
  5. Write two example analyst queries.
  • Deliverable: a short design doc (bullets are fine).
  • Timebox: 45–60 minutes.
Expected Output
A concise design doc covering architecture choice, fact grain statement, dimension list with SCD decisions, partition/cluster strategy, and two SQL query examples.

Warehouse Concepts And Architectures — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.

10 questions70% to pass

Have questions about Warehouse Concepts And Architectures?

AI Assistant

Ask questions about this tool