Why this matters
Analytics Engineers shape raw data into trustworthy, business-ready tables. Layering your dbt models into staging, intermediate, and marts makes your project predictable, easy to debug, and fast to iterate. In real work, you will: standardize messy source fields, join and de-duplicate data, and publish clean dimensions and facts for BI dashboards, finance reports, and product analytics.
- Make upstream changes low-risk by isolating them in staging.
- Keep business logic organized in intermediate so its testable and reusable.
- Ship reliable, well-documented marts that analysts can trust.
Concept explained simply
Think of a three-step assembly line:
- Staging (stg_): lightly clean one source at a time. Rename, cast types, basic filters. No joins.
- Intermediate (int_): combine multiple staging models, de-duplicate, compute reusable features and grains.
- Marts (dim_/fct_): publish final business tables for reporting: dimensions (one row per entity) and facts (one row per event/measure).
Mental model
Kitchen metaphor: staging is your pantry prep (wash, label), intermediate is your chopping station (mix and pre-cook), marts are plated dishes ready to serve to stakeholders.
source tables > stg_* > int_* > dim_*/fct_*
(light clean) (joins, logic) (final for BI)
Layer roles and conventions
- Staging (stg_*)
- Goal: one source per model, tidy columns.
- Common actions: rename columns, cast types, trim/clean strings, simple filters, de-duplicate only if its clearly source-level noise.
- Folders: models/staging/<source_system>/
- Materialization: usually view or ephemeral for speed and clarity.
- Tests: not_null/unique on keys, accepted_values on enums.
- Intermediate (int_*)
- Goal: reusable business logic and shaping.
- Common actions: joins across stg models, de-dupes, surrogate keys, entity resolution, re-graining (to daily/customer/order grain), feature computation.
- Folders: models/intermediate/
- Materialization: view for light logic; table for heavy logic; ephemeral if used only once and cheap.
- Marts (dim_*, fct_*)
- Goal: final curated tables for BI/analytics.
- Common actions: finalize grain, select only needed fields, apply naming that analysts expect, add documentation.
- Folders: models/marts/<domain>/
- Materialization: table or incremental for large facts; view for small dims.
- Tests: not_null/unique on natural/surrogate keys; referential integrity between facts and dimensions.
Worked examples
Example 1 Staging a single source (orders)
Create a staging model that cleans names and types from a raw orders source.
-- models/staging/shop/stg_orders.sql
with src as (
select * from {{ source('shop', 'orders') }}
)
select
cast(order_id as bigint) as order_id,
cast(customer_id as bigint) as customer_id,
upper(trim(status)) as order_status,
cast(order_timestamp as timestamp) as order_ts,
cast(amount_usd as numeric(12,2)) as amount_usd
from src
where order_id is not null;
# models/staging/shop/stg_orders.yml
version: 2
models:
- name: stg_orders
tests: []
columns:
- name: order_id
tests: [not_null, unique]
- name: order_status
tests:
- accepted_values:
values: ['PLACED','PAID','CANCELLED','REFUNDED']
Example 2 Intermediate join and re-grain
Join orders to customers and compute reusable metrics per customer.
-- models/intermediate/int_customer_orders.sql
with orders as (
select * from {{ ref('stg_orders') }}
), customers as (
select
cast(customer_id as bigint) as customer_id,
lower(trim(email)) as email_lower,
first_name,
last_name
from {{ ref('stg_customers') }}
), joined as (
select
c.customer_id,
c.email_lower,
o.order_id,
o.order_status,
o.order_ts::date as order_date,
o.amount_usd
from orders o
left join customers c using (customer_id)
)
select
customer_id,
email_lower,
count(*) filter (where order_status in ('PAID','PLACED')) as order_count,
sum(case when order_status = 'PAID' then amount_usd else 0 end) as total_revenue,
min(order_date) as first_order_date,
max(order_date) as last_order_date
from joined
group by 1,2;
Example 3 Publish marts: a dimension and a fact
-- models/marts/sales/dim_customer.sql
select
customer_id,
email_lower as email,
first_order_date,
last_order_date,
order_count,
total_revenue
from {{ ref('int_customer_orders') }};
-- models/marts/sales/fct_orders_daily.sql
with orders as (
select * from {{ ref('stg_orders') }}
)
select
order_ts::date as date_day,
order_id,
customer_id,
order_status,
amount_usd
from orders
where order_status in ('PLACED','PAID','REFUNDED');
# models/marts/sales/schema.yml
version: 2
models:
- name: dim_customer
columns:
- name: customer_id
tests: [not_null, unique]
- name: fct_orders_daily
columns:
- name: order_id
tests: [not_null, unique]
- name: customer_id
tests:
- relationships:
to: ref('dim_customer')
field: customer_id
Materializations by layer
- stg_*: view or ephemeral. Prefer view if you want queryable checkpoints for debugging.
- int_*: view for light logic; table for heavy transforms or expensive joins; ephemeral when only used once and cheap.
- dim_*/fct_*: table by default; incremental for large, append-like facts to reduce runtime.
Rule of thumb
- If its reused by multiple downstream models, make it persistent (view/table).
- If its large and stable daily, consider incremental.
- If its tiny glue logic used only once, ephemeral is fine.
Quality checks and tests by layer
Keep tests close to where the problem originates.
- Staging: validate source integrity (keys, enums, basic de-dup).
- Intermediate: check grains and keys after joins (unique + not_null).
- Marts: enforce relationships and documented fields.
# example unique grain test in intermediate
version: 2
models:
- name: int_customer_orders
columns:
- name: customer_id
tests: [not_null]
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns: [customer_id]
Your turn: Exercises
Practice the full flow from staging to marts. The quick test is below and always available. Note: everyone can take it; only logged-in learners have progress saved.
- Exercise ex1 Build clean staging models for orders and customers.
- Exercise ex2 Create an intermediate join and publish dim and fct models.
Checklist before you move on
- All staging models reference sources using {{ source() }} and contain only one source each.
- Intermediate models define a clear grain and pass uniqueness tests.
- Marts expose only needed columns with friendly names.
- Key columns have not_null and unique tests; facts relate to dimensions.
Common mistakes and self-check
- Joining in staging: move joins to intermediate.
- Unclear grain in intermediate: add a unique grain test and comment the intended grain at the top of the model.
- Leaking technical columns into marts: select only business-friendly columns in dim_/fct_.
- Missing relationships: add relationships tests from facts to dims.
- Too-heavy views: if a view is slow and widely used, materialize as table or incremental.
Fast self-check routine
- Row counts: stg rows should be close to source (after obvious filters); int rows should match intended grain; fct rows should match event count.
- Dupes: run select key, count(*) from model group by 1 having count(*) > 1.
- Documentation: can a new analyst understand column names without opening the SQL?
Practical projects
Project 1: E-commerce core
- Staging: stg_orders, stg_customers, stg_products.
- Intermediate: int_customer_orders, int_product_sales.
- Marts: dim_customer, dim_product, fct_orders_daily.
Project 2: Subscription analytics
- Staging: stg_subscriptions, stg_invoices, stg_payments.
- Intermediate: int_subscription_status_daily (derive active/cancelled per day).
- Marts: dim_account, fct_mrr_daily (incremental).
Project 3: Marketing attribution (simple)
- Staging: stg_sessions, stg_utm_campaigns.
- Intermediate: int_first_touch_by_user.
- Marts: dim_user, fct_attributed_signups.
Who this is for
- Analytics Engineers and Data Analysts building reliable BI layers.
- Data Engineers organizing ELT pipelines for downstream analytics.
Prerequisites
- Comfort with SQL SELECT/JOIN/GROUP BY.
- Basic dbt familiarity: ref(), source(), models, schema.yml, tests.
Learning path
- Review sources and staging conventions.
- Practice defining grains and building intermediate joins.
- Publish marts and add tests.
- Optimize materializations (view/table/incremental).
Next steps
- Add documentation blocks and descriptions to your marts.
- Introduce incremental logic to large facts.
- Set up CI to run tests on pull requests.
Mini challenge
Your product team needs a daily active users fact. Create stg_events, build int_user_activity_daily, and publish fct_dau with one row per user per day. Add tests to ensure (user_id, date_day) is unique.
Progress note: The quick test is available to everyone; sign in to save your progress and scores.