How to learn Model Layers Staging Intermediate Marts for dbt Concepts in Analytics Engineer for free

Why this matters

Analytics Engineers shape raw data into trustworthy, business-ready tables. Layering your dbt models into staging, intermediate, and marts makes your project predictable, easy to debug, and fast to iterate. In real work, you will: standardize messy source fields, join and de-duplicate data, and publish clean dimensions and facts for BI dashboards, finance reports, and product analytics.

Make upstream changes low-risk by isolating them in staging.
Keep business logic organized in intermediate so its testable and reusable.
Ship reliable, well-documented marts that analysts can trust.

Concept explained simply

Think of a three-step assembly line:

Staging (stg_): lightly clean one source at a time. Rename, cast types, basic filters. No joins.
Intermediate (int_): combine multiple staging models, de-duplicate, compute reusable features and grains.
Marts (dim_/fct_): publish final business tables for reporting: dimensions (one row per entity) and facts (one row per event/measure).

Mental model

Kitchen metaphor: staging is your pantry prep (wash, label), intermediate is your chopping station (mix and pre-cook), marts are plated dishes ready to serve to stakeholders.

source tables > stg_*  > int_*  > dim_*/fct_*
(light clean)      (joins, logic)     (final for BI)

Layer roles and conventions

Staging (stg_*)
- Goal: one source per model, tidy columns.
- Common actions: rename columns, cast types, trim/clean strings, simple filters, de-duplicate only if its clearly source-level noise.
- Folders: models/staging/<source_system>/
- Materialization: usually view or ephemeral for speed and clarity.
- Tests: not_null/unique on keys, accepted_values on enums.
Intermediate (int_*)
- Goal: reusable business logic and shaping.
- Common actions: joins across stg models, de-dupes, surrogate keys, entity resolution, re-graining (to daily/customer/order grain), feature computation.
- Folders: models/intermediate/
- Materialization: view for light logic; table for heavy logic; ephemeral if used only once and cheap.
Marts (dim_*, fct_*)
- Goal: final curated tables for BI/analytics.
- Common actions: finalize grain, select only needed fields, apply naming that analysts expect, add documentation.
- Folders: models/marts/<domain>/
- Materialization: table or incremental for large facts; view for small dims.
- Tests: not_null/unique on natural/surrogate keys; referential integrity between facts and dimensions.

Worked examples

Example 1 Staging a single source (orders)

Create a staging model that cleans names and types from a raw orders source.

-- models/staging/shop/stg_orders.sql
with src as (
  select * from {{ source('shop', 'orders') }}
)
select
  cast(order_id as bigint)        as order_id,
  cast(customer_id as bigint)     as customer_id,
  upper(trim(status))             as order_status,
  cast(order_timestamp as timestamp) as order_ts,
  cast(amount_usd as numeric(12,2)) as amount_usd
from src
where order_id is not null;

# models/staging/shop/stg_orders.yml
version: 2
models:
  - name: stg_orders
    tests: []
    columns:
      - name: order_id
        tests: [not_null, unique]
      - name: order_status
        tests:
          - accepted_values:
              values: ['PLACED','PAID','CANCELLED','REFUNDED']

Example 2 Intermediate join and re-grain

Join orders to customers and compute reusable metrics per customer.

-- models/intermediate/int_customer_orders.sql
with orders as (
  select * from {{ ref('stg_orders') }}
), customers as (
  select
    cast(customer_id as bigint) as customer_id,
    lower(trim(email))          as email_lower,
    first_name,
    last_name
  from {{ ref('stg_customers') }}
), joined as (
  select
    c.customer_id,
    c.email_lower,
    o.order_id,
    o.order_status,
    o.order_ts::date as order_date,
    o.amount_usd
  from orders o
  left join customers c using (customer_id)
)
select
  customer_id,
  email_lower,
  count(*) filter (where order_status in ('PAID','PLACED')) as order_count,
  sum(case when order_status = 'PAID' then amount_usd else 0 end) as total_revenue,
  min(order_date) as first_order_date,
  max(order_date) as last_order_date
from joined
group by 1,2;

Example 3 Publish marts: a dimension and a fact

-- models/marts/sales/dim_customer.sql
select
  customer_id,
  email_lower as email,
  first_order_date,
  last_order_date,
  order_count,
  total_revenue
from {{ ref('int_customer_orders') }};

-- models/marts/sales/fct_orders_daily.sql
with orders as (
  select * from {{ ref('stg_orders') }}
)
select
  order_ts::date       as date_day,
  order_id,
  customer_id,
  order_status,
  amount_usd
from orders
where order_status in ('PLACED','PAID','REFUNDED');

# models/marts/sales/schema.yml
version: 2
models:
  - name: dim_customer
    columns:
      - name: customer_id
        tests: [not_null, unique]
  - name: fct_orders_daily
    columns:
      - name: order_id
        tests: [not_null, unique]
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customer')
              field: customer_id

Materializations by layer

stg_*: view or ephemeral. Prefer view if you want queryable checkpoints for debugging.
int_*: view for light logic; table for heavy transforms or expensive joins; ephemeral when only used once and cheap.
dim_*/fct_*: table by default; incremental for large, append-like facts to reduce runtime.

Rule of thumb

If its reused by multiple downstream models, make it persistent (view/table).
If its large and stable daily, consider incremental.
If its tiny glue logic used only once, ephemeral is fine.

Quality checks and tests by layer

Keep tests close to where the problem originates.

Staging: validate source integrity (keys, enums, basic de-dup).
Intermediate: check grains and keys after joins (unique + not_null).
Marts: enforce relationships and documented fields.

# example unique grain test in intermediate
version: 2
models:
  - name: int_customer_orders
    columns:
      - name: customer_id
        tests: [not_null]
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns: [customer_id]

Your turn: Exercises

Practice the full flow from staging to marts. The quick test is below and always available. Note: everyone can take it; only logged-in learners have progress saved.

Exercise ex1 Build clean staging models for orders and customers.
Exercise ex2 Create an intermediate join and publish dim and fct models.

Checklist before you move on

All staging models reference sources using {{ source() }} and contain only one source each.
Intermediate models define a clear grain and pass uniqueness tests.
Marts expose only needed columns with friendly names.
Key columns have not_null and unique tests; facts relate to dimensions.

Common mistakes and self-check

Joining in staging: move joins to intermediate.
Unclear grain in intermediate: add a unique grain test and comment the intended grain at the top of the model.
Leaking technical columns into marts: select only business-friendly columns in dim_/fct_.
Missing relationships: add relationships tests from facts to dims.
Too-heavy views: if a view is slow and widely used, materialize as table or incremental.

Fast self-check routine

Row counts: stg rows should be close to source (after obvious filters); int rows should match intended grain; fct rows should match event count.
Dupes: run select key, count(*) from model group by 1 having count(*) > 1.
Documentation: can a new analyst understand column names without opening the SQL?

Practical projects

Project 1: E-commerce core

Staging: stg_orders, stg_customers, stg_products.
Intermediate: int_customer_orders, int_product_sales.
Marts: dim_customer, dim_product, fct_orders_daily.

Project 2: Subscription analytics

Staging: stg_subscriptions, stg_invoices, stg_payments.
Intermediate: int_subscription_status_daily (derive active/cancelled per day).
Marts: dim_account, fct_mrr_daily (incremental).

Project 3: Marketing attribution (simple)

Staging: stg_sessions, stg_utm_campaigns.
Intermediate: int_first_touch_by_user.
Marts: dim_user, fct_attributed_signups.

Who this is for

Analytics Engineers and Data Analysts building reliable BI layers.
Data Engineers organizing ELT pipelines for downstream analytics.

Prerequisites

Comfort with SQL SELECT/JOIN/GROUP BY.
Basic dbt familiarity: ref(), source(), models, schema.yml, tests.

Learning path

Review sources and staging conventions.
Practice defining grains and building intermediate joins.
Publish marts and add tests.
Optimize materializations (view/table/incremental).

Next steps

Add documentation blocks and descriptions to your marts.
Introduce incremental logic to large facts.
Set up CI to run tests on pull requests.

Mini challenge

Your product team needs a daily active users fact. Create stg_events, build int_user_activity_daily, and publish fct_dau with one row per user per day. Add tests to ensure (user_id, date_day) is unique.

Progress note: The quick test is available to everyone; sign in to save your progress and scores.

Instructions

Create two staging models that each reference a single source and apply light cleaning.

Files: models/staging/shop/stg_orders.sql and models/staging/shop/stg_customers.sql.
stg_orders: select from {{ source('shop','orders') }}, cast order_id and customer_id to bigint, upper-case status as order_status, cast order_timestamp to timestamp as order_ts, cast amount_usd to numeric(12,2), filter null order_id.
stg_customers: select from {{ source('shop','customers') }}, cast customer_id to bigint, lower+trim email to email_lower, keep first_name, last_name.
Add tests in a YAML file (models/staging/shop/stg_models.yml): unique+not_null on order_id and customer_id; accepted_values for order_status.

Menu

Model Layers Staging Intermediate Marts

Table of Contents