Why this matters
Refs and the dependency graph are the heart of how dbt knows what to build and in what order. In real Analytics Engineering work you will:
- Build reliable pipelines where upstream fixes automatically trigger correct downstream rebuilds.
- Avoid manual ordering or fragile SQL that breaks across environments.
- Run only what changed and its dependents to speed up CI and production deploys.
Progress saving note: The quick test on this page is available to everyone. Only logged-in users will have their progress saved.
Concept explained simply
ref() tells dbt: “this model depends on that model.” dbt uses all your ref() calls to build a Directed Acyclic Graph (DAG). Then it runs models in a safe order from leaves to roots.
ref('model_name'): references another model in the same project.ref('package_name','model_name'): references a model from another package.source('schema','table'): references raw data that dbt does not build.
Materializations (table, view, incremental, ephemeral) affect how a node is built, but the graph still comes from ref() and source().
Mental model
Imagine plumbing in a building. Each model is a fixture. Pipes are ref() connections. dbt looks at all the pipes and figures out the order to open valves safely so water flows correctly. Circular pipes (A feeds B and B feeds A) cannot work—dbt will stop you.
How dbt builds and uses the graph
- Parses your project and packages to find models, seeds, snapshots, tests, and their SQL.
- Creates nodes for these assets.
- Creates edges when it sees
ref()andsource()calls. - Validates the DAG (no cycles) and plans a topological run order.
- Executes only the selected slice of the graph (using selectors like
+and@).
Peek under the hood
- Ephemeral models don’t persist but still show in the DAG. They compile into CTEs within the downstream SQL.
- Dynamic ref names (building a name with logic) are possible but discouraged—they make parsing and selection less predictable.
- Cross-package refs keep your models loosely coupled while remaining dependency-aware.
Worked examples
Example 1 — Classic staging to mart pipeline
Files:
-- models/staging/stg_orders.sql
select * from {{ source('app', 'orders') }}
-- models/intermediate/int_customer_orders.sql
select
o.customer_id,
o.order_id,
o.order_total
from {{ ref('stg_orders') }} o
-- models/marts/fct_revenue_by_customer.sql
select
customer_id,
sum(order_total) as revenue
from {{ ref('int_customer_orders') }}
group by 1
Resulting order: source(app.orders) → stg_orders → int_customer_orders → fct_revenue_by_customer.
Example 2 — Ephemeral joiner
-- models/helpers/join_customers_orders.sql
{{ config(materialized='ephemeral') }}
select c.customer_id, c.customer_tier, o.order_id, o.order_total
from {{ ref('stg_customers') }} c
join {{ ref('stg_orders') }} o using (customer_id)
-- models/marts/fct_customer_value.sql
select customer_tier, sum(order_total) as revenue
from {{ ref('join_customers_orders') }}
group by 1
Note: The ephemeral model compiles into a CTE inside fct_customer_value, but it still appears in the DAG for selection and lineage.
Example 3 — Cross-package reference
-- In your project, referencing a model from package 'core_dims'
select *
from {{ ref('core_dims', 'dim_date') }} d
join {{ ref('int_customer_orders') }} o on d.date = o.order_date
Why: Keeps shared dimensions centralized while preserving dependency tracking.
Selector quick demo
@fct_revenue_by_customer: select the model plus all ancestors and descendants (entire neighborhood).+fct_revenue_by_customer: include immediate parents and children only.- Combine with tags or paths to target only what matters in CI.
Exercises
These mirror the interactive exercises below. Try here first, then check the exercise blocks for sample solutions.
- ex1: Create a 3-model chain using
ref()that aggregates revenue by customer. Show the expected topological order. - ex2: You discover a circular dependency between two intermediate models. Break the cycle by introducing a proper staging split.
- ex3: Choose selectors to rebuild one mart and everything it depends on, without touching unrelated marts.
Need a hint?
- Think layer-first: staging → intermediate → marts.
- Ephemeral is great for small, reusable join logic.
- Use
@your_modelto grab the full ancestry and descendancy.
Self-check checklist
- All downstream models reference upstream ones via
ref()(no hard-coded schemas). - No circular dependencies remain.
- Your DAG reads left-to-right from raw sources to marts.
- Selectors you’d use in CI are predictable and minimal.
Common mistakes and how to self-check
- Hard-coding database/schema/table names instead of using
ref()andsource(). Self-check: search for quoted schema.table patterns and replace with refs. - Creating circular dependencies by letting two models depend on each other. Self-check: if two models both need each other’s outputs, extract the shared logic into a new upstream model.
- Overusing dynamic ref names (e.g., building names with macros). Self-check: can you easily select just the affected models with standard selectors? If not, simplify.
- Mixing raw and cleaned data in one model. Self-check: ensure raw inputs come from
source()only, then move transformations intostg_orint_layers. - Ignoring ephemeral cost in heavy queries. Self-check: if a compiled model becomes too large or slow, consider materializing to
viewortable.
Who this is for
- Analytics Engineers and BI Developers building modular SQL transformations.
- Data Analysts moving from one-off SQL to reusable, versioned models.
Prerequisites
- Comfort writing SELECT queries and basic joins.
- Basic understanding of dbt projects (models, profiles, materializations).
- Know what sources, seeds, and tests are at a high level.
Learning path
- Review sources vs. models: use
source()for raw andref()for modeled data. - Practice layered modeling: staging → intermediate → marts.
- Add ephemeral helpers where appropriate.
- Adopt selectors in your daily workflow and CI.
Practical projects
- Refactor an existing SQL pipeline into dbt models with clear staging, intermediate, and mart layers using
ref(). - Create a small shared package with a
dim_datemodel and ref it from another project. - Set up a CI plan that runs only modified models and their dependents using selectors.
Next steps
- Add tests (unique, not_null) to upstream nodes so issues fail fast.
- Adopt naming conventions like
stg_,int_, andfct_to keep the DAG understandable. - Document lineage so stakeholders understand where metrics come from.
Mini challenge
Design a mini DAG for churn analysis:
stg_subscriptionsfromsource('app','subscriptions')int_active_periodsderived fromstg_subscriptionsfct_monthly_churnaggregating churn by month
Write the three SQL files using ref() and show the expected run order. Then decide whether any node should be ephemeral and justify why.