How to learn Refs And Dependency Graph for dbt Concepts in Analytics Engineer for free

Why this matters

Refs and the dependency graph are the heart of how dbt knows what to build and in what order. In real Analytics Engineering work you will:

Build reliable pipelines where upstream fixes automatically trigger correct downstream rebuilds.
Avoid manual ordering or fragile SQL that breaks across environments.
Run only what changed and its dependents to speed up CI and production deploys.

Progress saving note: The quick test on this page is available to everyone. Only logged-in users will have their progress saved.

Concept explained simply

ref() tells dbt: “this model depends on that model.” dbt uses all your ref() calls to build a Directed Acyclic Graph (DAG). Then it runs models in a safe order from leaves to roots.

ref('model_name'): references another model in the same project.
ref('package_name','model_name'): references a model from another package.
source('schema','table'): references raw data that dbt does not build.

Materializations (table, view, incremental, ephemeral) affect how a node is built, but the graph still comes from ref() and source().

Mental model

Imagine plumbing in a building. Each model is a fixture. Pipes are ref() connections. dbt looks at all the pipes and figures out the order to open valves safely so water flows correctly. Circular pipes (A feeds B and B feeds A) cannot work—dbt will stop you.

How dbt builds and uses the graph

Parses your project and packages to find models, seeds, snapshots, tests, and their SQL.
Creates nodes for these assets.
Creates edges when it sees ref() and source() calls.
Validates the DAG (no cycles) and plans a topological run order.
Executes only the selected slice of the graph (using selectors like + and @).

Peek under the hood

Ephemeral models don’t persist but still show in the DAG. They compile into CTEs within the downstream SQL.
Dynamic ref names (building a name with logic) are possible but discouraged—they make parsing and selection less predictable.
Cross-package refs keep your models loosely coupled while remaining dependency-aware.

Worked examples

Example 1 — Classic staging to mart pipeline

Files:

-- models/staging/stg_orders.sql
select * from {{ source('app', 'orders') }}

-- models/intermediate/int_customer_orders.sql
select
  o.customer_id,
  o.order_id,
  o.order_total
from {{ ref('stg_orders') }} o

-- models/marts/fct_revenue_by_customer.sql
select
  customer_id,
  sum(order_total) as revenue
from {{ ref('int_customer_orders') }}
group by 1

Resulting order: source(app.orders) → stg_orders → int_customer_orders → fct_revenue_by_customer.

Example 2 — Ephemeral joiner

-- models/helpers/join_customers_orders.sql
{{ config(materialized='ephemeral') }}
select c.customer_id, c.customer_tier, o.order_id, o.order_total
from {{ ref('stg_customers') }} c
join {{ ref('stg_orders') }} o using (customer_id)

-- models/marts/fct_customer_value.sql
select customer_tier, sum(order_total) as revenue
from {{ ref('join_customers_orders') }}
group by 1

Note: The ephemeral model compiles into a CTE inside fct_customer_value, but it still appears in the DAG for selection and lineage.

Example 3 — Cross-package reference

-- In your project, referencing a model from package 'core_dims'
select *
from {{ ref('core_dims', 'dim_date') }} d
join {{ ref('int_customer_orders') }} o on d.date = o.order_date

Why: Keeps shared dimensions centralized while preserving dependency tracking.

Selector quick demo

@fct_revenue_by_customer: select the model plus all ancestors and descendants (entire neighborhood).
+fct_revenue_by_customer: include immediate parents and children only.
Combine with tags or paths to target only what matters in CI.

Exercises

These mirror the interactive exercises below. Try here first, then check the exercise blocks for sample solutions.

ex1: Create a 3-model chain using ref() that aggregates revenue by customer. Show the expected topological order.
ex2: You discover a circular dependency between two intermediate models. Break the cycle by introducing a proper staging split.
ex3: Choose selectors to rebuild one mart and everything it depends on, without touching unrelated marts.

Need a hint?

Think layer-first: staging → intermediate → marts.
Ephemeral is great for small, reusable join logic.
Use @your_model to grab the full ancestry and descendancy.

Self-check checklist

All downstream models reference upstream ones via ref() (no hard-coded schemas).
No circular dependencies remain.
Your DAG reads left-to-right from raw sources to marts.
Selectors you’d use in CI are predictable and minimal.

Common mistakes and how to self-check

Hard-coding database/schema/table names instead of using ref() and source(). Self-check: search for quoted schema.table patterns and replace with refs.
Creating circular dependencies by letting two models depend on each other. Self-check: if two models both need each other’s outputs, extract the shared logic into a new upstream model.
Overusing dynamic ref names (e.g., building names with macros). Self-check: can you easily select just the affected models with standard selectors? If not, simplify.
Mixing raw and cleaned data in one model. Self-check: ensure raw inputs come from source() only, then move transformations into stg_ or int_ layers.
Ignoring ephemeral cost in heavy queries. Self-check: if a compiled model becomes too large or slow, consider materializing to view or table.

Who this is for

Analytics Engineers and BI Developers building modular SQL transformations.
Data Analysts moving from one-off SQL to reusable, versioned models.

Prerequisites

Comfort writing SELECT queries and basic joins.
Basic understanding of dbt projects (models, profiles, materializations).
Know what sources, seeds, and tests are at a high level.

Learning path

Review sources vs. models: use source() for raw and ref() for modeled data.
Practice layered modeling: staging → intermediate → marts.
Add ephemeral helpers where appropriate.
Adopt selectors in your daily workflow and CI.

Practical projects

Refactor an existing SQL pipeline into dbt models with clear staging, intermediate, and mart layers using ref().
Create a small shared package with a dim_date model and ref it from another project.
Set up a CI plan that runs only modified models and their dependents using selectors.

Next steps

Add tests (unique, not_null) to upstream nodes so issues fail fast.
Adopt naming conventions like stg_, int_, and fct_ to keep the DAG understandable.
Document lineage so stakeholders understand where metrics come from.

Mini challenge

Design a mini DAG for churn analysis:

stg_subscriptions from source('app','subscriptions')
int_active_periods derived from stg_subscriptions
fct_monthly_churn aggregating churn by month

Write the three SQL files using ref() and show the expected run order. Then decide whether any node should be ephemeral and justify why.

Menu

Refs And Dependency Graph

Table of Contents

Why this matters

Concept explained simply

How dbt builds and uses the graph

Worked examples

Exercises

Self-check checklist

Common mistakes and how to self-check

Who this is for

Prerequisites

Learning path

Practical projects

Next steps

Mini challenge

Practice Exercises

Build a 3-model DAG with ref()

Instructions

Expected Output

Break a circular dependency

Select the right slice with selectors

Refs And Dependency Graph — Quick Test

Have questions about Refs And Dependency Graph?

AI Assistant