Why dbt Concepts matter for Analytics Engineers
dbt is the backbone of modern analytics engineering. It lets you version, test, document, and deploy SQL transformations as code. Mastering its core concepts lets you standardize warehouse modeling, enforce data quality, trace lineage all the way to BI, and ship reliable analytics faster.
- Turn raw data into trusted, modeled datasets.
- Automate tests and documentation so teams trust dashboards.
- Use environments and deployments to promote changes safely.
Who this is for
- Analytics Engineers building warehouse models for BI and data apps.
- Data/BI Analysts moving from ad-hoc SQL to governed pipelines.
- Data Engineers who want standardized SQL transformations and lineage.
Prerequisites
- Comfortable with SQL (joins, window functions, CTEs).
- Basic Git workflows (branch, commit, merge).
- Access to a SQL data warehouse (e.g., Postgres, Snowflake, BigQuery, Redshift) and the dbt CLI.
Quick environment checklist
- dbt installed and profiles.yml configured for your warehouse (see the sketch after this list).
- A project scaffolded with dbt init and a passing connection check (dbt debug).
- At least one raw table available to model.
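New to profiles? Below is a minimal profiles.yml sketch for a Postgres warehouse with dev and prod targets; the project name, hosts, and credentials are placeholders to adapt, and the password is read from an environment variable.
profiles.yml
my_dbt_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: analytics
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics
      schema: dbt_dev
      threads: 4
    prod:
      type: postgres
      host: prod-db.example.com
      port: 5432
      user: analytics
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics
      schema: analytics
      threads: 8
Verify each target with dbt debug --target dev and dbt debug --target prod.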
Learning path (practical roadmap)
- Model layers – organize models into staging, intermediate, and marts. Establish naming conventions and folder structure.
- Sources and freshness – declare sources in YAML, reference them with source(), and configure freshness checks.
- Materializations – choose view or table; implement incremental for large facts.
- Tests and packages – add generic tests in YAML; use a common utility package (see the packages sketch after this list).
- Seeds and snapshots – load small CSVs as tables; capture version changes with snapshots.
- Macros and Jinja – DRY up SQL with macros; use conditionals and loops safely.
- Refs and dependency graph – link models with ref(), run by selection, explore the DAG.
- Documentation – write model and column descriptions; generate the docs site.
- Exposures – register dashboards/reports for lineage to BI.
- Environments and deployments – use targets (dev/stage/prod), run/build, and CI-minded checks.
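For the tests-and-packages step, a minimal sketch using dbt_utils; the version range is an assumption, so pin whatever matches your dbt version.
packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
Install with dbt deps, then use a package-provided generic test in your schema YAML, for example:
models/schema.yml (excerpt)
      - name: total_amount
        tests:
          - dbt_utils.expression_is_true:
              expression: ">= 0"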
Milestone checks
- You can run dbt build without errors.
- You added at least 3 generic tests that pass.
- The docs site shows sources → models → exposures lineage.
Worked examples
1) Declare a source with freshness and build a staging model
sources.yml
version: 2
sources:
  - name: raw
    database: analytics
    schema: raw
    freshness:
      warn_after: {count: 24, period: hour}
      error_after: {count: 48, period: hour}
    tables:
      - name: orders
        loaded_at_field: _ingested_at
models/staging/stg_orders.sql
with src as (
    select * from {{ source('raw', 'orders') }}
)
select
    id as order_id,
    customer_id,
    status,
    total_amount::numeric as total_amount,
    order_timestamp
from src
Run freshness checks: dbt source freshness. Then build: dbt build -s models/staging/stg_orders.sql.
2) Incremental model with merge strategy
models/marts/fct_orders.sql
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge'
) }}

with base as (
    select * from {{ ref('stg_orders') }}
)
select
    order_id,
    customer_id,
    status,
    total_amount,
    order_timestamp
from base
{% if is_incremental() %}
where order_timestamp > (select coalesce(max(order_timestamp), '1900-01-01') from {{ this }})
{% endif %}
Build only this model and its parents: dbt build -s +fct_orders.
3) Snapshot to track slowly changing dimensions
snapshots/customer_snapshot.sql
{% snapshot customer_snapshot %}

{{ config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='timestamp',
    updated_at='updated_at'
) }}

select * from {{ ref('stg_customers') }}

{% endsnapshot %}
Create/update snapshots: dbt snapshot.
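To confirm a snapshot is capturing history, query the metadata columns dbt adds (dbt_valid_from, dbt_valid_to); rows with a null dbt_valid_to are the current versions. A quick check against the snapshots schema configured above:
select
    customer_id,
    dbt_valid_from,
    dbt_valid_to
from snapshots.customer_snapshot
where dbt_valid_to is null -- current version of each customer
order by customer_id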
4) Macro to standardize country names
macros/clean_country.sql
{% macro clean_country(country_col) %}
case
    when lower({{ country_col }}) in ('us', 'usa', 'united states') then 'United States'
    when lower({{ country_col }}) in ('uk', 'united kingdom', 'gb', 'great britain') then 'United Kingdom'
    else {{ country_col }}
end
{% endmacro %}
models/intermediate/int_customers.sql
select
    customer_id,
    {{ clean_country('country') }} as country_standardized
from {{ ref('stg_customers') }}
5) Exposure to document a BI dashboard
exposures.yml
version: 2
exposures:
  - name: revenue_dashboard
    type: dashboard
    maturity: high
    owner:
      name: Analytics Team
      email: analytics@example.com
    depends_on:
      - ref('fct_orders')
    url: ''
    description: Key revenue KPIs for leadership
Rebuild docs so lineage includes the dashboard: dbt docs generate.
6) Tests in YAML (generic + relationships)
models/schema.yml
version: 2
models:
  - name: fct_orders
    description: Order fact table
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
Run tests: dbt test -s fct_orders.
Skill drills (checklist)
- Create a sources.yml with freshness and run dbt source freshness.
- Build a staging model that casts types and renames columns predictably.
- Convert a heavy model to materialized='incremental' with a unique_key.
- Add at least 3 generic tests to a fact or dimension model and get them passing.
- Create one snapshot and confirm multiple records per business key over time.
- Write a small macro and use it in two different models.
- Generate docs and confirm lineage chains include your exposure.
- Run only a subtree with dbt build -s +your_model.
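A few selection patterns worth knowing for that last drill (model names, paths, and tags here are illustrative):
dbt build -s +fct_orders     # fct_orders plus all upstream parents
dbt build -s stg_orders+     # stg_orders plus everything downstream
dbt build -s models/staging  # every model under the staging folder
dbt ls -s tag:nightly        # list nodes carrying a hypothetical nightly tag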
Common mistakes and debugging tips
- Missing or incorrect unique_key on incremental merge. Tip: ensure unique_key exists and is stable; validate with dbt run -s your_model --full-refresh if needed.
- Freshness not working. Tip: set loaded_at_field on the source table, and confirm the column is a timestamp whose values update as expected.
- Tests never run. Tip: generic tests go in YAML under the model/columns. Verify model names match and run dbt test -s model_name.
- Macros not recognized. Tip: macros must live in the macros/ directory. Re-run after creating new macro files.
- Broken DAG or circular refs. Tip: a model may only ref() models upstream of it. Inspect dependencies via the docs site or by simplifying references.
- Docs site missing descriptions. Tip: add description: at both model and column levels in YAML, then re-run dbt docs generate.
Debug run strategies
- Use dbt compile to view rendered SQL in target/compiled (see the sketch after this list).
- Limit scope: dbt build -s +model_name.
- Force a clean state with --full-refresh for incremental models.
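A typical compile-and-inspect loop looks like this; the folder under target/compiled matches your dbt project name (my_dbt_project is a placeholder):
dbt compile -s fct_orders
cat target/compiled/my_dbt_project/models/marts/fct_orders.sql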
Mini project: Sales Analytics Warehouse in dbt
Goal: Build a small, production-like pipeline with sources → staging → intermediate → marts, tests, docs, snapshots, and an exposure.
- Declare sources: raw.orders and raw.customers with freshness and loaded_at_field.
- Staging models: clean types and standardize names (stg_orders, stg_customers).
- Intermediate: join customers to orders for enriched facts (int_orders_enriched).
- Marts: create fct_orders (incremental) and dim_customers (table).
- Tests: unique/not_null on keys; relationships from facts to dims.
- Snapshot: track changes to stg_customers with a timestamp strategy.
- Macro: a small transformation (e.g., country standardization) used in two models.
- Docs: add descriptions for all models/columns and generate docs.
- Exposure: register your BI dashboard reading from fct_orders.
- Deploy: run in a dev target, then promote to prod by switching the target profile.
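The deploy step comes down to switching targets at run time, assuming your profiles.yml defines both (as in the sketch near the top of this page):
dbt build --target dev   # develop and test against your dev schema
dbt build --target prod  # promote: same code, prod target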
Success criteria
- dbt build completes with 0 errors and all tests passing.
- Docs lineage shows source → staging → mart → exposure.
- A re-run after adding one new row updates the incremental model with only that row.
Practical projects (portfolio ready)
- Customer 360 mart: Unify customers, orders, tickets; incremental fact + dimensional model with >10 tests.
- Marketing attribution mart: Build sessionization and campaign attribution with macros and exposures.
- Finance marts: Monthly P&L model from journal entries with seeds for exchange rates and snapshots for chart of accounts.
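The finance mart leans on seeds; a minimal sketch with placeholder rates (illustrative values, not real FX data):
seeds/exchange_rates.csv
currency,rate_to_usd,effective_month
EUR,1.08,2024-01
GBP,1.27,2024-01
Load it with dbt seed, then join it like any model via {{ ref('exchange_rates') }}.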
Subskills
- Model Layers: Staging, Intermediate, Marts – Organize models into clear layers with naming standards and predictable ownership. Estimated: 45–90 min.
- Materializations: View, Table, Incremental – Choose the right materialization for performance and cost; implement incremental safely. Estimated: 60–120 min.
- Sources and Freshness – Declare sources in YAML, set loaded_at_field, and monitor freshness. Estimated: 45–90 min.
- Seeds and Snapshots – Load small CSVs as managed tables; track changing records over time. Estimated: 45–90 min.
- Macros and Jinja Basics – Reuse logic, reduce duplication, and render dynamic SQL safely. Estimated: 60–120 min.
- Refs and Dependency Graph – Use ref() and selection syntax to control runs and understand lineage. Estimated: 45–90 min.
- Exposures for BI Lineage – Register dashboards/reports to close the loop from data to decisions. Estimated: 30–60 min.
- Environments and Deployments – Work with dev/stage/prod targets and promote changes confidently. Estimated: 45–90 min.
- Documentation Generation in dbt – Add descriptions and build a docs site your stakeholders can trust. Estimated: 30–60 min.
- dbt Tests and Packages Basics – Add generic tests and extend dbt with community packages. Estimated: 60–120 min.
Next steps
- Refine incremental strategies: filters, partitioning, post-hooks (see the sketch after this list).
- Add more relationship tests and data contracts to stabilize interfaces.
- Scale to a multi-environment deployment and automate dbt build on each change.
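As one example of refining incremental models, a post-hook can apply grants or maintenance after each build. A sketch, assuming a hypothetical reporting_role; grant syntax varies by warehouse:
models/marts/fct_orders.sql (config excerpt)
{{ config(
    materialized='incremental',
    unique_key='order_id',
    post_hook="grant select on {{ this }} to reporting_role"
) }}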
Skill exam
Ready to check your understanding? Take the exam below. Everyone can take it for free. Logged-in users have their progress saved.