luvv to helpDiscover the Best Free Online Tools
Topic 7 of 10

Sources And Freshness

Learn Sources And Freshness for free with explanations, exercises, and a quick test (for Analytics Engineer).

Published: December 23, 2025 | Updated: December 23, 2025

Who this is for

Analytics Engineers and BI Developers who use dbt to transform warehouse data and need reliable pipelines with clear SLAs for raw data arrival.

Prerequisites

  • Basic dbt project setup and profiles configured
  • Comfort with YAML and SQL
  • Ability to run dbt commands from your terminal

Why this matters

In real teams, business metrics break when raw data stops arriving or arrives late. dbt sources document and test your upstream raw tables, while freshness checks tell you if data is on time. You will:

  • Gate downstream model runs when raw data is stale
  • Alert stakeholders when data lags SLAs
  • Document raw lineage and ownership for fast troubleshooting
Typical tasks in the profession
  • Declare raw data tables as sources and apply tests
  • Set freshness thresholds (warn/error) and schedule checks
  • Reference sources correctly in models using the source() function
  • Optimize freshness checks with filters for partitioned data

Concept explained simply

A source in dbt is a raw table that dbt does not build but depends on. You declare it in YAML so you can reference it safely in models via {{ source('src_name','table_name') }}. A freshness check compares the most recent ingestion timestamp in that table to the current time. If it exceeds your thresholds, dbt warns or errors.

Mental model

  • Source YAML = contract for upstream data (where it lives, what it’s called, how fresh it should be)
  • loaded_at_field = the best column that indicates when data landed in the warehouse (ingestion time)
  • warn_after/error_after = SLA lines in the sand
  • Freshness run = a quick health check before building models

Worked examples

Example 1 — Basic source with freshness

version: 2
sources:
  - name: raw
    database: "{{ target.database }}"
    schema: raw
    tables:
      - name: orders
        description: Raw orders landed via ingestion service
        loaded_at_field: _ingested_at
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}

Run a check:

dbt source freshness --select source:raw.orders

If the newest _ingested_at is older than 12 hours, you get a warning; older than 24 hours, an error.

Example 2 — Filter for large/partitioned tables

For very large tables or late-arriving rows, limit the scan window so checks stay fast and fair:

freshness:
  warn_after: {count: 6, period: hour}
  error_after: {count: 12, period: hour}
  filter: "_ingested_at >= (current_timestamp - interval '3 day')"

The filter restricts the rows considered in the freshness calculation.

Example 3 — Using a source in a model

-- models/stg_orders.sql
with src as (
  select *
  from {{ source('raw', 'orders') }}
)
select * from src

Always use source() for raw tables (not ref()). This keeps lineage and documentation correct.

Small gotchas to watch
  • Use ingestion timestamps, not business event timestamps, for loaded_at_field
  • Time zones: pick a consistent convention; your warehouse time is the reference
  • Case sensitivity/quoting varies by adapter; align with how objects were created

How to implement (step-by-step)

  1. Create or update models/sources.yml with your source and tables
  2. Pick loaded_at_field that truly reflects when data landed
  3. Set warn_after and error_after to match business SLAs
  4. Optionally add filter to restrict scan window
  5. Reference sources in models via source()
  6. Schedule dbt source freshness to run before model builds
  7. Fail the job on freshness errors; notify on warnings as needed

Common mistakes and self-checks

  • Mistake: Using an event timestamp for loaded_at_field
    Fix: Use the ingestion/landing timestamp.
  • Mistake: Thresholds too strict, constant false alarms
    Fix: Start with realistic SLAs; tune with historical arrival patterns.
  • Mistake: Forgetting to run freshness checks in CI/scheduler
    Fix: Add a pre-step job: dbt source freshness.
  • Mistake: No filter on massive partitions, slow checks
    Fix: Add a rolling-window filter (e.g., last 3–7 days).
  • Mistake: Hardcoding database/schema that differ by environment
    Fix: Use {{ target.database }} and environment-appropriate schema configs.
Self-check
  • Does your source YAML compile? Run dbt parse.
  • Does freshness run within SLA time? If not, adjust thresholds or filter.
  • Do models use source() (not ref()) for raw tables?

Exercises

Complete these to solidify the skill. The same tasks are listed below with solutions hidden. You can take the Quick Test afterwards. Note: Anyone can take the test; only signed-in users will see saved progress.

Exercise 1 — Declare a source with freshness

Create models/sources.yml declaring a source raw with tables orders and customers. Use _ingested_at as loaded_at_field. Set warn at 12 hours and error at 24 hours. Then run a freshness check for orders.

  • Expected: The YAML compiles; running freshness returns success, warning, or error depending on your data.
Show solution
version: 2
sources:
  - name: raw
    database: "{{ target.database }}"
    schema: raw
    tables:
      - name: orders
        loaded_at_field: _ingested_at
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
      - name: customers
        loaded_at_field: _ingested_at
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
dbt source freshness --select source:raw.orders

Exercise 2 — Add a filter for performance

For table orders, add a freshness filter to limit the scan to 3 days. Keep thresholds at warn 6h / error 12h. Run the freshness check again.

  • Expected: The check completes quickly and only evaluates rows in the last 3 days.
Show solution
version: 2
sources:
  - name: raw
    database: "{{ target.database }}"
    schema: raw
    tables:
      - name: orders
        loaded_at_field: _ingested_at
        freshness:
          warn_after: {count: 6, period: hour}
          error_after: {count: 12, period: hour}
          filter: "_ingested_at >= (current_timestamp - interval '3 day')"
dbt source freshness --select source:raw.orders

Checklist before you move on

  • I can declare sources in YAML and reference them with source()
  • I can set realistic warn_after and error_after thresholds
  • I can use filter to keep checks fast and fair
  • I can schedule and interpret dbt source freshness results

Practical projects

  • Project 1: Add freshness checks to two critical sources; set warn/error thresholds aligned with stakeholders; create a notification rule in your scheduler.
  • Project 2: Build a staging model layer that references only source() nodes; ensure every model depends on a freshness check pre-step.
  • Project 3: Create a small dashboard showing source freshness status over time using your BI tool of choice.

Learning path

  • Before: Models and the ref() function; Configs and materializations
  • Now: Sources and Freshness
  • Next: Tests and documentation; Exposures and SLAs; Scheduling and CI

Next steps

  • Set up a scheduled job: freshness check, then model run
  • Share SLAs with stakeholders and adjust thresholds after a week of observations
  • Take the Quick Test below to confirm mastery

Mini challenge

Your events source lands hourly but occasionally backfills last week’s rows. Propose a freshness.filter and thresholds that avoid false errors while still catching outages within 2 hours. Write the YAML snippet and one sentence explaining your choice.

Practice Exercises

2 exercises to complete

Instructions

Create models/sources.yml declaring a source named raw with tables orders and customers. Use _ingested_at as loaded_at_field. Set warn at 12h and error at 24h. Run dbt source freshness for orders.

Expected Output
YAML compiles; freshness run returns success/warn/error depending on data latency.

Sources And Freshness — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Sources And Freshness?

AI Assistant

Ask questions about this tool