How to learn Seeds And Snapshots for dbt Concepts in Analytics Engineer for free

Why this matters

As an Analytics Engineer, you often need small, trusted reference tables (like country codes or marketing channels) and reliable change history for key entities (like customers or products). dbt seeds load static CSVs to your warehouse, while snapshots capture how records change over time. Together, they make your transformations reproducible and your analytics auditable.

Keep business logic centralized: load small lookup tables once, reference everywhere.
Answer time-travel questions: what did we know about a customer at a past date?
Audit and compliance: track history of critical attributes safely.

Who this is for

Analytics Engineers and BI Developers using dbt.
Data Analysts who need stable reference data and historical tracking.
Data Engineers standardizing ELT pipelines with dbt.

Prerequisites

Basic dbt project set up (profiles, target warehouse, dbt run/compile).
Comfort with SQL SELECTs and refs.
Understanding of schemas, tables, and column types in your warehouse.

Concept explained simply

Seeds (simple definition)

Seeds are CSV files you place in your dbt project (typically in the data/ folder). Running "dbt seed" loads each CSV as a table in your warehouse. You can then reference that table in models with ref('seed_name'). They’re ideal for small, relatively static datasets like mappings or code lists.

Snapshots (simple definition)

Snapshots store historical versions of records. You define a snapshot with a strategy (timestamp or check), a unique_key, and a source query. Running "dbt snapshot" creates/updates a table that records when each version was valid (dbt_valid_from, dbt_valid_to) and related metadata. Use them for Slowly Changing Dimensions (SCD Type 2–like behavior).

Mental model

Seeds: think "tiny, trusted CSVs become warehouse tables". Load once, reuse everywhere.
Snapshots: think "versioned ledger of rows". Every change creates a new version with validity windows.

How it works (key points)

Seeds — essentials

Files live under data/ (default). The file name becomes the relation name.
Run with: dbt seed. To force recreation, use: dbt seed --full-refresh.
Configure in dbt_project.yml under seeds:, including schema, quoting, and column_types.
Reference in models with: {{ ref('your_seed_name') }}.
Great for reference data (channels, region mappings, SLA thresholds, holiday calendars).

Snapshots — essentials

Snapshot SQL lives under snapshots/ and uses {% snapshot %} blocks.
Requires unique_key and strategy.
Two strategies:
- timestamp: uses updated_at column; when it increases, a new version is recorded.
- check: compares specified columns (check_cols). Changes create new versions.
Run with: dbt snapshot.
Output includes dbt_valid_from, dbt_valid_to, dbt_scd_id, and dbt_updated_at.

Worked examples

Example 1 — Seed a marketing channel map

Create data/dim_channel.csv with columns: channel_code, channel_name and a few rows (e.g., "pa, Paid Ads").
Run dbt seed. A warehouse table (e.g., dim_channel) is created.
Use in a model: SELECT s.channel_name, COUNT(*) FROM {{ ref('dim_channel') }} s JOIN events e ON e.channel=s.channel_code GROUP BY 1.

Result: human-readable channels without hardcoding in SQL.

Example 2 — Enforce seed column types

In dbt_project.yml add under seeds: your_project: dim_channel: +column_types: channel_code: varchar(10), channel_name: varchar(100).
Run dbt seed to load with consistent types across environments.

Result: avoids implicit type surprises and join mismatches.

Example 3 — Snapshot customers with check strategy

Create snapshots/customers.sql using a snapshot block with strategy='check', unique_key='customer_id', check_cols=['email','address'].
Source data from {{ ref('stg_customers') }}.
Run dbt snapshot. Changes to email or address create a new version with dbt_valid_from/To timestamps.

Result: full change history for downstream SCD reporting.

Example 4 — Snapshot with timestamp strategy

Use updated_at column (monotonically increasing) and strategy='timestamp'.
Run dbt snapshot; when updated_at advances, dbt writes a new record version.

Result: simpler setup when a trusted update timestamp exists.

Learning path

Load your first seed and reference it in one model.
Add tests for seed data (e.g., not_null, unique).
Create a simple snapshot with timestamp strategy.
Evolve to check strategy for selected columns.
Integrate snapshots into marts for time-travel queries.

Exercises

Do these in a sample or your existing dbt project. If you’re not logged in, your quiz progress won’t be saved—but the test is available to everyone.

Exercise ex1 — Seed a lookup and use it

Create data/dim_channel.csv with rows: pa,Paid Ads; em,Email; so,Social.
Run dbt seed.
Create models/join_channels.sql joining events.channel to dim_channel.channel_code and counting by channel_name.
Run dbt run and inspect the result table.

Tip: configuration

Optionally declare seeds: column_types in dbt_project.yml to enforce VARCHAR sizes.

Exercise ex2 — Snapshot customer changes

Create snapshots/customers_snapshot.sql using check strategy with unique_key customer_id and check_cols ['email','address'].
Query from {{ ref('stg_customers') }}.
Run dbt snapshot. Update a record upstream, re-run dbt snapshot, and observe new version rows.

Tip: where to store

Set target_schema='snapshots' in snapshot config to keep history tables separate from models.

Exercise checklist

Seed loaded without errors
Model references seed via {{ ref('dim_channel') }}
Snapshot table created with dbt_valid_from and dbt_valid_to
New snapshot version appears after a source change

Common mistakes and self-check

Placing seeds outside data/: self-check by running dbt ls --resource-type seed (should list your file).
Forgetting column types: if joins fail or cast errors appear, define seeds: column_types and re-run dbt seed.
Using check strategy without specifying check_cols: dbt will error. Add check_cols or switch to timestamp.
Missing unique_key in snapshots: required for deduplicating versions.
Referencing seeds with raw table names: use ref('seed_name') for portability.

Practical projects

Reference data pack: create seeds for countries, currencies, and channels. Add tests for uniqueness and not null.
Customer SCD mart: build a snapshot of customers and a mart that reports current vs historical values by date.
Price history audit: snapshot product prices and visualize changes per product over time.

Next steps

Parameterize snapshot schemas per environment (dev/test/prod).
Add CI checks: ensure dbt seed and dbt snapshot run in your pipeline.
Document seeds and snapshot fields in your docs blocks for clarity.

Mini challenge

Create a seed called dim_currency with ISO code and symbol, and a snapshot for product base_price changes using check strategy on base_price. Prove it works by changing a price upstream and showing two versions for that product in the snapshot output.

Menu

Seeds And Snapshots

Table of Contents

Why this matters

Who this is for

Prerequisites

Concept explained simply

Mental model

How it works (key points)

Worked examples

Learning path

Exercises

Exercise ex1 — Seed a lookup and use it

Exercise ex2 — Snapshot customer changes

Exercise checklist

Common mistakes and self-check

Practical projects

Next steps

Mini challenge

Practice Exercises

Seed a lookup and use it in a join

Instructions

Expected Output

Snapshot customer changes with check strategy

Seeds And Snapshots — Quick Test

Have questions about Seeds And Snapshots?

AI Assistant