Why this matters
As an Analytics Engineer, you often need small, trusted reference tables (like country codes or marketing channels) and reliable change history for key entities (like customers or products). dbt seeds load static CSVs to your warehouse, while snapshots capture how records change over time. Together, they make your transformations reproducible and your analytics auditable.
- Keep business logic centralized: load small lookup tables once, reference everywhere.
- Answer time-travel questions: what did we know about a customer at a past date?
- Audit and compliance: track history of critical attributes safely.
Who this is for
- Analytics Engineers and BI Developers using dbt.
- Data Analysts who need stable reference data and historical tracking.
- Data Engineers standardizing ELT pipelines with dbt.
Prerequisites
- Basic dbt project set up (profiles, target warehouse, dbt run/compile).
- Comfort with SQL SELECTs and refs.
- Understanding of schemas, tables, and column types in your warehouse.
Concept explained simply
Seeds (simple definition)
Seeds are CSV files you place in your dbt project (typically in the data/ folder). Running "dbt seed" loads each CSV as a table in your warehouse. You can then reference that table in models with ref('seed_name'). They’re ideal for small, relatively static datasets like mappings or code lists.
Snapshots (simple definition)
Snapshots store historical versions of records. You define a snapshot with a strategy (timestamp or check), a unique_key, and a source query. Running "dbt snapshot" creates/updates a table that records when each version was valid (dbt_valid_from, dbt_valid_to) and related metadata. Use them for Slowly Changing Dimensions (SCD Type 2–like behavior).
Mental model
- Seeds: think "tiny, trusted CSVs become warehouse tables". Load once, reuse everywhere.
- Snapshots: think "versioned ledger of rows". Every change creates a new version with validity windows.
How it works (key points)
Seeds — essentials
- Files live under data/ (default). The file name becomes the relation name.
- Run with: dbt seed. To force recreation, use: dbt seed --full-refresh.
- Configure in dbt_project.yml under seeds:, including schema, quoting, and column_types.
- Reference in models with: {{ ref('your_seed_name') }}.
- Great for reference data (channels, region mappings, SLA thresholds, holiday calendars).
Snapshots — essentials
- Snapshot SQL lives under snapshots/ and uses {% snapshot %} blocks.
- Requires unique_key and strategy.
- Two strategies:
- timestamp: uses updated_at column; when it increases, a new version is recorded.
- check: compares specified columns (check_cols). Changes create new versions.
- Run with: dbt snapshot.
- Output includes dbt_valid_from, dbt_valid_to, dbt_scd_id, and dbt_updated_at.
Worked examples
Example 1 — Seed a marketing channel map
- Create data/dim_channel.csv with columns: channel_code, channel_name and a few rows (e.g., "pa, Paid Ads").
- Run dbt seed. A warehouse table (e.g., dim_channel) is created.
- Use in a model: SELECT s.channel_name, COUNT(*) FROM {{ ref('dim_channel') }} s JOIN events e ON e.channel=s.channel_code GROUP BY 1.
Result: human-readable channels without hardcoding in SQL.
Example 2 — Enforce seed column types
- In dbt_project.yml add under seeds: your_project: dim_channel: +column_types: channel_code: varchar(10), channel_name: varchar(100).
- Run dbt seed to load with consistent types across environments.
Result: avoids implicit type surprises and join mismatches.
Example 3 — Snapshot customers with check strategy
- Create snapshots/customers.sql using a snapshot block with strategy='check', unique_key='customer_id', check_cols=['email','address'].
- Source data from {{ ref('stg_customers') }}.
- Run dbt snapshot. Changes to email or address create a new version with dbt_valid_from/To timestamps.
Result: full change history for downstream SCD reporting.
Example 4 — Snapshot with timestamp strategy
- Use updated_at column (monotonically increasing) and strategy='timestamp'.
- Run dbt snapshot; when updated_at advances, dbt writes a new record version.
Result: simpler setup when a trusted update timestamp exists.
Learning path
- Load your first seed and reference it in one model.
- Add tests for seed data (e.g., not_null, unique).
- Create a simple snapshot with timestamp strategy.
- Evolve to check strategy for selected columns.
- Integrate snapshots into marts for time-travel queries.
Exercises
Do these in a sample or your existing dbt project. If you’re not logged in, your quiz progress won’t be saved—but the test is available to everyone.
Exercise ex1 — Seed a lookup and use it
- Create data/dim_channel.csv with rows: pa,Paid Ads; em,Email; so,Social.
- Run dbt seed.
- Create models/join_channels.sql joining events.channel to dim_channel.channel_code and counting by channel_name.
- Run dbt run and inspect the result table.
Tip: configuration
Optionally declare seeds: column_types in dbt_project.yml to enforce VARCHAR sizes.
Exercise ex2 — Snapshot customer changes
- Create snapshots/customers_snapshot.sql using check strategy with unique_key customer_id and check_cols ['email','address'].
- Query from {{ ref('stg_customers') }}.
- Run dbt snapshot. Update a record upstream, re-run dbt snapshot, and observe new version rows.
Tip: where to store
Set target_schema='snapshots' in snapshot config to keep history tables separate from models.
Exercise checklist
Common mistakes and self-check
- Placing seeds outside data/: self-check by running dbt ls --resource-type seed (should list your file).
- Forgetting column types: if joins fail or cast errors appear, define seeds: column_types and re-run dbt seed.
- Using check strategy without specifying check_cols: dbt will error. Add check_cols or switch to timestamp.
- Missing unique_key in snapshots: required for deduplicating versions.
- Referencing seeds with raw table names: use ref('seed_name') for portability.
Practical projects
- Reference data pack: create seeds for countries, currencies, and channels. Add tests for uniqueness and not null.
- Customer SCD mart: build a snapshot of customers and a mart that reports current vs historical values by date.
- Price history audit: snapshot product prices and visualize changes per product over time.
Next steps
- Parameterize snapshot schemas per environment (dev/test/prod).
- Add CI checks: ensure dbt seed and dbt snapshot run in your pipeline.
- Document seeds and snapshot fields in your docs blocks for clarity.
Mini challenge
Create a seed called dim_currency with ISO code and symbol, and a snapshot for product base_price changes using check strategy on base_price. Prove it works by changing a price upstream and showing two versions for that product in the snapshot output.