Why this matters
Analytics Engineers ship business logic as code. A small bug in a SQL transformation can silently skew KPIs, marketing spend, and decision-making. Unit tests for transformations let you validate logic on small, controlled inputs before data reaches production models.
- Real tasks: deduplicate events correctly, compute metrics consistently, map categories with edge cases, and safely change logic without breaking downstream dashboards.
- Outcomes: faster refactors, fewer regressions, and confidence to ship changes.
Concept explained simply
A unit test checks one transformation in isolation. You feed it a tiny, known input and assert the output matches expectations. Think of it as checking a recipe with a small sample before cooking for a crowd.
Mental model
- Treat the transformation as a pure function: Output = Transform(Input).
- Arrange β Act β Assert (AAA): Arrange fixtures (tiny tables), Act by running your SQL/logic, Assert results equal expected.
- Deterministic and fast: fixed timestamps, no randomness, small data.
Typical unit test skeleton (SQL idea)
-- Arrange: create tiny input and expected tables (CTEs or temp tables)
with input as (
select 1 as id, 'A' as status, '2024-01-01'::date as dt union all
select 1, 'B', '2024-01-02'::date
),
expected as (
select 1 as id, 'B' as status, '2024-01-02'::date as dt
),
-- Act: apply the transformation
actual as (
select * from (
select *, row_number() over(partition by id order by dt desc) as rn
from input
) t where rn = 1
)
-- Assert: no differences between actual and expected
select count(*) as diff_count
from (
select * from actual
except
select * from expected
union all
select * from expected
except
select * from actual
) d;
-- Test passes if diff_count = 0
Core patterns and templates
- Equality of result sets: use EXCEPT (or MINUS) in both directions; assert zero rows.
- Row-level assertions: sum/count comparisons for metrics; assert exact values.
- Window logic: verify partitioning and ordering produce the intended row.
- Type and rounding: cast to precise numeric and round before compare.
- Time: freeze "today" via parameters or constants; avoid non-deterministic functions.
Template: symmetric difference assertion
with actual as (...), expected as (...)
select case when exists (
(select * from actual except select * from expected)
union all
(select * from expected except select * from actual)
) then 'FAIL' else 'PASS' end as result;
Template: numeric tolerance check
with actual as (...), expected as (...)
select case when max(abs(a.val - e.val)) <= 0.00001 then 'PASS' else 'FAIL' end as result
from actual a
join expected e using(key);
Worked examples
Example 1 β Deduplicate to latest record
Goal: keep the most recent row per id.
with input as (
select 10 id, 'pending' status, timestamp '2024-03-01 09:00' ts union all
select 10, 'paid', timestamp '2024-03-01 09:05' union all
select 11, 'paid', timestamp '2024-03-01 10:00'
),
expected as (
select 10 id, 'paid' status, timestamp '2024-03-01 09:05' ts union all
select 11, 'paid', timestamp '2024-03-01 10:00'
),
actual as (
select id, status, ts from (
select *, row_number() over(partition by id order by ts desc) rn from input
) t where rn = 1
)
select count(*) as diffs from (
select * from actual except select * from expected
union all
select * from expected except select * from actual
) d; -- expect 0
Example 2 β Bucket amounts with edge cases
Goal: map amounts to size buckets with rules: amount < 0 β 'refund', 0β99.99 β 'small', 100β499.99 β 'medium', β₯ 500 β 'large'.
with input as (
select 1 id, -5.00 amount union all
select 2, 0.00 union all
select 3, 100.00 union all
select 4, 500.00
),
expected as (
select 1 id, 'refund' bucket union all
select 2, 'small' union all
select 3, 'medium' union all
select 4, 'large'
),
actual as (
select id,
case
when amount < 0 then 'refund'
when amount < 100 then 'small'
when amount < 500 then 'medium'
else 'large'
end as bucket
from input
)
select count(*) as diffs from (
select * from actual except select * from expected
union all
select * from expected except select * from actual
) d; -- expect 0
Example 3 β Compute daily revenue with rounding
Goal: convert cents to dollars and round to 2 decimals, grouped by date.
with input as (
select date '2024-06-01' as dt, 12345 as cents union all
select date '2024-06-01', 55 union all
select date '2024-06-02', 100
),
expected as (
select date '2024-06-01' dt, cast(123.45 + 0.55 as numeric(12,2)) revenue_usd union all
select date '2024-06-02', cast(1.00 as numeric(12,2))
),
actual as (
select dt, cast(round(sum(cents)/100.0, 2) as numeric(12,2)) as revenue_usd
from input group by dt
)
select count(*) as diffs from (
select * from actual except select * from expected
union all
select * from expected except select * from actual
) d; -- expect 0
Who this is for
- Analytics Engineers building SQL transformations and models.
- BI Developers who maintain business logic in views or marts.
- Data Analysts who want safer refactors to shared logic.
Prerequisites
- Comfortable with SQL (SELECT, JOIN, GROUP BY, window functions).
- Basic understanding of your warehouse data types and casting.
- Familiarity with how your transformations are executed in your stack.
Learning path
- Start with simple equality checks between actual and expected result sets.
- Add edge cases: nulls, empty sets, duplicates, boundary dates and amounts.
- Introduce deterministic parameters for time and randomness.
- Refactor transformations to be more testable (smaller, composable queries/CTEs).
- Automate tests in your CI process.
Exercises
These mirror the exercises below. Build tiny inputs, run the transform, assert equality with expected. Keep each result set under ~10 rows.
Exercise 1 β Keep the latest status per order
Create a unit test that ensures your deduplication keeps the most recent status per order_id by updated_at timestamp.
See input and expected
-- Input rows
order_id | status | updated_at
---------+----------+---------------------
101 | created | 2024-04-01 08:00:00
101 | paid | 2024-04-01 08:05:00
102 | shipped | 2024-04-02 09:00:00
-- Expected rows
order_id | status | updated_at
---------+----------+---------------------
101 | paid | 2024-04-01 08:05:00
102 | shipped | 2024-04-02 09:00:00
Exercise 2 β Category mapping with refunds
Write a unit test for a transformation that assigns categories: amount < 0 β 'refund', 0β49.99 β 'low', 50β199.99 β 'mid', β₯ 200 β 'high'.
See input and expected
-- Input rows
id | amount
---+--------
1 | -10.00
2 | 0.00
3 | 49.99
4 | 50.00
5 | 199.99
6 | 200.00
-- Expected rows
id | category
---+----------
1 | refund
2 | low
3 | low
4 | mid
5 | mid
6 | high
Exercise checklist
- Arrange: Defined tiny input and expected result CTEs.
- Act: Applied the transformation exactly once.
- Assert: Used two-way EXCEPT to check equality.
- Deterministic: No NOW(), RAND(), or non-fixed parameters.
- Edge cases included: nulls, zero/negative, ties for ordering if relevant.
Common mistakes and self-check
- Testing too much at once: If the test is long and fragile, split into smaller units.
- Non-deterministic inputs: Replace NOW() with a fixed date; seed randomness.
- Ignoring type/rounding: Cast decimals and round before compare.
- Missing bidirectional check: One-way EXCEPT misses extra rows; always use both directions.
- Skipping edge cases: Include nulls, duplicates, and boundary values.
Self-check prompt
- If the input changed slightly (extra duplicate), would your test catch a regression?
- Can you explain the rule your test enforces in one sentence?
- Does the test fail if you intentionally break the logic?
Practical projects
- Project A: Build a unit-tested customer lifecycle transformation (lead β signup β paid) with tie-breaking and date windows.
- Project B: Create a unit-tested daily financials model: currency conversion, rounding, tax, and refunds.
- Project C: Unit-test event sessionization using window functions, handling out-of-order events.
Mini challenge
Your metric model sums revenue for the last full calendar month. Write a unit test that fixes the reference date to 2024-08-15 and asserts the model aggregates 2024-07-01 to 2024-07-31 only, with the expected total of 12,345.67.
Quick Test note: The quick test is available to everyone. Only logged-in users will see saved progress.
Next steps
- Turn these patterns into reusable test templates in your project.
- Add unit tests for the riskiest transformations first (metrics and dedupes).
- Run the Quick Test to check your understanding and lock in the concepts.