luvv to helpDiscover the Best Free Online Tools
Topic 3 of 8

Unit Tests For Transformations

Learn Unit Tests For Transformations for free with explanations, exercises, and a quick test (for Analytics Engineer).

Published: December 23, 2025 | Updated: December 23, 2025

Why this matters

Analytics Engineers ship business logic as code. A small bug in a SQL transformation can silently skew KPIs, marketing spend, and decision-making. Unit tests for transformations let you validate logic on small, controlled inputs before data reaches production models.

  • Real tasks: deduplicate events correctly, compute metrics consistently, map categories with edge cases, and safely change logic without breaking downstream dashboards.
  • Outcomes: faster refactors, fewer regressions, and confidence to ship changes.

Concept explained simply

A unit test checks one transformation in isolation. You feed it a tiny, known input and assert the output matches expectations. Think of it as checking a recipe with a small sample before cooking for a crowd.

Mental model

  • Treat the transformation as a pure function: Output = Transform(Input).
  • Arrange β†’ Act β†’ Assert (AAA): Arrange fixtures (tiny tables), Act by running your SQL/logic, Assert results equal expected.
  • Deterministic and fast: fixed timestamps, no randomness, small data.
Typical unit test skeleton (SQL idea)
-- Arrange: create tiny input and expected tables (CTEs or temp tables)
with input as (
  select 1 as id, 'A' as status, '2024-01-01'::date as dt union all
  select 1, 'B', '2024-01-02'::date
),
expected as (
  select 1 as id, 'B' as status, '2024-01-02'::date as dt
),
-- Act: apply the transformation
actual as (
  select * from (
    select *, row_number() over(partition by id order by dt desc) as rn
    from input
  ) t where rn = 1
)
-- Assert: no differences between actual and expected
select count(*) as diff_count
from (
  select * from actual
  except
  select * from expected
  union all
  select * from expected
  except
  select * from actual
) d;
-- Test passes if diff_count = 0

Core patterns and templates

  • Equality of result sets: use EXCEPT (or MINUS) in both directions; assert zero rows.
  • Row-level assertions: sum/count comparisons for metrics; assert exact values.
  • Window logic: verify partitioning and ordering produce the intended row.
  • Type and rounding: cast to precise numeric and round before compare.
  • Time: freeze "today" via parameters or constants; avoid non-deterministic functions.
Template: symmetric difference assertion
with actual as (...), expected as (...)
select case when exists (
  (select * from actual except select * from expected)
  union all
  (select * from expected except select * from actual)
) then 'FAIL' else 'PASS' end as result;
Template: numeric tolerance check
with actual as (...), expected as (...)
select case when max(abs(a.val - e.val)) <= 0.00001 then 'PASS' else 'FAIL' end as result
from actual a
join expected e using(key);

Worked examples

Example 1 β€” Deduplicate to latest record

Goal: keep the most recent row per id.

with input as (
  select 10 id, 'pending' status, timestamp '2024-03-01 09:00' ts union all
  select 10, 'paid',    timestamp '2024-03-01 09:05' union all
  select 11, 'paid',    timestamp '2024-03-01 10:00'
),
expected as (
  select 10 id, 'paid' status, timestamp '2024-03-01 09:05' ts union all
  select 11, 'paid',    timestamp '2024-03-01 10:00'
),
actual as (
  select id, status, ts from (
    select *, row_number() over(partition by id order by ts desc) rn from input
  ) t where rn = 1
)
select count(*) as diffs from (
  select * from actual except select * from expected
  union all
  select * from expected except select * from actual
) d; -- expect 0

Example 2 β€” Bucket amounts with edge cases

Goal: map amounts to size buckets with rules: amount < 0 β†’ 'refund', 0–99.99 β†’ 'small', 100–499.99 β†’ 'medium', β‰₯ 500 β†’ 'large'.

with input as (
  select 1 id, -5.00 amount union all
  select 2, 0.00 union all
  select 3, 100.00 union all
  select 4, 500.00
),
expected as (
  select 1 id, 'refund' bucket union all
  select 2, 'small' union all
  select 3, 'medium' union all
  select 4, 'large'
),
actual as (
  select id,
         case
           when amount < 0 then 'refund'
           when amount < 100 then 'small'
           when amount < 500 then 'medium'
           else 'large'
         end as bucket
  from input
)
select count(*) as diffs from (
  select * from actual except select * from expected
  union all
  select * from expected except select * from actual
) d; -- expect 0

Example 3 β€” Compute daily revenue with rounding

Goal: convert cents to dollars and round to 2 decimals, grouped by date.

with input as (
  select date '2024-06-01' as dt, 12345 as cents union all
  select date '2024-06-01', 55 union all
  select date '2024-06-02', 100
),
expected as (
  select date '2024-06-01' dt, cast(123.45 + 0.55 as numeric(12,2)) revenue_usd union all
  select date '2024-06-02', cast(1.00 as numeric(12,2))
),
actual as (
  select dt, cast(round(sum(cents)/100.0, 2) as numeric(12,2)) as revenue_usd
  from input group by dt
)
select count(*) as diffs from (
  select * from actual except select * from expected
  union all
  select * from expected except select * from actual
) d; -- expect 0

Who this is for

  • Analytics Engineers building SQL transformations and models.
  • BI Developers who maintain business logic in views or marts.
  • Data Analysts who want safer refactors to shared logic.

Prerequisites

  • Comfortable with SQL (SELECT, JOIN, GROUP BY, window functions).
  • Basic understanding of your warehouse data types and casting.
  • Familiarity with how your transformations are executed in your stack.

Learning path

  1. Start with simple equality checks between actual and expected result sets.
  2. Add edge cases: nulls, empty sets, duplicates, boundary dates and amounts.
  3. Introduce deterministic parameters for time and randomness.
  4. Refactor transformations to be more testable (smaller, composable queries/CTEs).
  5. Automate tests in your CI process.

Exercises

These mirror the exercises below. Build tiny inputs, run the transform, assert equality with expected. Keep each result set under ~10 rows.

Exercise 1 β€” Keep the latest status per order

Create a unit test that ensures your deduplication keeps the most recent status per order_id by updated_at timestamp.

See input and expected
-- Input rows
order_id | status   | updated_at
---------+----------+---------------------
101      | created  | 2024-04-01 08:00:00
101      | paid     | 2024-04-01 08:05:00
102      | shipped  | 2024-04-02 09:00:00

-- Expected rows
order_id | status   | updated_at
---------+----------+---------------------
101      | paid     | 2024-04-01 08:05:00
102      | shipped  | 2024-04-02 09:00:00

Exercise 2 β€” Category mapping with refunds

Write a unit test for a transformation that assigns categories: amount < 0 β†’ 'refund', 0–49.99 β†’ 'low', 50–199.99 β†’ 'mid', β‰₯ 200 β†’ 'high'.

See input and expected
-- Input rows
id | amount
---+--------
1  | -10.00
2  | 0.00
3  | 49.99
4  | 50.00
5  | 199.99
6  | 200.00

-- Expected rows
id | category
---+----------
1  | refund
2  | low
3  | low
4  | mid
5  | mid
6  | high

Exercise checklist

  • Arrange: Defined tiny input and expected result CTEs.
  • Act: Applied the transformation exactly once.
  • Assert: Used two-way EXCEPT to check equality.
  • Deterministic: No NOW(), RAND(), or non-fixed parameters.
  • Edge cases included: nulls, zero/negative, ties for ordering if relevant.

Common mistakes and self-check

  • Testing too much at once: If the test is long and fragile, split into smaller units.
  • Non-deterministic inputs: Replace NOW() with a fixed date; seed randomness.
  • Ignoring type/rounding: Cast decimals and round before compare.
  • Missing bidirectional check: One-way EXCEPT misses extra rows; always use both directions.
  • Skipping edge cases: Include nulls, duplicates, and boundary values.
Self-check prompt
  • If the input changed slightly (extra duplicate), would your test catch a regression?
  • Can you explain the rule your test enforces in one sentence?
  • Does the test fail if you intentionally break the logic?

Practical projects

  • Project A: Build a unit-tested customer lifecycle transformation (lead β†’ signup β†’ paid) with tie-breaking and date windows.
  • Project B: Create a unit-tested daily financials model: currency conversion, rounding, tax, and refunds.
  • Project C: Unit-test event sessionization using window functions, handling out-of-order events.

Mini challenge

Your metric model sums revenue for the last full calendar month. Write a unit test that fixes the reference date to 2024-08-15 and asserts the model aggregates 2024-07-01 to 2024-07-31 only, with the expected total of 12,345.67.

Quick Test note: The quick test is available to everyone. Only logged-in users will see saved progress.

Next steps

  • Turn these patterns into reusable test templates in your project.
  • Add unit tests for the riskiest transformations first (metrics and dedupes).
  • Run the Quick Test to check your understanding and lock in the concepts.

Practice Exercises

2 exercises to complete

Instructions

Write a SQL unit test that keeps only the most recent row per order_id by updated_at. Use AAA: create input and expected CTEs, transform with a window function, and assert equality with two-way EXCEPT.

-- Input
order_id | status   | updated_at
101      | created  | 2024-04-01 08:00:00
101      | paid     | 2024-04-01 08:05:00
102      | shipped  | 2024-04-02 09:00:00

-- Expected
order_id | status   | updated_at
101      | paid     | 2024-04-01 08:05:00
102      | shipped  | 2024-04-02 09:00:00
Expected Output
A single-row result showing diff_count = 0 (or PASS) after comparing actual to expected.

Unit Tests For Transformations β€” Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Unit Tests For Transformations?

AI Assistant

Ask questions about this tool