Why this matters
As an Analytics Engineer, you translate business logic into reliable datasets. Custom SQL tests catch violations of business rules before they hit dashboards or models. Typical real tasks include ensuring only one active subscription per user, preventing shipments before payment, and validating that refunds zero out recognized revenue.
Who this is for
- Analytics Engineers and BI Developers maintaining data models
- Data Engineers adding data quality checks
- Analysts who codify metric definitions as tests
Prerequisites
- Comfort with SQL joins, aggregations, and window functions
- Basic understanding of your business domain (orders, customers, subscriptions)
- Familiarity with how tests are executed in your environment (CI jobs, schedulers)
Concept explained simply
A custom SQL test is a query that returns zero rows when your business rule is respected. Any returned row is a violation.
Mental model
- Rule as a filter: Write SQL that selects only the bad cases.
- Return evidence: Include keys and a readable reason column.
- Zero means pass: The goal is an empty result set.
Example mental translation
Business rule: A user can have at most one active subscription.
Test translation: Select users where active_subscriptions_count > 1 during the same time window.
Core patterns for SQL business-rule tests
- Uniqueness/invariants: At most one primary value per entity.
- Temporal logic: Event B cannot happen before Event A.
- Conditional logic: If X then Y must be true.
- Range and bounded values: Amounts non-negative unless a specific type.
- Referential consistency: Every fact references a known dimension entity and valid status.
How to structure a test query
- Use CTEs to isolate inputs and simplify logic.
- Build a final SELECT that returns only violations.
- Include columns: entity_id, rule_name, reason, helpful context (dates, amounts).
Worked examples
1) Only one active subscription per user
-- Returns users with overlapping active subscriptions (violation)
WITH subs AS (
SELECT user_id, start_date, end_date
FROM dim_subscriptions
WHERE status = 'active'
),
windows AS (
SELECT s1.user_id,
s1.start_date AS s1_start,
COALESCE(s1.end_date, DATE '2999-12-31') AS s1_end,
s2.start_date AS s2_start,
COALESCE(s2.end_date, DATE '2999-12-31') AS s2_end
FROM subs s1
JOIN subs s2
ON s1.user_id = s2.user_id
AND s1.start_date <= COALESCE(s2.end_date, DATE '2999-12-31')
AND s2.start_date <= COALESCE(s1.end_date, DATE '2999-12-31')
AND (s1.start_date, COALESCE(s1.end_date, DATE '2999-12-31')) <> (s2.start_date, COALESCE(s2.end_date, DATE '2999-12-31'))
)
SELECT DISTINCT user_id,
'one_active_subscription' AS rule_name,
'Overlapping active subscriptions' AS reason
FROM windows;
2) Shipment cannot precede payment
-- Returns orders where shipped_at < paid_at OR missing paid_at when shipped
SELECT o.order_id,
'ship_after_payment' AS rule_name,
CASE
WHEN o.paid_at IS NULL THEN 'Missing paid_at'
WHEN o.shipped_at < o.paid_at THEN 'Shipped before paid'
END AS reason,
o.paid_at, o.shipped_at
FROM fct_orders o
WHERE o.shipped_at IS NOT NULL
AND (o.paid_at IS NULL OR o.shipped_at < o.paid_at);
3) Refunded orders must have zero recognized revenue
-- Returns orders that have refunds but still recognize revenue
WITH refunds AS (
SELECT order_id
FROM fct_refunds
GROUP BY order_id
)
SELECT r.order_id,
'refund_zero_revenue' AS rule_name,
'Refund exists but recognized_revenue > 0' AS reason,
o.recognized_revenue
FROM refunds r
JOIN fct_order_revenue o ON o.order_id = r.order_id
WHERE o.recognized_revenue > 0;
Why these work
- Each query isolates violations only.
- Returned columns help debug issues quickly.
- They are deterministic and easy to reason about.
Designing robust tests
- Scope to fresh data when possible (e.g., last 30 days) to keep tests fast.
- Use stable time boundaries (e.g., date_trunc to day) to avoid flaky results.
- Prefer exact checks over heuristic thresholds for core rules; use warnings for soft rules.
- Name tests after the business rule and add a short reason for failures.
Implementation steps
- Write the rule in plain language with stakeholders.
- Translate it to a SQL predicate that selects only violations.
- Add helpful columns: entity key, rule_name, reason, context.
- Run locally on a small sample; verify known edge cases.
- Schedule in CI or a job that runs after model builds.
- Alert on non-empty results; log sample rows for triage.
Performance tips
- Filter by partitions (e.g., event_date >= current_date - 30) when safe.
- Select only needed columns; avoid SELECT *.
- Push filters down into CTEs to minimize scanned data.
- Use indexes or clustering keys where supported.
Helpful failure messages
Include a reason column that concisely explains the violation. Format: rule_name, reason, entity_id, and a couple of diagnostic fields (timestamps, amounts).
SELECT entity_id,
'positive_amount' AS rule_name,
CASE WHEN amount < 0 THEN 'Amount < 0' END AS reason,
amount, created_at
FROM fct_payments
WHERE amount < 0;
Exercises
Do these to solidify the skill. The Quick Test at the end is available to everyone; log in to save your progress.
- Exercise 1: Enforce one primary address per customer (see Exercises section below for full instructions).
- Exercise 2: Enforce captured payments must have paid_at and it cannot be before created_at.
- Checklist before you run: query returns only violations, includes a reason, scoped if necessary, and runs quickly.
Common mistakes and self-check
- Mistake: Returning all rows and counting in application code. Fix: Return only violating rows; zero rows means pass.
- Mistake: Unbounded scans. Fix: Add safe date filters or process in batches.
- Mistake: Ambiguous logic. Fix: Pair with domain stakeholders and add explicit CASE reasons.
- Mistake: Flaky time comparisons. Fix: Normalize timestamps to a consistent timezone and truncate to stable units if needed.
Self-check prompts
- Can a human read the reason and know what to fix?
- Does the test pass on known-good data and fail on seeded bad cases?
- Is the rule uniquely identified by the test name?
Practical projects
- Project 1: Create a test suite for your core facts (orders, payments, subscriptions). Include at least five business rules.
- Project 2: Add rule-based tests to your CI. Block merges when critical rules fail; warn on soft rules.
- Project 3: Build a daily violation summary table with counts per rule and top offenders to prioritize fixes.
Learning path
- Start: Basic data quality checks (nulls, uniqueness, referential integrity).
- Next: Custom business-rule tests (this lesson).
- Then: Test orchestration and alerting, SLA dashboards, and test-driven modeling.
Next steps
- Implement two of the example tests in your environment today.
- Review failures with stakeholders and adjust rules as needed.
- Add explanations to your team wiki for each rule.
Mini challenge
Write a test that ensures discounts cannot exceed 50% unless customer_tier = 'employee'. Return order_id, discount_rate, tier, and a reason. Keep it scoped to the last 60 days.