Why this matters
Business rules turn raw data into decisions the organization trusts. As an ETL Developer, you translate policies such as “what counts as an active customer,” “how to compute tax,” or “which duplicate to keep” into reproducible, testable transformations. Good rule implementation reduces disputes, audit risk, and rework. You will:
- Standardize and validate incoming data before it flows into analytics.
- Derive key indicators (e.g., customer status, revenue, churn flags).
- Apply mappings, priorities, and effective-dated policies consistently.
- Make pipelines idempotent, auditable, and easy to change.
Concept explained simply
Business rules are “if-then” agreements about how data should look and behave. They come from domain experts and must be encoded in your transformations in a clear, testable way.
Mental model
Think of a rule engine as a small, deterministic function applied to uniform inputs. You aim for:
- IF: conditions on fields and time (e.g., effective dates).
- MAP: lookups into reference tables instead of hard-coding.
- AGG: summarize over windows (e.g., 90-day spend).
- PRIORITY: tie-breakers for duplicates.
- VERSION: rule versions with dates and provenance.
Always prefer declarative representations (tables, config) over embedding constants in code.
Implementing rules in 6 steps
Tip: Declarative vs procedural
Prefer rules as data (tables and parameters) rather than code constants. This enables non-breaking updates, versioning, and easier audits.
Worked examples
Example 1 — Country VAT calculation
Rule: Apply VAT by customer country using an effective-dated rate table. If no rate found, flag the record for review.
-- Inputs: orders(o_id, amount_net, country_code, order_date)
-- Ref: vat_rates(country_code, vat_rate, effective_from, effective_to)
SELECT o.o_id,
o.amount_net,
r.vat_rate,
CASE WHEN r.vat_rate IS NULL THEN NULL ELSE o.amount_net * r.vat_rate END AS amount_vat,
CASE WHEN r.vat_rate IS NULL THEN 'REVIEW_MISSING_VAT' ELSE NULL END AS rule_flag
FROM orders o
LEFT JOIN vat_rates r
ON r.country_code = o.country_code
AND o.order_date >= r.effective_from
AND (r.effective_to IS NULL OR o.order_date <= r.effective_to);
Why it works: Declarative, time-aware, idempotent. Missing rates are explicitly flagged.
Example 2 — Customer status derivation
Rule: A customer is Active if their total paid amount in the last 90 days ≥ 100; else Inactive.
-- Inputs: payments(cust_id, paid_amount, paid_at)
-- We compute rolling 90-day sum per order date or snapshot date.
WITH sums AS (
SELECT cust_id,
SUM(paid_amount) FILTER (WHERE paid_at >= current_date - INTERVAL '90 days') AS amt_90d
FROM payments
GROUP BY cust_id
)
SELECT cust_id,
CASE WHEN amt_90d >= 100 THEN 'Active' ELSE 'Inactive' END AS customer_status
FROM sums;
Edge cases: Time zone normalization; multiple currencies (consider standardizing); customers with no payments default to Inactive.
Example 3 — Duplicate resolution with priorities
Rule: Keep one lead per email per day. Prefer channel priority Web > Partner > Other. If same channel, keep latest event_time.
-- Inputs: leads(lead_id, email, channel, event_time)
-- Ref: channel_priority(channel, priority_rank)
WITH ranked AS (
SELECT l.*, p.priority_rank,
ROW_NUMBER() OVER (
PARTITION BY email, CAST(event_time AS DATE)
ORDER BY p.priority_rank ASC, event_time DESC, lead_id ASC
) AS rn
FROM leads l
LEFT JOIN channel_priority p ON l.channel = p.channel
)
SELECT * FROM ranked WHERE rn = 1;
Why it works: Deterministic tie-breakers produce reproducible results.
Exercises
Exercises are available to everyone. If you are logged in, your progress will be saved.
Exercise 1 — Late fee calculation (mirrors the task below)
Implement a late fee rule for invoices: if due_date < today and status != 'PAID', apply 2% of amount_due per full week overdue, minimum $5, maximum 25% of amount_due. If not overdue, late_fee = 0 and flag = NULL.
- Inputs: invoices(invoice_id, amount_due, due_date, status), today_date parameter.
- Outputs: late_fee (decimal), rule_flag (nullable string).
Hints
- Compute weeks_overdue as floor(date_diff / 7), not negative.
- Use LEAST/GREATEST to cap and floor the fee.
- Leave a flag like 'OVERDUE_APPLIED' only when fee > 0.
Show solution
WITH base AS (
SELECT invoice_id, amount_due, due_date, status,
CAST(:today_date AS DATE) AS today_date,
GREATEST(0, FLOOR( (CAST(:today_date AS DATE) - due_date) / 7 )) AS weeks_overdue
FROM invoices
), fees AS (
SELECT invoice_id,
CASE WHEN status <> 'PAID' AND weeks_overdue > 0 THEN
LEAST( amount_due * 0.25,
GREATEST( 5.0, amount_due * 0.02 * weeks_overdue ) )
ELSE 0.0 END AS late_fee
FROM base
)
SELECT b.invoice_id,
f.late_fee,
CASE WHEN f.late_fee > 0 THEN 'OVERDUE_APPLIED' ELSE NULL END AS rule_flag
FROM base b
JOIN fees f USING (invoice_id);
- [ ] I used effective dating or parameters for time-based logic.
- [ ] The computation is idempotent and deterministic.
- [ ] Edge cases (status = PAID, just due today) behave correctly.
- [ ] I can explain the formula to a non-technical stakeholder.
Common mistakes and how to self-check
- Hard-coding constants in code: Move rates, thresholds, priorities to tables with effective dates. Self-check: Can a BA change a rate without code deploy?
- Ignoring time boundaries: Always filter by effective_from/effective_to or snapshot date. Self-check: Does rerunning last month produce identical results?
- Ambiguous NULL handling: Explicitly choose default values or flags. Self-check: What happens when a mapping is missing?
- Non-idempotent updates: Avoid accumulative writes. Prefer overwrite or MERGE with deterministic keys. Self-check: Rerun twice; outputs should match.
- Mixed concerns: Separate validation from derivation. Self-check: Can you count validation errors independently?
- No golden tests: Create small, labeled datasets. Self-check: Do you have expected outputs for edge cases?
Practical projects
- Build a configurable tax engine: mapping table with country/state, categories, and effective dates; include a validation report for missing rates.
- Customer lifecycle flags: Active, Churn Risk, VIP using rolling windows; publish a daily snapshot table with rule_version.
- Deduplication service: Resolve duplicates across sources with rank tables and tie-breakers; output a decision log per cluster.
Learning path
- Before this: Data types and standardization, joins, window functions, idempotent loading patterns.
- Now: Business Rules Implementation (this page) — mapping, derivation, priorities, effective dating.
- Next: Data quality monitoring, SCD patterns, configuration management, and documentation automation.
Who this is for
- ETL Developers implementing transformations for analytics and reporting.
- Data engineers making domain rules reproducible and testable.
- Analyst engineers who need consistent, auditable metrics.
Prerequisites
- Comfort with SQL (joins, CASE, window functions).
- Understanding of dimensional modeling basics and idempotent ELT/ETL patterns.
- Ability to design and use reference/config tables.
Mini challenge
Design rules for subscription discounts:
- Tenure >= 12 months gets 10% off; >= 36 months gets 20% off.
- Premium plan cannot exceed 15% discount.
- If multiple promos apply, keep the highest allowed by plan cap.
- Discounts effective next billing cycle; use effective_from dates.
What to deliver
- Config table schema for discount tiers and plan caps (with effective dates).
- SQL to compute discount_percent per subscription deterministically.
- A 10-row golden dataset with expected outputs.
Next steps
- Refactor hard-coded constants into reference tables.
- Add rule_version and effective date logic to at least one pipeline.
- Create a small golden dataset and a daily validation summary.