Why this matters
Derivations and calculations turn raw fields into business-ready metrics. As an ETL Developer, you will:
- Compute revenue, margins, discounts, and KPIs reliably.
- Standardize names, emails, and codes to enable joins and analytics.
- Create buckets (age groups, spend tiers) and flags (churn risk, VIP).
- Build rollups and window-calculated features (rolling averages, ranks).
- Handle nulls, rounding, and data types so numbers stay trustworthy.
Concept explained simply
Derivations are new columns created from existing data (e.g., net_revenue = qty * price * (1 - discount)). Calculations are the operations and rules you apply to produce those columns: arithmetic, conditionals, string/date functions, aggregates, and window functions.
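A minimal sketch of such a derivation in Postgres-style SQL (the orders table and its columns are hypothetical):

-- Derivation: a new column computed from existing ones.
SELECT
  order_id,
  qty * price * (1 - discount) AS net_revenue
FROM orders;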
Mental model
Think of your dataset as ingredients and your transformation as a recipe. Each derived column is a step that must be:
- Deterministic: same inputs lead to same outputs.
- Traceable: easy to explain how it was computed.
- Testable: you can check each step with small samples.
Core techniques you will use
- Arithmetic: addition, subtraction, multiplication, division; order of operations matters.
- Conditionals: CASE/IF for flags and buckets.
- String ops: trim, lower/upper, split, substring, regex where available.
- Date/time ops: date differences, truncation to day/week/month, time zone awareness.
- Aggregations: sums, counts, averages; grouping keys.
- Window functions: rolling averages, cumulative sums, ranks.
- Type handling: casting, decimal precision, safe integer division.
- Null handling: COALESCE/defaults, avoid dividing by zero.
- Rounding: financial values usually round half up to 2 decimals; be consistent.
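One query often combines several of these techniques. A hedged sketch in Postgres-style SQL (the orders table and the 100-unit bulk threshold are made-up assumptions):

-- Conditionals, null handling, casting, and rounding in one derivation step.
SELECT
  order_id,
  COALESCE(discount_pct, 0)                            AS discount_pct_filled,  -- explicit default for missing discounts
  CASE WHEN qty >= 100 THEN 'bulk' ELSE 'standard' END AS order_size_bucket,    -- conditional bucket
  ROUND(CAST(qty AS NUMERIC) * unit_price, 2)          AS gross_amount          -- cast before arithmetic, round once
FROM orders;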
Worked examples
Example 1 — Order economics
Inputs per order: qty, unit_price, discount_pct (a fraction, e.g., 0.10 for a 10% discount), cogs.
- revenue_before_discount = qty * unit_price
- discount_amount = revenue_before_discount * discount_pct
- net_revenue = revenue_before_discount - discount_amount
- gross_margin = net_revenue - cogs
- margin_pct = CASE WHEN net_revenue > 0 THEN gross_margin / net_revenue ELSE NULL END
Tip: Store monetary fields as decimals with sufficient precision to avoid floating-point surprises.
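A runnable sketch of these steps in Postgres-style SQL (the sample rows are invented; discount_pct is a fraction):

-- Order economics computed step by step on an inline sample.
WITH orders(order_id, qty, unit_price, discount_pct, cogs) AS (
  VALUES (1, 10, 25.00, 0.10, 150.00),
         (2,  2, 99.50, 0.00, 120.00)
),
step1 AS (
  SELECT *, qty * unit_price AS revenue_before_discount FROM orders
),
step2 AS (
  SELECT *, revenue_before_discount * discount_pct AS discount_amount FROM step1
),
step3 AS (
  SELECT *, revenue_before_discount - discount_amount AS net_revenue FROM step2
)
SELECT
  order_id,
  net_revenue,
  net_revenue - cogs AS gross_margin,
  CASE WHEN net_revenue > 0
       THEN (net_revenue - cogs) / net_revenue
       ELSE NULL
  END AS margin_pct
FROM step3;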
Example 2 — Churn risk flag
Inputs: last_purchase_date, today_date.
- days_since_purchase = today_date - last_purchase_date
- is_churn_risk = CASE WHEN days_since_purchase >= 90 THEN 1 ELSE 0 END
Make the cutoff explicit (e.g., 90) and keep it configurable in pipelines.
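A sketch of the flag in Postgres-style SQL (the customers table is hypothetical; the 90-day cutoff comes from the example and should live in configuration):

-- Churn-risk flag; assumes last_purchase_date is a DATE, so subtraction yields whole days.
SELECT
  customer_id,
  CURRENT_DATE - last_purchase_date AS days_since_purchase,
  CASE WHEN CURRENT_DATE - last_purchase_date >= 90 THEN 1 ELSE 0 END AS is_churn_risk
FROM customers;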
Example 3 — Standardize names and emails
- full_name_clean = TRIM(REGEXP_REPLACE(full_name, '\s+', ' '))
- email_normalized = LOWER(TRIM(email))
- email_domain = SPLIT_PART(email_normalized, '@', 2)
Small standardizations greatly improve join rates and deduplication.
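Expressed as a query in Postgres-style SQL (the customers table is hypothetical; the 'g' flag makes REGEXP_REPLACE collapse every whitespace run, not just the first):

-- Normalize names and emails before joining or deduplicating.
SELECT
  customer_id,
  TRIM(REGEXP_REPLACE(full_name, '\s+', ' ', 'g')) AS full_name_clean,
  LOWER(TRIM(email))                               AS email_normalized,
  SPLIT_PART(LOWER(TRIM(email)), '@', 2)           AS email_domain
FROM customers;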
Example 4 — Rolling 7-day average
Inputs: daily_sales(date, sales).
- rolling_avg_7d = AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
Window frames must be ordered and bounded. Early rows have smaller windows; document this behavior.
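A self-contained sketch in Postgres-style SQL (the sample rows are invented; the date column is renamed sale_date here to avoid a reserved word):

-- Rolling 7-day average with an explicit, bounded frame.
WITH daily_sales(sale_date, sales) AS (
  VALUES (DATE '2024-01-01', 100.0),
         (DATE '2024-01-02', 120.0),
         (DATE '2024-01-03',  90.0)
)
SELECT
  sale_date,
  sales,
  AVG(sales) OVER (
    ORDER BY sale_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW  -- current row plus the 6 prior rows
  ) AS rolling_avg_7d
FROM daily_sales;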
Data types, precision, and safety
- Decimals: use DECIMAL/NUMERIC for money; set scale (e.g., 2) consistently across stages.
- Integer division: cast to decimal before division to avoid truncation.
- Rounding: choose a policy (e.g., round half up) and apply it once, at the presentation layer or another agreed-upon layer.
- Null safety: COALESCE defaults thoughtfully; do not mask real data gaps.
- Time zones: convert timestamps to a canonical zone (e.g., UTC) before deriving date parts.
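A short sketch of these safety patterns in Postgres-style SQL (table and column names are hypothetical; created_at is assumed to be a time-zone-aware timestamp):

-- Safe casting, divide-by-zero protection, explicit defaults, and UTC-based date parts.
SELECT
  order_id,
  ROUND(CAST(line_total AS NUMERIC), 2)                   AS line_total_money,     -- money as NUMERIC, rounded once
  CAST(items_returned AS NUMERIC) / NULLIF(items_sold, 0) AS return_rate,          -- cast first; NULLIF avoids divide-by-zero
  COALESCE(discount_pct, 0)                               AS discount_pct_filled,  -- documented default, not a silent mask
  (created_at AT TIME ZONE 'UTC')::date                   AS created_date_utc      -- convert before deriving date parts
FROM orders;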
Self-checks you can automate
- Invariants: net_revenue ≥ 0; margin_pct ≤ 1, and ≥ 0 unless negative margins are allowed.
- Sum consistency: detail sums match aggregate tables within rounding tolerance.
- Monotonicity: cumulative totals never decrease.
- Coverage: flags/buckets sum to known totals; no unexpected NULLs after derivation.
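These checks can run as assertion queries where any returned row is a failure. A hedged sketch in Postgres-style SQL (the order_metrics table and its columns are hypothetical):

-- Each query should return zero rows; returned rows are violations to investigate.
SELECT order_id, net_revenue FROM order_metrics WHERE net_revenue < 0;        -- invariant
SELECT order_id, margin_pct  FROM order_metrics WHERE margin_pct > 1;         -- invariant
SELECT order_id              FROM order_metrics WHERE is_churn_risk IS NULL;  -- coverage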
Common mistakes (and how to avoid them)
- Integer division truncation: cast operands to decimal before dividing.
- Unbounded windows: with ORDER BY and no frame clause, the default frame runs from the start of the partition to the current row, turning your rolling average into a running average; always specify the frame.
- Silent nulls: operations with NULL return NULL; use COALESCE or CASE to control behavior.
- Double-discounting: apply discounts once and in the right order.
- Rounding at every step: round only at final output unless required, to avoid drift.
- Timezone drift: derive dates after converting to a consistent time zone.
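The window-frame pitfall is easy to reproduce. In the sketch below (Postgres-style SQL, hypothetical daily_sales table), the first column silently becomes a running average over all prior rows because no frame is given; the second is the intended rolling average:

-- Default frame vs. an explicit rolling frame.
SELECT
  sale_date,
  AVG(sales) OVER (ORDER BY sale_date)                       AS running_avg_to_date,  -- default: start of partition to current row
  AVG(sales) OVER (ORDER BY sale_date
                   ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_avg_7d
FROM daily_sales;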
Exercises
Practice with small, deterministic datasets. The same tasks appear below in the interactive section with solutions.
- Exercise ex1: Compute net_revenue, gross_margin, and margin_pct for 3 orders. Then round money values to 2 decimals and express margin_pct as a percentage rounded to 2 decimal places.
- Exercise ex2: Build a 3-day rolling average of daily sales and label each day with its week number (ISO week).
- Exercise ex3: Derive churn_risk, ltv_bucket, and email_domain from a customers sample. Assume current_date = 2024-02-01.
Checklist before you submit
- Handled NULLs and divide-by-zero safely.
- Used consistent rounding for money and percentages.
- Documented any constants (e.g., 90 days churn cutoff).
- Window frames are explicit and ordered.
Mini challenge
Design derived fields for a subscription billing dataset: monthly_recurring_revenue (MRR), churn_flag, expansion_revenue, contraction_revenue, and net_revenue_retention by month. State your definitions, required inputs, and how you would validate them.
Who this is for
- ETL Developers and Data Engineers building reliable transformations.
- Analysts moving pipelines from notebooks to production.
- Anyone preparing for metric definitions and data contracts.
Prerequisites
- Comfort with SQL basics (SELECT, WHERE, GROUP BY) or a transformation tool.
- Understanding of data types (int, decimal, string, date/timestamp).
Learning path
- Start with simple arithmetic and conditional flags.
- Add string/date standardization and safe casting.
- Introduce aggregates and then window functions.
- Bake in validation checks and rounding policies.
- Optimize and document derivations as reusable macros/UDFs if your platform supports them.
Practical projects
- Order metrics mart: net_revenue, discounts, margins by day/product.
- Customer health table: churn risk, activity streaks, VIP tiering.
- Marketing attribution prep: cleaned channels, campaign parsing, week buckets.
Next steps
- Complete the exercises below and compare with the solutions.
- Take the Quick Test to confirm understanding.
- Apply these patterns to your current pipeline and add automated checks.
Ready to test yourself?
Take the Quick Test below.