luvv to helpDiscover the Best Free Online Tools
Topic 4 of 8

Derivations And Calculations

Learn Derivations And Calculations for free with explanations, exercises, and a quick test (for ETL Developer).

Published: January 11, 2026 | Updated: January 11, 2026

Why this matters

Derivations and calculations turn raw fields into business-ready metrics. As an ETL Developer, you will:

  • Compute revenue, margins, discounts, and KPIs reliably.
  • Standardize names, emails, and codes to enable joins and analytics.
  • Create buckets (age groups, spend tiers) and flags (churn risk, VIP).
  • Build rollups and window-calculated features (rolling averages, ranks).
  • Handle nulls, rounding, and data types so numbers stay trustworthy.

Concept explained simply

Derivations are new columns created from existing data (e.g., net_revenue = qty * price * (1 - discount)). Calculations are the operations and rules you apply to produce those columns: arithmetic, conditionals, string/date functions, aggregates, and window functions.

Mental model

Think of your dataset as ingredients and your transformation as a recipe. Each derived column is a step that must be:

  • Deterministic: same inputs lead to same outputs.
  • Traceable: easy to explain how it was computed.
  • Testable: you can check each step with small samples.

Core techniques you will use

  • Arithmetic: addition, subtraction, multiplication, division; order of operations matters.
  • Conditionals: CASE/IF for flags and buckets.
  • String ops: trim, lower/upper, split, substring, regex where available.
  • Date/time ops: date differences, truncation to day/week/month, time zone awareness.
  • Aggregations: sums, counts, averages; grouping keys.
  • Window functions: rolling averages, cumulative sums, ranks.
  • Type handling: casting, decimal precision, safe integer division.
  • Null handling: COALESCE/defaults, avoid dividing by zero.
  • Rounding: financial values usually round half up to 2 decimals; be consistent.

Worked examples

Example 1 — Order economics

Inputs per order: qty, unit_price, discount_pct, cogs.

  • revenue_before_discount = qty * unit_price
  • discount_amount = revenue_before_discount * discount_pct
  • net_revenue = revenue_before_discount - discount_amount
  • gross_margin = net_revenue - cogs
  • margin_pct = CASE WHEN net_revenue > 0 THEN gross_margin / net_revenue ELSE NULL END

Tip: Store monetary fields as decimals with sufficient precision to avoid floating-point surprises.

Example 2 — Churn risk flag

Inputs: last_purchase_date, today_date.

  • days_since_purchase = today_date - last_purchase_date
  • is_churn_risk = CASE WHEN days_since_purchase >= 90 THEN 1 ELSE 0 END

Make the cutoff explicit (e.g., 90) and keep it configurable in pipelines.

Example 3 — Standardize names and emails
  • full_name_clean = TRIM(REGEXP_REPLACE(full_name, '\\s+', ' '))
  • email_normalized = LOWER(TRIM(email))
  • email_domain = SPLIT_PART(email_normalized, '@', 2)

Small standardizations greatly improve join rates and deduplication.

Example 4 — Rolling 7-day average

Inputs: daily_sales(date, sales).

  • rolling_avg_7d = AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

Window frames must be ordered and bounded. Early rows have smaller windows; document this behavior.

Data types, precision, and safety

  • Decimals: use DECIMAL/NUMERIC for money; set scale (e.g., 2) consistently across stages.
  • Integer division: cast to decimal before division to avoid truncation.
  • Rounding: choose a policy (e.g., round half up) and apply once, at the presentation or agreed layer.
  • Null safety: COALESCE defaults thoughtfully; do not mask real data gaps.
  • Time zones: convert timestamps to a canonical zone (e.g., UTC) before deriving date parts.

Self-checks you can automate

  • Invariants: net_revenue ≥ 0; margin_pct between -1 and 1 unless negative margins allowed.
  • Sum consistency: detail sums match aggregate tables within rounding tolerance.
  • Monotonicity: cumulative totals never decrease.
  • Coverage: flags/buckets sum to known totals; no unexpected NULLs after derivation.

Common mistakes (and how to avoid)

  • Integer division truncation: cast operands to decimal before dividing.
  • Unbounded windows: missing frame leads to full-history averages; always specify frame.
  • Silent nulls: operations with NULL return NULL; use COALESCE or CASE to control behavior.
  • Double-discounting: apply discounts once and in the right order.
  • Rounding at every step: round only at final output unless required, to avoid drift.
  • Timezone drift: derive dates after converting to a consistent time zone.

Exercises

Practice with small, deterministic datasets. The same tasks appear below in the interactive section with solutions.

  1. Exercise ex1: Compute net_revenue, gross_margin, and margin_pct for 3 orders. Then round money to 2 decimals and margin_pct to 2 decimal percent.
  2. Exercise ex2: Build a 3-day rolling average of daily sales and label each day with its week number (ISO week).
  3. Exercise ex3: Derive churn_risk, ltv_bucket, and email_domain from a customers sample. Assume current_date = 2024-02-01.

Checklist before you submit

  • Handled NULLs and divide-by-zero safely.
  • Used consistent rounding for money and percentages.
  • Documented any constants (e.g., 90 days churn cutoff).
  • Window frames are explicit and ordered.

Mini challenge

Design derived fields for a subscription billing dataset: monthly_recurring_revenue (MRR), churn_flag, expansion_revenue, contraction_revenue, and net_revenue_retention by month. State your definitions, required inputs, and how you would validate them.

Who this is for

  • ETL Developers and Data Engineers building reliable transformations.
  • Analysts moving pipelines from notebooks to production.
  • Anyone preparing for metric definitions and data contracts.

Prerequisites

  • Comfort with SQL basics (SELECT, WHERE, GROUP BY) or a transformation tool.
  • Understanding of data types (int, decimal, string, date/timestamp).

Learning path

  1. Start with simple arithmetic and conditional flags.
  2. Add string/date standardization and safe casting.
  3. Introduce aggregates and then window functions.
  4. Bake in validation checks and rounding policies.
  5. Optimize and document derivations as reusable macros/UDFs if your platform supports them.

Practical projects

  • Order metrics mart: net_revenue, discounts, margins by day/product.
  • Customer health table: churn risk, activity streaks, VIP tiering.
  • Marketing attribution prep: cleaned channels, campaign parsing, week buckets.

Next steps

  • Complete the exercises below and compare with the solutions.
  • Take the Quick Test to confirm understanding.
  • Apply these patterns to your current pipeline and add automated checks.

Ready to test yourself?

Take the Quick Test below. Note: The quick test is available to everyone; only logged-in users get saved progress.

Practice Exercises

3 exercises to complete

Instructions

Given the orders CSV below, derive the following per row: revenue_before_discount, discount_amount, net_revenue, gross_margin, and margin_pct. Round monetary fields to 2 decimals and margin_pct to 2 decimals (as a fraction, e.g., 0.17 for 17%).

order_id,qty,unit_price,discount_pct,cogs
1001,2,20.00,0.10,30.00
1002,1,15.00,0.00,8.00
1003,3,12.50,0.20,20.00
Expected Output
order_id,revenue_before_discount,discount_amount,net_revenue,gross_margin,margin_pct 1001,40.00,4.00,36.00,6.00,0.17 1002,15.00,0.00,15.00,7.00,0.47 1003,37.50,7.50,30.00,10.00,0.33

Derivations And Calculations — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.

10 questions70% to pass

Have questions about Derivations And Calculations?

AI Assistant

Ask questions about this tool