How to learn Writing Modular Reusable SQL Models for SQL in Analytics Engineer for free

Who this is for

Analytics Engineers and BI developers who want to build clean, reliable, and reusable SQL models that power dashboards, metrics, and downstream analyses.

Prerequisites

Comfort writing SELECT queries with JOINs, GROUP BY, and WHERE
Basic knowledge of CTEs (WITH), views, and indexes
Familiarity with your warehouse dialect (e.g., casting, date functions)

Why this matters

In real projects, you rarely write one-off SQL. You create models others will reuse: clean customer tables, order facts, daily metrics. Modular, reusable SQL saves hours, reduces defects, and makes changes safer. Examples of real tasks:

Standardize raw data into staging models so all teams join on the same clean keys.
Refactor repeated logic (like revenue calculation) into one trusted model.
Create intermediate models that aggregate and enrich for BI dashboards.
Ensure models run fast and deterministically during daily refreshes.

Concept explained simply

Modular SQL means splitting complex logic into smaller models with one clear responsibility. Reusable means these models are stable, named consistently, and safe for others to build on.

Mental model

Think in layers:

Staging (stg_): Clean and standardize raw tables. Cast types, trim strings, fix nulls, rename columns.
Intermediate (int_): Combine and enrich. Deduplicate, compute metrics, apply business rules.
Marts (dim_/fact_): Final analytics-friendly tables for BI. Dimensions and facts with documentation-ready columns.

Each layer depends only on earlier layers. Avoid jumping over layers.

Design principles

DRY (Don't Repeat Yourself): Put shared logic in one place (CTEs, views, or upstream models).
Single responsibility: One primary purpose per model (e.g., stg_orders only cleans orders).
Deterministic schemas: Explicitly select and alias columns; avoid SELECT *.
Predictable naming: Prefix by layer and purpose (stg_, int_, dim_, fact_).
Idempotent builds: Use create or replace for views; avoid side effects.
Performance aware: Pre-aggregate where appropriate; ensure keys are unique before joins.

Self-check: Is my model modular?

Can I describe the model’s single purpose in one sentence?
Is any complex logic shared moved upstream?
Are column names explicit and stable?
Can another analyst reuse this without reading 5 other files?

Useful patterns and building blocks

1) CTEs to organize logic

WITH cleaned AS (
  SELECT
    order_id,
    CAST(order_date AS DATE) AS order_date,
    customer_id,
    CAST(amount AS NUMERIC(12,2)) AS amount
  FROM raw.orders
),
order_totals AS (
  SELECT order_id, customer_id, order_date, SUM(amount) AS order_value
  FROM cleaned
  GROUP BY 1,2,3
)
SELECT * FROM order_totals;

2) Views for reuse

Turn frequently used logic into a view (or materialized view if supported and the query is heavy). This avoids copying logic into many places.

3) Date spine for complete time series

WITH date_spine AS (
  SELECT DATEADD(day, seq, (SELECT MIN(order_date) FROM stg_orders)) AS d
  FROM generate_series(0, DATEDIFF(day, (SELECT MIN(order_date) FROM stg_orders), (SELECT MAX(order_date) FROM stg_orders))) AS t(seq)
)
SELECT d FROM date_spine;

Join your facts to the spine to include zero-activity days.

4) Safe joins

Clean keys in staging: TRIM, UPPER, CAST
Check uniqueness before joining (using COUNT DISTINCT and anti-joins for duplicates)

Quick duplicate check snippet

SELECT customer_id, COUNT(*) AS cnt
FROM stg_customers
GROUP BY 1
HAVING COUNT(*) > 1;

Worked examples

Example 1: From messy raw to reusable stg_orders

Goal: Create a clean staging model for orders, with typed columns and totals per order.

Step-by-step

Clean and cast raw data
Aggregate to one row per order
Expose only stable, well-named columns

WITH cleaned AS (
  SELECT
    o.order_id,
    CAST(o.order_date AS DATE) AS order_date,
    o.customer_id,
    CAST(oi.item_amount AS NUMERIC(12,2)) AS item_amount
  FROM raw.orders o
  JOIN raw.order_items oi ON o.order_id = oi.order_id
),
order_totals AS (
  SELECT
    order_id,
    customer_id,
    order_date,
    SUM(item_amount) AS total_order_value
  FROM cleaned
  GROUP BY 1,2,3
)
SELECT
  order_id,
  customer_id,
  order_date,
  total_order_value
FROM order_totals;

Example 2: Intermediate model for daily revenue (int_daily_revenue)

Goal: Provide a complete daily time series, including zero-revenue days.

Show example

WITH date_spine AS (
  SELECT DATEADD(day, seq, (SELECT MIN(order_date) FROM stg_orders)) AS d
  FROM generate_series(0, DATEDIFF(day, (SELECT MIN(order_date) FROM stg_orders), (SELECT MAX(order_date) FROM stg_orders))) t(seq)
),
revenue_by_day AS (
  SELECT order_date AS d, SUM(total_order_value) AS revenue
  FROM stg_orders
  GROUP BY 1
)
SELECT
  s.d AS date,
  COALESCE(r.revenue, 0) AS revenue
FROM date_spine s
LEFT JOIN revenue_by_day r ON r.d = s.d;

Example 3: Dimensions and facts

Goal: Build dim_customers and fact_orders from staging.

Show example

-- dim_customers
WITH base AS (
  SELECT
    customer_id,
    INITCAP(TRIM(customer_name)) AS customer_name,
    UPPER(TRIM(country)) AS country,
    LOWER(TRIM(email)) AS email
  FROM stg_customers
)
SELECT * FROM base;

-- fact_orders
SELECT
  o.order_id,
  o.customer_id,
  o.order_date,
  o.total_order_value
FROM stg_orders o;

Exercises

Try these without peeking. Then compare with the solutions.

Exercise 1: Refactor order totals into a reusable staging model

Take repeated revenue logic used across dashboards and move it into a single, clean model. Use CTEs to organize steps and produce one row per order with a stable schema.

Instructions

Start from this messy pattern used in multiple places:

SELECT o.order_id, o.customer_id, o.order_date, SUM(oi.item_amount) AS total
FROM raw.orders o
JOIN raw.order_items oi ON o.order_id = oi.order_id
GROUP BY 1,2,3;

Refactor into a model with two CTEs: cleaned and order_totals.
Cast numeric amounts to NUMERIC(12,2) and order_date to DATE.
Alias the final columns as: order_id, customer_id, order_date, total_order_value.

Expected outcome

A single query returning one row per order with total_order_value, with explicit casts and column aliases. Suitable to become a view named stg_orders.

Hints

Put all type casting in the cleaned CTE.
Aggregate only once in order_totals.
Final SELECT should expose only four columns in stable order.

Show solution

WITH cleaned AS (
  SELECT
    o.order_id,
    CAST(o.order_date AS DATE) AS order_date,
    o.customer_id,
    CAST(oi.item_amount AS NUMERIC(12,2)) AS item_amount
  FROM raw.orders o
  JOIN raw.order_items oi ON o.order_id = oi.order_id
),
order_totals AS (
  SELECT
    order_id,
    customer_id,
    order_date,
    SUM(item_amount) AS total_order_value
  FROM cleaned
  GROUP BY 1,2,3
)
SELECT
  order_id,
  customer_id,
  order_date,
  total_order_value
FROM order_totals;

Exercise 2: Build a daily revenue model using a date spine

Produce a model that outputs one row per day with total revenue, including days with zero revenue.

Instructions

Create a date_spine CTE generating all dates between MIN and MAX order_date in stg_orders.
Aggregate revenue from stg_orders to daily level.
Left join the spine to the revenue and coalesce nulls to zero.

Expected outcome

A query that returns columns date and revenue, covering every date in the range with 0 where applicable.

Hints

Use generate_series or your warehouse’s equivalent to build the spine.
COALESCE handles missing days.
Group by the date column only once.

Show solution

WITH date_spine AS (
  SELECT DATEADD(day, seq, (SELECT MIN(order_date) FROM stg_orders)) AS d
  FROM generate_series(0, DATEDIFF(day, (SELECT MIN(order_date) FROM stg_orders), (SELECT MAX(order_date) FROM stg_orders))) t(seq)
),
rev AS (
  SELECT order_date AS d, SUM(total_order_value) AS revenue
  FROM stg_orders
  GROUP BY 1
)
SELECT s.d AS date, COALESCE(r.revenue, 0) AS revenue
FROM date_spine s
LEFT JOIN rev r ON r.d = s.d
ORDER BY 1;

Exercise checklist

All casts are in staging, not mixed into final SELECTs.
No SELECT * in final models.
Each model has one clear responsibility.
Joins use clean keys from staging.

Common mistakes and how to self-check

Problem: SELECT * makes schemas fragile. Fix: Explicitly select and alias columns.
Problem: Repeating logic across models. Fix: Move shared logic to a single upstream model or view.
Problem: Joining on dirty keys. Fix: Standardize in staging (TRIM/UPPER/CAST) and assert uniqueness.
Problem: Overloaded models doing cleanup + business rules + metrics. Fix: Split into staging and intermediate layers.
Problem: Missing zero-activity days in trends. Fix: Use a date spine and left join.

Self-audit snippet pack

-- Column drift check
SELECT column_name FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'your_model';

-- Dupes before joining
SELECT key_col, COUNT(*) FROM stg_table GROUP BY 1 HAVING COUNT(*) > 1;

-- Row-level determinism
SELECT COUNT(*) FROM your_model;

Practical projects

Project 1: Build stg_customers and stg_orders from raw, with documented, typed columns.
Project 2: Create int_customer_ltv that aggregates customer lifetime value and the order of first purchase.
Project 3: Publish dim_customers and fact_orders for BI, and a daily KPI model that feeds a dashboard.

Learning path

Staging fundamentals: typing, trimming, renaming columns.
Intermediate modeling: aggregations, deduplication, date spines.
Dimensions and facts: star-schema friendly models.
Performance tuning: pruning columns, pre-aggregation, sensible materialization.

Next steps

Add basic SQL assertions as separate validation queries (e.g., key uniqueness, null checks).
Document every model with purpose, grain, and column definitions.
Review dependency graph to ensure clean layering (stg_ → int_ → dim_/fact_).

Mini challenge

You have an int_daily_revenue model and a dim_customers model. Create a model daily_revenue_by_country that outputs date, country, revenue. Requirements: join keys are clean, includes zero-revenue days per country, and column names are explicit. Try to solve it in 25 minutes.

Take the quick test

Everyone can take the test. If you log in, your progress is saved and results are stored on your profile.

Menu

Writing Modular Reusable SQL Models

Table of Contents

Who this is for

Prerequisites

Why this matters

Concept explained simply

Mental model

Design principles

Useful patterns and building blocks

1) CTEs to organize logic

2) Views for reuse

3) Date spine for complete time series

4) Safe joins

Worked examples

Example 1: From messy raw to reusable stg_orders

Example 2: Intermediate model for daily revenue (int_daily_revenue)

Example 3: Dimensions and facts

Exercises

Exercise 1: Refactor order totals into a reusable staging model

Exercise 2: Build a daily revenue model using a date spine

Common mistakes and how to self-check

Practical projects

Learning path

Next steps

Mini challenge

Take the quick test

Practice Exercises

Refactor order totals into a reusable staging model

Instructions

Expected Output

Build a daily revenue model using a date spine

Writing Modular Reusable SQL Models — Quick Test

Have questions about Writing Modular Reusable SQL Models?

AI Assistant