Who this is for
BI Analysts and data practitioners who build dashboards and reports and want stable, well-documented SQL views that the whole team can reuse safely.
Prerequisites
- Comfort with SELECT, JOIN, GROUP BY, and CTEs.
- Basic familiarity with your warehouse (PostgreSQL/Snowflake/BigQuery/Redshift syntax differences).
- Understanding of your company’s core entities (customers, orders, products).
Why this matters
In BI work you repeatedly need clean, consistent data for dashboards and ad-hoc questions. Reusable SQL views give everyone a single, trustworthy source. Real tasks where this helps:
- Providing a single definition of metrics (e.g., Daily Active Customers) across multiple dashboards.
- Hiding raw data complexity (deduplication, type casting, nullable fields) behind a clean interface.
- Speeding up onboarding: new analysts query one documented view instead of re-learning joins.
Concept explained simply
A reusable view is a named SQL query you can SELECT from as if it were a table. It standardizes logic (joins, filters, types, naming) so others don’t have to rebuild it. Think of it as a contract: stable columns with clear meanings that don’t surprise downstream users.
Mental model
Imagine three layers:
- Staging (raw_clean): lightly cleaned source tables.
- Base views (analytics.base_*): consistent schemas, business-safe defaults (data types, null-handling).
- Consumer views (analytics.v_*): user-friendly surfaces for dashboards (pre-aggregations, labels).
Base views change rarely and prioritize correctness. Consumer views are tailored for specific use cases but built on base views to avoid duplication.
Design principles for reusable BI views
- Stability first: prefer additive changes; avoid breaking renames or type changes.
- Explicit columns: never SELECT *; list columns with clear, semantic names (snake_case).
- Type safety: CAST to expected types; normalize booleans and timestamps; set time zones explicitly if applicable.
- Null handling: use COALESCE and default flags to avoid surprising NULLs in dashboards.
- Document in-line: add a header comment describing purpose, grain, and column meanings.
- Two-tier approach: build a base view, then thin consumer views on top for specific metrics.
- Performance: push filters to source, avoid unnecessary DISTINCT, and pre-aggregate where appropriate.
- Reproducibility: use CREATE OR REPLACE VIEW; isolate logic in CTEs with clear names.
- Security: expose only necessary columns; exclude PII or mask it as required by policy.
Worked examples
Example 1: Customer dimension base view (dedupe, types, flags)
-- Purpose: Clean customer dimension
-- Grain: 1 row per customer_id
-- Columns: customer_id, full_name, email_hash, country_code, is_active, created_date
CREATE OR REPLACE VIEW analytics.base_customers AS
WITH src AS (
SELECT
c.customer_id,
TRIM(c.first_name || ' ' || c.last_name) AS full_name,
/* Avoid exposing raw email; hash or mask based on policy */
MD5(LOWER(TRIM(c.email))) AS email_hash,
UPPER(c.country_code) AS country_code,
c.status,
CAST(c.created_at AS DATE) AS created_date,
ROW_NUMBER() OVER (PARTITION BY c.customer_id ORDER BY c.updated_at DESC) AS rn
FROM raw.customers c
)
SELECT
customer_id,
NULLIF(full_name, '') AS full_name,
email_hash,
country_code,
CASE WHEN LOWER(status) = 'active' THEN TRUE ELSE FALSE END AS is_active,
created_date
FROM src
WHERE rn = 1;
Why it’s reusable: one row per customer, safe types, masked email, consistent is_active flag, deduped by latest record.
Example 2: Daily order metrics (consumer view on base data)
-- Purpose: Daily commerce metrics for dashboards
-- Grain: 1 row per order_date
-- Depends on: analytics.base_orders, analytics.base_order_items
CREATE OR REPLACE VIEW analytics.v_daily_order_metrics AS
WITH o AS (
SELECT order_id, customer_id, CAST(order_timestamp AS DATE) AS order_date, order_status
FROM analytics.base_orders
WHERE order_status IN ('paid','fulfilled')
),
items AS (
SELECT order_id, SUM(quantity) AS items_qty, SUM(quantity * unit_price) AS gross_revenue
FROM analytics.base_order_items
GROUP BY order_id
)
SELECT
o.order_date,
COUNT(DISTINCT o.order_id) AS orders,
COUNT(DISTINCT o.customer_id) AS unique_customers,
COALESCE(SUM(items.items_qty),0) AS items_sold,
COALESCE(SUM(items.gross_revenue),0.0) AS revenue
FROM o
LEFT JOIN items USING (order_id)
GROUP BY o.order_date;
Why it’s reusable: standard daily grain, clear metric names, business-status filtered, and safe COALESCE for dashboard visuals.
Example 3: Two-tier approach (base view + thin consumer view)
-- Base: one row per product with normalized types
CREATE OR REPLACE VIEW analytics.base_products AS
SELECT
p.product_id,
TRIM(p.product_name) AS product_name,
UPPER(p.category) AS category,
CAST(p.active AS BOOLEAN) AS is_active,
CAST(p.created_at AS DATE) AS created_date
FROM raw.products p;
-- Consumer: active products only, for UI filters
CREATE OR REPLACE VIEW analytics.v_active_products AS
SELECT product_id, product_name, category
FROM analytics.base_products
WHERE is_active = TRUE;
Why it’s reusable: base view standardizes schema; consumer view tailors a common filter without duplicating normalization logic.
Build steps you can follow
- Define grain and purpose: one row per what? (customer, order_date, product).
- List columns and types: write them down; assign names and expected types.
- Draft SQL with CTEs: raw input, cleanup, business rules, final select.
- Harden: COALESCE nulls, cast types, filter statuses, remove duplicates.
- Document: add header comment with grain, purpose, dependencies.
- Test: count, distinct keys, spot-check joins, compare to known totals.
- Release: CREATE OR REPLACE VIEW under analytics schema; avoid breaking changes.
Exercises
These mirror the exercises below. Try them in your warehouse; then open the solutions.
- Exercise 1 (ex1): Build a clean customer dimension view with dedupe and masking.
- Exercise 2 (ex2): Build a daily order metrics view from orders and order_items.
Checklist before you consider a view "reusable"
- Purpose and grain stated in a comment.
- No SELECT *; columns explicitly listed.
- Primary key or uniqueness guaranteed at the grain.
- Types casted; timestamps clarified; nulls handled.
- Filters reflect business rules (e.g., paid orders only).
- Depends only on stable upstream objects.
- Performance is acceptable for dashboard refresh SLAs.
Common mistakes and self-check
- Using SELECT *: downstream breaks when new columns appear or order changes. Fix: list columns explicitly.
- Missing grain definition: duplicates appear. Fix: enforce uniqueness (ROW_NUMBER then rn=1).
- Leaking PII: exposing raw emails. Fix: mask or hash; expose only needed fields.
- Implicit types/time zones: numbers as strings or mixed tz timestamps. Fix: CAST and standardize TZ.
- Over-aggregation: double-counting revenue by joining items incorrectly. Fix: aggregate items first, then join to orders.
- Hard-coded dates: views that only work for last month. Fix: make logic relative or push date filters to consumers.
Self-check
- Can you describe the grain in one sentence?
- Is there a column (or combo) that is unique at that grain?
- Are all numeric columns correct type and scale?
- If a column contains nulls, is that intentional and documented?
- Does COUNT(*) from the view match known control totals when filtered?
Practical projects
- Analytics customer mart: Create base_customers and v_active_customers with dedupe, PII masking, and activity flags. Acceptance: uniqueness by customer_id; no raw PII exposed.
- Sales daily metrics: Build v_daily_order_metrics with orders, customers, items, revenue. Acceptance: matches finance totals for a sampled week.
- Product browse layer: Base products + v_active_products + v_category_rollup. Acceptance: category-level counts match product catalog.
Learning path
- Start: base entity views (customers, products, orders).
- Then: consumer-friendly metric views (daily metrics, retention cohorts).
- Next: incremental models or materialized views if needed for performance.
- Finally: semantic layer naming consistency across dashboards.
Quick test note
The quick test at the end is available to everyone. Only logged-in users will have their progress saved.
Mini challenge
Build a two-tier view for "active subscribers":
- Create base_subscriptions with one row per subscription_id, normalized statuses, and start/end dates.
- Create v_active_subscriptions filtering to currently active records and expose subscriber_id and plan_name.
- Prove correctness: show that counts per day match billing exports for two random dates.