Table Design Standards

A lesson on table design standards for the Data Architect role, with explanations, exercises, and a quick test.

Published: January 18, 2026 | Updated: January 18, 2026

Why this matters

Table design standards prevent costly rework, performance bottlenecks, and data quality issues. As a Data Architect, you set conventions so engineers, analysts, and BI tools can rely on consistent, fast, and safe data structures.

  • Faster queries: Correct keys, indexes, and partitioning cut costs and latency.
  • Reliable data: Constraints and defaults reduce nulls, duplicates, and drift.
  • Scalability: Storage and distribution choices scale with volume and concurrency.
  • Governance: Naming, PII tagging, and audit columns make compliance manageable.

Concept explained simply

Table design standards are your team’s rules for how every table is named, structured, constrained, and tuned. They keep thousands of tables understandable and high-performing.

Mental model

Think of each table as a contract:

  • Name tells purpose and grain.
  • Columns define facts and dimensions at that grain.
  • Keys, constraints, and indexes enforce the contract.
  • Partitioning, clustering, and compression ensure it’s fast and affordable.
  • Audit and metadata columns make it trustworthy and governable.

Core standards checklist

  • Naming
    • snake_case, singular for OLTP (e.g., customer), plural or fact_/dim_ for analytics (fact_orders, dim_customer)
    • Prefix schemas by domain (sales.customer, core.fact_events)
    • Column names descriptive and consistent (created_at, updated_at, is_active)
  • Data types & nullability
    • Use the narrowest type that fits; prefer explicit precision (decimal(18,2) for money)
    • Timestamp with timezone for cross-region data
    • Avoid generic TEXT for structured values; prefer enums or constrained VARCHAR
    • Define nullability intentionally; add NOT NULL where possible with defaults
  • Keys & constraints
    • Primary key: surrogate (id BIGINT) for OLTP; stable natural key if truly immutable
    • Foreign keys for referential integrity in OLTP; soft FKs in warehouses (enforced via QA)
    • Unique constraints for business uniqueness (email unique per tenant, etc.)
    • CHECK constraints for domain rules (quantity >= 0)
  • Indexes & clustering
    • OLTP: 1 clustered/primary index + selective nonclustered indexes on frequent filters/joins
    • Warehouses: partition by date/time; cluster/sort by high-cardinality join/filter columns
    • Avoid over-indexing writes; measure before adding
  • Partitioning & distribution
    • Time-based partitioning for event/append-only tables
    • Distribute/cluster on join keys in MPP systems
    • Prune old partitions per retention policy
  • Compression & storage
    • Columnar compression for analytics tables
    • Choose row vs columnar store by workload
  • Audit & lifecycle
    • Include created_at, updated_at, source_system, load_id/batch_id
    • Soft delete flag (is_deleted) or valid_from/valid_to for history
    • Data retention policy per table
  • Security & governance
    • Tag sensitive columns (pii, phi); apply masking and row/column-level access
    • Consistent collation/encoding; store PII as hashed where possible
  • Documentation
    • Table and column comments include grain, source, load logic, and owner
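
For the documentation item, a minimal sketch of catalog comments in Postgres-style COMMENT ON syntax, using the fact_events table from the worked examples below; the source and owner text is illustrative, and most warehouses offer an equivalent COMMENT clause or description field.
-- Record grain, source, and owner where every consumer can find them (Postgres syntax).
COMMENT ON TABLE core.fact_events IS
  'Grain: one row per event occurrence. Source: application event stream. Owner: data platform team.';
COMMENT ON COLUMN core.fact_events.occurred_at IS
  'Event time in UTC; drives partitioning and time-range filters.';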

Worked examples

Example 1 — OLTP customer table
-- Goals: fast point lookups, enforce uniqueness, track updates
CREATE TABLE sales.customer (
  id BIGSERIAL PRIMARY KEY,
  tenant_id BIGINT NOT NULL,
  email VARCHAR(254) NOT NULL,
  given_name VARCHAR(100) NOT NULL,
  family_name VARCHAR(100) NOT NULL,
  phone VARCHAR(32),
  is_active BOOLEAN NOT NULL DEFAULT TRUE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  CONSTRAINT uq_customer_email_per_tenant UNIQUE (tenant_id, email),
  CONSTRAINT chk_phone_format CHECK (phone IS NULL OR phone ~ '^[0-9+\-() ]+$')
);
-- Index for tenant-scoped queries
CREATE INDEX ix_customer_tenant_active ON sales.customer (tenant_id, is_active);
-- Trigger or app logic to update updated_at on change.
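If updated_at is maintained in the database rather than the application, a trigger along these lines works in Postgres 11+ (a minimal sketch; the function and trigger names are illustrative):
CREATE OR REPLACE FUNCTION sales.set_updated_at() RETURNS trigger AS $$
BEGIN
  NEW.updated_at := NOW();  -- stamp the row on every update
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_customer_set_updated_at
BEFORE UPDATE ON sales.customer
FOR EACH ROW EXECUTE FUNCTION sales.set_updated_at();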
Example 2 — Analytics fact_events (partitioned)
-- Goals: append-only, fast time-range scans and user filters
-- Generic warehouse SQL (conceptual)
CREATE TABLE core.fact_events (
  event_id BIGINT,
  occurred_at TIMESTAMP NOT NULL,
  user_id BIGINT,
  session_id STRING,
  event_name STRING NOT NULL,
  source STRING,
  payload VARIANT, -- JSON-like type in many warehouses
  load_id STRING,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
PARTITION BY DATE(occurred_at)
CLUSTER BY (user_id, event_name);
-- Notes: No PK enforced in many warehouses; ensure uniqueness via ETL and QA queries.
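One way to back that soft guarantee is a scheduled QA query that should return zero rows; a non-empty result fails the load (a sketch, assuming event_id is the intended unique grain):
-- Duplicate check: any event_id appearing more than once breaks the contract.
SELECT event_id, COUNT(*) AS row_count
FROM core.fact_events
GROUP BY event_id
HAVING COUNT(*) > 1;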
Example 3 — SCD Type 2 dimension (dim_customer)
-- Goals: track history; one current row per business key
CREATE TABLE dim.dim_customer (
  customer_sk BIGINT PRIMARY KEY, -- surrogate key
  customer_bk STRING NOT NULL,    -- natural/business key
  given_name STRING,
  family_name STRING,
  email STRING,
  is_active BOOLEAN,
  valid_from TIMESTAMP NOT NULL,
  valid_to TIMESTAMP,
  is_current BOOLEAN NOT NULL,
  load_id STRING,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Rules:
-- - (customer_bk, valid_from) unique
-- - is_current = TRUE implies valid_to IS NULL
-- - Query current state with is_current = TRUE
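
A minimal sketch of the close-and-insert step when a tracked attribute changes (the key values, names, and load_id are illustrative; customer_sk would normally come from a sequence or the load process):
-- 1. Close the current row for the changed business key.
UPDATE dim.dim_customer
SET valid_to = CURRENT_TIMESTAMP,
    is_current = FALSE
WHERE customer_bk = 'C-1001'
  AND is_current = TRUE;
-- 2. Insert the new current version.
INSERT INTO dim.dim_customer
  (customer_sk, customer_bk, given_name, family_name, email, is_active,
   valid_from, valid_to, is_current, load_id)
VALUES
  (10234, 'C-1001', 'Ada', 'Lovelace', 'ada@example.com', TRUE,
   CURRENT_TIMESTAMP, NULL, TRUE, 'batch_2026_01_18');
-- 3. Read the current state.
SELECT * FROM dim.dim_customer WHERE is_current = TRUE;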

Design decisions: key trade-offs

  • Natural vs surrogate keys: Prefer surrogate for stability; keep natural key as alternate key for deduplication and joins.
  • Normalization vs denormalization: Normalize OLTP for integrity; denormalize analytics for scan efficiency and simpler queries.
  • Nullability: Allow NULL only when semantically unknown; otherwise use NOT NULL with defaults.
  • Wide vs narrow tables: Wide reduces joins but increases I/O; cluster/partition to mitigate. Narrow improves write throughput.
  • Soft deletes vs hard deletes: Prefer soft deletes or validity windows for lineage and reproducibility; hard delete only for compliance (see the sketch after this list).
  • Encoding/collation: Choose UTF-8 and case rules consistently to prevent join mismatches.
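
A minimal sketch of the soft-delete pattern referenced above, assuming an is_deleted BOOLEAN NOT NULL DEFAULT FALSE column has been added to sales.customer (it is not in the earlier DDL); consumers read through a view that hides deleted rows.
-- Flag the row instead of removing it.
UPDATE sales.customer
SET is_deleted = TRUE,
    updated_at = NOW()
WHERE id = 42;
-- Expose only live rows to downstream readers.
CREATE VIEW sales.customer_active AS
SELECT *
FROM sales.customer
WHERE is_deleted = FALSE;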

Practical projects

  • Design a small OLTP schema for orders, customers, and payments with full constraints and indexes. Include comments on every column.
  • Build an analytics model: dim_customer (SCD2) and fact_orders with partitioning and clustering. Load a month of sample data.
  • Add data governance: tag PII columns, implement a masking view for analysts, and document retention and access rules.

Exercises

Do these and compare your work against the self-check below. The same tasks appear in the Practice Exercises section for convenience.

  1. Exercise 1: Design a user_events table for both OLTP and warehouse use.
    • Define naming, columns, data types, keys, constraints, and indexes.
    • Choose partitioning and clustering for the warehouse version.
    • Add audit columns and a retention note.
  2. Exercise 2: Design orders and order_items with integrity rules.
    • PKs, FKs, CHECKs, defaults, and unique constraints.
    • Partitioning strategy for the warehouse version.
    • Indexing strategy for common queries.

Exercise self-check

  • Grain is explicitly stated in table comments or design notes.
  • Primary keys are stable and unique; natural keys captured as alternates.
  • Foreign keys or soft constraints exist with QA queries if FKs not enforced.
  • Partitioning and clustering choices match query patterns and data volume.
  • Indexes support the most frequent filters and joins; no redundant indexes.
  • Nullability, defaults, and CHECK constraints enforce data rules.
  • Audit columns present; retention window defined.

Common mistakes and how to self-check

  • Missing grain definition: Add a one-line grain statement (e.g., one row per event occurrence).
  • Overusing TEXT/STRING: Replace with sized VARCHAR or structured types; add CHECKs.
  • Too many indexes: Drop low-selectivity or unused indexes after measuring usage.
  • No partition pruning: Partition on the filter used in 80% of scans (often occurred_at or order_date).
  • Surrogate key only: Keep natural key for dedup and conformance.
  • Unchecked JSON: Validate JSON shape on load or store parsed fields for hot queries.
  • PII mixed without tags: Tag sensitive columns; create masked views.
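
For the PII item, a minimal sketch of a masking view in Postgres-style SQL (the masking rules and view name are illustrative; many platforms also provide native column-masking policies):
-- Analysts query the view; the base table stays restricted.
CREATE VIEW sales.customer_masked AS
SELECT
  id,
  tenant_id,
  md5(lower(email)) AS email_hash,           -- stable pseudonym, still joinable
  LEFT(phone, 3) || '*****' AS phone_masked,
  is_active,
  created_at,
  updated_at
FROM sales.customer;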

Mini challenge

You have a daily 500M-row clickstream table. Queries filter by event_date and user_id, and join to dim_campaign. Propose partitioning, clustering, and key strategy. Note any constraints and audit fields. Keep cost low and loading simple.

Sample direction (attempt it yourself first)
  • Partition by event_date (daily).
  • Cluster by user_id, campaign_id.
  • No enforced PK; ensure uniqueness via (event_id) in ETL checks.
  • Columns: event_id, event_date, occurred_at, user_id, session_id, campaign_id, event_name, payload, load_id, created_at.
  • Retention: 180 days hot partitions; archive older to cheaper storage.
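
A possible DDL sketch for that direction, in the same conceptual warehouse SQL as Example 2 (exact partition and cluster syntax varies by platform):
CREATE TABLE core.fact_clickstream (
  event_id BIGINT,
  event_date DATE NOT NULL,
  occurred_at TIMESTAMP NOT NULL,
  user_id BIGINT,
  session_id STRING,
  campaign_id BIGINT,          -- soft FK to dim_campaign, verified by QA queries
  event_name STRING NOT NULL,
  payload VARIANT,
  load_id STRING,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
PARTITION BY event_date
CLUSTER BY (user_id, campaign_id);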

Who this is for

  • Data Architects defining platform-wide standards.
  • Data Engineers implementing schemas and pipelines.
  • Analytics Engineers modeling for BI and self-serve.

Prerequisites

  • Comfort with SQL DDL (CREATE/ALTER TABLE, indexes, constraints).
  • Understanding of OLTP vs analytics workloads.
  • Basic knowledge of your target database/warehouse features.

Learning path

  1. Start with naming, grain, and column standards.
  2. Add keys, constraints, and nullability rules.
  3. Introduce indexing, partitioning, and clustering by workload.
  4. Layer on audit columns, governance, and retention.
  5. Practice with OLTP and warehouse examples; run EXPLAIN/monitor costs.

Next steps

  • Refactor one existing table to align with these standards and measure improvement.
  • Document a template: required columns, naming rules, and review checklist.
  • Take the quick test to validate understanding.

Practice Exercises


Instructions

Design a user_events table with an OLTP version optimized for fast writes and a warehouse version optimized for analytics.

  1. Define table purpose and grain in a one-liner.
  2. List required columns: event_id, user_id, occurred_at, event_name, source, payload, plus audit columns.
  3. Choose data types for: Postgres (OLTP) and a columnar warehouse. Be explicit about timestamp/timezone.
  4. Specify keys and constraints: PK, uniqueness, CHECKs.
  5. Indexing: OLTP selective indexes; Warehouse partitioning and clustering.
  6. Add retention and governance notes (PII, masking).
Expected Output
Two DDL snippets (OLTP and warehouse) with comments describing grain, keys, constraints, and performance design.

Table Design Standards — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

