Table Design Standards

A lesson on table design standards for the Data Architect role, with explanations, exercises, and a quick test.

Published: January 18, 2026 | Updated: January 18, 2026

Why this matters

Table design standards prevent costly rework, performance bottlenecks, and data quality issues. As a Data Architect, you set conventions so engineers, analysts, and BI tools can rely on consistent, fast, and safe data structures.

  • Faster queries: Correct keys, indexes, and partitioning cut costs and latency.
  • Reliable data: Constraints and defaults reduce nulls, duplicates, and drift.
  • Scalability: Storage and distribution choices scale with volume and concurrency.
  • Governance: Naming, PII tagging, and audit columns make compliance manageable.

Concept explained simply

Table design standards are your team’s rules for how every table is named, structured, constrained, and tuned. They keep thousands of tables understandable and high-performing.

Mental model

Think of each table as a contract:

  • Name tells purpose and grain.
  • Columns define facts and dimensions at that grain.
  • Keys, constraints, and indexes enforce the contract.
  • Partitioning, clustering, and compression ensure it’s fast and affordable.
  • Audit and metadata columns make it trustworthy and governable.

Core standards checklist

  • Naming
    • snake_case, singular for OLTP (e.g., customer), plural or fact_/dim_ for analytics (fact_orders, dim_customer)
    • Prefix schemas by domain (sales.customer, core.fact_events)
    • Column names descriptive and consistent (created_at, updated_at, is_active)
  • Data types & nullability
    • Use the narrowest type that fits; prefer explicit precision (decimal(18,2) for money)
    • Timestamp with timezone for cross-region data
    • Avoid generic TEXT for structured values; prefer enums or constrained VARCHAR
    • Define nullability intentionally; add NOT NULL where possible with defaults
  • Keys & constraints
    • Primary key: surrogate (id BIGINT) for OLTP; stable natural key if truly immutable
    • Foreign keys for referential integrity in OLTP; soft FKs in warehouses (enforced via QA)
    • Unique constraints for business uniqueness (email unique per tenant, etc.)
    • CHECK constraints for domain rules (quantity >= 0)
  • Indexes & clustering
    • OLTP: 1 clustered/primary index + selective nonclustered indexes on frequent filters/joins
    • Warehouses: partition by date/time; cluster/sort by high-cardinality join/filter columns
    • Avoid over-indexing writes; measure before adding
  • Partitioning & distribution
    • Time-based partitioning for event/append-only tables
    • Distribute/cluster on join keys in MPP systems
    • Prune old partitions per retention policy
  • Compression & storage
    • Columnar compression for analytics tables
    • Choose row vs columnar store by workload
  • Audit & lifecycle
    • Include created_at, updated_at, source_system, load_id/batch_id
    • Soft delete flag (is_deleted) or valid_from/valid_to for history
    • Data retention policy per table
  • Security & governance
    • Tag sensitive columns (pii, phi); apply masking and row/column-level access
    • Consistent collation/encoding; store PII as hashed where possible
  • Documentation
    • Table and column comments include grain, source, load logic, and owner
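
For the documentation item, a minimal sketch of catalog comments in Postgres-style COMMENT ON syntax, using the fact_events table from the worked examples below; the source and owner text is illustrative, and most warehouses offer an equivalent COMMENT clause or description field.
-- Record grain, source, and owner where every consumer can find them (Postgres syntax).
COMMENT ON TABLE core.fact_events IS
  'Grain: one row per event occurrence. Source: application event stream. Owner: data platform team.';
COMMENT ON COLUMN core.fact_events.occurred_at IS
  'Event time in UTC; drives partitioning and time-range filters.';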

Worked examples

Example 1 — OLTP customer table
-- Goals: fast point lookups, enforce uniqueness, track updates
CREATE TABLE sales.customer (
  id BIGSERIAL PRIMARY KEY,
  tenant_id BIGINT NOT NULL,
  email VARCHAR(254) NOT NULL,
  given_name VARCHAR(100) NOT NULL,
  family_name VARCHAR(100) NOT NULL,
  phone VARCHAR(32),
  is_active BOOLEAN NOT NULL DEFAULT TRUE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  CONSTRAINT uq_customer_email_per_tenant UNIQUE (tenant_id, email),
  CONSTRAINT chk_phone_format CHECK (phone IS NULL OR phone ~ '^[0-9+\-() ]+$')
);
-- Index for tenant-scoped queries
CREATE INDEX ix_customer_tenant_active ON sales.customer (tenant_id, is_active);
-- Trigger or app logic to update updated_at on change.
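If updated_at is maintained in the database rather than the application, a trigger along these lines works in Postgres 11+ (a minimal sketch; the function and trigger names are illustrative):
CREATE OR REPLACE FUNCTION sales.set_updated_at() RETURNS trigger AS $$
BEGIN
  NEW.updated_at := NOW();  -- stamp the row on every update
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_customer_set_updated_at
BEFORE UPDATE ON sales.customer
FOR EACH ROW EXECUTE FUNCTION sales.set_updated_at();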
Example 2 — Analytics fact_events (partitioned)
-- Goals: append-only, fast time-range scans and user filters
-- Generic warehouse SQL (conceptual)
CREATE TABLE core.fact_events (
  event_id BIGINT,
  occurred_at TIMESTAMP NOT NULL,
  user_id BIGINT,
  session_id STRING,
  event_name STRING NOT NULL,
  source STRING,
  payload VARIANT, -- JSON-like type in many warehouses
  load_id STRING,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
PARTITION BY DATE(occurred_at)
CLUSTER BY (user_id, event_name);
-- Notes: No PK enforced in many warehouses; ensure uniqueness via ETL and QA queries.
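One way to back that soft guarantee is a scheduled QA query that should return zero rows; a non-empty result fails the load (a sketch, assuming event_id is the intended unique grain):
-- Duplicate check: any event_id appearing more than once breaks the contract.
SELECT event_id, COUNT(*) AS row_count
FROM core.fact_events
GROUP BY event_id
HAVING COUNT(*) > 1;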
Example 3 — SCD Type 2 dimension (dim_customer)
-- Goals: track history; one current row per business key
CREATE TABLE dim.dim_customer (
  customer_sk BIGINT PRIMARY KEY, -- surrogate key
  customer_bk STRING NOT NULL,    -- natural/business key
  given_name STRING,
  family_name STRING,
  email STRING,
  is_active BOOLEAN,
  valid_from TIMESTAMP NOT NULL,
  valid_to TIMESTAMP,
  is_current BOOLEAN NOT NULL,
  load_id STRING,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Rules:
-- - (customer_bk, valid_from) unique
-- - is_current = TRUE implies valid_to IS NULL
-- - Query current state with is_current = TRUE
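
A minimal sketch of the close-and-insert step when a tracked attribute changes (the key values, names, and load_id are illustrative; customer_sk would normally come from a sequence or the load process):
-- 1. Close the current row for the changed business key.
UPDATE dim.dim_customer
SET valid_to = CURRENT_TIMESTAMP,
    is_current = FALSE
WHERE customer_bk = 'C-1001'
  AND is_current = TRUE;
-- 2. Insert the new current version.
INSERT INTO dim.dim_customer
  (customer_sk, customer_bk, given_name, family_name, email, is_active,
   valid_from, valid_to, is_current, load_id)
VALUES
  (10234, 'C-1001', 'Ada', 'Lovelace', 'ada@example.com', TRUE,
   CURRENT_TIMESTAMP, NULL, TRUE, 'batch_2026_01_18');
-- 3. Read the current state.
SELECT * FROM dim.dim_customer WHERE is_current = TRUE;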

Design decisions: key trade-offs

  • Natural vs surrogate keys: Prefer surrogate for stability; keep natural key as alternate key for deduplication and joins.
  • Normalization vs denormalization: Normalize OLTP for integrity; denormalize analytics for scan efficiency and simpler queries.
  • Nullability: Allow NULL only when semantically unknown; otherwise use NOT NULL with defaults.
  • Wide vs narrow tables: Wide reduces joins but increases I/O; cluster/partition to mitigate. Narrow improves write throughput.
  • Soft deletes vs hard deletes: Prefer soft deletes or validity windows for lineage and reproducibility; hard delete only for compliance (see the sketch after this list).
  • Encoding/collation: Choose UTF-8 and case rules consistently to prevent join mismatches.
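
A minimal sketch of the soft-delete pattern referenced above, assuming an is_deleted BOOLEAN NOT NULL DEFAULT FALSE column has been added to sales.customer (it is not in the earlier DDL); consumers read through a view that hides deleted rows.
-- Flag the row instead of removing it.
UPDATE sales.customer
SET is_deleted = TRUE,
    updated_at = NOW()
WHERE id = 42;
-- Expose only live rows to downstream readers.
CREATE VIEW sales.customer_active AS
SELECT *
FROM sales.customer
WHERE is_deleted = FALSE;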

Practical projects

  • Design a small OLTP schema for orders, customers, and payments with full constraints and indexes. Include comments on every column.
  • Build an analytics model: dim_customer (SCD2) and fact_orders with partitioning and clustering. Load a month of sample data.
  • Add data governance: tag PII columns, implement a masking view for analysts, and document retention and access rules.

Exercises

Do these and compare your work against the self-check below. The same tasks appear in the Practice Exercises section for convenience.

  1. Exercise 1: Design a user_events table for both OLTP and warehouse use.
    • Define naming, columns, data types, keys, constraints, and indexes.
    • Choose partitioning and clustering for the warehouse version.
    • Add audit columns and a retention note.
  2. Exercise 2: Design orders and order_items with integrity rules.
    • PKs, FKs, CHECKs, defaults, and unique constraints.
    • Partitioning strategy for the warehouse version.
    • Indexing strategy for common queries.

Exercise self-check

  • Grain is explicitly stated in table comments or design notes.
  • Primary keys are stable and unique; natural keys captured as alternates.
  • Foreign keys or soft constraints exist with QA queries if FKs not enforced.
  • Partitioning and clustering choices match query patterns and data volume.
  • Indexes support the most frequent filters and joins; no redundant indexes.
  • Nullability, defaults, and CHECK constraints enforce data rules.
  • Audit columns present; retention window defined.

Common mistakes and how to self-check

  • Missing grain definition: Add a one-line grain statement (e.g., one row per event occurrence).
  • Overusing TEXT/STRING: Replace with sized VARCHAR or structured types; add CHECKs.
  • Too many indexes: Drop low-selectivity or unused indexes after measuring usage.
  • No partition pruning: Partition on the filter used in 80% of scans (often occurred_at or order_date).
  • Surrogate key only: Keep natural key for dedup and conformance.
  • Unchecked JSON: Validate JSON shape on load or store parsed fields for hot queries.
  • PII mixed without tags: Tag sensitive columns; create masked views.
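
For the PII item, a minimal sketch of a masking view in Postgres-style SQL (the masking rules and view name are illustrative; many platforms also provide native column-masking policies):
-- Analysts query the view; the base table stays restricted.
CREATE VIEW sales.customer_masked AS
SELECT
  id,
  tenant_id,
  md5(lower(email)) AS email_hash,           -- stable pseudonym, still joinable
  LEFT(phone, 3) || '*****' AS phone_masked,
  is_active,
  created_at,
  updated_at
FROM sales.customer;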

Mini challenge

You have a daily 500M-row clickstream table. Queries filter by event_date and user_id, and join to dim_campaign. Propose partitioning, clustering, and key strategy. Note any constraints and audit fields. Keep cost low and loading simple.

Sample direction (attempt it yourself first)
  • Partition by event_date (daily).
  • Cluster by user_id, campaign_id.
  • No enforced PK; ensure uniqueness via (event_id) in ETL checks.
  • Columns: event_id, event_date, occurred_at, user_id, session_id, campaign_id, event_name, payload, load_id, created_at.
  • Retention: 180 days hot partitions; archive older to cheaper storage.
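
A possible DDL sketch for that direction, in the same conceptual warehouse SQL as Example 2 (exact partition and cluster syntax varies by platform):
CREATE TABLE core.fact_clickstream (
  event_id BIGINT,
  event_date DATE NOT NULL,
  occurred_at TIMESTAMP NOT NULL,
  user_id BIGINT,
  session_id STRING,
  campaign_id BIGINT,          -- soft FK to dim_campaign, verified by QA queries
  event_name STRING NOT NULL,
  payload VARIANT,
  load_id STRING,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
PARTITION BY event_date
CLUSTER BY (user_id, campaign_id);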

Who this is for

  • Data Architects defining platform-wide standards.
  • Data Engineers implementing schemas and pipelines.
  • Analytics Engineers modeling for BI and self-serve.

Prerequisites

  • Comfort with SQL DDL (CREATE/ALTER TABLE, indexes, constraints).
  • Understanding of OLTP vs analytics workloads.
  • Basic knowledge of your target database/warehouse features.

Learning path

  1. Start with naming, grain, and column standards.
  2. Add keys, constraints, and nullability rules.
  3. Introduce indexing, partitioning, and clustering by workload.
  4. Layer on audit columns, governance, and retention.
  5. Practice with OLTP and warehouse examples; run EXPLAIN/monitor costs.

Next steps

  • Refactor one existing table to align with these standards and measure improvement.
  • Document a template: required columns, naming rules, and review checklist.
  • Take the quick test to validate understanding.

Practice Exercises


Instructions

Design a user_events table with an OLTP version optimized for fast writes and a warehouse version optimized for analytics.

  1. Define table purpose and grain in a one-liner.
  2. List required columns: event_id, user_id, occurred_at, event_name, source, payload, plus audit columns.
  3. Choose data types for: Postgres (OLTP) and a columnar warehouse. Be explicit about timestamp/timezone.
  4. Specify keys and constraints: PK, uniqueness, CHECKs.
  5. Indexing: OLTP selective indexes; Warehouse partitioning and clustering.
  6. Add retention and governance notes (PII, masking).
Expected Output
Two DDL snippets (OLTP and warehouse) with comments describing grain, keys, constraints, and performance design.

Table Design Standards — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

