Menu

Topic 8 of 8

Naming Conventions

Learn Naming Conventions for free with explanations, exercises, and a quick test (for Data Architect).

Published: January 18, 2026 | Updated: January 18, 2026

Why this matters

Naming conventions make data discoverable, code reviewable, and systems maintainable. As a Data Architect, you define names for schemas, tables, columns, files, partitions, topics, indexes, and constraints. Good names prevent production bugs, simplify onboarding, and accelerate analytics.

  • Real task: Standardize names across a new warehouse (dim_ and fct_ tables, clear column units).
  • Real task: Define S3/ADLS paths and partition folders that tools can infer automatically.
  • Real task: Enforce constraint/index naming so incident triage is fast and unambiguous.

Concept explained simply

Names are contracts between people and systems. They should be predictable, stable, and self-explanatory. Pick rules once, then apply everywhere.

Mental model

Think of every name as an API endpoint: short, consistent, and purpose-revealing. If someone can guess a name without asking you, your convention is working.

Core rules and patterns

  • Case and separators: Use lowercase snake_case for most databases, files, and streams. Avoid spaces and mixedCase.
  • Singular vs. plural: Tables often plural (dim_customer, fct_orders). Columns singular (customer_id, order_amount).
  • Prefixes for object role: dim_, fct_ (or fct_), stg_ (staging), rpt_ (reporting marts). Pick one pattern and use it everywhere.
  • Column clarity: Include units, grain, and role. Examples: amount_usd, temperature_c, created_at_utc, is_active, customer_id.
  • Time grain suffixes: _h, _d, _wk, _m, _q, _y for hourly/daily/weekly/monthly/quarterly/yearly facts.
  • SCD type suffix: dim_customer_scd2 for Type 2. Avoid vague names like dim_customer_history.
  • Constraints/indexes: pk_, fk__, uq_
    _, ck_
    _, idx_
    _.
  • Environment tagging: Put env in higher-level containers (schemas/dbs/paths), not in business table names. Example schemas: prd_dw, stg_dw. Tables stay dim_customer, fct_orders.
  • Reserved words: Avoid keywords (order, group). If unavoidable, qualify: order_id instead of order.
  • Length and ASCII: Stay short but clear; stick to ASCII for portability.
  • Abbreviations: Only common, documented ones (qty, amt, id, usd). Avoid cryptic custom abbreviations.
  • Data lake layout: zone/domain/dataset/grain/partition=val. Use key=value folders for partitions (dt=YYYY-MM-DD, year=YYYY/month=MM/day=DD).
  • Streaming topics: env.team.domain.entity.event.vN or env.domain.entity.vN. Keep consistent ordering.
  • Worked examples

    Example 1: Warehouse tables and columns

    -- Schema (environment at schema level, not table name)
    prd_dw
    
    -- Tables
    prd_dw.dim_customer_scd2
    prd_dw.fct_orders_d
    
    -- Columns
    -- dim_customer_scd2
    customer_id (business key or surrogate if named customer_sk)
    customer_sk (surrogate key)
    first_name
    last_name
    email_hash_sha256 (PII handled)
    effective_from_utc
    effective_to_utc
    is_current
    
    -- fct_orders_d
    order_id
    customer_sk
    order_amount_usd
    order_created_at_utc
    order_date (date)
    
    -- Constraints & indexes
    pk_dim_customer_scd2 (customer_sk)
    fk_fct_orders_d_customer_sk (customer_sk -> dim_customer_scd2.customer_sk)
    idx_fct_orders_d_order_date (order_date)
    uq_dim_customer_scd2_bk (customer_id, effective_to_utc)
    

    Example 2: Data lake/ADLS paths

    -- Pattern: <zone>/<domain>/<dataset>/<grain>/dt=YYYY-MM-DD/part=<value>/
    prd/finance/orders/d/dt=2024-06-30/region=us/parquet/
    
    -- File names
    part-0001-0b.parquet (do not encode business meaning in file name; meaning is in the path)
    
    -- Alternative partitioning
    prd/marketing/campaign_events/h/year=2024/month=06/day=30/hour=14/
    

    Example 3: Streaming topics

    -- Pattern: env.team.domain.entity.event.vN
    prd.core.sales.order.created.v1
    stg.core.sales.order.updated.v1
    
    -- Schema registry subjects align with topic and version
    order_created_value.v1
    

    Example 4: Staging and reporting

    stg_dw.stg_orders_raw
    stg_dw.stg_customers_clean
    rpt_dw.rpt_sales_summary_m
    

    Step-by-step: define your standard

    1. Choose casing and character set. Default: lowercase snake_case, ASCII only.
    2. Decide scope placement. Environments (prd, stg) live in schemas/paths, not table names.
    3. Table classes. Pick prefixes (dim_, fct_, stg_, rpt_) and write 1–2 examples for each.
    4. Column rules. Keys (_id, _sk), booleans (is_/has_), timestamps (_at_utc), units (_usd, _cm), grain columns (order_date).
    5. Constraints and indexes. pk_, fk_, uq_, ck_, idx_ with table and column hints.
    6. Lake layout. Define zone/domain/dataset/grain and partition key=value folders.
    7. Streaming topics. Fix the token order and versioning strategy.
    8. Abbreviations list. Approve common ones; ban unclear ones.
    9. Examples + linter checklist. Document 10 canonical examples and a PR checklist.
    Copy-ready checklist
    • Lowercase snake_case everywhere
    • Tables: dim_, fct_, stg_, rpt_ prefixes
    • Columns include units/grain where relevant
    • Timestamps end with _utc
    • Booleans start with is_/has_
    • Constraints: pk_/fk_/uq_/ck_ + table + columns
    • Indexes: idx_ + table + columns
    • Partitions: key=value folders (dt=YYYY-MM-DD)
    • Env in schema/path, not table names
    • Reserved words avoided

    Exercises (do this now)

    Exercise 1: Warehouse naming refactor

    Rename the following to match the rules above. Include tables, columns, and constraints.

    Schema: production_dw
    Tables:
    CustomerHistory, OrdersDaily
    Columns:
    CustomerHistory: ID, FirstName, LastName, Email, From, To, Current
    OrdersDaily: OrderID, CustKey, Amount, CreatedAt, Date
    Constraints:
    PK on CustomerHistory(ID)
    FK on OrdersDaily(CustKey) references CustomerHistory(ID)
    
    Need a hint?
    • Use dim_customer_scd2 and fct_orders_d.
    • Add units and time zone to columns where needed.

    Exercise 2: Lake and streaming layout

    Design a path and topic names:

    Goal: store daily orders by region and expose created events.
    Current path idea: /prod/orders/20240630/us/
    Current topic idea: ordersCreated
    
    Need a hint?
    • Use zone/domain/dataset/grain and key=value partitions.
    • Include env and version in topics.

    Self-check checklist

    • Lowercase snake_case used consistently
    • Table prefixes clarify role (dim_, fct_, stg_, rpt_)
    • Columns carry units, UTC timestamps, and clear roles
    • Constraints/indexes named with pk_/fk_/idx_ patterns
    • Partitions use key=value (dt=YYYY-MM-DD)
    • Env appears in schemas/paths or topic tokens, not business table names

    Common mistakes and self-check

    • Mixing cases and separators. Fix: enforce lowercase snake_case linters in PRs.
    • Putting env in table names. Fix: keep env at schema/path level.
    • Ambiguous columns (amount, created_at). Fix: add units and timezone: amount_usd, created_at_utc.
    • Forgetting grain. Fix: suffix fact names with _h/_d/_m.
    • Vague history tables. Fix: scd suffix: dim_customer_scd2.
    • Cryptic abbreviations. Fix: maintain an approved abbreviation list.
    • Ignoring constraints/index names. Fix: standardize pk_/fk_/idx_ patterns for alerts and tooling.
    • Using reserved words. Fix: append role (order_id) or rename entirely.
    Quick self-audit

    Pick three objects at random. Can a teammate unfamiliar with the system guess their purpose and relationships? If not, rename.

    Practical projects

    • Create a one-page naming standard with 10 before/after examples and apply it to a toy star schema (2 dims, 1 fact).
    • Migrate a small dataset in a data lake to key=value partitions and validate tool compatibility.
    • Define streaming topic tokens for 5 core events and simulate a version bump (v1 to v2).

    Who this is for

    • Data Architects defining enterprise data standards
    • Data Engineers implementing warehouses/lakes/streams
    • Analytics Engineers building marts and reports

    Prerequisites

    • Basic SQL and DDL (schemas, tables, constraints)
    • Familiarity with data lake folders and partitions
    • Awareness of event streaming concepts

    Learning path

    • Start: Naming conventions (this page)
    • Next: Data types and units, partitioning strategy, indexing strategy
    • Then: Governance and documentation practices

    Next steps

    • Adopt the checklist in your next PR
    • Refactor a small domain to the standard
    • Share the one-page standard with your team

    Mini challenge

    In 20 minutes, write names for: a Type 2 customer dimension, a daily orders fact, an hourly clickstream fact, an orders_created topic, and a lake path for finance invoices by month. Keep to the rules above.

    Take the Quick Test

    Ready to check your understanding? Take the Quick Test below. Available to everyone; if you are logged in, your progress will be saved.

    Practice Exercises

    2 exercises to complete

    Instructions

    Rename the following objects to a clean standard. Include tables, columns (with units/timezone where relevant), and constraints/indexes.

    Schema: production_dw
    Tables:
    CustomerHistory, OrdersDaily
    Columns:
    CustomerHistory: ID, FirstName, LastName, Email, From, To, Current
    OrdersDaily: OrderID, CustKey, Amount, CreatedAt, Date
    Constraints:
    PK on CustomerHistory(ID)
    FK on OrdersDaily(CustKey) references CustomerHistory(ID)
    
    Expected Output
    A consistent schema (e.g., prd_dw), tables dim_customer_scd2 and fct_orders_d, columns in snake_case with units/timezone, and pk_/fk_/idx_/uq_ names.

    Naming Conventions — Quick Test

    Test your knowledge with 8 questions. Pass with 70% or higher.

    8 questions70% to pass

    Have questions about Naming Conventions?

    AI Assistant

    Ask questions about this tool