Why this matters
Naming conventions make data discoverable, code reviewable, and systems maintainable. As a Data Architect, you define names for schemas, tables, columns, files, partitions, topics, indexes, and constraints. Good names prevent production bugs, simplify onboarding, and accelerate analytics.
- Real task: Standardize names across a new warehouse (dim_ and fct_ tables, clear column units).
- Real task: Define S3/ADLS paths and partition folders that tools can infer automatically.
- Real task: Enforce constraint/index naming so incident triage is fast and unambiguous.
Concept explained simply
Names are contracts between people and systems. They should be predictable, stable, and self-explanatory. Pick rules once, then apply everywhere.
Mental model
Think of every name as an API endpoint: short, consistent, and purpose-revealing. If someone can guess a name without asking you, your convention is working.
Core rules and patterns
- Case and separators: Use lowercase snake_case for most databases, files, and streams. Avoid spaces and mixedCase.
- Singular vs. plural: Tables often plural (dim_customer, fct_orders). Columns singular (customer_id, order_amount).
- Prefixes for object role: dim_, fct_ (or fct_), stg_ (staging), rpt_ (reporting marts). Pick one pattern and use it everywhere.
- Column clarity: Include units, grain, and role. Examples: amount_usd, temperature_c, created_at_utc, is_active, customer_id.
- Time grain suffixes: _h, _d, _wk, _m, _q, _y for hourly/daily/weekly/monthly/quarterly/yearly facts.
- SCD type suffix: dim_customer_scd2 for Type 2. Avoid vague names like dim_customer_history.
- Constraints/indexes: pk_
, fk_
_ , uq_ _
, ck_ _
, idx_ _
. - Environment tagging: Put env in higher-level containers (schemas/dbs/paths), not in business table names. Example schemas: prd_dw, stg_dw. Tables stay dim_customer, fct_orders.
- Reserved words: Avoid keywords (order, group). If unavoidable, qualify: order_id instead of order.
- Length and ASCII: Stay short but clear; stick to ASCII for portability.
- Abbreviations: Only common, documented ones (qty, amt, id, usd). Avoid cryptic custom abbreviations.
- Data lake layout: zone/domain/dataset/grain/partition=val. Use key=value folders for partitions (dt=YYYY-MM-DD, year=YYYY/month=MM/day=DD).
- Streaming topics: env.team.domain.entity.event.vN or env.domain.entity.vN. Keep consistent ordering.
Worked examples
Example 1: Warehouse tables and columns
-- Schema (environment at schema level, not table name) prd_dw -- Tables prd_dw.dim_customer_scd2 prd_dw.fct_orders_d -- Columns -- dim_customer_scd2 customer_id (business key or surrogate if named customer_sk) customer_sk (surrogate key) first_name last_name email_hash_sha256 (PII handled) effective_from_utc effective_to_utc is_current -- fct_orders_d order_id customer_sk order_amount_usd order_created_at_utc order_date (date) -- Constraints & indexes pk_dim_customer_scd2 (customer_sk) fk_fct_orders_d_customer_sk (customer_sk -> dim_customer_scd2.customer_sk) idx_fct_orders_d_order_date (order_date) uq_dim_customer_scd2_bk (customer_id, effective_to_utc)Example 2: Data lake/ADLS paths
-- Pattern: <zone>/<domain>/<dataset>/<grain>/dt=YYYY-MM-DD/part=<value>/ prd/finance/orders/d/dt=2024-06-30/region=us/parquet/ -- File names part-0001-0b.parquet (do not encode business meaning in file name; meaning is in the path) -- Alternative partitioning prd/marketing/campaign_events/h/year=2024/month=06/day=30/hour=14/Example 3: Streaming topics
-- Pattern: env.team.domain.entity.event.vN prd.core.sales.order.created.v1 stg.core.sales.order.updated.v1 -- Schema registry subjects align with topic and version order_created_value.v1Example 4: Staging and reporting
stg_dw.stg_orders_raw stg_dw.stg_customers_clean rpt_dw.rpt_sales_summary_mStep-by-step: define your standard
- Choose casing and character set. Default: lowercase snake_case, ASCII only.
- Decide scope placement. Environments (prd, stg) live in schemas/paths, not table names.
- Table classes. Pick prefixes (dim_, fct_, stg_, rpt_) and write 1–2 examples for each.
- Column rules. Keys (_id, _sk), booleans (is_/has_), timestamps (_at_utc), units (_usd, _cm), grain columns (order_date).
- Constraints and indexes. pk_, fk_, uq_, ck_, idx_ with table and column hints.
- Lake layout. Define zone/domain/dataset/grain and partition key=value folders.
- Streaming topics. Fix the token order and versioning strategy.
- Abbreviations list. Approve common ones; ban unclear ones.
- Examples + linter checklist. Document 10 canonical examples and a PR checklist.
Copy-ready checklist
- Lowercase snake_case everywhere
- Tables: dim_, fct_, stg_, rpt_ prefixes
- Columns include units/grain where relevant
- Timestamps end with _utc
- Booleans start with is_/has_
- Constraints: pk_/fk_/uq_/ck_ + table + columns
- Indexes: idx_ + table + columns
- Partitions: key=value folders (dt=YYYY-MM-DD)
- Env in schema/path, not table names
- Reserved words avoided
Exercises (do this now)
Exercise 1: Warehouse naming refactor
Rename the following to match the rules above. Include tables, columns, and constraints.
Schema: production_dw Tables: CustomerHistory, OrdersDaily Columns: CustomerHistory: ID, FirstName, LastName, Email, From, To, Current OrdersDaily: OrderID, CustKey, Amount, CreatedAt, Date Constraints: PK on CustomerHistory(ID) FK on OrdersDaily(CustKey) references CustomerHistory(ID)Need a hint?
- Use dim_customer_scd2 and fct_orders_d.
- Add units and time zone to columns where needed.
Exercise 2: Lake and streaming layout
Design a path and topic names:
Goal: store daily orders by region and expose created events. Current path idea: /prod/orders/20240630/us/ Current topic idea: ordersCreatedNeed a hint?
- Use zone/domain/dataset/grain and key=value partitions.
- Include env and version in topics.
Self-check checklist
- Lowercase snake_case used consistently
- Table prefixes clarify role (dim_, fct_, stg_, rpt_)
- Columns carry units, UTC timestamps, and clear roles
- Constraints/indexes named with pk_/fk_/idx_ patterns
- Partitions use key=value (dt=YYYY-MM-DD)
- Env appears in schemas/paths or topic tokens, not business table names
Common mistakes and self-check
- Mixing cases and separators. Fix: enforce lowercase snake_case linters in PRs.
- Putting env in table names. Fix: keep env at schema/path level.
- Ambiguous columns (amount, created_at). Fix: add units and timezone: amount_usd, created_at_utc.
- Forgetting grain. Fix: suffix fact names with _h/_d/_m.
- Vague history tables. Fix: scd suffix: dim_customer_scd2.
- Cryptic abbreviations. Fix: maintain an approved abbreviation list.
- Ignoring constraints/index names. Fix: standardize pk_/fk_/idx_ patterns for alerts and tooling.
- Using reserved words. Fix: append role (order_id) or rename entirely.
Quick self-audit
Pick three objects at random. Can a teammate unfamiliar with the system guess their purpose and relationships? If not, rename.
Practical projects
- Create a one-page naming standard with 10 before/after examples and apply it to a toy star schema (2 dims, 1 fact).
- Migrate a small dataset in a data lake to key=value partitions and validate tool compatibility.
- Define streaming topic tokens for 5 core events and simulate a version bump (v1 to v2).
Who this is for
- Data Architects defining enterprise data standards
- Data Engineers implementing warehouses/lakes/streams
- Analytics Engineers building marts and reports
Prerequisites
- Basic SQL and DDL (schemas, tables, constraints)
- Familiarity with data lake folders and partitions
- Awareness of event streaming concepts
Learning path
- Start: Naming conventions (this page)
- Next: Data types and units, partitioning strategy, indexing strategy
- Then: Governance and documentation practices
Next steps
- Adopt the checklist in your next PR
- Refactor a small domain to the standard
- Share the one-page standard with your team
Mini challenge
In 20 minutes, write names for: a Type 2 customer dimension, a daily orders fact, an hourly clickstream fact, an orders_created topic, and a lake path for finance invoices by month. Keep to the rules above.
Take the Quick Test
Ready to check your understanding? Take the Quick Test below. Available to everyone; if you are logged in, your progress will be saved.
Practice Exercises
2 exercises to complete
Instructions
Rename the following objects to a clean standard. Include tables, columns (with units/timezone where relevant), and constraints/indexes.
Schema: production_dw Tables: CustomerHistory, OrdersDaily Columns: CustomerHistory: ID, FirstName, LastName, Email, From, To, Current OrdersDaily: OrderID, CustKey, Amount, CreatedAt, Date Constraints: PK on CustomerHistory(ID) FK on OrdersDaily(CustKey) references CustomerHistory(ID)Expected Output
A consistent schema (e.g., prd_dw), tables dim_customer_scd2 and fct_orders_d, columns in snake_case with units/timezone, and pk_/fk_/idx_/uq_ names.Naming Conventions — Quick Test
Test your knowledge with 8 questions. Pass with 70% or higher.
8 questions70% to passHave questions about Naming Conventions?
AI Assistant
Ask questions about this tool