Why this matters for Data Architects
Conceptual and logical data modeling turn business language into precise, technology-agnostic structures. As a Data Architect, you use these models to align stakeholders, prevent costly rework, and guide downstream design (physical schemas, integration patterns, and governance). Good models reduce ambiguity, improve data quality, and make platforms scalable.
- Conceptual model: big-picture view of domains, entities, and relationships in business terms.
- Logical model: detailed attributes, keys, cardinalities, and constraints independent of any specific database.
Who this is for
- Data Architects defining enterprise data blueprints and standards.
- Business/Data Analysts aligning requirements with data structures.
Prerequisites
- Basic SQL reading knowledge (SELECT, JOIN) to validate relationships.
- Comfort discussing business processes and KPIs with stakeholders.
- Familiar with data quality concepts (uniqueness, completeness, consistency).
Learning path
- Define scope and domains: Identify business domains, boundaries, and outcomes (reports, integrations, decisions).
- Conceptual modeling: List entities, high-level relationships, and ubiquitous language; capture assumptions.
- Business glossary: Standardize terms and definitions; record ownership and allowed values.
- Logical modeling: Detail attributes, data types (abstract), keys, cardinalities, and optionality.
- Normalization: Decompose to eliminate redundancy (aim 3NF/BCNF where appropriate).
- Keys & identifiers: Choose natural vs surrogate keys; define alternate and composite keys.
- Master/Reference data: Model golden records, survivorship rules, and reference datasets.
- Cross-domain integration: Define data contracts, mapping, conformance, lineage, and change capture.
- Review & governance: Peer reviews, sign-offs, versioning, and model catalogs.
Facilitation tips
- Start workshops with business outcomes, not tables. Ask: "What questions must we answer reliably?"
- Draw relationships before attributes. Validate cardinalities with real scenarios.
- Document open questions and assumptions explicitly; resolve them before logical detail.
Worked examples
1) From questions to a conceptual model (Retail)
Business asks: "Which products drive repeat customer purchases by region?" Entities emerge: Customer, Order, Order Item, Product, Store, Region. Relationships: Customer places Order; Order has Order Item; Order Item references Product; Store belongs to Region; Order occurs at Store.
Conceptual representation (text)
- Customer — places — Order (1..* orders per customer)
- Order — contains — Order Item (1..* items per order)
- Order Item — references — Product (many items per product over time)
- Store — located in — Region (many stores per region)
- Order — occurs at — Store (1 store per order)
2) Identifying attributes, keys, and cardinalities
- Customer: customer_id (surrogate), email (natural candidate), name, created_at.
- Order: order_id (natural from POS? often surrogate), customer_id (FK), order_datetime, total_amount.
- Order Item: order_id + line_number (composite PK), product_id (FK), qty, unit_price.
- Product: product_id (surrogate), sku (candidate), name, category.
Cardinalities: Customer 1..* Order; Order 1..* Order Item; Product 1..* Order Item; Store 1..* Order; Region 1..* Store.
3) Normalizing to 3NF (Logic → abstract DDL)
-- Note: abstract types; implement later in physical design
CREATE TABLE Customer (
customer_id STRING PRIMARY KEY,
email STRING UNIQUE NOT NULL,
full_name STRING,
created_at TIMESTAMP
);
CREATE TABLE Product (
product_id STRING PRIMARY KEY,
sku STRING UNIQUE,
product_name STRING NOT NULL,
category STRING
);
CREATE TABLE Store (
store_id STRING PRIMARY KEY,
store_name STRING,
region_id STRING NOT NULL
);
CREATE TABLE Region (
region_id STRING PRIMARY KEY,
region_name STRING UNIQUE NOT NULL
);
CREATE TABLE "Order" (
order_id STRING PRIMARY KEY,
customer_id STRING NOT NULL,
store_id STRING NOT NULL,
order_datetime TIMESTAMP,
total_amount DECIMAL,
FOREIGN KEY (customer_id) REFERENCES Customer(customer_id),
FOREIGN KEY (store_id) REFERENCES Store(store_id)
);
CREATE TABLE OrderItem (
order_id STRING NOT NULL,
line_number INT NOT NULL,
product_id STRING NOT NULL,
qty INT NOT NULL,
unit_price DECIMAL NOT NULL,
PRIMARY KEY (order_id, line_number),
FOREIGN KEY (order_id) REFERENCES "Order"(order_id),
FOREIGN KEY (product_id) REFERENCES Product(product_id)
);
Note how Region is separated from Store to avoid repeating region properties on every store.
4) Modeling Master Data (Customer golden record)
Sources: CRM, Billing, Support. We define a golden Customer with survivorship rules:
- Identifier: customer_guid (surrogate).
- Match rules: email exact; phone normalized; name fuzzy.
- Survivorship: email from CRM if verified else most recent non-null; address from Billing; marketing_opt_in from latest consent event.
CREATE TABLE CustomerMaster (
customer_guid STRING PRIMARY KEY,
email STRING,
phone STRING,
full_name STRING,
address STRING,
marketing_opt_in BOOLEAN,
master_version INT,
effective_from TIMESTAMP,
effective_to TIMESTAMP
);
CREATE TABLE CustomerCrossRef (
customer_guid STRING NOT NULL,
source_system STRING NOT NULL,
source_customer_id STRING NOT NULL,
PRIMARY KEY (customer_guid, source_system),
UNIQUE (source_system, source_customer_id)
);
5) Cross-domain integration (Customer ↔ Orders)
Orders originate from POS with pos_customer_id. Integration uses cross-reference to map pos_customer_id → customer_guid. We maintain stable joins even if source IDs change.
Integration contract checklist
- Key mapping rule (natural → surrogate)
- Allowed nullability and defaults
- Change data semantics (updates, late-arriving, deletes)
- Versioning and effective dating
- Data quality thresholds and rejection logic
Drills and exercises
- Write 5 business questions and extract entities and relationships for each.
- For a chosen domain, list natural, surrogate, and alternate keys for 5 entities.
- Normalize a denormalized customer order spreadsheet to 3NF.
- Draft 10 glossary terms with owner, definition, and allowed values.
- Create a mini MDM design: matching fields, survivorship rules, and cross-reference.
- Define an integration contract: field-level mapping and change handling.
Common mistakes and debugging tips
- Mashing conceptual and logical views together. Tip: keep a clean conceptual diagram; derive logical details later.
- Overusing natural keys that are volatile (e.g., emails). Tip: introduce surrogates; keep natural keys with unique constraints.
- Skipping cardinalities. Tip: validate with concrete scenarios and sample data counts.
- Over-normalizing read-heavy domains. Tip: model in 3NF, but consider denormalized projections later in physical design.
- Ignoring reference data stewardship. Tip: assign owners, refresh cycles, and validation rules.
- Ambiguous glossary terms. Tip: add examples and non-examples; record synonyms and deprecated terms.
Mini project: Modernize a marketplace data model
Scenario: A marketplace has Sellers, Buyers, Listings, Orders, Payouts, and Disputes. Build conceptual and logical models and a simple MDM for Seller.
- Conceptual: Draw entities and relationships (Buyer places Order; Order contains Items; Item references Listing; Seller owns Listing; Payout settles Orders; Buyer may raise Dispute).
- Glossary: Define Seller, Buyer, Listing, Order, Payout, Dispute with owners.
- Logical: Attributes, keys, cardinalities; model Dispute states and transitions as reference data.
- Normalization: Achieve 3NF; separate Payout and Payout Line.
- MDM: Create SellerMaster and SellerCrossRef with match and survivorship rules.
- Integration: Specify mapping from Payments system to Orders/Payouts with change semantics.
Acceptance checklist
- Every entity has a clear business owner and definition.
- All relationships have cardinality and optionality defined.
- Keys chosen with justification (natural vs surrogate).
- No repeating groups or transitive dependencies (3NF).
- MDM covers matching and survivorship across at least 2 sources.
Practical projects
- Customer 360 Lite: Conceptual/logical model for customer profiles across CRM and Support, with basic golden record rules.
- Product Catalog: Normalize product, category, variant, and attribute models; define reference data and validation.
- Subscription Billing: Model accounts, plans, subscriptions, invoices, payments; handle proration and cancellations.
Subskills
- Domain Modeling And Boundaries — Identify domains, subdomains, and clear ownership boundaries.
- Entity Relationship Modeling — Define entities and relationships with accurate cardinalities and optionality.
- Defining Business Glossary — Standardize terms with owners, definitions, and allowed values.
- Normalization Principles — Apply 1NF, 2NF, 3NF/BCNF to reduce redundancy.
- Identifying Keys And Relationships — Choose primary, alternate, composite, and surrogate keys.
- Handling Master Data Concepts — Model golden records, cross-references, and survivorship.
- Modeling Cross Domain Integration — Specify contracts, mappings, and change semantics across domains.
- Review And Governance Of Models — Run reviews, approvals, versioning, and catalog updates.
Next steps
- Work through each subskill with the drills above.
- Complete the mini project and ask a peer to review for clarity and completeness.
- Take the skill exam to validate your understanding. Anyone can take it; logged-in users get saved progress.