Why this matters
As a Data Architect, you turn business processes into reliable models. Getting keys and relationships right prevents duplicates, broken joins, and integrity issues that later become costly migrations. You will use this skill when designing conceptual ERDs, logical schemas, and data contracts across domains like customers, orders, inventory, and analytics marts.
- Define stable identifiers (primary keys) for each entity.
- Choose between natural and surrogate keys with trade-offs.
- Resolve many-to-many relationships safely with associative entities.
- Specify cardinality (1:1, 1:n, n:m) and optionality (must have vs may have).
- Prepare models that translate cleanly into physical databases and data warehouses.
Concept explained simply
Keys are attributes that uniquely identify a record. Relationships link entities based on business rules. Together, they ensure that data connects consistently across the system.
- Primary key (PK): the chosen unique identifier for an entity.
- Candidate keys: all attributes or combinations that could uniquely identify a record; PK is one of them.
- Surrogate key: a system-generated identifier (e.g., integer or UUID).
- Natural key: a real-world identifier (e.g., ISBN, email) that already exists in the domain.
- Foreign key (FK): an attribute that references another entity’s PK to form a relationship.
- Cardinality: how many of one entity relate to how many of another (1:1, 1:n, n:m).
- Optionality: whether the relationship is required (1) or optional (0) at each side (e.g., 0..n, 1..n).
Mental model
Imagine each entity as a set of unique cards. The primary key is the card’s unique code. Relationships are lines connecting cards across decks. Cardinality says how many lines are allowed; optionality says whether a line must exist.
Step-by-step method
- List entities: Name the core nouns in the domain (Customer, Order, Product).
- List attributes: For each entity, list properties (OrderDate, Email).
- Find candidate keys: Which single or combined attributes uniquely identify a record?
- Choose the primary key: Prefer stable, minimal, unique keys. Decide natural vs surrogate with trade-offs.
- Identify relationships: For every pair of entities that interact, ask: who owns what? how many? must it exist?
- Set cardinality and optionality: Express as 0..1, 1..1, 0..n, or 1..n on each side.
- Resolve many-to-many: Create an associative entity (bridge) with FKs and its own PK.
- Check integrity: Can every FK find a valid parent? Are PKs minimal and stable?
- Document decisions: Record why you picked a surrogate or natural key and any constraints.
Pro tips
- Pick surrogate keys when natural keys are long, mutable, or composite.
- Keep natural keys as alternate keys if they are meaningful for deduplication (e.g., ISBN).
- Use consistent naming: singular entity names, PK as EntityID, FKs as EntityID in child entities.
- Model optionality from business rules, not convenience. Ask stakeholders “Can this exist without that?”
Worked examples
Example 1: Customer–Order–OrderLine
- Entities: Customer, Order, OrderLine, Product
- Keys:
- Customer: CustomerID (PK, surrogate), Email (alternate unique)
- Order: OrderID (PK), CustomerID (FK)
- OrderLine: OrderLineID (PK), OrderID (FK), ProductID (FK)
- Product: ProductID (PK), SKU (alternate unique)
- Relationships:
- Customer 1..n Orders (mandatory on Order side: 1..n; optional on Customer side for having orders: 0..n)
- Order 1..n OrderLines (1..n)
- Product 1..n OrderLines; OrderLine belongs to exactly one Product (1..n)
- Notes: OrderLine carries quantity and price-at-time. No many-to-many remains unresolved.
Example 2: Students–Courses–Enrollment (resolving many-to-many)
- Entities: Student, Course, Enrollment
- Keys:
- Student: StudentID (PK), Email (alternate)
- Course: CourseID (PK), Code (alternate)
- Enrollment: EnrollmentID (PK), StudentID (FK), CourseID (FK), Term, Grade
- Relationships:
- Student n..m Course via Enrollment
- In logical model, Enrollment has FKs to Student and Course; optionally enforce unique(StudentID, CourseID, Term)
- Notes: Enrollment is an associative entity; it carries attributes (Term, Grade) about the relationship.
Example 3: Products–Suppliers–Inventory
- Entities: Product, Supplier, SupplierProduct, Inventory
- Keys:
- Product: ProductID (PK), SKU (alternate)
- Supplier: SupplierID (PK), TaxID (alternate)
- SupplierProduct: SupplierProductID (PK), SupplierID (FK), ProductID (FK), SupplierSKU
- Inventory: InventoryID (PK), ProductID (FK), LocationID, OnHandQty
- Relationships:
- Supplier n..m Product via SupplierProduct
- Product 1..n Inventory (per Location)
- Notes: Consider unique(SupplierID, ProductID) on SupplierProduct to avoid duplicates.
Example 4: Self-relationship (Users–Friendship)
- Entities: User, Friendship
- Keys:
- User: UserID (PK), Handle (alternate)
- Friendship: FriendshipID (PK), UserID_A (FK to User), UserID_B (FK to User), CreatedAt
- Relationship: User n..m User via Friendship (self-referencing). Enforce rule: UserID_A < UserID_B and unique(UserID_A, UserID_B) to avoid duplicates and inverses.
Exercises
Do these before the quick test. Tip: state PKs, alternate keys, FKs, cardinality, and optionality explicitly.
Exercise 1 — Retail Bookstore
Mirror of exercise ex1 below.
Open instructions
Entities observed: Book, Author, Publisher, Customer, Order, OrderLine.
- Propose PKs and at least one alternate key where appropriate.
- Define relationships with cardinality and optionality.
- Resolve any many-to-many with an associative entity and name it.
- List constraints you would add (e.g., unique combinations).
Exercise 2 — Clinic Scheduling
Mirror of exercise ex2 below.
Open instructions
Entities observed: Patient, Doctor, Appointment, Prescription, Medication.
- Choose PKs (surrogate or natural) and justify briefly.
- Define relationships, including optionality (can a Prescription exist without an Appointment?).
- Identify any many-to-many and resolve it.
- List unique constraints that prevent duplicates.
Self-check checklist
- Every entity has a single, stable PK.
- Candidate keys identified; alternates added where useful.
- All many-to-many relationships resolved via associative entities.
- Cardinality and optionality stated on both sides.
- Foreign keys always reference an existing parent entity.
- Business rules captured as uniqueness or NOT NULL constraints.
Common mistakes and how to self-check
- Using mutable attributes as PKs (e.g., email). Fix: choose a surrogate PK; keep email as alternate unique.
- Leaving many-to-many unmodeled. Fix: introduce an associative entity and move relationship attributes into it.
- Missing optionality. Fix: ask whether the child can exist without the parent; mark 0..1 vs 1..1 accordingly.
- Overusing composite PKs. Fix: prefer a simple surrogate PK and enforce natural uniqueness with a unique constraint.
- Ambiguous naming (IDs not clearly referencing parents). Fix: use ParentEntityID naming for FKs.
Practical projects
- Design an ERD for a subscription service (User, Plan, Subscription, Payment). Include failed payments and plan changes.
- Model a logistics network (Warehouse, Route, Vehicle, Shipment, Stop). Capture route assignments and stop sequences.
- Create a learning platform model (Learner, Course, Module, Enrollment, Assessment). Include attempts and grading.
Mini tasks for each project
- List entities and candidate keys.
- Choose PKs and justify natural vs surrogate.
- Define relationships with cardinality and optionality.
- Resolve all many-to-many cases; add relationship attributes.
- Write three uniqueness constraints that enforce business rules.
Who this is for
- Aspiring and practicing Data Architects
- Data Engineers designing source schemas and marts
- Analysts formalizing business concepts into datasets
Prerequisites
- Basic understanding of entities and attributes
- Familiarity with relational concepts (unique, not null)
- Ability to read simple ER diagrams
Learning path
- Identify entities and attributes from business narratives.
- List candidate keys; choose stable PKs.
- Define relationships with cardinality and optionality.
- Resolve many-to-many via associative entities.
- Validate with stakeholders; iterate on edge cases.
Next steps
- Complete the exercises, then take the quick test below.
- Apply the method to one of the practical projects.
- Review your model with a peer and refine optionality and uniqueness constraints.
Note: The quick test is available to everyone; only logged-in users will see saved progress.
Mini challenge
In a marketplace with Buyer, Seller, Listing, Offer, and Payment, identify all keys, relationships, cardinality, and optionality. Name any associative entities and list at least three business rules as unique constraints. Keep it to one page.