Menu

Topic 7 of 8

Normalization Principles

Learn Normalization Principles for free with explanations, exercises, and a quick test (for Data Architect).

Published: January 18, 2026 | Updated: January 18, 2026

Why this matters

As a Data Architect, you turn messy, duplicate-prone data into reliable structures that feed analytics, applications, and governance. Normalization gives you the rules to remove anomalies (insert, update, delete problems), reduce redundancy, and produce logical models that scale. You will use this when designing source system schemas, standardizing conceptual models into logical ones, and preparing stable layers before denormalized marts.

  • Real tasks: designing OLTP schemas, reviewing vendor data models, mapping conceptual entities to 3NF, and approving naming/keys/relationships.
  • Impact: fewer bugs, easier changes, smaller storage, simpler lineage, and predictable joins.

Concept explained simply

Normalization is a set of rules that ensure each fact lives in one place and depends on the right key.

Mental model

Imagine putting each unique fact on a labeled card. Cards are grouped by what uniquely identifies them (their key). If a card depends on another card (like City depends on ZipCode), you split it so each group is defined by its own key. The result: no duplicates, no contradictory facts.

The rules (practical version)

  • 1NF: No repeating groups; attributes are atomic; each field has a single value per row.
  • 2NF: If a table has a composite key, non-key attributes must depend on the whole key, not part of it (no partial dependency).
  • 3NF: Non-key attributes must not depend on other non-key attributes (no transitive dependency).
  • BCNF (bonus): Every determinant is a candidate key. Use when odd dependencies remain even after 3NF.
Common anomalies you avoid
  • Insert anomaly: Cannot add a course until a student enrolls.
  • Update anomaly: Changing a product name requires updating many rows.
  • Delete anomaly: Deleting the last order also deletes the only record of a customer.

Worked examples

Example 1 — Student enrollments to 3NF

Raw table (EnrollmentRaw):

StudentID, StudentName, CourseID, CourseName, Grade

Issues: StudentName depends only on StudentID. CourseName depends only on CourseID. Grade depends on the pair (StudentID, CourseID). Partial and transitive dependencies exist.

Normalize it
  1. 1NF: Already atomic.
  2. 2NF: Split attributes that depend on part of the composite key.
    • Students(StudentID PK, StudentName)
    • Courses(CourseID PK, CourseName)
    • Enrollments(StudentID FK, CourseID FK, Grade, PK: StudentID+CourseID)
  3. 3NF: No non-key attribute depends on another non-key attribute in any table. We are done.

Example 2 — Orders with customer data

Raw table (OrderLinesRaw):

OrderID, OrderDate, CustomerID, CustomerName, ProductID, ProductName, Qty

Dependencies: OrderID → OrderDate, CustomerID; CustomerID → CustomerName; ProductID → ProductName; Key for lines is (OrderID, ProductID).

Normalize it
  • Orders(OrderID PK, OrderDate, CustomerID FK)
  • Customers(CustomerID PK, CustomerName)
  • Products(ProductID PK, ProductName)
  • OrderLines(OrderID FK, ProductID FK, Qty, PK: OrderID+ProductID)

Now CustomerName changes in one place. No non-key depends on another non-key in the same table.

Example 3 — Employees, departments, projects

Raw table (AssignmentsRaw):

EmpID, EmpName, DeptID, DeptName, ProjectID, ProjectName, Role

Given: EmpID → EmpName, DeptID; DeptID → DeptName; ProjectID → ProjectName. Key is (EmpID, ProjectID).

Normalize it
  • Employees(EmpID PK, EmpName, DeptID FK)
  • Departments(DeptID PK, DeptName)
  • Projects(ProjectID PK, ProjectName)
  • Assignments(EmpID FK, ProjectID FK, Role, PK: EmpID+ProjectID)

How to apply this step-by-step

  1. List candidate keys and functional dependencies (FDs).
  2. Check 1NF: remove repeating groups; split multi-valued fields.
  3. Check 2NF: if key is composite, remove attributes depending on part of the key.
  4. Check 3NF: remove attributes that depend on other non-key attributes.
  5. Consider BCNF if some determinant is not a candidate key.
  6. Name tables by entity, choose stable natural/surrogate keys, and define FK relationships.
Self-check checklist
  • Each non-key attribute depends on the whole key and nothing but the key.
  • No attribute contains lists (commas, arrays) or repeated groups.
  • Updates to a business fact happen in one table only.
  • Joins reflect real relationships (1:1, 1:N, N:M via bridges).

Exercises

Complete these and compare with the solutions in the collapsible sections of each exercise on this page. Your progress in the quick test is available to everyone; only logged-in users get saved progress.

  1. Exercise 1: Normalize EnrollmentRaw(StudentID, StudentName, CourseID, CourseName, Grade) to 3NF. Specify tables, keys, and FKs.
  2. Exercise 2: For ProductSales(ProductID, ProductName, CategoryName, CategoryManager, OrderID, OrderDate, Qty) with key (OrderID, ProductID), identify partial and transitive dependencies, then propose 3NF tables.
  3. Exercise 3: Given FDs for Assignment(EmpID, EmpName, DeptID, DeptName, ProjectID, ProjectName, Role) and key (EmpID, ProjectID), produce a 3NF schema and list FDs satisfied by each table.
Pre-submission checklist
  • Did you state functional dependencies before splitting tables?
  • Did you keep business-friendly names and stable keys?
  • Did you avoid embedding lists and repeated groups?
  • Did you make many-to-many relationships explicit with bridge tables?

Common mistakes and how to catch them

  • Forgetting the composite key: leads to 2NF violations. Fix by checking each attribute against the whole key.
  • Hiding lists in strings: “Tags = red,blue,green” is not 1NF. Create a bridge table.
  • Over-normalizing analytics outputs: keep source/OLTP in 3NF, then design star schemas for analytics separately when needed.
  • Using unstable natural keys: if ProductName changes, do not use it as PK; use a surrogate and keep the natural key as a unique attribute.
Quick self-audit
  • Pick any attribute and ask: which key determines it? If answer is not a key, fix the design.
  • Simulate an update: how many rows change? If more than one for a single fact, there is redundancy.

Practical projects

  • Normalize a small CRM: Leads, Accounts, Contacts, Activities. Deliver ERD, 3NF tables, and sample insert statements.
  • Catalog-to-order flow: Products, Categories, Customers, Orders, OrderLines. Show FDs and justify all keys.
  • Learning platform: Students, Courses, Instructors, Enrollments, Assignments, Grades. Include bridge tables for many-to-many relations.

Who this is for

  • Aspiring and current Data Architects structuring OLTP and integration layers.
  • Data Engineers designing staging and core layers before dimensional models.
  • Analysts migrating from spreadsheets to relational models.

Prerequisites

  • Basic relational database knowledge (tables, keys, foreign keys).
  • Comfort reading simple ER diagrams and understanding one-to-many and many-to-many relationships.

Learning path

  1. Conceptual entities and relationships.
  2. Keys and functional dependencies.
  3. Normalization (1NF → 3NF, BCNF).
  4. Integrity constraints and naming conventions.
  5. From logical (3NF) to physical design (indexes, data types).
  6. Purposeful denormalization when needed (analytics, performance).

Mini challenge

Given Table: ShipmentsRaw(ShipmentID, ShipmentDate, WarehouseID, WarehouseCity, OrderID, CustomerID, CustomerName). Identify the minimal set of tables in 3NF and their keys, then write one sentence explaining which anomaly you prevented.

When you are ready, take the quick test at the end of this page.

Practice Exercises

3 exercises to complete

Instructions

Given EnrollmentRaw(StudentID, StudentName, CourseID, CourseName, Grade), normalize to 3NF. List:

  • Final tables with columns
  • Primary keys
  • Foreign keys
  • One sentence per split explaining the dependency you fixed
Expected Output
Students(StudentID PK, StudentName); Courses(CourseID PK, CourseName); Enrollments(StudentID FK->Students.StudentID, CourseID FK->Courses.CourseID, Grade, PK: StudentID+CourseID). Explanation mentions removing partial dependencies on StudentID and CourseID.

Normalization Principles — Quick Test

Test your knowledge with 10 questions. Pass with 70% or higher.

10 questions70% to pass

Have questions about Normalization Principles?

AI Assistant

Ask questions about this tool