luvv to helpDiscover the Best Free Online Tools
Topic 12 of 31

Normalization and Relational Modeling

Learn Normalization and Relational Modeling for free with explanations, exercises, and a quick test (for Data Analyst).

Published: December 19, 2025 | Updated: December 19, 2025

Why this matters

As a Data Analyst, you often join multiple tables, calculate metrics, and explain discrepancies. Good normalization and relational modeling help you:

  • Prevent duplicate or conflicting facts (e.g., two different customer addresses for one order).
  • Write simpler, faster, and more reliable JOINs for analyses.
  • Avoid update, insert, and delete anomalies that skew dashboards.
  • Scale your datasets without messy fixes later.

Concept explained simply

Normalization is organizing data so that each fact lives in one place and relationships are explicit. This reduces redundancy and prevents anomalies.

  • Primary Key (PK): a column or set of columns uniquely identifying a row.
  • Foreign Key (FK): a column referencing a PK in another table; enforces relationships.
  • Functional dependency: A → B means A determines B (knowing A gives you B).
  • Normal forms (practical set):
    • 1NF: Columns are atomic (no lists), no repeating groups.
    • 2NF: In tables where the PK is composite, non-key columns depend on the whole key, not part of it.
    • 3NF: No transitive dependencies on a key (non-key columns do not determine other non-key columns).
    • BCNF: Every determinant is a candidate key (stronger than 3NF).
  • Relationships: 1:1, 1:M, M:N (handled via a junction table).
  • Keys: Natural key (from the data) vs surrogate key (artificial, e.g., ID). Use stable, non-volatile keys. Add a surrogate if natural keys are long or may change.
Quick reference: Normal forms and anomalies
  • Update anomaly: Changing a fact in one row but not in its duplicates.
  • Insert anomaly: Cannot insert a fact because another fact is missing.
  • Delete anomaly: Deleting a row accidentally removes unrelated facts.
  • Goal of 3NF/BCNF: Each fact is stored once; dependencies are clear.

Mental model

Imagine a well-labeled toolbox. Each drawer holds one kind of tool (fact). A label (key) tells you exactly where a tool is. Drawers connect with guides (foreign keys). If every screwdriver is only in the screwdriver drawer, you never wonder which copy is correct. That is normalization.

Worked examples

Example 1: Students and Courses (from a messy sheet to 3NF)

Raw sheet columns: StudentID, StudentName, CourseID, CourseName, InstructorName, InstructorEmail, Grade

Issues: StudentName repeats per course; Instructor details repeat per course; potential update anomalies.

  1. Identify entities: Student, Course, Instructor, Enrollment.
  2. Proposed tables:
    • Students(StudentID PK, StudentName)
    • Instructors(InstructorID PK, InstructorName, InstructorEmail)
    • Courses(CourseID PK, CourseName, InstructorID FK → Instructors)
    • Enrollments(StudentID FK, CourseID FK, Grade, PK(StudentID, CourseID))
-- 3NF design
CREATE TABLE Students (
  StudentID INT PRIMARY KEY,
  StudentName VARCHAR(100) NOT NULL
);
CREATE TABLE Instructors (
  InstructorID INT PRIMARY KEY,
  InstructorName VARCHAR(100) NOT NULL,
  InstructorEmail VARCHAR(100) UNIQUE NOT NULL
);
CREATE TABLE Courses (
  CourseID INT PRIMARY KEY,
  CourseName VARCHAR(150) NOT NULL,
  InstructorID INT NOT NULL,
  FOREIGN KEY (InstructorID) REFERENCES Instructors(InstructorID)
);
CREATE TABLE Enrollments (
  StudentID INT NOT NULL,
  CourseID INT NOT NULL,
  Grade CHAR(2),
  PRIMARY KEY (StudentID, CourseID),
  FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
  FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);

Result: No repeating instructor info across courses; each fact stored once; simple joins for analytics.

Example 2: Orders and addresses (normalization with a snapshot)

We want a record of the shipping address used on the order date. Customer addresses change over time, so we keep a snapshot on Orders.

  • Customers(CustomerID PK, Name, Email)
  • Addresses(AddressID PK, CustomerID FK, Line1, City, State, PostalCode, IsDefault)
  • Orders(OrderID PK, CustomerID FK, OrderDate, ShipToName, ShipToLine1, ShipToCity, ShipToState, ShipToPostalCode)
  • OrderItems(OrderID FK, ProductID FK, Qty, UnitPriceAtOrder, PK(OrderID, ProductID))

This design is normalized and also preserves historical truth: the shipping address at the time of the order is stored in Orders as a snapshot attribute set (by business choice).

Example 3: Products and tags (many-to-many)

  • Products(ProductID PK, Name)
  • Tags(TagID PK, TagName UNIQUE)
  • ProductTags(ProductID FK, TagID FK, PK(ProductID, TagID))
CREATE TABLE Products (
  ProductID INT PRIMARY KEY,
  Name VARCHAR(120) NOT NULL
);
CREATE TABLE Tags (
  TagID INT PRIMARY KEY,
  TagName VARCHAR(80) UNIQUE NOT NULL
);
CREATE TABLE ProductTags (
  ProductID INT NOT NULL,
  TagID INT NOT NULL,
  PRIMARY KEY (ProductID, TagID),
  FOREIGN KEY (ProductID) REFERENCES Products(ProductID),
  FOREIGN KEY (TagID) REFERENCES Tags(TagID)
);
-- Query example: count products per tag
SELECT t.TagName, COUNT(*) AS ProductCount
FROM Tags t
JOIN ProductTags pt ON pt.TagID = t.TagID
GROUP BY t.TagName;

Step-by-step: how to normalize and model

  1. List entities (nouns) and attributes from real tasks and reports.
  2. Choose primary keys; prefer stable natural keys, or add surrogate IDs if needed.
  3. Enforce 1NF: split lists into rows; remove repeating groups into child tables.
  4. If you have composite keys, check 2NF: move columns that depend on only part of the key to their own tables.
  5. Check 3NF: remove transitive dependencies by creating new tables for attributes determined by non-keys.
  6. Model relationships: 1:1 (rare), 1:M (FK on many-side), M:N (introduce junction table).
  7. Add constraints: NOT NULL, UNIQUE, CHECK, and FOREIGN KEYs to enforce integrity.
  8. Index PKs and FKs; consider additional indexes for common joins/filters.
  9. Only denormalize for strong, measured reasons (performance, immutable snapshots); document why.

Exercises

Try the exercise below. Use the checklist to verify your design. Hints and a full solution are available inside the collapsible block.

Exercise 1: Normalize an order management sheet

Goal: Convert the spreadsheet into a 3NF relational model with appropriate keys and FKs.

Spreadsheet sample (rows)
OrderID, OrderDate, CustomerName, CustomerEmail, ProductSKU, ProductName, UnitPrice, Qty, CustomerCity
1001, 2024-01-05, Alice Grey, alice@example.com, SKU-1, Notebook, 5.00, 3, Boston
1001, 2024-01-05, Alice Grey, alice@example.com, SKU-2, Pen, 1.50, 5, Boston
1002, 2024-01-06, Ben Fox, ben@example.com, SKU-2, Pen, 1.50, 2, Denver
    
  • Identify entities, primary keys, and relationships.
  • Eliminate redundancies and define 1:M and M:N structures as needed.
  • Decide which attributes should be stored as an order-time snapshot (e.g., UnitPrice).

Use this checklist as you design:

  • All columns are atomic; no repeated groups (1NF)
  • No attribute depends on part of a composite key (2NF)
  • No transitive dependency on a key (3NF)
  • PKs chosen; FKs enforce relationships
  • Unique constraints where appropriate (e.g., CustomerEmail)

Open the exercise in the Exercises panel for hints and solution.

Common mistakes and how to self-check

  • Storing lists in a single column (e.g., "red, blue"): Check 1NF—split into child rows.
  • Using text names as FKs: Prefer numeric or short PKs; enforce referential integrity with FOREIGN KEYs.
  • Ignoring junction tables for M:N: If multiple rows relate to multiple rows, add a junction table.
  • Over-normalizing small, static lookups: Keep a tiny lookup table, not dozens of micro-tables.
  • Forgetting historical snapshots: Prices and addresses at order time may need to be copied by design (document it).

Self-check: For any non-key column X, ask: what uniquely determines X? Ensure that determinant is a candidate key in the same table. If not, refactor.

Practical projects

  • Design a Sales Analytics schema: Customers, Products, Orders, OrderItems, and Promotions with a junction table for many-to-many promo applications.
  • Create a Learning Management schema: Students, Courses, Instructors, Enrollments, Assignments, Submissions.
  • Build an Event Tracking schema: Users, Sessions, Events, EventProps (key-value table normalized by property name lookup).

Who this is for

  • Data Analysts who write queries across multiple tables and need reliable joins.
  • Anyone cleaning spreadsheets that need to become trustworthy databases.

Prerequisites

  • Basic SQL: SELECT, JOIN, GROUP BY.
  • Comfort with primary keys and foreign keys.

Learning path

  1. Review 1NF/2NF/3NF and identify anomalies in your current datasets.
  2. Practice modeling entities and relationships for a familiar domain.
  3. Add constraints (PK, FK, UNIQUE, CHECK) and test inserts/updates.
  4. Run analytic queries on your model; iterate if joins feel awkward.
  5. Evaluate if any denormalization is warranted and document the reason.

Next steps

  • Complete the exercise and compare with the solution.
  • Take the Quick Test to check understanding (available to everyone; only logged-in users get saved progress).
  • Apply the design process to a dataset you use at work or in study.

Mini challenge

Design a schema for a simple ticketing system: Users submit Tickets, each Ticket can have multiple Comments, and Tickets can have multiple Labels. Choose keys, model relationships, and show the junction table for labels. Aim for 3NF.

Test yourself

Open the Quick Test below to validate your understanding. The test is available to everyone; only logged-in users get saved progress.

Practice Exercises

1 exercises to complete

Instructions

Convert the spreadsheet into a normalized schema and provide CREATE TABLE statements with PKs, FKs, and constraints.

Spreadsheet sample
OrderID, OrderDate, CustomerName, CustomerEmail, ProductSKU, ProductName, UnitPrice, Qty, CustomerCity
1001, 2024-01-05, Alice Grey, alice@example.com, SKU-1, Notebook, 5.00, 3, Boston
1001, 2024-01-05, Alice Grey, alice@example.com, SKU-2, Pen, 1.50, 5, Boston
1002, 2024-01-06, Ben Fox, ben@example.com, SKU-2, Pen, 1.50, 2, Denver
  • Identify entities: Customers, Products, Orders, OrderItems.
  • Choose keys (consider surrogate IDs) and unique constraints (e.g., CustomerEmail, ProductSKU).
  • Store UnitPrice on OrderItems as the price at order time (snapshot).
Expected Output
A 3NF design with four tables: Customers, Products, Orders, OrderItems; PKs and FKs set; UNIQUE(CustomerEmail), UNIQUE(ProductSKU); UnitPriceAtOrder stored in OrderItems.

Normalization and Relational Modeling — Quick Test

Test your knowledge with 8 questions. Pass with 70% or higher.

8 questions70% to pass

Have questions about Normalization and Relational Modeling?

AI Assistant

Ask questions about this tool