Normalization Examples

Normalization is a database design technique used to organize data in a database. Its goals are to reduce data redundancy and improve data integrity.

Understanding Normal Forms

Normalization involves a series of guidelines called normal forms. The most common ones are:

Example: De-normalizing a Student Table

Consider a table that stores student information and their enrolled courses. Without normalization, it might look like this:


CREATE TABLE StudentCourseEnrollment (
    StudentID INT,
    StudentName VARCHAR(100),
    StudentAddress VARCHAR(255),
    CourseID VARCHAR(10),
    CourseName VARCHAR(100),
    InstructorName VARCHAR(100),
    PRIMARY KEY (StudentID, CourseID)
);

INSERT INTO StudentCourseEnrollment (StudentID, StudentName, StudentAddress, CourseID, CourseName, InstructorName) VALUES
(101, 'Alice Smith', '123 Main St', 'CS101', 'Introduction to Programming', 'Dr. Evans'),
(101, 'Alice Smith', '123 Main St', 'MA201', 'Calculus II', 'Prof. Davis'),
(102, 'Bob Johnson', '456 Oak Ave', 'CS101', 'Introduction to Programming', 'Dr. Evans'),
(103, 'Charlie Brown', '789 Pine Ln', 'PH101', 'General Physics', 'Dr. Lee');
                

This table has several issues:

  • Redundancy: Student name and address are repeated for each course a student enrolls in. Course name and instructor name are repeated for each student enrolled in that course.
  • Update Anomalies: If a student's address changes, it needs to be updated in multiple rows.
  • Deletion Anomalies: If a student enrolls in only one course and then drops it, their name and address might be lost entirely if not handled carefully.

Applying Normal Forms

1NF: Atomic Values and Unique Records

The initial table is already in 1NF as each cell contains a single value, and the primary key (StudentID, CourseID) ensures uniqueness. However, the redundancy is apparent.

2NF: Removing Partial Dependencies

To achieve 2NF, we identify attributes dependent on only part of the primary key. In our case, `CourseName` and `InstructorName` depend only on `CourseID`. `StudentName` and `StudentAddress` depend only on `StudentID`.

We can split this into three tables:


-- Students Table
CREATE TABLE Students (
    StudentID INT PRIMARY KEY,
    StudentName VARCHAR(100),
    StudentAddress VARCHAR(255)
);

-- Courses Table
CREATE TABLE Courses (
    CourseID VARCHAR(10) PRIMARY KEY,
    CourseName VARCHAR(100),
    InstructorName VARCHAR(100)
);

-- Enrollment Table
CREATE TABLE Enrollments (
    StudentID INT,
    CourseID VARCHAR(10),
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID),
    FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
);
                

Data after 2NF:


-- Students
INSERT INTO Students (StudentID, StudentName, StudentAddress) VALUES
(101, 'Alice Smith', '123 Main St'),
(102, 'Bob Johnson', '456 Oak Ave'),
(103, 'Charlie Brown', '789 Pine Ln');

-- Courses
INSERT INTO Courses (CourseID, CourseName, InstructorName) VALUES
('CS101', 'Introduction to Programming', 'Dr. Evans'),
('MA201', 'Calculus II', 'Prof. Davis'),
('PH101', 'General Physics', 'Dr. Lee');

-- Enrollments
INSERT INTO Enrollments (StudentID, CourseID) VALUES
(101, 'CS101'),
(101, 'MA201'),
(102, 'CS101'),
(103, 'PH101');
                

3NF: Removing Transitive Dependencies

The tables are now in 2NF. Let's consider if there are any transitive dependencies. In our current setup, there are none. For instance, in the Students table, StudentName and StudentAddress are directly dependent on StudentID. In the Courses table, CourseName and InstructorName are directly dependent on CourseID.

A common example of a transitive dependency might be if we had an InstructorID in the Courses table, and also an InstructorName and InstructorOffice in a separate Instructors table. If InstructorName and InstructorOffice were directly in the Courses table, they would be transitively dependent on CourseID (via InstructorID).

Example of a transitive dependency that would require further normalization (not present in our current 2NF example):

If Courses table had CourseID, CourseName, InstructorName, InstructorOffice, and InstructorName and InstructorOffice were also in a separate Instructors table linked by InstructorID. Then InstructorName and InstructorOffice would be transitively dependent on CourseID.

To achieve 3NF in such a scenario, we'd further split the Courses table.

Benefits of Normalization

While normalization is crucial for good database design, in some high-performance scenarios, a degree of de-normalization might be considered for read-heavy workloads to optimize query performance, but this should be a conscious decision made after understanding the trade-offs.