Normalization Process: UNF to BCNF - CSU357 - Shoolini U

Normal Forms - 1NF, 2NF, 3NF, BCNF

1. Introduction to Normal Forms

Normal Forms in Database Management Systems (DBMS) are standards for database schema design. They reduce data redundancy and improve data integrity. The progression through normal forms, from Unnormalized Form (UNF) to Boyce-Codd Normal Form (BCNF), involves a series of rules applied to the structure of tables. Each form aims to handle specific types of anomalies and inefficiencies, making the database more structured, efficient, and scalable.

1.1 Unnormalized Form (UNF)

UNF is the starting point in the normalization process. Tables in UNF can contain repeating groups, multi-valued attributes, and no primary key defined. This form is closest to how data might be stored in a flat-file database where normalization principles are not applied.

1.2 First Normal Form (1NF)

To satisfy the First Normal Form, a table must have its data in atomic form, meaning each cell contains unique and indivisible values, and there must be a primary key. It addresses the problem of repeating groups by ensuring that each record has a unique identifier and that the values in each column of a table are atomic.

1.3 Second Normal Form (2NF)

A table is in the Second Normal Form if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key. This means there cannot be any partial dependency of any column on the primary key, effectively handling issues with partial dependencies where a non-primary key attribute depends only on a part of the composite primary key.

1.4 Third Normal Form (3NF)

To be in the Third Normal Form, a table must be in 2NF, and all its attributes must be directly dependent on the primary key. It eliminates transitive dependency, where non-primary key attributes depend on other non-primary key attributes, ensuring that each non-key attribute is only provided by the key.

1.5 Boyce-Codd Normal Form (BCNF)

A table is in BCNF if it is in 3NF and for every one of its non-trivial dependencies X → Y, X is a superkey. BCNF is stricter than 3NF, designed to handle the anomalies 3NF does not cover, particularly relating to situations where multiple candidate keys exist and overlap.

Transformations

2.1 UNF to 1NF Transformation

To transform a table from UNF to 1NF, identify all repeating groups and separate them into distinct tables, ensuring each table has a primary key. Data should be divided into atomic values where each column represents a single value, not a set of values.

2.2 1NF to 2NF Transformation

To move from 1NF to 2NF, identify and eliminate partial dependencies. This often involves separating the table into additional tables where each non-key attribute is fully functionally dependent on the primary key.

-- Example: Splitting a table into 2NF
CREATE TABLE Employees (
    EmployeeID INT,
    DepartmentID INT,
    EmployeeName VARCHAR(100),
    DepartmentName VARCHAR(100),
    PRIMARY KEY (EmployeeID, DepartmentID)
);
-- Split into
CREATE TABLE Employees (
    EmployeeID INT,
    EmployeeName VARCHAR(100),
    DepartmentID INT,
    PRIMARY KEY (EmployeeID)
);
CREATE TABLE Departments (
    DepartmentID INT,
    DepartmentName VARCHAR(100),
    PRIMARY KEY (DepartmentID)
);

2.3 2NF to 3NF Transformation

Advancing from 2NF to 3NF involves the removal of transitive dependencies. Create separate tables for any attributes that do not depend directly on the primary key, assigning them their own primary key if necessary.

2.4 UNF to BCNF Transformation

While the direct transformation from UNF to BCNF is uncommon due to the complexity of ensuring every determinant is a candidate key, lets us go through the steps nonetheless. Typically, a database is normalized sequentially through 1NF, 2NF, and 3NF before reaching BCNF. The process involves rigorous application of BCNF rules after achieving 3NF, especially in databases with complex relationships and overlapping candidate keys.

2.4.1 UNF to 1NF

To convert a table from UNF to 1NF, follow these steps:

  • Eliminate Repeating Groups: Identify any repeating groups within the table. A repeating group is a set of two or more columns that can contain multiple values for a single record. Split these into separate tables.
  • Ensure Atomicity: Each field must contain atomic values, meaning each field should contain only a single value of its specified type. If a field contains multiple values (e.g., a list), split these into individual records.
  • Define a Primary Key: Each table must have a primary key that uniquely identifies each record. If no natural primary key exists, consider adding a surrogate key.

2.4.2 1NF to 2NF

Once in 1NF, to achieve 2NF, address partial dependencies:

  • Identify Partial Dependencies: These occur when an attribute is dependent on only a part of a composite primary key.
  • Eliminate Partial Dependencies: Split the table into two or more tables to ensure that all non-key attributes are fully functionally dependent on the primary key.
  • Establish Relationships: Use foreign keys to maintain the relationship between the newly created tables and ensure referential integrity.

2.4.3 2NF to 3NF

Moving to 3NF involves eliminating transitive dependencies:

  • Identify Transitive Dependencies: A transitive dependency occurs when a non-primary key attribute depends on another non-primary key attribute.
  • Eliminate Transitive Dependencies: Create separate tables for attributes involved in a transitive dependency. The attribute on which another attribute is transitively dependent becomes the primary key of the new table.
  • Define Foreign Keys: Ensure that the original table references the new table(s) through foreign keys to preserve the relationships between the data.

2.4.4 3NF to BCNF

Finally, to achieve BCNF, which is stricter than 3NF, follow these steps:

  • Identify Non-trivial Dependencies: In BCNF, for every non-trivial dependency (A → B), A must be a superkey. Identify dependencies where this is not the case.
  • Resolve Violations: For any such identified dependency, decompose the table into two or more tables to ensure that each non-trivial dependency has a superkey on its left side.
  • Ensure Minimal Dependency Preservation: The decomposition should be done in a way that preserves dependencies and does not lose any data. This might involve the creation of additional tables or the adjustment of existing table structures.
  • Verify BCNF: After decomposition, verify that all tables now meet the BCNF criteria, meaning there are no non-trivial dependencies violating BCNF rules.

It is worth reiterating again that transforming directly from UNF to BCNF is typically not practical due to the complexity and the risk of losing sight of important dependencies. Sequential normalization provides a more manageable and error-proof path to ensuring data integrity and reducing redundancy.