Normalization
Normalization is the process of breaking down a complex and large table or relation into smaller multiple tables. It is used to reduce repetition of data, identify dependencies between the data and make the database more flexible and easier to use.
Advantages of normalization are:
■ It reduces data redundancy.
■ It reduces wastage of storage.
■ It removes inconsistency in the database.
■ It simplifies the structure of tables.
■ It makes the database operation easier.
■ It avoids loss of data from the database.
■ It improves the performance of the system.
Normalization use different normal forms. Some of the normal forms are: First Normal Form (1NF), Second Normal Form (2NF) Third Normal Form (3NF).
Type of normalization
- First normal form (1NF)
- Second normal form (2NF)
- Third normal form (3NF)
First normal form (1NF) – A table is said to be in first normal form if it has no repeating groups. A repeating group is a set of columns that store similar information that repeats in the same table. Consider the following table. which contains the tracking information.
Contacts |
Contact _ ID L_ name F_ name Contact_data1 Contact_desc1 Contact_date2 Contact_desc2 |
This table contains a repeating group of the data and description of two conversation. The only advantage of designing table like this is that it avoids the need for relationship. But has the following disadvantages:
- This structure limits the number of conversation to two, which will create problem when more than two conversation need to be stored.
- Thus structure also makes it difficult to do any kind of meaningful searching using the column.
To remove the repeating group is to move to another table, which is then related to present table with common key field. The primary key of parent table is stored in second table. A table is in 1NF if there is no repeating of groups.
Contacts | Conversation |
Contact_ ID L_ name F_ name |
Contact_ ID Contact_ date Contact_ desc |
Table in 1NF
Second normal form(2NF) – A table is said to be in second normal form if it is already in first normal form and if every non key column depends on the entire key. It is the elimination of redundant date. Redundant date is date that is expressed multiple times unnecessarily or depend only on part of multi value key. In other word, when the column’s value is depend upon the value of one column of table, but not another, it is considered redundant. For example consider the table employee.
Employee |
Emp_ No L_ name F_ name Dept_ code Description |
This table contains the redundant data, namely the department description, which depends only on the Dept_ code and does not depend on the Emp_ No, which is the primary key of the table. So strong the Dept_ code and description in another table eliminate redundancy.
Employee | Department |
Emp_ No L_ name F_ name Dept_ code |
Dept_ code Description |
When the columns that depends on any key other than primary key is removed, then the resulting table is in 2NF.
Third normal form (3NF) – A table is said to be in third normal form if it is already in second normal form and if non key columns are not dependent on each other. The columns in each table should be a group of columns in which the data in each column contributes to the description of each row in the table. For a given row with a unique key, each column in the appearing in that row should contribute to the description of the row.
For example consider the following table contacts:
Contacts |
Contact_ ID L_ name F_ name Company_ name Company_ location |
In the above table, F_ name and L_ name contribute to a describing a specific contact using the primary key Contact_ id. But the Company_ name and company_ location do not contribute to describing the record with a given Contact_ id. So store Company_ name and Company_ location in another table.
Contacts | Company |
Contact_ ID L_ name F_ name Company_ ID |
Company_ ID Company_ name Company_ location |
When all the columns in a table describe and depend upon the primary key, the table is said to satisfy the 3NF.