First Normal Form (1NF) sets the very basic rules for an organized database as follows:
- Eliminate duplicate columns from the same table.
- Create separate tables for each group of related data and identify each row with a unique column or set of columns (i.e., primary key).
The first rule requires that data must not be duplicated within the same row of a table. This concept is often referred to as the atomicity of a table and tables that comply with this rule are said to be atomic. For example, let’s closely examine a table within a human resources database that stores the manager-employee relationship. Let’s assume the business rule that each manager may have one or more employees while each employee may have only one manager.
When we are creating a list or spreadsheet to track this information, we might create a table with the following fields:
- Manager
- Employee1
- Employee2
- Employee3
- Employee4
However, recall that the first rule imposed by 1NF is to eliminate duplicate columns from the same table. Clearly, the Employee1-Employee4 columns are duplicates. Take a moment and think about the potential problems raised by this scenario. If a manager only has one employee – the Employee2-Employee4 columns are simply wasted storage space (a precious database commodity). Furthermore, imagine the case where a manager already has 4 employees – what happens if he or she takes on another employee? The whole table structure would require modification.
It is at this point that a second idea usually occurs to database novices: We don’t want to have more than one column and we want to allow for a flexible amount of data storage.
We could try something like this:
- Manager
- Employees (Employees field contains multiple entries in the form “Mary, Bill, Joe”, for example.)
This solution is closer, but it is not quite correct. The Employees column is still duplicative and non-atomic. What happens when we need to add or remove an employee? We need to read and write the entire contents of the table. That’s not a big deal in this situation, but what if one manager had over a hundred employees? Also, it complicates the process of selecting data from the database in future queries.
The following is a table that correctly satisfies the first rule of 1NF:
- Manager
- Employee
In this case, each employee has a single entry, but managers may have multiple entries.
We must also identify each row with a unique column or set of columns (the primary key). You might take a look at the table above and suggest the use of the employee column as a primary key. In fact, the employee column is a good candidate for a primary key due to the fact that our business rules specified that each employee may have only one manager. However, the data that we’ve chosen to store in our table makes this a less than ideal solution. What happens if we hire another employee named Jim? How do we store his manager-employee relationship in the database?
It’s best to use a truly unique identifier (such as an employee ID) as a primary key.
Our final table would look like this:
- Manager ID
- Employee ID