Designing your tables to comply with 3NF (third normal form) is usually sufficient to ensure good design so, most of the time, you can stop right here. The higher normal forms address less common data problems. They are included here so you'll know what they are and what to do about them if you come across them. These normal forms do get a bit more complicated.
The truth is there are times when you will want to denormalize your data. That means you may sometimes want to put the data in two normalized tables back into one denormalized table. The reasons for doing this are usually associated with performance, for example, the speed at which queries run. But at least it will be a conscious decision and represents one of the beauties of normalization. Properly normalized tables can always be put back together with no loss or gain of data.
A table is in third normal form (3NF) and all determinants are candidate keys.
Boyce-Codd normal form (BCNF) can be thought of as a "new" third normal form. It was introduced to cover situations that the "old" third normal form did not address. Keep in mind the mean of a determinant (determines the value in another field) and candidate keys (qualify for designation as primary key). This normal form applies to situations where you have overlapping candidate keys.
If a table has no non-key fields, it is automatically in BCNF.
A Practical Approach
Look for potential problems in updating existing data (modification anomaly) and in entering new data (insertion anomaly).
Imagine that you were designing a table for a college to hold information about courses, students, and teaching assistants. You have the following business rules.
Some sample data:
CourseNum | Student | TA |
---|---|---|
ENG101 | Jones | Clark |
ENG101 | Grayson | Chen |
ENG101 | Samara | Chen |
MAT350 | Grayson | Powers |
MAT350 | Jones | O'Shea |
MAT350 | Berg | Powers |
To uniquely identify each record, you could choose CourseNum + Student as a primary key. This would satisfy third normal form also because the combination of CourseNum and Student determines the value in TA. Another candidate key would be Student + TA. In this case, you have overlapping candidate keys (Student is in both). The second choice, however, would not comply with third normal form because the CourseNum is not determined by the combination of Student and TA; it only depends on the value in TA (see the business rules). This is the situation that Boyce-Codd normal form addresses; the combination of Student + TA could not be considered to be a candidate key.
If you wanted to assign a TA to a course before any students enrolled, you couldn't because Student is part of the primary key. Also, if the name of a TA changed, you would have to update it in multiple records.
If you assume you have just these fields, this data would be better stored in three tables: one with CourseNum and Student, another with Student and TA, and a third with CourseNum and TA.
*CourseNum | *Student |
---|---|
ENG101 | Jones |
ENG101 | Grayson |
ENG101 | Samara |
MAT350 | Grayson |
MAT350 | Jones |
MAT350 | Berg |
*Student | *TA |
---|---|
Jones | Clark |
Grayson | Chen |
Samara | Chen |
Grayson | Powers |
Jones | O'Shea |
Berg | Powers |
*CourseNum | *TA |
---|---|
ENG101 | Clark |
ENG101 | Chen |
MAT350 | O'Shea |
MAT350 | Powers |
Above, showing tables that comply with BCNF
A table is in Boyce-Codd normal form (BCNF) and there are no multi-valued dependencies.
A multi-valued dependency occurs when, for each value in field A, there is a set of values for field B and a set of values for field C but fields B and C are not related.
A Practical Approach
Look for repeated or null values in non-key fields.
A multi-valued dependency occurs when the table contains fields that are not logically related. An often used example is the following table:
*Movie | *Star | *Producer |
---|---|---|
Once Upon a Time | Julie Garland | Alfred Brown |
Once Upon a Time | Mickey Rooney | Alfred Brown |
Once Upon a Time | Julie Garland | Muriel Humphreys |
Once Upon a Time | Mickey Rooney | Muriel Humphreys |
Moonlight | Humphrey Bogart | Alfred Brown |
Moonlight | Julie Garland | Alfred Brown |
A movie can have more than one star and more than one producer. A star can be in more than one movie. A producer can produce more than one movie. The primary key would have to include all three fields and so this table would be in BCNF. But you have unnecessarily repeated values, with the data maintenance problems that causes, and you would have trouble with deletion anomalies.
The Star and the Producer really aren't logically related. The Movie determines the Star and the Movie determines the Producer. The answer is to have a separate table for each of those logical relationships - one holding Movie and Star and the other with Movie and Producer, as shown below:
*Movie | *Star |
---|---|
Once Upon a Time | Julie Garland |
Once Upon a Time | Mickey Rooney |
Moonlight | Humphrey Bogart |
Moonlight | Julie Garland |
*Movie | *Producer |
---|---|
Once Upon a Time | Alfred Brown |
Once Upon a Time | Muriel Humphreys |
Moonlight | Alfred Brown |
Above, showing tables that comply with 4NF
Below is another example of a common design error, and it's easily spotted by all the missing or blank values.
DeptCode | ProjectNum | ProjectMgrID | Equipment | PropertyID |
---|---|---|---|---|
IS | 36-272-TC | EN1-15 | CD-ROM | 657 |
IS | VGA desktop monitor | 305 | ||
AC | 35-152-TC | EN1-15 | ||
AC | Dot-matrix printer | 358 | ||
AC | Calculator with tape | 239 | ||
TW | 30-452-T3 | EN1-10 | 486 PC | 275 |
TW | 30-457-T3 | EN1-15 | ||
TW | 31-124-T3 | EN1-15 | Laser printer | 109 |
TW | 31-238-TC | EN1-15 | Hand-held scanner | 479 |
RI | Fax machine | 775 | ||
MK | Laser printer | 858 | ||
MK | Answering machine | 187 | ||
TW | 31-241-TC | EN1-15 | Standard 19200 bps modem | 386 |
SL | 486 Laptop PC | 772 | ||
SL | Electronic notebook | 458 |
Above, a table with many null values (note: it also does not comply with 3NF and BCNF)
It's the same problem here because not all of the data is logically related. As usual, the answer is more tables - one to hold the information on the equipment assigned to departments (with PropertyID as the primary key) and another with projects and departments. You'd have to know the business rules to know whether a project might involve more than one department or manager and be able to figure out the primary key. Assuming a project can have only one manager and be associated with only one department, the tables would be as follows.
*PropertyID | Equipment | DeptCode |
---|---|---|
657 | CD-ROM | IS |
305 | VGA desktop monitor | IS |
358 | Dot-matrix printer | AC |
239 | Calculator with tape | AC |
275 | 486 PC | TW |
109 | Laser printer | TW |
479 | Hand-held scanner | TW |
775 | Fax machine | RI |
858 | Laser printer | MK |
187 | Answering machine | MK |
386 | Standard 19200 bps modem | TW |
772 | 486 Laptop PC | SL |
458 | Electronic notebook | SL |
*ProjectNum | ProjectMgrID | DeptCode |
---|---|---|
36-272-TC | EN1-15 | IS |
35-152-TC | EN1-15 | AC |
30-452-T3 | EN1-10 | TW |
30-457-T3 | EN1-15 | TW |
31-124-T3 | EN1-15 | TW |
31-238-TC | EN1-15 | TW |
31-241-TC | EN1-15 | TW |
Above, tables that eliminate the null values and comply with 4NF
A table is in fourth normal form (4NF) and there are no cyclic dependencies.
A cyclic dependency can occur only when you have a multi-field primary key consisting of three or more fields. For example, let's say your primary key consists of fields A, B, and C. A cyclic dependency would arise if the values in those fields were related in pairs of A and B, B and C, and A and C.
Fifth normal form is also called projection-join normal form. A projection is a new table holding a subset of fields from an original table. When properly formed projections are joined, they must result in the same set of data that was contained in the original table.
A Practical Approach
Look for the number of records that will have to be added or maintained
Following is some sample data about buyers, the products they buy, and the companies they buy from on behalf of MegaMall, a large department store.
*Buyer | *Product | *Company |
---|---|---|
Chris | jeans | Levi |
Chris | jeans | Wrangler |
Chris | shirts | Levi |
Lori | jeans | Levi |
Above, a table with cyclic dependencies
The primary key consists of all three fields. One data maintenance problem that occurs is that you need to add a record for every buyer who buys a product for every company that makes that product or they can't buy from them. That may not appear to be a big deal in this sample of 2 buyers, 2 products, and 2 companies (2 X 2 X 2 = 8 total records). But what if you went to 20 buyers, 50 products, and 100 companies (20 X 50 X 100 = 100,000 potential records)? It quickly gets out of hand and becomes impossible to maintain.
You might be tempted to solve this by dividing this into the following two tables.
*Buyer | *Product |
---|---|
Chris | jeans |
Chris | shirts |
Lori | jeans |
*Product | *Company |
---|---|
jeans | Wrangler |
jeans | Levi |
shirts | Levi |
However, if you joined the two tables above on the Product field, it would produce a record not part of the original data set (it would say that Lori buys jeans from Wrangler). This is where the projection-join concept comes in.
The correct solution would be three tables:
*Buyer | *Product |
---|---|
Chris | jeans |
Chris | shirts |
Lori | jeans |
*Product | *Company |
---|---|
jeans | Wrangler |
jeans | Levi |
shirts | Levi |
*Buyer | *Company |
---|---|
Chris | Levi |
Chris | Wrangler |
Lori | Levi |
Above, tables that comply with 5NF
When the first two tables are joined by Product and the result joined to the third table by Buyer and Company, the result is the original set of data.
In our scenario of 20 buyers, 50 products, and 100 companies, you would have, at most, 1000 records in the Buyers table (20 X 50), 5000 records in the Products table (50 X 100), and 2000 records in the Companies table (20 X 100). With a maximum of 8000 records, these tables would be much easier to maintain than the possible 100,000 records we saw earlier.