Higher normal forms
Contents:
1. Multivalued dependencies
2. Fourth normal form
3. Fifth normal form
A brief history of normal forms:
Form Inventor Year First Normal Form Codd ?? Second Normal Form Codd 1972 Third Normal Form Codd 1972 Boyce-Codd Normal Form Heath
Boyce & Codd1971
1974Fourth Normal Form Fagin 1977 Fifth Normal Form Fagin 1979
These are summarized in A simple guide to
five normal forms in relational database theory
, a nice 1983
ACM article by William Kent.
1. Multivalued dependencies
Definition: A1 A2 … An ⇒ B1 B2 … Bm is an MVD (a multivalued dependency) for a relation R if whenever two tuples s and t in R agree on A1, A2, … An, there must exist a tuple u in R that:
- agrees with s and t on A1, A2, … An.
- agrees with s on B1, B2, … Bm.
- agrees with t on all other attributes.
Example 1:
dealers table name make product Smith Ford car Smith Ford truck Smith Toyota car Smith Toyota truck Jones Toyota car Jones Toyota truck Jones VW car The idea here is that a dealer can sell several makes of vehicles, but for each make the dealer deals, the dealer must sell all products in the line. Thus, Jones, having chosen to work with Toyota, must sell both its trucks and its cars. However, we're imagining that VW doesn't produce trucks, so VW only requires Jones to sell their cars to work with VW.
Notice that name ⇒ make is not an MVD: If it were, then we could take the next-to-last row as t and the last row as s (since they agree on name) and conclude that there must be a row that agrees with both on name (Jones), with s on make (VW) and with t on product (truck). However, the tuple (Jones, VW, truck) is not in the table, so name ⇒ make is not an MVD.
However, make ⇒ product is an MVD. Consider the rows agreeing on, say, Toyota, for their make: If you take the set of all values for product in these rows {car, truck} and the set of all values for name in these rows {Smith, Jones}, then combinations of something from one set and something from the other set are included in the Toyota rows. This is required by being a MVD.
Example 2:
posts(name, email, subject, body) where somebody posting a message may list multiple e-mail addresses, in which case there will be a row added for each e-mail address, with the same subject and body.In this case, name ⇒ email is an MVD.
Notes:
Any FD is an MVD.
Any MVD is equivalent to another MVD with the same LHS but a RHS containing all attributes not appearing in the original MVD. For example, if name ⇒ email is an MVD, so must be name ⇒ subject body.
You cannot
split
the RHS of an MVD as you can with FDs. So if name ⇒ subject body is an MVD, this does not imply that name ⇒ subject is an MVD.
2. Fourth normal form
A relation is in 4NF (fourth normal form) if every MVD has a left side that is a superkey.
While we won't look at an algorithm for breaking a non-4NF table into 4NF tables, we can look at what we would do to solve the problem for our two examples.
Example 1:
We need just two tables.
dealer_makes table name make Smith Ford Smith Toyota Jones Toyota Jones VW
make_products table make product Ford car Ford truck Toyota car Toyota truck VW car
Example 1:
We need two tables:
addresses(name, email)
posts(name, subject, body)
3. Fifth normal form
Going back to our automobile dealers, suppose agents could also specialize in some vehicle types.
dealers table name make product Smith Ford car Smith Ford truck Smith Toyota car Smith Toyota truck Jones Toyota car Jones VW car
(This is the same as Example 2 above, except we deleted (Jones, Toyota, truck) since we're imagining that Jones specializes only in cars.)
This has no MVDs, so it is in fourth normal form. Nonetheless, the table contains some redundancy: For example, it lists that Smith deals with both cars and trucks twice, and it also lists that Smith deals for Ford and Toyota twice. For this reason, it is not in fifth normal form.
(You might think that that name ⇒ make was an MVD based on what appears in the above table. But if Jones were to add minivans as a specialty, and if VW were to make minivans but Toyota were not, then (Jones, VW, minivan) would go into the table. We could then look at the Jones-Toyota-car and Jones-VW-minivan rows and conclude there should also be a Jones-Toyota-minivan row, which there is not.)
We won't give a formal definition of fifth normal form. But in this case, we can split the table into three smaller tables to reduce the redundancy — and because we can perform this split, it's not in fifth normal form.
dealer_products table name product Smith car Smith truck Jones car
dealer_makes table name make Smith Ford Smith Toyota Jones Toyota Jones VW
make_products table make product Ford car Ford truck Toyota car Toyota truck VW car