As a quick refresher, and for a way to solidify my on-the-job experience with some RDB theory, I’ve decided to talk about normal forms.
For obvious reasons we want to reduce data redundancy in our DB: security, resources, manageability, data integrity… But what’s the best process to do this? The best solution is to turn to normal forms. Normal forms are a useful criteria for database (DB) normalization; they describe the degree of vulnerability for logical inconsistencies between tables in a DB. In a way, normal forms describe the standard of normalization with multiple degrees.
The higher the normal form, the more normalized a table is, and the lower the risk of data inconsistency. Thus in most (but not in all) cases we should strive for the HNF (Highest Normal Form) standard. However, for these set of posts, I will only talk up to the third normal form: 3NF
First Normal Form: 1NF
This is the most important criteria, and almost any relational DB should abide by it. The most essential conditions of 1NF deal with duplication\repetition:
No duplicate rows in a table\No duplicate domains (with same meaning) across columns
The no duplicate rows criteria is straight forward enough, but the second part needs further explanation. As an example, look at the table below. Here, the domain and meaning of a column is duplicated (albeit with a different column name) to keep track of multiple suppliers for a single products. A normalized DB would split this table into multiple ones. In this case (a many to many relationship), we would need to split this table up into 3: a product table, a supplier table, and a junction table. The junction table would have a primary key made up of the product and supplier foreign keys.
As a side note: the wikipedia article on 1NF gives the example of phone numbers associated to a person. They give 3 columns: “Tel. No. 1”, “Tel.. No. 2”, and “Tel No. 3.” This of course is a good example and similar to mine above because each columns is of the same domain and meaning. However, this would still be 1NF if the headings were “Home Number”, “Work Number”, and “Cell Number.” Although the columns may have the same domain, they have a completely different meaning–there is only one valid number for each cell. You still may want to separate the table into 2 or 3, but it’s not necessary unless you want to abide by Codd’s standard and not have null entries.
The intersection of every column and row contains only one value from a single domain.* Ie, don’t have multiple domains within a cell.
In my experience, breaking this rule can be tempting when you have a difficult to maintain and complex infrastructure in place. Essentially it’s putting multiple values of the same domain into a cell. Many times, but always reluctantly, I’ve broken this rule because of time and other project constraints. As a one to one flat relationship evolved into a one to many (or many to many) relationship, we simply tacked on multiple variables deliminated by some character. Luckily we didn’t do anything too complex with the data except displayed it in a UI, which parsed through the string of that column. Ugly, ugly, I know. What we should have done is split a table such as this into 3 as explained above.
*Codd’s (1970) version does not allow a nullable intersection, but more modern reformulations (Date, 2003) allow them. Essentially, if you want to abide by Codd’s standard, never have a nullable type in any table.
Other Criteria for 1NF
While the other conditions are necessary and important, they are not as likely to be broken: the order of the columns or rows cannot have meaning or significance, and all columns are regular. The reason why these conditions aren’t as likely to be broken is because they would take significant effort to implement (for the first part) or aren’t that useful (for the second part). If for example you implemented some code that attached some significance to the first 10 rows, or the last 10 rows, then you would be breaking the 1NF criteria.
The chances are that if you work with any RDBMS, you very well know the 1NF criteria intuitively. We all feel a little dirty when we don’t meet this minimal standard. For the next post, I will talk about 2NF and 3NF.