As we saw in our introductory example, distributing data in different tables is a very common practice, and even immediate when dealing with real cases. Before looking at the query techniques that SQL offers, let’s talk about data normalization. The standardization of data in SQL refers to the work of Dr Codd, who in the 1960s and 1970s laid the theoretical foundations for modeling data in databases. He set out a number of principles which were then applied by software vendors, such as IBM, Oracle, or Microsoft. Even today these principles are applied. We will therefore outline them briefly.
The first relational model established by Dr. Codd has twelve rules. We’re just going to see the first rules, in a simplified form.
The first normal form essentially defines the notion of atomic information. It states the following:
- the row order does not carry information in a table;
- the order of the columns does not carry information in a table;
- a table cannot contain the same row several times;
- each cell in a row carries only one piece of information, which cannot be divided;
- a table exposes all its information in its rows and columns (no hidden information).
All of these principles are actually quite trivial. The fact that there cannot be a duplicate row implies that each row must have an identity , which supposes that one can always define a primary key on it. Usually, this primary key is carried by a particular column, but this is not always the case. The notion of primary key is defined by the second normal form.
Note that this form indicates that each cell can only carry one piece of information. Strictly speaking, a column
numéro de téléphonein our table
Marinsshould therefore not have several numbers separated by commas (a case that we never meet in reality, of course!).
The second normal form defines the notion of primary key.
A table verifying the second normal form must first verify the first normal form. If we define a primary key
PKon this table, possibly made up of several columns, then all the columns of the table must depend only on the whole of
PK, and not on one of its parts.
Theoretically, this statement may seem a little complicated, on an example it becomes very simple.
Table 10. Music database – first version
|Albert||Floor||12, rue des fruits|
|Albert||Harpsichord||12, rue des fruits|
|André||Clarinet||23, flower street|
|André||Oboe||23, flower street|
|André||Flute||23, flower street|
|Antoine||Bassoon||34, vegetable street|
A more complex approach could be to take the fields
(nom, instrument)as the primary key. This second approach also fails, since the address we have added depends only on the name of the musician, and not on the instrument he is playing. We therefore violate the fact that a column cannot depend on a component (that is to say on a part) of a primary key.
This table therefore violates the second normal form, so it has to be reorganized. We have to split our information into two tables.
Table 11. Music base – second version: musicians table
|Albert||12, rue des fruits|
|André||23, flower street|
|Antoine||34, vegetable street|
Table 12. Music base – second version: practiced instruments table
In this last table, the primary key is made up of the two columns of the table. Strictly speaking, it would even be necessary to split this second table into two tables, following the model we used to build the relationship between our sailors and their communities of birth.
The second normal form imposes on us that it is not possible that a column can depend on a part of the primary key defined on the table. The third imposes the complement on us: a column does not have the right to depend on a column that is not part of the primary key.
Let us take an example of a structure which respects the second normal form, and which nevertheless poses an immediate problem. We have a first table of sailors:
Example 59. Third normal form – Marines table
create table sailors (name, date of birth) primary key (name);
Example 60. Third normal form – Embarkations table
create embarkation table (sailor, embarkation_date, ship, tonnage) primary key (sailor, embarkation_date);
embarquementsfollows the second normal form well: no column depends on a part of the primary key. However, the column
bateau, which is not part of the primary key, actually serves as the key to the column
tonnage, since it
tonnageof course depends on the
bateau. If we change the value of
bateau, which is part of the primary key, we must also change that of
In this, it violates the third normal form: a column cannot depend on anything other than the primary key of a table.
There are still many ways to normalize data. The principle underlying all these normalizations is however always the same: uniqueness of the dependencies, uniqueness of the SQL actions to be carried out for the update of a single data.
It is also up to each developer to think about his problem, and to see which normal form is the easiest to solve his problem.