A database very rarely has a single table. In general, the data is spread over many tables, and extracting what you want in an intelligible format requires complex, and sometimes expensive, queries.
The notion that we present now in this introductory chapter is the notion of join. Establishing a join between two tables consists in crossing the information carried by each of these tables, according to very precise criteria. Understanding this notion is essential in SQL.
Let’s continue on our example, adding to our table, the town where our sailors were born.
Table 2. Places of birth
|name||First name||Place of birth|
The naive approach would be to add a column to our table, of type
varchar, and to populate this column with this new data. This approach could certainly work, but it would pose a problem from the outset. In our very simple example, out of 9 sailors, we already have 2 who were born in the same place: Saint Malo. We can assume that if we had thousands of sailors in our table, the number of duplicates in this column would be quite large.
Processing a query like “I want to extract all sailors born in Saint Malo” would work trivially, by selecting all “Saint Malo” values in the “place of birth” column. It is better that the entry of the values of this column happens without error, because if the value “Saint Malau” has been entered instead of “Saint Malo”, the associated sailor will never be selected by such a request! Worse than that, detecting this kind of errors on very large volumes of data can become a real problem: our sailor may be lost forever in the oceanic immensity of badly arranged databases.
Moreover, registering a municipality in a table
Marinsis not a natural way to deal with such a problem. After all, a commune exists regardless of whether a sailor was born there.
A good way to deal with this problem is to recognize the independence of these municipalities, and to offer them their own table. In order to be able to reference these municipalities, we add a column to this table
ID, which will bear a number, unique in this table, and which will serve as our reference key.
So let’s create the
Communesfollowing table .
Table 3. Places of birth
Table 4. Table
Marinswith places of birth
|name||First name||Place of birth|
Example 16. Joined request on Marins and Communes
select Marins.nom, Marins.prenom, Communes.nom from Marins join Communes on Marins.id_commune = Communes.id order by Marins.nom;
LAST NAME SURNAME LAST NAME ----------------------- --------------------------- --------------- Auguin Christophe Granville Bart Jean Dunkirk Cartier Jacques St Malo Dumont d'Urville Jules Condé Dupleix Joseph Landrecies Moitessier Bernard Saigon Montcalm Louis Nîmes Surcouf Robert Saint Malo Tabarly Eric Nantes
Let us detail here the novelties of this request.
- Each field is preceded by the name of the table to which it belongs. This makes it possible to remove ambiguities in the case where two tables have fields with the same name. We could have omitted it for the field
Marins.prenom, since there is only one field
prenomin the two tables.
- The keyword
fromis followed by the name of the main table of the request:
- The table
Communesis joined to the table
Marinby the keyword
on. The join is set by equality between fields
Formally, a join between two tables is a Cartesian product between these two tables. The Cartesian product between a set A and a set B is the set of all the pairs formed by an element of A and an element of B. The number of rows of the joined table is therefore the product of the number of rows initial tables.
The selection of rows in the joined table is then done in a conventional way, as if it were a normal table.
Of course, this joined table is just one way of reasoning. In no case does the server build it: if it were, it would have to manage tables of enormous size, even for small queries.
Note the old SQL syntax for writing this join. We must clearly prefer the new (by the way not that new …) syntax above. It will allow us to handle null values explicitly, which is not the case with this old syntax.
Example 17. Joined request on Marins and Communes – 2
select Sailors.name, Sailors.prenom, Communes.lastname from Sailors, Communes where Sailors.id_commune = Communes.id order by Sailors.name;
The way we have dealt with the problem of the association between a sailor and his birth municipality is absolutely correct, but it needs to be secured. Indeed, two problems can arise:
- what happens if the column
IDis not entered for a given municipality?
- what happens if several municipalities have the same
- what happens if you create a sailor with an incorrect value of
- what happens if we delete a municipality referenced by a cell
To each of these questions an SQL answer will provide.
Communesas a primary key answers the first two queries. This notion is central in SQL, and we will review it in detail. Let us simply indicate here that a column declared as being a primary key of a table cannot take null values, and cannot carry duplicates.
Note however, that it is possible in SQL to prevent the presence of duplicates in a column, as well as the presence of null values, without using the primary key mechanism.
The last two questions are a little more subtle: we want to establish a double constraint between a column of a table, and a column of another table. Indeed, when inserting a new sailor, we want
id_commune, if it is not zero, necessarily reference a value from the column
IDof the table
Communes, and that conversely, it is not possible to delete a municipality if it is referenced by a sailor.
The tool that SQL provides us to do this and the foreign key . A column can be declared to be a foreign key. In this case, the values it carries must match those of another column, usually another table. If, during creation, the inserted value does not match any value, then the insertion fails. On the other side of the relation, if one tries to erase the referenced value, an error is generated, which can be managed by application or not.