Course 10: Selection on several tables in SQL

Oct 2, 2021
Course 12: Data Type in SQL

A database very rarely has a single table. In general, the data is spread over many tables, and extracting what you want in an intelligible format requires complex, and sometimes expensive, queries.

The notion that we present now in this introductory chapter is the notion of join. Establishing a join between two tables consists in crossing the information carried by each of these tables, according to very precise criteria. Understanding this notion is essential in SQL.



3.1. Adding place of birth



Let’s continue on our example, adding to our table, the town where our sailors were born.


Table 2. Places of birth

name First name Place of birth
Auguin Christophe Granville
Bart Jeans Dunkirk
Cartier Jacques Saint Malo
Dumont d’Urville Jules Condé
Dupleix Joseph Landrecies
Moitessier Bernard Saigon
Montcalm Louis Nimes
Surcouf Robert Saint Malo
Tabarly Eric Nantes

The naive approach would be to add a column to our table, of type varchar, and to populate this column with this new data. This approach could certainly work, but it would pose a problem from the outset. In our very simple example, out of 9 sailors, we already have 2 who were born in the same place: Saint Malo. We can assume that if we had thousands of sailors in our table, the number of duplicates in this column would be quite large.

Processing a query like “I want to extract all sailors born in Saint Malo” would work trivially, by selecting all “Saint Malo” values ​​in the “place of birth” column. It is better that the entry of the values ​​of this column happens without error, because if the value “Saint Malau” has been entered instead of “Saint Malo”, the associated sailor will never be selected by such a request! Worse than that, detecting this kind of errors on very large volumes of data can become a real problem: our sailor may be lost forever in the oceanic immensity of badly arranged databases.

Moreover, registering a municipality in a table Marinsis not a natural way to deal with such a problem. After all, a commune exists regardless of whether a sailor was born there.

A good way to deal with this problem is to recognize the independence of these municipalities, and to offer them their own table. In order to be able to reference these municipalities, we add a column to this table ID, which will bear a number, unique in this table, and which will serve as our reference key.

So let’s create the Communesfollowing table .

Example 14. Creating the Commons table

create   table Communes (
   id  int,
   name,   varchar (32)

Table 3. Places of birth

ID Municipality name
1 Condé
2 Dunkirk
3 Landrecies
4 Nantes
5 Nimes
6 Saigon
7 Saint Malo
8 Granville

Once this table has been created and filled in, we need to add a column to the table Marins.

Example 15. Adding the column id_commune

alter   table Sailors   add (common_id   int );

All we have to do is fill in the table Marinswith the following values.

Table 4. Table Marinswith places of birth

name First name Place of birth
Auguin Christophe 8
Bart Jeans 2
Cartier Jacques 7
Dumont d’Urville Jules 1
Dupleix Joseph 3
Moitessier Bernard 6
Montcalm Louis 5
Surcouf Robert 7
Tabarly Eric 4

Once our tables have been updated, we can write a query on the table Marinsand the table Communes, which displays the name of each sailor and his town of birth.

Example 16. Joined request on Marins and Communes

select Marins.nom, Marins.prenom, Communes.nom
     from Marins
     join Communes   on Marins.id_commune = Communes.id
     order   by Marins.nom;

The result is of course the following.

----------------------- --------------------------- ---------------
Auguin Christophe Granville
Bart Jean Dunkirk
Cartier Jacques St Malo
Dumont d'Urville Jules Condé
Dupleix Joseph Landrecies
Moitessier Bernard Saigon
Montcalm Louis Nîmes
Surcouf Robert Saint Malo
Tabarly Eric Nantes

Let us detail here the novelties of this request.

  • Each field is preceded by the name of the table to which it belongs. This makes it possible to remove ambiguities in the case where two tables have fields with the same name. We could have omitted it for the field Marins.prenom, since there is only one field prenomin the two tables.
  • The keyword fromis followed by the name of the main table of the request: Marins.
  • The table Communesis joined to the table Marinby the keyword on. The join is set by equality between fields idand id_communetables Communesand Marinsrespectively

Formally, a join between two tables is a Cartesian product between these two tables. The Cartesian product between a set A and a set B is the set of all the pairs formed by an element of A and an element of B. The number of rows of the joined table is therefore the product of the number of rows initial tables.

The selection of rows in the joined table is then done in a conventional way, as if it were a normal table.

Of course, this joined table is just one way of reasoning. In no case does the server build it: if it were, it would have to manage tables of enormous size, even for small queries.

Note the old SQL syntax for writing this join. We must clearly prefer the new (by the way not that new …) syntax above. It will allow us to handle null values ​​explicitly, which is not the case with this old syntax.

Example 17. Joined request on Marins and Communes – 2

select Sailors.name, Sailors.prenom, Communes.lastname
     from Sailors, Communes 
     where Sailors.id_commune = Communes.id 
     order   by Sailors.name;

3.2. Primary keys and foreign keys

The way we have dealt with the problem of the association between a sailor and his birth municipality is absolutely correct, but it needs to be secured. Indeed, two problems can arise:

  • what happens if the column IDis not entered for a given municipality?
  • what happens if several municipalities have the same ID?
  • what happens if you create a sailor with an incorrect value of id_commune?
  • what happens if we delete a municipality referenced by a cell id_commune?

To each of these questions an SQL answer will provide.

Declaring the IDtable column Communesas a primary key answers the first two queries. This notion is central in SQL, and we will review it in detail. Let us simply indicate here that a column declared as being a primary key of a table cannot take null values, and cannot carry duplicates.

Note however, that it is possible in SQL to prevent the presence of duplicates in a column, as well as the presence of null values, without using the primary key mechanism.

The last two questions are a little more subtle: we want to establish a double constraint between a column of a table, and a column of another table. Indeed, when inserting a new sailor, we want id_commune, if it is not zero, necessarily reference a value from the column IDof the table Communes, and that conversely, it is not possible to delete a municipality if it is referenced by a sailor.

The tool that SQL provides us to do this and the foreign key . A column can be declared to be a foreign key. In this case, the values ​​it carries must match those of another column, usually another table. If, during creation, the inserted value does not match any value, then the insertion fails. On the other side of the relation, if one tries to erase the referenced value, an error is generated, which can be managed by application or not.

Leave a Reply

Your email address will not be published. Required fields are marked *