In a relational database, each table typically stores information about a single entity, such as ‘Student,’ ‘School,’ or ‘Teacher.’ The process of combining information stored in rows across multiple tables is achieved through a method called table join. The ANSI SQL standard defines the JOIN ... ON
command to execute table joins.
Create and Query Related Tables
We establish relationships between tables using the PRIMARY KEY
and REFERENCES
keywords.
CREATE TABLE user (
id SERIAL,
CONSTRAINT user_id_pk PRIMARY KEY (id);
);
CREATE TABLE user_profile (
// ... some other fields like id, first_name, last_name, birth etc.
user_id INTEGER,
CONSTRAINT fk_user_id FOREIGN KEY (user_id) REFERENCES user(id),
);
Primary key is a non null value that have to be unique for each row in the table.
user
table has a primary key id
, while user_profile
has user_id
field that is a reference to id
field on the user
table.
FOREIGN KEY
requires that it’s values already exists in the columns it references. It can reference only columns that have unique values for each row e.g. primary keys.
Now, to find user and his profile you could use the WHERE
clause like this:
// BAD PRACTICE EXAMPLE!
SELECT *
FROM user, user_profile
WHERE user.id = user_profile.user_id;
While this notation is valid and can be used, it’s more common and considered good practice to use explicit join syntax with the JOIN
keyword, as it makes the query more readable and helps in avoiding accidental cross joins. The explicit join syntax is especially important when dealing with more complex queries involving multiple tables.
The proper way of joining tables goes as follows:
SELECT *
FROM user JOIN user_profile
ON user.id = user_profile.user_id;
This query will return columns from both tables where the expression in the ON
clause evaluates to true
.
Understand JOIN types.
When you use JOIN ... ON
with an asterisk after SELECT
, the results will include all columns from both tables. This means both the primary key and foreign key will be included, and both of those columns will consist of the same duplicated data.
Join types allow you to choose how and when rows from tables are returned.
To illustrate this, I will use the example of two simple tables. Imagine you have an online shop, and in your database, you store products that are available in a certain year.
Table 1
id | product_names_2020 |
---|---|
1 | milk |
2 | bread |
3 | butter |
4 | cheese |
5 | soda |
Table 2
id | product_names_2021 |
---|---|
2 | bread |
3 | butter |
6 | ham |
JOIN and INNER JOIN
This is the most commonly used type of join. The result includes only the rows that have matching values in the specified columns. If there is no match, the row from both tables will be excluded from the result set
SELECT *
FROM product_names_2020
JOIN product_names_2021
ON product_names_2020.id = product_names_2021.id;
This will result in:
id product_names_2020 id product_names_2021
-- ------------------ -- ------------------
2 bread 2 bread
3 butter 3 butter
This is the expected outcome from a join query. Rows that don’t have the same IDs in both tables are not included in the result.
JOIN with USING
If you are using identical column names in the join ON
clause, you can use the USING
keyword to simplify the syntax and remove redundant table. When joining tables on multiple columns, you can list them separated with commas in parentheses like this: USING (col_a, col_b)
.
SELECT *
FROM product_names_2020
JOIN product_names_2021
USING (id);
id product_names_2020 product_names_2021
-- ------------------ ------------------
2 bread bread
3 butter butter
LEFT JOIN and RIGHT JOIN
This join will return all rows from one side of the ON
clause, even if they don’t have a match in the other table.
The left table is the one listed first in the FROM
clause, and the right table is the one listed after the JOIN
keyword.
SELECT *
FROM product_names_2020
LEFT JOIN product_names_2021
ON product_names_2020.id = product_names_2021.id;
id product_names_2020 id product_names_2021
-- ------------------ -- ------------------
1 milk
2 bread 2 bread
3 butter 3 butter
4 cheese
5 soda
LEFT JOIN
and RIGHT JOIN
can be used:
- when you want to get all rows from one of the tables
- you are looking for missing values
- When you know some rows in a joined table don’t have matching values.
FULL OUTER JOIN
You use this command if you want to see every row from both tables
SELECT *
FROM product_names_2020
FULL OUTER JOIN product_names_2021
ON product_names_2020.id = product_names_2021.id;
id product_names_2020 id product_names_2021
-- ------------------ -- ------------------
1 milk
2 bread 2 bread
3 butter 3 butter
4 cheese
5 soda
6 ham
CROSS JOIN
A cross join query, also known as a cartesian product
, lines up each row from the left table with each row from the right table. Because a cross join doesn’t need to find matches, you can omit the ON
clause.
SELECT *
FROM product_names_2020
FULL OUTER JOIN product_names_2021;
id product_names_2020 id product_names_2021
-- ------------------ -- ------------------
1 milk 2 bread
1 milk 3 butter
1 milk 6 ham
2 bread 2 bread
2 bread 3 butter
2 bread 6 ham
3 butter 2 bread
3 butter 3 butter
3 butter 6 ham
4 cheese 2 bread
4 cheese 3 butter
4 cheese 6 ham
5 soda 2 bread
5 soda 3 butter
5 soda 6 ham
NULL value
NULL
is a special value that indicates a lack of data. It is not to be confused with 0
, ''
, or false
, as they can have different meanings behind them. Unlike 0
, ''
, or false
, NULL
can be used across datatypes. When a JOIN
query returns empty rows, they come with a value of NULL
. This means we can search for NULL
values using a combination of ¨C49C and ¨C50C.
SELECT *
FROM product_names_2020
LEFT JOIN product_names_2021
ON product_names_2020.id = product_names_2021.id
WHERE product_names_2021.id IS NULL;
id product_names_2020 id product_names_2021
-- ------------------ -- ------------------
1 milk
4 cheese
5 soda