In a relational database, each table typically stores information about a single entity, such as ‘Student,’ ‘School,’ or ‘Teacher.’ The process of combining information stored in rows across multiple tables is achieved through a method called table join. The ANSI SQL standard defines the JOIN ... ON command to execute table joins.

Create and Query Related Tables

We establish relationships between tables using the PRIMARY KEY and REFERENCES keywords.

CREATE TABLE user (
    id SERIAL,
    CONSTRAINT user_id_pk PRIMARY KEY (id);
);

CREATE TABLE user_profile (
    // ... some other fields like id, first_name, last_name, birth etc.
  user_id INTEGER,
  CONSTRAINT fk_user_id FOREIGN KEY (user_id) REFERENCES user(id),
);

Primary key is a non null value that have to be unique for each row in the table.

user table has a primary key id, while user_profile has user_id field that is a reference to id field on the user table.

FOREIGN KEY requires that it’s values already exists in the columns it references. It can reference only columns that have unique values for each row e.g. primary keys.

Now, to find user and his profile you could use the WHERE clause like this:

// BAD PRACTICE EXAMPLE!
SELECT *
FROM user, user_profile
WHERE user.id = user_profile.user_id;

While this notation is valid and can be used, it’s more common and considered good practice to use explicit join syntax with the JOIN keyword, as it makes the query more readable and helps in avoiding accidental cross joins. The explicit join syntax is especially important when dealing with more complex queries involving multiple tables.

The proper way of joining tables goes as follows:

SELECT *
FROM user JOIN user_profile
ON user.id = user_profile.user_id;

This query will return columns from both tables where the expression in the ON clause evaluates to true.

Understand JOIN types.

When you use JOIN ... ON with an asterisk after SELECT, the results will include all columns from both tables. This means both the primary key and foreign key will be included, and both of those columns will consist of the same duplicated data.

Join types allow you to choose how and when rows from tables are returned.

To illustrate this, I will use the example of two simple tables. Imagine you have an online shop, and in your database, you store products that are available in a certain year.

Table 1

idproduct_names_2020
1milk
2bread
3butter
4cheese
5soda

Table 2

idproduct_names_2021
2bread
3butter
6ham

JOIN and INNER JOIN

This is the most commonly used type of join. The result includes only the rows that have matching values in the specified columns. If there is no match, the row from both tables will be excluded from the result set

SELECT * 
FROM product_names_2020 
JOIN product_names_2021 
ON product_names_2020.id = product_names_2021.id;

This will result in:

id product_names_2020 id product_names_2021
-- ------------------ -- ------------------
 2              bread  2              bread
 3             butter  3             butter

This is the expected outcome from a join query. Rows that don’t have the same IDs in both tables are not included in the result.

JOIN with USING

If you are using identical column names in the join ON clause, you can use the USING keyword to simplify the syntax and remove redundant table. When joining tables on multiple columns, you can list them separated with commas in parentheses like this: USING (col_a, col_b).

SELECT * 
FROM product_names_2020 
JOIN product_names_2021 
USING (id);
id product_names_2020 product_names_2021
-- ------------------ ------------------
 2              bread              bread
 3             butter             butter

LEFT JOIN and RIGHT JOIN

This join will return all rows from one side of the ON clause, even if they don’t have a match in the other table.

The left table is the one listed first in the FROM clause, and the right table is the one listed after the JOIN keyword.

SELECT * 
FROM product_names_2020 
LEFT JOIN product_names_2021
ON product_names_2020.id = product_names_2021.id;
id product_names_2020 id product_names_2021
-- ------------------ -- ------------------
 1               milk                 
 2              bread  2              bread
 3             butter  3             butter
 4             cheese               
 5               soda  

LEFT JOIN and RIGHT JOIN can be used:

  • when you want to get all rows from one of the tables
  • you are looking for missing values
  • When you know some rows in a joined table don’t have matching values.

FULL OUTER JOIN

You use this command if you want to see every row from both tables

SELECT * 
FROM product_names_2020 
FULL OUTER JOIN product_names_2021
ON product_names_2020.id = product_names_2021.id;
id product_names_2020 id product_names_2021
-- ------------------ -- ------------------
 1               milk                 
 2              bread  2              bread
 3             butter  3             butter
 4             cheese                     
 5               soda                     
                       6                ham  

CROSS JOIN

A cross join query, also known as a cartesian product, lines up each row from the left table with each row from the right table. Because a cross join doesn’t need to find matches, you can omit the ON clause.

SELECT * 
FROM product_names_2020 
FULL OUTER JOIN product_names_2021;
id product_names_2020 id product_names_2021
-- ------------------ -- ------------------
 1               milk  2              bread
 1               milk  3             butter
 1               milk  6                ham          
 2              bread  2              bread
 2              bread  3             butter
 2              bread  6                ham 
 3             butter  2              bread
 3             butter  3             butter
 3             butter  6                ham
 4             cheese  2              bread
 4             cheese  3             butter  
 4             cheese  6                ham  
 5               soda  2              bread 
 5               soda  3             butter     
 5               soda  6                ham                        

NULL value

NULL is a special value that indicates a lack of data. It is not to be confused with 0, '', or false, as they can have different meanings behind them. Unlike 0, '', or false, NULL can be used across datatypes. When a JOIN query returns empty rows, they come with a value of NULL. This means we can search for NULL values using a combination of ¨C49C and ¨C50C.

SELECT *
FROM product_names_2020
LEFT JOIN product_names_2021
ON product_names_2020.id = product_names_2021.id
WHERE product_names_2021.id IS NULL;
id product_names_2020 id product_names_2021
-- ------------------ -- ------------------
 1               milk                 
 4             cheese               
 5               soda