Previous Page
Next Page

Understanding Joins

One of SQL's most powerful features is the capability to join tables on-the-fly within data retrieval queries. Joins are one of the most important operations you can perform using SQL SELECT, and a good understanding of joins and join syntax is an extremely important part of learning SQL.

Before you can effectively use joins, you must understand relational tables and the basics of relational database design. What follows is by no means a complete coverage of the subject, but it should be enough to get you up and running.

Understanding Relational Tables

The best way to understand relational tables is to look at a realworld example.

Suppose you had a database table containing a product catalog, with each catalog item in its own row. The kind of information you would store with each item would include a product description and price, along with vendor information about the company that creates the product.

Now suppose you had multiple catalog items created by the same vendor. Where would you store the vendor information (things such as vendor name, address, and contact information)? You wouldn't want to store that data along with the products for several reasons:

  • Because the vendor information is the same for each product that vendor produces, repeating the information for each product is a waste of time and storage space.

  • If vendor information changes (for example, if the vendor moves or his area code changes), you would need to update every occurrence of the vendor information.

  • When data is repeated (that is, the vendor information is used with each product), there is a high likelihood that the data will not be entered exactly the same way each time. Inconsistent data is extremely difficult to use in reporting.

The key here is that having multiple occurrences of the same data is never a good thing, and that principle is the basis for relational database design. Relational tables are designed so information is split into multiple tables, one for each data type. The tables are related to each other through common values (and thus the relational in relational design).

In our example, you can create two tables, one for vendor information and one for product information. The vendors table contains all the vendor information, one table row per vendor, along with a unique identifier for each vendor. This value, called a primary key, can be a vendor ID, or any other unique value. (Primary keys were first mentioned in Chapter 1, "Understanding SQL").

The products table stores only product information, and no vendor specific information other than the vendor ID (the vendors table's primary key). This key, called a foreign key, relates the vendors table to the products table, and using this vendor ID enables you to use the vendors table to find the details about the appropriate vendor.

New Term

Foreign Key A column in one table that contains the primary key values from another table, thus defining the relationships between tables.


What does this do for you? Well, consider the following:

  • Vendor information is never repeated, and so time and space are not wasted.

  • If vendor information changes, you can update a single record in the vendors table. Data in related tables does not change.

  • As no data is repeated, the data used is obviously consistent, making data reporting and manipulation much simpler.

The bottom line is that relational data can be stored efficiently and manipulated easily. Because of this, relational databases scale far better than non-relational databases.

New Term

Scale Able to handle an increasing load without failing. A well-designed database or application is said to scale well.


Why Use Joins?

As just explained, breaking data into multiple tables enables more efficient storage, easier manipulation, and greater scalability. But these benefits come with a price.

If data is stored in multiple tables, how can you retrieve that data with a single SELECT statement?

The answer is to use a join. Simply put, a join is a mechanism used to associate tables within a SELECT statement (and thus the name join). Using a special syntax, multiple tables can be joined so a single set of output is returned, and the join associates the correct rows in each table on-the-fly.

Note

Maintaining Referential Integrity It is important to understand that a join is not a physical entityin other words, it does not exist in the actual database tables. A join is created by MySQL as needed, and it persists for the duration of the query execution.

When using relational tables, it is important that only valid data is inserted into relational columns. Going back to the example, if products were stored in the products table with an invalid vendor ID (one not present in the vendors table), those products would be inaccessible because they would not be related to any vendor.

To prevent this from occurring, MySQL can be instructed to only allow valid values (ones present in the vendors table) in the vendor ID column in the products table. This is known as maintaining referential integrity, and is achieved by specifying the primary and foreign keys as part of the table definitions (as will be explained in Chapter 21, "Creating and Manipulating Tables").



Previous Page
Next Page