Foreign Keys and Referential Integrity
A foreign key relationship allows you to declare that an index in one table is related to an index in another. It also allows you to place constraints on what may be done to the tables in the relationship. The database enforces the rules of this relationship to maintain referential integrity. For example, the score table in the sampdb sample database contains a student_id column, which we use to relate score records to students in the student table. When we created these tables in Chapter 1, we set up some explicit relationships between them. One of these was that we declared score.student_id to be a foreign key for the student.student_id column. That prevents a record from being entered into the score table unless its student_id value exists in the student table. In other words, the foreign key prevents entry of scores for non-existent students.
Foreign keys are not useful just for record entry, but for deletes and updates as well. For example, we could set up a constraint such that if a student is deleted from the student table, all corresponding records for the student in the score table are deleted automatically as well. This is called "cascaded delete," because the effect of the delete cascades from one table to another. Cascaded update is possible as well. For example, with cascaded update, changing a student's student_id value in the student table also changes the value in the student's corresponding score table records.
Foreign keys help maintain the consistency of your data, and they provide a certain measure of convenience. Without foreign keys, you are responsible for keeping track of inter-table dependencies and maintaining their consistency from within your applications. In some cases, doing this might not be much more work than issuing a few extra DELETE statements to make sure that when you delete a record from one table, you also delete the corresponding records in any related tables. But it is extra work, and if the database engine will perform consistency checks for you, why not let it? Automatic checking capability becomes especially useful if your tables have particularly complex relationships. You likely will not want to be responsible for implementing these dependencies in your applications.
In MySQL, the InnoDB storage engine provides foreign key support. This section describes how to set up InnoDB tables to define foreign keys, and how foreign keys affect the way you use tables. First, it's necessary to define some terms:
InnoDB enforces these rules to guarantee that the foreign key relationship stays intact with no mismatches. This is called "referential integrity."
The syntax for defining a foreign key in a child table is as follows, with optional parts shown in square brackets:
[CONSTRAINT constraint_name] FOREIGN KEY [index_name] (index_columns) REFERENCES tbl_name (index_columns) [ON DELETE action] [ON UPDATE action] [MATCH FULL | MATCH PARTIAL | MATCH SIMPLE]
Note that although all parts of this syntax are parsed, InnoDB does not implement the semantics for all the clauses: The MATCH clause is not supported and is ignored if you specify it. Also, some action values are recognized but have no effect. InnoDB pays attention to the following parts of the definition:
To set up a foreign key relationship, follow these guidelines:
In Chapter 1, we created tables for the grade-keeping project that have simple foreign key relationships. Now let's work through an example that is more complex. Begin by creating tables named parent and child, such that the child table contains a foreign key that references the par_id column in the parent table:
CREATE TABLE parent ( par_id INT NOT NULL, PRIMARY KEY (par_id) ) ENGINE = INNODB; CREATE TABLE child ( par_id INT NOT NULL, child_id INT NOT NULL, PRIMARY KEY (par_id, child_id), FOREIGN KEY (par_id) REFERENCES parent (par_id) ON DELETE CASCADE ON UPDATE CASCADE ) ENGINE = INNODB;
The foreign key in this case uses ON DELETE CASCADE to specify that when a record is deleted from the parent table, MySQL also should remove child records with a matching par_id value automatically. ON UPDATE CASCADE indicates that if a parent record par_id value is changed, MySQL also should change any matching par_id values in the child table to the new value.
Now insert a few records into the parent table, and then add some records to the child table that have related key values:
mysql> INSERT INTO parent (par_id) VALUES(1),(2),(3); mysql> INSERT INTO child (par_id,child_id) VALUES(1,1),(1,2); mysql> INSERT INTO child (par_id,child_id) VALUES(2,1),(2,2),(2,3); mysql> INSERT INTO child (par_id,child_id) VALUES(3,1);
These statements result in the following table contents, where each par_id value in the child table matches a par_id value in the parent table:
mysql> SELECT * FROM parent; +--------+ | par_id | +--------+ | 1 | | 2 | | 3 | +--------+ mysql> SELECT * FROM child; +--------+----------+ | par_id | child_id | +--------+----------+ | 1 | 1 | | 1 | 2 | | 2 | 1 | | 2 | 2 | | 2 | 3 | | 3 | 1 | +--------+----------+
mysql> INSERT INTO child (par_id,child_id) VALUES(4,1); ERROR 1216 (23000): Cannot add or update a child row: a foreign key constraint fails
mysql> DELETE FROM parent WHERE par_id = 1;
MySQL deletes the record from the parent table:
mysql> SELECT * FROM parent; +--------+ | par_id | +--------+ | 2 | | 3 | +--------+
In addition, it cascades the effect of the DELETE statement to the child table:
mysql> SELECT * FROM child; +--------+----------+ | par_id | child_id | +--------+----------+ | 2 | 1 | | 2 | 2 | | 2 | 3 | | 3 | 1 | +--------+----------+
mysql> UPDATE parent SET par_id = 100 WHERE par_id =2; mysql> SELECT * FROM parent; +--------+ | par_id | +--------+ | 3 | | 100 | +--------+ mysql> SELECT * FROM child; +--------+----------+ | par_id | child_id | +--------+----------+ | 3 | 1 | | 100 | 1 | | 100 | 2 | | 100 | 3 | +--------+----------+
The preceding example shows how to arrange for deletes or updates of a parent record to cause cascaded deletes or updates of any corresponding child records. The ON DELETE and ON UPDATE clauses allow for other actions. For example, one possibility is to let the child records remain in the table but have their foreign key columns set to NULL. To do this, it's necessary to make several changes to the definition of the child table:
To see the effect of these changes, re-create the parent table using the original definition and load the same initial records into it. Then create the child table using the new definition shown here:
CREATE TABLE child ( par_id INT NULL, child_id INT NOT NULL, UNIQUE (par_id, child_id), FOREIGN KEY (par_id) REFERENCES parent (par_id) ON DELETE SET NULL ON UPDATE SET NULL ) ENGINE = INNODB;
With respect to inserting new records, the child table behaves the same. That is, it allows insertion of records with par_id values found in the parent table, but disallows entry of values that aren't listed there:
mysql> INSERT INTO child (par_id,child_id) VALUES(1,1),(1,2); mysql> INSERT INTO child (par_id,child_id) VALUES(2,1),(2,2),(2,3); mysql> INSERT INTO child (par_id,child_id) VALUES(3,1); mysql> INSERT INTO child (par_id,child_id) VALUES(4,1); ERROR 1216 (23000): Cannot add or update a child row: a foreign key constraint fails
mysql> DELETE FROM parent WHERE par_id = 1; mysql> SELECT * FROM child; +--------+----------+ | par_id | child_id | +--------+----------+ | NULL | 1 | | NULL | 2 | | 2 | 1 | | 2 | 2 | | 2 | 3 | | 3 | 1 | +--------+----------+
In this case, the child records that had 1 in the par_id column are not deleted. Instead, the par_id column is set to NULL, as specified by the ON DELETE SET NULL constraint.
Updating a parent record has a similar effect:
mysql> UPDATE parent SET par_id = 100 WHERE par_id = 2; mysql> SELECT * FROM child; +--------+----------+ | par_id | child_id | +--------+----------+ | NULL | 1 | | NULL | 1 | | NULL | 2 | | NULL | 2 | | NULL | 3 | | 3 | 1 | +--------+----------+
To see what foreign key relationships an InnoDB table has, use the SHOW CREATE TABLE or SHOW TABLE STATUS statement.
Living Without Foreign Keys
If your MySQL server doesn't have InnoDB support, or you are using another storage engine because you need features that InnoDB does not support (such as FULLTEXT indexes or spatial data types), you cannot take advantage of foreign keys. What should you do to maintain the integrity of relationships between your tables?
The constraints that foreign keys enforce often are not difficult to implement through application logic. Sometimes, it's simply a matter of how you approach the data entry process. Consider the student and score tables from the grade-keeping project. These are related by a foreign key relationship through the student_id values in each table. Suppose that we had created these as MyISAM tables rather than as InnoDB tables. MyISAM does not support foreign keys, so in this case the relationship between the tables would be implicit rather than explicit. When you administer a test or quiz and have a new set of scores to add to the database, you'd have to make sure that you don't add score records with student_id values that are not listed in the student table.
In some respects, this is simply a matter of taking the proper approach to data entry. To avoid inserting scores for non-existent students, the way you'd enter a set of scores probably would be to use an application that begins with a list of students from the student table. For each one, it would take the score and use the student's ID number to generate a new score table record. With this procedure, you would never enter a record for a student that doesn't exist. Nevertheless, it would still be possible to enter a bad record, for example, if you issued an INSERT statement manually. (With InnoDB tables and foreign keys, no such possibility exists.)
What about the case where you delete a student record? Suppose that you want to delete student number 13. This also implies you want to delete any score records for that student. With a foreign key relationship in place that specifies cascading delete, you'd simply delete the student table record with the following statement and let MySQL take care of removing the corresponding score table records automatically:
DELETE FROM student WHERE student_id = 13;
DELETE FROM student WHERE student_id = 13; DELETE FROM score WHERE student_id = 13;
Another way to do this is to use a multiple-table delete that achieves the same effect as a cascaded delete with a single query. But watch out for a subtle trap. The following statement appears to do the trick, but it's actually not quite correct:
DELETE student, score FROM student, score WHERE student.student_id = 13 AND student.student_id = score.student_id;
The problem with this statement is that it will fail in the case where the student doesn't have any scores. The WHERE clause will find no matches and thus will not delete anything from the student table. In this case, a LEFT JOIN is more appropriate, because it will identify which student table record to delete even in the absence of any matching score table records:
DELETE student, score FROM student LEFT JOIN score USING (student_id) WHERE student.student_id = 13;