13.4 Optimizing the Logical Database Structure

This section describes techniques for organizing data that can help you achieve better query performance.

13.4.1 Choosing Appropriate Table Types

When creating a table, ask yourself what types of queries you'll use it for. Then choose a table type that uses a locking level appropriate for the anticipated query mix. MyISAM table-level locking works best for a query mix that is heavily skewed toward retrievals and includes few updates. Use InnoDB if you must process a query mix containing many updates. InnoDB's use of row-level locking and multi-versioning provides good concurrency for a mix of retrievals and updates. One query can update rows while other queries read or update different rows of the table.

If you're using MyISAM tables, choose their structure to reflect whether you consider efficiency of processing speed or disk usage to be more important. Different MyISAM storage formats have different performance characteristics. This influences whether you choose fixed-length or variable-length columns to store string data:

Use fixed-length columns (CHAR) for best speed. Fixed-length columns allow MySQL to create the table with fixed-length rows. The advantage is that fixed-length rows all are stored in the table at positions that are a multiple of the row length and can be looked up very quickly. The disadvantage is that fixed-length values are always the same length even for values that do not use the full width of the column, so the column takes more storage space.
Use variable-length columns (VARCHAR, TEXT, BLOB) for best use of disk space. For example, values in a VARCHAR column take only as much space as necessary to store each value and on average use less storage than a CHAR column. The disadvantage is that variable-length columns result in variable-length rows. These are not stored at fixed positions within the table, so they cannot be retrieved as quickly as fixed-length rows. In addition, the contents of variable-length rows might not even be stored all in one place, another source of processing overhead.

For InnoDB tables, it is also true that CHAR columns take more space on average than VARCHAR. But there is no retrieval speed advantage for InnoDB as there is with MyISAM, because the InnoDB engine implements storage for both CHAR and VARCHAR in a similar way. In fact, retrieval of CHAR values might be slower because on average they require more information to be read from disk.

If a MyISAM table contains a mix of fixed-length and variable-length columns, the table format will be dynamic. However, if many of the queries on the table access only its fixed-length columns, it is sometimes possible to gain advantages both of static tables (faster retrieval) and of dynamic tables (lower storage requirements) by splitting the table into two tables. Use a fixed-format table to hold the fixed-length columns and a dynamic-format table to hold the variable-length columns. To split the table into two, use this procedure:

Make sure that the table contains a primary key that allows each record to be uniquely identified. (You might use an AUTO_INCREMENT column, for example.)
Create a second table that has columns for all the variable-length columns in the original table, plus a column to store values from the primary key of the original table. (This column should be a primary key as well, but should not be an AUTO_INCREMENT column.)
Copy the primary key column and the variable-length columns from the original table to the second table. The second table will be dynamic.
Use ALTER TABLE to drop the variable-length columns (but not the primary key) from the original table. MySQL will notice that the table no longer contains any variable-length columns and convert it to static format.

After modifying the table structure this way, queries that retrieve only fixed-width columns can use the static table, and will be quicker. For queries that retrieve both fixed-width and variable-width columns, join the two tables using the primary key values to match up rows.

Another option with MyISAM tables is to use compressed read-only tables.

For more information about MyISAM table structure, see section 14.2.1, "MyISAM Storage Formats."

MERGE tables can use a mix of compressed and uncompressed tables. This can be useful for time-based records. For example, if you log records each year to a different log file, you can use an uncompressed log table for the current year so that you can update it, but compress the tables for past years to save space. If you then create a MERGE table from the collection, you can easily run queries that search all tables together.

13.4.2 Using Summary Tables

Suppose that you run an analysis consisting of a set of retrievals that each perform a complex SELECT of a set of records (perhaps using an expensive join), and that differ only in the way they summarize the records. That's inefficient because it unnecessarily does the work of selecting the records repeatedly. A better technique is to select the records once, and then use them to generate the summaries. In such a situation, consider the following strategy:

Select the set of to-be-summarized records into a temporary table. In MySQL, you can do this easily with a CREATE TEMPORARY TABLE … SELECT statement.
Create any appropriate indexes on the temporary table.
Calculate the summaries using the temporary table.

The following example creates a summary table containing the average GNP value of countries in each continent. Then it compares the summary information to individual countries to find those countries with a GNP much less than the average and much more than the average.

First, create the summary table:


mysql> CREATE TABLE ContinentGNP

    -> SELECT Continent, AVG(GNP) AS AvgGNP

    -> FROM Country GROUP BY Continent;

mysql> SELECT * FROM ContinentGNP;

+---------------+---------------+

| Continent     | AvgGNP        |

+---------------+---------------+

| Asia          | 150105.725490 |

| Europe        | 206497.065217 |

| North America | 261854.789189 |

| Africa        |  10006.465517 |

| Oceania       |  14991.953571 |

| Antarctica    |      0.000000 |

| South America | 107991.000000 |

+---------------+---------------+

Next, compare the summary table to the original table to find countries that have a GNP less than 1% of the continental average:


mysql> SELECT

    ->     Country.Continent, Country.Name,

    ->     Country.GNP AS CountryGNP,

    ->     ContinentGNP.AvgGNP AS ContinentAvgGNP

    -> FROM Country, ContinentGNP

    -> WHERE

    ->     Country.Continent = ContinentGNP.Continent

    ->     AND Country.GNP < ContinentGNP.AvgGNP * .01

    -> ORDER BY Country.Continent, Country.Name;

+-----------+-------------------------------+------------+-----------------+

| Continent | Name                          | CountryGNP | ContinentAvgGNP |

+-----------+-------------------------------+------------+-----------------+

| Asia      | Bhutan                        |     372.00 |   150105.725490 |

| Asia      | East Timor                    |       0.00 |   150105.725490 |

| Asia      | Laos                          |    1292.00 |   150105.725490 |

| Asia      | Maldives                      |     199.00 |   150105.725490 |

| Asia      | Mongolia                      |    1043.00 |   150105.725490 |

| Europe    | Andorra                       |    1630.00 |   206497.065217 |

| Europe    | Faroe Islands                 |       0.00 |   206497.065217 |

| Europe    | Gibraltar                     |     258.00 |   206497.065217 |

| Europe    | Holy See (Vatican City State) |       9.00 |   206497.065217 |

| Europe    | Liechtenstein                 |    1119.00 |   206497.065217 |

...

Use the summary table again to find countries that have a GNP more than 10 times the continental average:


mysql> SELECT

    ->     Country.Continent, Country.Name,

    ->     Country.GNP AS CountryGNP,

    ->     ContinentGNP.AvgGNP AS ContinentAvgGNP

    -> FROM Country, ContinentGNP

    -> WHERE

    ->     Country.Continent = ContinentGNP.Continent

    ->     AND Country.GNP > ContinentGNP.AvgGNP * 10

    -> ORDER BY Country.Continent, Country.Name;

+---------------+---------------+------------+-----------------+

| Continent     | Name          | CountryGNP | ContinentAvgGNP |

+---------------+---------------+------------+-----------------+

| Asia          | Japan         | 3787042.00 |   150105.725490 |

| Europe        | Germany       | 2133367.00 |   206497.065217 |

| North America | United States | 8510700.00 |   261854.789189 |

| Africa        | South Africa  |  116729.00 |    10006.465517 |

| Oceania       | Australia     |  351182.00 |    14991.953571 |

+---------------+---------------+------------+-----------------+

The technique of using a summary table has several benefits:

Calculating the summary information a single time reduces the overall computational burden by eliminating most of the repetition involved in performing the initial record selection.
If the original table is a type that is subject to table-level locking, such as a MyISAM table, using a summary table leaves the original table available more of the time for updates by other clients by reducing the amount of time that the table remains locked.
If the summary table is small enough that it's reasonable to hold in memory, you can increase performance even more by making it a HEAP table. Queries on the table will be especially fast because they require no disk I/O. When the HEAP table no longer is needed, drop it to free the memory allocated for it.
Some queries are difficult or impossible to perform without using a summary table. For example, you cannot compute a summary from a set of rows and compare each row to the summarized value within a single query. However, you can use a summary table and join it to the original table to do this.

Use of summary tables has the disadvantage that the records they contain are up-to-date only as long as the original values remain unchanged, and thus so are any summaries calculated from them. If the original table rarely or never changes, this might be only a minor concern. For many applications, summaries that are close approximations are sufficiently accurate.

The summary table technique can be applied at multiple levels. Create a summary table that holds the results of an initial summary, and then summarize that table in different ways to produce secondary summaries. This avoids the computational expense of generating the initial summary repeatedly.

When a summary consists of a single value, you need not create a table at all. Use a SQL variable to hold the value. Then you can use the value for comparison purposes in subsequent queries without having to calculate it again.

< Day Day Up >