Section 9.5. MyISAM

9.5. MyISAM

MyISAM is the original and default MySQL storage engine, an extension of the ISAM (Indexed Sequential Access Method) model developed by IBM. MyISAM tables are comprised of three distinct files on disk. A .frm file stores the table definition, a .MYD file stores the row data, and a .MYI file stores the indexes.

The MyISAM engine is very fast for reads and very fast for writes but not at the same time. MyISAM uses table-level locking, with three different lock types. Most read operations use a READ LOCAL lock that allows other reads to the table but blocks updates other than concurrent inserts (data inserted at the end of the table). READ locks, used mainly by the MySQL tools suite, are similar but also block concurrent inserts. The final lock type, WRITE, is used whenever an insert, update, or delete is needed and blocks all other reads and writes to the table. Because each table has a single lock, any writes will block reads and (mostly) vice versa. Writes also block other writes (apart from concurrent inserts), so the maximum level of write concurrency is low. MyISAM does not support transactions.

A very useful feature of MyISAM tables is the FULLTEXT index type. FULLTEXT indexes allow you to search easily over the contents of text columns, with MySQL taking care of the tokenization and matching. FULLTEXT indexes also support Boolean queries, allowing you to search for the presence, absence, and combination of terms:

mysql> SELECT * FROM articles WHERE MATCH (title,body)
    -> AGAINST ('+foo -"bar baz"' IN BOOLEAN MODE);

The FULLTEXT index is only available for MyISAM tables. For more information about using FULLTEXT indexes, you can consult the MySQL online manual at http://www.mysql.com/.

MyISAM is the only table type to support GIS (geographical and spatial) data using R-Tree indexes. This allows you to enter spatial data (such as the latitude and longitude of records) and search using bounding boxes and other spatial tools in a fast and efficient way.

9.5.1. InnoDB

The InnoDB storage engine provides a host of features missing in the MyISAM engine. InnoDB is fully ACID compliant (atomicity, consistency, isolation, durability) with support for transactions. InnoDB writes a transaction journal and can use it to recover from crashes by replaying the journal and rolling back uncommitted transactions. This avoids the problem of recovery time with MyISAM growing in relation to the size of the dataset. Recovery time after a crash in InnoDB is not fixed, but has a ceiling based on the frequency of the check pointing, rather than the size of the dataset.

Locking in InnoDB is achieved via MVCC (Multi-Versioned Concurrency Control), which allows fast row level locking without ever blocking reads. In a nutshell, MVCC works by keeping a copy of rows that are being modified by a transaction. Read requests read from this copy of the row while the transaction is writing new copies of rows. When the transaction is committed, the new rows replace the old copies as the new read source.

MVCC gives us a huge amount of concurrency in InnoDB tables. We can continue to read from tables while they're being written to and perform writes to different parts of the table without blocking each other. The downsides to InnoDB are fairly minortables take up to around three times as much space as their MyISAM equivalents (which can be a big issue for huge datasets), and we can't use FULLTEXT indexes.

We can, however, use foreign keys and features such as ON DELETE CASCADE to manage referential integrity. InnoDB stores indexes using B-trees with clustered primary keys. Unlike MyISAM, row data in InnoDB tables is ordered on disk by the primary key. This allows for much faster reads of data in sequential primary key order because the disk heads don't spend as much time seeking.

9.5.2. BDB

MySQL can store table data using the Berkeley DB system. BDB databases consist of a list of key value pairs with extremely fast access indexed by key. MySQL simulates full tables on top of BDB with the restriction that tables must have a primary index; otherwise, MySQL will create a hidden five byte auto incremented key for you. The BDB engine stores each row as a single key value pair with the primary index as the key. Indexes are then stored as more key value pairs with a value of the primary index.

BDB supports transactions, supported by a journaled transaction log and periodic check pointing, so recovery times after a table crash are similar to InnoDB. Because of the way the data is stored, sequential table scans are relatively slow.

Under the hood, BDB uses page-level locking for writes. The size of pages varies according to a number of things, including your particular MySQL version and the size of your data, but is comprised of some number of rows. This allows for a high concurrency of reads, as with InnoDB (but not as high because some reads will still have to wait to acquire a lock), but without the overhead of having to establish so many locks when updating a large number of rows.

9.5.3. Heap

The heap (also called "memory") table type keeps all table data in memory and doesn't persist it to disk. When you restart the MySQL server, heap tables will be empty. They will, however, keep their schema as this is persisted to disk; to remove a heap table, you'll have to DROP it like you would any other table type.

Heap tables use table-level locks but are typically very fast with good concurrency anyway since writes are so fast. Heap tables have support for hash indexes in addition to B-trees, allowing extremely fast lookups for constant terms. Of course, the downside is that the dataset must be small enough to fit into memory. A heap table that gets swapped to disk by the kernel will be much slower than other table types. Heap tables are great for time-sensitive data such as real-time event queuing as long as you don't mind losing the data during a restart. Heap tables also allow you to use MySQL as a general in-memory cache. Using MySQL rather than memcached will get you worse performance and allow a smaller dataset (since MySQL will be using a lot of memory itself) but gains you a drop in general complexity (assuming you're already using MySQL elsewhere in your application) and adds SQL semantics at no extra cost.