|< Day Day Up >|
You can't examine the more advanced features of a database system for very long before transactions enter the mix. A transaction is a group of SQL queries that are treated atomically, as a single unit of work. Either the entire group of queries is applied to a database, or none of them are. Little of this section is specific to MySQL. If you're already familiar with ACID transactions, feel free to skip ahead to the section "Transactions in MySQL."
A banking application is the classic example of why transactions are necessary. Imagine a bank's database with a two tables: checking and savings. To move $200 from Jane's checking account to her savings account, you need to perform at least three steps:
The entire operation should be wrapped in a transaction so that if any one of the steps fails, they can all be rolled back.
BEGIN; [step 1] SELECT balance FROM checking WHERE customer_id = 10233276; [step 2] UPDATE checking SET balance = balance - 200.00 WHERE customer_id = 10233276; [step 3] UPDATE savings SET balance = balance + 200.00 WHERE customer_id = 10233276; COMMIT;
But transactions alone aren't the whole story. What happens if the database server crashes while performing step 3? Who knows? The customer probably just lost $200. What if another process comes along between Steps 2 and 3 and removes the entire checking account balance? The bank has given the customer a $200 credit without even knowing it.
Simply having transactions isn't sufficient unless the database server passes what is known as the ACID test. ACID is an acronym for Atomicity, Consistency, Isolation, and Durability—four tightly related criteria that are required in a well-behaved transaction processing system. Transactions that meet those four criteria are often referred to as ACID transactions.
2.3.1 Benefits and Drawbacks
ACID transactions ensure that banks don't lose your money. By wrapping arbitrarily complex logic into single units of work, the database server takes some of the burden off application developers. The database server's ACID properties offer guarantees that reduce the need for code guarding against race conditions and handling crash recovery.
The downside of this extra security is that the database server has to do more work. It also means that a database server with ACID transactions will generally require more CPU power, memory, and disk space than one without them. As mentioned earlier, this is where MySQL's modularity comes into play. Because you can decide on a per-table basis if you need ACID transactions or not, you don't need to pay the performance penalty on a table that really won't benefit from transactions.
2.3.2 Isolation Levels
The previous description of isolation was a bit simplistic. Isolation is more complex than it might first appear because of some peculiar cases that can occur. The SQL standard defines four isolation levels with specific rules for which changes are and aren't visible inside and outside a transaction. Let's look at each isolation level and the type of problems that can occur.
220.127.116.11 Read uncommitted
In the read uncommitted isolation level, transactions can view the results of uncommitted transactions. At this level, many problems can occur unless you really, really know what you are doing and have a good reason for doing it. Read uncommitted is rarely used in practice. Reading uncommitted data is also known as a dirty read.
18.104.22.168 Read committed
The default isolation level for most database systems is read committed. It satisfies the simple definition of isolation used earlier. A transaction will see the results only of transactions that were already committed when it began, and its changes won't be visible to others until it's committed.
Imagine you have a Perl script that runs nightly to fetch price data about your favorite stocks. For each stock, it fetches the data and adds a record to the StockPrice table with the day's numbers. So to update the information for Amazon.com, the transaction might look like this:
BEGIN; SELECT @id := id FROM Stock WHERE ticker = 'AMZN'; INSERT INTO StockPrice VALUES (@id, '2002-05-03', 20.50, 21.10, 20.08, 21.02); COMMIT;
But what if, between the select and insert, Amazon's id changes from 4 to 17 and a new stock is added with id 4? Or what if Amazon is removed entirely? You'll end up inserting a record with the wrong id in the first case. And in the second case, you've inserted a record for which there is no longer a corresponding row in the Stock table. Neither of these is what you intended.
The problem is that you have a nonrepeatable read in the query. That is, the data you read in the SELECT becomes invalid by the time you execute the INSERT. The repeatable read isolation level exists to solve this problem.
22.214.171.124 Repeatable read
At the repeatable read isolation level, any rows that are read during a transaction are locked so that they can't be changed until the transaction finishes. This provides the perfect solution to the problem mentioned in the previous section, in which Amazon's id can change or vanish entirely. However, this isolation level still leaves the door open to another tricky problem: phantom reads.
Using the same data, imagine that you have a script that performs some analysis based on the data in the StockPrice table. And let's assume it does this while the nightly update is also running.
The analysis script does something like this:
BEGIN; SELECT * FROM StockPrice WHERE close BETWEEN 10 and 20; // think for a bit SELECT * FROM StockPrice WHERE close BETWEEN 10 and 20; COMMIT;
But the nightly update script inserts between those two queries new rows that happen to match the close BETWEEN 10 and 20 condition. The second query will find more rows that the first one! These additional rows are known as phantom rows (or simply phantoms). They weren't locked the first time because they didn't exist when the query ran.
Having said all that, we need to point out that this is a bit more academic than you might think. Phantom rows are such a common problem that InnoDB's locking (known as next-key locking) prevents this from happening. Rather than locking only the rows you've touched in a query, InnoDB actually locks the slot following them in the index structure as well.
The highest level of isolation, serializable, solves the phantom read problem by ordering transactions so that they can't conflict. At this level, a lot of timeouts and lock contention may occur, but the needs of your application may bring you to accept the decreased performance in favor of the data stability that results.
Table 2-4 summarizes the various isolation levels and the drawbacks associated with each one. Keep in mind that as you move down the list, you're sacrificing concurrency and performance for increased safety.
For example, consider these two transactions running against the StockPrice table:
BEGIN; UPDATE StockPrice SET close = 45.50 WHERE stock_id = 4 and date = '2002-05-01'; UPDATE StockPrice SET close = 19.80 WHERE stock_id = 3 and date = '2002-05-02'; COMMIT;
BEGIN; UPDATE StockPrice SET high = 20.12 WHERE stock_id = 3 and date = '2002-05-02'; UPDATE StockPrice SET high = 47.20 WHERE stock_id = 4 and date = '2002-05-01'; COMMIT;
If you're unlucky, each transaction will execute its first query and update a row of data, locking it in the process. Each transaction will then attempt to update its second row only to find that it is already locked. Left unchecked, the two transactions will wait for each other to complete—forever.
To combat this problem, database systems implement various forms of deadlock detection and timeouts. The more sophisticated systems, such as InnoDB, will notice circular dependencies like the previous example and return an error. Others will give up after the query exceeds a timeout while waiting for a lock. InnoDB's default timeout is 50 seconds. In either case, applications that use transactions need to be able to handle deadlocks and possibly retry transactions.
2.3.4 Transaction Logging
Some of the overhead involved with transactions can be mitigated through the use of a transaction log. Rather than directly updating the tables on disk each time a change occurs, the system can update the in-memory copy of the data (which is very fast) and write a record of the change to a transaction log on disk. Then, at some later time, a process (or thread) can actually apply the changes that the transaction log recorded. The serial disk I/O required to append events to the log is much faster than the random seeks required to update data in various places on disk.
As long as events are written to the transaction log before a transaction is considered committed, having the changes in a log will not affect the durability of the system. If the database server crashes before all changes have been applied from the transaction log, the database will continue applying changes from the transaction log when it is restarted and before it accepts new connections.
2.3.5 Transactions in MySQL
MySQL provides two transaction-safe storage engines: Berkeley DB (BDB) and InnoDB. Their specific properties are discussed in next section. Each one offers the basic BEGIN/COMMIT/ROLLBACK functionality. They differ in their supported isolation levels, locking characteristics, deadlock detection, and other features.
By default MySQL operates in AUTOCOMMIT mode. This means that unless you've explicitly begun a transaction, it automatically executes each query in a separate transaction. You can enable AUTOCOMMIT for the current connection by running:
SET AUTOCOMMIT = 1;
Disable it by executing:
SET AUTOCOMMIT = 0;
Changing the value of AUTOCOMMIT has no effect on non-transaction-safe tables such as MyISAM or HEAP.
126.96.36.199 Implicit commits
Certain commands, when issued during an open transaction, cause MySQL to commit the transaction before they execute. Typically these are commands that make significant changes, such as removing or renaming a table.
Here is the list of commands for which MySQL implicitly commits a transaction:
As additional features are added to MySQL, it is possible that other commands will be added to the list, so be sure to check the latest available documentation.
188.8.131.52 Isolation levels
To set the level for the whole session (connection), use:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED
Here's how to set the global level:
SET GLOBAL TRANSACTION ISOLATION LEVEL SERIALIZABLE
The default isolation level can also be set using the --transaction-isolation command-line option when starting the server or set via my.cnf.
184.108.40.206 Mixing storage engines in transactions
Transaction management in MySQL is currently handled by the underlying storage engines, not at a higher level. Thus, you can't reliably mix tables stored in transactional engines (such as InnoDB and BDB) in a single transaction. A higher-level transaction management service may someday be added to MySQL, making it safe to mix and match transaction-safe tables in a transaction. Until then, don't expect it to work.
If you mix transaction-safe and non-transaction-safe tables (such as InnoDB and MyISAM) in a transaction, the transaction will work properly if all goes well. However, if a rollback is required, the changes to the non-transaction-safe table won't be undone. This leaves the database in an inconsistent state that may be difficult to recover from (and renders the entire point of transactions moot).
220.127.116.11 Simulating transactions
At times you may need the behavior of transactions when you aren't using a transaction-safe table. You can achieve something like transactions using MySQL's LOCK TABLES and UNLOCK TABLES commands. If you lock the tables that will be involved in the transaction and keep track of any changes that you make (in case you need to simulate a rollback), you'll have something equivalent to running at the serializable isolation level. But the process is kludgy and error prone, so if you really need transactions, we recommend using a transactional storage engine.
|< Day Day Up >|