2.3 Transactions

You can't examine the more advanced features of a database system for very long before transactions enter the mix. A transaction is a group of SQL queries that are treated atomically, as a single unit of work. Either the entire group of queries is applied to a database, or none of them are. Little of this section is specific to MySQL. If you're already familiar with ACID transactions, feel free to skip ahead to the section "Transactions in MySQL."

A banking application is the classic example of why transactions are necessary. Imagine a bank's database with a two tables: checking and savings. To move $200 from Jane's checking account to her savings account, you need to perform at least three steps:

Make sure her checking account balance is greater than $200.
Subtract $200 from her checking account balance.
Add $200 to her savings account balance.

The entire operation should be wrapped in a transaction so that if any one of the steps fails, they can all be rolled back.

A transaction is initiated (or opened) with the BEGIN statement and applied with COMMIT or rolled back (undone) with ROLLBACK. So the SQL for the transaction might look like this:

         BEGIN;

[step 1] SELECT balance FROM checking WHERE customer_id = 10233276;

[step 2] UPDATE checking SET balance = balance - 200.00 WHERE customer_id = 10233276;

[step 3] UPDATE savings  SET balance = balance + 200.00 WHERE customer_id = 10233276;

         COMMIT;

But transactions alone aren't the whole story. What happens if the database server crashes while performing step 3? Who knows? The customer probably just lost $200. What if another process comes along between Steps 2 and 3 and removes the entire checking account balance? The bank has given the customer a $200 credit without even knowing it.

Simply having transactions isn't sufficient unless the database server passes what is known as the ACID test. ACID is an acronym for Atomicity, Consistency, Isolation, and Durability—four tightly related criteria that are required in a well-behaved transaction processing system. Transactions that meet those four criteria are often referred to as ACID transactions.

Atomicity: Transactions must function as a single indivisible unit of work. The entire transaction is either applied or rolled back. When transactions are atomic, there is no such thing as a partially completed transaction: it's all or nothing.
Consistency: The database should always move from one consistent state to the next. Consistency ensures that a crash between Steps 2 and 3 doesn't result in $200 missing from the checking account. Because the transaction is never committed, none of the transaction's changes are ever reflected in the database.
Isolation: The results of a transaction are usually invisible to other transactions until the transaction is complete. This ensures that if a bank account summary runs after Step 2, but before Step 3, it still sees the $200 in the checking account. When we discuss isolation levels, you'll understand why we said usually invisible.
Durability: Once committed, the results of a transaction are permanent. This means that the changes must be recorded in such a way that system crashes won't lose the data. Of course, if the database server's disks fail, all bets are off. That's a hardware problem. We'll talk more about how you can minimize the effects of hardware failures in Chapter 6.

2.3.1 Benefits and Drawbacks

ACID transactions ensure that banks don't lose your money. By wrapping arbitrarily complex logic into single units of work, the database server takes some of the burden off application developers. The database server's ACID properties offer guarantees that reduce the need for code guarding against race conditions and handling crash recovery.

The downside of this extra security is that the database server has to do more work. It also means that a database server with ACID transactions will generally require more CPU power, memory, and disk space than one without them. As mentioned earlier, this is where MySQL's modularity comes into play. Because you can decide on a per-table basis if you need ACID transactions or not, you don't need to pay the performance penalty on a table that really won't benefit from transactions.

2.3.2 Isolation Levels

The previous description of isolation was a bit simplistic. Isolation is more complex than it might first appear because of some peculiar cases that can occur. The SQL standard defines four isolation levels with specific rules for which changes are and aren't visible inside and outside a transaction. Let's look at each isolation level and the type of problems that can occur.

2.3.2.1 Read uncommitted

In the read uncommitted isolation level, transactions can view the results of uncommitted transactions. At this level, many problems can occur unless you really, really know what you are doing and have a good reason for doing it. Read uncommitted is rarely used in practice. Reading uncommitted data is also known as a dirty read.

2.3.2.2 Read committed

The default isolation level for most database systems is read committed. It satisfies the simple definition of isolation used earlier. A transaction will see the results only of transactions that were already committed when it began, and its changes won't be visible to others until it's committed.

However, there are problems that can occur using that definition. To visualize the problems, refer to the sample data for the Stock and StockPrice tables as shown in Table 2-2 and Table 2-3.

Table 2-2. The Stock table

id

Ticker

Name

1

MSFT

Microsoft

2

EBAY

eBay

3

YHOO

Yahoo!

4

AMZN

Amazon

Table 2-3. The StockPrice table

stock_id

date

open

high

low

close

1

2002-05-01

21.25

22.30

20.18

21.30

2

2002-05-01

10.01

10.20

10.01

10.18

3

2002-05-01

18.23

19.12

18.10

19.00

4

2002-05-01

45.55

46.99

44.87

45.71

1

2002-05-02

21.30

21.45

20.02

20.21

2

2002-05-02

10.18

10.55

10.10

10.35

3

2002-05-02

19.01

19.88

19.01

19.22

4

2002-05-02

45.69

45.69

44.03

44.30

Imagine you have a Perl script that runs nightly to fetch price data about your favorite stocks. For each stock, it fetches the data and adds a record to the StockPrice table with the day's numbers. So to update the information for Amazon.com, the transaction might look like this:

BEGIN;

SELECT @id := id FROM Stock WHERE ticker = 'AMZN';

INSERT INTO StockPrice VALUES (@id, '2002-05-03', 20.50, 21.10, 20.08, 21.02);

COMMIT;

But what if, between the select and insert, Amazon's id changes from 4 to 17 and a new stock is added with id 4? Or what if Amazon is removed entirely? You'll end up inserting a record with the wrong id in the first case. And in the second case, you've inserted a record for which there is no longer a corresponding row in the Stock table. Neither of these is what you intended.

The problem is that you have a nonrepeatable read in the query. That is, the data you read in the SELECT becomes invalid by the time you execute the INSERT. The repeatable read isolation level exists to solve this problem.

2.3.2.3 Repeatable read

At the repeatable read isolation level, any rows that are read during a transaction are locked so that they can't be changed until the transaction finishes. This provides the perfect solution to the problem mentioned in the previous section, in which Amazon's id can change or vanish entirely. However, this isolation level still leaves the door open to another tricky problem: phantom reads.

Using the same data, imagine that you have a script that performs some analysis based on the data in the StockPrice table. And let's assume it does this while the nightly update is also running.

The analysis script does something like this:

BEGIN;

SELECT * FROM StockPrice WHERE close BETWEEN 10 and 20;

// think for a bit

SELECT * FROM StockPrice WHERE close BETWEEN 10 and 20;

COMMIT;

But the nightly update script inserts between those two queries new rows that happen to match the close BETWEEN 10 and 20 condition. The second query will find more rows that the first one! These additional rows are known as phantom rows (or simply phantoms). They weren't locked the first time because they didn't exist when the query ran.

Having said all that, we need to point out that this is a bit more academic than you might think. Phantom rows are such a common problem that InnoDB's locking (known as next-key locking) prevents this from happening. Rather than locking only the rows you've touched in a query, InnoDB actually locks the slot following them in the index structure as well.

2.3.2.4 Serializable

The highest level of isolation, serializable, solves the phantom read problem by ordering transactions so that they can't conflict. At this level, a lot of timeouts and lock contention may occur, but the needs of your application may bring you to accept the decreased performance in favor of the data stability that results.

Table 2-4 summarizes the various isolation levels and the drawbacks associated with each one. Keep in mind that as you move down the list, you're sacrificing concurrency and performance for increased safety.

Table 2-4. ANSI SQL isolation levels

Isolation level

Dirty reads possible

Non-repeatable reads possible

Phantom reads possible

Read uncommitted

Yes

Yes

Yes

Read committed

No

Yes

Yes

Repeatable read

No

No

Yes

Serializable

No

No

No

2.3.3 Deadlocks

Whenever multiple transactions obtain locks, there is the danger of encountering a deadlock condition. Deadlocks occur when two transactions attempt to obtain conflicting locks in a different order.

For example, consider these two transactions running against the StockPrice table:

Transaction #1:

BEGIN;

UPDATE StockPrice SET close = 45.50 WHERE stock_id = 4 and date = '2002-05-01';

UPDATE StockPrice SET close = 19.80 WHERE stock_id = 3 and date = '2002-05-02';

COMMIT;

Transaction #2:

BEGIN;

UPDATE StockPrice SET high  = 20.12 WHERE stock_id = 3 and date = '2002-05-02';

UPDATE StockPrice SET high  = 47.20 WHERE stock_id = 4 and date = '2002-05-01';

COMMIT;

If you're unlucky, each transaction will execute its first query and update a row of data, locking it in the process. Each transaction will then attempt to update its second row only to find that it is already locked. Left unchecked, the two transactions will wait for each other to complete—forever.

To combat this problem, database systems implement various forms of deadlock detection and timeouts. The more sophisticated systems, such as InnoDB, will notice circular dependencies like the previous example and return an error. Others will give up after the query exceeds a timeout while waiting for a lock. InnoDB's default timeout is 50 seconds. In either case, applications that use transactions need to be able to handle deadlocks and possibly retry transactions.

2.3.4 Transaction Logging

Some of the overhead involved with transactions can be mitigated through the use of a transaction log. Rather than directly updating the tables on disk each time a change occurs, the system can update the in-memory copy of the data (which is very fast) and write a record of the change to a transaction log on disk. Then, at some later time, a process (or thread) can actually apply the changes that the transaction log recorded. The serial disk I/O required to append events to the log is much faster than the random seeks required to update data in various places on disk.

As long as events are written to the transaction log before a transaction is considered committed, having the changes in a log will not affect the durability of the system. If the database server crashes before all changes have been applied from the transaction log, the database will continue applying changes from the transaction log when it is restarted and before it accepts new connections.

2.3.5 Transactions in MySQL

MySQL provides two transaction-safe storage engines: Berkeley DB (BDB) and InnoDB. Their specific properties are discussed in next section. Each one offers the basic BEGIN/COMMIT/ROLLBACK functionality. They differ in their supported isolation levels, locking characteristics, deadlock detection, and other features.

2.3.5.1 AUTOCOMMIT

By default MySQL operates in AUTOCOMMIT mode. This means that unless you've explicitly begun a transaction, it automatically executes each query in a separate transaction. You can enable AUTOCOMMIT for the current connection by running:

SET AUTOCOMMIT = 1;

Disable it by executing:

SET AUTOCOMMIT = 0;

Changing the value of AUTOCOMMIT has no effect on non-transaction-safe tables such as MyISAM or HEAP.

2.3.5.2 Implicit commits

Certain commands, when issued during an open transaction, cause MySQL to commit the transaction before they execute. Typically these are commands that make significant changes, such as removing or renaming a table.

Here is the list of commands for which MySQL implicitly commits a transaction:

ALTER TABLE
BEGIN
CREATE INDEX
DROP DATABASE
DROP TABLE
RENAME TABLE
TRUNCATE
LOCK TABLES
UNLOCK TABLES

As additional features are added to MySQL, it is possible that other commands will be added to the list, so be sure to check the latest available documentation.

2.3.5.3 Isolation levels

MySQL allows you to set the isolation level using the SET TRANSACTION ISOLATION LEVEL command. Unless otherwise specified, the isolation level is changed beginning with the next transaction.

To set the level for the whole session (connection), use:

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED

Here's how to set the global level:

SET GLOBAL TRANSACTION ISOLATION LEVEL SERIALIZABLE

MySQL recognizes all four ANSI standard isolation levels, and as of Version 4.0.5 of MySQL, InnoDB supports all of them:

READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SERIALIZABLE

The default isolation level can also be set using the --transaction-isolation command-line option when starting the server or set via my.cnf.

2.3.5.4 Mixing storage engines in transactions

Transaction management in MySQL is currently handled by the underlying storage engines, not at a higher level. Thus, you can't reliably mix tables stored in transactional engines (such as InnoDB and BDB) in a single transaction. A higher-level transaction management service may someday be added to MySQL, making it safe to mix and match transaction-safe tables in a transaction. Until then, don't expect it to work.

If you mix transaction-safe and non-transaction-safe tables (such as InnoDB and MyISAM) in a transaction, the transaction will work properly if all goes well. However, if a rollback is required, the changes to the non-transaction-safe table won't be undone. This leaves the database in an inconsistent state that may be difficult to recover from (and renders the entire point of transactions moot).

2.3.5.5 Simulating transactions

At times you may need the behavior of transactions when you aren't using a transaction-safe table. You can achieve something like transactions using MySQL's LOCK TABLES and UNLOCK TABLES commands. If you lock the tables that will be involved in the transaction and keep track of any changes that you make (in case you need to simulate a rollback), you'll have something equivalent to running at the serializable isolation level. But the process is kludgy and error prone, so if you really need transactions, we recommend using a transactional storage engine.

< Day Day Up >