Previous Section  < Day Day Up >  Next Section

9.2 Considerations and Tradeoffs

We considered calling this section "Things You Really Need To Think About" because backing up a running database is more complex than it may first appear to be. This isn't because backups are inherently difficult; it's because MySQL is a bit more complex that you might think.

When it comes to actually performing the backups, you can script the process yourself, use one of the prebuilt tools, or both. It all depends on your needs. In this section, we'll examine the major decisions you'll need to make and how they influence the backup techniques you can use. Then in the next section we'll look at the most popular tools.

9.2.1 Dump or Raw Backup?

One of the first decisions to make is the format of the backups you'd like to create. The result of a database dump is one or more files that contain the SQL statements (mostly INSERT and CREATE TABLE) necessary to re-create the data. Dumps are produced using mysqldump, described in more detail in Section 9.3, later in this chapter. You can perform dumps over the network so that your backups are created on a host other than your database server. It's possible to produce dumps of any MySQL table type.

Having the contents of the tables as SQL files provides a lot of flexibility. If you simply need to look for a few records, you can load the file in your favorite editor or use a tool such as grep or less to locate the data. The dumped data is quite readable.

Restoring a dump is easy. Because the dump file contains all the necessary information to re-create the table, you simply need to feed that file back into the mysql command-line tool. And if you need to restore only some of the records, you can directly edit the file directly or write a script to prefilter out the records you don't need. Raw backups don't provide this flexibility. You can't easily filter out records from a table when using a raw backup; you can operate only on whole tables.

There are some downsides to using dumps. A dump file consumes far more disk space than the table or database it represents. Not only are there a lot of INSERT statements in the file, all numeric data (which MySQL stores quite efficiently) becomes ASCII, using quite a bit more space. Dumps are more CPU-intensive to produce, so they'll take longer than other methods. Dump files compress rather well using tools such as gzip or bzip2. Also, reloading a dump requires that MySQL spend considerable CPU time to rebuild all the indexes.

Because there's often a fair amount of unused space and overhead in InnoDB's data files, you'll find that InnoDB tables often take far less space that you might expect when backed up.

While dumps have a lot of advantages, the extra space, time, and CPU power they require are often not worth expending—especially as your databases get larger and larger. It's more efficient to use a raw backup technique rather than using dumps. A raw backup is a direct copy of MySQL's data files as they exist on disk. Because the records aren't converted from their native format to ASCII, raw backups are much faster and more efficient than dumps. For ISAM and MyISAM tables, this means copying the data, index, and table definition files. For BDB and InnoDB tables, it also involves preserving the transaction logs and the data.

Both mysqlhotcopy and mysqlsnapshot, which we describe in some detail later, can be used to produce raw backups of ISAM and MyISAM tables. They do so by locking and flushing the tables before copying the underlying files. The tables may not be written to during the backup process. The InnoDB Hot Backup tool, also discussed later in this chapter, provides a raw backup of your InnoDB data without the need for downtime or locking. There is no equivalent tool for BDB tables.

Raw backups are most often used to back up a live server. To get a consistent backup, ISAM and MyISAM tables need to be locked so that no changes can occur until the backup completes. InnoDB tables have no such restriction.

Restoring a raw backup is relatively easy. For ISAM and MyISAM tables, you simply put the data files in MySQL's data directory. Unless you're using InnoDB's multiple-tablespace support in Version 4.1 or newer, InnoDB tables can't be restored individually from a raw backup because they are stored in shared tablespace files rather than individually. Instead, you'll need to shut down MySQL and restore the tablespace files.

If you have the luxury of shutting down MySQL to perform backups, the backup and restore processes can be greatly simplified. In fact, that's the next decision to consider.

9.2.2 Online or Offline?

Being able to shut down MySQL during backups means not having to worry about consistency problems (discussed in the next section), locking out changes from live applications, or degrading server performance. A nonrunning MySQL instance can be backed up using standard backup software. There's no danger of files changing. If MySQL isn't running, the backup process will likely be faster too; it won't be competing with MySQL for I/O and CPU cycles.

If you're planning to shut down MySQL during backups, make sure that your backup software is configured to back up all of the MySQL-related data. Ideally, you'd back up the entire system, but there may be cases when that isn't feasible. Large MySQL installations often span several filesystems. The binaries may be in one place, config files in another, and the data files elsewhere. Having them on different backup schedules could leave you with a difficult problem if you need to restore just after a major upgrade. The config files may not match the data file locations, for example.

9.2.3 Table Types and Consistency

Maintaining consistency is one of the most tricky and often overlooked issues in database backups. You need to ensure that you're getting a consistent snapshot of your data. Doing so requires an understanding of the types of tables you need to back up and how MySQL handles them.

If you're using MyISAM tables, simply making copies of the various data files isn't sufficient. You must guarantee that all changes have been flushed to disk and that MySQL won't be making changes to any of the tables during the backup process. The obvious solution is to obtain a read lock on each table before it is backed up. That will prevent anyone from making changes to the table while still allowing them to read from it.

That technique works well for a single table, but in a relational database, tables are often related to each other. Records inserted into one table depend on those in another. If that's not accounted for, you can end up with an inconsistent backup—records may exist in one table but have no counterparts in another. It all depends on the order in which the tables were copied and the likelihood that changes were made to one while the other was backed up.

So a good backup program needs to lock groups of related tables before they are copied. Rather than deal with that complexity, the popular solutions for MySQL give you the option of either locking all tables and keeping them locked until the backup is done, or locking and backing up tables one at a time.[3] If neither option appeals to you, there's a good chance that you need to script your own solution. See Section 9.4, later in this chapter, for details.

[3] Ideally, we'd have the option to unlock each table selectively after it is copied, but MySQL doesn't allow that yet.

9.2.4 Storage Requirements

The amount of space required to store backups must factor into the decision-making process. How much room does your backup media have? Tape, CD, DVD, and hard disks all have capacity limits, costs, and lifetimes.[4]

[4] But hard disks seem to be growing in capacity without bound. It shouldn't be long before you can buy a tera-byte hard disk.

After you've determined how much space you can afford and manage effectively, you need to consider how frequently you really need to perform backups. Do you need to back up all your data every day? Can you get by with backing up only your most active tables or databases daily and performing a full backup on the weekend? That will save a lot of space if much of your data changes infrequently.

When dealing with backups, it's a good idea to consider compression. If you're backing up to a tape drive with hardware compression, it's handled for you automatically. Otherwise, you can choose any compression scheme you'd like. Most dump files and raw backups compress rather well. However, if a lot of your data is already compressed (either compressed MyISAM tables or tables with BLOB fields that contain compressed data), there will be little benefit in further compression attempts.

If you have more than a few compressed MyISAM tables, not only should you avoid trying to compress them further, but you should also consider backing them up less frequently. Compressed MyISAM tables are read-only; by definition, they don't change often. You'd have to uncompress the table, make changes, and recompress it. That's rare.

The final issue to think about is retention. How long do you need to keep backups around? Rather than simply throwing out backups when you begin to run out of space, it's best to plan ahead. By taking into account the amount of data you must back up, the amount of space you need, and how long you want to keep data around, you won't run into surprises.

If you find yourself running out of space, consider staggering the backups that you do save. Rather than always deleting the oldest backups, you can use an alternative approach such as removing backups that fall on odd-numbered days. That would allow you to double the age of your oldest backup.

9.2.5 Replication

If you're using MySQL's replication features (described in Chapter 7), you can be a lot more flexible in your approach to backups. In fact, you may want to set up a slave just to simplify backups.

By performing backups on a slave, you eliminate the need ever to interrupt systems that may need to make changes on the master. In a 24 x 7 x 365 operation, this is an excellent way to ensure that you always have a copy of your data on another machine (this method is commonly used at Yahoo!). And since you can switch to the slave if the master dies, it significantly reduces the downtime when something does go wrong.

When backing up a slave, it's important always to save the replication files as well. That includes the master.info file, relay logs, relay index, and so on. Without them, you can't easily restore a slave that has suffered a failure. The files contain information about where the slave left off in the replication process. See Chapter 7 for more information.

    Previous Section  < Day Day Up >  Next Section