Nobody likes to lose data. And since disks eventually die, often with little warning, it's wise to consider setting up RAID (Redundant Array of Inexpensive Disks) on your database servers to prevent a disk failure from causing unplanned downtime and data loss. But there are many different types of RAID to consider: RAID 0, 1, 0+1, 5, and 10. And what about hardware RAID versus software RAID?
From a performance standpoint, some options are better than others. The faster ones will sacrifice something to gain that performance—usually price or durability. In all cases, the more disks you have, the better performance you'll get. Let's consider the benefits and drawbacks of each RAID option.
Table 6-1 summarizes various RAID features.
6.2.1 Mix and Match
When deciding how to configure your disks, consider the possibility of multiple RAID arrays. RAID controllers aren't that expensive, so you might benefit from using RAID 5 or RAID 10 for your databases and a separate RAID 1 array for your transaction and replication logs. Some multichannel controllers can manage multiple arrays, and some can even bind several channel controllers together into a single controller to support more disks.
Doing this isolates most of the serial disk I/O from most of the random, seek-intensive I/O. This is because transaction and replication logs are usually large files that are read from and written to in a serial manner, usually by a small number of threads. So it's not necessary to have a lot of spindles available to spread the seeks across. What's important is having sufficient bandwidth, and virtually any modern pair of disks can fill that role nicely. Meanwhile, the actual data and indexes are being read from and written to by many threads simultaneously in a fairly random manner. Having the extra spindles associated with RAID 10 will boost performance. Or, if you simply have too much data to fit on a single disk, RAID 5's ability to create large volumes works to your advantage.
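The capacity side of that trade-off comes down to simple arithmetic. The sketch below is only an illustration: it assumes equal-sized disks and a RAID 10 array built from two-way mirrors, and the function name and the 100 GB disk size are hypothetical, not taken from any tool.

```python
def usable_capacity(level: str, disks: int, disk_gb: float) -> float:
    """Usable capacity in GB for an array of equal-sized disks.

    Assumptions (not from the text): all disks are the same size,
    and RAID 10 is built from two-way mirrors.
    """
    if level == "0":
        if disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return disks * disk_gb            # pure striping, no redundancy
    if level == "1":
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_gb                    # every disk holds a full copy
    if level == "5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb      # one disk's worth of parity
    if level == "10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, 4 or more")
        return (disks // 2) * disk_gb     # half the disks are mirror copies
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for level in ("5", "10"):
        print(f"RAID {level}, 6 x 100 GB:", usable_capacity(level, 6, 100), "GB")
```

With six disks, RAID 5 gives up one disk's worth of space to parity while RAID 10 gives up half of the disks to mirroring, which is why RAID 5 is the natural choice when volume size matters most.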
6.2.1.1 Sample configuration
To make this more concrete, let's see what such a setup might look like with both InnoDB and MyISAM tables. It's entirely possible to move most of the files around and leave symlinks in the original locations (at least on Unix-based systems), but that can be a bit messy, and it's too easy to accidentally remove a symlink (or accidentally back up symlinks instead of actual data!). Instead, you can adjust the my.cnf file to put files where they belong.
Let's assume you have a RAID 1 volume on which the following filesystems are mounted: /, /usr, and swap. You also have a RAID 5 (or RAID 10) filesystem mounted as /data. On this particular server, MySQL was installed from a binary tarball into /usr/local/mysql, making /usr/local/mysql/data the default data directory.
The goal is to keep the InnoDB logs and replication logs on the RAID 1 volume, while moving everything else to /data. These my.cnf entries can accomplish that:
datadir = /data/myisam
log-bin = /usr/local/mysql/data/repl/bin-log
innodb_data_file_path = ibdata1:16386M;ibdata2:16385M
innodb_data_home_dir = /data/ibdata
innodb_log_group_home_dir = /usr/local/mysql/data/iblog
innodb_log_arch_dir = /usr/local/mysql/data/iblog
These entries provide two top-level directories in /data for MySQL's data files: ibdata for the InnoDB data and myisam for the MyISAM files. All the logs remain in or below /usr/local/mysql/data on the RAID 1 volume.
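One way to double-check a layout like this is to map each configured path to the filesystem it lands on. The following is a minimal sketch, not a MySQL tool: the mount table simply restates the assumptions above (/, /usr on RAID 1, /data on RAID 5 or 10), and the helper names mount_for and placement are hypothetical.

```python
import posixpath

# Mount points and their underlying arrays, as assumed in the text.
MOUNTS = {
    "/": "RAID 1",
    "/usr": "RAID 1",
    "/data": "RAID 5 or 10",
}

# Path-valued settings from the sample my.cnf entries.
SETTINGS = {
    "datadir": "/data/myisam",
    "log-bin": "/usr/local/mysql/data/repl/bin-log",
    "innodb_data_home_dir": "/data/ibdata",
    "innodb_log_group_home_dir": "/usr/local/mysql/data/iblog",
    "innodb_log_arch_dir": "/usr/local/mysql/data/iblog",
}


def mount_for(path, mounts):
    """Return the longest mount point that contains an absolute path."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    candidate = posixpath.normpath(path) + "/"
    # A mount matches if the path is the mount point itself or lies below it.
    matches = [m for m in mounts if candidate.startswith(m.rstrip("/") + "/")]
    return max(matches, key=len)


def placement(settings, mounts):
    """Map each setting name to the array its path resides on."""
    return {name: mounts[mount_for(path, mounts)] for name, path in settings.items()}


if __name__ == "__main__":
    for name, array in placement(SETTINGS, MOUNTS).items():
        print(f"{name:28s} -> {array}")
```

Under these assumptions, datadir and innodb_data_home_dir resolve to the /data array, while the binary log and both InnoDB log directories resolve to the RAID 1 volume through /usr.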
6.2.2 Hardware Versus Software
Some operating systems can perform software RAID. Rather than buying a dedicated RAID controller, the operating system's kernel splits the I/O among multiple disks. Many users shy away from using these features because they've long been considered slow or buggy.
In reality, software RAID is quite stable and performs rather well. The performance differences between hardware and software RAID tend not to be significant until they're under quite a bit of load. For smaller and medium-sized workloads, there's little discernible difference between them. Yes, the server's CPU must do a bit more work when using software RAID, but modern CPUs are so fast that the RAID operations consume a small fraction of the available CPU time. And, as we stressed earlier, the CPU is usually not the bottleneck in a database server anyway.
Even with software RAID, you can use multiple disk controllers to achieve redundancy at the hardware level without actually paying for a RAID controller. In fact, some would argue that having two non-RAID controllers is better than a single RAID controller. You'll have twice the available I/O bandwidth and have eliminated a single point of failure if you use RAID 1 or 10 across them.
Having said that, there is one thing hardware RAID can do that simply can't be done in software: write caching. Many RAID controllers can be fitted with battery-backed RAM that caches writes. Since there's a battery on the card, you don't need to worry about lost writes even when the power fails. If it does, the data stays in memory on the controller until the machine is powered back up. Most hardware RAID controllers can cache reads as well.
6.2.3 IDE or SCSI?
It's a perpetual question: do you use IDE or SCSI disks for your server? A few years ago, the answer was easy: SCSI. But the issue is further muddied by the availability of faster IDE bus speeds and IDE RAID controllers from 3Ware and other vendors. For our purposes, Serial-ATA is the same as IDE.
The traditional view is that SCSI is better than IDE in servers. While many people dismiss this argument, there's real merit to it when dealing with database servers. IDE disks handle requests in a sequential manner. If the CPU asks the disk to read four blocks from an inside track, followed by eight blocks from an outside track, then two more blocks from an inside track, the disk will do exactly what it's told, even if it's not the most efficient way to read all that data. SCSI disks have a feature known as Tagged Command Queuing (TCQ). TCQ allows the CPU to send several read/write requests to the disk at the same time. The disk controller then tries to find the optimal read/write pattern to minimize seeks.
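The effect of reordering can be illustrated with a toy model of head movement. This is a sketch under loud assumptions: tracks are plain integers, the track numbers and starting position are made up, and the greedy "nearest request first" policy stands in for whatever a real drive's firmware does (which also has to account for rotational position).

```python
def head_travel(start, tracks):
    """Total distance the head moves when visiting tracks in the given order."""
    total = 0
    position = start
    for track in tracks:
        total += abs(track - position)
        position = track
    return total


def reorder_nearest_first(start, tracks):
    """Greedy reordering: always service the pending request closest to the head.

    A stand-in for queue reordering, not the algorithm of any real drive.
    """
    pending = list(tracks)
    order = []
    position = start
    while pending:
        nearest = min(pending, key=lambda t: abs(t - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


if __name__ == "__main__":
    # Hypothetical request stream: inside track, outside track, inside track.
    requests = [900, 50, 910]
    start = 500
    reordered = reorder_nearest_first(start, requests)
    print("in order :", requests, head_travel(start, requests))
    print("reordered:", reordered, head_travel(start, reordered))
```

For this request stream, servicing the two inside-track reads back to back before moving to the outside track cuts head travel from 2110 to 1270 track units in the model, which is the kind of saving a queue of tagged commands makes possible.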
IDE also suffers from scaling problems; you can't use more than one drive per IDE channel without suffering a severe performance hit. Because most motherboards offer at most four IDE channels, you're stuck with only four disks unless you add another controller. Worse yet, IDE has rather restrictive cable limits. With SCSI, you can typically add 7 or 14 disks before purchasing a new controller. Furthermore, the constant downward price pressure on hard disks has affected SCSI as much as IDE.
On the other hand, SCSI disks still cost more than their IDE counterparts. When you're considering four or more disks, the price difference is significant enough that you might be able to purchase IDE disks and still afford another controller, possibly even an IDE RAID controller. Many MySQL users are quite happy using 3Ware IDE RAID controllers with 4-12 disks on them. Such a setup costs less than a SCSI option, and the performance is reasonably close to that of a high-end SCSI RAID controller.
6.2.4 RAID on Slaves
As we mentioned in the discussion of RAID 0, if you're using replication to create a cluster of slaves for your application, it's likely that you can save money on the slaves by using a different form of RAID. That means using a higher-performance configuration that doesn't provide redundancy (RAID 0), using fewer disks (RAID 5 instead of RAID 10), or using software rather than hardware RAID, for example. If you have enough slaves, you may not need redundancy on each of them. In the event that one slave suffers the loss of a disk, you can always synchronize it with another nearby slave to get it started again.