Planning Your SVM Configuration
RAID Solutions: You might get an exam question that describes an application and then asks which RAID solution would be best suited for it. For example, a financial application with mission-critical data would require mirroring to provide the best protection for the data, whereas a video editing application would require striping for the pure performance gain. Make sure you are familiar with the pros and cons of each RAID solution.
Using SVM, you can utilize volumes to provide increased capacity, higher availability, and better performance. In addition, the hot spare capability provided by SVM can provide another level of data availability for mirrors and RAID 5 volumes. Hot spares were described earlier in this chapter.
After you have set up your configuration, you can use Solaris utilities such as iostat, metastat, and metadb to report on its operation. The iostat utility is used to provide information on disk usage and will show you which metadevices are being heavily utilized, while the metastat and metadb utilities provide status information on the metadevices and state databases, respectively. As an example, the output shown below provides information from the metastat utility whilst two mirror metadevices are being synchronized:
# metastat -i
d60: Mirror
    Submirror 0: d61
      State: Okay
    Submirror 1: d62
      State: Resyncing
    Resync in progress: 15 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 10462032 blocks (5.0 GB)

d61: Submirror of d60
    State: Okay
    Size: 10462032 blocks (5.0 GB)
    Stripe 0:
        Device     Start Block  Dbase  State  Reloc  Hot Spare
        c0t3d0s4   0            No     Okay   Yes

d62: Submirror of d60
    State: Resyncing
    Size: 10462032 blocks (5.0 GB)
    Stripe 0:
        Device     Start Block  Dbase  State  Reloc  Hot Spare
        c0t1d0s5   0            No     Okay   Yes

d50: Mirror
    Submirror 0: d51
      State: Okay
    Submirror 1: d52
      State: Resyncing
    Resync in progress: 26 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 4195296 blocks (2.0 GB)

d51: Submirror of d50
    State: Okay
    Size: 4195296 blocks (2.0 GB)
    Stripe 0:
        Device     Start Block  Dbase  State  Reloc  Hot Spare
        c0t3d0s3   0            No     Okay   Yes

d52: Submirror of d50
    State: Resyncing
    Size: 4195296 blocks (2.0 GB)
    Stripe 0:
        Device     Start Block  Dbase  State  Reloc  Hot Spare
        c0t1d0s4   0            No     Okay   Yes

Device Relocation Information:
Device   Reloc  Device ID
c0t1d0   Yes    id1,dad@ASAMSUNG_SP0411N=S01JJ60X901935
c0t0d0   Yes    id1,dad@AWDC_AC310200R=WD-WT6750311269
#
Notice from the preceding output that there are two mirror metadevices, each containing two submirror component metadevices: d60 contains submirrors d61 and d62, and d50 contains submirrors d51 and d52. The metadevices d52 and d62 are in the process of resynchronization. Monitoring with this utility is important because service can degrade noticeably while these volumes resynchronize, and metastat displays the progress of the operation as a percentage complete. Further information on these utilities is available from the online manual pages.
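As a rough illustration of monitoring resynchronization from a script, the following sketch pulls the mirror name and percentage out of metastat-style text. The function name resync_progress is hypothetical and not part of SVM, and the sample text is a trimmed copy of the output shown above; on a live system you would pipe the real metastat output into the function instead.

```shell
#!/bin/sh
# Hypothetical helper (not an SVM command): read metastat-style text on
# standard input and print each resyncing mirror with its progress.
resync_progress() {
    awk '
        /^d[0-9]+: Mirror/    { mirror = $1; sub(/:$/, "", mirror) }
        /Resync in progress:/ { print mirror, $4 "%" }
    '
}

# Sample lines trimmed from the metastat output above.
sample='d60: Mirror
    Submirror 0: d61
      State: Okay
    Submirror 1: d62
      State: Resyncing
    Resync in progress: 15 % done
d50: Mirror
    Submirror 0: d51
      State: Okay
    Submirror 1: d52
      State: Resyncing
    Resync in progress: 26 % done'

printf '%s\n' "$sample" | resync_progress
```

For the sample text this prints "d60 15%" and "d50 26%", one per line. The sketch relies only on the line layout visible in the listing, so treat it as a starting point and check it against the output of your own system.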
You can also use SVM's Simple Network Management Protocol (SNMP) trap generating daemon to work with a network monitoring console to automatically receive SVM error messages. Configure SVM's SNMP trap to trap the following instances:
The system administrator is now able to receive, and monitor, messages from SVM when an error condition or notable event occurs. All operations that affect SVM volumes are managed by the metadisk driver, which is described in the next section.
The metadisk driver, the driver used to manage SVM volumes, is implemented as a set of loadable pseudo device drivers. It uses other physical device drivers to pass I/O requests to and from the underlying devices. The metadisk driver operates between the file system and application interfaces and the device driver interface. It interprets information from both the UFS file system (or applications) and the physical device drivers. After passing through the metadisk driver, information is received in the expected form by both the file system and the device drivers. The metadisk driver is a loadable device driver, and it has all the same characteristics as any other disk device driver.
The volume name begins with "d" and is followed by a number. By default, there are 128 unique metadisk devices in the range of 0 to 127. Additional volumes, up to 8192, can be added to the kernel by editing the /kernel/drv/md.conf file. The meta block device accesses the disk using the system's normal buffering mechanism. There is also a character (or raw) device that provides for direct transmission between the disk and the user's read or write buffer. The names of the block devices are found in the /dev/md/dsk directory, and the names of the raw devices are found in the /dev/md/rdsk directory. The following is an example of a block and raw logical device name for metadevice d0:
/dev/md/dsk/d0  - block metadevice d0
/dev/md/rdsk/d0 - raw metadevice d0
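The naming rule can be sketched as a small shell function that turns a volume name into its two logical device paths. The name md_paths is hypothetical and is not an SVM command; the function only builds strings from the convention described above and does not check that the device exists.

```shell
#!/bin/sh
# Hypothetical helper (not an SVM command): print the block and raw device
# paths for a metadevice name such as d0.
md_paths() {
    name=$1
    num=${name#d}
    # Reject names that do not start with "d" or have nothing after it.
    if [ "$num" = "$name" ] || [ -z "$num" ]; then
        echo "not a metadevice name: $name" >&2
        return 1
    fi
    # Reject names where anything other than digits follows the "d".
    case $num in
        *[!0-9]*)
            echo "not a metadevice name: $name" >&2
            return 1
            ;;
    esac
    echo "/dev/md/dsk/$name"     # block device
    echo "/dev/md/rdsk/$name"    # raw (character) device
}

md_paths d0
```

The raw path is the one you would hand to a command such as newfs, and the block path is the one you would mount.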
You must have root access to administer SVM or have equivalent privileges granted through RBAC. (RBAC is described in Chapter 11, "Controlling Access and Configuring System Messaging.")
There are a number of SVM commands that will help you create, monitor, maintain and remove metadevices. All the commands are delivered with the standard Solaris 10 Operating Environment distribution. Table 10.5 briefly describes the function of the more frequently used commands that are available to the system administrator.
Where They Live: The majority of the SVM commands reside in the /usr/sbin directory, although you should be aware that metainit, metadb, metastat, metadevadm, and metarecover reside in /sbin; there are links to these commands in /usr/sbin as well.
No More metatool: You should note that the metatool command is no longer available in Solaris 10. Similar functionality (managing metadevices through a graphical utility) can be achieved using the Solaris Management Console (SMC), specifically the Enhanced Storage section.
Creating the State Database
The SVM state database contains vital information on the configuration and status of all volumes, hot spares, and disk sets. There are normally multiple copies of the state database, called replicas, and it is recommended that state database replicas be located on different physical disks, or even different controllers if possible, to provide added resilience.
SVM guarantees the integrity of the state database by applying a majority consensus algorithm to its replicas. The algorithm used by SVM for database replicas is as follows:
No Automatic Problem Detection: The SVM software does not detect problems with state database replicas until there is a change to an existing SVM configuration and an update to the database replicas is required. If insufficient state database replicas are available, you'll need to boot to single-user mode, and delete or replace enough of the corrupted or missing database replicas to achieve a quorum.
If a system crashes and corrupts a state database replica then the majority of the remaining replicas must be available and consistent; that is, half + 1. This is why at least three state database replicas must be created initially to allow for the majority algorithm to work correctly.
You also need to put some thought into the placement of your state database replicas. The following are some guidelines:
/sbin/metadb -h
/sbin/metadb [-s setname]
/sbin/metadb [-s setname] -a [-f] [-k system-file] mddbnn
/sbin/metadb [-s setname] -a [-f] [-k system-file] [-c number] \
    [-l length] slice...
/sbin/metadb [-s setname] -d [-f] [-k system-file] mddbnn
/sbin/metadb [-s setname] -d [-f] [-k system-file] slice...
/sbin/metadb [-s setname] -i
/sbin/metadb [-s setname] -p [-k system-file] [mddb.cf-file]
Table 10.6 describes the options available for the metadb command.
In the following example, I have reserved a slice (slice 4) on each of two disks to hold the copies of the state database, and I'll create two copies in each reserved disk slice, giving a total of four state database replicas. In this scenario, the failure of one disk drive will result in the loss of half of the operational state database replicas, but the system will continue to function. The system will panic only when more than half of the database replicas are lost. For example, if I had created only three database replicas and the drive containing two of the replicas fails, the system will panic.
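The replica arithmetic in these scenarios can be sketched in a few lines of shell. The function replica_status is hypothetical and only encodes the rules stated in this section: a majority is half + 1, the system keeps running while at least half of the replicas are available, and it panics when more than half are lost. It does not query SVM.

```shell
#!/bin/sh
# Hypothetical helper (not an SVM command).
# Usage: replica_status TOTAL AVAILABLE
replica_status() {
    total=$1
    available=$2
    majority=$(( total / 2 + 1 ))    # "half + 1", using integer division
    if [ $(( available * 2 )) -lt "$total" ]; then
        # More than half of the replicas are lost.
        echo "panic: fewer than half of the replicas are available"
    elif [ "$available" -lt "$majority" ]; then
        # Exactly half remain: the system runs, but there is no majority.
        echo "running without a majority: $majority replicas needed"
    else
        echo "ok: majority of $majority is met"
    fi
}

replica_status 4 2   # two disks, two replicas each, one disk fails
replica_status 3 1   # three replicas, the disk holding two of them fails
replica_status 4 3   # a single replica is lost
```

The first call reports that the system is running without a majority, the second reports a panic, and the third reports that the majority is met. Spreading replicas so that no single disk holds more than half of them keeps you out of the second case.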
To create the state database and its replicas, using the reserved disk slices, enter the following command:
# metadb -a -f -c2 c0t0d0s4 c0t1d0s4
Here, -a indicates a new database is being added, -f forces the creation of the initial database, -c2 indicates that two copies of the database are to be created, and the two cxtxdxsx entries describe where the state databases are to be physically located. The system returns the prompt; there is no confirmation that the database has been created.
The following example demonstrates how to remove the state database replicas from two disk slices, namely c0t0d0s4 and c0t1d0s4:
# metadb -d c0t0d0s4 c0t1d0s4
Monitoring the Status of the State Database
When the state database and its replicas have been created, you can use the metadb command, with no options, to see the current status. If you use the -i flag then you will also see a description of the status flags.
Examine the state database as shown here:
# metadb -i
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s4
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s4
     a    p  luo        16              8192            /dev/dsk/c0t1d0s4
     a    p  luo        8208            8192            /dev/dsk/c0t1d0s4
 r - replica does not have device relocation information
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors
Each line of output is divided into the following fields:
The last field in each state database listing is the path to the location of the state database replica.
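In the flag legend, every uppercase letter (W, M, D, F, S, R) describes a problem, which makes the output easy to screen with a script. The following is a minimal sketch, assuming the column layout shown in this chapter; replica_health is a hypothetical name, and the two failing sample lines are taken from the failure case discussed under "Recovering from State Database Problems."

```shell
#!/bin/sh
# Hypothetical helper (not an SVM command): read "metadb -i" style text on
# standard input and report each replica as ok or error.
replica_health() {
    awk '
        $NF ~ /^\/dev\// {
            flags = ""
            for (i = 1; i < NF; i++) {
                if ($i ~ /^[0-9]+$/) break    # reached the "first blk" column
                flags = flags $i
            }
            # Any uppercase flag from the legend marks a problem replica.
            state = (flags ~ /[WMDFSR]/) ? "error" : "ok"
            print $NF, flags, state
        }
    '
}

sample='        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s7
      M     p           16              unknown         /dev/dsk/c0t1d0s7
      M     p           8208            unknown         /dev/dsk/c0t1d0s7'

printf '%s\n' "$sample" | replica_health
```

The header and legend lines are skipped because their last field is not a /dev path. Only lines that end in a device path are classified.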
Recovering from State Database Problems
SVM requires at least half of the state database replicas to be available for the system to function correctly. When a disk fails or some of the state database replicas become corrupt, the affected replicas must be removed with the system in single-user mode so that the system can boot correctly. When the system is operational again (albeit with fewer state database replicas), additional replicas can be created.
The following example shows a system with two disks, each with two state database replicas on slices c0t0d0s7 and c0t1d0s7.
If we run metadb -i, we can see that the state database replicas are all present and working correctly:
# metadb -i
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s7
     a    p  luo        16              8192            /dev/dsk/c0t1d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t1d0s7
 r - replica does not have device relocation information
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors
If the replicas on c0t1d0s7 later develop a problem, metadb -i flags both of them with M (a problem with the master blocks) and reports their block count as unknown:
# metadb -i
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s7
      M     p           16              unknown         /dev/dsk/c0t1d0s7
      M     p           8208            unknown         /dev/dsk/c0t1d0s7
 r - replica does not have device relocation information
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors
When the system is rebooted, the following messages appear:
Insufficient metadevice database replicas located.
Use metadb to delete databases which are broken.
Ignore any Read-only file system error messages.
Reboot the system when finished to reload the metadevice database.
After reboot, repair any broken database replicas which were deleted.
To repair the situation, you will need to be in single-user mode, so boot the system with -s and then remove the failed state database replicas on c0t1d0s7.
# metadb -d c0t1d0s7
Now reboot the system again; it will boot with no problems, although you now have fewer state database replicas. You can then repair the failed disk and re-create the metadevice state database replicas.
Creating a Concatenated Volume
/sbin/metainit -h
/sbin/metainit [generic options] concat/stripe numstripes
/sbin/metainit [generic options] mirror -m submirror
/sbin/metainit [generic options] RAID -r component... [-i interlace]
/sbin/metainit [generic options] -a
/sbin/metainit [generic options] softpart -p [-e] component size
/sbin/metainit -r
Table 10.7 describes the options available for the metainit command.
# metainit -f d100 1 1 c0t0d0s5
d100: Concat/Stripe is setup
Monitoring the Status of a Volume
/usr/sbin/metastat -h
/usr/sbin/metastat [-a] [-B] [-c] [-i] [-p] [-q] [-s setname] component
Table 10.8 describes the options for the metastat command.
In the following example, the metastat command is used to display the status of a single metadevice, d100:
# metastat d100
d100: Concat/Stripe
    Size: 10489680 blocks (5.0 GB)
    Stripe 0:
        Device     Start Block  Dbase  State  Reloc  Hot Spare
        c0t0d0s5   0            No     Okay   Yes

Device Relocation Information:
Device   Reloc  Device ID
c0t1d0   Yes    id1,dad@ASAMSUNG_SP0411N=S01JJ60X901935
# metastat -c d100
d100             s  5.0GB c0t0d0s5
Creating a Soft Partition
Soft partitions are used to divide large partitions into smaller areas, or extents, without the limitations imposed by hard slices. The soft partition is created by specifying a start block and a block size. Soft partitions differ from hard slices created using the format command because soft partitions can be non-contiguous, whereas a hard slice is contiguous. Therefore, soft partitions can cause I/O performance degradation.
A soft partition can be built on a disk slice or another SVM volume, such as a concatenated device. You'll create soft partitions using the SVM command metainit. For example, let's say that we have a hard slice named c2t1d0s1 that is 10GB in size and was created using the format command. To create a soft partition named d10 which is 1GB in size, and assuming that you've already created the required database replicas, issue the following command:
# metainit d10 -p c2t1d0s1 1g
d10: Soft Partition is setup
View the soft partition using the metastat command:
# metastat d10
d10: Soft Partition
    Device: c2t1d0s1
    State: Okay
    Size: 2097152 blocks (1.0 GB)
        Device      Start Block  Dbase  Reloc
        c2t1d0s1    25920        Yes    Yes

        Extent      Start Block  Block count
             0      25921        2097152

Device Relocation Information:
Device   Reloc  Device ID
c2t1d0   Yes    id1,sd@SIBM_____DDRS34560SUN4.2G564442__________
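The Size line pairs a block count with a rounded size. As a quick check of the arithmetic, the sketch below converts a block count to megabytes, assuming 512-byte disk blocks; that block size is an assumption stated here rather than something shown in the listing, and blocks_to_mb is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not an SVM command): convert a count of 512-byte
# disk blocks to whole megabytes (1 MB = 1048576 bytes), rounding down.
blocks_to_mb() {
    echo $(( $1 * 512 / 1048576 ))
}

blocks_to_mb 2097152     # the 1GB soft partition d10
blocks_to_mb 10462032    # a 5.0 GB mirror from earlier in the chapter
```

The two calls print 1024 and 5108, which agree with the 1.0 GB and 5.0 GB figures that metastat reports after rounding.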
Create a file system on the soft partition using the newfs command as follows:
# newfs /dev/md/rdsk/d10
Now you can mount a directory named /data onto the soft partition as follows:
# mount /dev/md/dsk/d10 /data
To remove the soft partition named d10, unmount the file system that is mounted to the soft partition and issue the metaclear command as follows:
# metaclear d10
Removing the soft partition destroys all data that is currently stored on that partition.
The system responds with
d10: Soft Partition is cleared
Expanding an SVM Volume
With SVM, you can increase the size of a file system while it is active and without unmounting the file system. The process of expanding a file system consists of first increasing the size of the SVM volume, and then growing the file system that has been created on the partition. In Step by Step 10.1, I'll increase the size of a soft partition and the file system mounted on it.
Soft partitions can be built on top of concatenated devices, and you can increase a soft partition as long as there is room on the underlying metadevice. For example, you can't increase a 1GB soft partition if the metadevice on which it is currently built is only 1GB in size. However, you could add another slice to the underlying metadevice d9.
In Step by Step 10.2 we will create an SVM device on c2t1d0s1 named d9 that is 4GB in size. We then will create a 3GB soft partition named d10 built on this device. To add more space to d10 beyond what d9 can hold, we first need to increase the size of d9 itself, as described in the Step by Step.
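The sizing rule behind this example can be sketched as simple arithmetic. The function can_grow is hypothetical, works in megabytes, and ignores the small amount of space SVM uses for soft partition bookkeeping, so treat its answer as an estimate rather than an exact limit.

```shell
#!/bin/sh
# Hypothetical helper (not an SVM command).
# Usage: can_grow UNDERLYING_MB ALLOCATED_MB EXTRA_MB
# Reports whether EXTRA_MB more space fits on the underlying volume.
can_grow() {
    underlying_mb=$1
    allocated_mb=$2
    extra_mb=$3
    free_mb=$(( underlying_mb - allocated_mb ))
    if [ "$extra_mb" -le "$free_mb" ]; then
        echo "fits: $free_mb MB free on the underlying volume"
    else
        echo "too large: only $free_mb MB free; grow the underlying volume first"
    fi
}

can_grow 4096 3072 1024   # d9 is 4GB, d10 is 3GB, add 1GB
can_grow 4096 3072 2048   # add 2GB: d9 must be expanded first
can_grow 1024 1024 512    # a 1GB soft partition on a 1GB metadevice
```

Only the first request fits. The other two correspond to the cases where the underlying metadevice has to be enlarged before the soft partition can grow.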
Creating a Mirror
A mirror is a logical volume that consists of more than one metadevice, also called a submirror. In this example, there are two physical disks: c0t0d0 and c0t1d0. Slice 5 is free on both disks, which will comprise the two submirrors, d12 and d22. The logical mirror will be named d2; it is this device that will be used when a file system is created. Step by Step 10.3 details the whole process:
Unmirroring a Non-Critical File System
This section details the procedure for removing a mirror on a file system that can be removed and remounted without having to reboot the system. Step by Step 10.4 shows how to achieve this. This example uses a file system, /test, that is currently mirrored using the metadevice, d2; a mirror that consists of d12 and d22. The underlying disk slice for this file system is /dev/dsk/c0t0d0s5:
Mirroring the Root File System
In this section we will create another mirror, but this time it will be the root file system. This is different from Step by Step 10.3 because we are mirroring an existing file system that cannot be unmounted. We can't do this while the file system is mounted, so we'll configure the metadevice and a reboot will be necessary to implement the logical volume and to update the system configuration file. The objective is to create a two-way mirror of the root file system, currently residing on /dev/dsk/c0t0d0s0. We will use a spare disk slice of the same size, /dev/dsk/c0t1d0s0, for the second submirror. The mirror will be named d0, and the submirrors will be d10 and d20. Additionally, because this is the root (/) file system, we'll also configure the second submirror as an alternate boot device, so that this second slice can be used to boot the system if the primary slice becomes unavailable. Step by Step 10.5 shows the procedure to follow:
Unmirroring the Root File System
Unlike Step by Step 10.4, where a file system was unmirrored and remounted without affecting the operation of the system, unmirroring a root file system is different because it cannot be unmounted while the system is running. In this case, it is necessary to perform a reboot to implement the change. Step by Step 10.6 shows how to unmirror the root file system that was successfully mirrored in Step by Step 10.5. This example comprises a mirror, d0, consisting of two submirrors, d10 and d20. The objective is to remount the / file system using its full disk device name, /dev/dsk/c0t0d0s0, instead of using /dev/md/dsk/d0:
Troubleshooting Root File System Mirrors
Occasionally, a root mirror fails and recovery action has to be taken. Often, only one side of the mirror fails, in which case it can be detached using the metadetach command. You then replace the faulty disk and reattach it. Sometimes, though, a more serious problem occurs that prevents you from booting the system with SVM present. In this case, you have two options: temporarily remove the SVM configuration so that you boot from the original c0t0d0s0 device, or boot from a CD-ROM and recover the root file system manually by running fsck.
To disable SVM, you must reinstate pre-SVM copies of the files /etc/system and /etc/vfstab. In Step by Step 10.5 we took a copy of these files (step 5). This is good practice and should always be done when editing important system files. Copy these files again, to take a current backup, and then copy the originals back to make them operational, as shown here:
# cp /etc/system /etc/system.svm
# cp /etc/vfstab /etc/vfstab.svm
# cp /etc/system.nosvm /etc/system
# cp /etc/vfstab.nosvm /etc/vfstab
You should now be able to reboot the system to single-user without SVM and recover any failed file systems.
If the preceding does not work, it might be necessary to repair the root file system manually, requiring you to boot from a CD-ROM. Insert the Solaris 10 CD 1 disk (or the Solaris 10 DVD) and shut down the system if it is not already shut down.
Boot to single-user from the CD-ROM as follows:
ok boot cdrom -s
When the system prompt is displayed, you can manually run fsck on the root file system. In this example, I am assuming a root file system exists on /dev/rdsk/c0t0d0s0:
# fsck /dev/rdsk/c0t0d0s0
** /dev/rdsk/c0t0d0s0
** Last Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? y
136955 files, 3732764 used, 1404922 free (201802 frags, 150390 blocks, \
3.9% fragmentation)
***** FILE SYSTEM WAS MODIFIED *****
You should now be able to reboot the system using SVM and you should resynchronize the root mirror as soon as the system is available. This can be achieved easily by detaching the second submirror and then reattaching it. The following example shows a mirror d0 consisting of d10 and d20:
# metadetach d0 d20
d0: submirror d20 is detached
# metattach d0 d20
d0: submirror d20 is attached
To demonstrate that the mirror is performing a resynchronization operation, you can issue the metastat command as follows, which will show the progress as a percentage:
# metastat d0
d0: Mirror
    Submirror 0: d10
      State: Okay
    Submirror 1: d20
      State: Resyncing
    Resync in progress: 37 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 10462032 blocks (5.0 GB)

d10: Submirror of d0
    State: Okay
    Size: 10462032 blocks (5.0 GB)
    Stripe 0:
        Device     Start Block  Dbase  State  Reloc  Hot Spare
        c0t0d0s0   0            No     Okay   Yes

d20: Submirror of d0
    State: Resyncing
    Size: 10489680 blocks (5.0 GB)
    Stripe 0:
        Device     Start Block  Dbase  State  Reloc  Hot Spare
        c0t1d0s0   0            No     Okay   Yes

Device Relocation Information:
Device   Reloc  Device ID
c0t0d0   Yes    id1,dad@AWDC_AC310200R=WD-WT6750311269
c0t1d0   Yes    id1,dad@ASAMSUNG_SP0411N=S01JJ60X901935