| Red Hat Linux 8.0: The Official Red Hat Linux System Administration Primer |
|---|
| Chapter 5. Managing Storage |
RAID-Based Storage

One skill that a system administrator should cultivate is the
ability to look at complex system configurations and observe the
different shortcomings inherent in each configuration. While this
might, at first glance, seem to be a rather depressing viewpoint to
take, it can be a great way to look beyond the shiny new boxes to some
future Saturday night with all production down due to a failure that
could easily have been avoided.

With this in mind, let us use what we now know about disk-based
storage and see if we can determine the ways that disk drives can cause
problems. First, consider an outright hardware failure:

A disk drive with four partitions on it dies completely: what
happens to the data on those partitions? It is immediately unavailable
(at least until it can be restored from a recent backup, that
is).

A disk drive with a single partition on it is operating at the
limits of its design due to massive I/O loads: what happens to
applications that require access to the data on that partition? The
applications slow down because the disk drive cannot process reads and
writes any faster.

You have a large data file that is slowly growing in size; soon it
will be larger than the largest disk drive available for your system.
What happens then? The data file stops growing, and its associated
applications stop running.

Just one of these problems could cripple a data center, yet system
administrators must face these kinds of issues every day. What can be
done?

Fortunately, there is one technology that can address each one of
these issues. And the name for that technology is RAID.

Basic Concepts

RAID is an acronym standing for Redundant Array of Independent
Disks[1]. As the name implies, RAID is a way
for multiple disk drives to act as a single disk drive.

RAID techniques were first developed by researchers at the
University of California, Berkeley in the mid-1980s. At the time,
there was a large gap in price between the high-performance disk
drives used on the large computer installations of the day, and the
smaller, slower disk drives used by the still-young personal computer
industry. RAID was viewed as a method of having many less expensive
disk drives fill in for higher-priced hardware.

More importantly, RAID arrays can be constructed in different
ways, and will have different characteristics depending on the final
configuration. Let us look at the different configurations (known as
RAID levels) in more detail.

RAID Levels

The Berkeley researchers originally defined five different RAID
levels and numbered them "1" through "5". In time, additional RAID
levels were defined by other researchers and members of the storage
industry. Not all RAID levels were equally useful; some were of
interest only for research purposes, and others could not be
economically implemented.

In the end, there were three RAID levels that ended up seeing
widespread usage:

- Level 0
- Level 1
- Level 5

The following sections will discuss each of these levels in more
detail.

RAID 0

The name of the disk configuration known as RAID level 0 is a bit
misleading, as this is the only RAID level that employs absolutely
no redundancy. However, even though RAID 0 has no advantages from
a reliability standpoint, it does have other advantages.

A RAID 0 array consists of two or more disk drives. The
drives are divided into chunks, each of which
represents some multiple of the drives' native block size. Data
written to the array will be written, chunk by chunk, to each
drive in the array. The chunks can be thought of as forming
stripes across each drive in the array; hence the other term for
RAID 0: striping.

For example, with a two-drive array and a 4KB chunk size,
writing 12KB of data to the array would result in the data being
written in three 4KB chunks to the following drives:

- The first 4KB would be written to the first drive, into
  the first chunk
- The second 4KB would be written to the second drive, into
  the first chunk
- The last 4KB would be written to the first drive, into the
  second chunk
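
The chunk placement in the example above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular RAID implementation is written; the function names `locate_chunk` and `stripe_layout` are made up for this sketch, and drives and chunks are counted from zero rather than from "first".

```python
KB = 1024


def locate_chunk(offset_bytes, num_drives, chunk_size):
    """Map a byte offset in a striped array to (drive index, chunk index on that drive).

    Hypothetical helper for illustration; indices are zero-based.
    """
    if offset_bytes < 0 or num_drives < 2 or chunk_size <= 0:
        raise ValueError("need a non-negative offset, 2+ drives, and a positive chunk size")
    logical_chunk = offset_bytes // chunk_size    # chunk number within the whole array
    drive = logical_chunk % num_drives            # chunks rotate across the drives
    chunk_on_drive = logical_chunk // num_drives  # one full rotation fills one stripe
    return drive, chunk_on_drive


def stripe_layout(total_bytes, num_drives, chunk_size):
    """List the (drive, chunk) location of each chunk-sized piece of a write."""
    return [
        locate_chunk(offset, num_drives, chunk_size)
        for offset in range(0, total_bytes, chunk_size)
    ]


# The 12KB write from the text: two drives, 4KB chunks.
layout = stripe_layout(12 * KB, num_drives=2, chunk_size=4 * KB)
print(layout)  # [(0, 0), (1, 0), (0, 1)]
```

Read with zero-based indices, the result matches the list above: first drive and first chunk, second drive and first chunk, then back to the first drive for its second chunk.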
Advantages to RAID 0

Compared to a single disk drive, the advantages to RAID 0
are:

- Larger total size: RAID 0 arrays can be constructed
  that are larger than a single disk drive, making it easier
  to store larger data files
- Better read/write performance: The I/O load on a RAID
  0 array will be spread evenly among all the drives in the
  array
- No wasted space: All available storage on all drives
  in the array is available for data storage
Disadvantages to RAID 0

Compared to a single disk drive, RAID 0 has the following
disadvantage:

- Less reliability: Every drive in a RAID 0 array must be
  operational for the array to be available

| Tip |
|---|
| If you have trouble keeping the different RAID levels straight, just remember that RAID 0 has zero percent redundancy. |
RAID 1

RAID 1 uses two (although some implementations support more)
identical disk drives. All data is written to both drives, making
them identical copies of each other. That is why RAID 1 is often
known as mirroring.

Whenever data is written to a RAID 1 array, two physical
writes must take place: one to the first drive, and one to the second.
Reading data, on the other hand, only needs to take place once and
either drive in the array can be used.

Advantages to RAID 1

Compared to a single disk drive, a RAID 1 array has the
following advantages:

- Improved redundancy: Even if one drive in the array
  were to fail, the data would still be accessible
- Improved read performance: With both drives
  operational, reads can be evenly split between them

Disadvantages to RAID 1

When compared to a single disk drive, a RAID 1 array has
some disadvantages:

- Reduced write performance: Because both drives must be
  kept up-to-date, all write I/O must be performed by both
  drives, slowing the overall process of writing data to the
  array
- Reduced cost efficiency: With one entire drive
  dedicated to redundancy, the cost of a RAID 1 array is at
  least double that of a single drive
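
The trade-off described above (every write costs one physical write per drive, while a read needs only one working drive) can be seen in a toy model. The `MirroredArray` class below is purely hypothetical and keeps its "drives" as in-memory dictionaries; it is a sketch of the idea, not of any real RAID 1 driver.

```python
class MirroredArray:
    """Toy RAID 1 model: each drive is a dict mapping block number to data."""

    def __init__(self, num_drives=2):
        if num_drives < 2:
            raise ValueError("a mirror needs at least two drives")
        self.drives = [dict() for _ in range(num_drives)]
        self.failed = set()        # indices of drives that have failed
        self.physical_writes = 0   # counts writes actually sent to drives

    def write(self, block, data):
        # Every working drive receives its own copy of the data.
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                drive[block] = data
                self.physical_writes += 1

    def read(self, block):
        # Any single working drive can satisfy the read.
        for index, drive in enumerate(self.drives):
            if index not in self.failed:
                return drive[block]
        raise IOError("all drives in the mirror have failed")

    def fail(self, index):
        self.failed.add(index)


array = MirroredArray()
array.write(0, b"payroll")
print(array.physical_writes)  # 2: one logical write became two physical writes
array.fail(0)
print(array.read(0))          # b'payroll': the surviving drive still has the data
```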
RAID 5

RAID 5 attempts to combine the benefits of RAID 0 and RAID 1,
while minimizing their respective disadvantages.

Like RAID 0, a RAID 5 array consists of multiple disk drives,
each divided into chunks. This allows a RAID 5 array to be larger
than any single drive. And like a RAID 1 array, a RAID 5 array
uses some disk space in a redundant fashion, improving
reliability.

However, the way RAID 5 works is unlike either RAID 0 or
1.

A RAID 5 array must consist of at least three
identically-sized disk drives (although more drives may be used).
Each drive is divided into chunks and data is written to the
chunks in order. However, not every chunk is dedicated to data
storage as it is in RAID 0. Instead, in an array with
n disk drives in it, every
nth chunk is dedicated to
parity. Chunks containing parity make it possible to recover data
should one of the drives in the array fail. The parity in chunk
x is calculated by mathematically
combining the data from each chunk x
stored on all the other drives in the array. If the data in a
chunk is updated, the corresponding parity chunk must be
recalculated and updated as well.

This also means that every time data is written to the array,
two drives are written to: the drive holding
the data, and the drive containing the parity chunk.

One key point to keep in mind is that the parity chunks are
not concentrated on any one drive in the array. Instead, they are
spread evenly through all the drives. Even though dedicating a
specific drive to contain nothing but parity is possible (and, in
fact, this configuration is known as RAID level 4), the constant
updating of parity as data is written to the array would mean that
the parity drive could become a performance bottleneck. By
spreading the parity information throughout the array, this impact
is reduced.

Advantages to RAID 5

Compared to a single drive, a RAID 5 array has the following
advantages:

- Improved redundancy: If one drive in the array fails,
  the parity information can be used to reconstruct the
  missing data chunks, all while keeping the data available
  for use
- Improved read performance: Due to the RAID 0-like way
  data is divided between drives in the array, read I/O
  activity is spread evenly between all the drives
- Reasonably good cost efficiency: For a RAID 5 array of
  n drives, only
  1/nth of the total available
  storage is dedicated to redundancy
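
To make the parity idea more concrete, here is a small Python sketch. The text above only says that parity is calculated by "mathematically combining" the data chunks; the sketch assumes the combination is a byte-wise XOR, which is the operation commonly used for this purpose, and the helper names `xor_chunks` and `usable_fraction` are invented for the example.

```python
from functools import reduce


def xor_chunks(chunks):
    """Byte-wise XOR of equally sized chunks (assumed parity operation)."""
    if not chunks:
        raise ValueError("need at least one chunk")
    size = len(chunks[0])
    if any(len(chunk) != size for chunk in chunks):
        raise ValueError("all chunks must be the same size")
    # zip(*chunks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def usable_fraction(num_drives):
    """Fraction of raw space left for data when 1/n of it holds parity."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (num_drives - 1) / num_drives


# One stripe of a three-drive array: two data chunks plus their parity.
data = [b"AAAA", b"BBBB"]
parity = xor_chunks(data)

# Suppose the drive holding data[1] fails. XORing the surviving data
# chunk with the parity chunk yields the missing chunk again.
rebuilt = xor_chunks([data[0], parity])
print(rebuilt == data[1])   # True
print(usable_fraction(4))   # 0.75
```

The same XOR that produces the parity also recovers a lost chunk, which is why a single failed drive can be tolerated but two cannot.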
Disadvantages to RAID 5

Compared to a single drive, a RAID 5 array has the following
disadvantage:

- Reduced write performance: Because each write to the array
  results in two writes to the physical drives (one for the
  data and one for the parity), write performance is worse
  than that of a single drive

Nested RAID Levels

As should be obvious from the discussion of the various RAID
levels, each level has specific strengths and weaknesses. It was
not long before people began to wonder whether different RAID
levels could somehow be combined, producing arrays with all of the
strengths and none of the weaknesses of the original levels.

For example, what if the disk drives in a RAID 0 array were
actually RAID 1 arrays? This would give the advantages of RAID
0's speed, with the reliability of RAID 1.

This is just the kind of thing that can be done. Here are the
most commonly-nested RAID levels:

- RAID 1+0
- RAID 5+0
- RAID 5+1

Because nested RAID is used in more specialized environments,
we will not go into greater detail here. However, there are two
points to keep in mind when thinking about nested RAID:

- Order matters: The order in which RAID levels are nested
  can have a large impact on reliability. In other words, RAID
  1+0 and RAID 0+1 are not the same
- Costs can be high: If there is any disadvantage common
  to all nested RAID implementations, it is one of cost; the
  smallest possible RAID 5+1 array is six disk drives (and even
  more drives will be required for larger arrays)
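
The "order matters" point can be checked with a small enumeration. The sketch below assumes a deliberately simplified four-drive model: RAID 1+0 is treated as a stripe across two mirrored pairs, and RAID 0+1 as a mirror of two two-drive stripes, with a stripe considered lost as soon as any one of its drives fails. Real implementations may behave differently, and all names here are hypothetical.

```python
from itertools import combinations

# Four drives, grouped the same way for both layouts.
GROUPS = [(0, 1), (2, 3)]


def raid10_survives(failed):
    """Stripe over mirrored pairs: each pair needs at least one working drive."""
    return all(any(d not in failed for d in pair) for pair in GROUPS)


def raid01_survives(failed):
    """Mirror of stripes: at least one stripe must have all of its drives working."""
    return any(all(d not in failed for d in stripe) for stripe in GROUPS)


def surviving_cases(survives, num_failures):
    """Return (failure combinations survived, total combinations)."""
    drives = [d for group in GROUPS for d in group]
    cases = list(combinations(drives, num_failures))
    survived = sum(1 for failed in cases if survives(set(failed)))
    return survived, len(cases)


print(surviving_cases(raid10_survives, 2))  # (4, 6)
print(surviving_cases(raid01_survives, 2))  # (2, 6)
```

Under these assumptions both layouts survive any single drive failure, but RAID 1+0 survives four of the six possible two-drive failures while RAID 0+1 survives only two.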
Now that we have explored the concepts behind RAID, let us see
how RAID can be implemented.

RAID Implementations

It is obvious from the previous sections that RAID requires
additional "intelligence" over and above the usual disk I/O
processing for individual drives. At the very least, the following
tasks must be performed:

- Dividing incoming I/O requests among the individual disks in
  the array
- Calculating parity (for RAID 5), and writing it to the
  appropriate drive in the array
- Monitoring the individual disks in the array and taking
  the appropriate actions should one fail
- Controlling the rebuilding of an individual disk in the
  array, when that disk has been replaced or repaired
- Providing a means to allow administrators to maintain the
  array (removing and adding drives, initiating and halting
  rebuilds, etc.)
Fortunately, there are two major methods that may be used to
accomplish these tasks. The next two sections will describe
them.

Hardware RAID

A hardware RAID implementation usually takes the form of a
specialized disk controller card. The card performs all
RAID-related functions and directly controls the individual drives
in the arrays attached to it. With the proper driver,
the arrays managed by a hardware RAID card appear to the host
operating system just as if they were regular disk drives.

Most RAID controller cards work with SCSI drives, although
there are some IDE-based RAID controllers as well. In any case,
the administrative interface is usually implemented in one of three
ways:

- Specialized utility programs that run as applications
  under the host operating system
- An on-board interface using a serial port that is
  accessed using a terminal emulator
- A BIOS-like interface that is only accessible during the
  system's power-up testing
Some RAID controllers have more than one type of
administrative interface available. For obvious reasons, a
software interface provides the most flexibility, as it allows
administrative functions while the operating system is running.
However, if you are going to boot Red Hat Linux from a RAID controller, an
interface that does not require a running operating system is a
requirement.

Because there are so many different RAID controller cards on
the market, it is impossible to go into further detail here. The
best course of action is to read the manufacturer's documentation
for more information.

Software RAID

Software RAID is simply RAID implemented as kernel- or
driver-level software for a particular operating system. As such,
it provides more flexibility in terms of hardware support: as
long as the hardware is supported by the operating system, RAID
arrays can be configured and deployed. This can dramatically
reduce the cost of deploying RAID by eliminating the need for
expensive, specialized RAID hardware.

Because Red Hat Linux includes support for software RAID, the
remainder of this section will describe how it may be configured
and deployed.

Creating RAID Arrays

Under Red Hat Linux there are two ways that RAID arrays can be
created:

- While installing Red Hat Linux
- Manually, after Red Hat Linux has been installed

We will next look into these two methods.

While Installing Red Hat Linux

During the normal Red Hat Linux installation process, RAID arrays can be
created. This is done during the disk partitioning phase of the
installation. To begin, you must manually partition your disk
drives using Disk Druid. You will first
need to create partitions of the type "software RAID". These
partitions will later be combined to form the desired RAID
arrays.

Once you have created all the partitions required for the RAID
array(s) that you wish to create, you must then use the
RAID button to actually create the arrays.
You will be presented with a dialog box where you select the array's
mount point, file system type, RAID device name, RAID level, and the
"software RAID" partitions on which this array will be based.

Once the desired arrays have been created, the installation
process continues as usual.

| Tip |
|---|
| For more information on creating software RAID arrays during the Red Hat Linux installation process, refer to the Official Red Hat Linux Customization Guide. |
After Red Hat Linux Has Been Installed

Creating a RAID array after Red Hat Linux has been installed is a bit
more complex. As with the addition of any type of disk storage, the
necessary hardware must first be installed and properly configured.

Partitioning is a bit different for RAID than it is for single disk
drives. Instead of selecting a partition type of "Linux" (type 83)
or "Linux swap" (type 82), all partitions that will be part of a
RAID array must be set to "Linux raid auto" (type fd).

Next, it is
necessary to create the /etc/raidtab file.
This file is responsible for the proper configuration of all RAID
arrays on your system. The file format (which is documented in the
raidtab man page) is relatively
straightforward. Here is an example
/etc/raidtab entry for a RAID 1 array:

    raiddev /dev/md0
    raid-level 1
    nr-raid-disks 2
    chunk-size 64k
    persistent-superblock 1
    nr-spare-disks 0
    device /dev/hda2
    raid-disk 0
    device /dev/hdc2
    raid-disk 1
Some of the more notable sections in this entry are:

- raiddev: Shows the special device file name for the RAID
  array[3]
- raid-level: Defines the RAID level to be used by this RAID
  array
- nr-raid-disks: Indicates how many physical disk partitions
  are to be part of this array
- nr-spare-disks: Software RAID under Red Hat Linux allows the
  definition of one or more spare disk partitions; these
  partitions can automatically take the place of a
  malfunctioning disk
- device, raid-disk: Together, they define the physical disk
  partitions that will make up the RAID array
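
As an illustration of how the keywords in the entry relate to one another, the following Python sketch reads the example entry into a simple data structure. It is not part of the RAID tools and covers only the keywords shown in the example: a `raiddev` line starts a new array, each `device` line is paired with the `raid-disk` line that follows it, and everything else is stored as a plain option. Other per-device keywords are not handled, and `parse_raidtab` and `SAMPLE_RAIDTAB` are hypothetical names.

```python
SAMPLE_RAIDTAB = """\
raiddev /dev/md0
raid-level 1
nr-raid-disks 2
chunk-size 64k
persistent-superblock 1
nr-spare-disks 0
device /dev/hda2
raid-disk 0
device /dev/hdc2
raid-disk 1
"""


def parse_raidtab(text):
    """Parse raidtab-style 'keyword value' lines into a list of array dicts."""
    arrays = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError("expected 'keyword value': %r" % raw_line)
        key, value = parts[0], parts[1].strip()
        if key == "raiddev":
            # Each raiddev line begins the description of a new array.
            current = {"raiddev": value, "options": {}, "disks": []}
            arrays.append(current)
        elif current is None:
            raise ValueError("keyword %r appears before any raiddev line" % key)
        elif key == "device":
            current["disks"].append({"device": value, "raid-disk": None})
        elif key == "raid-disk":
            # A raid-disk line numbers the device line just before it.
            if not current["disks"] or current["disks"][-1]["raid-disk"] is not None:
                raise ValueError("raid-disk without a preceding device line")
            current["disks"][-1]["raid-disk"] = int(value)
        else:
            current["options"][key] = value
    return arrays


array = parse_raidtab(SAMPLE_RAIDTAB)[0]
print(array["raiddev"], array["options"]["raid-level"])
for disk in array["disks"]:
    print(disk["raid-disk"], disk["device"])
```

A check such as comparing `int(array["options"]["nr-raid-disks"])` with `len(array["disks"])` is an easy way to catch an entry whose disk count and device list disagree.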
Next, it is necessary to actually create the RAID array. This
is done with the mkraid program. Using our
example /etc/raidtab file, we would create the
/dev/md0 RAID array with the following
command:

    mkraid /dev/md0

The RAID array /dev/md0 is now ready to be
formatted and mounted. This process is no different than the single
drive approach outlined in the Section called Partitioning and the Section called Formatting the Partition(s).

Day to Day Management of RAID Arrays

There is little that needs to be done to keep a RAID array
operating. As long as no hardware problems crop up, the array should
function just as if it were a single physical disk drive.

However, just as a system administrator should periodically check
the status of all disk drives on the system, the RAID arrays should be
checked as well.

Checking Array Status With /proc/mdstat

Reading the file /proc/mdstat is the easiest way to
check on the status of all RAID arrays on a particular system. Here
is a sample mdstat (view with the command
cat /proc/mdstat):

    Personalities : [raid1]
    read_ahead 1024 sectors
    md3 : active raid1 hda4[0] hdc4[1]
          73301184 blocks [2/2] [UU]
    md1 : active raid1 hda3[0] hdc3[1]
          522048 blocks [2/2] [UU]
    md0 : active raid1 hda2[0] hdc2[1]
          4192896 blocks [2/2] [UU]
    md2 : active raid1 hda1[0] hdc1[1]
          128384 blocks [2/2] [UU]
    unused devices: <none>
On this system, there are four RAID arrays (all RAID 1). Each
RAID array has its own section in /proc/mdstat
and contains the following information:

- The RAID array device name (minus
  /dev/)
- The status of the RAID array
- The RAID array's RAID level
- The physical partitions that currently make up the array
  (followed by the partition's array unit number)
- The size of the array
- The number of configured devices versus the number of
  operative devices in the array
- The status of each configured device in the array
  (U meaning the device is OK,
  and _ indicating that the
  device has failed)
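
The fields listed above can also be picked out programmatically. The following Python sketch parses text in the layout of the sample shown earlier and reports any array whose status flags contain an underscore. It assumes exactly that two-line layout per array; other kernel versions may print additional fields, so treat the regular expressions and the names `parse_mdstat` and `SAMPLE_MDSTAT` as illustrative only.

```python
import re

SAMPLE_MDSTAT = """\
Personalities : [raid1]
read_ahead 1024 sectors
md3 : active raid1 hda4[0] hdc4[1]
      73301184 blocks [2/2] [UU]
md1 : active raid1 hda3[0] hdc3[1]
      522048 blocks [2/2] [UU]
md0 : active raid1 hda2[0] hdc2[1]
      4192896 blocks [2/2] [UU]
md2 : active raid1 hda1[0] hdc1[1]
      128384 blocks [2/2] [UU]
unused devices: <none>
"""

# "md0 : active raid1 hda2[0] hdc2[1]"
ARRAY_RE = re.compile(r"^(md\d+) : (\w+) (\w+) (.+)$")
# "4192896 blocks [2/2] [UU]"
STATUS_RE = re.compile(r"^\s*(\d+) blocks \[(\d+)/(\d+)\] \[([U_]+)\]")


def parse_mdstat(text):
    """Return one dict per array found in mdstat-style text."""
    arrays = []
    lines = text.splitlines()
    for i, line in enumerate(lines[:-1]):
        header = ARRAY_RE.match(line.strip())
        status = STATUS_RE.match(lines[i + 1])
        if not header or not status:
            continue
        arrays.append({
            "name": header.group(1),
            "state": header.group(2),
            "level": header.group(3),
            # Each member looks like "hda2[0]": partition plus unit number.
            "members": re.findall(r"(\w+)\[(\d+)\]", header.group(4)),
            "blocks": int(status.group(1)),
            "configured": int(status.group(2)),
            "operative": int(status.group(3)),
            "flags": status.group(4),
            "degraded": "_" in status.group(4),
        })
    return arrays


for array in parse_mdstat(SAMPLE_MDSTAT):
    label = "DEGRADED" if array["degraded"] else "ok"
    print(array["name"], array["level"], array["flags"], label)
```

On a live system the same function could be fed the contents of /proc/mdstat instead of the sample string.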
Rebuilding a RAID array with raidhotadd

Should /proc/mdstat show that a problem
exists with one of the RAID arrays, the
raidhotadd utility program should be used to
rebuild the array. Here are the steps that would need to be
performed:

1. Determine which disk drive contains the failed partition
2. Correct the problem that caused the failure (most likely by
   replacing the drive)
3. Partition the new drive so that the partitions on it are
   identical to those on the other drive(s) in
   the array
4. Issue the following command:

       raidhotadd <raid-device> <disk-partition>

5. Monitor /proc/mdstat to watch the
   rebuild take place
| Tip |
|---|
| Here is a command that can be used to watch the rebuild as it takes place: watch -n1 cat /proc/mdstat |
Disclaimer: For authoritative source or latest update to this
documentation, please refer to http://www.redhat.com/docs/manuals/linux/ |