| Red Hat Linux 8.0: The Official Red Hat Linux System Administration Primer |
|---|
| Chapter 8. Planning for Disaster |
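Backups

Backups have two major purposes:

- To permit restoration of individual files
- To permit wholesale restoration of entire file systems

The first purpose is the basis for the typical file restoration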
request: a user accidentally deletes a file, and asks that it be
restored from the latest backup. The exact circumstances may vary
somewhat, but this is the most common day-to-day use for backups.

The second situation is a system administrator's worst nightmare:
for whatever reason, the system administrator is staring at hardware
that used to be a productive part of the data center. Now, it is little
more than a lifeless chunk of steel and silicon. The thing that is
missing is all the software and data you and your users have assembled
over the years. Supposedly everything has been backed up. The question
is: has it? And if it has, will you be able to restore it?

Different Data: Different Backup Needs

If you look at the kinds of data[1] processed and
stored by a typical computer system, you will find that some of
the data hardly ever changes, and some of the data is constantly
changing. The pace at which data changes is crucial to the design of a
backup procedure. There are two reasons for this:

- A backup is nothing more than a snapshot of the data being
  backed up. It is a reflection of that data at a particular moment
  in time.
- Data that changes infrequently can be backed up infrequently;
  data that changes more frequently must be backed up more
  frequently.

System administrators who have a good understanding of their
systems, users, and applications should be able to quickly group the
data on their systems into different categories. However, here are
some examples to get you started:

- Operating System
This data only changes during upgrades, the installation of
bug-fixes, and any site-specific modifications.

| Tip |
|---|
| | Should you even bother with operating system backups? This is a question that many system administrators have pondered over the years. On the one hand, if the installation process is relatively easy, and the application of bug-fixes and customizations is well documented and easily reproducible, simply reinstalling the operating system may be a viable option. On the other hand, if there is the least doubt that a fresh installation can completely recreate the original system environment, backing up the operating system is the best choice. |
- Application Software
This data changes whenever applications are installed,
upgraded, or removed.
- Application Data
This data changes as frequently as the associated
applications are run. Depending on the specific application and
your organization, this could mean that changes take place
second-by-second, or once at the end of each fiscal year.
- User Data
This data changes according to the usage patterns of your
user community. In most organizations, this means that changes
take place all the time.
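
One rough way to see which category a directory falls into is to count how many of its files have changed recently. The following is only a sketch: the changed_files helper and the seven-day window are assumptions made for illustration, not part of any Red Hat Linux tool.

```shell
#!/bin/sh
# changed_files DIR DAYS
# Prints the number of regular files under DIR modified in the last DAYS days.
# (Hypothetical helper, named here for illustration only.)
changed_files() {
    find "$1" -type f -mtime "-$2" 2>/dev/null | wc -l | tr -d ' '
}

# Report on whichever directories are named on the command line.
for dir in "$@"; do
    echo "$dir: $(changed_files "$dir" 7) file(s) changed in the last 7 days"
done
```

Saved as a script and run against directories such as /etc and /home, the counts give a first impression of how often each one would need to be backed up.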
Based on these categories (and any additional ones that are
specific to your organization), you should have a pretty good idea
concerning the nature of the backups that are needed to protect your
data.

| Note |
|---|
| | You should keep in mind that most backup software deals with data on a directory or file system level. In other words, your system's directory structure will play a part in how backups will be performed. This is another reason why it is always a good idea to carefully consider the best directory structure for a new system, grouping files and directories according to their anticipated usage. |
Backup Technologies

Red Hat Linux comes with several different programs for backing up and
restoring data. By themselves, these utility programs do not
constitute a complete backup solution. However, they can be used as
the nucleus of such a solution, and as such, warrant some
attention.

tar

The tar utility is well known among UNIX
system administrators. It is the archiving method of choice for
sharing ad-hoc bits of source code and files between systems. The
tar implementation included with Red Hat Linux is GNU
tar, one of the more feature-rich
tar implementations.

Backing up the contents of a directory can be as simple as
issuing a command similar to the following:

    tar cf /mnt/backup/home-backup.tar /home/

This command will create an archive called
home-backup.tar in
/mnt/backup/. The archive will contain the
contents of the /home/ directory. The archive
file can be compressed by adding a single option:

    tar czf /mnt/backup/home-backup.tar.gz /home/

The home-backup.tar.gz file is now
gzip compressed.

There are many other options to tar; to learn
more about them, read the tar man page.

cpio

The cpio utility is another traditional UNIX
program. It is an excellent general-purpose program for moving data
from one place to another and, as such, can serve well as a backup
program.

The behavior of cpio is a bit different from
tar. Unlike tar, cpio reads the names of the files it is to
process via standard input. A common method of generating a list of
files for cpio is to use programs such as find, whose output is then
piped to cpio:

    find /home | cpio -o > /mnt/backup/home-backup.cpio

This command creates a cpio archive called
home-backup.cpio in the
/mnt/backup directory.

There are many other options to cpio; to
learn more about them see the cpio man
page.

dump/restore: Not Recommended!

The dump and restore
programs are Linux equivalents to the UNIX programs of the same
name. As such, many system administrators with UNIX experience may
feel that dump and restore are
viable candidates for a good backup program under Red Hat Linux.
Unfortunately, the design of the Linux kernel has moved ahead of
dump's design. Here is Linus Torvalds's comment
on the subject:

From: Linus Torvalds
To: Neil Conway
Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.
Date: Fri, 27 Apr 2001 09:59:46 -0700 (PDT)
Cc: Kernel Mailing List <linux-kernel At vger Dot kernel Dot org>
[ linux-kernel added back as a cc ]
On Fri, 27 Apr 2001, Neil Conway wrote:
> I'm surprised that dump is deprecated (by you at least ;-)). What to
> use instead for backups on machines that can't umount disks regularly?
Note that dump simply won't work reliably at all even in 2.4.x: the buffer
cache and the page cache (where all the actual data is) are not
coherent. This is only going to get even worse in 2.5.x, when the
directories are moved into the page cache as well.
So anybody who depends on "dump" getting backups right is already playing
Russian roulette with their backups. It's not at all guaranteed to get the
right results - you may end up having stale data in the buffer cache that
ends up being "backed up".
Dump was a stupid program in the first place. Leave it behind.
> I've always thought "tar" was a bit undesirable (updates atimes or
> ctimes for example).
Right now, the cpio/tar/xxx solutions are definitely the best ones, and
will work on multiple filesystems (another limitation of "dump"). Whatever
problems they have, they are still better than the _guaranteed_(*) data
corruptions of "dump".
However, it may be that in the long run it would be advantageous to have a
"filesystem maintenance interface" for doing things like backups and
defragmentation..
Linus
(*) Dump may work fine for you a thousand times. But it _will_ fail under
the right circumstances. And there is nothing you can do about it.

Given this problem, the use of
dump/restore is strongly
discouraged.

Backup Software: Buy Versus Build

Now that we have seen the basic utility programs that do the
actual work of backing up data, the next step is to determine how to
integrate these programs into an overall process that does the
following things:

- Schedules backups to run at the proper time
- Manages the location, rotation, and usage of backup media
- Works with operators (and/or robotic media changers) to ensure
  that the proper media is available
- Assists operators in locating the media containing a specific
  backup of a specific file
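
To give a feel for how much a home-grown wrapper has to do, here is a minimal sketch covering only the first two items (scheduling and rotation). The run_backup function, the backup-YYYYMMDD-HHMMSS.tar.gz naming scheme, and the script path in the cron comment are all assumptions made for this illustration; media handling and operator assistance are not addressed at all.

```shell
#!/bin/sh
# run_backup SRC DEST KEEP
# Archives SRC into DEST/backup-YYYYMMDD-HHMMSS.tar.gz, then deletes the
# oldest archives so that at most KEEP remain. (Hypothetical helper.)
run_backup() {
    src=$1
    dest=$2
    keep=$3
    stamp=$(date +%Y%m%d-%H%M%S)

    tar czf "$dest/backup-$stamp.tar.gz" -C "$src" . || return 1

    # The timestamped names sort chronologically, so the first lines of
    # the sorted listing are the oldest archives.
    total=$(ls "$dest"/backup-*.tar.gz | wc -l)
    excess=$((total - keep))
    if [ "$excess" -gt 0 ]; then
        ls "$dest"/backup-*.tar.gz | sort | head -n "$excess" |
        while read -r old; do
            rm -f "$old"
        done
    fi
}

# Scheduling is left to cron. A crontab entry along these lines (the
# script path is hypothetical) would run the backup nightly at 02:00:
#   0 2 * * * /usr/local/sbin/nightly-backup.sh
```

A call such as run_backup /home /mnt/backup 7 would keep roughly a week of nightly archives on disk.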
As you can see, a real-world backup solution entails much more
than just typing a tar command.

Most system administrators at this point look at one of two
solutions:

- Create an in-house developed backup system
- Purchase a commercially developed solution

Each approach has its good and bad points. Given the complexity
of the task, an in-house solution is not likely to handle some aspects
(most notably media management) very well. However, for some
organizations, this might not be a shortcoming.

A commercially developed solution is more likely to be highly
functional, but may also be overly complex for the organization's
present needs. That said, the complexity might make it possible to
stick with one solution even as the organization grows.

As you can see, there is no clear-cut method for deciding on a
backup system. The only guidance that can be offered is to ask you to
consider these points:

- Changing backup software is difficult; once implemented, you
  will be using the backup software for a long time. After all, you
  will have long-term archive backups that you will need to be able
  to read. Changing backup software means you must either keep the
  original software around, or you must convert your archive backups
  to be compatible with the new software.
- The software must be 100% reliable when it comes to backing up
  what it is supposed to, when it is supposed to.
- When the time comes to restore any data, whether a single file or
  an entire file system, the backup software must be 100% reliable.

Although this section has dealt with a build-or-buy decision,
there is, in fact, another approach. There are open source
alternatives available, and one of them is included with Red Hat Linux.

The Advanced Maryland Automatic Network Disk Archiver (AMANDA)

AMANDA is a client/server based backup application produced by
the University of Maryland. By having a client/server architecture,
a single backup server (normally a fairly powerful system with a
great deal of free space on fast disks, and configured with the
desired backup device) can back up many client systems, which need
nothing more than the AMANDA client software.

This approach to backups makes a great deal of sense, as it
concentrates those resources needed for backups in one system,
instead of requiring additional hardware for every system requiring
backup services. AMANDA's design also serves to centralize the
administration of backups, making the system administrator's life
that much easier.

The AMANDA server manages a pool of backup media, and rotates
usage through the pool in order to ensure that all backups are
retained for the administrator-dictated timeframe. All media is
pre-formatted with data that allows AMANDA to detect whether the
proper media is available or not. In addition, AMANDA can be
interfaced with robotic media changing units, making it possible to
completely automate backups.

AMANDA can use either tar or
dump to do the actual backups (although under
Red Hat Linux using tar is preferable, due to the issues
with dump raised in the section "dump/restore: Not Recommended!").
Because it relies on these standard tools, AMANDA backups
do not require AMANDA in order to restore files, which is a decided
plus.

In operation, AMANDA is normally scheduled to run once a day
during the data center's backup window. The AMANDA server connects
to the client systems, and directs the clients to produce estimated
sizes of the backups to be done. Once all the estimates are
available, the server constructs a schedule, automatically
determining the order in which systems will be backed up.

Once the backups actually start, the data is sent over the
network from the client to the server, where it is stored on a
holding disk. Once a backup is complete, the server starts writing
it out from the holding disk to the backup media. At the same time,
other clients are sending their backups to the server for storage on
the holding disk. This results in a continuous stream of data
available for writing to the backup media. As backups are written
to the backup media, they are deleted from the server's holding
disk.

Once all backups have been completed, the system administrator
is emailed a report outlining the status of the backups, making
review easy and fast.

Should it be necessary to restore data, AMANDA contains a
utility program that allows the operator to identify the file
system, date, and file name(s). Once this is done, AMANDA
identifies the correct backup media, then locates and restores the
desired data. As stated earlier, AMANDA's design also makes it
possible to restore data even without AMANDA's assistance, although
identification of the correct media would be a slower, manual
process.

This section has only touched upon the most basic AMANDA
concepts. If you would like to do more research on AMANDA, your
Red Hat Linux system has additional information. To learn more, type the
following command for a list of documentation files available for
AMANDA:

(Note that this command will only work if you have installed the
amanda RPMs on your Red Hat Linux system.)

You can also learn more about AMANDA from the AMANDA website at http://www.amanda.org/.

Types of Backups

If you were to ask people who are not familiar with computer
backups, most would think that a backup is simply an identical copy
of the data on the computer. In other words, if a backup was created
Tuesday evening, and nothing changed on the computer all day
Wednesday, the backup created Wednesday evening would be identical to
the one created on Tuesday.

While it is possible to configure backups in this way, it is
likely that you would not. To understand more about this, we first
need to understand the different types of backups that can be
created. They are:

- Full backups
- Incremental backups
- Differential backups
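
As a rough preview of how the three types differ (each is described in detail in the subsections that follow), the sketch below selects files for each type purely by modification time. The marker files last_full and last_backup are an assumption of this sketch: each stands for a file whose timestamp records when the corresponding backup ran. Real backup software tracks this information in its own way.

```shell
#!/bin/sh
# Sketch: choosing files for each backup type by modification time.
# Everything happens in a scratch directory; the file names and dates
# are made up for the demonstration.
work=$(mktemp -d)
mkdir "$work/data"

touch -t 202001010100 "$work/data/unchanged"  # last modified before the full backup
touch -t 202001050000 "$work/last_full"       # full backup ran on Sunday, Jan 5
touch -t 202001060900 "$work/data/monday"     # modified Monday morning
touch -t 202001070000 "$work/last_backup"     # most recent backup ran Monday at midnight
touch -t 202001070900 "$work/data/tuesday"    # modified Tuesday morning

# Full: every file, whether modified or not.
full=$(cd "$work/data" && find . -type f | sort)

# Incremental: only files modified since the most recent backup of any type.
incremental=$(cd "$work/data" && find . -type f -newer "$work/last_backup" | sort)

# Differential: every file modified since the last full backup.
differential=$(cd "$work/data" && find . -type f -newer "$work/last_full" | sort)

printf 'full:\n%s\n\nincremental:\n%s\n\ndifferential:\n%s\n' \
    "$full" "$incremental" "$differential"

rm -rf "$work"
```

In this run the incremental selection contains only the file changed on Tuesday, while the differential selection contains both the Monday and Tuesday files.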
Full Backups

The type of backup that was discussed at the beginning of this
section is known as a full backup. A full
backup is simply a backup where every single file is written to the
backup media. As noted above, if the data being backed up never
changes, every full backup being created will be the same.

That similarity is due to the fact that a full backup does not
check to see if a file has changed since the last backup; it blindly
writes it to the backup media whether it has been modified or
not.

This is the reason why full backups are not done all the time:
every file is written to the backup media. This means that
a great deal of backup media is used even if nothing has changed.
Backing up 100 gigabytes of data each night when maybe 10 megabytes
worth of data has changed is not a sound approach; that is why
incremental backups were created.

Incremental Backups

Unlike full backups, incremental backups first look to see
whether a file's modification time is more recent than its last
backup time. If it is not, that file has not been modified since
the last backup and can be skipped this time. On the other hand, if
the modification date is more recent than the
last backup date, the file has been modified and should be backed
up.

Incremental backups are used in conjunction with an occasional
full backup (for example, a weekly full backup, with daily
incrementals).

The primary advantage gained by using incremental backups is
that the incremental backups run more quickly than full backups.
The primary disadvantage to incremental backups is that restoring
any given file may mean going through one or more incremental
backups until the file is found. When restoring a complete file
system, it is necessary to restore the last full backup and every
subsequent incremental backup.

In an attempt to alleviate the need to go through every
incremental backup, a slightly different approach was implemented.
This is known as the differential
backup.

Differential Backups

Differential backups are similar to incremental backups in that
both back up only modified files. However, differential backups are
cumulative — in other words, with a
differential backup, if a file is modified and backed up on Tuesday
night, it will also be backed up on Wednesday night (even if it has
not been modified since).

Of course, all newly modified files will be backed up as
well.

Like the backup strategy used with incremental backups,
differential backups normally follow the same approach: a single
periodic full backup followed by more frequent differential
backups.

The effect of using differential backups in this way is that the
differential backups tend to grow a bit over time (assuming
different files are modified over the time between full backups).
However, the benefit to differential backups comes at restoration
time — at most, the latest full backup and the latest
differential backup will need to be restored.

Backup Media

We have been very careful to use the term "backup media"
throughout the previous sections. There is a reason for that. Most
experienced system administrators usually think about backups in terms
of reading and writing tapes, but today there are other
options.

At one time, tape devices were the only removable media devices
that could reasonably be used for backup purposes. However, this has
changed. In the following sections we will look at the most popular
backup media, and review their advantages as well as their
disadvantages.

Tape

Tape was the first widely-used removable data storage medium.
It has the benefits of low media cost, and reasonably-good storage
capacity. However, tape has some disadvantages — it is
subject to wear, and data access on tape is sequential in
nature.

These factors mean that it is necessary to keep track of tape
usage (retiring tapes once they have reached the end of their useful
life), and that searching for a specific file on tape can be a
lengthy proposition.

On the other hand, tape is one of the most inexpensive mass
storage media available, and it has a long history of reliability.
This means that building a good-sized tape library need not consume
a large part of your budget, and you can count on it being usable
now and in the future.

Disk

In years past, disk drives would never have been used as a
backup medium. However, storage prices have dropped to the point
where, in some cases, using disk drives for backup storage does make
sense.

The primary reason for using disk drives as a backup medium
would be speed. There is no faster mass storage medium available.
Speed can be a critical factor when your data center's backup window
is short, and the amount of data to be backed up is large.

But disk storage is not the ideal backup medium, for a number of
reasons:

- Disk drives are not normally removable. One key factor to
  an effective backup strategy is to get the backups out of your
  data center and into off-site storage of some sort. A backup of
  your production database sitting on a disk drive two feet away
  from the database itself is not a backup; it is a copy. And
  copies are not very useful should the data center and its
  contents (including your copies) be damaged or destroyed by some
  unfortunate set of circumstances.
- Disk drives are expensive (at least compared to other backup
  media). There may be circumstances where money truly is no
  object, but in all other circumstances, the expenses associated
  with using disk drives for backup mean that the number of backup
  copies will be kept low to keep the overall cost of backups low.
  Fewer backup copies mean less redundancy should a backup not be
  readable for some reason.
- Disk drives are fragile. Even if you spend the extra money
  for removable disk drives, their fragility can be a problem. If
  you drop a disk drive, you have lost your backup. It is
  possible to purchase specialized cases that can reduce (but not
  entirely eliminate) this hazard, but that makes an
  already-expensive proposition even more so.
- Disk drives are not archival media. Even assuming you are
  able to overcome all the other problems associated with
  performing backups onto disk drives, you should consider the
  following. Most organizations have various legal requirements
  for keeping records available for certain lengths of time. The
  chance of getting usable data from a 20-year-old tape is much
  greater than the chance of getting usable data from a
  20-year-old disk drive. For instance, would you still have the
  hardware necessary to connect it to your system? Another thing
  to consider is that a disk drive is much more complex than a
  tape cartridge. When a 20-year-old motor spins a 20-year-old
  disk platter, causing 20-year-old read/write heads to fly over
  the platter surface, what are the chances that all these
  components will work flawlessly after sitting idle for 20
  years?

| Note |
|---|
| | Some data centers back up to disk drives and then, when the backups have been completed, write the backups out to tape for archival purposes. In many respects this is similar to how AMANDA handles backups. |
All this said, there are still some instances where backing up
to disk drives might make sense. In the next section we will see
how they can be combined with a network to form a viable backup
solution.

Network

By itself, a network cannot act as backup media. But combined
with mass storage technologies, it can serve quite well. For
instance, by combining a high-speed network link to a remote data
center containing large amounts of disk storage, suddenly the
disadvantages about backing up to disks mentioned earlier are no
longer disadvantages.

By backing up over the network, the disk drives are already
off-site, so there is no need for transporting fragile disk drives
anywhere. With enough network bandwidth, the speed advantage you
can get from disk drives is maintained.

However, this approach still does nothing to address the matter
of archival storage (though the same "spin off to tape after the
backup" approach mentioned earlier can be used). In addition, the
costs of a remote data center with a high-speed link to the main
data center make this solution extremely expensive. But for the
types of organizations that need the kind of features this solution
can provide, it is a cost they will gladly pay.

Storage of Backups

Once the backups are complete, what happens then? The obvious
answer is that the backups must be stored. However, what is not so
obvious is exactly what should be stored, and where.

To answer these questions, we must first consider under what
circumstances the backups will be used. There are three main
situations:

1. Small, ad-hoc restoration requests from users
2. Massive restorations to recover from a disaster
3. Archival storage unlikely to ever be used again

Unfortunately, there are irreconcilable differences between
numbers 1 and 2. When a user accidentally deletes a file, they would
like it back immediately. This implies that the backup media is no
more than a few steps away from the system to which the data is to be
restored.

In the case of a disaster that necessitates a complete restoration
of one or more computers in your data center, if the disaster was
physical in nature, whatever it was that destroyed your computers
would also destroy the backups sitting a few steps away from the
computers. This would be a very bad state of affairs.

Archival storage is less controversial; since the chances that it
will ever be used for any purpose are rather low, if the backup media
was located miles away from the data center there would be no real
problem.

The approaches taken to resolve these differences vary according
to the needs of the organization involved. One possible approach is
to store several days' worth of backups on-site; these backups are then
taken to more secure off-site storage when newer daily backups are
created.

Another approach would be to maintain two different pools of
media:

- A data center pool used strictly for ad-hoc restoration requests
- An off-site pool used for off-site storage and disaster recovery

Of course, having two pools implies the need to run all backups
twice, or to make a copy of the backups. This can be done, but double
backups can take too long, and copying requires multiple backup drives
to process the copies (and a probably-dedicated system to actually
perform the copy).

The challenge for a system administrator is to strike a balance
that adequately meets everyone's needs, while ensuring that the
backups are available for the worst of situations.

Restoration Issues

While backups are a daily occurrence, restorations are normally a
less frequent event. However, restorations are inevitable; they will
be necessary, so it is best to be prepared.

The important thing to do is to look at the various restoration
scenarios detailed throughout this section, and determine ways to test
your ability to actually carry them out. And keep in mind that the
hardest one to test is the most critical one.

Restoring From the Bare Metal

The phrase "restoring from the bare metal" is a system
administrator's way of describing the process of restoring a
complete system backup onto a computer with absolutely no data of
any kind on it: no operating system, no applications,
nothing.

Although some computers have the ability to create bootable
backup tapes, and to actually boot from them to start the
restoration process, the PC architecture used in most systems
running Red Hat Linux does not lend itself to this approach. However,
some alternatives are available:

- Rescue disks
A rescue disk is usually a bootable CD-ROM that contains
enough of a Linux environment to perform the most common
system administration tasks. The rescue disk environment
contains the necessary utilities to partition and format disk
drives, the device drivers necessary to access the backup
device, and the software necessary to restore data from the
backup media.
- Reinstall, followed by restore
Here the base operating system is installed just as if a
brand-new computer were being initially set up. Once the
operating system is in place and configured properly, the
remaining disk drives can be partitioned and formatted, and
the backup restored from the backup media.
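
Whichever route is taken, the final step is the same: unpack the backup onto a freshly prepared file system. The sketch below assumes the backup is a gzip-compressed tar archive like the one created earlier, and that the target file system has already been partitioned, formatted, and mounted. The restore_archive function name and the /mnt/restore mount point in the usage note are made up for illustration.

```shell
#!/bin/sh
# restore_archive ARCHIVE TARGET
# Unpacks a gzip-compressed tar archive beneath TARGET, preserving the
# file permissions recorded in the archive. (Hypothetical helper.)
restore_archive() {
    [ -f "$1" ] || { echo "no such archive: $1" >&2; return 1; }
    [ -d "$2" ] || { echo "no such target directory: $2" >&2; return 1; }
    # x = extract, z = gunzip, p = preserve permissions, -C = change
    # to the target directory before extracting
    tar xzpf "$1" -C "$2"
}
```

For example, restore_archive /mnt/backup/home-backup.tar.gz /mnt/restore would recreate the archived home/ tree under /mnt/restore, since GNU tar stores member names without the leading slash.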
Red Hat Linux supports both the rescue disk and the
reinstall-then-restore approach. In order to be
prepared, you should try a bare metal restore from time to time (and
especially whenever there has been any significant change in the
system environment).

Testing Backups

Every type of backup should be tested on a periodic basis to
make sure that data can be read from it. It is a fact that
sometimes backups are performed that are, for one reason or another,
unreadable. The unfortunate part in all this is that many times it
is not realized until data has been lost and must be restored from
backup.

The reasons for this can include changes in tape drive head
alignment, misconfigured backup software, and operator error. No
matter what the cause, without periodic testing you cannot be sure
that you are actually generating backups from which data can be
restored at some later time.
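
For tar-based backups, a basic read test can be scripted. The sketch below assumes GNU tar (the implementation shipped with Red Hat Linux) and a gzip-compressed archive created relative to the source directory. The verify_archive function and its return codes are inventions of this example. It proves only that the archive is readable and matches the files currently on disk, so it is no substitute for an occasional trial restore.

```shell
#!/bin/sh
# verify_archive ARCHIVE SRC
# Returns 0 if ARCHIVE can be read and its contents match the files
# under SRC, 1 if the archive cannot be read, and 2 if any archived
# file differs from its on-disk counterpart. (Hypothetical helper.)
verify_archive() {
    archive=$1
    src=$2
    # t = list the contents; fails if the archive is damaged or truncated.
    tar tzf "$archive" >/dev/null 2>&1 || return 1
    # d = compare archive members against the file system (GNU tar).
    tar dzf "$archive" -C "$src" || return 2
    return 0
}
```

The comparison is meaningful only immediately after the backup is written; once the source files start changing again, differences are expected.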
Disclaimer: For authoritative source or latest update to this
documentation, please refer to http://www.redhat.com/docs/manuals/linux/ |