A data loss issue has been reported by a customer with the following conditions: duplications to tape are performed by a Red Hat Linux Media Server running Red Hat GFS (Global File System) versions prior to GFS 6.1 and NetBackup 6.0 Maintenance Pack 3 (MP3) is being used.
Details:
Introduction:
A possibility for data loss has been discovered in Veritas NetBackup (tm) 6.0 Maintenance Pack 3 (MP3) when duplications to tape are performed by a Red Hat Linux Media Server that is running Red Hat Global File System (GFS) with a version prior to GFS 6.1. When the End of Media (EOM) is reached during duplication, that media may incorrectly be used again and overwritten. This occurs because the "fcntl lock" function of GFS does not perform as expected.
This issue has only been reported by one customer, and only when a NetBackup Media Server is installed on a Red Hat GFS volume. Symantec has been unable to confirm or replicate this issue, and therefore can only recommend that users implement the solution in the Formal Resolution section of this TechNote, or implement the workaround provided in the Workaround section of this TechNote. If you have questions, or have experienced this problem, please contact Symantec Enterprise Technical Support.
What is Affected:
This issue was reported on a Red Hat Linux Media Server using NetBackup 6.0 MP3.
Note:
- Although this issue has only been observed with duplications, it could possibly be seen with other operations as well.
- As this issue revolves around the file system and not NetBackup, it could be seen in other versions of NetBackup (including future versions), although this has not been reported or confirmed.
How to Determine if Affected:
This data loss issue was reported when ALL of the following conditions were met:
- A duplication to tape is performed.
- The Media Server used for the duplication is a Red Hat Linux Media Server running a Red Hat GFS version prior to GFS 6.1 on the partition on which NetBackup is installed. For example, if /usr/openv/netbackup/db/media resides on a Red Hat GFS volume, the server can be affected.
- A piece of media used in the duplication reaches the End of Media (EOM).
Determining if the file system is GFS:
Run the following command from the Red Hat Linux Media Server.
# mount -v
/dev/sdb1 on /disk2 type ext3 (rw)
/dev/md2 on /nbu type gfs (rw)
/dev/md3 on /storage00 type gfs (rw)
/dev/md4 on /storage01 type gfs (rw)
In this example, /nbu is where NetBackup is installed. Because it is mounted as type gfs, the problem may be present.
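As a rough sketch, the GFS mount points can be picked out of the mount output with awk. The function name gfs_mounts is hypothetical (it is not a NetBackup or Red Hat tool), and the field positions assume the "device on mountpoint type fstype (options)" layout shown in the sample output, with no spaces in the mount point.

```shell
# List mount points whose file system type is exactly "gfs".
# Reads `mount -v` style lines on standard input.
# gfs_mounts is a hypothetical helper name used for illustration only.
gfs_mounts() {
    # Fields: 1=device 2="on" 3=mount point 4="type" 5=fs type 6=options
    awk '$2 == "on" && $4 == "type" && $5 == "gfs" { print $3 }'
}

# Typical use on the Media Server:
#   mount -v | gfs_mounts
```

If the printed list includes the mount point that holds the NetBackup installation (for example the one containing /usr/openv/netbackup/db/media), the server meets the file system condition described above.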
Additionally, the following command can be run to display the version of GFS currently in use:
# rpm -qi GFS
If GFS is being run, it is recommended to check whether NetBackup is utilizing the "fcntl lock" function of GFS by doing the following:
1. Ensure that the /usr/openv/netbackup/logs/bptm folder has been created and log data is available.
2. Ensure that a duplication to tape has been performed since the creation of the bptm log folder.
3. From the bptm folder, execute:
grep "has lock" * | wc -l
4. Observe the output number. If the output is 0, the chances are high that the file system does not support the "fcntl lock" function and is therefore vulnerable to this issue.
Below is an example of a "has lock" occurrence in the bptm log that indicates usage of file locks. In this case, it is likely that a newer version of GFS is being used that does not have the issue explained in this document:
log.MMDDYY:08:27:00.789 [2210] <2> drivename_checklock: PID 2095 has lock
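The count from step 3 can also be wrapped in a small function so that it can be run from any directory. This is only a sketch: count_has_lock is a hypothetical helper name, and the default directory is the bptm log folder named in step 1.

```shell
# Count "has lock" lines across all files in a bptm log directory.
# count_has_lock is a hypothetical helper name used for illustration only.
count_has_lock() {
    log_dir=${1:-/usr/openv/netbackup/logs/bptm}
    # grep -h suppresses file names. grep exits non-zero when nothing
    # matches, so "|| true" keeps the function usable under "set -e".
    { grep -h "has lock" "$log_dir"/* 2>/dev/null || true; } | wc -l | tr -d ' '
}

# Typical use:
#   n=$(count_has_lock)
#   if [ "$n" -eq 0 ]; then echo "no fcntl lock activity logged"; fi
```

A result of 0 has the same meaning as in step 4: no file lock activity was logged, so the file system is likely vulnerable.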
Determining if existing backup copies have been affected:
If a Red Hat Linux Media Server is or has been running Red Hat GFS, it is important to determine if backups have been affected by this issue in the past. Below are steps to identify and verify all duplicated images to ensure they have not been overwritten. Perform the following on the Media Server in question:
1. Determine the maximum backup copies that are configured (for use in commands to follow):
/usr/openv/netbackup/bin/bpconfig -U -M <master_server>
Record the value on the "Maximum Backup Copies:" line (the default setting is 2).
2. To list which images need to be verified, run the following command:
/usr/openv/netbackup/bin/admincmd/bpduplicate -PM -cn 2 -shost <media_server> 2>/dev/null
(Where media_server is the hostname of the server that has NetBackup installed on GFS.)
The "2>/dev/null" suppresses some of the informational output so that only applicable images are printed.
The output will be similar to the following:
MM/DD/YYYY 14:05:55 nbuclient01 full nbuclient01_XXXXXXXXXX nbumaster01 "/nbu/dsu" 2 0
MM/DD/YYYY 09:37:35 junk f-tape nbumaster01_XXXXXXXXXX nbumaster01 RBH576 2 0
The first line shows an image which was written to disk. This image can be ignored. Notice that the Media ID is a file path, "/nbu/dsu". This is a quick indication that the image is a disk image, not tape.
The second line indicates that the backupid in question is nbumaster01_XXXXXXXXXX (column 5). This is the information needed for the next step. Note that "XXXXXXXXXX" will be an actual numeric value.
Continue running this command, incrementing the -cn value up to the Maximum Backup Copies value. For example, if maximum copies are set at "4", the following 3 commands would need to be run, noting the output for each command. Only the first command needs to be run if maximum copies are set at "2".
/usr/openv/netbackup/bin/admincmd/bpduplicate -PM -cn 2 -shost <media_server> 2>/dev/null
/usr/openv/netbackup/bin/admincmd/bpduplicate -PM -cn 3 -shost <media_server> 2>/dev/null
/usr/openv/netbackup/bin/admincmd/bpduplicate -PM -cn 4 -shost <media_server> 2>/dev/null
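The repeated commands can be expressed as a loop over copy numbers. The sketch below uses only the bpduplicate options shown in this TechNote. The function name list_duplicate_copies and the BPDUPLICATE override variable are hypothetical and exist only to keep the example short and testable.

```shell
# Path to bpduplicate as given in this TechNote; can be overridden.
BPDUPLICATE=${BPDUPLICATE:-/usr/openv/netbackup/bin/admincmd/bpduplicate}

# Run "bpduplicate -PM" for copy numbers 2..max_copies on one media server.
# list_duplicate_copies is a hypothetical helper name.
#   $1 = media server hostname
#   $2 = value of "Maximum Backup Copies" recorded in step 1
list_duplicate_copies() {
    media_server=$1
    max_copies=$2
    cn=2
    while [ "$cn" -le "$max_copies" ]; do
        # Label each section so the copy number can be noted with its output.
        echo "# copy number $cn"
        "$BPDUPLICATE" -PM -cn "$cn" -shost "$media_server" 2>/dev/null
        cn=$((cn + 1))
    done
}

# Typical use, with a placeholder hostname:
#   list_duplicate_copies <media_server> 4
```

With a maximum of 2 the loop runs once, which matches the single-command case described above.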
3. Once all backupid/copy number pairs have been identified by running the bpduplicate command(s), a verify process can be run for each image copy as follows:
/usr/openv/netbackup/bin/admincmd/bpverify -backupid <backupid> -cn <#>
Example Command:
/usr/openv/netbackup/bin/admincmd/bpverify -backupid nbumaster01_XXXXXXXXXX -cn 2 -L /bpverify.out
Continue to execute this bpverify command for each backupid seen in the output of the bpduplicate command(s) in Step 2.
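When the list of images is long, the bpverify command lines can be generated from the bpduplicate output instead of being typed by hand. The following is a minimal sketch that only prints commands for review and executes nothing. It assumes the backupid is field 5 and the Media ID is field 7, as in the sample output in Step 2, and that disk images are recognizable by a Media ID beginning with a path. The function name print_verify_commands is hypothetical.

```shell
# Path to bpverify as given in this TechNote.
BPVERIFY=/usr/openv/netbackup/bin/admincmd/bpverify

# Read "bpduplicate -PM" output on standard input and print one bpverify
# command per tape image for the given copy number. Nothing is executed.
# print_verify_commands is a hypothetical helper name.
#   $1 = copy number that was passed to bpduplicate with -cn
print_verify_commands() {
    copy_number=$1
    awk -v cn="$copy_number" -v cmd="$BPVERIFY" '
        # Skip short lines and disk images (Media ID is a path, optionally quoted).
        NF >= 7 && $7 !~ /^"?\// {
            printf "%s -backupid %s -cn %s\n", cmd, $5, cn
        }'
}

# Typical use, with a placeholder hostname:
#   /usr/openv/netbackup/bin/admincmd/bpduplicate -PM -cn 2 -shost <media_server> 2>/dev/null \
#       | print_verify_commands 2
```

Review the printed commands before running them, since the column positions are inferred from the sample output and may differ if paths contain spaces.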
If verification fails for any images:
If bpverify fails for any images, it is highly recommended that either a new copy of that image be made, or that backups be run immediately for the clients whose images failed verification. However, the copy that failed verification should first be expired using bpexpdate.
For example, if copy 2 of backup id "nbumaster01_XXXXXXXXXX" fails verification, expire that copy with the bpexpdate command, as it is no longer usable on tape:
/usr/openv/netbackup/bin/admincmd/bpexpdate -backupid nbumaster01_XXXXXXXXXX -d 0 -copy 2
Then, make a duplicate of copy 1:
/usr/openv/netbackup/bin/admincmd/bpduplicate -cn 1 -backupid nbumaster01_XXXXXXXXXX -dstunit Tape_04
For further information on the duplication process and the options available, please refer to the NetBackup System Administrator's guide, found below in the Related Documents section of this TechNote.
198303 -- fcntl lock doesn't work on local machine when in cluster mode
191222 -- read flock broken on single-node
As the underlying cause does not reside in NetBackup, fixes present in GFS 6.1 and future revisions of GFS cannot be fully confirmed to resolve the symptoms present in NetBackup. If it is still necessary to run GFS on the Red Hat Linux NetBackup server after understanding the contents of this alert, update to the latest version of GFS and review the "How to Determine if Affected" section of this document.
Workaround:
Currently, the only workaround observed has been to use a file system other than Red Hat GFS on the volume on which the NetBackup Media Server is installed.
Symantec strongly recommends the following best practices:
1. Always perform a Full backup prior to and after any changes to your environment.
2. Always make sure that your environment is running the latest version and patch level.