Document ID: 285487
http://support.veritas.com/docs/285487

A customer has reported a data loss issue under the following conditions: duplications to tape are performed by a Red Hat Linux Media Server running Red Hat GFS (Global File System) at a version prior to GFS 6.1, and NetBackup 6.0 Maintenance Pack 3 (MP3) is in use.

Details:
Introduction:
A possibility for data loss has been discovered in Veritas NetBackup (tm) 6.0 Maintenance Pack 3 (MP3) when duplications to tape are performed by a Red Hat Linux Media Server that is running Red Hat Global File System (GFS) at a version prior to GFS 6.1. When the End of Media (EOM) is reached during duplication, that media may incorrectly be used again and overwritten. This occurs because the "fcntl lock" function of GFS does not perform as expected.

This issue has only been reported by one customer, and only when a NetBackup Media Server is installed on a Red Hat GFS volume. Symantec has been unable to confirm or replicate this issue, and therefore can only recommend that users implement the solution in the Formal Resolution section of this TechNote, or implement the workaround provided in the Workaround section of this TechNote. If you have questions, or have experienced this problem, please contact Symantec Enterprise Technical Support.



What is Affected:
This issue was reported on a Red Hat Linux Media Server using NetBackup 6.0 MP3.


How to Determine if Affected:
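This data loss issue was reported when ALL of the following conditions were met:
1. The Media Server runs Red Hat Linux with NetBackup 6.0 MP3.
2. NetBackup is installed on a Red Hat GFS volume, and the GFS version is prior to GFS 6.1.
3. Duplications to tape are performed by that Media Server.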

Determining if the file system is GFS:
Run the following command from the Red Hat Linux Media Server.
# mount -v
/dev/sdb1 on /disk2 type ext3 (rw)
/dev/md2 on /nbu type gfs (rw)
/dev/md3 on /storage00 type gfs (rw)
/dev/md4 on /storage01 type gfs (rw)

In this example, NetBackup is installed under /nbu, which is mounted as type gfs, so the problem may be present.
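The check above can be sketched as a small shell function. This is only an illustration: the function name fs_type_for_path is hypothetical, it assumes the "<device> on <mount point> type <fstype> (<options>)" line format shown above, and it assumes mount points contain no spaces. It reads mount output on standard input and prints the file system type of the longest mount point that contains the given path.

```shell
# Print the file system type of the mount point that contains a path.
# Reads "mount -v" style lines on stdin:
#   <device> on <mount point> type <fstype> (<options>)
fs_type_for_path() {
    awk -v path="$1" '
        $2 == "on" && $4 == "type" {
            mp = $3
            # A mount point matches if it equals the path, is "/",
            # or is a leading directory of the path.
            if (mp == path || mp == "/" || index(path, mp "/") == 1) {
                # Keep the longest (most specific) matching mount point.
                if (length(mp) > best_len) {
                    best_len = length(mp)
                    best = $5
                }
            }
        }
        END { if (best != "") print best }
    '
}
```

For example, "mount -v | fs_type_for_path /nbu" would print "gfs" for the output shown above. If the NetBackup directory is a symbolic link, pass the resolved path rather than the link.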

Additionally, the following command can be run to display the version of GFS currently in use:
# rpm -qi GFS
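As a rough aid for reading the version, the following sketch compares a version string against 6.1. The function name gfs_older_than_6_1 is hypothetical, and it assumes the version begins with numeric "major.minor" components; confirm the result against the full rpm output.

```shell
# Exit status 0 if the given GFS version string is older than 6.1,
# 1 if it is 6.1 or newer, 2 if no version string was supplied.
gfs_older_than_6_1() {
    [ -n "$1" ] || return 2
    # Compare only the first two dot-separated components, numerically.
    printf '%s\n' "$1" |
        awk -F. '{ exit !(($1 + 0) < 6 || (($1 + 0) == 6 && ($2 + 0) < 1)) }'
}
```

One possible use, assuming the package is named GFS as in the command above: gfs_older_than_6_1 "$(rpm -q --queryformat '%{VERSION}' GFS)" && echo "GFS is older than 6.1"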


If GFS is in use, check whether NetBackup is using the "fcntl lock" function of GFS by doing the following:
1. Ensure that the /usr/openv/netbackup/logs/bptm folder has been created and log data is available.
2. Ensure that a duplication to tape has been performed since the creation of the bptm log folder.
3. From the bptm folder, execute:
grep "has lock" * | wc -l
4. Observe the number in the output. If it is 0, it is likely that the file system does not support the "fcntl lock" function and is therefore vulnerable to this issue.

Below is an example of a "has lock" occurrence in the bptm log, which indicates that file locks are in use. In this case, it is likely that a newer version of GFS is being used that does not have the issue explained in this document:
log.MMDDYY:08:27:00.789 [2210] <2> drivename_checklock: PID 2095 has lock
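The count in step 3 can be wrapped in a small function so that it can be run from any directory. This is a minimal sketch; the function name count_has_lock is hypothetical, and it uses "grep -c" on the concatenated logs in place of "grep ... | wc -l", which gives the same total.

```shell
# Print the total number of "has lock" lines across all files
# in the given bptm log directory.
count_has_lock() {
    # Concatenate the logs so that grep reports a single total.
    cat "$1"/* 2>/dev/null | grep -c "has lock"
}
```

For example: count_has_lock /usr/openv/netbackup/logs/bptm. As in step 4, a result of 0 is only meaningful if a duplication to tape has run since the bptm log folder was created.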


Determining if existing backup copies have been affected:
If a Red Hat Linux Media Server is or has been running Red Hat GFS, it is important to determine whether backups have been affected by this issue in the past. Below are steps to identify and verify all duplicated images to ensure they have not been overwritten. Perform the following on the Media Server in question:

1.  Determine the maximum backup copies that are configured (for use in commands to follow):

/usr/openv/netbackup/bin/bpconfig -U -M <master_server>
(Record the value on the "Maximum Backup Copies:" line. The default setting is 2.)

2.  To list which images need to be verified, run the following command:

/usr/openv/netbackup/bin/admincmd/bpduplicate -PM -cn 2 -shost <media_server> 2>/dev/null
(Where media_server is the hostname that has NetBackup installed on GFS).
The "2>/dev/null" will suppress some of the informational output and will only print applicable images.

    The output will be similar to the following:

MM/DD/YYYY 14:05:55 nbuclient01 full nbuclient01_XXXXXXXXXX nbumaster01 "/nbu/dsu" 2 0
MM/DD/YYYY 09:37:35 junk f-tape nbumaster01_XXXXXXXXXX nbumaster01 RBH576 2 0

The first line shows an image which was written to disk. This image can be ignored. Notice that the Media ID is a file path, "/nbu/dsu". This is a quick indication that the image is a disk image, not tape.
The second line indicates that the backupid in question is nbumaster01_XXXXXXXXXX (column 5).  This is the information needed for the next step.  Note that "XXXXXXXXXX" will be an actual numeric value.
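Filtering out the disk images by hand becomes tedious when the list is long. The following sketch prints only the backup IDs of tape images. The function name tape_backupids is hypothetical, and it assumes the column layout of the sample output above: the backup ID in column 5, and a Media ID in column 7 that is a quoted path or starts with "/" for disk images.

```shell
# Read "bpduplicate -PM" style lines on stdin and print the backup ID
# (column 5) of each line whose Media ID (column 7) is not a disk path.
tape_backupids() {
    awk 'NF >= 7 && $7 !~ /^"/ && $7 !~ /^\// { print $5 }'
}
```

For example, the output of the bpduplicate command in this step can be piped into tape_backupids. A disk path containing spaces would shift the columns, so spot-check the result against the raw output.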

    Continue running this command, incrementing the -cn value up to the Maximum Backup Copies value. For example, if maximum copies is set to "4", the following 3 commands would need to be run, noting the output of each command. Only the first command needs to be run if maximum copies is set to "2".

/usr/openv/netbackup/bin/admincmd/bpduplicate -PM -cn 2 -shost <media_server> 2>/dev/null
/usr/openv/netbackup/bin/admincmd/bpduplicate -PM -cn 3 -shost <media_server> 2>/dev/null
/usr/openv/netbackup/bin/admincmd/bpduplicate -PM -cn 4 -shost <media_server> 2>/dev/null
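The sequence of commands above follows directly from the Maximum Backup Copies value, so it can be generated instead of typed. This sketch only prints the commands for review and does not run them. The function name bpduplicate_list_cmds is hypothetical.

```shell
# Print one "bpduplicate -PM" command line per copy number, from 2 up to
# the Maximum Backup Copies value.
#   $1 = Maximum Backup Copies value, $2 = Media Server hostname
bpduplicate_list_cmds() {
    max=$1
    host=$2
    cn=2
    while [ "$cn" -le "$max" ]; do
        printf '%s\n' "/usr/openv/netbackup/bin/admincmd/bpduplicate -PM -cn $cn -shost $host 2>/dev/null"
        cn=$((cn + 1))
    done
}
```

For example, "bpduplicate_list_cmds 4 <media_server>" prints the three commands shown above, and a value of 2 prints only the first.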


3.  Once all backupid/copy number pairs have been identified by running the bpduplicate command(s), a verify process can be run for each image copy as follows:
/usr/openv/netbackup/bin/admincmd/bpverify -backupid <backupid> -cn <#>

Example Command:
/usr/openv/netbackup/bin/admincmd/bpverify -backupid nbumaster01_XXXXXXXXXX -cn 2 -L /bpverify.out

Continue to execute this bpverify command for each backupid seen in the output of the bpduplicate command(s) in Step 2.
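When many backupid/copy number pairs must be verified, a loop can help. The following is a minimal sketch, not a supported tool. The function name verify_pairs and the BPVERIFY variable are hypothetical, the input is assumed to be one "<backupid> <copy number>" pair per line, and the sketch assumes that bpverify exits with a non-zero status when verification fails. Confirm that assumption against the bpverify output or log before relying on it.

```shell
# Path to bpverify; can be overridden for testing.
BPVERIFY=${BPVERIFY:-/usr/openv/netbackup/bin/admincmd/bpverify}

# Read "<backupid> <copy number>" pairs on stdin, verify each one, and
# print a line for every pair whose verification reports a failure.
verify_pairs() {
    while read -r backupid cn; do
        # Skip blank lines.
        [ -n "$backupid" ] || continue
        # Redirect stdin so the command cannot consume the pair list.
        if ! "$BPVERIFY" -backupid "$backupid" -cn "$cn" </dev/null; then
            printf 'FAILED %s copy %s\n' "$backupid" "$cn"
        fi
    done
}
```

For example: printf '%s\n' 'nbumaster01_XXXXXXXXXX 2' | verify_pairs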

If verification fails for any images:
If bpverify fails for an image copy, first expire the failed copy using bpexpdate. It is then highly recommended either to make a new copy of that image or to immediately run backups for the clients whose images failed verification.

For example, if copy 2 of backup id "nbumaster01_XXXXXXXXXX" fails verification, expire that copy with the bpexpdate command, as it is no longer usable on tape:
/usr/openv/netbackup/bin/admincmd/bpexpdate -backupid nbumaster01_XXXXXXXXXX -d 0 -copy 2

Then, make a duplicate of copy 1:
/usr/openv/netbackup/bin/admincmd/bpduplicate -cn 1 -backupid nbumaster01_XXXXXXXXXX -dstunit Tape_04
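Because bpexpdate is destructive, it may be safer to generate the two commands above for review rather than run them from a script. The following sketch does only that. The function name remediation_cmds is hypothetical, and the storage unit name (Tape_04 in the example above) is site-specific.

```shell
# Print the expire and re-duplicate commands for one failed image copy.
#   $1 = backup ID, $2 = copy number that failed verification,
#   $3 = copy number of a good copy, $4 = destination storage unit
remediation_cmds() {
    backupid=$1
    failed_cn=$2
    good_cn=$3
    dstunit=$4
    printf '%s\n' \
        "/usr/openv/netbackup/bin/admincmd/bpexpdate -backupid $backupid -d 0 -copy $failed_cn" \
        "/usr/openv/netbackup/bin/admincmd/bpduplicate -cn $good_cn -backupid $backupid -dstunit $dstunit"
}
```

For example, "remediation_cmds nbumaster01_XXXXXXXXXX 2 1 Tape_04" prints the two commands shown above.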


For further information on the duplication process and the options available, please refer to the NetBackup System Administrator's Guide, found below in the Related Documents section of this TechNote.


Formal Resolution:
Red Hat acknowledges fcntl lock issues that are now fixed in GFS 6.1. These are listed in the release notes for GFS 6.1 on Red Hat's site:
 http://www.redhat.com/docs/manuals/csgfs/release-notes/GFS_6_1-RHEL4U4-relnotes.txt
198303 -- fcntl lock doesn't work on local machine when in cluster mode
191222 -- read flock broken on single-node

As the underlying cause does not reside in NetBackup, it cannot be fully confirmed that the fixes present in GFS 6.1 and future revisions of GFS resolve the symptoms seen in NetBackup. If it is still necessary to run GFS on the Red Hat Linux NetBackup server after understanding the contents of this alert, update to the latest version of GFS and review the "How to Determine if Affected" section of this document.


Workaround:
Currently, the only workaround observed is to use a file system other than Red Hat GFS on the volume on which the NetBackup Media Server is installed.


Symantec strongly recommends the following best practices:
1. Always perform a Full backup prior to and after any changes to your environment.
2. Always make sure that your environment is running the latest version and patch level.



Acknowledgements
Red Hat

Products Applied:
 NetBackup Enterprise Server 6.0 MP3

Last Updated: May 07 2007 06:57 PM GMT
Expires on: 365 days from publish date

Subjects:
 NetBackup Enterprise Server
   Application: Backup

Languages:
 English (US)

Operating Systems:
Linux

RHEL 3.0 (AS), RHEL 3.0 (ES)