Troubleshooting hardware with Backup Exec for Windows Servers using the SCSI Trace Utility (tracer.exe).
Details:
Introduction
Beginning with version 11d,
Backup Exec for Windows Servers comes with a SCSI Trace utility. This
utility can be used to troubleshoot suspected tape hardware issues in the Backup
Exec environment.
Tracer works by performing a low-level trace of the SCSI bus on a server,
recording the SCSI commands sent to and from every device on the bus. The
commands that tracer.exe captures are industry-standard SCSI commands common to
all SCSI devices. The information gathered by the SCSI trace utility can be used
to narrow down the cause of a particular problem and determine whether or not
there is a hardware fault.
Tracer.exe
is located within the installation directory of the Backup Exec application (by
default C:\Program Files\Symantec\Backup Exec\). To begin capturing SCSI data,
launch tracer.exe and click on the green capture button (Figure 1):
Figure 1:
With
tracer.exe running, perform the steps necessary to reproduce your error.
NOTE:
Due to the inherent nature of SCSI and the high number of commands sent to and
from SCSI-based devices, tracer logs can become large very quickly. For that
reason, it is best to reproduce the problem in as short an operation as possible
to minimize the size of the SCSI trace.
Understanding how to read the events captured by tracer.exe
Tracer logs SCSI commands sequentially. Each logged event contains the SCSI
Target, the SCSI command type, and the SCSI driver result.
Example of a successful SCSI command (Figure 2):
Figure 2:
The above command is an Inquiry command, which responded with a good
SCSI status and a successful driver result.
Example of an unsuccessful SCSI command (Figure 3):
Figure 3:
The above command is a Test Unit Ready command, which is a SCSI command that
queries the device to see if the device is ready for read and write operations.
In this case, there is a SCSI reservation conflict preventing such an operation.
The SCSI Status also indicates that there is an IO Device Error. The cause of
this conflict would be at the hardware level.
SCSI commands
This is a list of the most common SCSI commands used by Backup Exec when
communicating with tape drives and libraries (Figure 4):
Figure 4:
| CDB | COMMAND | Description |
|---|---|---|
| 00h | TEST UNIT READY | Queries device to see if it is ready for data transfers. |
| 01h | REWIND | Rewinds the medium. |
| 03h | REQUEST SENSE | Requests that the device transfer sense data to the host. |
| 05h | READ BLOCK LIMITS | Reports the maximum block length limit. |
| 07h | INITIALIZE ELEMENT STATUS | Forces an inventory operation. |
| 08h | READ | Reads the medium. |
| 0Ah | WRITE | Writes to the medium. |
| 0Ch | ROTATE MAILSLOT COMMAND | Opens or closes the mailslot. |
| 10h | WRITE FILEMARKS | Writes filemarks, such as end of data, onto medium. |
| 11h | SPACE | Provides a variety of positioning functions. |
| 12h | INQUIRY | Returns basic device information and inquiry data. |
| 15h | MODE SELECT(6) | Sets device parameters in a mode page. |
| 16h | RESERVE UNIT | Reserves the unit. |
| 17h | RELEASE UNIT | Releases the unit. |
| 19h | ERASE | Erases the medium. |
| 1Ah | MODE SENSE(6) | Returns current device parameters from mode pages. |
| 1Bh | LOAD UNLOAD | Tells the target to load or unload the media in the tape cartridge. |
| 1Eh | PREVENT ALLOW MEDIUM REMOVAL | Enables or disables the unloading of the tape cartridge. |
| 2Bh | LOCATE (Seek to a position) | Uses the identifier from a READ POSITION to position back to this same logical position. |
| 34h | READ POSITION | Read a position identifier, or SCSI Logical Block Address. |
| 3Bh | WRITE BUFFER | Diagnostic function for testing the device data buffer, DMA engine, SCSI bus interface hardware, and SCSI bus integrity. |
| 3Ch | READ BUFFER | Diagnostic function for testing the device data buffer, DMA engine, SCSI bus interface hardware, and SCSI bus integrity. |
| 4Ch | LOG SELECT | Allows the host to manage statistical information maintained by the device about its own hardware or the installed media. |
| 4Dh | LOG SENSE | Allows the initiator to retrieve statistical information maintained by the device about its own hardware or the installed media. |
| 55h | MODE SELECT(10) | Sets device parameters in a mode page. |
| 5Ah | MODE SENSE(10) | Returns current device parameters from mode pages. |
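| A5h | MOVE MEDIUM | Moves a cartridge from a source element to a destination element within the library, for example between a storage slot and the tape drive. |
| A6h | EXCHANGE MEDIUM | Moves a cartridge to a destination element and moves the cartridge already in that element to a second destination. |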
| B8h | READ ELEMENT STATUS | Returns the status tables of its elements to the initiator. |
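When reading a trace, it can help to translate the operation code (the first byte of the CDB, shown in the CDB column above) back into a command name. The following is a minimal sketch of such a lookup. The opcode values are copied from the table above; the function name `command_name` and the idea of feeding it raw CDB bytes are assumptions made for illustration and are not part of tracer.exe.

```python
# Opcode-to-name map copied from the table above.
# The key is the first byte of the CDB (the operation code).
SCSI_OPCODES = {
    0x00: "TEST UNIT READY",
    0x01: "REWIND",
    0x03: "REQUEST SENSE",
    0x05: "READ BLOCK LIMITS",
    0x07: "INITIALIZE ELEMENT STATUS",
    0x08: "READ",
    0x0A: "WRITE",
    0x0C: "ROTATE MAILSLOT COMMAND",
    0x10: "WRITE FILEMARKS",
    0x11: "SPACE",
    0x12: "INQUIRY",
    0x15: "MODE SELECT(6)",
    0x16: "RESERVE UNIT",
    0x17: "RELEASE UNIT",
    0x19: "ERASE",
    0x1A: "MODE SENSE(6)",
    0x1B: "LOAD UNLOAD",
    0x1E: "PREVENT ALLOW MEDIUM REMOVAL",
    0x2B: "LOCATE",
    0x34: "READ POSITION",
    0x3B: "WRITE BUFFER",
    0x3C: "READ BUFFER",
    0x4C: "LOG SELECT",
    0x4D: "LOG SENSE",
    0x55: "MODE SELECT(10)",
    0x5A: "MODE SENSE(10)",
    0xA5: "MOVE MEDIUM",
    0xA6: "EXCHANGE MEDIUM",
    0xB8: "READ ELEMENT STATUS",
}


def command_name(cdb: bytes) -> str:
    """Return the command name for a CDB, keyed on its first byte."""
    if not cdb:
        raise ValueError("empty CDB")
    opcode = cdb[0]
    # Opcodes not listed in the table are reported rather than guessed.
    return SCSI_OPCODES.get(opcode, f"UNKNOWN ({opcode:02X}h)")
```

For example, `command_name(bytes.fromhex("120000002400"))` returns `"INQUIRY"`, because the first byte of that CDB is 12h.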
Errors and Check Conditions
Check conditions occur when a SCSI command completes but the device reports an
error in its response. Detailed information regarding the error is contained
within the response from the SCSI device, in a field known as the Sense Data.
These events are marked as a 'C #####' under the 'Check' column in tracer.
Error responses
occur when SCSI commands do not complete their intended operation due to an
error. Like Check Conditions, these errors normally contain additional sense key
data that contains information regarding the condition that caused the failure.
These errors are marked as an 'E' under the 'Check' column in
tracer.
NOTE:
It is important to note that not all check conditions and SCSI errors occur
due to bad or faulty hardware, and that some errors and check conditions will
occur during normal tape operations. All hardware errors, however, will be
reported as check conditions within tracer.
The following is an example of an expected check condition (Figure 5):
Figure 5:
The driver result from the above SCSI command was STATUS_IO_DEVICE_ERROR. The
Additional Sense Code (ASC) indicates that the drive is not ready due to the
condition MEDIUM_NOT_PRESENT, which means there is not a tape in the drive.
While this is considered a SCSI 'error,' it does not constitute any fault with
the hardware.
The following is an example of a check condition that occurred due to faulty
hardware (Figure 6):
Figure 6:
The Test Unit Ready command responded with a Sense Key of
UNIT_ATTENTION with the Additional Sense Code of
POWER_ON_RESET_OR_BUS_DEVICE_RESET_OCCURRED. This occurred because a hardware
failure reset the SCSI bus before or during the Test Unit Ready command.
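The Sense Key and Additional Sense Code shown in the two check-condition examples above come from the sense data returned by the device. As an illustration, the sketch below decodes fixed-format sense data as laid out in the SCSI standards (response code 70h or 71h, sense key in the low four bits of byte 2, ASC in byte 12, ASCQ in byte 13). It assumes you have the raw sense bytes available; the function name and the small lookup tables are hypothetical and cover only the values discussed in this document, not the full SCSI code list.

```python
from typing import Tuple

# Sense key values defined by the SCSI standards (subset).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}

# (ASC, ASCQ) pairs for the two conditions shown in the examples above.
ASC_ASCQ = {
    (0x3A, 0x00): "MEDIUM NOT PRESENT",
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
}


def decode_fixed_sense(sense: bytes) -> Tuple[str, str]:
    """Decode fixed-format sense data into (sense key, additional sense)."""
    if len(sense) < 14:
        raise ValueError("fixed-format sense data needs at least 14 bytes")
    response_code = sense[0] & 0x7F  # the top bit is the VALID flag
    if response_code not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = sense[2] & 0x0F            # low nibble of byte 2
    asc, ascq = sense[12], sense[13]
    key_name = SENSE_KEYS.get(key, f"SENSE KEY {key:X}h")
    asc_name = ASC_ASCQ.get((asc, ascq), f"ASC {asc:02X}h / ASCQ {ascq:02X}h")
    return key_name, asc_name
```

Sense data with a sense key of 2h and ASC/ASCQ of 3Ah/00h decodes to NOT READY and MEDIUM NOT PRESENT, which matches the expected "no tape in the drive" condition rather than a hardware fault.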
The following is an example of a SCSI command that resulted in an error
(Figure 7):
Figure 7:
The Test Unit Ready command did not complete, and the
status was STATUS_DEVICE_NOT_CONNECTED. This occurred after a
device was disconnected. Since the device was not connected, there was no Sense
Data returned.
Filtering Tracer Events
Tracer can filter its display to show only events from a certain device, a
certain command, or commands that resulted in a check condition. To enable
filters, go to Tools > Filters (Figure 8):
Figure 8:
For example, to view all errors and check conditions, select the box 'Command
resulted in check condition.' Furthermore, if you have multiple devices
connected to your server, you can filter just the device you are troubleshooting
by selecting 'Event is from one of these targets.'
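If you export a trace and want to apply the same kind of filtering outside the tracer GUI, the logic is simple to reproduce. The sketch below is an illustration only: `TraceEvent` and `filter_events` are hypothetical names, and the event fields (target address, command name, check flag) are an assumed representation of an exported event, not the tracer file format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

# Assumed target address: (SCSI port, bus, target ID, LUN).
Target = Tuple[int, int, int, int]


@dataclass
class TraceEvent:
    target: Target
    command: str
    check: bool  # True if the command resulted in a check condition


def filter_events(
    events: Iterable[TraceEvent],
    targets: Optional[Set[Target]] = None,
    commands: Optional[Set[str]] = None,
    check_only: bool = False,
) -> List[TraceEvent]:
    """Keep events that match every filter that is enabled."""
    selected = []
    for event in events:
        if targets is not None and event.target not in targets:
            continue  # event is not from one of the chosen targets
        if commands is not None and event.command not in commands:
            continue  # event is not one of the chosen commands
        if check_only and not event.check:
            continue  # command did not result in a check condition
        selected.append(event)
    return selected
```

Combining `targets={(6, 0, 6, 0)}` with `check_only=True` mirrors selecting both 'Event is from one of these targets' and 'Command resulted in check condition' for a single drive.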
Detection issues:
To use
tracer.exe to troubleshoot a hardware detection issue, first have tracer display
the SCSI discovery and verify that all of the required information is being
presented to the operating system properly by clicking on Tools > Display
Discovery Data (Figure 9):
Figure 9:
The above
drive is an Archive Python 06408, connected on SCSI Port 6, SCSI BUS 0, SCSI ID
6, LUN 0, with Firmware Version 9100, and a serial number of HN0D594. All of
this information should be present with a properly configured and operating
drive.
The same information should also be present under the DEVICEMAP key in the
Windows registry. For the above example, that would be:
HKEY_LOCAL_MACHINE\HARDWARE\DEVICEMAP\Scsi\Scsi Port 6\Scsi Bus 0\Target Id 6\Logical Unit Id 0\
NOTE:
You should not, under any circumstances, edit the registry settings under
DEVICEMAP. These keys should be automatically populated if the hardware is
configured and functioning properly.
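To check the DEVICEMAP entry without opening the registry editor, the key can be read programmatically. The following is a minimal read-only sketch using Python's standard `winreg` module (available on Windows only). The helper names are hypothetical, and the sketch simply lists whatever values are present under the key rather than assuming which value names exist. It opens the key for reading and never writes to it.

```python
def devicemap_subkey(port: int, bus: int, target_id: int, lun: int) -> str:
    """Build the DEVICEMAP subkey path (relative to HKEY_LOCAL_MACHINE)."""
    return (
        rf"HARDWARE\DEVICEMAP\Scsi\Scsi Port {port}\Scsi Bus {bus}"
        rf"\Target Id {target_id}\Logical Unit Id {lun}"
    )


def read_devicemap_values(port: int, bus: int, target_id: int, lun: int) -> dict:
    """Return all values stored under the device's DEVICEMAP key (read-only)."""
    import winreg  # Windows-only standard library module

    subkey = devicemap_subkey(port, bus, target_id, lun)
    # OpenKey defaults to read access, so nothing under DEVICEMAP is modified.
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
        _, value_count, _ = winreg.QueryInfoKey(key)
        values = {}
        for index in range(value_count):
            name, data, _ = winreg.EnumValue(key, index)
            values[name] = data
        return values
```

For the drive in the example above, `read_devicemap_values(6, 0, 6, 0)` would list the values under Scsi Port 6, Scsi Bus 0, Target Id 6, Logical Unit Id 0. If the key does not exist, `winreg.OpenKey` raises an error, which in itself suggests the device was not detected at that address.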
If any of the above information is incomplete or missing, or if the device is
shown multiple times, then perform a power cycle of the SCSI hardware and
server, and then display the discovery data again. If the data is still
incomplete after performing those steps, consult your hardware documentation and
verify that the hardware is connected properly to the server. If the hardware is
connected properly, contact your hardware vendor for support.
If your
devices are properly shown during Discovery but not appearing in the Backup Exec
GUI, stop the Backup Exec services. Launch tracer.exe and begin capturing data,
then start the Backup Exec services.
Whenever the
Backup Exec services are started, Backup Exec will issue Inquiry, Reserve,
Release, and Test Unit Ready commands to all of the SCSI hardware attached to
the SCSI bus. These commands will respond successfully on properly functioning
and configured hardware.
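When reviewing a trace captured during service startup, one way to summarize it is to check, per device, whether each of the four startup commands succeeded. The sketch below illustrates that check under stated assumptions: events are represented as hypothetical `(target, command, succeeded)` tuples, the command names follow the table of SCSI commands earlier in this document, and a command is treated as good if at least one attempt succeeded (an early attempt may legitimately return a check condition before a retry succeeds).

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# The commands Backup Exec issues to each device when its services start.
STARTUP_COMMANDS = ("INQUIRY", "RESERVE UNIT", "RELEASE UNIT", "TEST UNIT READY")


def startup_problems(
    events: Iterable[Tuple[Hashable, str, bool]]
) -> Dict[Hashable, List[str]]:
    """Map each target to the startup commands that never succeeded for it."""
    seen: Dict[Hashable, Dict[str, bool]] = {}
    for target, command, succeeded in events:
        if command not in STARTUP_COMMANDS:
            continue
        status = seen.setdefault(target, {})
        # A command counts as good if any attempt succeeded.
        status[command] = status.get(command, False) or succeeded

    problems: Dict[Hashable, List[str]] = {}
    for target, status in seen.items():
        # Commands that are missing from the trace are reported as well.
        bad = [c for c in STARTUP_COMMANDS if not status.get(c, False)]
        if bad:
            problems[target] = bad
    return problems
```

An empty result means every device that appears in the trace answered all four commands successfully. A device that never appears in the trace at all is not reported here and should be checked against the discovery data instead.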
Read or Write Errors:
Read and write errors can be difficult to log with tracer unless the error can
be easily reproduced in a short amount of time. This is because a large number
of SCSI commands are issued when performing any basic function, especially when
performing reads or writes. The high number of commands will eventually cause
tracer to run out of virtual memory, which can result in the tracer application
hanging.
NOTE:
In Backup Exec version 12.0 and higher, SGMON.EXE can be configured to capture
tracer data. Please see related documents for using SGMON.EXE.
On a
properly functioning SCSI drive, there should be very few check conditions and
no errors reported on a Read or Write command. An error received on a Read or
Write command is in almost all cases due to failing hardware or faulty media.
Example of a Read Error (Figure 10):
Figure 10:
Example of a Write Error (Figure 11):
Figure 11:
If, after performing a trace, you are unsure how to read the results, save the
tracer file in BIN format and then contact Symantec Technical Support.
Alternatively,
you can export the tracer file as a text file and provide the data to your
hardware vendor for further clarification.