Asset ID: 1-71-1371222.1
Update Date: 2012-04-30
Solution Type: Technical Instruction Sure Solution
1371222.1: Tape - How To Diagnose a Fibre Channel Tape Drive Configuration Issue on Solaris 10
Related Items
- StorageTek T10000C Tape Drive
- Sun StorageTek 9840D Tape Drive
- Sun StorageTek T10000A Tape Drive
- LTO Tape Drive
- Sun StorageTek 9940A Tape Drive
- Sun StorageTek T10000B Tape Drive
- Sun StorageTek 9840B Tape Drive
- Sun StorageTek 9940B Tape Drive
- Sun StorageTek 9840C Tape Drive
Related Categories
- PLA-Support>Sun Systems>TAPE>Tape Hardware>SN-TP: STK T-Series Drive
Applies to:
StorageTek T10000C Tape Drive - Version Not Applicable and later
Sun StorageTek 9840B Tape Drive - Version Not Applicable and later
Sun StorageTek 9840C Tape Drive - Version Not Applicable and later
Sun StorageTek 9840D Tape Drive - Version Not Applicable and later
Sun StorageTek 9940A Tape Drive - Version Not Applicable and later
Information in this document applies to any platform.
Goal
How to diagnose a Fibre Channel tape drive configuration issue on Solaris 10.
Fix
1. Verify that the drive is visible from the Solaris OS. Run these cfgadm commands to show which devices are connected to the system.
..................................................................................................
For example:
# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c2 fc-fabric connected configured unknown
c2::500104f000795ed9 tape connected configured unusable
c2::500104f0009e929e med-changer connected configured unusable
c2::500104f0009e92a8 tape connected configured unusable
c2::50060b00006287e0 tape connected configured unusable
c3 fc connected unconfigured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb0/3 unknown empty unconfigured ok
usb0/4 unknown empty unconfigured ok
..................................................................................................
Drives are listed with type tape and media changers with type med-changer. Adding the -v option also shows the vendor and product information and the physical device path.
..................................................................................................
For example:
# cfgadm -alv
Ap_Id Receptacle Occupant Condition Information
When Type Busy Phys_Id
c2 connected configured unknown
unavailable fc-fabric n /devices/pci@3,0/pci1077,101@2/fp@0,0:fc
c2::500104f000795ed9 connected configured unusable IBM ULTRIUM-TD2
unavailable tape n /devices/pci@3,0/pci1077,101@2/fp@0,0:fc::500104f000795ed9
c2::500104f0009e929e connected configured unusable STK SL500
unavailable med-changer n /devices/pci@3,0/pci1077,101@2/fp@0,0:fc::500104f0009e929e
c2::500104f0009e92a8 connected configured unusable HP Ultrium 3-SCSI
unavailable tape n /devices/pci@3,0/pci1077,101@2/fp@0,0:fc::500104f0009e92a8
c2::50060b00006287e0 connected configured unusable HP Ultrium 3-SCSI
unavailable tape n /devices/pci@3,0/pci1077,101@2/fp@0,0:fc::50060b00006287e0
c3 connected unconfigured unknown
unavailable fc n /devices/pci@3,0/pci1077,101@2,1/fp@0,0:fc
usb0/1 empty unconfigured ok
unavailable unknown n /devices/pci@0,0/pci15d9,5544@f,2:1
usb0/2 empty unconfigured ok
unavailable unknown n /devices/pci@0,0/pci15d9,5544@f,2:2
usb0/3 empty unconfigured ok
unavailable unknown n /devices/pci@0,0/pci15d9,5544@f,2:3
usb0/4 empty unconfigured ok
unavailable unknown n /devices/pci@0,0/pci15d9,5544@f,2:4
..................................................................................................
To display devices together with their LUNs, use the -o show_FCP_dev option of cfgadm. The LUN is shown after the WWN, for example ,0. Bridged libraries (SL48, SL24 and bridged SL500) show as LUN 1 on the same WWN as the drive that hosts them.
..................................................................................................
For example:
# cfgadm -o show_FCP_dev -al
Ap_Id Type Receptacle Occupant Condition
c2 fc-fabric connected configured unknown
c2::500104f000795ed9,0 tape connected configured unusable
c2::500104f0009e929e,0 med-changer connected configured unusable
c2::500104f0009e92a8,0 tape connected configured unusable
c2::50060b00006287e0,0 tape connected configured unusable
c3 fc connected unconfigured unknown
..................................................................................................
The correct state for drives and media changers is connected, configured and unknown. In the examples above the devices show a condition of unusable, so they are not in the correct state.
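The state check described above can be sketched as a small filter over `cfgadm -al` output. This is only an illustration: `check_fc_tape_state` is a hypothetical helper name, and the demo input reuses rows from the example above with one condition changed to unknown so that a healthy row is present.

```shell
#!/bin/sh
# Sketch: flag FC tape and med-changer attachment points that are not in
# the connected/configured/unknown state. Reads `cfgadm -al` output on stdin.
# check_fc_tape_state is a hypothetical helper, not a Solaris command.
check_fc_tape_state() {
    awk '
        # Columns: Ap_Id Type Receptacle Occupant Condition
        $2 == "tape" || $2 == "med-changer" {
            if ($3 != "connected" || $4 != "configured" || $5 != "unknown")
                printf "%s (%s): %s/%s/%s\n", $1, $2, $3, $4, $5
        }
    '
}

# Demo input: the med-changer row was altered to "unknown" for illustration.
check_fc_tape_state <<'EOF'
Ap_Id Type Receptacle Occupant Condition
c2 fc-fabric connected configured unknown
c2::500104f000795ed9 tape connected configured unusable
c2::500104f0009e929e med-changer connected configured unknown
EOF
# prints: c2::500104f000795ed9 (tape): connected/configured/unusable
```

On a live host the same filter could be fed directly, for example `cfgadm -al | check_fc_tape_state`. No output means every tape and media changer attachment point is in the expected state.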
To check which devices the HBA itself sees, run luxadm -e dump_map against the controller. This lists all the devices that are visible to the HBA.
luxadm -e dump_map <device path | /dev/cfg/c2>
..................................................................................................
For example:
# luxadm -e dump_map /dev/cfg/c2
Pos Port_ID Hard_Addr Port WWN Node WWN Type
0 10300 0 50060b00006287e0 50060b00006287e2 0x1 (Tape device)
1 10100 0 210100e08b3fbbe6 200100e08b3fbbe6 0x1f (Unknown Type)
2 10400 0 500104f0009e929e 500104f0009e929d 0x8 (Medium changer device)
3 10826 0 500104f000795ed9 500104f000795ed8 0x1 (Tape device)
4 10000 0 210000e08b138e17 200000e08b138e17 0x1f (Unknown Type,Host Bus Adapter)
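To compare the HBA view with the cfgadm listing, it can help to reduce the dump_map output to just the tape and medium changer ports. The following is a minimal sketch; `list_fc_tape_ports` is a hypothetical helper name, and it assumes the column layout shown in the example above (type code in the sixth column).

```shell
#!/bin/sh
# Sketch: print Port_ID, Port WWN and device kind for tape (0x1) and
# medium changer (0x8) entries of `luxadm -e dump_map` output on stdin.
# list_fc_tape_ports is a hypothetical helper, not part of luxadm.
list_fc_tape_ports() {
    awk '
        # Data columns: Pos Port_ID Hard_Addr Port_WWN Node_WWN Type (text)
        $6 == "0x1" { print $2, $4, "tape" }
        $6 == "0x8" { print $2, $4, "med-changer" }
    '
}

# Demo on rows taken from the example above.
list_fc_tape_ports <<'EOF'
Pos Port_ID Hard_Addr Port WWN Node WWN Type
0 10300 0 50060b00006287e0 50060b00006287e2 0x1 (Tape device)
1 10100 0 210100e08b3fbbe6 200100e08b3fbbe6 0x1f (Unknown Type)
2 10400 0 500104f0009e929e 500104f0009e929d 0x8 (Medium changer device)
EOF
```

The Port WWN values printed here are the same WWNs that follow `c2::` in the cfgadm output, so a WWN missing from one list but present in the other points at a zoning or configuration problem.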
-----------------------------------------------------------------------------------------------------------------------------------
2. If drives and media changers are not correctly configured.
Note A.
a. Devices must be seen as "connected/configured/unknown".
b. A drive or media changer in an incorrect state cannot be used. Some reasons this happens are:
- The device is no longer connected to the system.
- The device is not enabled.
- The device is in an unusable state (hung-frozen-not communicating).
- Zoning issues.
- Solaris patch levels; refer to Doc ID 1018748.1 as reference.
- HBA firmware or hardware issues.
Use the devfsadm and cfgadm commands to verify the existing devices and remove old device links that are no longer usable.
Note 1:
devfsadm -C deletes all device links for drives under "/dev/rmt/" that are no longer in a configured state. It also deletes the media changer links under "/dev/scsi/changer/". With the "v" option it displays on the screen the changes that devfsadm makes.
Note 2:
If devfsadm doesn't pull in and configure the devices, a reconfigure reboot may need to be done on the system.
Note 3:
devfsadm updates the device links. If a backup application was using an old link and that link is updated, the backup application may no longer be able to use it because it has detected the change. A reconfiguration of the backup application may be needed.
..................................................................................................
For example:
# devfsadm -Cv
devfsadm[5549]: verbose: removing file: /dev/rmt/1mn
devfsadm[5549]: verbose: removing file: /dev/rmt/0ubn
devfsadm[5549]: verbose: removing file: /dev/rmt/0h
devfsadm[5549]: verbose: removing file: /dev/rmt/0un
devfsadm[5549]: verbose: removing file: /dev/scsi/changer/c2t500104F0009E929Ed0
..................................................................................................
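Before running the cleanup, it can be useful to see which links are already dangling, since those are the ones that devfsadm cleanup mode targets. The sketch below is an assumption-laden illustration rather than a replacement for devfsadm: `list_dangling_links` is a hypothetical helper that only inspects symbolic links and changes nothing.

```shell
#!/bin/sh
# Sketch: list symbolic links in a directory whose target no longer exists.
# list_dangling_links is a hypothetical helper; it does not modify anything.
list_dangling_links() {
    dir=$1
    for link in "$dir"/*; do
        # -L: entry is a symbolic link; ! -e: its target does not resolve
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Tape drive links and media changer links, as named in the notes above.
list_dangling_links /dev/rmt
list_dangling_links /dev/scsi/changer
```

Any link printed here points at a device node that is gone, so the backup application should not be configured to use it.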
To make sure that a device is usable, we can manually unconfigure the device and then configure it again, which reloads the drivers.
To unconfigure a specific device, use cfgadm:
# cfgadm -c unconfigure c2::500110a0008c35b
You can also use cfgadm to reconfigure a device:
# cfgadm -c configure c2::500110a0008c35b
Note 1:
If the configuration fails, it could be for any of the reasons described in Note A.
Note 2:
During a "cfgadm -c configure" for an FC attachment point, the following occurs:
1. The device nodes are created.
2. The device is made available to system.
3. Device information is stored into mapfile.
Note:
If the cfgadm configuration failed and the write to fabric_WWN_mapfile was skipped, you may use the '-o force_update' option to override the default behavior.
This option stores the device information in the mapfile regardless of the error encountered. Such errors can be caused specifically by Solaris patch level issues.
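The unconfigure and configure cycle above can be wrapped in a small function. This is a cautious sketch: `reconfigure_ap` and the `CFGADM` variable are hypothetical names, and by default the function only prints the commands it would run so that nothing is changed by accident.

```shell
#!/bin/sh
# Sketch: unconfigure, then configure, one FC attachment point.
# CFGADM defaults to a dry run that only echoes the commands; set
# CFGADM=cfgadm on a Solaris host to really run them.
CFGADM=${CFGADM:-"echo cfgadm"}

reconfigure_ap() {
    ap_id=$1
    # $CFGADM is left unquoted on purpose so "echo cfgadm" splits in two words.
    # The configure step runs only if the unconfigure step succeeded.
    $CFGADM -c unconfigure "$ap_id" && $CFGADM -c configure "$ap_id"
}

# Dry-run demo using an Ap_Id from the cfgadm example earlier in this document.
reconfigure_ap c2::500104f000795ed9
```

Keeping the dry run as the default makes it easy to review the exact Ap_Id before touching a drive that a backup job may be using.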
-----------------------------------------------------------------------------------------------------------------------------------
3. Make sure that the tape drive symbolic links exist and are assigned correctly.
a. The backup application uses symbolic device links to access the tape devices. These are stored in the directory "/dev/rmt".
To display all the tape drive symbolic links:
..................................................................................................
For example:
# ls /dev/rmt
0 0bn 0cb 0cn 0hb 0hn 0lb 0ln 0mb 0mn 0u 0ubn 1 1bn 1cb 1cn 1hb 1hn 1lb 1ln 1mb 1mn 1u 1ubn
0b 0c 0cbn 0h 0hbn 0l 0lbn 0m 0mbn 0n 0ub 0un 1b 1c 1cbn 1h 1hbn 1l 1lbn 1m 1mbn 1n 1ub 1un
..................................................................................................
To display the physical device path that each link points to, including the WWN of the actual device, use:
..................................................................................................
For example:
# ls -l /dev/rmt
total 96
lrwxrwxrwx 1 root root 68 Oct 12 15:48 0 -> ../../devices/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795ed9,0:
lrwxrwxrwx 1 root root 69 Oct 12 15:48 0b -> ../../devices/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795ed9,0:b
lrwxrwxrwx 1 root root 70 Oct 12 15:48 0bn -> ../../devices/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795ed9,0:bn
lrwxrwxrwx 1 root root 70 Oct 12 15:48 1un -> ../../devices/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f0009e92a8,0:un
..................................................................................................
NOTE:
a. fp is the Fibre Channel port driver node through which the device is attached.
b. pci1077,101@2 is the HBA node; the @2 is its unit address on the host PCI bus.
c. Make sure the backup applications are using existing and usable symbolic links.
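Because each drive has many links (0, 0b, 0bn and so on), a condensed view with one line per drive number and WWN is easier to compare with the backup application. The following is a minimal sketch that parses `ls -l /dev/rmt` output; `map_rmt_to_wwn` is a hypothetical helper name, and it assumes the link targets have the `tape@w<WWN>,<LUN>` form shown above.

```shell
#!/bin/sh
# Sketch: reduce `ls -l /dev/rmt` output (stdin) to one line per drive
# instance with the WWN taken from the link target.
# map_rmt_to_wwn is a hypothetical helper name.
map_rmt_to_wwn() {
    awk '
        NF >= 3 && $(NF-1) == "->" {
            name = $(NF-2); target = $NF
            # Pull the hex WWN out of ".../tape@w<WWN>,<LUN>:<suffix>"
            if (!match(target, /@w[0-9a-fA-F]+,/)) next
            wwn = substr(target, RSTART + 2, RLENGTH - 3)
            inst = name
            sub(/[a-z]+$/, "", inst)    # 0bn -> 0, 1un -> 1
            if (!seen[inst]++) print "/dev/rmt/" inst, wwn
        }
    '
}

# Demo on lines taken from the example above.
map_rmt_to_wwn <<'EOF'
total 96
lrwxrwxrwx 1 root root 68 Oct 12 15:48 0 -> ../../devices/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795ed9,0:
lrwxrwxrwx 1 root root 69 Oct 12 15:48 0b -> ../../devices/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795ed9,0:b
lrwxrwxrwx 1 root root 70 Oct 12 15:48 1un -> ../../devices/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f0009e92a8,0:un
EOF
```

On a live host this could be run as `ls -l /dev/rmt | map_rmt_to_wwn`.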
-----------------------------------------------------------------------------------------------------------------------------------
4. To clean Solaris of old, disconnected, unconfigured or failed devices:
a. Device links in /dev/rmt may be old and point to devices that no longer exist. Run 'devfsadm -Cv' to verify the existing devices and remove the ones no longer connected.
b. Make sure that the backup application is configured to use valid /dev/rmt device files.
c. Make sure that the symbolic links are assigned to the correct device.
Sometimes when a drive is rediscovered it is assigned a new symbolic link, different from the one it had before. This can be worse if the Solaris patch levels are not up to date.
Reference knowledge articles:
1009912.1 - How do I change rmt-dev-number (/dev/rmt/X) to what I want
1009374.1 - Persistent bind - Creating consistent tape links across SAN
With luxadm display <WWN | /dev/rmt/#> we can display the information for each tape drive and match it with the backup application information.
..................................................................................................
For example:
# luxadm display /dev/rmt/1n
DEVICE PROPERTIES for tape: /dev/rmt/1n
Vendor: IBM
Product ID: ULTRIUM-TD2
Revision: 73V1
Serial Num: 1110163795
Device Type: Tape device
Path(s):
/dev/rmt/1n
/devices/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795ed9,0:n
Controller /dev/cfg/c2
Device Address 500104f000795ed9,0
Host controller port WWN 210000e08b138e17
..................................................................................................
To verify the device tree on Solaris, we can use prtpicl:
..................................................................................................
For example:
# prtpicl -v | grep tape
tape (tape, 108700000675)
:devfs-path /pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795ed9,0
:_class tape
:name tape
tape (tape, 108700000693)
:devfs-path /pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f0009e92a8,0
:_class tape
:name tape
# prtpicl -v | grep changer
medium-changer (changer, 10870000065a)
:devfs-path /pci@3,0/pci1077,101@2/fp@0,0/medium-changer@w500104f0009e929e,0
:_class changer
:name medium-changer
..................................................................................................
Multipathing is a known issue with tape drives and tape libraries because at this point it is not supported by the hardware.
Run 'iostat -En' to check whether phantom drives exist.
For example:
# iostat -En
rmt/1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: IBM Product: ULTRIUM-TD2 Revision: 73V1
Serial No:n
st6 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: HP Product: Ultrium 3-SCSI Revision: M6BS
Serial No:
rmt/2 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: HP Product: Ultrium 3-SCSI Revision: M14S
Serial No:
In the above example, st6 is listed under its driver instance name and not under a /dev/rmt symbolic link. It is probably a phantom drive whose device is no longer connected or is no longer configured.
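Spotting such entries by eye gets harder on hosts with many drives. As a rough sketch, the filter below prints tape entries from `iostat -En` that appear under an `st` instance name instead of an `rmt/` link; `find_phantom_tapes` is a hypothetical helper name, and the `st<number>` naming is taken from the example above.

```shell
#!/bin/sh
# Sketch: list tape entries in `iostat -En` output (stdin) that are reported
# as st<N> rather than rmt/<N>. find_phantom_tapes is a hypothetical helper.
find_phantom_tapes() {
    awk '$2 == "Soft" && $3 == "Errors:" && $1 ~ /^st[0-9]+$/ { print $1 }'
}

# Demo on the header lines from the example above.
find_phantom_tapes <<'EOF'
rmt/1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
st6 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
rmt/2 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
EOF
# prints: st6
```

On a live host this could be run as `iostat -En | find_phantom_tapes`; any name it prints is a candidate for the devfsadm cleanup described in this step.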
d. Make sure that the backup application's tape devices are correctly mapped to existing /dev/rmt/X devices. Otherwise they will be unusable for backup jobs.
To diagnose improperly mapped drives, identify an empty drive and mount a scratch tape in it. From the drive host, issue the command:
# mt -f /dev/rmt/X status
For example:
# mt -f /dev/rmt/1 status
IBM Ultrium Gen 2 LTO tape drive:
sense key(0x0)= No Additional Sense residual= 0 retries= 0
file no= 0 block no= 0
NOTE:
This indicates that a tape is mounted in the drive.
# mt -f /dev/rmt/1 status
/dev/rmt/1: no tape loaded or drive offline
NOTE:
This indicates that the tape may have been mounted in another drive and not in the drive that is linked to /dev/rmt/1, or that the drive is offline. Make sure that the drive is online and available before mounting the tape.
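When several drives have to be checked, the two outcomes above can be told apart automatically. This is a minimal sketch that classifies the text of an `mt status` report; `classify_mt_status` is a hypothetical helper name, and it only recognises the two messages shown in the examples above.

```shell
#!/bin/sh
# Sketch: classify `mt -f /dev/rmt/X status` output read from stdin.
# classify_mt_status is a hypothetical helper; it matches only the two
# message forms shown in this document.
classify_mt_status() {
    status=$(cat)
    case $status in
        *"no tape loaded or drive offline"*) echo "empty-or-offline" ;;
        *"file no="*)                        echo "loaded" ;;
        *)                                   echo "unknown" ;;
    esac
}

# Demo on the second example above.
classify_mt_status <<'EOF'
/dev/rmt/1: no tape loaded or drive offline
EOF
# prints: empty-or-offline
```

On a live host it could be used as `mt -f /dev/rmt/1 status 2>&1 | classify_mt_status`, repeated for each /dev/rmt/X after mounting the scratch tape, to find which link reports the tape.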
To show the actual devices that are connected to the different HBAs, you can display the device tree.
# prtpicl -v
This shows the tree of connected devices. The tape drive should appear there; if it does not, suspect SAN zoning or drive connectivity issues.
5. Make sure that the tape loads and unloads successfully to and from the drive.
a. Mount the tape either from the application or from the library management system (e.g., ACSLS or LibStation).
b. Check the drive status from the device host. For example:
# mt -f /dev/rmt/1 status
IBM Ultrium Gen 2 LTO tape drive:
sense key(0x0)= No Additional Sense residual= 0 retries= 0
file no= 0 block no= 0
NOTE: An mt status error indicates that the tape did not mount or it did not load successfully.
c. If load is successful, you may verify if the tape has data on it or if it is a scratch tape.
# tar tvf /dev/rmt/1
With ls -l /dev/cfg we can display the device path that each controller c# maps to.
EXAMPLE
*******************************************************************************************************
# ls -l /dev/cfg
total 5
lrwxrwxrwx 1 root root 45 May 6 14:39 c2 -> ../../devices/pci@3,0/pci1077,101@2/fp@0,0:fc
lrwxrwxrwx 1 root root 47 May 6 14:39 c3 -> ../../devices/pci@3,0/pci1077,101@2,1/fp@0,0:fc
*******************************************************************************************************
To see what is connected to a controller, use luxadm -e dump_map /dev/cfg/c#:
luxadm -e dump_map <device path | /dev/cfg/c2>
EXAMPLE
*******************************************************************************************************
# luxadm -e dump_map /dev/cfg/c2
Pos Port_ID Hard_Addr Port WWN Node WWN Type
0 10300 0 50060b00006287e0 50060b00006287e2 0x1 (Tape device)
1 10100 0 210100e08b3fbbe6 200100e08b3fbbe6 0x1f (Unknown Type)
2 10400 0 500104f0009e929e 500104f0009e929d 0x8 (Medium changer device)
3 10826 0 500104f000795ed9 500104f000795ed8 0x1 (Tape device)
4 10000 0 210000e08b138e17 200000e08b138e17 0x1f (Unknown Type,Host Bus Adapter)
*******************************************************************************************************
To display the link errors for a specific controller, use:
luxadm -e rdls <device path | /dev/cfg/c#>
EXAMPLE
*******************************************************************************************************
# luxadm -e rdls /devices/pci@3,0/pci1077,101@2/fp@0,0:fc
Link Error Status information for loop:/devices/pci@3,0/pci1077,101@2/fp@0,0:fc
al_pa lnk fail sync loss signal loss sequence err invalid word CRC
10300 1 153 1 0 0 0
10100 0 1 1 0 0 0
10400 0 1 1 0 0 0
10826 2 8369 1 0 38143 0
10000 0 1 1 0 0 0
NOTE:
These LESB counts are not cleared by a reset, only power cycles.
These counts must be compared to previously read counts.
*******************************************************************************************************
If these counters increase significantly while backups are running, it indicates an issue with the HBA, the switch, the fibre connection, or device communication (Dion card, drive FC port interface).
If all devices on the HBA are showing errors, the cause is more likely something they have in common.
NOTE:
The al_pa values correspond to the Port_ID values in the luxadm -e dump_map output, so we can find which devices are having the issues.
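Since the counters only make sense relative to an earlier reading, one approach is to save the rdls output before and after a backup and print the per-port differences. The sketch below assumes the seven-column layout shown in the example; `lesb_delta` and the snapshot file names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare two saved `luxadm -e rdls` outputs and print, for each port
# whose counters grew, the increase in each column in this order:
# lnk fail, sync loss, signal loss, sequence err, invalid word, CRC.
# lesb_delta is a hypothetical helper name.
lesb_delta() {
    before=$1
    after=$2
    awk '
        # Data rows have 7 fields and start with a hex port ID (al_pa column)
        NF == 7 && $1 ~ /^[0-9a-fA-F]+$/ {
            if (FNR == NR) {            # first file: remember old counters
                for (i = 2; i <= 7; i++) old[$1, i] = $i
                seen[$1] = 1
                next
            }
            if (!($1 in seen)) next     # port absent from first snapshot
            line = $1; grew = 0
            for (i = 2; i <= 7; i++) {
                d = $i - old[$1, i]
                if (d > 0) grew = 1
                line = line " " d
            }
            if (grew) print line
        }
    ' "$before" "$after"
}

# Possible usage on a Solaris host (file names are placeholders):
#   luxadm -e rdls /dev/cfg/c2 > /tmp/lesb.before
#   ... run the backup ...
#   luxadm -e rdls /dev/cfg/c2 > /tmp/lesb.after
#   lesb_delta /tmp/lesb.before /tmp/lesb.after
```

A port that shows up in the result can then be matched to its WWN through the Port_ID column of the dump_map output.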
With luxadm display WWN/ | /dev/rmt/# we can display the information of the different tape drives
EXAMPLE
*******************************************************************************************************
# luxadm disp /dev/rmt/1n
DEVICE PROPERTIES for tape: /dev/rmt/1n
Vendor: IBM
Product ID: ULTRIUM-TD2
Revision: 73V1
Serial Num: 1110163795
Device Type: Tape device
Path(s):
/dev/rmt/1n
/devices/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795ed9,0:n
Controller /dev/cfg/c2
Device Address 500104f000795ed9,0
Host controller port WWN 210000e08b138e17
*******************************************************************************************************
With this, we can confirm that the drives seen by Solaris are the same ones that are configured in the backup application (serial number, code level, path, etc.).
--------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------
Running a backup directly from Solaris has two important benefits:
- We can see directly how the errors shown before are generated on the fly.
- We can test the drives directly to rule out a backup application issue.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Note:
The customer must be aware that this will overwrite any data they currently have on that tape. So they should either use a blank tape or a tape that can be overwritten.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
To see the issues as they are generated on the fly, we need two command prompts open on Solaris.
In the first one we display the status of the drives; in the other we send the backup commands.
Note: It is always a good idea to run a backup big enough that the drive can ramp up to its nominal writing speed.
In the first prompt, use
iostat -xend 1
to display, every second, the transfer rates to the devices and the soft, hard and transport errors received.
Look at the rmt/X entry for the drive that you want to check.
EXAMPLE
*******************************************************************************************************
# iostat -xend 1
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 fd0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c1t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 rmt/1
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 st6
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 1 0 0 1 rmt/2
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 cl-lib04:vold(pid617)
*******************************************************************************************************
Note:
Soft errors can be caused by bad commands, firmware issues, patch issues, etc.; in general they are recoverable issues.
Hard errors are failures of the command while it was running.
Transport errors are related to issues with the fibre connection and the HBA protocol.
It is very important to note two things here:
- First, the s/w, h/w and trn columns are, respectively, soft errors, hard errors and transport errors, as described in the previous note.
- Second, while the backup is being written to tape, the w/s and kw/s columns show the transfer rate of the device. If you use a file big enough for the drive to reach full writing speed and the transfers are still too slow, we may be facing an HBA/FC issue; we could also see the transport errors go up.
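With one sample per second the full iostat table scrolls quickly, so it can help to keep only the tape rows and the columns discussed above. This is a rough sketch; `tape_iostat_rows` is a hypothetical helper name, and it assumes the 15-column layout shown in the example.

```shell
#!/bin/sh
# Sketch: from `iostat -xend` output (stdin), keep only rmt/X rows and show
# the write rate plus the soft, hard and transport error counters.
# tape_iostat_rows is a hypothetical helper name.
tape_iostat_rows() {
    awk '
        # Columns: r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b
        #          s/w h/w trn tot device
        NF == 15 && $15 ~ /^rmt\// {
            printf "%s kw/s=%s soft=%s hard=%s transport=%s\n",
                   $15, $4, $11, $12, $13
        }
    '
}

# Demo on rows taken from the example above.
tape_iostat_rows <<'EOF'
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 c1t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 1 0 0 1 rmt/2
EOF
# prints: rmt/2 kw/s=0.0 soft=1 hard=0 transport=0
```

On a live host the filter could be attached to the running command, for example `iostat -xend 1 | tape_iostat_rows`.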
To send the backup to tape we use tar.
# tar cvpf /dev/rmt/1 file_to_backup_or_directory
EXAMPLE
*******************************************************************************************************
# tar cvpf /dev/rmt/1n p12637445_62_Solarisx86.zip
a p12637445_62_Solarisx86.zip 16364 tape blocks
*******************************************************************************************************
While the backup runs, the iostat prompt shows the transfer speed and the hard, soft and transport errors.
This helps you diagnose connectivity issues (if transfer rates are slow), drive hardware and media errors, or even software issues.
In the prompt where you ran the tar command, you will see the files being written to the tape; the prompt returns when the backup is done.
Note:
You can stop the backup with Ctrl-C.
If you don't have a big enough file, you can create one with mkfile.
EXAMPLE
*******************************************************************************************************
# mkfile 1000M t
#
*******************************************************************************************************
This creates a 1000 MB file named t to send to tape.
The last step of the test is to restore a file from the tape to the system and compare it with the original. Make sure that the original file on Solaris has not been changed in the meantime, so that the comparison is valid.
To restore a file, first list the files that are on the tape with "# tar tvf /dev/rmt/Xxx".
EXAMPLE
*******************************************************************************************************
# tar tvf /dev/rmt/1n
-rw-r--r-- 0/0 3187427 Sep 13 13:48 2011 p12637445_62_Solarisx86.zip
*******************************************************************************************************
Note:
With a no-rewind device (n suffix), every time you list the files the tape is left positioned at the next tape file.
Use mt to rewind and to move between the tape files.
To display the current state and position of the tape, use "mt -f /dev/rmt/Xxx status"; the "file no= 0" field shows the tape file at which the tape is currently positioned.
EXAMPLE
*******************************************************************************************************
# mt -f /dev/rmt/1n status
IBM Ultrium Gen 2 LTO tape drive:
sense key(0x0)= No Additional Sense residual= 0 retries= 0
file no= 0 block no= 0
*******************************************************************************************************
To move to the next file archived on the tape, use "mt -f /dev/rmt/Xxx fsf".
EXAMPLE
*******************************************************************************************************
# mt -f /dev/rmt/1n fsf
This command moves the tape position to the next tape file.
*******************************************************************************************************
To rewind the tape to the first file, use "mt -f /dev/rmt/Xxx rewind".
EXAMPLE
*******************************************************************************************************
# mt -f /dev/rmt/1n rewind
*******************************************************************************************************
To retrieve the file, use "tar xvf".
EXAMPLE
*******************************************************************************************************
# cd /tmp
# tar -xvf /dev/rmt/1n
x p12637445_62_Solarisx86.zip, 8378333 bytes, 16364 tape blocks
# ls
hsperfdata_noaccess hsperfdata_root ogl_select292 p12637445_62_Solarisx86.zip
#
*******************************************************************************************************
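The comparison of the restored file with the original can be done byte for byte with cmp. The following is a minimal sketch; `verify_restore` is a hypothetical helper name, and the path of the original file in the usage comment is a placeholder because the example does not state it.

```shell
#!/bin/sh
# Sketch: compare a restored file with its original, byte for byte.
# verify_restore is a hypothetical helper name.
verify_restore() {
    original=$1
    restored=$2
    if cmp -s "$original" "$restored"; then
        echo "MATCH"
    else
        echo "DIFFERENT"
        return 1
    fi
}

# Possible usage after the restore into /tmp shown above
# (/path/to/original is a placeholder for where the source file lives):
#   verify_restore /path/to/original/p12637445_62_Solarisx86.zip \
#                  /tmp/p12637445_62_Solarisx86.zip
```

A DIFFERENT result with an unchanged original points at a write or read problem on the drive, media or data path rather than at the backup application.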
Related Knowledge Articles
Tape Drives - Identifying and testing a tape drive (Doc ID 1007435.1)