Asset ID: 1-75-1388897.1
Update Date: 2012-10-09
Solution Type: Troubleshooting Sure
Solution 1388897.1: Troubleshooting Sun Storage[TM] 2500 and 6000 Array Drive Tray Lost Redundancy Events
Related Items
- Sun Storage 6580 Array
- Sun Storage 6180 Array
- Sun Storage 6780 Array
- Sun Storage 2540-M2 Array
- Sun Storage 2540 Array
- Sun Storage 2510 Array
- Sun Storage 6140 Array
- Sun Storage 2530-M2 Array
- Sun Storage 2530 Array
- Sun Storage 6540 Array
- Sun Storage 6130 Array
Related Categories
- PLA-Support>Sun Systems>DISK>Arrays>SN-DK: 6140_6180
- .Old GCS Categories>Sun Microsystems>Storage - Disk>Modular Disk - 6xxx Arrays
Applies to:
Sun Storage 6540 Array - Version Not Applicable and later
Sun Storage 6580 Array - Version Not Applicable and later
Sun Storage 6780 Array - Version Not Applicable and later
Sun Storage 6130 Array - Version Not Applicable and later
Sun Storage 2530 Array - Version Not Applicable and later
Information in this document applies to any platform.
Purpose
The purpose of this document is to help troubleshoot Drive/Drive-Tray Lost Redundancy events for Sun Storage[TM] 2500 and 6000 Arrays.
Symptoms include:
- Critical Fault for Drive <Tray.xx.Drive.xx> lost redundancy (xx.66.1032) or REC_LOST_REDUNDANCY_DRIVE
- Critical Fault for Enclosure tray <Tray.xx> lost redundancy (xx.66.1033) or REC_LOST_REDUNDANCY_TRAY
- Critical Fault for Lost communication with <Tray.xx.IOM.x> (xx.66.1034) or REC_LOST_REDUNDANCY_ESM
Please validate that each troubleshooting step below is true for your environment. Each step links to a document with instructions for validating the step and taking corrective action as necessary. The steps are ordered to isolate the issue and identify the proper resolution, so please do not skip a step.
Troubleshooting Steps
1. Verify the Critical Faults on the array.
Reference <Document 1021057.1> How to verify Sun StorageTek[TM] 2500 and Sun Storage[TM] 6000 and J4000 Critical Faults via the User Interface.
- If the critical fault is only for a drive, collect Supportdata and proceed to Step 2.
- If there is a critical fault for an IOM/Tray along with the drive fault above, collect a cabling diagram along with Supportdata and proceed to Step 2.
- If the critical fault is only for an IOM/Tray, collect a cabling diagram along with Supportdata and proceed to Step 3.
Reference <Document 1002514.1> Collecting Sun Storage Common Array Manager Array Support Data.
Reference <Document 1014074.1> Collecting Support Data for Arrays Using Sun StorageTek[TM] SANtricity Storage Manager.
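The branching in Step 1 can be sketched as a small Python helper. This is only an illustration: the function name and return values are hypothetical and are not part of Common Array Manager or SANtricity. It assumes the last field of the grid code (1032, 1033, 1034) identifies the fault type, as listed under Symptoms.

```python
# Hypothetical helper, for illustration only; not part of any Oracle tool.
# Last field of the grid codes listed under Symptoms.
DRIVE, TRAY, IOM = "1032", "1033", "1034"

def triage(grid_codes):
    """Return (items_to_collect, next_step) for grid codes such as '80.66.1032'."""
    suffixes = {code.rsplit(".", 1)[-1] for code in grid_codes}
    has_drive = DRIVE in suffixes
    has_tray_or_iom = bool(suffixes & {TRAY, IOM})
    if not (has_drive or has_tray_or_iom):
        raise ValueError("no lost-redundancy grid code found")
    collect = ["supportdata"]
    if has_tray_or_iom:
        # A cabling diagram is only requested when a tray or IOM fault is present.
        collect.append("cabling diagram")
    # Any drive fault leads to Step 2; tray/IOM-only faults lead to Step 3.
    next_step = 2 if has_drive else 3
    return collect, next_step

print(triage(["80.66.1032", "80.66.1033"]))
```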
2. Identify Drive Details from alarms.txt (recoveryGuruProcedures.html in case of SANtricity).
- If you use Sun Storage Common Array Manager:
- Extract and open the alarms.txt file from the supportdata.
- Get TrayID, DriveID and Channel information from the alarm xx.66.1032.
Example:
Alarm ID : alarm1
Description: Drive Tray.45.Drive.05 lost redundancy, IOM N/A, working channel: 5.
Severity : Critical
Element : t45drive5
GridCode : 80.66.1032
Date : xx-xx-xx
- If you use Sun StorageTek[TM] SANtricity Storage Manager:
- Extract and open the recoveryGuruProcedures.html file from the supportdata.
- Get TrayID, DriveID and Channel information from the Failure Entry NO_REDUNDANCY_DRIVE
Example:
Storage array: ST6540
Component reporting problem: Drive in slot 8
Status: Optimal
Location: Drive tray 1
Component requiring service: 8
Service action (removal) allowed: No
Service action LED on component: No
Working channel: 2
- Proceed to Step 4 to identify Working and Affected Channels.
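If many alarms have to be reviewed, the TrayID, DriveID and working channel can be pulled out of the alarm description with a regular expression. The sketch below is written in Python and assumes the description always follows the wording of the Common Array Manager example above; the function name is hypothetical and other firmware or software versions may word the alarm differently.

```python
import re

# Pattern based solely on the example alarm text in this document (an assumption).
ALARM_RE = re.compile(
    r"Drive Tray\.(?P<tray>\d+)\.Drive\.(?P<drive>\d+) lost redundancy.*?"
    r"working channel:\s*(?P<channel>\d+)"
)

def parse_drive_alarm(description):
    """Return tray, drive and working channel as integers, or None if no match."""
    match = ALARM_RE.search(description)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}

sample = "Description: Drive Tray.45.Drive.05 lost redundancy, IOM N/A, working channel: 5."
print(parse_drive_alarm(sample))  # {'tray': 45, 'drive': 5, 'channel': 5}
```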
3. Identify Tray Details from alarms.txt (recoveryGuruProcedures.html in case of SANtricity).
Refer to the examples in Step 2 to get the TrayID and Working Channel information from alarms.txt or recoveryGuruProcedures.html.
Proceed to Step 4 to identify Working and Affected Channels.
4. Identify Affected and Working Channels:
Locate the 'luall' output by opening the stateCaptureData.dmp file, and searching for the keyword 'luall'. Locate the Affected Drive/Tray as mentioned in the previous steps, and identify the Affected and Working Channels by following the example below:
For example:
Executing luall(0,0,0,0,0,0,0,0,0,0) on controller A:
.......Logical Unit........: :.Channels..:Que ............IOs............:
Devnum Location Role :ORP : 0 1 2 3 4 :Dep Qd Open Completed Errs : OldestCmdAge(ms)
---------- -------- ------ :--- : - - - - - :--- --- ----- ---------- ----- : ----------------
00020000 t0 Encl :++ : A B : 1 0 0 38399 3 0
00010100 t0,s1 FCdr :+++ : * + : 16 0 0 5934 2 0
00010101 t0,s2 FCdr :+++ : + * : 16 0 0 5935 4 0
Important fields to look at here:
'Location' Column - t0,s1 - indicates Tray 0, Slot 1
'Channels' Column
0 1 2 3 4 . . . - Drive channel information. Numbering here starts from 0: Channel 0 here represents Channel 1 in the storageArrayProfile or alarms.txt output, and so on.
'A' or 'B' under Channels - Reported only for trays. Having both A and B for a tray indicates redundancy.
'*' under Channels - Active Path
'+' under Channels - Standby Path
'D' or 'd' or '-' or ' ' (no character) under Channels - Standby path is not available and needs further investigation.
Note 1: The Working Channel will always be shown with '*'.
Note 2: For a Simplex (single controller) array configuration, it is expected to see only the Active path; the Standby path will not be seen.
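The reading of a luall row described above can be sketched in Python. This is a rough illustration with hypothetical function names, and it assumes each row uses ':' as the field separator exactly as in the example. It only counts the symbols in the Channels field; to tell which channel a symbol belongs to, the column positions in the original fixed-width dump must be inspected.

```python
import re

# Location looks like 't0' for a tray or 't0,s1' for a drive (per the example).
LOCATION_RE = re.compile(r"^t\d+(,s\d+)?$")

def parse_luall_row(row):
    """Parse one luall data row; return None for headers and separator lines."""
    parts = row.split(":")
    if len(parts) < 4:
        return None
    head = parts[0].split()
    if len(head) < 3 or not LOCATION_RE.match(head[-2]):
        return None
    devnum, location, role = head[-3:]
    symbols = parts[2].split()
    if role == "Encl":
        # Trays report 'A' and 'B'; both present indicates redundancy.
        redundant = {"A", "B"} <= set(symbols)
    else:
        # Drives need an active path ('*') and a standby path ('+').
        redundant = "*" in symbols and "+" in symbols
    return {"devnum": devnum, "location": location, "role": role,
            "orp": parts[1].strip(), "symbols": symbols, "redundant": redundant}

def luall_to_profile_channel(luall_channel):
    """luall counts channels from 0; storageArrayProfile and alarms.txt count from 1."""
    return luall_channel + 1

row = "00010100 t0,s1 FCdr :+++ : * + : 16 0 0 5934 2 0"
print(parse_luall_row(row)["redundant"], luall_to_profile_channel(0))
```

Rows for which "redundant" is False are the candidates for the affected drive or tray.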
Detailed Explanation of symbols for Oracle TSE:
Symbols appearing before Device numbers:
-< = no IT Nexus connected
=< = logical unit rejecting IO requests
#< = logical unit restricted or suspended
d< = logical unit degraded... look at the ORP
ORP Column = Operation, Redundancy, Performance
Operation = the state of the ITN currently chosen
+ = chosen itn is not degraded
d = chosen itn is degraded
Redundancy = the state of the redundant ITN
+ = alternate itn is up
d = alternate itn is degraded
- = alternate itn is down
x = there is no alternate itn
Performance = Are we using the preferred path?
+ = chosen itn is preferred
- = chosen itn is not preferred
(blank) = no itn preferences
Channels column indicates the state of the itn on that channel
* = up and chosen
+ = up and not chosen
D = degraded and chosen
d = degraded and not chosen
- = down
x = not present
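As an illustration, the ORP legend above can be turned into a lookup table. The Python sketch below uses a hypothetical function name and assumes that a missing or blank third character means "no itn preferences", which matches the two-character "++" shown for the tray in the example output.

```python
# Lookup tables transcribed from the ORP legend; one table per character position.
OPERATION = {"+": "chosen ITN is not degraded", "d": "chosen ITN is degraded"}
REDUNDANCY = {
    "+": "alternate ITN is up",
    "d": "alternate ITN is degraded",
    "-": "alternate ITN is down",
    "x": "there is no alternate ITN",
}
PERFORMANCE = {
    "+": "chosen ITN is preferred",
    "-": "chosen ITN is not preferred",
    " ": "no ITN preferences",  # assumption: blank means no preference
}

def decode_orp(orp):
    """Decode an ORP string such as '+++' into its three meanings."""
    padded = orp.ljust(3)[:3]  # pad short values such as '++' with blanks
    tables = (("operation", OPERATION),
              ("redundancy", REDUNDANCY),
              ("performance", PERFORMANCE))
    return {name: table.get(symbol, f"unknown symbol {symbol!r}")
            for (name, table), symbol in zip(tables, padded)}

print(decode_orp("+-+")["redundancy"])  # alternate ITN is down
```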
- If only one drive is seen with a single path and all the other drives in the same tray have both paths available, the drive may need to be replaced. Proceed to Step 11.
- If all the drives are seen with a single path and it is a controller tray, one of the controllers may not be working properly. To check for a controller issue and take appropriate action(s), follow <Document 1021113.1> Sun Storage[TM] Arrays: Troubleshooting RAID Controller Failures.
- If all the drives are seen with a single path and it is an expansion tray, proceed to Step 5.
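The three outcomes of Step 4 can be expressed as a simple decision function. The Python sketch below is illustrative only and its name is hypothetical. The cases this document does not describe (no drive affected, or several but not all drives affected) are handled by assumption and labeled as such in the code.

```python
def step4_next_action(single_path_drives, total_drives, is_controller_tray):
    """Map the per-tray path counts from luall to the next action in this procedure."""
    if total_drives <= 0 or not 0 <= single_path_drives <= total_drives:
        raise ValueError("invalid drive counts")
    if single_path_drives == 0:
        # Assumption: not covered by the document; nothing to isolate in this tray.
        return "all drives have both paths"
    if single_path_drives == total_drives:
        if is_controller_tray:
            return "check controllers per Document 1021113.1"
        return "proceed to Step 5"
    if single_path_drives == 1:
        return "drive may need replacement; proceed to Step 11"
    # Assumption: several, but not all, drives affected is not covered by Step 4.
    return "not covered by Step 4; contact Oracle Support"

print(step4_next_action(1, 16, False))
```

A tray that holds a single drive is treated here as the "all drives" case, which is a design choice of this sketch rather than guidance from the procedure.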
5. Verify other alerts in alarms.txt (recoveryGuruProcedures.html in case of SANtricity).
- If any alarm exists for Failed IOM on the same tray, the IOM may need to be replaced. Proceed to Step 11.
- If any alarm exists for Minihub failed and/or SFP failed, the SFP may need to be replaced. Proceed to Step 11.
- If no such alarm exists, proceed to Step 6.
6. Physically locate the Affected Channel using the information collected in Step 4.
- Default channel numbers and their location for 6130 array:

- Default channel numbers and their location for 6140 array:

- Default channel numbers and their location for 6180 array:

- Default channel numbers and their location for 6540 array:

- Default channel numbers and their location for 6580/6780 array:

- Default channel numbers and their location for 2540/2530/2510 array:

7. Trace the cable connectivity from the Affected Tray in the Affected Channel.
CAUTION: Do not disconnect any cables on the working channel. Doing so may cause loss of data accessibility.
- If the array is 6000 series, proceed to Step 8.
- If the array is 2500 series, proceed to Step 9.
8. Verify the 7-segment LED status code of the IOM.
Internal Note for Oracle Support Engineers:
a. The CSM200 tray has a 7-segment LED display. To identify the Tray/IOM type, click here.
b. For a detailed LED status code description, refer to <Document 1021109.1> Sun StorageTek[TM] 6140, 6540, and Flexline 380 Array Controller 7-Segment LED
9. Verify the Port Status LED.
Reference Port Status LEDs for 2500 Series - Check for "Link Fault" LED status.
Reference Port Status LEDs for 6000 Series - Check for "Port Bypass" LED status.
- If the Amber LED is ON, proceed to Step 10.
- If the LED is OFF, proceed to Step 11.
10. Check the cable going IN to the array in the cabling sequence.
If the cable is loose or disconnected, connect it and re-evaluate the alarm. It may also be necessary to reseat the IOM for that tray.
- If the issue is fixed, you are finished with the procedure.
- If the issue is not resolved and the Amber LED is ON, the cable and/or SFP would need replacement. Proceed to Step 11.
11. Please contact Oracle Support and supply:
- Supportdata Collection
- Cabling Diagram (if applicable)
- Results of the above steps (if applicable)
Attachments
This solution has no attachment