Asset ID: 1-75-1002641.1
Update Date: 2012-08-23
Solution Type
Troubleshooting Sure
Solution 1002641.1: Troubleshooting Sun StorEdge[TM] 351x and 33x0 Controllers
Related Items
- Sun Storage 3511 SATA Array
- Sun Storage 3310 Array
- Sun Storage 3510 FC Array
- Sun Storage 3320 SCSI Array
Related Categories
- PLA-Support>Sun Systems>DISK>Arrays>SN-DK: SE31xx_33xx_35xx
- .Old GCS Categories>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays
PreviouslyPublishedAs
203641
Before replacing a controller, confirm overall array health. Other components or conditions may have caused the current controller state. Most controller failures are caused by firmware-detected problems, not actual controller hardware problems. Generally speaking, the controller should be one of the very last components replaced.
Applies to:
Sun Storage 3511 SATA Array - Version Not Applicable and later
Sun Storage 3310 Array - Version Not Applicable and later
Sun Storage 3320 SCSI Array - Version Not Applicable and later
Sun Storage 3510 FC Array - Version Not Applicable and later
All Platforms
Purpose
Description
Troubleshooting Sun Storage[TM] 351x and 33x0 Controllers and Array Health.
Symptoms:
- "show enclosure" shows a faulted controller
- "show redundancy" shows Redundancy status: Failed or Scanning on a dual controller array
- Controller Alert: Redundant Controller Failure Detected
- Amber controller LED indicating failure
- Chassis sounds audible alarm
- Controller appears hung
- DRAM Parity errors
Please refer to the Sun StorEdge 3000 Family RAID Firmware 4.2x User Guide, Appendix E for additional Controller related Event Messages.
Troubleshooting Steps
NOTE:
- This is a subset of Troubleshooting Sun StorEdge[TM] 33x0/351x Hardware (Doc ID 1011431.1). The steps below will help verify and resolve controller problems.
- Please ensure that sccli version 2.3 or higher is being used to run the commands described in this document.
- Latest Release: Sun StorageTek 3000 Series Software 2.5
bash# sccli -v
sccli version 2.5.1
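The version requirement above can be checked mechanically. The following is a minimal sketch, assuming the version string has the form shown in the sample output ("sccli version 2.5.1"); the function names are hypothetical and are not part of sccli.

```python
import re

# Minimum sccli version named in this document.
MIN_VERSION = (2, 3)


def parse_sccli_version(output):
    """Extract the numeric version from text such as 'sccli version 2.5.1'."""
    match = re.search(r"sccli version (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError("no sccli version string found")
    return tuple(int(part) for part in match.group(1).split("."))


def version_is_supported(output):
    """Return True when the reported version is 2.3 or higher."""
    return parse_sccli_version(output) >= MIN_VERSION


if __name__ == "__main__":
    print(version_is_supported("sccli version 2.5.1"))
```

The comparison uses tuple ordering, so 2.10 would correctly rank above 2.3, which a plain string comparison would get wrong.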
Please validate that each troubleshooting step below is true for your environment. Each step will provide instructions via a link to the document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.
1. Verify the existence of a serial number in both primary and secondary slots using show redundancy:
Refer to Chapter 6 of the Sun StorEdge Family FRU Installation Guide or the Sun StorEdge 3000 Family CLI 2.5 User's Guide for an explanation of possible states.
sccli> show redundancy
Primary controller serial number: 8045010
Primary controller location: Lower
Redundancy mode: Active-Active
Redundancy status: Failed
Secondary controller serial number: 8002225
- If the Secondary controller's serial number shows 0, go to Step 2.
- If both the Primary and Secondary controllers show a serial number, go to Step 3.
- If both controllers' serial numbers show 0, contact Oracle for further support.
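The decision in Step 1 can be sketched as a small parser over the show redundancy output. This is an illustrative sketch only: it assumes the output consists of "name: value" lines as in the sample above, and the function names are hypothetical. The case of a zero primary serial number with a non-zero secondary is not covered by this document, so the sketch reports it as such.

```python
SAMPLE = """\
Primary controller serial number: 8045010
Primary controller location: Lower
Redundancy mode: Active-Active
Redundancy status: Failed
Secondary controller serial number: 8002225
"""


def parse_redundancy(output):
    """Turn 'name: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def step1_outcome(fields):
    """Apply the three Step 1 rules to the parsed serial numbers."""
    primary_present = fields["Primary controller serial number"] != "0"
    secondary_present = fields["Secondary controller serial number"] != "0"
    if primary_present and secondary_present:
        return "Go to Step 3"
    if primary_present:
        return "Go to Step 2"
    if not secondary_present:
        return "Contact Oracle for further support"
    return "Not covered by this document"


if __name__ == "__main__":
    print(step1_outcome(parse_redundancy(SAMPLE)))
```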
2. Verify physical existence of two controllers in the unit:
You will have to physically verify that there are two controllers. Each controller has an Ethernet port and a serial port.
3510 Rear View
3511 SSH Rear View
3320/3310 Rear View
- If there are two controllers in the unit, go to Step 3.
- If there is only one controller in the unit, go to Step 4.
3. Verify LED status on the controllers:
- Solid Green - indicates controller operating as secondary
- Blinking Green - indicates controller operating as primary
- Amber - indicates controller is faulted
Upper LED | Lower LED | Result | Action |
Solid Green | Solid Green | Both controllers think they are secondary | Contact Oracle for further support. |
Solid Green | Blinking Green | Lower controller is primary, upper controller is secondary | Array controllers are optimal. |
Solid Green | Amber | Lower controller is faulted | Upper controller has not taken over as Primary. Contact Oracle for further support. |
Blinking Green | Solid Green | Upper controller is primary, lower controller is secondary | Array controllers are optimal. |
Blinking Green | Blinking Green | Both controllers think they are primary | Contact Oracle for further support. |
Blinking Green | Amber | Upper controller is primary, lower controller is faulted | Upper controller has taken over as Primary. Secondary controller is faulted, go to Step 4. |
Amber | Solid Green | Upper controller is faulted | Upper controller has faulted, lower controller has not taken over as Primary. Contact Oracle for further support. |
Amber | Blinking Green | Lower controller is primary, upper controller is faulted | Lower controller has taken over as Primary. Secondary controller is faulted, go to Step 4. |
Amber | Amber | Upper and lower controllers are faulted | Both controllers are faulted. Contact Oracle for further support. |
- If you cannot verify the LED status, go to Step 4.
- If the LED status is optimal but the redundancy status is Failed, contact Oracle for further support.
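The LED combinations in Step 3 amount to a lookup keyed on the (upper, lower) LED pair. A minimal sketch follows; the dictionary and function names are hypothetical, and the result and action text is taken from the table in this step.

```python
CONTACT = "Contact Oracle for further support."
OPTIMAL = "Array controllers are optimal."
STEP4 = "Secondary controller is faulted, go to Step 4."

# (upper LED, lower LED) -> (result, action)
LED_TABLE = {
    ("Solid Green", "Solid Green"): ("Both controllers think they are secondary", CONTACT),
    ("Solid Green", "Blinking Green"): ("Lower is primary, upper is secondary", OPTIMAL),
    ("Solid Green", "Amber"): ("Lower controller is faulted", CONTACT),
    ("Blinking Green", "Solid Green"): ("Upper is primary, lower is secondary", OPTIMAL),
    ("Blinking Green", "Blinking Green"): ("Both controllers think they are primary", CONTACT),
    ("Blinking Green", "Amber"): ("Upper is primary, lower is faulted", STEP4),
    ("Amber", "Solid Green"): ("Upper controller is faulted", CONTACT),
    ("Amber", "Blinking Green"): ("Lower is primary, upper is faulted", STEP4),
    ("Amber", "Amber"): ("Upper and lower controllers are faulted", CONTACT),
}


def led_action(upper, lower):
    """Return the (result, action) pair for one observed LED combination."""
    return LED_TABLE[(upper, lower)]


if __name__ == "__main__":
    result, action = led_action("Blinking Green", "Amber")
    print(result, "->", action)
```

An LED state that is not one of the three listed colors raises KeyError, which corresponds to the "cannot verify the LED status" case and leads to Step 4.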
4. Verify persistent events or use show events for controller events:
Review the event log for the following events within a 24-hour period of seeing the event above, and take action as indicated:
- Controller Unrecoverable Error
- Controller SDRAM ECC Multi-bits Error Detected
- Controller SDRAM ECC Single-bit Error Detected
- Controller SDRAM Parity Error Detected
- Controller PCI Bus Parity Error Detected
- If a Controller Unrecoverable Error is logged, refer to Troubleshooting Controller Unrecoverable Errors in Sun Storage[TM] SE3000 Arrays (Doc ID 1314607.1).
- If any of the other events apply, or you have a single-controller array, contact Oracle for further support.
- If none of these apply, continue to Step 5.
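The Step 4 triage can be sketched as a scan over event log lines. This is a sketch under stated assumptions: the log is supplied as a list of text lines that contain the event names verbatim, the 24-hour window is not enforced, and the rules are applied in the order listed above. The function name and the single_controller parameter are hypothetical.

```python
STEP4_EVENTS = [
    "Controller Unrecoverable Error",
    "Controller SDRAM ECC Multi-bits Error Detected",
    "Controller SDRAM ECC Single-bit Error Detected",
    "Controller SDRAM Parity Error Detected",
    "Controller PCI Bus Parity Error Detected",
]


def step4_outcome(log_lines, single_controller=False):
    """Classify the log according to the three Step 4 rules, in document order."""
    found = [
        event
        for event in STEP4_EVENTS
        if any(event.lower() in line.lower() for line in log_lines)
    ]
    if "Controller Unrecoverable Error" in found:
        return "See Doc ID 1314607.1"
    if found or single_controller:
        return "Contact Oracle for further support"
    return "Continue to Step 5"


if __name__ == "__main__":
    print(step4_outcome(["Controller SDRAM Parity Error Detected"]))
```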
5. Verify show events for drive channel loop issues:
Reference the error documentation:
http://download.oracle.com/docs/cd/E19673-01/817-3711-18/appe_msgs_translat.html#pgfId-999843
CHL:_ Drive SCSI Channel ALERT: Unexpected Select Timeout
CHL:_ RCC Channel ALERT: Gross Phase/Signal Error Detected
CHL:_ Drive SCSI Channel ALERT: Gross Phase/Signal Error Detected
CHL:_ Drive SCSI Channel ALERT: Unexpected Disconnect Encountered
CHL:_ RCC Channel ALERT: Timeout Waiting for I/O to Complete
CHL:_ Drive SCSI Channel ALERT: Timeout Waiting for I/O to Complete
CHL:_ RCC Channel ALERT: SCSI Parity/CRC Error Detected
CHL:_ SCSI Drive Channel ALERT: SCSI Parity/CRC Error Detected
CHL:_ RCC Channel ALERT: Unit Attention Received
CHL:_ SCSI Drive Channel ALERT: Unit Attention Received
CHL:_ RCC Channel ALERT: Data Overrun/Underrun Detected
CHL:_ Drive SCSI Channel ALERT: Data Overrun/Underrun Detected
CHL:_ RCC Channel ALERT: Negotiation Error Detected
CHL:_ Drive SCSI Channel ALERT: Negotiation Error Detected
CHL:_ RCC Channel ALERT: Invalid Status/Sense Data Received
CHL:_ Drive SCSI Channel ALERT: Invalid Status/Sense Data Received
CHL:_ SCSI Host Channel Alert: SCSI Bus Reset Issued
CHL:_ ALERT: Fibre Channel Loop Failure Detected
CHL:_ ALERT: Redundant loop for CHL:_ Failure Detected
CHL:_ ALERT: Redundant Path for CHL:_ ID:_ Expected but Not Found
- If there are events similar to the above, contact Oracle for further support.
- If there are no events similar to the above list, go to Step 6.
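Matching log lines against the Step 5 list can be sketched with regular expressions built from the message templates. This sketch assumes that each underscore in a template (as in CHL:_ or ID:_) is a placeholder for a channel or ID value in the real event text, and that the log is available as a list of text lines. The function names are hypothetical.

```python
import re

TEMPLATES = [
    "CHL:_ Drive SCSI Channel ALERT: Unexpected Select Timeout",
    "CHL:_ RCC Channel ALERT: Gross Phase/Signal Error Detected",
    "CHL:_ Drive SCSI Channel ALERT: Gross Phase/Signal Error Detected",
    "CHL:_ Drive SCSI Channel ALERT: Unexpected Disconnect Encountered",
    "CHL:_ RCC Channel ALERT: Timeout Waiting for I/O to Complete",
    "CHL:_ Drive SCSI Channel ALERT: Timeout Waiting for I/O to Complete",
    "CHL:_ RCC Channel ALERT: SCSI Parity/CRC Error Detected",
    "CHL:_ SCSI Drive Channel ALERT: SCSI Parity/CRC Error Detected",
    "CHL:_ RCC Channel ALERT: Unit Attention Received",
    "CHL:_ SCSI Drive Channel ALERT: Unit Attention Received",
    "CHL:_ RCC Channel ALERT: Data Overrun/Underrun Detected",
    "CHL:_ Drive SCSI Channel ALERT: Data Overrun/Underrun Detected",
    "CHL:_ RCC Channel ALERT: Negotiation Error Detected",
    "CHL:_ Drive SCSI Channel ALERT: Negotiation Error Detected",
    "CHL:_ RCC Channel ALERT: Invalid Status/Sense Data Received",
    "CHL:_ Drive SCSI Channel ALERT: Invalid Status/Sense Data Received",
    "CHL:_ SCSI Host Channel Alert: SCSI Bus Reset Issued",
    "CHL:_ ALERT: Fibre Channel Loop Failure Detected",
    "CHL:_ ALERT: Redundant loop for CHL:_ Failure Detected",
    "CHL:_ ALERT: Redundant Path for CHL:_ ID:_ Expected but Not Found",
]


def template_to_regex(template):
    """Escape the template, then let each '_' placeholder match one token."""
    # re.escape leaves '_' untouched, so the placeholders survive escaping.
    return re.compile(re.escape(template).replace("_", r"\S+"), re.IGNORECASE)


PATTERNS = [template_to_regex(t) for t in TEMPLATES]


def drive_channel_events(log_lines):
    """Return the log lines that match any Step 5 template."""
    return [line for line in log_lines if any(p.search(line) for p in PATTERNS)]


def step5_outcome(log_lines):
    if drive_channel_events(log_lines):
        return "Contact Oracle for further support"
    return "Go to Step 6"
```

Because the document says "similar to the above", exact template matching is a conservative starting point; near-miss messages still deserve a manual look.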
6. Unfail the secondary controller:
Issue the following command:
Wait up to 15 minutes for device detection before the controller redundancy status is Enabled.
Issue the show redundancy command at the sccli> prompt to confirm that the redundancy status is Enabled.
- If the status goes to a Failed, Scanning, or Detected state, contact Oracle for further support.
- If the controller redundancy status is now Enabled, repeat Step 4. Troubleshooting is complete if no additional errors have been generated.
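The wait in Step 6 can be sketched as a polling loop with a 15-minute limit. This is an illustrative sketch only: get_status is a hypothetical caller-supplied function that returns the current "Redundancy status" value (for example, by parsing show redundancy output), the 30-second polling interval is an assumption, and any status other than Enabled at the end of the wait is treated as a reason to contact Oracle.

```python
import time


def wait_for_enabled(get_status, timeout_s=15 * 60, interval_s=30,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the status is 'Enabled' or the timeout expires; return the last status."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "Enabled" or clock() >= deadline:
            return status
        sleep(interval_s)


def step6_outcome(status):
    """Map the final redundancy status to the action given in Step 6."""
    if status == "Enabled":
        return "Repeat Step 4"
    return "Contact Oracle for further support"
```

The sleep and clock parameters exist so the loop can be exercised with a simulated clock instead of waiting 15 minutes in real time.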
References
<NOTE:1011431.1> - Troubleshooting Sun StorEdge[TM] 33x0/351x Hardware
<NOTE:1314607.1> - Troubleshooting Controller Unrecoverable Errors in Sun Storage[TM] SE3000 Arrays