Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-75-1002641.1
Update Date:2012-08-23
Keywords:

Solution Type  Troubleshooting Sure

Solution  1002641.1 :   Troubleshooting Sun StorEdge[TM] 351x and 33x0 Controllers  


Related Items
  • Sun Storage 3511 SATA Array
  • Sun Storage 3310 Array
  • Sun Storage 3510 FC Array
  • Sun Storage 3320 SCSI Array
Related Categories
  • PLA-Support>Sun Systems>DISK>Arrays>SN-DK: SE31xx_33xx_35xx
  • .Old GCS Categories>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays

PreviouslyPublishedAs
203641
Before replacing a controller, confirm overall array health. Other components or conditions may have caused the current controller state. Most controller failures are caused by firmware-detected problems, not by actual controller hardware problems. Generally speaking, the controller should be one of the very last components replaced.

Applies to:

Sun Storage 3511 SATA Array - Version Not Applicable and later
Sun Storage 3310 Array - Version Not Applicable and later
Sun Storage 3320 SCSI Array - Version Not Applicable and later
Sun Storage 3510 FC Array - Version Not Applicable and later
All Platforms

Purpose

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Storage Disk 2000, 3000, 6000 RAID Arrays & JBODs Community


Description


Troubleshooting Sun Storage[TM] 351x and 33x0 Controllers and Array Health.

Symptoms:

  • "show enclosure" shows a faulted controller
  • "show redundancy" shows Redundancy status: Failed or Scanning on a dual controller array
  • Controller Alert: Redundant Controller Failure Detected
  • Amber controller LED indicating failure
  • Chassis sounds audible alarm
  • Controller appears hung
  • DRAM Parity errors

Please refer to the Sun StorEdge 3000 Family RAID Firmware 4.2x User Guide, Appendix E for additional Controller related Event Messages.

Troubleshooting Steps

NOTE:

  • This is a sub-set of Troubleshooting Sun StorEdge[TM] 33x0/351x Hardware (Doc ID 1011431.1). The steps below will help verify and resolve controller problems.
  • Please ensure that sccli version 2.3 or higher is being used to run the commands described in this document.
  • Latest Release:  Sun StorageTek 3000 Series Software 2.5
bash# sccli -v
sccli version 2.5.1
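
The version requirement above can be checked mechanically. The following is a minimal sketch, assuming the output of sccli -v has the same "sccli version X.Y.Z" form as the sample shown; the function names parse_sccli_version and version_ok are hypothetical and are not part of sccli.

```python
import re

MIN_VERSION = (2, 3)  # minimum sccli version this document asks for


def parse_sccli_version(text):
    """Return the version in 'sccli version X.Y[.Z]' as a tuple of ints, or None."""
    match = re.search(r"sccli version (\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def version_ok(text, minimum=MIN_VERSION):
    """True when the reported version is at least the minimum (tuple comparison)."""
    version = parse_sccli_version(text)
    return version is not None and version >= minimum


if __name__ == "__main__":
    print(version_ok("sccli version 2.5.1"))
```

Comparing tuples of integers avoids the string-comparison trap where "2.10" would sort before "2.3".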

 Please validate that each troubleshooting step below is true for your environment. Each step will provide instructions via a link to the document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.

 

1. Verify the existence of a serial number in both primary and secondary slots using show redundancy:

Refer to Chapter 6 of the Sun StorEdge Family FRU Installation Guide or the Sun StorEdge 3000 Family CLI 2.5 User's Guide for an explanation of possible states.

 

sccli> show redundancy
 Primary controller serial number: 8045010
 Primary controller location: Lower
 Redundancy mode: Active-Active
 Redundancy status: Failed
 Secondary controller serial number: 8002225
  • If the secondary controller's serial number shows 0, go to Step 2.
  • If both the primary and secondary controllers show a serial number, go to Step 3.
  • If both controllers' serial numbers show 0, contact Oracle for further support.
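
The Step 1 decision can be sketched in code. This is only an illustration, assuming the show redundancy output consists of "Key: value" lines as in the sample above; the helper names are hypothetical. The case where only the primary serial number is 0 is not covered by this document, so the sketch treats it as a support case, which is an assumption.

```python
SAMPLE = """\
sccli> show redundancy
 Primary controller serial number: 8045010
 Primary controller location: Lower
 Redundancy mode: Active-Active
 Redundancy status: Failed
 Secondary controller serial number: 8002225
"""


def parse_redundancy(text):
    """Collect 'Key: value' lines into a dict with lower-cased keys."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon (such as the prompt line) are skipped
            fields[key.strip().lower()] = value.strip()
    return fields


def next_step_from_serials(fields):
    """Map the two serial numbers to the next action in Step 1."""
    # Assumption: a missing field is treated the same as a serial number of 0.
    primary = fields.get("primary controller serial number", "0")
    secondary = fields.get("secondary controller serial number", "0")
    if primary != "0" and secondary != "0":
        return "step 3"
    if primary != "0":
        return "step 2"  # only the secondary serial number is 0
    # Both are 0, or only the primary is 0 (the latter is an assumption).
    return "contact Oracle"


if __name__ == "__main__":
    print(next_step_from_serials(parse_redundancy(SAMPLE)))
```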

 

2.  Verify physical existence of two controllers in the unit:

Physically verify that two controllers are installed. Each controller has an Ethernet port and a serial port.

[Figures: 3510 Rear View; 3511 SSH Rear View; 3320/3310 Rear View]

  • If there are two controllers in the unit, go to Step 3.
  • If there is only one controller in the unit, go to Step 4.

 

 3.  Verify LED status on the controllers:

  • Solid Green - indicates controller is operating as secondary
  • Blinking Green - indicates controller is operating as primary
  • Amber - indicates controller is faulted

Upper LED      | Lower LED      | Result                                                     | Action
---------------+----------------+------------------------------------------------------------+-------
Solid Green    | Solid Green    | Both controllers think they are secondary                  | Contact Oracle for further support.
Solid Green    | Blinking Green | Lower controller is primary, upper controller is secondary | Array controllers are optimal.
Solid Green    | Amber          | Lower controller is faulted                                | Upper controller has not taken over as primary. Contact Oracle for further support.
Blinking Green | Solid Green    | Upper controller is primary, lower controller is secondary | Array controllers are optimal.
Blinking Green | Blinking Green | Both controllers think they are primary                    | Contact Oracle for further support.
Blinking Green | Amber          | Upper controller is primary, lower controller is faulted   | Upper controller has taken over as primary. Secondary controller is faulted, go to Step 4.
Amber          | Solid Green    | Upper controller is faulted                                | Upper controller has faulted, lower controller has not taken over as primary. Contact Oracle for further support.
Amber          | Blinking Green | Lower controller is primary, upper controller is faulted   | Lower controller has taken over as primary. Secondary controller is faulted, go to Step 4.
Amber          | Amber          | Upper and lower controllers are faulted                    | Both controllers are faulted. Contact Oracle for further support.

  • If you cannot verify the LED status, go to Step 4.
  • If the LED status is optimal but the redundancy status is Failed, contact Oracle for further support.
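
The LED combinations in Step 3 amount to a nine-entry lookup keyed on the upper and lower LED states. The sketch below encodes that lookup; the constant and function names are hypothetical, and the action strings are shortened forms of the ones in this step.

```python
SOLID, BLINK, AMBER = "solid green", "blinking green", "amber"

CONTACT = "Contact Oracle for further support."
OPTIMAL = "Array controllers are optimal."
STEP4 = "Secondary controller is faulted, go to Step 4."
UNVERIFIED = "LED status could not be verified, go to Step 4."

# Key: (upper LED, lower LED) -> action from Step 3
LED_ACTIONS = {
    (SOLID, SOLID): CONTACT,   # both think they are secondary
    (SOLID, BLINK): OPTIMAL,   # lower is primary, upper is secondary
    (SOLID, AMBER): CONTACT,   # lower faulted, upper did not take over
    (BLINK, SOLID): OPTIMAL,   # upper is primary, lower is secondary
    (BLINK, BLINK): CONTACT,   # both think they are primary
    (BLINK, AMBER): STEP4,     # upper took over, lower faulted
    (AMBER, SOLID): CONTACT,   # upper faulted, lower did not take over
    (AMBER, BLINK): STEP4,     # lower took over, upper faulted
    (AMBER, AMBER): CONTACT,   # both faulted
}


def led_action(upper, lower):
    """Return the Step 3 action for an (upper, lower) LED observation."""
    key = (upper.strip().lower(), lower.strip().lower())
    # Any state not listed is treated as "cannot verify", which leads to Step 4.
    return LED_ACTIONS.get(key, UNVERIFIED)


if __name__ == "__main__":
    print(led_action("Blinking Green", "Amber"))
```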


4. Verify persistent events or use show events for controller events:

Review the event log for any of the following events logged within 24 hours of the event seen above, and take action as indicated:

sccli> show events
  • Controller Unrecoverable Error
  • Controller SDRAM ECC Multi-bits Error Detected 
  • Controller SDRAM ECC Single-bit Error Detected 
  • Controller SDRAM Parity Error Detected 
  • Controller PCI Bus Parity Error Detected 
  • If a Controller Unrecoverable Error is logged, refer to Troubleshooting Controller Unrecoverable Errors in Sun Storage[TM] SE3000 Arrays (Doc ID 1314607.1).
  • If any of the other events apply, or the array has only a single controller, contact Oracle for further support.
  • If none of these apply, continue to Step 5.
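
The Step 4 check can be sketched as a plain substring scan over saved show events output. This is an illustration under stated assumptions: the function name is hypothetical, the 24-hour window is not evaluated (the caller is assumed to pass only the relevant part of the log), and the Unrecoverable Error check is given precedence because it is listed first.

```python
UNRECOVERABLE = "Controller Unrecoverable Error"
OTHER_CONTROLLER_EVENTS = (
    "Controller SDRAM ECC Multi-bits Error Detected",
    "Controller SDRAM ECC Single-bit Error Detected",
    "Controller SDRAM Parity Error Detected",
    "Controller PCI Bus Parity Error Detected",
)


def classify_controller_events(log_text, single_controller=False):
    """Return the Step 4 outcome for a block of event-log text."""
    text = log_text.lower()
    if UNRECOVERABLE.lower() in text:
        return "see Doc ID 1314607.1"
    other_seen = any(event.lower() in text for event in OTHER_CONTROLLER_EVENTS)
    if other_seen or single_controller:
        return "contact Oracle"
    return "step 5"


if __name__ == "__main__":
    # Hypothetical log line; real show events output carries more fields.
    print(classify_controller_events("Controller SDRAM Parity Error Detected"))
```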

 

5. Verify show events for drive channel loop issues:

sccli> show events

Reference the error documentation:

http://download.oracle.com/docs/cd/E19673-01/817-3711-18/appe_msgs_translat.html#pgfId-999843

CHL:_ Drive SCSI Channel ALERT: Unexpected Select Timeout
CHL:_ RCC Channel ALERT: Gross Phase/Signal Error Detected
CHL:_ Drive SCSI Channel ALERT: Gross Phase/Signal Error Detected
CHL:_ Drive SCSI Channel ALERT: Unexpected Disconnect Encountered
CHL:_ RCC Channel ALERT: Timeout Waiting for I/O to Complete
CHL:_ Drive SCSI Channel ALERT: Timeout Waiting for I/O to Complete
CHL:_ RCC Channel ALERT: SCSI Parity/CRC Error Detected
CHL:_ SCSI Drive Channel ALERT: SCSI Parity/CRC Error Detected
CHL:_ RCC Channel ALERT: Unit Attention Received
CHL:_ SCSI Drive Channel ALERT: Unit Attention Received
CHL:_ RCC Channel ALERT: Data Overrun/Underrun Detected
CHL:_ Drive SCSI Channel ALERT: Data Overrun/Underrun Detected
CHL:_ RCC Channel ALERT: Negotiation Error Detected
CHL:_ Drive SCSI Channel ALERT: Negotiation Error Detected
CHL:_ RCC Channel ALERT: Invalid Status/Sense Data Received
CHL:_ Drive SCSI Channel ALERT: Invalid Status/Sense Data Received
CHL:_ SCSI Host Channel Alert: SCSI Bus Reset Issued
CHL:_ ALERT: Fibre Channel Loop Failure Detected
CHL:_ ALERT: Redundant loop for CHL:_ Failure Detected
CHL:_ ALERT: Redundant Path for CHL:_ ID:_ Expected but Not Found

  • If there are events similar to those listed above, contact Oracle for further support.
  • If there are no events similar to those listed above, go to Step 6.
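
Searching a long event log for the channel alerts in Step 5 is easier with a pattern match. The sketch below assumes that the "_" in "CHL:_" stands for a channel number in real log lines and that each event occupies one line; both are assumptions, as are the function name and the sample lines.

```python
import re

# 'CHL:' followed by a channel token, then the word ALERT (any case) later on the line
CHANNEL_ALERT = re.compile(r"CHL:\S*.*\bALERT:", re.IGNORECASE)

# Message fragments taken from the Step 5 list
ALERT_TEXTS = (
    "Unexpected Select Timeout",
    "Gross Phase/Signal Error Detected",
    "Unexpected Disconnect Encountered",
    "Timeout Waiting for I/O to Complete",
    "SCSI Parity/CRC Error Detected",
    "Unit Attention Received",
    "Data Overrun/Underrun Detected",
    "Negotiation Error Detected",
    "Invalid Status/Sense Data Received",
    "SCSI Bus Reset Issued",
    "Fibre Channel Loop Failure Detected",
    "Redundant loop for",
    "Redundant Path for",
)


def find_channel_alerts(log_text):
    """Return the log lines that look like one of the Step 5 channel alerts."""
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if CHANNEL_ALERT.search(line) and any(t.lower() in lowered for t in ALERT_TEXTS):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    sample = "CHL:2 Drive SCSI Channel ALERT: Unexpected Select Timeout\nController Initialization Completed"
    for hit in find_channel_alerts(sample):
        print(hit)
```

An empty result corresponds to "no events similar to the list" and leads to Step 6; any hit is a reason to contact Oracle.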

 6. Unfail the secondary controller:

Issue the following command:

sccli> unfail

Wait up to 15 minutes for device detection before the controller redundancy status is Enabled.

Issue the sccli> show redundancy command to confirm that the redundancy status is Enabled.

  • If the status goes to a Failed, Scanning, or Detected state, contact Oracle for further support.
  • If the controller redundancy status is now Enabled, repeat Step 4. Troubleshooting is complete if no additional errors have been generated.
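
The wait in Step 6 can be sketched as a polling loop with a 15-minute deadline. The sketch is deliberately independent of how the status is obtained: fetch_status is a hypothetical caller-supplied function that returns the current "Redundancy status" value as a string, and the 30-second interval is an assumption, not a documented value.

```python
import time


def wait_for_enabled(fetch_status, timeout_s=15 * 60, interval_s=30,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns 'Enabled' or the timeout expires.

    Returns the last status seen, so the caller can tell Enabled apart
    from Failed, Scanning, or Detected.
    """
    deadline = clock() + timeout_s
    status = fetch_status()
    while status != "Enabled" and clock() < deadline:
        sleep(interval_s)
        status = fetch_status()
    return status


if __name__ == "__main__":
    # Stand-in status source for demonstration; a real one would read the array.
    states = iter(["Scanning", "Enabled"])
    print(wait_for_enabled(lambda: next(states), interval_s=0))
```

If the returned value is anything other than Enabled once the deadline has passed, the outcome matches the first bullet above. The sleep and clock parameters exist only so the loop can be exercised without real waiting.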
 

References

<NOTE:1011431.1> - Troubleshooting Sun StorEdge[TM] 33x0/351x Hardware
<NOTE:1314607.1> - Troubleshooting Controller Unrecoverable Errors in Sun Storage[TM] SE3000 Arrays

Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.