Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Troubleshooting Sure
Solution 1314607.1: Troubleshooting Controller Unrecoverable Errors in Sun Storage[TM] SE3000 Arrays

Applies to:
Sun Storage 3310 Array - Version: Not Applicable
Sun Storage 3510 FC Array - Version: Not Applicable and later [Release: N/A and later]
Sun Storage 3511 SATA Array - Version: Not Applicable and later [Release: N/A and later]
Sun Storage 3320 SCSI Array - Version: Not Applicable and later [Release: N/A and later]
Information in this document applies to any platform.

Purpose
Controller Unrecoverable Error codes may be reported in the event log of SE3xxx arrays, starting with firmware version 4.21. For example:

Wed June 4 13:12:22 2008 [Secondary] Alert ALERT: Controller Unrecoverable Error 0001 00000000 00000000 45754677

The first field, 0001, indicates a multi-bit memory error. The last field, 45754677, is the date field in hexadecimal.

The Controller Unrecoverable Error event containing these error codes is written to the array event log when the affected controller restarts. Because there may be multiple reboots, the same error may be written to the event log multiple times.
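
The event format described above can be picked apart programmatically when reviewing saved event log output. The following is a minimal sketch, not a supported tool: the names UnrecoverableError and find_unrecoverable_errors are hypothetical, the pattern assumes the exact layout of the example line, and only the 0001 code is described because it is the only one this document explains. The date field is converted to an integer only; how that value maps to a calendar date is not stated here, so no conversion is attempted. Treating events with identical fields as one error is also an assumption.

```python
import re
from dataclasses import dataclass

# Meaning of the first field, limited to what this document states.
KNOWN_CODES = {"0001": "multi-bit memory error"}

# Assumed layout: "[Role] ... Controller Unrecoverable Error CCCC XXXXXXXX XXXXXXXX XXXXXXXX"
EVENT_RE = re.compile(
    r"\[(?P<role>Primary|Secondary)\].*?"
    r"Controller Unrecoverable Error\s+"
    r"(?P<code>[0-9A-Fa-f]{4})\s+"
    r"(?P<f2>[0-9A-Fa-f]{8})\s+"
    r"(?P<f3>[0-9A-Fa-f]{8})\s+"
    r"(?P<date>[0-9A-Fa-f]{8})"
)


@dataclass(frozen=True)
class UnrecoverableError:
    role: str      # functional role when the event was logged
    code: str      # first field, e.g. "0001"
    field2: str
    field3: str
    date_hex: str  # last field, the date in hexadecimal

    @property
    def date_value(self) -> int:
        # Raw integer value of the hexadecimal date field.
        return int(self.date_hex, 16)

    @property
    def description(self) -> str:
        return KNOWN_CODES.get(self.code, "not described in this document")


def find_unrecoverable_errors(log_text: str) -> list[UnrecoverableError]:
    """Return each distinct Controller Unrecoverable Error found in log_text."""
    seen = set()
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match is None:
            continue
        event = UnrecoverableError(
            match["role"], match["code"], match["f2"], match["f3"], match["date"]
        )
        # The same error may be logged once per reboot; keep one copy.
        if event not in seen:
            seen.add(event)
            events.append(event)
    return events
```

Passing the saved output of the event log to find_unrecoverable_errors returns one entry per distinct error, with the role and first field available for the steps that follow.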

The error codes give more information about when and why the controller firmware rebooted. Some causes are hardware problems; other causes are less clear and are more likely to be firmware problems.

Last Review Date
April 18, 2011

Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

1. Determine whether there are "Controller Unrecoverable Error" events in the event log or persistent event log on the array.
Issue the "sccli> show event" or "sccli> show persistent-event" CLI command, and review the output for "Controller Unrecoverable Error" messages. Example:

Wed June 4 13:12:22 2008 [Secondary] Alert ALERT: Controller Unrecoverable Error 0001 00000000 00000000 45754677

If such events are present, continue to the next step. If not, proceed to the data collection and support engagement step at the end of this document.

2. Verify whether the array is a single-controller or dual-controller array.
Issue the "sccli> show redundancy-mode" command. If the redundancy status is "Scanning", the array is a single-controller array; continue with the next step. If the status is "Failed" or "Enabled", the array is a dual-controller array; go to step 4.

3. Verify the first field for a single-controller array.
For a single-controller array, the controller restarts automatically, so this event is added to the event log very soon after the problem occurs.
If the first field contains 0000, 0003, or 0004, clear the core. This requires connecting to the array's firmware interface (via tip or telnet), entering the System Functions -> Controller Maintenance menu, and selecting Clear Core.
NOTE: This step prevents the error from being reported again each time the array is reset.
If the first field contains 0001 or 0002, replace the controller.

4. Verify the first field of the unrecoverable error in a dual-controller array.
For a dual-controller array, the affected controller is held in reset by the running controller (only a "Redundant Controller Failure" event appears at the time the problem occurs). The "Controller Unrecoverable Error" event is written to the event log only when the affected controller is restarted (e.g. when the array is power-cycled, when the remaining controller is reset for any reason, or when someone manually "unfails" the affected controller).
a. Determine the current redundancy status of the array by issuing the "sccli> show redundancy-mode" command.
b. If the controller is in a failed state, unfail it using the "sccli> unfail" command.
c. Clear the core. This requires connecting to the array's firmware interface (via tip or telnet), entering the System Functions -> Controller Maintenance menu, and selecting Clear Core.
NOTE: This step prevents the error from being reported again each time the array is reset.
d. Issue the "sccli> show redundancy-mode" command to review the current redundancy status.
e. If the redundancy status is still "Failed", proceed to the data collection and support engagement step at the end of this document. Otherwise, troubleshooting is complete.

6. Identify the affected controller.
Finding which controller reported the "Controller Unrecoverable Error" event needs extra care, depending on what has been done since the event was logged. The event message itself shows "[Primary]" or "[Secondary]", which is the affected controller's functional role at the time the event was logged. For example:

Wed June 4 13:12:22 2008 [Secondary] Alert ALERT: Controller Unrecoverable Error 0001 00000000 00000000 45754677

Here the affected controller's functional role was "Secondary" when the event was logged. However, the controller roles might have changed since then, e.g. if the customer or an engineer manually failed the other controller, if a "Controller Unrecoverable Error" event occurred on the other controller, or if the array was power-cycled.
Determine whether the controller roles have been reversed since the error was first reported, for any of the reasons above.
If not, replace the controller that logged the event, i.e. the one whose role appears in the message (for example, [Primary] for a "[Primary] ... Controller Unrecoverable Error" event). If yes, go to the data collection and support engagement step below and contact Oracle.

Data collection and support engagement step:
1. Collect a Sun Explorer or se3000 extractor.
2. Contact Oracle support for further assistance.

Attachments
This solution has no attachment.
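
The decision points in the troubleshooting steps above can be summarized in a few lines of code. This is an illustrative sketch only, under the assumption that the redundancy status and the first field have already been read from the sccli output; the function names and the returned strings are hypothetical and are not part of sccli. Codes and statuses that this document does not mention are reported as "not covered" rather than guessed.

```python
# First-field values and the single-controller action this document gives for them.
CLEAR_CORE_CODES = {"0000", "0003", "0004"}
REPLACE_CODES = {"0001", "0002"}


def array_type(redundancy_status: str) -> str:
    """Classify the array from the reported redundancy status."""
    status = redundancy_status.strip().lower()
    if status == "scanning":
        return "single"
    if status in ("failed", "enabled"):
        return "dual"
    return "not covered"  # status not mentioned in this document


def single_controller_action(first_field: str) -> str:
    """Action for a single-controller array, based on the first field."""
    if first_field in CLEAR_CORE_CODES:
        return "clear core"
    if first_field in REPLACE_CODES:
        return "replace controller"
    return "not covered"  # code not mentioned in this document


def dual_controller_followup(status_after_clear_core: str) -> str:
    """Outcome for a dual-controller array after unfail and Clear Core."""
    if status_after_clear_core.strip().lower() == "failed":
        return "collect data and contact support"
    return "done"
```

For example, array_type("Scanning") followed by single_controller_action("0001") yields "replace controller", matching the single-controller step.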