Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1314607.1
Update Date:2011-05-26
Keywords:

Solution Type  Troubleshooting Sure

Solution  1314607.1 :   Troubleshooting Controller Unrecoverable Errors in Sun Storage[TM] SE3000 Arrays  


Related Items
  • Sun Storage 3511 SATA Array
  • Sun Storage 3310 Array
  • Sun Storage 3510 FC Array
  • Sun Storage 3320 SCSI Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays




In this Document
  Purpose
  Last Review Date
  Instructions for the Reader
  Troubleshooting Details


Applies to:

Sun Storage 3310 Array - Version: Not Applicable and later   [Release: N/A and later ]
Sun Storage 3510 FC Array - Version: Not Applicable and later    [Release: N/A and later]
Sun Storage 3511 SATA Array - Version: Not Applicable and later    [Release: N/A and later]
Sun Storage 3320 SCSI Array - Version: Not Applicable and later    [Release: N/A and later]
Information in this document applies to any platform.

Purpose

Controller Unrecoverable Error codes may be reported in the event log of SE3xxx arrays, starting with firmware version 4.21.

Consider the following example:
Wed June 4 13:12:22 2008
[Secondary] Alert
ALERT: Controller Unrecoverable Error 0001 00000000 00000000 45754677

The first field, 0001, indicates a multi-bit memory error.
The last field, 45754677, is the date field expressed in hexadecimal.
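
As an illustration of the field layout described above, the following minimal Python sketch splits the example message into its four fields. The function name is hypothetical and not part of any Sun tool. The sketch only converts the last field from hexadecimal to an integer; it does not decode it into a calendar date, because this document does not state how the date is encoded.

```python
def parse_unrecoverable_error(message):
    """Return the four hexadecimal fields of the event message, or None."""
    marker = "Controller Unrecoverable Error"
    _, found, tail = message.partition(marker)
    if not found:
        return None  # not a Controller Unrecoverable Error message
    fields = tail.split()
    if len(fields) != 4:
        return None  # unexpected layout; do not guess
    return fields


example = "ALERT: Controller Unrecoverable Error 0001 00000000 00000000 45754677"
fields = parse_unrecoverable_error(example)
first_field = fields[0]          # error code, e.g. "0001"
last_value = int(fields[3], 16)  # date field as an integer (encoding not documented here)
print(first_field, fields[3], last_value)
```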

The Controller Unrecoverable Error event containing these error codes is
written to the array event log when the affected controller restarts. Because there may be
multiple reboots, the same error may be written to the event log multiple times.


First Field Error Code    Error
0000                      PCI Parity Error
0001                      Multi-bit memory Error
0002                      Hardware Error
0003                      DSI exception
0004                      Illegal Request
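
The code table above can be expressed as a simple lookup. This is a sketch only: the dictionary and function names are hypothetical, and the fallback text for unlisted codes is an assumption, since this document defines only codes 0000 through 0004.

```python
# First-field error codes exactly as listed in this document.
FIRST_FIELD_ERRORS = {
    "0000": "PCI Parity Error",
    "0001": "Multi-bit memory Error",
    "0002": "Hardware Error",
    "0003": "DSI exception",
    "0004": "Illegal Request",
}


def describe_first_field(code):
    """Map a first-field code to its description from the table."""
    # Assumption: any other code is reported as unlisted rather than guessed.
    return FIRST_FIELD_ERRORS.get(code, "Not listed in this document")


print(describe_first_field("0001"))
```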

The error codes give more information about when and why the controller firmware rebooted. Some causes are hardware problems; other causes are less clear and are more likely to be firmware problems.

Last Review Date

April 18, 2011

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

1. Determine whether there are "Controller Unrecoverable Error" events in the event log or persistent event log on the array.

Issue a "sccli>show event" or "sccli>show persistent-event" CLI command,
and review the output for "Controller Unrecoverable Error" messages.

Example:

Wed June 4 13:12:22 2008
[Secondary] Alert
ALERT: Controller Unrecoverable Error 0001 00000000 00000000 45754677

If yes, continue to the next step.
If not, proceed to the data collection and support engagement step at the end of this document. 
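
If the command output has been saved to a text file, the review can be partly automated. The sketch below assumes the output uses the same three-line layout as the example above; the function name and the sample text are illustrative only.

```python
def find_unrecoverable_errors(log_text):
    """Return every line of saved event-log text that reports the error."""
    marker = "Controller Unrecoverable Error"
    return [line.strip() for line in log_text.splitlines() if marker in line]


# Sample text copied from the example in this document.
sample = """Wed June 4 13:12:22 2008
[Secondary] Alert
ALERT: Controller Unrecoverable Error 0001 00000000 00000000 45754677
"""

matches = find_unrecoverable_errors(sample)
if matches:
    print("Found", len(matches), "event(s); continue to the next step.")
else:
    print("None found; proceed to data collection and support engagement.")
```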

2. Determine whether the array is a single-controller or dual-controller array. Issue the "sccli>show redundancy" command.

If the Redundancy status is "Scanning", the array is a single-controller array; continue with the next step.

If the status is "Failed" or "Enabled", the array is a dual-controller array; go to step 4.
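
The status-to-configuration rule in this step can be sketched as a small function. The function name is hypothetical, and returning "unknown" for any other status value is an assumption; this document describes only the three statuses shown.

```python
def classify_array(redundancy_status):
    """Classify the array from the reported Redundancy status string."""
    status = redundancy_status.strip().lower()
    if status == "scanning":
        return "single"   # single-controller array
    if status in ("failed", "enabled"):
        return "dual"     # dual-controller array
    return "unknown"      # assumption: other values are not covered here


print(classify_array("Scanning"))
print(classify_array("Enabled"))
```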

3.  Verify the first field for a single-controller array

For a single-controller array, the controller restarts automatically, so this event is added to the event log very soon after the problem occurs.

  If the first field contains 0000, 0003, or 0004:

Clear the core. This requires connecting to the array's firmware interface (via tip or telnet),
entering the System Functions -> Controller Maintenance menu, and selecting Clear Core.


NOTE: This step prevents the error from being reported again each time the array is reset.


If the first field contains 0001 or 0002, replace the controller.

4.  Verify the first field of the unrecoverable error in a dual-controller array

 For a dual-controller array, the affected controller will be held in
reset by the running controller (you will see only a "Redundant Controller
Failure" event at the time when the problem occurred). The
"Controller Unrecoverable Error" event will be written to the event log only
when the affected controller is restarted (e.g. when the array is power-cycled,
or when the remaining controller is reset for any reason, or when someone
manually "unfails" the affected controller etc.).

  • If the first field contains 0000, 0003, or 0004, go to Step 5.
  • If the first field contains 0001 or 0002, go to Step 6.
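
The first-field routing for single-controller and dual-controller arrays can be summarized in one sketch. The names are hypothetical, and the fallback for codes outside 0000 through 0004 is an assumption, because this document gives no instruction for them.

```python
CLEAR_CORE_CODES = {"0000", "0003", "0004"}
REPLACE_CODES = {"0001", "0002"}


def next_action(first_field, dual_controller):
    """Return the next action for a given first-field code and array type."""
    if first_field in CLEAR_CORE_CODES:
        return "go to step 5" if dual_controller else "clear the core"
    if first_field in REPLACE_CODES:
        return "go to step 6" if dual_controller else "replace the controller"
    # Assumption: unlisted codes are escalated rather than acted on.
    return "collect data and contact support"


print(next_action("0001", dual_controller=True))
print(next_action("0003", dual_controller=False))
```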
5.  Determine the Redundancy Status of the array

a. Determine the current Redundancy Status of the array by issuing a "sccli> show redundancy-mode" command.
b. If the controller is in a failed state, unfail the controller using the "sccli> unfail" command.
c. Clear the core. This requires connecting to the array's firmware interface (via tip or telnet),
entering the System Functions -> Controller Maintenance menu, and selecting Clear Core.


NOTE: This step prevents the error from being reported again each time the array is reset.


d. Issue a "sccli>show redundancy-mode" command to review the current Redundancy Status.
e. If the current Redundancy Status is "Failed", proceed to the data collection and support engagement step at the end of this document. Otherwise, you have finished troubleshooting the problem.

6.  Identify the affected controller

Identifying which controller reported the "Controller Unrecoverable Error" event requires extra
care, depending on what has been done since that event was logged.

The event message itself will show "[Primary]" or "[Secondary]", which is the
affected controller's functional role at the time the event was logged.

For example:
Wed June 4 13:12:22 2008
[Secondary] Alert
ALERT: Controller Unrecoverable Error 0001 00000000 00000000 45754677

The affected controller's functional role was "Secondary" at the time the event was logged.
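
Reading the role tag from the event can be sketched as follows. The function name is hypothetical, and the pattern assumes the tag appears at the start of the line, as in the example above.

```python
import re

# Matches the role tag at the start of the line, e.g. "[Secondary] Alert".
ROLE_PATTERN = re.compile(r"^\[(Primary|Secondary)\]")


def role_at_log_time(role_line):
    """Return "Primary" or "Secondary", or None if no role tag is present."""
    match = ROLE_PATTERN.match(role_line.strip())
    return match.group(1) if match else None


print(role_at_log_time("[Secondary] Alert"))
```

The result is the role when the event was logged, not necessarily the controller's current role.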

However, after that event message was logged, the controller functional roles might
have changed, e.g., if the customer or an engineer has manually failed the other
controller, if a "Controller Unrecoverable Error" event occurred on the other
controller, or if the array was power-cycled.

Determine whether the controller roles have been reversed since the error was first reported
due to one of the following:
  • controller unfail commands issued
  • controllers physically swapped
  • array reboots, as indicated by "Initialization Complete" event log messages

If not, replace the controller that logged the event, i.e., the controller whose role ([Primary] or [Secondary]) appears in the "Controller Unrecoverable Error" event.

If yes, go to the data collection and support engagement step below and contact Oracle.
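
Of the three conditions listed above, only array reboots are described as visible in the event log. The sketch below looks for "Initialization Complete" messages logged after the first "Controller Unrecoverable Error" event. The function name is hypothetical, the input is assumed to be a chronologically ordered list of message strings, and the sample messages are illustrative rather than real array output. Unfail commands and physical swaps must still be confirmed by other means.

```python
def reboots_after_error(messages):
    """Return "Initialization Complete" messages logged after the first error."""
    error_index = None
    for index, message in enumerate(messages):
        if "Controller Unrecoverable Error" in message:
            error_index = index
            break
    if error_index is None:
        return []  # no error event in this log
    later = messages[error_index + 1:]
    return [m for m in later if "Initialization Complete" in m]


# Illustrative messages only; real event text may differ.
sample_messages = [
    "ALERT: Controller Unrecoverable Error 0001 00000000 00000000 45754677",
    "Initialization Complete",
]
if reboots_after_error(sample_messages):
    print("Reboot seen after the error; roles may have changed.")
```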



Data collection and support engagement step:

1. Collect Sun Explorer or se3000 extractor output.
2. Contact Oracle support for further assistance.




Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.