Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1010810.1
Update Date:2010-07-06

Solution Type  Technical Instruction Sure

Solution  1010810.1 :   How to resolve "DRAM parity error detected" on Sun Storage 33x0/351x Arrays  


Related Items
  • Sun Storage 3510 FC Array
  • Sun Storage 3320 SCSI Array
  • Sun Storage 3310 Array
  • Sun Storage 3511 SATA Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays

PreviouslyPublishedAs
214938


Applies to:

Sun Storage 3310 SCSI Array
Sun Storage 3510 FC Array
Sun Storage 3511 SATA Array
Sun Storage 3320 SCSI Array - Version: Not Applicable and later    [Release: NA and later]
All Platforms

Goal

Description

Memory parity errors on the Sun Storage 3310/3320/3510/3511 Arrays result in the
following message being logged in the event log, which can be viewed using the
software CLI "sccli> show event" command:
[0104] #4287: StorEdge Array SN#xxxxxxx Controller ALERT: DRAM parity error detected

However, in a dual-controller array the message does not identify which controller
logged it, so the wrong controller might be replaced. Take care when determining the
affected controller: the messages in the event log reflect each controller's functional
role at the time the event was logged, and the roles might have changed since then.
For example, the customer or an engineer might have manually failed the other controller,
a "Controller Unrecoverable Error" event might have occurred on the other controller,
or the array might have been power-cycled.
This document describes steps that help determine the affected controller
in certain instances (when the controller roles have not changed).

Solution


Steps to Follow:


DRAM parity errors are usually not a cause for concern if they are single-bit errors. Replace the controller only if you observe
many occurrences of "DRAM parity error detected" in the event log.
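
As a rough aid for judging how often the error has occurred, the following Python sketch counts the occurrences of the message in event log text that has been saved to a file or captured from the CLI. It assumes the message text matches the example in this document; the function name is hypothetical and is not part of sccli. The article gives no numeric threshold for "many occurrences", so the sketch only returns the count.

```python
import re

# Message text as it appears in the event log example in this document.
PARITY_MESSAGE = "DRAM parity error detected"


def count_parity_events(event_log_text: str) -> int:
    """Count "DRAM parity error detected" messages in saved event log text."""
    # Collapse all runs of whitespace (including line breaks) into single
    # spaces so that a message wrapped across two lines is still counted.
    normalized = re.sub(r"\s+", " ", event_log_text)
    return normalized.count(PARITY_MESSAGE)
```

Whether a given count justifies replacing a controller remains a judgment call for the engineer.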

Unfortunately, when the event log is viewed with sccli, the error shows up as:


[0104] #4287: StorEdge Array SN#xxxxxxx Controller ALERT: DRAM parity error detected

In a dual-controller configuration, this message does not indicate which controller generated the error.

As a result, the onsite engineer may erroneously replace the primary controller, and the errors may continue even after the replacement.

In that case, the secondary controller was the one that generated the error at the time the event was logged.

To determine which controller is generating this error, view the event log
through the 3310/3320/3510/3511 telnet or serial interface.

Use tip or telnet to connect to the Sun Storage 3310, 3320, 3510, or 3511 Array.

From the Main Menu:

1) Choose "view and edit Event logs"

and you should see something similar to the following for the same DRAM parity error:

[0104] Controller SDRAM ECC Single-bit Error Detected
---Wed Feb 12 14:47:46 2003------------------------------------S---

The S in the bottom line indicates that the Secondary controller generated this error, so the secondary controller,
and NOT the primary controller, is the one to replace. If there is a "P" instead of an "S",
the primary controller reported the error and should be replaced.

This is an accurate indicator of which controller requires replacement only
if the roles of the controllers have not changed. If the controller roles have changed, additional fault isolation steps
are required to identify the controller that is generating the errors.
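
The role flag can also be read programmatically from captured event log text. The following Python sketch parses the trailer line shown above, assuming it always has the layout of that example (three dashes, the timestamp, a run of dashes, "P" or "S", three dashes). The function name and the regular expression are illustrative only and are not part of the array firmware or management software.

```python
import re
from typing import Optional, Tuple

# Assumed layout, taken from the single example in this document:
#   ---<timestamp>------...------<P or S>---
TRAILER = re.compile(r"^-{3}(?P<timestamp>[^-]+)-+(?P<flag>[PS])-{3}$")

ROLE_NAMES = {"P": "primary", "S": "secondary"}


def reporting_role(trailer_line: str) -> Optional[Tuple[str, str]]:
    """Return (role, timestamp) for an event trailer line, or None if the
    line does not match the assumed layout."""
    match = TRAILER.match(trailer_line.strip())
    if match is None:
        return None
    return ROLE_NAMES[match.group("flag")], match.group("timestamp").strip()


line = "---Wed Feb 12 14:47:46 2003------------------------------------S---"
print(reporting_role(line))
```

The result names the role the reporting controller held when the event was logged, so the caveat about changed controller roles applies to it as well.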


Additional Information:

The same errors are also reported as follows by Diagnostic Reporter and the sscs software,
if they are configured. These components are part of the array management software.

### From Diagnostic Reporter email:
************************************************************
hostname=x.x.x.x dummyhost
timestamp=02/12/2003 14:48:17
device=HBA 3[Ch0Id2] SUN StorEdge 3510
priority=Critical
error_code=010b1d0d
message=Controller Event, SDRAM Error. Likely controller error. If error
persists, replace defective controller. (Wed Feb 12 14:47:46 2003)
************************************************************
### From /var/adm/messages:
Feb 12 14:48:40 dummyhost SUNWscsdMonitor[374]: [ID 677437 daemon.error]
[SUNWscsd 0x10B1D0D: Critical] Controller Event, SDRAM Error.
Likely controller error. If error persists, replace defective controller.
(Wed Feb 12 14:47:44 2003)
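
Note that the error code appears as 010b1d0d in the Diagnostic Reporter email and as 0x10B1D0D in /var/adm/messages; both are the same hexadecimal value. The following Python sketch extracts the SUNWscsd error codes and priorities from messages text so they can be compared with the email. It assumes entries follow the layout of the example above; the function name is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed entry layout, based on the example above:
#   [SUNWscsd 0x10B1D0D: Critical] ...
SCSD_ENTRY = re.compile(r"\[SUNWscsd (?P<code>0x[0-9A-Fa-f]+): (?P<priority>\w+)\]")


def scsd_error_codes(messages_text: str) -> List[Tuple[int, str]]:
    """Return (error_code, priority) pairs for SUNWscsd entries in the text."""
    # Collapse whitespace so an entry wrapped across lines still matches.
    normalized = re.sub(r"\s+", " ", messages_text)
    return [
        (int(m.group("code"), 16), m.group("priority"))
        for m in SCSD_ENTRY.finditer(normalized)
    ]


def same_error_code(email_code: str, messages_code: int) -> bool:
    """Compare an email error_code value (hex digits without a 0x prefix)
    with a code parsed from the messages file."""
    return int(email_code, 16) == messages_code
```

For example, same_error_code("010b1d0d", code) can be called with each code returned by scsd_error_codes to match a Diagnostic Reporter email to its messages entry.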




Change History

@ User Name: [email protected]
Date: 2010-07-06
Action: Currency & Update


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.