Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1364982.1
Update Date: 2012-06-20

Solution Type  Problem Resolution Sure

Solution  1364982.1 :   Failed DIMM  


Related Items
  • Oracle Exalogic Elastic Cloud Software
  • Oracle Exalogic Elastic Cloud X2-2 Qtr Rack
Related Categories
  • PLA-Support>Database Technology>Engineered Systems>Oracle Exalogic>MW: Exalogic Core

In this Document
Symptoms
Cause
Solution


Created from <SR 3-4306694701>

Applies to:

Oracle Exalogic Elastic Cloud Software - Version 1.0.0.0.0 and later
Oracle Exalogic Elastic Cloud X2-2 Qtr Rack - Version Not Applicable and later
Oracle Solaris on x86-64 (64-bit)

Symptoms

ILOM reports a faulted DIMM at /SYS/MB/P1/D5 (fault entry 0 under /SP/faultmgmt).

Below is the output of the ILOM command "show faulty":

Oracle(R) Integrated Lights Out Manager

Version 3.0.14.11.b r62978

Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.

-> show faulty
Target              | Property               | Value
--------------------+------------------------+---------------------------------
/SP/faultmgmt/0     | fru                    | /SYS/MB/P1/D5
/SP/faultmgmt/0/    | class                  | fault.memory.intel.dimm.test-fai
faults/0            |                        | led
/SP/faultmgmt/0/    | sunw-msg-id            | SPX86-8001-SA
faults/0            |                        |
/SP/faultmgmt/0/    | uuid                   | 2f21fdec-31cb-e015-a3e3-a8356fb0
faults/0            |                        | 64fc
/SP/faultmgmt/0/    | timestamp              | 2011-05-13/21:46:43
faults/0            |                        |
/SP/faultmgmt/0/    | fru_part_number        | A123B1C12AB1-AB1
faults/0            |                        |
/SP/faultmgmt/0/    | fru_serial_number      | 1234A123
faults/0            |                        |
/SP/faultmgmt/0/    | product_serial_number  | 1234ABC12A
faults/0            |                        |
/SP/faultmgmt/0/    | chassis_serial_number  | 1234ABC12A
faults/0            |                        |
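
The ILOM table above wraps long targets and values onto a second row that has an empty Property column. The following is a minimal Python sketch of how such captured output could be rejoined into complete records. The function name parse_show_faulty and the SAMPLE constant are hypothetical, and the sketch assumes wrapped fragments are concatenated without a separator, which matches the full values shown in the "show -d properties" output.

```python
# Hypothetical helper: rejoin wrapped rows of captured "show faulty" output.
# Assumption: a row with an empty Property column continues the previous row.

SAMPLE = """\
Target              | Property               | Value
--------------------+------------------------+---------------------------------
/SP/faultmgmt/0     | fru                    | /SYS/MB/P1/D5
/SP/faultmgmt/0/    | class                  | fault.memory.intel.dimm.test-fai
faults/0            |                        | led
/SP/faultmgmt/0/    | uuid                   | 2f21fdec-31cb-e015-a3e3-a8356fb0
faults/0            |                        | 64fc
/SP/faultmgmt/0/    | timestamp              | 2011-05-13/21:46:43
faults/0            |                        |
"""


def parse_show_faulty(text):
    """Return a list of (target, property, value) tuples with wraps rejoined."""
    rows = []
    for line in text.splitlines():
        if line.count("|") != 2:
            continue  # banner, prompt, separator, or blank line
        target, prop, value = (cell.strip() for cell in line.split("|"))
        if target == "Target":
            continue  # column header row
        if prop:
            rows.append([target, prop, value])
        elif rows:
            # Continuation row: append the fragments to the previous record.
            rows[-1][0] += target
            rows[-1][2] += value
    return [tuple(row) for row in rows]


if __name__ == "__main__":
    for target, prop, value in parse_show_faulty(SAMPLE):
        print(f"{target}  {prop} = {value}")
```

Joining the fragments this way makes the fault class and UUID searchable as single strings when the output is attached to a Service Request.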


Below is a snippet (containing the faulty DIMM) of the output of the ILOM command "show -d properties -level all /":

Oracle(R) Integrated Lights Out Manager

Version 3.0.14.11.b r62978

Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.

-> show -d properties -level all / 

  ......
  ......
  ...... 

  /SYS/MB/P1/D5
    Properties:
        type = DIMM
        ipmi_name = MB/P1/D5
        fru_name = 8GB DDR3 SDRAM 533
        fru_manufacturer = SAMSUNG
        fru_version = 00
        fru_part_number = A123B1C12AB1-AB1
        fru_serial_number = 1234A123
        fault_state = Faulted
        clear_fault_action = (none)
 
  ......
  ......
  ......

  /SP/faultmgmt/0/faults/0 
    Properties:
        class = fault.memory.intel.dimm.test-failed
        sunw-msg-id = SPX86-8001-SA
        uuid = 2f21fdec-31cb-e015-a3e3-a8356fb064fc 
        timestamp = 2011-05-13/21:46:43
        fru_part_number = A123B1C12AB1-AB1 
        fru_serial_number = 1234A123
        product_serial_number = 1234ABC12A
        chassis_serial_number = 1234ABC12A

  ......
  ......
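
The full "show -d properties -level all /" output is long, so the faulted component is easy to miss. The following is a minimal Python sketch that scans a saved copy of that output for targets whose fault_state is Faulted. The function names parse_properties and faulted_targets and the SAMPLE text are hypothetical, and the sketch assumes each target path starts a line with "/" and each property appears as "name = value", as in the snippet above.

```python
# Hypothetical helpers: find faulted targets in saved "show -d properties" output.
# Assumption: target paths begin with "/" and properties look like "name = value".

SAMPLE = """\
  /SYS/MB/P1/D5
    Properties:
        type = DIMM
        fru_name = 8GB DDR3 SDRAM 533
        fault_state = Faulted
        clear_fault_action = (none)

  ......

  /SP/faultmgmt/0/faults/0
    Properties:
        class = fault.memory.intel.dimm.test-failed
        sunw-msg-id = SPX86-8001-SA
"""


def parse_properties(text):
    """Return {target_path: {property: value}} for every target in the text."""
    targets = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("/"):
            current = targets.setdefault(stripped, {})
        elif " = " in stripped and current is not None:
            key, _, value = stripped.partition(" = ")
            current[key.strip()] = value.strip()
    return targets


def faulted_targets(targets):
    """Return the paths whose fault_state property is 'Faulted'."""
    return [
        path
        for path, props in targets.items()
        if props.get("fault_state") == "Faulted"
    ]


if __name__ == "__main__":
    print(faulted_targets(parse_properties(SAMPLE)))
```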

Cause

The ILOM outputs above show that one DIMM is faulted.
In this case, the DIMM at P1/D5 is persistently failing the MRC (Memory Reference Code) test, and its neighbor DIMM at P1/D4 is disabled as a result.
Any DIMM in the system can fail in the same way.
In this scenario, the fault is caused by damaged hardware.

Another possible cause:
This issue can also result from dual diagnosis of correctable memory errors on a system running the Solaris OS. Both ILOM and Solaris FMA detect the errors. Solaris FMA can retire individual memory pages, but ILOM cannot, so ILOM can only fault the entire DIMM once its failure criteria are met. Check the Solaris FMA diagnosis where it is available. If it is not available, rely on the ILOM diagnosis.

Solution

Where Solaris FMA is available, review its diagnosis rather than the ILOM diagnosis before deciding on DIMM replacement. On systems not running Solaris, use the ILOM diagnosis. In either case, the fault must be cleared as follows:
Log in to the ILOM command line interface as 'root' and set clear_fault_action=true on the faulted component to clear the fault. The example below uses /SYS/MB/P0 as the target; substitute the path of the component that "show faulty" reports (/SYS/MB/P1/D5 in this case).

Example:
-> set /SYS/MB/P0 clear_fault_action=true
Are you sure you want to clear /SYS/MB/P0 (y/n)? y
Set 'clear_fault_action' to 'true'
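
When the clear command is prepared for several components, the command line can be generated from the component path. The following is a minimal Python sketch under that assumption. The function name clear_fault_command is hypothetical; it only builds the command text shown in the example above and does not connect to ILOM.

```python
# Hypothetical helper: build the ILOM CLI command text that clears a fault.
# It only formats a string; running the command on the ILOM CLI is a manual step.


def clear_fault_command(target: str) -> str:
    """Return the 'set ... clear_fault_action=true' line for a /SYS target."""
    if not target.startswith("/SYS/") or " " in target:
        raise ValueError(f"unexpected component path: {target!r}")
    return f"set {target} clear_fault_action=true"


if __name__ == "__main__":
    print(clear_fault_command("/SYS/MB/P1/D5"))
```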

If the fault recurs after it has been cleared, the faulty DIMM must be replaced through a field task.
Open a Service Request (SR) to have the faulty part replaced.
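
The decision flow described in this section can be summarized in a short Python sketch. The function names and returned strings are hypothetical, and the "keep monitoring" outcome is an assumption that this note does not state explicitly.

```python
# Hypothetical summary of the decision flow in this note; not an Oracle tool.


def diagnosis_source(solaris_fma_available: bool) -> str:
    """Prefer the Solaris FMA diagnosis when it is available, else use ILOM."""
    return "Solaris FMA" if solaris_fma_available else "ILOM"


def next_step(fault_cleared: bool, fault_recurred: bool) -> str:
    """Return the next action for a DIMM fault."""
    if not fault_cleared:
        return "clear the fault from the ILOM CLI"
    if fault_recurred:
        return "open a Service Request to replace the DIMM"
    # Assumption: with no recurrence, no replacement is needed.
    return "no replacement needed; keep monitoring"


if __name__ == "__main__":
    print(diagnosis_source(solaris_fma_available=True))
    print(next_step(fault_cleared=True, fault_recurred=True))
```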


Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.