Asset ID: 1-72-1364982.1
Update Date: 2012-06-20
Solution Type: Problem Resolution Sure
Solution 1364982.1: Failed DIMM
Related Items
- Oracle Exalogic Elastic Cloud Software
- Oracle Exalogic Elastic Cloud X2-2 Qtr Rack
Related Categories
- PLA-Support>Database Technology>Engineered Systems>Oracle Exalogic>MW: Exalogic Core
Created from <SR 3-4306694701>
Applies to:
Oracle Exalogic Elastic Cloud Software - Version 1.0.0.0.0 and later
Oracle Exalogic Elastic Cloud X2-2 Qtr Rack - Version Not Applicable and later
Oracle Solaris on x86-64 (64-bit)
Symptoms
ILOM reports a faulted DIMM at /SYS/MB/P1/D5 (fault 0).
Below is the output of the ILOM command "show faulty":
Oracle(R) Integrated Lights Out Manager
Version 3.0.14.11.b r62978
Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.
-> show faulty
Target | Property | Value
--------------------+------------------------+---------------------------------
/SP/faultmgmt/0 | fru | /SYS/MB/P1/D5
/SP/faultmgmt/0/ | class | fault.memory.intel.dimm.test-fai
faults/0 | | led
/SP/faultmgmt/0/ | sunw-msg-id | SPX86-8001-SA
faults/0 | |
/SP/faultmgmt/0/ | uuid | 2f21fdec-31cb-e015-a3e3-a8356fb0
faults/0 | | 64fc
/SP/faultmgmt/0/ | timestamp | 2011-05-13/21:46:43
faults/0 | |
/SP/faultmgmt/0/ | fru_part_number | A123B1C12AB1-AB1
faults/0 | |
/SP/faultmgmt/0/ | fru_serial_number | 1234A123
faults/0 | |
/SP/faultmgmt/0/ | product_serial_number | 1234ABC12A
faults/0 | |
/SP/faultmgmt/0/ | chassis_serial_number | 1234ABC12A
faults/0 | |
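Because the ILOM console wraps long targets and values onto a second row (for example, "test-fai" / "led"), the table above is awkward to read or to search. The following is a minimal sketch, not an Oracle-supplied tool, of how the wrapped rows could be rejoined. It assumes that a row with an empty Property column is always a continuation of the row before it and that fragments join without a space; the function name parse_show_faulty is hypothetical.

```python
def parse_show_faulty(text):
    """Rejoin wrapped rows of 'show faulty' output.

    Returns {target: {property: value}}. Assumption: a row whose
    Property column is empty continues the previous row.
    """
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip the banner, the prompt, the separator, and the header row.
        if len(cells) != 3 or cells[0] == "Target":
            continue
        target, prop, value = cells
        if prop:
            rows.append([target, prop, value])
        elif rows:
            # Continuation row: append the wrapped fragments.
            rows[-1][0] += target
            rows[-1][2] += value
    faults = {}
    for target, prop, value in rows:
        faults.setdefault(target, {})[prop] = value
    return faults


SAMPLE = """\
Target | Property | Value
--------------------+------------------------+---------------------------------
/SP/faultmgmt/0 | fru | /SYS/MB/P1/D5
/SP/faultmgmt/0/ | class | fault.memory.intel.dimm.test-fai
faults/0 | | led
/SP/faultmgmt/0/ | uuid | 2f21fdec-31cb-e015-a3e3-a8356fb0
faults/0 | | 64fc
"""

faults = parse_show_faulty(SAMPLE)
for target, props in faults.items():
    print(target, props)
```

With the rows rejoined, the fault class and UUID match the unwrapped values shown by "show -d properties" for /SP/faultmgmt/0/faults/0.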
Below is a snippet (containing the faulty DIMM) of the output of the ILOM command "show -d properties -level all /":
Oracle(R) Integrated Lights Out Manager
Version 3.0.14.11.b r62978
Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.
-> show -d properties -level all /
......
......
......
/SYS/MB/P1/D5
Properties:
type = DIMM
ipmi_name = MB/P1/D5
fru_name = 8GB DDR3 SDRAM 533
fru_manufacturer = SAMSUNG
fru_version = 00
fru_part_number = A123B1C12AB1-AB1
fru_serial_number = 1234A123
fault_state = Faulted
clear_fault_action = (none)
......
......
......
/SP/faultmgmt/0/faults/0
Properties:
class = fault.memory.intel.dimm.test-failed
sunw-msg-id = SPX86-8001-SA
uuid = 2f21fdec-31cb-e015-a3e3-a8356fb064fc
timestamp = 2011-05-13/21:46:43
fru_part_number = A123B1C12AB1-AB1
fru_serial_number = 1234A123
product_serial_number = 1234ABC12A
chassis_serial_number = 1234ABC12A
......
......
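The full "show -d properties -level all /" listing is long, so it can help to filter a saved copy for targets whose fault_state is Faulted. The sketch below is illustrative only: it assumes each target appears on its own line starting with "/", followed by "name = value" property lines, as in the snippet above. The function name faulted_targets is hypothetical.

```python
def faulted_targets(text):
    """Return the targets whose 'fault_state' property is 'Faulted'."""
    current = None
    faulted = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("/"):
            # A new target heading such as /SYS/MB/P1/D5.
            current = line
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            if key.strip() == "fault_state" and value.strip() == "Faulted":
                faulted.append(current)
    return faulted


PROPERTIES_SAMPLE = """\
-> show -d properties -level all /
/SYS/MB/P1/D5
Properties:
type = DIMM
fault_state = Faulted
clear_fault_action = (none)
/SP/faultmgmt/0/faults/0
Properties:
class = fault.memory.intel.dimm.test-failed
sunw-msg-id = SPX86-8001-SA
"""

faulted = faulted_targets(PROPERTIES_SAMPLE)
print(faulted)
```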
Cause
The ILOM outputs above show that one of the DIMMs is faulted.
In this case, the DIMM at P1/D5 persistently fails the MRC test, and its neighbor DIMM at P1/D4 is disabled as a result.
However, any of the DIMMs can fault or fail in the same way.
In this scenario, the fault is caused by damaged hardware.
Another possible cause:
This issue can also be caused by dual diagnosis of correctable memory errors on a system running Solaris, where both ILOM and Solaris FMA detect the errors. Solaris FMA can retire individual memory pages, but ILOM cannot, so ILOM can only fault the entire DIMM once its failure criteria are met. Check the Solaris FMA diagnosis where it is available. If it is not available, rely on the ILOM diagnosis.
Solution
On systems running Solaris, base the decision to replace a DIMM on the Solaris FMA diagnosis (for example, as reported by "fmadm faulty" on the host) rather than on the ILOM diagnosis. On systems not running Solaris, go with the ILOM diagnosis. In either case, the errors must be cleared as follows:
Log in to the ILOM command-line interface as 'root' and use the following command to clear the fault.
Example (substitute the faulted target reported by "show faulty"):
-> set /SYS/MB/P0 clear_fault_action=true
Are you sure you want to clear /SYS/MB/P0 (y/n)? y
Set 'clear_fault_action' to 'true'
If the issue recurs after the errors have been cleared, open a Service Request (SR) so that the faulty DIMM can be replaced through a field task.