Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

1005169.1: How to decode System Event Log (SEL) entries for Memory Uncorrectable Errors in Sun Blade[TM] X8400, X8420, X8440 Server Modules
Previously Published As: 207257

Description
Purpose/Scope: When a Memory Uncorrectable ECC Error occurs on Sun Blade[TM] X8400 server modules, a HyperTransport sync flood occurs and the system reboots. BIOS fault diagnosis then faults the DIMM pair and writes an entry to the System Event Log (SEL).
Note that the faulted DIMMs can be identified in other places. They should all point to the same DIMM pair, which is the customer/field replaceable unit (CRU/FRU).

Steps to Follow

A SEL entry for the Uncorrectable ECC Error may appear like this when viewed with the ipmitool command, e.g.:

/usr/sfw/bin/ipmitool -I lanplus -H <IP address or hostname of Blade SP> -U root sel elist

>>> 1807 | 06/24/2007 | 07:40:26 | Memory | Uncorrectable ECC | Asserted | CPU 1 DIMM 0

The label "DIMM 0" refers to a DIMM pair, not a single DIMM. This may differ from how the entry appears when the SEL is viewed in BIOS, or from the blade SP or Chassis Monitoring Module (CMM) unified log, where the DIMMs are identified as d#/d# for pairs, corresponding to the physical labeling of the slots on the board.

On the blade board, pressing the Fault Remind button lights the LEDs in the DIMM sockets for the correct pair. The CRU/FRU is the DIMM pair, replaced together to ensure that the DIMMs match in size, type and vendor.

To translate the pair numbering to the slot numbering, use these tables for the blade model you have:

X8400
X8420
X8440
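
As a minimal sketch, a SEL line in the format of the example entry in the steps above could be split into its fields with a short Python script. The field order, the "Memory" / "Uncorrectable ECC" / "Asserted" strings and the "CPU n DIMM n" description are assumed from that single example line; the function and type names (parse_sel_line, UncorrectableEccEvent) are hypothetical. The script only extracts the CPU and DIMM pair numbers. It does not translate the pair to slot labels, which still requires the table for the blade model.

```python
import re
from typing import NamedTuple, Optional


class UncorrectableEccEvent(NamedTuple):
    """Fields of interest from one SEL line (hypothetical helper type)."""
    record_id: str
    date: str
    time: str
    cpu: int
    dimm_pair: int  # a DIMM *pair* number, not a single DIMM


# Assumed description format, taken from the example: "CPU 1 DIMM 0".
_LOCATION = re.compile(r"CPU\s+(\d+)\s+DIMM\s+(\d+)")


def parse_sel_line(line: str) -> Optional[UncorrectableEccEvent]:
    """Return the decoded event, or None if the line is not a memory UE entry."""
    # Drop any leading ">" markers and spaces, then split on the pipe separator.
    fields = [f.strip() for f in line.lstrip("> ").split("|")]
    if len(fields) < 7:
        return None
    record_id, date, time, sensor, event, state, location = fields[:7]
    if sensor != "Memory" or event != "Uncorrectable ECC" or state != "Asserted":
        return None
    match = _LOCATION.search(location)
    if match is None:
        return None
    return UncorrectableEccEvent(
        record_id, date, time, int(match.group(1)), int(match.group(2))
    )


sample = "1807 | 06/24/2007 | 07:40:26 | Memory | Uncorrectable ECC | Asserted | CPU 1 DIMM 0"
decoded = parse_sel_line(sample)
if decoded is not None:
    print(f"CPU {decoded.cpu}, DIMM pair {decoded.dimm_pair}")
```

Lines that do not match the assumed layout return None rather than raising, so the function can be applied to every line of saved "sel elist" output.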
Note: When adding or upgrading memory, always populate empty DIMM slots starting with the white pairs furthest from the CPU socket.

Note: On the X8440 blade, problems can occur with some operating systems that cannot handle CPU0 having no DIMMs.

Product
Sun Blade X8400 Server Module
Sun Blade 8000
Sun Blade X8420 Server Module
Sun Blade X8440 Server Module

Internal Comments
On some revisions of ILOM firmware, the Fault Remind button may light the CPU socket fault LED as well as the DIMM LEDs. If the logged error is related to memory, replace only the memory DIMMs, not the CPU.

This document contains normalized content and is managed by the Domain Lead.

Keywords: normalized, blade, 8000, uncorrectable, memory, ecc, error, UE, dimm, x8400, x8420, SEL, x8440

Previously Published As: 86563

Change History
Date: 2008-10-20
User Name: 79977
Action: Updated
Comment: Added normalization keywords and wrapper
Version: 7

Attachments
This solution has no attachment