Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1020256.1
Update Date: 2012-07-30

Solution Type  Problem Resolution Sure

Solution 1020256.1: M-Series: DIMMs are suddenly marked faulty after upgrading kernel patches


Related Items
  • Sun SPARC Enterprise M9000-32 Server
  • Sun SPARC Enterprise M5000 Server
  • Sun SPARC Enterprise M9000-64 Server
  • Sun SPARC Enterprise M4000 Server
  • Sun SPARC Enterprise M8000 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Mx000
  • .Old GCS Categories>Sun Microsystems>Servers>OPL Servers

PreviouslyPublishedAs
254968


Applies to:

Sun SPARC Enterprise M8000 Server
Sun SPARC Enterprise M4000 Server
Sun SPARC Enterprise M5000 Server
Sun SPARC Enterprise M9000-32 Server
Sun SPARC Enterprise M9000-64 Server
All Platforms

Symptoms

Shortly after an OPL domain is upgraded to kernel patch (KJP) 127111-08 or higher, XSCF marks one or more DIMMs as faulty.

Changes

Solaris 10 Kernel Patch upgrade to 127111-08 or higher.

Cause

KJP 127111-08 introduces memory page retirement for intermittent ECC errors.
DIMMs that were installed prior to patching may suddenly show many errors and be marked faulty by XSCF.
These errors were corrected silently prior to 127111-08 and not reported by FMA.

On older systems that were originally installed with Solaris 10u4, this can give the impression that one or more DIMMs have suddenly gone bad.
Because patching the kernel is required when upgrading to SPARC64 VII (Jupiter), customers may believe that the new CMUs are causing the problems.

Solution

Schedule a maintenance action to replace the faulted DIMMs.


Additional Information
Past and current behavior for intermittent and permanent correctable ECC errors (CEs):

127111-07 and older:
=========================
DIMMs are marked for replacement when more than 128 pages are retired.
A single permanent CE on a page triggers retirement of that page.
Intermittent CEs are handled and corrected silently.

127111-08 and newer:
=========================
DIMMs are marked for replacement when more than 128 pages are retired. (same as before)
A single permanent CE on a page triggers retirement of that page. (same as before)
Three intermittent CEs (ICEs) within 72 hours on a DIMM trigger retirement of the page associated with the third ICE. (this is new)
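
The rules above can be illustrated with a small model. This is only a sketch of the thresholds described in this article, not the FMA diagnosis engine itself. The class and method names are hypothetical, and two details are assumptions: that the 72-hour window slides with each new ICE, and that the ICE count restarts after a page is retired.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

ICE_THRESHOLD = 3                  # intermittent CEs per DIMM (from this article)
ICE_WINDOW = timedelta(hours=72)   # time window (from this article)
PAGE_LIMIT = 128                   # DIMM is faulted above this many retired pages


class DimmModel:
    """Toy model of the 127111-08 rules (hypothetical, for illustration)."""

    def __init__(self):
        self.recent_ices = defaultdict(deque)  # dimm -> timestamps of recent ICEs
        self.retired = defaultdict(set)        # dimm -> retired page addresses

    def report(self, dimm, page, when, permanent):
        """Record one correctable error; return True if the DIMM is now faulty."""
        if permanent:
            # A single permanent CE retires the page (same as before the patch).
            self.retired[dimm].add(page)
        else:
            window = self.recent_ices[dimm]
            window.append(when)
            # Assumption: sliding window, so ICEs older than 72 hours are dropped.
            while when - window[0] > ICE_WINDOW:
                window.popleft()
            if len(window) >= ICE_THRESHOLD:
                # Retire the page associated with the third ICE.
                self.retired[dimm].add(page)
                # Assumption: the count restarts after a retirement.
                window.clear()
        return len(self.retired[dimm]) > PAGE_LIMIT
```

Under this model, a DIMM that only ever produced intermittent errors is untouched by the pre-patch rules but accumulates retired pages under 127111-08, which is why previously "healthy" DIMMs can cross the 128-page limit soon after patching.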


fmdump -e will report:

Intermittent errors (only 127111-08 and newer):
================================
ereport.asic.mac.mi-ice
and / or
ereport.asic.mac.ptrl-ice

Persistent errors (always):
=================
ereport.asic.mac.mi-ce
and / or
ereport.asic.mac.ptrl-ce
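
To see how many of the logged events fall into each group, saved `fmdump -e` output can be tallied by event class. The following is a minimal sketch; the function name is hypothetical, and it assumes the event class is the last whitespace-separated column on each line of the saved output.

```python
from collections import Counter

# Event classes listed in this article.
INTERMITTENT = {"ereport.asic.mac.mi-ice", "ereport.asic.mac.ptrl-ice"}
PERSISTENT = {"ereport.asic.mac.mi-ce", "ereport.asic.mac.ptrl-ce"}


def summarize_ereports(fmdump_text):
    """Count intermittent and persistent CE ereports in saved `fmdump -e` text."""
    counts = Counter()
    for line in fmdump_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        event_class = fields[-1]  # assumption: class is the last column
        if event_class in INTERMITTENT:
            counts["intermittent"] += 1
        elif event_class in PERSISTENT:
            counts["persistent"] += 1
    return counts
```

A high intermittent count with few or no persistent events on a recently patched domain is consistent with the behavior change described above.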


fmadm faulty -a will show something like:
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 26 18:52:14 a4309495-67c4-eb98-d174-f0c091643420  SUN4U-8000-2S  Major

Fault class : fault.memory.dimm 95%
Affects     : mem:///unum=/CMU03/MEM10A
                  degraded but still in service
FRU         : mem:///unum=/CMU03/MEM10A 95%
                  faulty
Serial ID.  : D21757AD:36HTF51272PY-667E1

Description : The number of errors associated with this memory module has exceeded acceptable levels. Refer to....../SUN4U-8000-2S for more information.

Response    : Pages of memory associated with this memory module are being removed from service as errors are reported.

Impact      : Total system memory capacity will be reduced as pages are retired.

Action      : Schedule a repair procedure to replace the affected memory module. Use fmdump -v -u to identify the module.
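
When several DIMMs are faulted at once, the FRU locations can be pulled out of saved `fmadm faulty -a` output to build the replacement list. This is a minimal sketch that relies only on the `FRU : mem:///unum=...` line format shown in the sample above; the function name is hypothetical.

```python
import re

# Matches an FRU line such as "FRU : mem:///unum=/CMU03/MEM10A 95%".
FRU_RE = re.compile(r"^\s*FRU\s*:\s*mem:///unum=(\S+)", re.MULTILINE)


def faulted_dimms(fmadm_text):
    """Return the unique DIMM locations named on FRU lines, in order seen."""
    seen = []
    for unum in FRU_RE.findall(fmadm_text):
        if unum not in seen:
            seen.append(unum)
    return seen
```

For the sample output above, this returns `['/CMU03/MEM10A']`. The "Affects" line is deliberately ignored so that each module is listed once.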



Product
Sun SPARC Enterprise M4000 Server
Sun SPARC Enterprise M5000 Server
Sun SPARC Enterprise M8000 Server
Sun SPARC Enterprise M9000 Server

Keywords: patch, troubleshoot, 127111, OPL dimm

  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.