Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1020256.1 : M-Series: DIMMs are suddenly marked faulty after upgrading kernel patches
PreviouslyPublishedAs 254968
Applies to:
Sun SPARC Enterprise M8000 Server
Sun SPARC Enterprise M4000 Server
Sun SPARC Enterprise M5000 Server
Sun SPARC Enterprise M9000-32 Server - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
Sun SPARC Enterprise M9000-64 Server - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
All Platforms

Symptoms
Shortly after upgrading to KJP 127111-08 (kernel patch) or higher in an OPL domain, one or several DIMMs are marked faulty by XSCF.

Changes
Solaris 10 Kernel Patch upgrade to 127111-08 or higher.

Cause
KJP 127111-08 introduces memory page retirement for intermittent ECC errors. DIMMs that were installed prior to patching may suddenly show many errors and be marked faulty by XSCF. These errors were corrected silently prior to 127111-08 and not reported by FMA. On older systems that were originally installed with Solaris 10u4, this may give the impression that one or several DIMMs suddenly went bad. Because patching the kernel is required when upgrading to SPARC64 VII (Jupiter), customers may believe that the new CMUs are causing the problems.

Solution
Schedule a maintenance action to replace the faulted DIMMs.

Additional Information
Description of past and current behavior for intermittent and permanent correctable ECC errors (CEs).

127111-07 and older:
=========================
DIMMs are marked for replacement when more than 128 pages are retired.
A single permanent CE on a page triggers retirement of that page.
Intermittent CEs are handled and corrected silently.

127111-08 and newer:
=========================
DIMMs are marked for replacement when more than 128 pages are retired. (same as before)
A single permanent CE on a page triggers retirement of that page. (same as before)
3 intermittent CEs (ICEs) within 72 hours on a DIMM trigger retirement of the page associated with the 3rd ICE. (this is new)
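
The retirement rules described above can be illustrated with a small model. This is only a sketch for reasoning about the thresholds, not the Solaris or FMA implementation. The names DimmState and record_error are hypothetical, timestamps are plain hours, and two details are assumptions the article does not state: the 72-hour window is treated as a sliding window, and the intermittent-CE counter is reset once a page has been retired.

```python
from collections import deque
from dataclasses import dataclass, field

# Thresholds taken from the article.
ICE_THRESHOLD = 3          # intermittent CEs on one DIMM
ICE_WINDOW_HOURS = 72      # time window for counting intermittent CEs
RETIRED_PAGE_LIMIT = 128   # DIMM is faulted when MORE than this many pages are retired


@dataclass
class DimmState:
    """Hypothetical per-DIMM bookkeeping used only for this illustration."""
    recent_ices: deque = field(default_factory=deque)  # hours of recent ICEs
    retired_pages: set = field(default_factory=set)

    @property
    def faulty(self) -> bool:
        return len(self.retired_pages) > RETIRED_PAGE_LIMIT


def record_error(dimm, page, hour, permanent, new_behavior=True):
    """Apply one correctable error; return True if it retires the page."""
    if permanent:
        # A single permanent CE retires the page (all patch levels).
        dimm.retired_pages.add(page)
        return True
    if not new_behavior:
        # 127111-07 and older: intermittent CEs are corrected silently.
        return False
    dimm.recent_ices.append(hour)
    # Assumption: sliding window; drop ICEs older than 72 hours.
    while dimm.recent_ices and hour - dimm.recent_ices[0] > ICE_WINDOW_HOURS:
        dimm.recent_ices.popleft()
    if len(dimm.recent_ices) >= ICE_THRESHOLD:
        # The page associated with the 3rd ICE is retired.
        dimm.retired_pages.add(page)
        dimm.recent_ices.clear()  # assumption: counter resets after retirement
        return True
    return False


if __name__ == "__main__":
    dimm = DimmState()
    for hour, page in [(0, 0x10), (10, 0x20), (20, 0x30)]:
        retired = record_error(dimm, page, hour, permanent=False)
        print(f"hour {hour}: ICE on page {page:#x}, retired={retired}")
```

Running the same three events with new_behavior=False retires nothing, which mirrors why a DIMM that looked healthy before the patch can accumulate retired pages afterwards.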

fmdump -e will report:

Intermittent errors (only 127111-08 and newer):
================================
ereport.asic.mac.mi-ice and / or ereport.asic.mac.ptrl-ice

Persistent errors (always):
=================
ereport.asic.mac.mi-ce and / or ereport.asic.mac.ptrl-ce

fmadm faulty -a will show something like:
--------------- ------------------------------------ -------------- ---------

Product
Sun SPARC Enterprise M4000 Server
Sun SPARC Enterprise M5000 Server
Sun SPARC Enterprise M8000 Server
Sun SPARC Enterprise M9000 Server

Keywords: patch, troubleshoot, 127111, OPL dimm

Attachments
This solution has no attachment