Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-71-1010905.1
Update Date: 2012-03-14
Keywords:

Solution Type  Technical Instruction Sure

Solution  1010905.1 :   Sun Enhanced Memory DIMM Replacement Policy for SPARC  


Related Items
  • Sun Fire E25K Server
  • Sun SPARC Enterprise M9000-32 Server
  • Sun SPARC Enterprise T5440 Server
  • Sun Fire E6900 Server
  • Sun Fire V890 Server
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Generic Product>SN-OTH: Gen_Prod
  • .Old GCS Categories>Sun Microsystems>Desktops>Workstations
  • .Old GCS Categories>Sun Microsystems>Servers>NEBS-Certified Servers
  • .Old GCS Categories>Sun Microsystems>Servers>High-End Servers
  • .Old GCS Categories>Sun Microsystems>Servers>Midrange Servers
  • .Old GCS Categories>Sun Microsystems>Servers>Entry-Level Servers
  • .Old GCS Categories>Sun Microsystems>Boards>Memory Module
  • .Old GCS Categories>Sun Microsystems>Servers>CMT Servers

PreviouslyPublishedAs
215045


Applies to:

Sun Fire E25K Server - Version: Not Applicable and later    [Release: N/A and later]
Sun Fire E6900 Server - Version: Not Applicable and later    [Release: N/A and later]
Sun SPARC Enterprise T5440 Server - Version: Not Applicable and later    [Release: N/A and later]
Sun SPARC Enterprise M9000-32 Server - Version: Not Applicable and later    [Release: N/A and later]
Sun Fire V890 Server - Version: Not Applicable and later    [Release: N/A and later]
Oracle Solaris on SPARC (64-bit)
Oracle Solaris on SPARC (32-bit)

Goal


Description
Sun Enhanced Memory DIMM Replacement Policy for SPARC

The rules detailed in this Policy apply to all supported machines that use the SPARC architecture.

Solution

Sun's SPARC/Solaris DIMM Replacement Policy - Version 20110629

Oracle's DIMM Replacement Policy for SPARC Platforms

Replace a DIMM:

1. If system firmware (for example, POST, host-config) fails it prior to the Fault Management system coming up.

2. If the Fault Management system tells you to.

3. If an authorized Oracle service person tells you to, either based on advice from a Fault Management knowledge article or because of a documented defect in Fault Management for which a patch is not yet available or has not yet been applied to the system.
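
The three rules above can be read as a simple ordered check. The following is only an illustrative sketch, not part of the policy: the `DimmStatus` record, its field names, and the `replacement_rule` function are hypothetical names introduced here, and the inputs are assumed to have been gathered by the engineer from firmware output, the Fault Management system, and Oracle service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DimmStatus:
    """Hypothetical record of what is known about one DIMM (names are assumptions)."""
    failed_by_firmware: bool = False     # rule 1: POST/host-config failed it before Fault Management came up
    faulted_by_fault_mgmt: bool = False  # rule 2: the Fault Management system called for replacement
    service_directed: bool = False       # rule 3: an authorized Oracle service person called for replacement


def replacement_rule(status: DimmStatus) -> Optional[int]:
    """Return the number of the first policy rule that applies, or None if none does."""
    if status.failed_by_firmware:
        return 1
    if status.faulted_by_fault_mgmt:
        return 2
    if status.service_directed:
        return 3
    return None  # none of the three rules gives a reason to replace this DIMM


if __name__ == "__main__":
    print(replacement_rule(DimmStatus(faulted_by_fault_mgmt=True)))
    print(replacement_rule(DimmStatus()))
```

A return value of `None` simply means that none of the three conditions has been met; it says nothing about the DIMM beyond that.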


Copyright (c) 2008, Oracle Corporation. All Rights Reserved.
Original version: November 17, 2004
Updated March 16, 2006
Updated January 13, 2010 (Updated Rule 5A, added Rule 5B)
Updated March 5, 2010 (removed Rule 4B)
Updated March 11, 2010 (modified Rules 5.1.3.1 and 5.1.3.2)
Updated March 12, 2010 (spelling correction: depending)
Updated June 23, 2010 (Corrected typo in 5.1.2 to remove duplicated text)
Updated June 29, 2011 (Revised Replacement Policy, Version 20110629)


What tools does the engineer have to make the Fault Management decisions referenced in policy rule #2 above?


1. Solaris 10 and Solaris 11 use (integrated) FMA for the Fault Management System.

2. Systems running prior Solaris OS releases should download and install the CEDIAG diagnostic tool (DocID 1003867.1) to properly identify Memory DIMM replacements.
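
As a rough illustration of the split described above, the sketch below maps the release string printed by `uname -r` to the tool named in the text. It assumes the usual SunOS 5.x numbering, in which "5.10" and "5.11" correspond to Solaris 10 and Solaris 11; the function name `fault_management_tool` is hypothetical and not part of any Oracle tool.

```python
def fault_management_tool(uname_r: str) -> str:
    """Map a `uname -r` string such as "5.9" or "5.10" to "FMA" or "CEDIAG".

    Assumption: the host reports a SunOS 5.x release string, where the
    second field is 10 or higher for Solaris 10 and Solaris 11.
    """
    parts = uname_r.strip().split(".")
    if len(parts) < 2 or parts[0] != "5":
        raise ValueError(f"not a SunOS 5.x release string: {uname_r!r}")
    minor = int(parts[1])
    # Solaris 10 and 11 have integrated FMA; earlier releases rely on CEDIAG.
    return "FMA" if minor >= 10 else "CEDIAG"


if __name__ == "__main__":
    for release in ("5.8", "5.9", "5.10", "5.11"):
        print(release, "->", fault_management_tool(release))
```

This only selects between the two tools; whether CEDIAG actually supports a given older release and platform is a separate question.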

On legacy releases (prior to Solaris 10 FMA), the volume of correctable error (CE) messages can make it hard to tell when a DIMM has reached the threshold for replacement.  Solaris 10 and 11 FMA suppresses this message "noise" and displays FMA fault messages only when the Fault Management System reaches a diagnosis.

CEDIAG applies a similar memory fault diagnosis engine to older Solaris OS releases, although the CE messages are still written to the messages file.  CEDIAG tells you when to replace a DIMM that has reached Fault Management System thresholds.

CEDIAG can be installed on a "live" system and run on demand or from cron, in which case it automatically analyzes the error messages and produces a daily report of memory faults.
CEDIAG is also used by Oracle Service to analyze messages from explorer file uploads (offline analysis).

The best analysis is obtained by running CEDIAG as root on a "live" system, since the live install can also execute 'cestat' to get fault and memory page retirement information directly from the kernel.
- Solaris OS: Solaris 2.5.1 to 9 are eligible for analysis.
- Hardware: sun4u UltraSPARC-II platforms, up to UltraSPARC-IV+
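
The eligibility limits above can be expressed as a small check. This is a sketch under stated assumptions: the function name `cediag_eligible` and the `ELIGIBLE_CPUS` tuple are hypothetical, releases are given by their Solaris names ("2.5.1", "2.6", "7", "8", "9"), and only the four CPU generations from UltraSPARC-II through UltraSPARC-IV+ are modeled, without their individual variants.

```python
from typing import Tuple

# Assumed simplification: generation names only, no per-variant entries.
ELIGIBLE_CPUS = ("UltraSPARC-II", "UltraSPARC-III", "UltraSPARC-IV", "UltraSPARC-IV+")


def _release_key(solaris_release: str) -> Tuple[int, ...]:
    """Turn "2.5.1" into (2, 5, 1) and "9" into (9,) so releases compare in order."""
    return tuple(int(part) for part in solaris_release.strip().split("."))


def cediag_eligible(solaris_release: str, cpu_family: str) -> bool:
    """Return True if the release and CPU fall inside the ranges listed above."""
    key = _release_key(solaris_release)
    # Solaris 2.5.1 and 2.6 use 2.x names; the next releases are 7, 8 and 9.
    release_ok = (2, 5, 1) <= key < (3,) or (7,) <= key <= (9,)
    return release_ok and cpu_family in ELIGIBLE_CPUS


if __name__ == "__main__":
    print(cediag_eligible("9", "UltraSPARC-III"))
    print(cediag_eligible("10", "UltraSPARC-IV+"))
```

A `False` result here only reflects the ranges stated in this document; the cediag FAQ remains the reference for what the tool supports.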

See the CEDIAG download instructions below.

cediag(1M) diagnostic tool download and reference:

When deploying the cediag tool, follow the instructions in <Document:1003867.1> Memory DIMM Replacement Tool - cediag FAQ, which also provides the download links for the cediag utility.

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in an appropriate My Oracle Support Community, Oracle Sun Technologies Community.


Internal Comments

DIMM Replacement & Related Links

Engineering internal implementation rules for the SPARC Memory DIMM Replacement Policy are no longer published as customer facing policy MOS KM docs.  They are provided here for Oracle-internal reference only.

The only portion of the replacement policy which is public is the first document which was adapted to this MOS KM doc.

MQSC - Memory Quality Steering Committee workspace

https://stbeehive.oracle.com/teamcollab/wiki/Memory+Quality+Steering+Committee:Home

MQSC Documents

dimm.policy_20110629.txt
https://stbeehive.oracle.com/content/dav/st/Memory%20Quality%20Steering%20Committee/Documents/dimm.policy_20110629.txt

dimm.policy_20120111.FMA.txt

https://stbeehive.oracle.com/content/dav/st/Memory%20Quality%20Steering%20Committee/Documents/dimm.policy.FMA.txt
 
dimm.policy_20120111.US-III,IV.txt
https://stbeehive.oracle.com/content/dav/st/Memory%20Quality%20Steering%20Committee/Documents/dimm.policy.US-III,IV.txt



Refer all questions and comments to:
[email protected]


UltraSPARC, II, III, IV, IV+, T1, T2; SPARC T3, T4; SPARC64 VI, VII, VII+ Memory DIMM Replacement Policy for SPARC  (DocID 1010905.1)
Previously Published As 79928 and 215045

References

<NOTE:1000869.1> - DIMM Replacement Policy for Sun x86/x64 systems
<NOTE:1003867.1> - Memory DIMM Replacement Management Tool - cediag FAQ
<NOTE:1004729.1> - Introduction to Solaris[TM] Operating System CE/UE/ECC/CBB/CBI/DBB/DBI Error Messages

Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.