Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1008384.1
Update Date:2011-06-01
Keywords:

Solution Type  Troubleshooting Sure

Solution  1008384.1 :   Analyzing memory messages and hardware replacement needs for Sun SPARC systems.  


Related Items
  • Sun Fire V440 Server
  • Sun Fire V250 Server
  • Sun Fire V480 Server
  • Sun Fire V240 Server
  • Sun Fire V890 Server
  • Sun Fire V880 Server
  • Sun SPARC Enterprise T2000 Server
  • Sun Fire V490 Server
Related Categories
  • GCS>Sun Microsystems>Servers>CMT Servers

Previously Published As
211466


Applies to:

Sun Fire V890 Server
Sun SPARC Enterprise T2000 Server
Sun Fire V240 Server
Sun Fire V250 Server
Sun Fire V440 Server
All Platforms

Purpose

This document discusses memory error messages and issues pertaining to memory configuration and compatibility on Sun VSP SPARC systems.

Last Review Date

May 12, 2011

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Analyzing memory messages and hardware replacement needs for Sun SPARC systems.

Symptoms:

  • Bad DIMM
  • Bad CPU
  • CE errors-all normal
  • Disabled DIMMs
  • FMA faults not cleared
  • Misconfigured memory
  • Incompatible memory

1. Confirm that the memory meets the minimum configuration requirements. Memory requirements vary between Sun systems, so the configuration must be confirmed for the specific machine in question. The best source for this information is the online Sun System Handbook.

2.  Verify that the system meets Sun's memory compatibility guidelines (use the Sun System Handbook). This must also be confirmed for the specific system in question.

3. Verify that the error exceeds Sun Best Practices thresholds. An individual memory module can log numerous corrected errors (CEs) before replacement is recommended. It is also necessary to verify the version of Solaris[TM] the system is running and its current patch level to determine whether thresholds are exceeded. Use <Document 1010905.1>.
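
As a rough illustration of the version check in step 3, the following sketch prints the operating system release and kernel version string. It assumes a POSIX shell; the function name show_os_level is hypothetical and is not taken from any Sun document. The thresholds themselves still have to be looked up in <Document 1010905.1>.

```shell
#!/bin/sh
# Sketch only: report the Solaris release and kernel patch level.
# show_os_level is a hypothetical helper name used for illustration.

show_os_level() {
    # uname -r prints the OS release (5.10 corresponds to Solaris 10).
    echo "release: $(uname -r)"
    # uname -v prints the kernel version string; on Solaris this
    # normally carries the kernel patch ID.
    echo "kernel: $(uname -v)"
    # /etc/release names the installed Solaris update, when the file exists.
    if [ -r /etc/release ]; then
        head -n 1 /etc/release
    fi
}

show_os_level
```

On Solaris 10, showrev -p can additionally be used to list the individual patches installed.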

4.  Verify whether the DIMM is the root cause of the error (bad DIMM) using <Document 1004729.1>.


5. For memory errors in Solaris 10, verify FMA fault logs were cleared.

If the system is running Solaris 10 and FMA errors continue to be logged in the /var/adm/messages file after the DIMM has been replaced, the faults need to be marked as repaired in Solaris.

To list the outstanding FMA faults in Solaris, run:

# fmadm faulty

The output begins with:

FMADM faulty
STATE RESOURCE / UUID


For each fault listed by fmadm faulty, run:

# fmadm repair <uuid#>


Run fmadm faulty again to confirm that the faults have been repaired.
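
When several faults are listed, the per-fault repair in step 5 can be scripted. The sketch below is a dry run: it prints one fmadm repair command per UUID instead of executing it, so the list can be reviewed first. It assumes a POSIX shell, and that each fault UUID appears in the fmadm faulty output as a whitespace-separated token in the usual 8-4-4-4-12 hexadecimal form. The function names extract_uuids and repair_commands are hypothetical.

```shell
#!/bin/sh
# Sketch only: build "fmadm repair" commands from "fmadm faulty" output.
# Assumption: UUIDs are whitespace-separated tokens in 8-4-4-4-12 hex form.

# extract_uuids: read text on stdin, print each distinct UUID once.
extract_uuids() {
    # Put every whitespace-separated token on its own line, then keep
    # only the tokens that look like a UUID.
    tr -s ' \t' '\n\n' |
        grep '^[0-9a-fA-F]\{8\}-[0-9a-fA-F]\{4\}-[0-9a-fA-F]\{4\}-[0-9a-fA-F]\{4\}-[0-9a-fA-F]\{12\}$' |
        sort -u
}

# repair_commands: read "fmadm faulty" output on stdin and print one
# repair command per UUID. Nothing is executed.
repair_commands() {
    extract_uuids | while read -r uuid; do
        echo "fmadm repair $uuid"
    done
}

# Query fmadm only where it is installed.
if command -v fmadm >/dev/null 2>&1; then
    fmadm faulty | repair_commands
fi
```

After reviewing the printed commands, they can be run as root one at a time, or the output can be piped to sh.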

6.  Verify whether a CPU is the root cause of the error (bad CPU). Memory errors are usually caused by a faulty or failing DIMM, but in some cases a bad CPU writer could be at fault. (Requires Sun Support to be engaged.)


7.  Verify whether a motherboard slot is the root cause of the error (bad MB). If a DIMM that has been replaced continues to log errors, the slot on the motherboard may be faulty. To determine whether the problem lies with the slot rather than the DIMM itself, swap the DIMM that logged the original errors with a known good DIMM, keeping track of which module is which. If the errors move with the DIMM, the memory module is faulty. If errors are logged against the known good DIMM now sitting in the slot in question, the motherboard is at fault and will need to be replaced.
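
The outcome of the swap test in step 7 can be summarized as a small decision table. The sketch below encodes it as a shell function. The function name swap_verdict and its yes/no arguments are hypothetical, and the "inconclusive" outcome for the remaining combinations is an assumption, since the step describes only the two clear-cut cases.

```shell
#!/bin/sh
# Sketch only: the swap-test decision from step 7.
# Arguments (each "yes" or "no"; hypothetical names):
#   $1  errors follow the original DIMM into its new slot
#   $2  errors are logged on the known good DIMM in the suspect slot
swap_verdict() {
    case "$1:$2" in
        yes:no) echo "DIMM faulty: replace the memory module" ;;
        no:yes) echo "slot faulty: replace the motherboard" ;;
        # Assumption: any other combination needs further investigation.
        *)      echo "inconclusive: continue troubleshooting" ;;
    esac
}

# Example: the errors moved with the original DIMM.
swap_verdict yes no
```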

8.  At this point, if you have validated each troubleshooting step above for your environment and the issue still exists, further troubleshooting is required. Gather Sun Explorer Data Collector output from the system, then contact Sun Support.



Sun Engineers:  Reference Document 1010921.1 to continue investigation from STEP 6 above.

This document contains normalized content and is managed by the Domain Lead(s) of the respective domains. To notify content owners of a knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this document via the "Document Feedback" alias(es) listed below:
Domain Engineer/Lead : Dencho Kojucharov

Feedback alias:  [email protected]
normalized, memory errors
Previously Published As
91314

Change History
Date: 2009-11-6
User Name: 103287
Action: Updated
Comment: Removed link to an article that is being archived because it's bad (old information) and duplicates what is found in the SSH.
Date: 2007-12-14
User Name: 71396
Action: Approved
Comment: Performed final review of article.



Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.