Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-75-1005457.1
Update Date:2012-07-30
Keywords:

Solution Type  Troubleshooting Sure

Solution  1005457.1 :   Troubleshooting CPU sram, (l2sram, L3sram) and Memory Error(s) with Solaris[TM] Up  


Related Items
  • Sun Fire 4810 Server
  • Sun Fire 3800 Server
  • Sun Netra 1290 Server
  • Sun Fire E6900 Server
  • Sun Fire 6800 Server
  • Sun Fire V1280 Server
  • Sun Fire 4800 Server
  • Sun Fire E2900 Server
  • Sun Fire E4900 Server
  • Sun Netra 1280 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Exx00
  • .Old GCS Categories>Sun Microsystems>Servers>Entry-Level Servers
  • .Old GCS Categories>Sun Microsystems>Servers>Midrange Servers
  • .Old GCS Categories>Sun Microsystems>Servers>Midrange V and Netra Servers

PreviouslyPublishedAs
207580


Applies to:

Sun Fire 4810 Server - Version Not Applicable and later
Sun Fire 4800 Server - Version Not Applicable and later
Sun Fire 6800 Server - Version Not Applicable and later
Sun Netra 1290 Server - Version Not Applicable and later
Sun Fire E2900 Server - Version Not Applicable and later
All Platforms

Purpose

This document describes how to troubleshoot CPU SRAM (L2 SRAM, L3 SRAM) and memory errors while Solaris[TM] is up.

Troubleshooting Steps

Symptoms

  • The customer may report that the system crashed, rebooted, panicked, logged UE or CE errors, or went down, and then came back up.

  • Specifically, the customer may report that:

    • The system rebooted unexpectedly and the cause is unknown.

    • The system received UE errors, ECC errors, or recoverable memory errors.

    • The system crashed, went down, panicked, rebooted, or received CPU or memory errors.

System Type and Configuration

  • Sun Fire[TM] V1280/E2900 and Netra[TM] 1280/1290 (LightWeight8 Servers)

  • Sun Fire[TM] 3800/4800/4810/6800/E4900/E6900 (Serengeti Servers)

  • Solaris[TM] 8 (U5 and higher), Solaris[TM] 9, or Solaris[TM] 10

Assumptions:

  • The appropriate Solaris domain is accessible.

  • The system controller is also accessible.



Steps to Follow
Please validate that each troubleshooting step below is true for your environment. Each step will provide instructions or a link to the document for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.
1. Verify Solaris[TM] is running and capture Solaris version.

  • Run uname -a to confirm the Solaris version and that the operating system is up.
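
As a minimal sketch of the version check in step 1, the following shell function translates the SunOS release string reported by uname -r into the Solaris version used in the rest of this document. The function name solaris_version is hypothetical and is not part of Solaris; the mapping (5.8, 5.9, 5.10) covers only the releases listed under System Type and Configuration.

```shell
# Hypothetical helper (not a Solaris command): map the SunOS release
# string printed by `uname -r` to the Solaris version number.
solaris_version() {
    case "$1" in
        5.8)  echo 8 ;;
        5.9)  echo 9 ;;
        5.10) echo 10 ;;
        *)    echo unknown; return 1 ;;   # release not covered by this document
    esac
}
```

On the domain itself, the function would typically be called with the output of uname -r as its only argument.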

2. Verify that this is not the issue described in Sun Alert 1000495.1:

Sun Fire[TM] Systems Equipped With UltraSPARC IV+ Processor Modules Running Solaris 9 or Solaris 10 may Exhibit Unnecessary CPU Offlining and Solaris Panics

3. Verify that the appropriate diagnostic tools identify a FRU (Field Replaceable Unit) which requires replacement.

  • The appropriate tools are:

    • For Solaris 8 and 9:

      • cediag: See Memory DIMM Replacement Management Tool - cediag FAQ (Doc ID 1003867.1)

    • For Solaris 10:

      • fmdump: See Solaris[TM] 10 Operating System: Displaying the list of Fault Management Architecture (FMA) resources currently believed to be faulted (Doc ID 1018939.1)

  • The following documents are available for reference:

    • Sun Enhanced Memory DIMM Replacement Policy (Doc ID 1010905.1)
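
The tool selection in step 3 can be sketched as a small shell function. This is an illustration only: the function name diag_tool is hypothetical, and it simply returns the tool name this document lists for a given Solaris version. It does not run the tool or interpret its output.

```shell
# Hypothetical helper: print the diagnostic tool this document lists
# for the given Solaris version (8, 9, or 10).
diag_tool() {
    case "$1" in
        8|9) echo cediag ;;   # see Doc ID 1003867.1
        10)  echo fmdump ;;   # see Doc ID 1018939.1
        *)   echo "no tool listed for Solaris $1" >&2; return 1 ;;
    esac
}
```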

4. Verify that you can provide the recent error event data to Sun Support Services.

Run Explorer to collect the required data.

  • Oracle Explorer Data Collector - Product Information Center (Doc ID 1312847.1)

    • For more information on Explorer see: Oracle Services Tools Bundle (Doc ID 1153444.1) and Oracle Services Tools Bundle Frequently Asked Questions (Doc ID 1287574.1)

Exception process for minimum data collection to begin diagnosis:

  • It is always best practice to provide Explorer output. The complexity of the problems being resolved demands that Explorer data be available if at all possible. If there might be a delay in having this data available and troubleshooting needs to begin immediately, provide at minimum the following output:

    • For Solaris 10: 

      • fmdump -v

    • For Solaris 8 and 9 collect the following from Solaris:

      • /var/adm/messages 

      • prtdiag -v
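
The minimum data set above can be expressed as a dry-run checklist. The sketch below only prints the items to gather for a given Solaris version, one per line, so they can be reviewed before anything is collected; the function name min_data_items is hypothetical, and the items are exactly those listed above.

```shell
# Hypothetical helper: print the minimum data items listed in this
# document for the given Solaris version. Nothing is executed or copied.
min_data_items() {
    case "$1" in
        10)
            echo "fmdump -v"            # command output to capture
            ;;
        8|9)
            echo "/var/adm/messages"    # log file to collect
            echo "prtdiag -v"           # command output to capture
            ;;
        *)
            echo "unsupported Solaris version: $1" >&2
            return 1
            ;;
    esac
}
```

This checklist is a fallback for urgent cases only and does not replace Explorer output.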

5. At this point, if you have validated each troubleshooting step above for your environment and the issue still exists, further troubleshooting is required. Contact Sun Support Services and provide the data collected in the previous steps. Also contact Sun Support Services if you have a question about any step of this troubleshooting process.


Product
Sun Fire V1280 Server
Sun Fire E6900 Server
Sun Fire E4900 Server
Sun Fire E2900 Server
Sun Fire 6800 Server
Sun Fire 4810 Server
Sun Fire 4800 Server
Sun Fire 3800 Server
Sun Netra 1290 Server
Sun Netra 1280 Server


Internal Troubleshooting Resources and Document Information

For assistance in analyzing the error messages for this issue use the following:

<Document 1008263.1> - How to investigate CPU/Memory faults with Solaris[TM] FMA
<Document 1010934.1> - Findaft - an AFT, CPU, Memory and PCI ECC error message summary script.
<Document 1012314.1> - What to look for if hardware errors persist after an onsite visit
If errors persist, please collaborate with the next level of support.

This document contains normalized content and is managed by the DomainLead(s) of the respective domains. To notify content owners of a knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this document via the Document Feedback alias(es) listed below:

 
UE, CE, EDU:ST, EDU:BLD, WDU, CPU,UCU, DUE, Bus Timeout, TO, BERR, Event, AFT1, Uncorrectable system @bus, uncorrectable error detected, Mtag, uncorrectable, memory


References

<NOTE:1000495.1> - Sun Fire Systems Equipped With UltraSPARC IV+ Processor Modules Running Solaris 9 or Solaris 10 may Exhibit Unnecessary CPU Offlining and Solaris Panics
<NOTE:1003867.1> - Memory DIMM Replacement Management Tool - cediag FAQ
@<NOTE:1008263.1> - How to Troubleshoot CPU/Memory faults with Solaris[TM] FMA
<NOTE:1010905.1> - Sun Enhanced Memory DIMM Replacement Policy for SPARC
@<NOTE:1010934.1> - Findaft - an AFT, CPU, Memory and PCI ECC error message summary script.
@<NOTE:1012314.1> - What to look for if hardware errors persist after an onsite visit
@<NOTE:1018748.1> - How to Run Oracle Explorer and Forward the Data to an Oracle Service Engineer
<NOTE:1018939.1> - Solaris 10 Operating System: Displaying the list of Fault Management Architecture (FMA) resources currently believed to be faulted

Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.