Asset ID: 1-75-1005457.1
Update Date: 2012-07-30
Solution Type: Troubleshooting Sure
Solution 1005457.1: Troubleshooting CPU SRAM (L2SRAM, L3SRAM) and Memory Errors with Solaris[TM] Up
Related Items
- Sun Fire 4810 Server
- Sun Fire 3800 Server
- Sun Netra 1290 Server
- Sun Fire E6900 Server
- Sun Fire 6800 Server
- Sun Fire V1280 Server
- Sun Fire 4800 Server
- Sun Fire E2900 Server
- Sun Fire E4900 Server
- Sun Netra 1280 Server
Related Categories
- PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Exx00
- .Old GCS Categories>Sun Microsystems>Servers>Entry-Level Servers
- .Old GCS Categories>Sun Microsystems>Servers>Midrange Servers
- .Old GCS Categories>Sun Microsystems>Servers>Midrange V and Netra Servers
Previously Published As: 207580
Applies to:
Sun Fire 4810 Server - Version Not Applicable and later
Sun Fire 4800 Server - Version Not Applicable and later
Sun Fire 6800 Server - Version Not Applicable and later
Sun Netra 1290 Server - Version Not Applicable and later
Sun Fire E2900 Server - Version Not Applicable and later
All Platforms
Purpose
This document describes how to troubleshoot CPU SRAM (L2SRAM, L3SRAM) and memory errors while Solaris[TM] is up.
Troubleshooting Steps
Symptoms
- The customer may report that the system crashed, rebooted, panicked, got UE errors or CE errors, or went down, and that the system then came back up.
- Specifically, the customer may report that:
  - The system rebooted unexpectedly and the cause is unknown.
  - The system received UE errors, ECC errors, or recoverable memory errors.
  - The system crashed, went down, panicked, rebooted, or received CPU or memory errors.
System Type and Configuration
- Sun Fire[TM] V1280/E2900 and Netra[TM] 1280/1290 (LightWeight8 servers)
- Sun Fire[TM] 3800/4800/4810/6800/E4900/E6900 (Serengeti servers)
- Solaris[TM] 8 (U5 and higher), Solaris[TM] 9, or Solaris[TM] 10
Assumptions:
Steps to Follow
Please validate that each troubleshooting step below is true for your environment. Each step will provide instructions or a link to the document for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.
1. Verify Solaris[TM] is running and capture Solaris version.
2. Verify that this is not the issue described in Sun Alert 1000495.1: Sun Fire[TM] Systems Equipped With UltraSPARC IV+ Processor Modules Running Solaris 9 or Solaris 10 may Exhibit Unnecessary CPU Offlining and Solaris Panics.
3. Verify that the appropriate diagnostic tools identify a FRU (Field Replaceable Unit) which requires replacement.
4. Verify that you can provide recent error event data to Sun Support Services.
Run Explorer to collect the proper data.
Exception process for minimum data collection to begin diagnosis:
- It is always best practice to provide Explorer output. The complexity of the problems being resolved demands that Explorer data be available if at all possible. If this data might be delayed and troubleshooting needs to begin immediately, at minimum provide the following output:
5. At this point, if you have validated that each troubleshooting step above is true for your environment and the issue still exists, further troubleshooting is required. Collaborate with Sun Support Services and provide the data collected in the previous steps. Also contact Sun Support Services if you have a question about any step of this troubleshooting process.
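Step 1 (capturing the Solaris version) can be sketched in Python as shown here. This is only an illustration, not part of the documented procedure. It assumes that `uname -r` on the affected domain prints a SunOS release string such as 5.8, 5.9, or 5.10, and the function names are hypothetical. The update level (for example, Solaris 8 U5) cannot be read from `uname -r`; the contents of /etc/release would still need to be checked by hand for that.

```python
import subprocess

# SunOS release strings (as printed by `uname -r`) for the Solaris
# versions listed under "System Type and Configuration".
SUNOS_TO_SOLARIS = {
    "5.8": "Solaris 8",
    "5.9": "Solaris 9",
    "5.10": "Solaris 10",
}


def solaris_release_from_uname(release):
    """Map `uname -r` output to a Solaris version name, or None if not covered."""
    return SUNOS_TO_SOLARIS.get(release.strip())


def capture_solaris_version():
    """Hypothetical helper: query the running OS and return its Solaris version.

    Returns None when the OS is not SunOS or the release is not one of the
    versions this document covers.
    """
    def run(args):
        # check=True raises if the command fails, e.g. when the OS is not up.
        done = subprocess.run(args, capture_output=True, text=True, check=True)
        return done.stdout.strip()

    if run(["uname", "-s"]) != "SunOS":
        return None
    return solaris_release_from_uname(run(["uname", "-r"]))
```

For example, `solaris_release_from_uname("5.10")` returns "Solaris 10", while a release string outside the covered versions returns None and signals that this document may not apply.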
Product
Sun Fire V1280 Server
Sun Fire E6900 Server
Sun Fire E4900 Server
Sun Fire E2900 Server
Sun Fire 6800 Server
Sun Fire 4810 Server
Sun Fire 4800 Server
Sun Fire 3800 Server
Sun Netra 1290 Server
Netra 1280 Server
Internal Troubleshooting Resources and Document Information
For assistance in analyzing the error messages for this issue use the following:
<Document 1008263.1> - How to investigate CPU/Memory faults with Solaris[TM] FMA
<Document 1010934.1> - Findaft - an AFT, CPU, Memory and PCI ECC error message summary script.
<Document 1012314.1> - What to look for if hardware errors persist after an onsite visit
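As a rough first pass before using the tools above, error keywords in a saved message log can be tallied with a short script. The sketch below is not Findaft and does not reproduce its output. It is a hypothetical illustration that counts lines containing a subset of the keywords associated with this document (UE, CE, EDU:ST, and so on) and makes no assumption about the exact message format.

```python
import re
from collections import Counter

# Subset of the error keywords associated with this document.
KEYWORDS = ["UE", "CE", "EDU:ST", "EDU:BLD", "WDU", "UCU", "DUE", "BERR", "AFT1"]


def tally_error_keywords(lines, keywords=KEYWORDS):
    """Count, per keyword, how many lines contain it as a standalone token.

    The match is case-sensitive, and a keyword is ignored when it is
    embedded in a longer alphanumeric word (for example, CE inside ACE).
    """
    patterns = {
        kw: re.compile(r"(?<![A-Za-z0-9:])" + re.escape(kw) + r"(?![A-Za-z0-9:])")
        for kw in keywords
    }
    counts = Counter()
    for line in lines:
        for kw, pattern in patterns.items():
            if pattern.search(line):
                counts[kw] += 1
    return counts


def tally_file(path):
    """Hypothetical helper: tally keywords in a saved log file."""
    with open(path, errors="replace") as handle:
        return tally_error_keywords(handle)
```

A nonzero count only indicates that a keyword appears in the log. It does not identify a FRU, so the diagnostic tools and documents listed above remain the basis for any replacement decision.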
If errors persist, please collaborate with the next level of support.
This document contains normalized content and is managed by the DomainLead(s) of the respective domains. To notify content owners of a knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this document via the Document Feedback alias(es) listed below:
UE, CE, EDU:ST, EDU:BLD, WDU, CPU,UCU, DUE, Bus Timeout, TO, BERR, Event, AFT1, Uncorrectable system @bus, uncorrectable error detected, Mtag, uncorrectable, memory
References
<NOTE:1000495.1> - Sun Fire Systems Equipped With UltraSPARC IV+ Processor Modules Running Solaris 9 or Solaris 10 may Exhibit Unnecessary CPU Offlining and Solaris Panics
<NOTE:1003867.1> - Memory DIMM Replacement Management Tool - cediag FAQ
@<NOTE:1008263.1> - How to Troubleshoot CPU/Memory faults with Solaris[TM] FMA
<NOTE:1010905.1> - Sun Enhanced Memory DIMM Replacement Policy for SPARC
@<NOTE:1010934.1> - Findaft - an AFT, CPU, Memory and PCI ECC error message summary script.
@<NOTE:1012314.1> - What to look for if hardware errors persist after an onsite visit
@<NOTE:1018748.1> - How to Run Oracle Explorer and Forward the Data to an Oracle Service Engineer
<NOTE:1018939.1> - Solaris 10 Operating System: Displaying the list of Fault Management Architecture (FMA) resources currently believed to be faulted
Attachments
This solution has no attachment