Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Troubleshooting Sure
Solution 1005476.1: Troubleshooting Level 2 Check Errors (L2CheckError) on Sun Fire[TM] 3800/4800/4810/6800/E2900/E4900/E6900 & Netra[TM] 1280/1290
Previously Published As: 207600
Applies to:
Sun Fire E2900 Server
Sun Fire E4900 Server
Sun Fire E6900 Server
Sun Fire 6800 Server
Sun Netra 1290 Server
All Platforms

Purpose

Description
This document provides the steps required to troubleshoot Level 2 Check Error events (L2CheckErrors) on Sun Fire[TM] Midrange servers.

Symptoms:
System Type and Configuration:
Notes: The system configuration includes at least System Controller Application (ScApp) 5.15.x. An L2CheckError event implicates a device called a Repeater (RP). An RP is a separate board on all systems except the Sun Fire[TM] 3800, where the RPs are located on the system's Backplane/Centerplane.
Assumption:

Sun Shared Shell
If you require assistance in collecting the data recommended in this article or require help in diagnosing a system issue, there is a collaborative service tool called Sun Shared Shell which allows Sun Service engineers to remotely view and diagnose customers' systems. Consider using this option to reduce the problem resolution time.

Last Review Date
May 19, 2010

Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.
Troubleshooting Details

Steps to Follow
Please validate that each troubleshooting step below is true for your environment. Each step provides instructions, or a link to a document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.
SC-Name:SC> showlogs -d c
2. Verify this is not a "known" memory interleaving issue.
Note: Sun Fire[TM] V1280/E2900 & Netra[TM] 1280/1290 are excluded from this step because they cannot have a multiple domain configuration.

3. Verify this is not a "known" adjacent domain issue.
Note: Sun Fire[TM] V1280/E2900 & Netra[TM] 1280/1290 are excluded from this step because they cannot have a multiple domain configuration.

4. Verify this is not a "known" Dynamic Reconfiguration (DR)/cfgadm issue.

5. Customers should contact Sun Support Services, mention this document ID, and verify extended Explorer data is available for analysis or be prepared to use Sun Shared Shell to continue diagnosis of the event.

Product
Sun Netra 1290 Server
Netra 1280 Server
Sun Fire V1280 Server
Sun Fire E6900 Server
Sun Fire E4900 Server
Sun Fire E2900 Server
Sun Fire 6800 Server
Sun Fire 4810 Server
Sun Fire 4800 Server
Sun Fire 3800 Server

Internal Comments

Performing Additional Analysis Offline
Verify that Steps 1-5 in the Steps to Follow section above have been performed before commencing with Step 6.

6. Verify this is not a repeat event.
A repeat event is one where:
- the event has an identical failure signature and suspect indictment list, or
- the customer reports or feels that the event is recurring on the same system/platform.
Repeat events require collaboration with the next level of support (Step 11).

7. Verify that this event is not caused by a power failure on a System Board (SB) or I/O Board (IB).
A power failure of a System or I/O Board can be identified by the following message appearing in the System Controller showlogs or showlogs -v domainID file:

    Path broken between CBH and SDC:SB# ----> For an SB fault.
    Path broken between CBH and SDC:IB# ----> For an IB fault.

If the message shown above is present for a System Board (SB), utilize <Solution 243326: xxxxx> to resolve this issue.
If the message shown above is present for an I/O Board (IB), utilize <Solution: 229081> to resolve this issue.

8. Verify that the Auto-Diagnosis (AD) Event Message "FRU-LOC" does not say "UNRESOLVED".
The AD Event Messages are contained in the System Controller (SC) log files (showlogs or showlogs -d). Look for the AD Event Message appropriate to the date/time of the event in question.

The following example identifies suspects RP3 and SB0:

    Jul 07 21:56:47 sc0 Domain-C.SC: [AD] Event: SF6800.ASIC.AR.INC_SYNC_ERR.1024106f
        CSN: 136M2383 DomainID: C ADInfo: 1.SCAPP.19.3
        Time: Wed Jul 07 21:56:38 PDT 2004
        FRU-List-Count: 2; FRU-PN: 5014953; FRU-SN: 013023; FRU-LOC: RP3
                           FRU-PN: 5014362; FRU-SN: 017608; FRU-LOC: /N0/SB0
        Recommended-Action: Service action required

The following example says "UNRESOLVED":

    Dec 29 10:23:51 systemx Domain-C.SC: [ID 436815 local5.error] [AD] Event: SF6800
        CSN: 313H3174 DomainID: C ADInfo: 1.SCAPP.19.3
        Time: Mon Dec 29 10:23:51 CST 2003
        FRU-List-Count: 0; FRU-PN: ; FRU-SN: ; FRU-LOC: UNRESOLVED
        Recommended-Action: Service action required

Collaborate with the next level of support (see Step 11) if the FRU-LOC is UNRESOLVED, or if unable or unsure how to determine this.

9. Identify and replace the Primary Suspect from the AD Event Message "FRU-LOC" indictment.
The FRU-LOC (Field Replaceable Unit Location) indictment comprises a list of suspects including SBs, IBs, and RPs. Count the number of individual SBs + IBs versus individual RPs listed in the AD Event Message and compare the totals to the table below.
---------------------------------------------------
Number of       Number of       Primary
SB & IB         RP              Suspect
---------------------------------------------------
1               1               SB or IB
---------------------------------------------------
1               2 (or +)        SB or IB
---------------------------------------------------
2 (or +)        1               RP
---------------------------------------------------
2 (or +)        2 (or +)        Collaborate
---------------------------------------------------

From the event message example in Step 8, where SB0 and RP3 were implicated, the table identifies that when the number of "SB & IB" and "RP" are both "1", it is the SB or IB which is the primary suspect. In this example, that would be SB0.
Collaborate with the next level of support (see Step 11) if unable or unsure how to determine this.

10. Verify the problem does not recur within 24 hours after replacing the Primary Suspect.
Replacement procedures are located in the Systems Service Manual for each server, reached through the appropriate system's Hardware link on the Midframe & Midrange Servers Product Documentation Website.

11. Verify the latest data is available and collaborate with the next level of support.
The information to be provided includes:
- Explorer with the appropriate scextended or 1280extended option as detailed in How to run Sun data and send to Sun engineer
- If Explorer data cannot be collected for whatever reason, see Procedure to collect Sunfire Midrange failure data manually
- Detailed listing of all previous service actions, identifying parts replaced and dates of service.
- Confirmation that the previous steps in this resolution path were performed (unless this is a repeat event).

Resources for continued troubleshooting:
<Document: 1006221.1 > Sun Fire[TM] Servers: How L2CheckErrors Happen
<Document: 1009156.1 > SDC Parity errors and SDC L2CheckError discussion

Document Notes
This document contains normalized content and is managed by the Domain Lead(s) of the respective domains.
To notify content owners of a knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this document via the Document Feedback alias listed below:

Product Domain/Family:
MSG/Serengeti (Hardware Troubleshooting)
MSG/Lw8 (Hardware Troubleshooting)

Support Aliases:
- Serengeti: [email protected]
- Lw8: [email protected]

Alias Archives
- Serengeti: http://mailfinder3.sfbay/alias/serengeti-support
- Lw8: http://mailfinder3.sfbay/alias/lw8-support

Instant Message Forum:
- GL-ESG Call Management Queue
- IBIS: GL-ESG

Sun Fire, Lightweight8, Serengeti, lw8, Lw8, Level 2 Check Error, l2checkerror, l2check, L2CheckError, crash, reboot, path broken, CBH, SDC, normalized

Previously Published As 88171

Attachments
This solution has no attachment
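
Illustration for Steps 8 and 9
The FRU-LOC check in Step 8 and the primary-suspect table in Step 9 can be sketched in a few lines of Python. This is only an illustrative sketch, not a Sun-supplied tool: the function names fru_locations and primary_suspect are hypothetical, the sample text is the Step 8 example message, and the choice to return "collaborate" for combinations the table does not list (for example, no RP in the list) is an assumption.

```python
import re

# Matches each "FRU-LOC: <value>" field in an AD Event Message.
FRU_LOC_RE = re.compile(r"FRU-LOC:\s*(\S+)")

# The Step 8 example message from this document, joined onto one line.
SAMPLE = (
    "Jul 07 21:56:47 sc0 Domain-C.SC: [AD] Event: "
    "SF6800.ASIC.AR.INC_SYNC_ERR.1024106f CSN: 136M2383 DomainID: C "
    "ADInfo: 1.SCAPP.19.3 Time: Wed Jul 07 21:56:38 PDT 2004 "
    "FRU-List-Count: 2; FRU-PN: 5014953; FRU-SN: 013023; FRU-LOC: RP3 "
    "FRU-PN: 5014362; FRU-SN: 017608; FRU-LOC: /N0/SB0 "
    "Recommended-Action: Service action required"
)


def fru_locations(ad_message):
    """Return the FRU-LOC values listed in one AD Event Message."""
    return FRU_LOC_RE.findall(ad_message)


def primary_suspect(locations):
    """Apply the Step 9 table to a list of FRU-LOC values.

    Returns a board or repeater name, or "collaborate" when the
    next level of support (Step 11) should be engaged.
    """
    # Step 8: an empty or UNRESOLVED indictment cannot be analysed here.
    if not locations or "UNRESOLVED" in locations:
        return "collaborate"
    # Keep only the last path component so "/N0/SB0" is counted as "SB0".
    names = {loc.rsplit("/", 1)[-1] for loc in locations}
    boards = sorted(n for n in names if re.fullmatch(r"(SB|IB)\d+", n))
    repeaters = sorted(n for n in names if re.fullmatch(r"RP\d+", n))
    if len(boards) == 1 and len(repeaters) >= 1:
        return boards[0]      # table rows 1 and 2: the SB or IB
    if len(boards) >= 2 and len(repeaters) == 1:
        return repeaters[0]   # table row 3: the RP
    # Table row 4, plus combinations the table does not cover (assumption).
    return "collaborate"


if __name__ == "__main__":
    print(primary_suspect(fru_locations(SAMPLE)))  # prints SB0
```

Run against the Step 8 example, the sketch counts one board (SB0) and one repeater (RP3) and reports SB0, matching the worked example in Step 9. It does not replace the repeat-event and power-failure checks in Steps 6 and 7.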