Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Troubleshooting Sure Solution
1019646.1: Troubleshooting Interconnect errors on Sun Fire[TM] V1280, 3800, 4800, 4810, 6800, E2900, E4900, E6900, and Netra 1280, 1290 systems.
Previously Published As: 242866
Applies to:
Sun Fire 6800 Server
Sun Netra 1290 Server
Sun Fire V1280 Server
Sun Fire 4810 Server
Sun Fire E4900 Server
All Platforms

Purpose
Description
This document provides the basic troubleshooting steps to follow when diagnosing the cause of Interconnect Errors on Sun Fire[TM] Midrange Servers.

Symptoms:
Failed AR interconnect test.

NOTE: The example errors can be associated with any domain, RP, System Board (SB), or I/O Board (IB), and the examples above are not exclusive to these faults.
Last Review Date: July 23, 2010

Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.
Troubleshooting Details
Steps to Follow
Collect the appropriate troubleshooting data and contact Sun Support Services. The error you have encountered is a board interconnection (connectivity) issue. It is likely caused by a hardware defect, a board or slot problem, or a board that is not properly seated. A Sun Support Engineer must be engaged to diagnose and resolve this event. Having the following troubleshooting data ready allows that engineer to begin diagnosis immediately and shortens the time to resolution. Please provide:
Internal Comments
Please validate that each troubleshooting step below is true for your environment. Each step provides instructions, or a link to a document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.

1. Verify that the components implicated in the interconnect errors were not recently replaced, reseated, or "handled".
  - Recently "handled" hardware includes any board that has been removed or inserted, whether to replace the board itself or a hardware component on it.
  - Because the error is an interconnection problem, the physical act of servicing or handling the board could be the cause of the problem.
  Reference: Document 1019218.1 Sun Fire[TM] Midrange Servers: How to identify pin or socket damage.

2. Verify that the errors persist after executing a System Controller failover (dual-SC configuration) or an SC reset (single-SC configuration).
  - Failover (scfailover) is only available on systems with dual SCs. Reference: Document 1003245.1 Sun Fire[TM] 3800-6900: System Controller failover functionality
  - On Sun Fire[TM] V1280/E2900 and Netra[TM] 1280/1290 systems (single-SC configurations), use the resetsc command to reset the SC and confirm its sanity. Reference: Document 1012388.1 Sun Fire[TM] V1280/2900 LOM Quick Command Reference
  - If the errors persist on both SCs, or after resetsc is issued, proceed to Step 3.
  - If the errors go away after resetsc, you are done.
  - If the errors go away after executing scfailover, fail back to the original Main SC and check whether the errors return. If they do, replace the SC.

3. Confirm that you are able to determine the suspect list for this issue and prioritize which suspect is most likely to be the root cause.
  - See Document 1019649.1 How to determine the suspect list for Sun Fire[™] Midrange Server interconnect errors.

4. Verify that the primary FRU (determined by the results of Step 3) is NOT defective.
  - If a System Board or I/O Board is implicated, it can be verified as defective in two ways:
    - By replacing the board.
    - By having a Sun engineer move the suspect board into an empty slot, or switch it with another board in the domain, and observe the behavior.
      - If the board works in the alternate slot, the RP or the board slot (CP) is implicated (proceed to Step 5).
      - If the board fails to work in the alternate slot, the board is defective, so replace it.
  - If a Repeater (RP) is implicated, it can be verified as defective in two ways:
    - By replacing it.
    - By having a Sun engineer switch the suspect RP with an alternate RP in the system and observe the behavior.
      - If the error follows the RP to its new location, the RP is defective, so replace it.
      - If the failure remains at the old RP's slot, the Centerplane is suspect.
  - The Sun engineer performing any replacement or moving any hardware should be extremely careful to inspect the board and CP pins and sockets. Reference: Document 1019218.1 Sun Fire[TM] Midrange Servers: How to identify pin or socket damage.

5. Verify that the secondary FRU (determined by the results of Step 3) is NOT defective.
  - If a System Board or I/O Board is implicated, it can be verified as defective in two ways:
    - By replacing the board.
    - By having a Sun engineer move the suspect board into an empty slot, or switch it with another board in the domain, and observe the behavior.
      - If the board works in the alternate slot, the RP or the board slot (CP) is implicated (proceed to Step 6).
      - If the board fails to work in the alternate slot, the board is defective, so replace it.
  - If a Repeater (RP) is implicated, it can be verified as defective in two ways:
    - By replacing it.
    - By having a Sun engineer switch the suspect RP with an alternate RP in the same system and observe the behavior.
      - If the error follows the RP to its new location, the RP is defective, so replace it.
      - If the failure remains at the old RP's slot, the Centerplane is suspect.
  - The Sun engineer performing any replacement or moving any hardware should be extremely careful to inspect the board and CP pins and sockets. Reference: Document 1019218.1 Sun Fire[TM] Midrange Servers: How to identify pin or socket damage.

6. Collaborate with TSC before proceeding to a Centerplane replacement.
  - Have console data, explorer data, and a detailed explanation of what has been replaced (and when) available when collaborating with TSC.
  - The Centerplane will most likely have to be replaced, but TSC will want to make absolutely sure that nothing has been overlooked before proceeding with this invasive replacement.

NOTE: The testinterconnect command can be used to test board interconnections if you obtain a service mode password (setkeyswitch on also performs this testing). For details on testinterconnect command usage, refer to Document 1005014.1.

Document Information: This document contains normalized content and is managed by the Content Lead(s) of the respective domains. Please provide feedback using the Add Comment link on this article to report a needed modification.
Support Aliases: [email protected] or [email protected]
Alias Archives: http://archives.central/alias/serengeti-support or http://archives.central/alias/lw8-support
Instant Messenger Chat Room: Gl-ESG
Service Request Queue: GL-ESG
Keywords: interconnect, Interconnect, interconnect test, interconnection, testinterconnect, Service action required, failure, POST, normalized
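The swap-test interpretation used in Steps 4 and 5 reduces to two yes/no observations: did the suspect board work in an alternate slot, and did the error follow the suspect RP to its new location. The sketch below is a hypothetical illustration only: the Verdict enum, the function names, and their parameters are assumptions introduced here, not part of any Sun tool or SC command. It merely encodes the decision rules stated in those steps and does not interact with a system controller.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical labels for the outcomes described in Steps 4 and 5."""
    REPLACE_BOARD = "board is defective; replace it"
    RP_OR_SLOT_SUSPECT = "RP or board slot (CP) implicated"
    REPLACE_RP = "RP is defective; replace it"
    CENTERPLANE_SUSPECT = "Centerplane suspect; collaborate with TSC (Step 6)"


def interpret_board_move(board_works_in_alternate_slot: bool) -> Verdict:
    """Interpret moving a suspect System Board or I/O Board to another slot."""
    if board_works_in_alternate_slot:
        # The board is good elsewhere, so the fault lies with the RP or the original slot.
        return Verdict.RP_OR_SLOT_SUSPECT
    # The board still fails in the alternate slot: the board itself is defective.
    return Verdict.REPLACE_BOARD


def interpret_rp_swap(error_followed_rp: bool) -> Verdict:
    """Interpret swapping a suspect Repeater (RP) with an alternate RP."""
    if error_followed_rp:
        # The error moved with the RP to its new location.
        return Verdict.REPLACE_RP
    # The error stayed at the old RP slot, pointing at the Centerplane.
    return Verdict.CENTERPLANE_SUSPECT


if __name__ == "__main__":
    # Example: the board failed again in the new slot, and after an RP swap
    # the error stayed at the original RP slot.
    print(interpret_board_move(board_works_in_alternate_slot=False).value)
    print(interpret_rp_swap(error_followed_rp=False).value)
```

Only the move/swap observations are modeled; the alternative of simply replacing the suspect FRU is left out because its outcome is judged directly by whether the errors clear.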