Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Problem Resolution Sure
Solution 1017926.1: Sun Fire[TM] 3800-6800: Troubleshooting NCPQ_TO errors
Previously Published As: 229185
Applies to:
Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 6800 Server
All Platforms

Symptoms

Description:
This document aids in troubleshooting Non-Cacheable Pending Queue Time Outs (NCPQ_TO) on Sun Fire 3800/4800/4810/6800 systems. NCPQ_TOs occur when data requests in Non-Cacheable address space do not complete a transaction. Non-Cacheable address space comprises the Safari Device config space and the I/O address space.

In most situations this type of error requires Support Services to be engaged in order to be resolved. Review this document to see whether it contains the resolution to the issue; if it does not, you will likely need to log a Service Request with Support Services. Please mention this article when logging the Service Request.

Symptoms:
Error messages indicating that an NCPQ_TO occurred are seen on the Domain Console. The error messages are also stored in the Domain Console Buffer and can be retrieved with the Sun Fire System Controller (SC) command showlogs. If a loghost is configured, the error messages are stored on the loghost. NCPQ_TOs can occur during normal operation of the Domain or during POST.

Here is an example log of an NCPQ_TO error:

Feb 26 10:46:02 systemx DomC.SC: ErrorMonitor:Domain C has a SYSTEM ERROR

Cause

Interpretation of the example error above:
A System error is detected and Domain C is PAUSED. From the device path in the error messages it can be determined that the error is detected on SB1 CPU A:

./partition1/domain0/SB1/bbcGroup0/cpuAB/cpusafariagent0
Non-Cacheable Schizo Device Pair Agent ID 1E Leaf B. (I/O Boat 9 Slots 0,1,2)

Use Document 1006063.1 for decoding.

Possible Causes:
There are many possible hardware and software root causes for NCPQ_TOs. They can be caused by faulty CPUs, I/O Bridge ASICs (Schizo), or PCI cards, as well as by bugs in the microcode of cPCI/PCI cards. The following scenarios have been known to cause NCPQ_TOs on Sun Fire 3800-6800 systems:
Solution

Troubleshooting:
In general, the device indicated by the AFAR_2 is likely to be the cause of the NCPQ_TO. However, the device reporting the error can also be the cause. Investigate whether the errors are a result of newly installed or moved cPCI or PCI adapters. Reseat any newly installed or relocated adapters, and make sure that the drivers for the cards are up to date.

If this is not a newly installed PCI card (or a driver issue), collect an extended Explorer (see Document 1019066.1) and open a Service Request with Support Services.

If an NCPQ_TO occurs, take the following steps to isolate the suspect FRU:

Run POST with the diag level set to default or higher.
- If the error is not reproducible, escalate the issue to the next level of technical support. Provide the necessary logs and Explorer data of the Domain and the Sun Fire SC.
- If the error is reproducible, use the AFAR_2. See Document 1006063.1 for how to decode the AFAR_2.

Depending on the AFAR_2 decoding, two cases can be differentiated. Note that the decoding of the AFAR_2 varies with the Firmware Version.

The AFAR_2 decodes to an address in:

Safari device config area:
- The AFAR_2 decoding results in a Safari Agent ID # which points to a CPU on a CPU/Memory board or a Schizo on an I/O Boat.
  (Example for firmware 5.12.X) 0x00000400.0a400010 -> Safari Agent ID 14(hex), CPU0 on CPU/Memory board 5.
- By either disabling the CPU or deleting the CPU/Memory board or I/O Boat from the configuration, the suspect FRU can be isolated. Run POST for verification.

Schizo PCI Config & IO and Schizo I/O Board area:
- The AFAR_2 decoding results in a Safari Agent ID # which is a Schizo, and the Leaf on that Schizo.
  (Example for firmware 5.12.X) 0x00000402.61000380 -> Safari Agent 18(hex) Schizo 0 Leaf B, on I/O Boat 6 P0 B1.
- From this, the fault can only be determined down to a leaf.
  Leaf A supports one card slot; leaf B supports multiple card slots.
- By disabling the Schizo or removing cPCI/PCI cards, the error can be narrowed down to a single component.

Do not use the disablecomponent command for slots or leaves on I/O Boats when debugging an NCPQ_TO. To narrow the fault down to a single card, the cards need to be physically removed.

- If parts are replaced based on an NCPQ_TO, attach the error log to the failing part and send it in for CPAS.

References and bug IDs:
Document 1008674.1 Sun Fire (3800-6800): Physical Device Mapping for I/O boats
Document 1006063.1 Address Space Assignment

Keyword: Sun Fire 6800, Sun Fire 4800, Sun Fire 3800, NCPQ_TO

Previously Published As: 48834

Attachments: This solution has no attachment
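
The AFAR_2 examples in the Troubleshooting section can be reproduced with a short Python sketch. It is illustrative only and is not a substitute for Document 1006063.1. Everything in it beyond the literal example values is an assumption: the function names are hypothetical, the bit position used for the Safari device config area (agent ID in address bits 27:23) is inferred from the single firmware 5.12.X example 0x00000400.0a400010 -> 14(hex), and the agent-ID-to-FRU mapping is extrapolated from the three examples in this document (14 -> CPU0 on CPU/Memory board 5, 18 -> Schizo 0 on I/O Boat 6, 1E -> I/O Boat 9). Because the decoding varies with the Firmware Version, the result should be checked against Document 1006063.1 before any FRU is replaced. No bit layout is assumed for the Schizo PCI Config & IO area, since this document does not give enough information to derive one.

```python
def parse_afar(text: str) -> int:
    """Convert the 'hi.lo' notation used in the logs
    (e.g. 0x00000400.0a400010) into a single integer address."""
    text = text.strip().lower()
    if text.startswith("0x"):
        text = text[2:]
    hi, lo = text.split(".")
    return (int(hi, 16) << 32) | int(lo, 16)


def config_area_agent_id(addr: int) -> int:
    """ASSUMPTION: in the Safari device config area the agent ID sits in
    address bits 27:23. Inferred from one firmware 5.12.X example only."""
    return (addr >> 23) & 0x1F


def describe_agent(agent_id: int) -> str:
    """ASSUMPTION: mapping extrapolated from the examples in this document.
    Agents 0x00-0x17: four CPUs per CPU/Memory board.
    Agents 0x18-0x1F: two Schizos per I/O Boat, starting at I/O Boat 6."""
    if 0x00 <= agent_id <= 0x17:
        board, cpu = divmod(agent_id, 4)
        return f"CPU{cpu} on CPU/Memory board {board}"
    if 0x18 <= agent_id <= 0x1F:
        boat, schizo = divmod(agent_id - 0x18, 2)
        return f"Schizo {schizo} on I/O Boat {boat + 6}"
    raise ValueError(f"agent ID {agent_id:#x} is outside the assumed range")


if __name__ == "__main__":
    addr = parse_afar("0x00000400.0a400010")
    agent = config_area_agent_id(addr)
    print(f"Safari Agent ID {agent:x}(hex): {describe_agent(agent)}")
```

Keeping the address parsing, the bit extraction, and the FRU mapping in separate functions means that only the latter two need to change if a different firmware version uses a different layout.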