Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution
1006088.1: Sun Fire[TM] 12K/15K/E20K/E25K: Dstop: AMX: Detected Header parity error from AXQ
Previously Published As 208489
Applies to:
Sun Fire 15K Server
Sun Fire 12K Server
All Platforms

Symptoms
Domain Dstop with this error type:
AMX: Detected Header parity error from AXQ

ADR Ereport: ereport.asic.amx.status.detected_header_parity_error_from_axq
Related ADR Ereport: ereport.asic.rmx.status.detected_header_parity_error_from_axq

The platform log reports a message similar to:

Event: SF15000-8009-VE
CSN: 0409AK20AE
DomainID: C
ADInfo: 1.SMS-DE.1.6
Time: Fri Jul 14 10:54:07 IST 2006
Recommended-Action: Service action required

The 'wfail' output, found in the domain directory /var/opt/SUNWSMS/SMS1.6/adm/<domain letter A..R>/wfailoutput, reports something similar to the following:

Cause
This error results in a domain reboot and in some system components being disabled.

Solution
Collect an explorer from the main system controller and contact your authorized service provider.

Product
Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server

Keywords
15K, 12K, SF15K, SF12K, Sun Fire 15K, Enterprise, Server, Sun Fire 12K, Dstop, Detected Header parity error from AXQ, amx, rmx, AMX, RMX

Internal Section
Previously Published As 52162

Detailed troubleshooting info
The dump header tells us that this Dstop was generated by dsmd (lines 10,11) while a domain was active. This is also evident from the dump file name: dsmd.dstop files are created by dsmd to capture the error state. Also note that 8 errors occurred while collecting the state dump (line 12). These errors should be investigated; refer to Doc 1003356.1 (previously 52062).

Walking the error chain:
- All SDIs concur that the stop message is from EX9 [mstop1 analysis] (lines 14,15)
- EX9/SDI0 is responding to a DARB request to stop (line 16)
- DARB0 reports errors requested by the AMXs for port 9 (lines 19-21)
- AMX0.0 port 9 reports a parity error detected from the AXQ (lines 22-24)
  AMX0.1 port 9 also reports a parity error detected from the AXQ (lines 28-30)
- 'wfail' FAILs out configurations using EX9 with the low address bus (line 25)
- EX9 and CP half 0 are named as the primary and secondary FRUs (lines 26, 27, 32, 33)

'wfail' clearly notes that the parity error is between EX9 and the AMXs on CP0. Since both AMXs on CP0 are involved, the CSB supporting CP0 may be a factor. However, an exhaustive search of all AMX ports on CP0 reveals that only port 9 shows parity errors (hint: use the 'repeat' command). As such, the errors are isolated to EX9 and slot 9 in the centerplane. Since the pathways between EX9/AXQ and AMX0.0/AMX0.1 cross an interconnect, a single FRU cannot be isolated.

Resolution
Start with replacement of the expander.
If the domain has been in operation for a period of time, bent pins are unlikely (pins don't magically bend when an expander has been in place for a while). If problems persist, replace the centerplane.

References and bug IDs
1001657.1 - An Overview of Dstop Diagnosis
1006074.1 - Dstop: Using the MStop1
1003356.1 - redx: Tips and Tricks
1010372.1 - Dstop: AMX: Detected Information parity error from AXQ

Additional background information
When an AMX detects a parity error, the history records in the AMX and AXQ can be compared to isolate a single-bit error. The 'parse axqoh' command is used to do this isolation. Refer to SRDB 1010372.1 (previously 52113) for an example.

*******************************************************************
SMS 1.4 introduces Auto-Diagnosis and Recovery (ADR) for error events on the Sun Fire 12K/15K platform. Events that occur are automatically analyzed on the platform and generate Event Codes, also known as Fault Analysis Codes. When translated, the codes provide a Service Action Plan to resolve the error event.

Each error event log, produced by the SMS-DE or POST-DE diagnostic engine, consists of several layers of information, each layer providing more detail about the error event. The topmost layer is the Event Code. For example, Sf15000-8000-H9 represents a System Board failure. This code indicates that a system board event occurred but does not specify which component on the system board has the problem.

More detailed error event information is collected deeper within the log by examining the EReports (Error Reports) for each of these error events. The reports provide more detailed descriptions of the detectors of the error that were used in the analysis and diagnosis by the ADR diagnostic engine(s). By examining the EReports, you can identify the component(s) that are the actual root cause of the error event or the component(s) affected by the event.
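
As an illustration of the layering described above, the following is a minimal sketch (not an SMS tool) that splits a platform-log event record into its labelled fields and breaks an EReport class name into its dot-separated parts. The field labels are taken from the sample message in the Symptoms section. The function names, the dictionary keys, and the interpretation of the EReport name components are assumptions made for this example only; they are inferred from the two class names quoted in this document and are not a documented SMS format.

```python
import re

# Field labels as they appear in the sample platform-log message in this document.
FIELD_LABELS = ("Event", "CSN", "DomainID", "ADInfo", "Time", "Recommended-Action")
_LABEL_RE = re.compile(r"\b(" + "|".join(map(re.escape, FIELD_LABELS)) + r"):\s*")


def parse_event_record(text):
    """Return {label: value} for each known label found in the record text.

    Works whether the fields are on separate lines or run together on one line.
    (Hypothetical helper, not part of SMS.)
    """
    record = {}
    matches = list(_LABEL_RE.finditer(text))
    for i, match in enumerate(matches):
        # A value runs from the end of its label to the start of the next label.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        record[match.group(1)] = text[match.end():end].strip()
    return record


def split_ereport_class(ereport_class):
    """Split an EReport class name on dots.

    Assumption: the layout is ereport.<category>.<detector>.<register>.<error>,
    inferred only from the two class names quoted in this document.
    """
    parts = ereport_class.split(".")
    if len(parts) < 5 or parts[0] != "ereport":
        raise ValueError("unexpected EReport class: %r" % ereport_class)
    return {
        "category": parts[1],
        "detector": parts[2],
        "register": parts[3],
        "error": ".".join(parts[4:]),
    }


if __name__ == "__main__":
    sample = (
        "Event: SF15000-8009-VE CSN: 0409AK20AE DomainID: C "
        "ADInfo: 1.SMS-DE.1.6 Time: Fri Jul 14 10:54:07 IST 2006 "
        "Recommended-Action: Service action required"
    )
    print(parse_event_record(sample))
    print(split_ereport_class(
        "ereport.asic.amx.status.detected_header_parity_error_from_axq"))
```

A sketch like this is only useful for sorting or filtering collected log text by Event Code or detector; the diagnosis itself still relies on the 'wfail' output and the error chain analysis described in this document.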
*******************************************************************
See the System Management Services (SMS) Administrator Guide, Chapter 5, for more details of Automatic Diagnosis and Recovery.
*******************************************************************
The topic covered in this SRDB is the ADR EReport event:
ereport.asic.amx.status.detected_header_parity_error_from_axq
ereport.asic.rmx.status.detected_header_parity_error_from_axq
*******************************************************************

Attachments
This solution has no attachment