Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type Technical Instruction Sure
Solution 1004797.1: Sun Enterprise[TM] 3500/4500/5500/6500 Servers: “Fatal Reset” FAQ
Previously Published As 206657

Description
This document provides answers to frequently asked questions pertaining to fatal resets on Sun Enterprise[TM] 3500/4500/5500/6500 Servers.

Steps to Follow
FAQs:

Question: What troubleshooting data is created? What is the system's response to a Fatal Reset?

Answer: When a Fatal Reset is detected, a CPU will immediately 'reset' (see above, What is a Fatal Reset), resulting in a Power-On Reset (POR) on Enterprise systems or an Externally Initiated Reset (XIR) on Sun Fire[TM] systems. Power-On Self-Test (POST) diagnostics are run at maximum level (diag-level=max) as dictated by system firmware. Unfortunately, the needed troubleshooting data is displayed only on the system controller or console. If the console is not logged (by connecting external hardware to the serial port), the root cause information is lost.

If an intermittent error caused the Fatal Reset, POST might not find the offending component. If a component has hard failed, POST will detect it, mark it as failed, and continue. Systems respond differently, but generally the Automatic System Reconfiguration (ASR) process is initiated to remove failed components and try to configure an operable system.

During the next system boot, the operating system detects the prior fatal reset and logs a message to syslog stating "System booting after fatal error FATAL". Additionally, each type of system and firmware revision has different OpenBoot PROM (OBP) parameters that control how it responds to a fatal reset. Users should therefore consult product-specific documentation for more details.

Question: How is a Fatal Reset identified?

Answer: Fatal Reset error messages are only visible from the machine console or system controller. To see these messages it is necessary to log the output from the console. Additionally, Sun Fire[TM] system controllers usually have a small first in, first out (FIFO) ring buffer where data is logged.
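As a rough illustration of how a small FIFO ring buffer behaves, the following Python sketch models the buffer with a fixed-capacity queue. It is not Sun system controller code: the function name, the buffer capacity, and the console lines are all hypothetical, chosen only to show that the oldest entries are discarded once the buffer is full.

```python
from collections import deque


def simulate_console_buffer(messages, capacity):
    """Return the lines a FIFO ring buffer of `capacity` entries still holds."""
    buffer = deque(maxlen=capacity)  # when full, the oldest entry is dropped
    for line in messages:
        buffer.append(line)
    return list(buffer)


# Hypothetical console output: one fatal reset line followed by boot chatter.
console_output = ["Fatal Reset (hypothetical message)"] + [
    f"boot message {i}" for i in range(1, 6)
]

# Assumed capacity of 4 lines; a real buffer size is product specific.
retained = simulate_console_buffer(console_output, capacity=4)
fatal_reset_retained = any("Fatal Reset" in line for line in retained)

print(retained)
print("fatal reset message retained:", fatal_reset_retained)
```

In this sketch the buffer holds four lines and receives six, so only the last four boot messages survive and the first line is gone. Raising the capacity above the number of lines keeps it.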
It is possible that the initial, relevant fatal reset message is flushed off the end of the buffer by the subsequent boot. To alleviate this issue, the system controller should be logged using syslogd(1M). There are many documents and resources dedicated to console and system controller logging. Reference SOLUTION 211946 to capture Fatal Reset output for Sun systems.

Question: How are Fatal Resets diagnosed?

Answer: Although this is outside the scope of this document, there are many other InfoDocs and SRDBs related to this topic. Generally, analysis of the type of error, as displayed in the Error Status Register (ESR) and the Asynchronous Fault Status Register (AFSR), and of the components involved, as displayed in the Asynchronous Fault Address Register (AFAR), will identify the component that caused the error.

Question: Where can I find more information on "fatal resets"?

Answer: Search both Sun Product Documentation and SunSolve using the keyword string "fatal resets" for the latest resources. Contract customers may access additional SunSolve resources by logging into the repository with their unique username and password. The username and password are created by contract customers as part of SunSolve On-line Registration, which requires Terms of Use acceptance and a Sun Support Contract Number.

Product
Sun Enterprise 6500 Server
Sun Enterprise 5500 Server
Sun Enterprise 4500 Server
Sun Enterprise 3500 Server

Internal Comments
Audited/updated 11/06/09 - [email protected], Mid-Range Systems Content Team
To report Fatal Resets, please refer to the following web site:
For more information on Fatal Resets, please refer to:
fatal, reset, FATAL, error

Previously Published As 51105

Change History
Date: 2006-01-18 User Name: 97961 Action: Update Canceled Comment: *** Restored Published Content *** SSH AUDIT. [email protected] KDO Knowledge Engineer Version: 0
Date: 2006-01-18 User Name: 97961 Action: Update Started Comment: SSH AUDIT. [email protected] KDO Knowledge Engineer Version: 0
Date: 2006-01-17 User Name: 97961 Action: Update Canceled Comment: *** Restored Published Content *** SSH AUDIT. [email protected] KDO Knowledge Engineer Version: 0
Date: 2006-01-17 User Name: 97961 Action: Update Started Comment: SSH AUDIT.

Attachments
This solution has no attachment