Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution
1004797.1: Sun Enterprise[TM] 3500/4500/5500/6500 Servers: Fatal Reset FAQ
Previously Published As: 206657
Applies to:
Sun Enterprise 3500 Server
Sun Enterprise 4500 Server
Sun Enterprise 5500 Server
Sun Enterprise 6500 Server
All Platforms

Goal
This document provides answers to frequently asked questions pertaining to fatal resets on Sun Enterprise[TM] 3500/4500/5500/6500 Servers.

Solution
Steps to Follow
FAQs:

Question: What troubleshooting data is created, and what is the system's response to a Fatal Reset?
Answer: When a Fatal Reset is detected, a CPU will immediately 'reset' (see above, What is a Fatal Reset), resulting in a Power-On Reset (POR, on Enterprise systems) or an Externally Initiated Reset (XIR, on Sun Fire[TM] systems). Power-On Self-Test (POST) diagnostics are then run at maximum level (diag-level=max), as dictated by system firmware.

Unfortunately, the needed troubleshooting data is displayed only on the system controller or console. If the console is not logged (by connecting external hardware to the serial port), the root cause information is lost.

If the Fatal Reset was caused by an intermittent error, POST might not find the offending component. If a component has failed hard, POST will detect it, mark it as failed, and continue. Different systems respond differently, but generally the Automatic System Reconfiguration (ASR) process is initiated to remove failed components and try to configure an operable system.

During the next system boot, the operating system detects the prior fatal reset and logs a message to syslog stating "System booting after fatal error FATAL".

Additionally, each type of system and firmware revision has different OpenBoot PROM (OBP) parameters that control how it responds to a fatal reset. Users should therefore consult the product-specific documentation for more details.
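As a rough illustration of checking for the boot-time syslog message quoted above, the following Python sketch filters log lines for that text. The function names and the sample lines are hypothetical, and the layout of the surrounding syslog fields (timestamp, host name, facility) is an assumption; only the quoted message text comes from this document.

```python
from typing import Iterable, List

# Message text quoted in this document; the fields around it in a real
# syslog entry vary by system and are not assumed here.
FATAL_BOOT_MARKER = "System booting after fatal error FATAL"


def find_fatal_reset_boots(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain the post-fatal-reset boot message."""
    return [line.rstrip("\n") for line in lines if FATAL_BOOT_MARKER in line]


def scan_log_file(path: str) -> List[str]:
    """Scan one log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as log:
        return find_fatal_reset_boots(log)


# Hypothetical sample lines, for illustration only.
sample = [
    "Oct  3 02:14:07 host1 unix: System booting after fatal error FATAL\n",
    "Oct  3 02:14:09 host1 unix: ordinary boot message\n",
]
for hit in find_fatal_reset_boots(sample):
    print(hit)
```

On a Solaris system, `scan_log_file` would typically be pointed at the file that syslogd writes to (commonly `/var/adm/messages`, though that path is an assumption about the local configuration). A match only shows that a fatal reset happened before the boot; the root cause data still has to come from the logged console output.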
Question: How is a Fatal Reset identified?
Answer: Fatal Reset error messages are only visible from the machine console or system controller. To see these messages, it is necessary to log the output from the console. Additionally, Sun Fire[TM] system controllers usually have a small first-in, first-out (FIFO) ring buffer where data is logged. The initial, relevant fatal reset message may be flushed off the end of this buffer by the output of the subsequent boot. To alleviate this issue, the system controller should be logged using syslogd(1M). There are many documents and resources dedicated to console and system controller logging. Reference SOLUTION 211946 to capture Fatal Reset output for Sun systems.

Question: How are Fatal Resets diagnosed?
Answer: Generally, the component that caused the error can be identified by analyzing the type of error, as displayed in the Error Status Register (ESR) and the Asynchronous Fault Status Register (AFSR), together with the components involved, as displayed in the Asynchronous Fault Address Register (AFAR).

Question: Where can I find more information on "fatal resets"?
Answer: Search MOS with Advanced Options, select the product, and use the keyword string "fatal resets" to find the latest resources.
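The buffer behaviour described under "How is a Fatal Reset identified" can be illustrated with a small Python model. This is only a sketch of a generic fixed-size FIFO buffer, not the system controller's actual implementation; the capacity, the function name, and the message strings are all hypothetical.

```python
from collections import deque


def replay_console(messages, capacity):
    """Feed messages through a fixed-size FIFO buffer and return what survives."""
    ring = deque(maxlen=capacity)  # oldest entry is dropped once the buffer is full
    for message in messages:
        ring.append(message)
    return list(ring)


# Hypothetical console stream: one fatal reset message followed by boot output.
console_output = ["fatal reset message (hypothetical)"] + [
    f"POST line {n}" for n in range(1, 6)
]

survivors = replay_console(console_output, capacity=4)
print(console_output[0] in survivors)  # False: the first message was pushed out
```

Because the boot output that follows a fatal reset is usually much longer than the error message itself, the earliest lines are the first to be lost, which is why the answer above recommends forwarding the output to an external log.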
Sun Enterprise 6500 Server

For more information on Fatal Resets, please refer to:
Troubleshooting Guide Fatal Reset Decoder Tools: Decoder

References
http://download.oracle.com/docs/cd/E19095-01/index.html

Attachments
This solution has no attachment