Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

1008390.1: How to Verify Whether a System Reboot Is Caused by a Fatal Reset or a Red State Exception

Previously Published As: 211473
Applies to:
Sun Fire V880 Server
Sun Fire V890 Server
Sun Fire V210 Server
Sun Fire V240 Server
Sun Fire V440 Server
All Platforms

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community, Oracle Entrylevel Servers.

Goal

This document helps identify whether an unexpected or unexplained system reboot was caused by a Fatal Reset error or a Red State Exception (RSE) condition.

Please note that the purpose of this document is to help you identify the root cause. If the symptoms described in this document match what your system is experiencing, you will need to contact qualified engineers at My Oracle Support (MOS). Please reference this document ID number when you contact MOS Support for assistance.

Solution

Steps to Follow

Unexpected reboots are most often caused by hardware faults and are reported by the system as a fatal reset or a red state exception. When errors like these occur, the OS is abruptly interrupted and cannot continue to log error messages in /var/adm/messages or generate a core file. As a result, the system reboots, but the error messages and all related output appear only on the system console (and therefore in the console logs). For further troubleshooting, it is very important to gather the complete console logs from the time of the error (reboot).

1. The system reboot could be due to fatal reset errors. Fatal errors are most often caused by hardware (bad CPU, MB switches, I/O bridge, etc.) and are the result of an 'illegal' hardware state that is detected by the system. The Fatal Reset error and all output are logged only to the system console (ttya or RSC).
Here is an example of the fatal errors caused by CPU and motherboard switch ASICs (the full fatal reset output is too long and is not included):

ERROR: System Hardware FATAL RESET from CPU0

For systems using the ALOM serial console, the fatal error would be reported as:

Fatal Error Reset

When your system reboots after a fatal error, you may also see ONLY a notice in the /var/adm/messages file like this one:

[ID 796976 kern.notice] System booting after fatal error FATAL Sys Hardware

Also, prtconf -vp may show the Fatal Sys Hardware message under "reset-reason:":

# prtconf -vp

If the console logs contain fatal errors, please contact a qualified engineer at My Oracle Support (MOS) for assistance.

1.a) For the UltraSPARC III/IV platforms (280R, V480/V880, V490/V890) and UltraSPARC IIIi platforms (V210/V240, V440), a trained MOS Engineer has access to important information along with an AFAR decoder tool and will carefully guide you through the steps to resolution. My Oracle Support can also assist you if you are experiencing V480 Fatal Resets with specific network and I/O configurations.

2. The unexpected reboot could also be due to Red State Exception (RSE) errors. Verify whether the console output contains any RSE errors. An RSE can be triggered by software and/or hardware, but this condition is most commonly due to a hardware fault (bad DIMM or bad CPU/L2SRAM). The RSE error and all output are logged only to the system console (ttya or RSC), and the error is usually reported by one of the CPUs:

ERROR: CPU3 RED State Exception

If your system reboots after an RSE, you may also see ONLY a notice in the /var/adm/messages file like this one:

[ID 993603 kern.notice] System booting after RED CPU RED-State

Also, prtconf -vp may show the RED CPU RED-State message under "reset-reason:":

# prtconf -vp
System Configuration: Sun Microsystems sun4u
Memory size: 32768 Megabytes
System Peripherals (PROM Nodes):
    banner-name: 'Sun Fire 880'
    watchdog-enable:
    reset-reason: 'RED CPU RED-State' <--- reset-reason

If the console logs contain RSE errors, this is again a critical issue that requires a qualified MOS Support Engineer, so please contact a qualified engineer at MOS for assistance.

2.a) For the UltraSPARC III/IV platforms (280R, V480/V880, V490/V890) and UltraSPARC IIIi platforms (V210/V240, V440), please contact MOS for assistance.
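
The checks in steps 1 and 2 can be combined into one small filter. The following is only a minimal sketch, not an official Oracle tool: the function name classify_reset is hypothetical, it assumes a POSIX-compatible awk is on the PATH, and it matches only the message strings quoted in this document, so real console output may use wording it does not recognize.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solaris or any Oracle tool).
# Reads text on stdin (a saved console log, /var/adm/messages, or
# `prtconf -vp` output) and prints one of:
#   fatal-reset | red-state | both | none
# It matches only the strings quoted in this document.
classify_reset() {
    awk '
        # Fatal reset indicators: console message, ALOM message,
        # /var/adm/messages notice, and prtconf reset-reason value.
        /FATAL RESET/        { fatal = 1 }
        /Fatal Error Reset/  { fatal = 1 }
        /FATAL Sys Hardware/ { fatal = 1 }
        /Fatal Sys Hardware/ { fatal = 1 }

        # Red State Exception indicators: console message and the
        # notice / reset-reason value.
        /RED State Exception/ { red = 1 }
        /RED CPU RED-State/   { red = 1 }

        END {
            if (fatal && red) print "both"
            else if (fatal)   print "fatal-reset"
            else if (red)     print "red-state"
            else              print "none"
        }
    '
}
```

After defining the function in a shell session, possible uses are "prtconf -vp | classify_reset" or "classify_reset < /var/adm/messages". Because /var/adm/messages can hold notices from earlier reboots, a match there shows only that such an event occurred at some point, not necessarily on the most recent reboot. A result of "none" does not rule out either condition, since the full error output appears only in the console logs.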