Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type Technical Instruction Sure
Solution 1003169.1 : Sun Fire[TM] reboots due to TOD-POR reason
PreviouslyPublishedAs 204355
Applies to:
Sun Fire 12K Server
All Platforms

Goal
Sun Fire[TM] reboots due to TOD-POR issue (where TOD = Time Of Day chipset and POR = Power On Reset).

Solution
When the Sun Fire[TM] (Enterprise server) hardware generates a reset, it is up to the OpenBoot PROM (OBP) to handle it and recover any debugging information. The OBP's messages are printed to the system's console only. During a reset of the Sun Fires, the OBP saves reset_cause and previous_reset_cause. These can be displayed with the prtconf(1M) command: the fields "reset-reason" and "previous-reset-reason" in the output of "prtconf -vp" list the reason(s).

One reset type that can be observed from these OBP messages is the TOD-POR reset, or "TOD Watchdog". It is a Sun Fire feature enabled by the kernel flag "watchdog_enable". Sun Fires use the clock board's TOD as a watchdog facility. The "TOD Watchdog" may be used to recover from a hung system: if the watchdog timer expires (10 sec), a system reset occurs, so the system reboots rather than remaining hung. Unfortunately, this does not allow for debugging the hard hang situation. To help debug the hang, you would need to disable the TOD watchdog feature and enable the deadman kernel instead.

By default the TOD Watchdog feature is disabled. To enable it, add the following line to /etc/system, then reboot.

    set watchdog_enable = 1

MECHANISM
============
tod_setwatchdog() is initially called from clkstart(). The next call to tod_get() then programs the TOD hardware's watchdog facility. tod_suspendwatchdog() temporarily suspends the watchdog timer, and the next call to tod_get() re-enables it. The only call to tod_suspendwatchdog() is from complete_panic().

When the timer expires, it is not possible to determine whether the hang itself was caused by hardware or software, because the TOD watchdog mechanism involves both the kernel (a software mechanism) and the clock board (a hardware mechanism).
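The call sequence described under MECHANISM can be illustrated with a small toy model. This is only a sketch, not Solaris kernel code: the class TodWatchdogModel, the helper simulate(), and the whole-second timestamps are hypothetical names introduced here for illustration. The model also assumes that every tod_get() call re-arms the countdown, which the text above implies but does not state outright.

```python
class TodWatchdogModel:
    """Toy model of the TOD watchdog call sequence (illustrative only)."""

    def __init__(self, timeout_sec=10, watchdog_enable=True):
        self.timeout_sec = timeout_sec          # 10 sec, as stated in the text
        self.watchdog_enable = watchdog_enable  # mirrors "set watchdog_enable = 1"
        self.requested = False                  # set by tod_setwatchdog()
        self.deadline = None                    # None: hardware timer not armed

    def tod_setwatchdog(self):
        # Called once from clkstart(): only records the request; the
        # hardware is programmed by the next tod_get().
        if self.watchdog_enable:
            self.requested = True

    def tod_get(self, now):
        # Assumption: each tod_get() (re)programs the hardware countdown.
        if self.requested:
            self.deadline = now + self.timeout_sec
        return now

    def tod_suspendwatchdog(self):
        # Called from complete_panic(): stop the countdown until the
        # next tod_get() re-enables it.
        self.deadline = None

    def expired(self, now):
        # True when the armed timer has run out, i.e. a TOD-POR reset.
        return self.deadline is not None and now >= self.deadline


def simulate(hang_at, run_until, watchdog_enable=True):
    """Return the second at which a reset would occur, or None."""
    wd = TodWatchdogModel(watchdog_enable=watchdog_enable)
    wd.tod_setwatchdog()
    for now in range(run_until + 1):
        if wd.expired(now):
            return now
        if now < hang_at:
            # A healthy system keeps calling tod_get(); a hung one stops.
            wd.tod_get(now)
    return None


if __name__ == "__main__":
    print(simulate(hang_at=5, run_until=30))
    print(simulate(hang_at=5, run_until=30, watchdog_enable=False))
```

In this model, a system that stops calling tod_get() at second 5 is reset 10 seconds after its last successful call, while the same hang with the watchdog disabled never triggers a reset and the system stays hung, which is the state needed for debugging with the deadman kernel.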
Note that even if the line is set in /etc/system, the watchdog is disabled if Solaris[TM] is running under a debugger (RB_DEBUG).

Product
Netra 1280 Server
Sun Netra 1290 Server
Sun Fire V1280 Server
Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
Sun Fire 6800 Server
Sun Fire E2900 Server
Sun Fire E4900 Server
Sun Fire E6900 Server
Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server

Internal Section
On Serengeti systems, the SC provides TOD support for Solaris[TM], as Solaris does not have direct access to or exclusive ownership of the TOD hardware. The TOD watchdog mechanism works differently on the Serengeti versus the Sun Fire, and is enabled by default. See Technical Instruction Document 1008873.1 for more information on troubleshooting resets.

Previously Published As
46241

Keywords: TOD-POR, Time of day, watchdog, reboot, rebooted, reboots, kernel

Attachments
This solution has no attachment