Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Solution Type: Problem Resolution Sure Solution

1002033.1: Sun Fire[TM] v1280, E2900, 3800, 4800, 4810, 6800, E4900, E6900, and Netra 1280, 1290 Server: How to Recover from a Hung System Controller

Previously Published As: 202844

Symptoms

When a system controller (SC) is hung, try the steps below before pressing the Reset button on the SC.

Resolution

Try the following steps in order:

1) Connect to the hung SC over Telnet or directly through its serial port (for example, with tip), enter the platform shell, and run the "reboot" command.

2) If the "reboot" command does not work, or the SC accepts no input, log in to the spare SC and try to force a failover with the "setfailover force" command.

3) If the failover does not complete, the LAST RESORT is to use the Reset button on the SC. BEFORE YOU PRESS THIS BUTTON, you must bring down the domains. This is critical because a domain that is up and running may crash when the Reset button is pressed. See <Document: 1004364.1> for details.

NOTE: Make sure the connection settings on the SC are correct. Open a tip session on the serial port of the SC and check the network settings:

    6800a-sc0:SC> showplatform -p network

    The system controller is configured to be on a network.
    Network settings: static
    Hostname: 6800a-sc0
    IP Address: 129.156.xx.xx
    Netmask: 255.255.255.0
    Gateway: 129.156.xx.1
    DNS Domain: UK.Sun.COM
    Primary DNS Server: 129.156.xx.xx
    Secondary DNS Server: 129.156.xx.xx
    Connection type: none    <----- No remote access enabled
    Idle connection timeout : No timeout
    Sun Fire Link Enabled: no

This output shows that remote access through telnet or ssh is not enabled. Run the following command to change the connection type:

    6800a-sc0:SC> setupplatform -p network

    Network Configuration
    Is the system controller on a network? [yes]:
    Use DHCP or static network settings? [static]:
    Hostname [6800a-sc0]:
    IP Address [129.156.xx.xx]:
    Netmask [255.255.255.0]:
    Gateway [129.156.xx.1]:
    DNS Domain [UK.Sun.COM]:
    Primary DNS Server [129.156.xx.xx]:
    Secondary DNS Server [129.156.xx.xx]:
    To enable remote access to the system controller, select "ssh" or "telnet".
    Connection type (ssh, telnet, none) [telnet]:
    Idle connection timeout (in minutes; 0 means no timeout) [0]:
    Enable Sun Fire Link? [no]:

To enable remote access to the system controller, select either:
* ssh
* telnet

The SC must be rebooted for changes to the above network settings to take effect.

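The connection-type check described in the note above can be sketched in a few lines of Python. This is only an illustration, not part of the SC firmware or any Sun tool: the function names `parse_settings` and `remote_access_enabled` are hypothetical, the "Key: value" line layout is assumed from the transcript in this article, and capturing the output from the serial console is left to the reader.

```python
# Sample text copied from the "showplatform -p network" transcript in this
# article; the one-setting-per-line layout is an assumption.
SAMPLE_OUTPUT = """\
The system controller is configured to be on a network.
Network settings: static
Hostname: 6800a-sc0
IP Address: 129.156.xx.xx
Netmask: 255.255.255.0
Gateway: 129.156.xx.1
DNS Domain: UK.Sun.COM
Primary DNS Server: 129.156.xx.xx
Secondary DNS Server: 129.156.xx.xx
Connection type: none
Idle connection timeout : No timeout
Sun Fire Link Enabled: no
"""


def parse_settings(text):
    """Return a dict of lower-cased setting names to values (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        # Split on the first colon; lines without a colon are not settings.
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip().lower()] = value.strip()
    return settings


def remote_access_enabled(text):
    """True if the reported connection type is ssh or telnet.

    A missing "Connection type" line is treated as "none".
    """
    conn = parse_settings(text).get("connection type", "none").lower()
    return conn in ("ssh", "telnet")


if __name__ == "__main__":
    if remote_access_enabled(SAMPLE_OUTPUT):
        print("Remote access is enabled.")
    else:
        print("Remote access is disabled; run setupplatform -p network.")
```

With the sample above, the connection type is "none", so the sketch reports that remote access is disabled, which matches the annotation in the transcript.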
Product
Sun Fire 6800 Server
Sun Fire 4810 Server
Sun Fire 4800 Server
Sun Fire 3800 Server
Sun Fire v1280 Server
Sun Fire E2900 Server
Sun Fire E4900 Server
Sun Fire E6900 Server
Netra 1280 Server
Netra 1290 Server

Internal Comments

If the force option does not initiate a failover, and the customer or field personnel are remote from the system and therefore unable to press the Reset button on the hung system controller, there is a risky, undocumented alternative for "waking up" the hung SC: execute the "setfailover override" command from the spare SC.

Note: This option is not listed in the output of the "setfailover -h" command.

The override option ignores whatever the status of the system controller is supposed to be and tells the spare to become primary. It pays no attention to the fact that the other SC could still be primary.

Warning: Use this procedure with caution and only as a last resort, because it could crash running domains.

Example (firmware prior to 5.19.0):

    kremlin-sc1:sc> setfailover override
    SC: SSC1
    Spare System Controller
    SC Failover: disabled

    This will abruptly interrupt operations on the other System Controller.
    This System Controller will become the main System Controller.
    Do you want to continue? [no] yes

    SC Failover did not complete. The system controllers may not be synchronized.
    Failover can be done forcefully but may crash domain(s).
    Do you want to force failover to continue? [no] yes
    kremlin-sc1:sc>

Example (firmware 5.19.0):

    fort-sc0:sc> setfailover override
    override: is not a valid argument
    Usage: setfailover [-y|-n] off|on|force
           setfailover -h
    fort-sc0:sc>
    fort-sc0:sc> engineering
    fort-sc0:sc[engineering]> setfailover override
    Spare System Controller
    SC Failover: disabled
    Clock failover disabled.

    This will abruptly interrupt operations on this System Controller.
    This System Controller will become the spare System Controller.
    Do you want to continue? [no]
    fort-sc0:sc[engineering]>

Keywords: SunFire, 3800, 4800, 4810, 6800, reset, system controller, failover

Previously Published As: 75973

Change History

Date: 2009-11-23
User Name: Josh Freeman
Action: Refreshed
Comment: Refreshed the article per ESG Content Team effort.

Date: 2006-08-29
User Name: 97961
Action: Approved
Comment:
- Converted to STM formatting for better readability
- Made simple sentence/grammatical corrections
Version: 3

Date: 2006-08-29
User Name: 97961
Action: Accept
Comment:
Version: 0

Attachments
This solution has no attachment