Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type
Technical Instruction Sure Solution

1006489.1 : Sun Fire[TM] 12K/15K/E20K/E25K: Main System Controller's Community Network (C1) interface failure causes a failover onto the Spare SC.
PreviouslyPublishedAs
209085

Description
A failure of all configured C1 (Public) network interfaces causes the SMS Main System Controller (SC) role to fail over to the Spare SC and the outgoing Main SC to reboot. This is normal behavior, designed into the SC failover mechanism on the Sun Fire[TM] 12K, 15K, E20K, and E25K.

Steps to Follow
On a C1 network configured using IPMP on the SCs, if the IPMP "test" network interface fails its health check (typically a ping test to a router or a multicast to the network), IPMP tries to fail over the logical IP to another member of the IPMP group. If the IPMP group is completely unavailable, the Solaris[TM] "policing" action takes over. At that point the Failover Management Daemon (fomd) may have to take action and possibly force a failover to the Spare SC.

The following messages are logged in the Main SC's /var/adm/messages before it reboots:

    Sep 02 22:00:52 e25k-sc0 in.mpathd[140]: [ID 168056 daemon.error] All Interfaces in group C1 have failed
    Sep 02 22:02:35 e25k-sc0 genunix: [ID 672855 kern.notice] syncing file systems...
    Sep 02 22:02:35 e25k-sc0 genunix: [ID 904073 kern.notice] done
    Sep 02 22:06:02 e25k-sc0 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.9 Version Generic_117171-12 64-bit

The $SMSVAR/adm/platform/messages file will have the following messages logged:

    Sep 02 22:00:54 2005 e25k-sc0 fomd[513]: [8569 1638658827360354 NOTICE FailoverMgr.cc 1279] The external network test FAILED
    Sep 02 22:02:23 2005 e25k-sc0 fomd[513]: [8567 1638747925688988 NOTICE FailoverMgr.cc 1956] Failing over to the spare SC because of the following faults on the main SC: External Network Failure

There may be various reasons why both interfaces on the C1 network fail. Typical scenarios may include but are not restricted to:
The showfailover -v command on the Main SC gives the status of the C1 network:

    Status of e25k-sc1:
    Role: .......................................MAIN
    :
    :
    Public Network:
    Group "C1": .........................................Up
    eri0: .........................................Up
    eri3: .........................................Up
    Logical IP Addr. - C1:.........................................Up
    :
    :

The ifconfig -a command on the Main SC gives details about the C1 network interfaces:

    eri0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
            inet 10.0.0.5 netmask ffffff00 broadcast 10.0.0.255
            groupname C1
            ether 0:3:ba:6b:d6:48
    eri0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
            inet 10.0.0.10 netmask ffffff00 broadcast 10.0.0.255
    eri3: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
            inet 10.0.0.15 netmask ffffff00 broadcast 10.0.0.255
            groupname C1
            ether 0:3:ba:6b:d6:49

Product
Sun Fire E25K Server
Sun Fire E20K Server
Sun Fire 15K Server
Sun Fire 12K Server

Internal Comments
Some internals on how the Community network test works on the Main SC:

XNetTest is a thread that belongs to fomd (the Failover Management Daemon). The function of the XNetTest thread is to test the external network. The test checks for incoming packets on each of the external network interface adapters, using information retrieved through the Solaris kstat routines. For each configured external network interface, the test retrieves the current rxpkt (Received Packet) count, waits for two seconds, and then retrieves the rxpkt count again. The two counts are then compared. If they differ, the given network interface is considered to have passed the test.
If the two counts are equal, the test attempts to generate incoming packets on the given network interface by pinging the multicast router address (224.0.0.2), the all-hosts multicast address (224.0.0.1), and finally the subnet broadcast address (x.x.x.255). If all of these pings fail to generate any incoming packets on the given network interface, that interface is considered to have failed the test.

If there is at least one good interface on a given SC, XNetTest posts TEST_PASSED to the FM; otherwise it posts TEST_FAILED, which leads to an SC failover if the SC is the Main SC. If routing over the I2 network is allowed, an XNetTest failure on the Main SC will never trigger a failover. An XNetTest failure on the Spare SC always disables the failover mechanism, because there is no advantage in failing over to the Spare SC if it is unreachable from the external network.
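
The decision logic described above can be sketched in a few lines of Python. This is an illustrative model only, not the fomd implementation: the function names, the `read_rx_count` and `send_probe` callables, and the choice to re-check the counter after each ping are all assumptions made for the sketch. On a real SC the counter would come from kstat and the probe would be a ping sent on the interface under test.

```python
import time
from typing import Callable, Iterable

ALL_ROUTERS = "224.0.0.2"  # multicast router address named in the text
ALL_HOSTS = "224.0.0.1"    # all-hosts multicast address named in the text


def interface_passes(
    ifname: str,
    read_rx_count: Callable[[str], int],     # hypothetical: returns the rxpkt counter
    send_probe: Callable[[str, str], None],  # hypothetical: pings target via ifname
    subnet_broadcast: str,
    wait_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the received-packet counter of ifname changes."""
    before = read_rx_count(ifname)
    sleep(wait_seconds)
    if read_rx_count(ifname) != before:
        return True  # packets arrived without any prompting
    # Counter did not move: try to provoke incoming packets.
    # Assumption: the counter is re-read after each individual ping.
    for target in (ALL_ROUTERS, ALL_HOSTS, subnet_broadcast):
        send_probe(ifname, target)
        if read_rx_count(ifname) != before:
            return True
    return False


def external_network_test(
    interfaces: Iterable[str],
    read_rx_count: Callable[[str], int],
    send_probe: Callable[[str, str], None],
    subnet_broadcast: str,
    wait_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """One good interface is enough for the SC to pass the test."""
    for ifname in interfaces:
        if interface_passes(ifname, read_rx_count, send_probe,
                            subnet_broadcast, wait_seconds, sleep):
            return "TEST_PASSED"
    return "TEST_FAILED"
```

Passing the counter reader and the probe in as callables keeps the sketch runnable without network access: a test can supply a dictionary-backed counter and a fake probe, and replace `sleep` with a no-op to skip the two-second wait.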
References: SC, C1 network, Community network, 12K, 15K, 20K, 25K, SMS, fomd, MAN Network

Previously Published As
82854

Change History
Updated by the ESG Knowledge Content Team 4/2010

Attachments
This solution has no attachment