Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Sun Alert Sure Solution
1220873.1: A Misconfigured Gateway (Network Route) May Corrupt the XSCF Database on Sun SPARC Enterprise M8000/M9000 Servers Running XCP Firmware 1092 or 1093
Applies to:
Sun SPARC Enterprise M8000 Server - Version: Not Applicable and later [Release: N/A and later]
Sun SPARC Enterprise M9000-32 Server - Version: Not Applicable and later [Release: N/A and later]
Sun SPARC Enterprise M9000-64 Server - Version: Not Applicable and later [Release: N/A and later]
Sun SPARC Sun OS

Date of Workaround Release: 29-Sep-2010
Date of Resolved Release: 01-Dec-2010

Description
On Sun SPARC Enterprise M8000/M9000 Servers with XCP firmware revision 1092 or 1093, the XSCF (eXtended System Control Facility) will fail to start after an XSCF reboot (applynetwork/rebootxscf) if a gateway is misconfigured, that is, if the gateway IP address is not on the same subnet as the IP address of the LAN interface. During the XSCF reboot sequence, the XSCF database becomes corrupted as a consequence of the misconfigured gateway.

There is no immediate impact on the running domains, but if no XSCF is available, the domains cannot reboot.

The XSCF with the corrupted database is the one with the misconfigured gateway(s). The database may be corrupted on either one of the two XSCFs or on both.

Note: In the unlikely event that both XSCFs are corrupted, please contact Oracle Support to guide you through a procedure to remove the trigger of this software issue. If both XSCFs have incorrect routes leading to XSCF database corruption, no XSCF will be available and a platform power cycle will be required to recover.

Note: As a precaution, the preventive measures outlined in this document should also be taken on other XCP versions, even if an upgrade is not planned.

Likelihood of Occurrence
This issue can occur on the following platforms:
XCP 1093 output will appear similar to the following:

XSCF> version -c xcp
XSCF#0 (Active )

Notes:
1. This issue is not applicable to the Sun SPARC Enterprise M3000/M4000/M5000 Servers.
2. A cabling problem will not trigger this issue.
3. The XSCF 'showroute' command cannot be used to determine if the gateway is misconfigured.
4. A system is only vulnerable to this issue if an XSCF is rebooted (applynetwork/rebootxscf) and a gateway is misconfigured. To determine if the gateway is misconfigured, perform the 'applynetwork' procedure as described in the Workaround section.

Possible Symptoms
Should the described issue occur, erroneous routes configured on XCP 1091 and lower will produce errors similar to the following on the XSCF console during the XSCF reset sequence:

[output omitted]

Workaround or Resolution
Before installing the affected firmware revisions 1092 or 1093, verify the gateway configuration to avoid this issue at the next reboot. Once firmware revision 1092 or 1093 is installed, carefully check every configuration change (setroute/setnetwork) to ensure that it does not trigger this issue at the next reboot.

The network configuration can be verified manually, which avoids the XSCF database corruption, as follows:

1. Log into the Active XSCF as a user with 'platadm' privilege.

2. Change one of the network settings and use 'applynetwork -n' to see the routes defined in the XSCF database, then verify that there are no routes with a gateway that is not located on the local network (i.e., the same subnet), as in the following example:

XSCF> setnetwork -c down xscf#0-lan#0; setnetwork -c up xscf#0-lan#0; applynetwork -n
The following network settings will be applied:
xscf#0 hostname :m8000-xscf0

In the example above, gateway 128.244.128.1 is not reachable on the xscf#0-lan#0 10.244.128.xxx subnet.

3. Delete the bad routes:

TIP: Copy and paste the route from the applynetwork output when constructing 'setroute -c del'.

XSCF> setroute -c del -n 0.0.0.0 -m 0.0.0.0 -g 128.244.128.1 xscf#0-lan#0

4. Apply the changes with 'applynetwork -y', and verify that all routes are reachable on the interface subnet:

XSCF> applynetwork -y

5. Re-verify that all routes are reachable on the interface subnet (i.e., each gateway is on the same subnet as its interface).

Once the configuration is verified correct, you may safely upgrade to XCP 1092 or 1093 or safely proceed with an XSCF reset:

XSCF> rebootxscf -y

This issue is addressed in the following release:
http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/downloads/index.html

Patches

Internal Section:
Please send technical questions to the following email: [email protected] and copy the Responsible Engineer.

The network configuration can also be verified by collecting an XSCF snapshot and unpacking it on the OPL Snapshot analysis Toolset at http://oplpass.us.oracle.com/. The tool has been augmented with automation to verify the network configuration, and it provides the command(s) to fix the problem when applicable. Note that the snapshot must be properly unpacked in order to check the configuration. Once your configuration is verified correct, you may safely upgrade to XCP 1092 or 1093 or safely proceed with an XSCF reset.

When the XSCF is unable to start, the following messages can be observed in the XSCF console logs:

execute S60checktestdb[317]: ERR: Database problems detected: (ret=-9010). Cleaning up
scdb_init_all: -9010, Database verify bad root: ERROR: Database problems detected: Inconsistency. Cleaning up
Initiating shutdown ...
XSCF BOOT STOP (recover by NFB-OFF/ON)

Eng Support: There is a special procedure, which requires a complete platform power cycle, to remove the condition that triggers this issue. Replacing the XSCF hardware, or any other hardware, will not resolve this issue.

Internal Contributor/Submitter: [email protected]
Internal Eng Responsible Engineer: [email protected]
Internal Services Knowledge Engineer: [email protected]
Internal Eng Business Unit Group: Systems Group - OPL
Internal Escalation ID: 73220322, 73374292, 73361718, 73477978, 73501428, 73531790, 73522408

Modification History
29-Sep-2010: Workaround Release
01-Dec-2010: Republish - Issue is now Resolved

References
SUNBUG:6984765

Attachments
This solution has no attachment
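
The subnet test at the heart of the Workaround procedure (is each gateway on the same subnet as its LAN interface?) can be illustrated off-box with a short script. The following is a minimal sketch in Python using only the standard `ipaddress` module; it is not an XSCF command and does not parse 'applynetwork' output. The function names, the interface address 10.244.128.10, and the netmask 255.255.255.0 are assumptions chosen for illustration. Only the gateway 128.244.128.1 and the 10.244.128.xxx subnet come from the example in this document.

```python
import ipaddress


def gateway_on_interface_subnet(interface_ip: str, netmask: str, gateway: str) -> bool:
    """Return True if the gateway lies inside the subnet of the LAN interface."""
    # Build the interface together with its netmask, e.g. 10.244.128.10/255.255.255.0.
    iface = ipaddress.ip_interface(f"{interface_ip}/{netmask}")
    # iface.network is the enclosing subnet (10.244.128.0/24 in the example).
    return ipaddress.ip_address(gateway) in iface.network


def find_bad_gateways(interface_ip: str, netmask: str, gateways: list[str]) -> list[str]:
    """Return the gateways that are NOT on the interface subnet."""
    return [
        gw for gw in gateways
        if not gateway_on_interface_subnet(interface_ip, netmask, gw)
    ]


if __name__ == "__main__":
    # Hypothetical interface settings; the real values must be read from
    # the 'applynetwork -n' output on the XSCF.
    iface_ip, mask = "10.244.128.10", "255.255.255.0"
    routes = ["128.244.128.1", "10.244.128.1"]
    for bad in find_bad_gateways(iface_ip, mask, routes):
        print(f"gateway {bad} is not on the subnet of {iface_ip}/{mask}")
```

Any gateway reported by this check corresponds to a route that would have to be removed with 'setroute -c del' as described in step 3 of the Workaround section.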