Solution Type: Technical Instruction

Solution 1012186.1: Sun Fire[TM] Server: How to Use Dynamic Reconfiguration (DR) in a Sun Cluster[TM] 3.x Environment
Previously Published As: 216797
Applies to:
Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
Sun Fire 6800 Server
Sun Fire E2900 Server
All Platforms

Goal

This document describes how to use Dynamic Reconfiguration (DR) in Sun[TM] Cluster 3.x configurations. Although DR is supported in a clustered environment, some restrictions apply; this document provides guidelines and best practices for performing DR operations safely on production systems.

Note: It is always safe to issue a DR detach operation because the DR subsystem rejects operations on boards containing active components.

This document is aimed at an audience with basic knowledge of DR and Sun[TM] Cluster.

Solution

SUPPORTED CONFIGURATIONS
KNOWN RESTRICTIONS AND CONSIDERATIONS

All the requirements, procedures, and restrictions that are documented for Solaris OS DR also apply to Sun Cluster DR support. In a Sun Cluster configuration, the following operations are not supported:
Sun Cluster software does not support DR detach operations on a board that contains the permanent kernel memory and will reject any such operation (see the example below for locating that board).
Note: A DR detach operation that pertains to memory other than the permanent kernel memory will not be rejected.
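For example, a minimal check from the domain to locate the board hosting the permanent kernel memory (the output line is illustrative; board numbers, addresses, and sizes vary by system):

# cfgadm -alv | grep SB | grep permanent
SB4::memory connected configured ok base address 0x0, 4194304 KBytes total, 1980416 KBytes permanent

The same information can be obtained from the main SC with rcfgadm or showdevices, as shown in the GUIDELINES section; the <domain> arguments used there are placeholders.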
Note: On OS versions up to and including Solaris[TM] 8, the kernel cage will always be located on the board in the domain to which the highest-ordered address slice is assigned. The PCD address slice mapping can only be modified through interaction with a kernel cage copy/rename operation. Refer to Document 1001683.1 (Sun Fire[TM] 12K/15K: Location and Relocation of Kernel for DR Operations) for details.
DR detach operations will fail for the board where the Sun Cluster heartbeat threads are bound. This is the least significantly numbered board in the domain when the domain is booted, that is, the board with the least significant processors. You can confirm which CPU hosts these threads in the following way:

phys-motu-1# echo ::cycinfo | mdb -k

In this example, the output (elided here) shows that the cluster_heartbeat cyclic is on CPU 1.
Note: This applies to recent versions of Solaris/Sun Cluster, where cyclics are used to trigger the heartbeat; with older versions, you have to look at where the clock thread is located.
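As a sketch, the complete check combines the mdb command above with the standard prtdiag utility (output is omitted because it varies by platform and release):

phys-motu-1# echo ::cycinfo | mdb -k
(find the CPU whose cyclic list contains cluster_heartbeat, for example CPU 1)
phys-motu-1# /usr/platform/sun4u/sbin/prtdiag
(the CPUs section shows which system board hosts that CPU ID; that board will fail a DR detach while the cluster is running)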
An RCM script (SUNW_cluster_quorum_rcm.pl) prevents the removal and suspension of Sun Cluster quorum devices.
DR detach operations on active devices in the primary node are not permitted. DR operations can be performed on non-active devices in the primary node and on any devices in the secondary node. An RCM script (SUNW_cluster_storage_rcm.pl) prevents the removal and suspension of primary paths to Sun Cluster managed storage devices.
DR detach operations cannot be performed on active private interconnect interfaces. A workaround is to disable and remove the interface from the active interconnect.

GUIDELINES

Removing a board containing active components may result in system errors. Before removing a board, the DR subsystem queries other subsystems, such as the Sun Cluster subsystems, to determine whether the components on the board are in use. As a result, it is always safe to issue a DR detach operation in a Sun Cluster 3.x environment.

DR Detach of a CPU/Memory Board

Determine if the CPU/Memory Board has the permanent kernel memory.
- From the domain:
# cfgadm -alv | grep SB | grep permanent
- From the main system controller (SC):
sc:sms-user:> rcfgadm -alv -d <domain>
or:
sc:sms-user:> showdevices -d <domain>

If the board being detached contains the permanent kernel memory, the operation will be rejected. For example:

# cfgadm -c disconnect SB2

In the Solaris[TM] 8 OS, the most significantly numbered board fails to drain because Sun Cluster software uses the Real Time (RT) scheduler class for both of its heartbeat threads, and a problem with RT threads causes the DR operation to abort. In the Solaris[TM] 9 OS, the DR operation advises you of the RT threads and gives you the option to proceed anyway. Suspension of RT threads causes them to stop being real time; if you proceed with the operation without manual workarounds, problems can occur in some RT thread applications, and suspending RT threads for too long can panic the node.

To DR detach the board containing the permanent kernel memory, the node must be shut down and booted in non-cluster mode (in which case using DR is, of course, not necessary):

# /usr/cluster/bin/scswitch -S -h <node>

If the board does not contain the permanent kernel memory, proceed with the detach of the CPU/Memory Board.
- From the domain:
# cfgadm -c disconnect SBxx
- From the main SC:
sc:sms-user:> rcfgadm -c disconnect SBxx
or:
sc:sms-user:> deleteboard SBxx

A DR detach of a CPU/Memory Board that does not contain the permanent kernel memory might still fail due to the presence of the cluster heartbeat threads. You can have up to NCPUS heartbeat threads, but usually there are only two because there are only two private interconnect links. The heartbeat threads behave like bound threads, although they are not actually bound: because they are usually awakened by the clock, they appear to be bound to the CPU running the clock. If one of the heartbeat threads is busy during the DR detach, it prevents DR on the CPU it is using, and the DR operation fails. For example:

sc:sms-user:> rcfgadm -d A -c disconnect SB12

The status of the heartbeat threads cannot be reported using a user command (such as ps), but it can be derived from some Solaris OS internal structures (for example, the clock thread callout tables).

Use the cfgadm -al command to check the state and condition of a board prior to replacing it. The Receptacle/Occupant/Condition fields must indicate disconnected/unconfigured/unknown. The CPU/Memory Board can be physically removed as soon as the board is powered off.

DR Attach of a CPU/Memory Board

The DR attach of a CPU/Memory Board is always safe.
- From the domain:
# cfgadm -c configure SBxx
- From the main SC:
sc:sms-user:> rcfgadm -c configure SBxx
or:
sc:sms-user:> addboard -d <domain> SBxx

Use the cfgadm -alv command to check that all the resources have been added to the domain. After the DR attach, the Receptacle/Occupant/Condition fields must indicate connected/configured/ok.
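As a consolidated sketch of the domain-side detach and re-attach sequence (the board name SB12 is illustrative; perform the permanent-memory and heartbeat checks described above first):

# cfgadm -alv | grep SB | grep permanent
(confirm SB12 is not the board holding the permanent kernel memory)
# cfgadm -c disconnect SB12
(drain and detach the board; it can be physically removed once powered off)
# cfgadm -al | grep SB12
(expect disconnected/unconfigured/unknown before pulling the board)
# cfgadm -c configure SB12
(after service, re-attach the board)
# cfgadm -alv | grep SB12
(verify that the CPUs and memory are back in the domain)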
DR Detach of a PCI Card

As stated previously, there are some DR considerations for PCI cards.

1) Determine if the PCI card has any quorum devices by showing the status for all device quorums and node quorums:

# /usr/cluster/bin/scstat -q

The output lists each quorum device with its Device Name, its Present and Possible vote counts, and its Status.

2) Use the scdidadm command to list the mapping between device entries and DID driver instance numbers. For example:

# /usr/cluster/bin/scdidadm -L d14

If the DR detach pertains to a quorum device, the operation will be rejected as follows:

ERROR: Unable to unassign IO3 from domain: B

A new quorum device must be added and the current quorum device must be removed. Use the /usr/cluster/bin/scsetup command to add a new quorum device, and then disable the quorum device that needs to be removed.

3) Determine if the PCI card has any active devices in the primary node by showing the status for all disk device groups:

# /usr/cluster/bin/scstat -D

The output shows the Device Group Status table. Depending on the nature of the device group (Solaris[TM] Volume Manager, VERITAS Volume Manager), use the appropriate commands to determine the Solaris device paths for the device group, and then determine whether the PCI card to be removed affects an active device group on the current primary node. Before proceeding with the DR detach of a Sun[TM] PCI card that belongs to an active device group on the current primary node, you must switch the primary and secondary nodes:

# /usr/cluster/bin/scswitch -z -D <device-group> -h <node>

4) Determine if the PCI card has any active private interconnect interfaces by showing the status for the cluster transport paths:

# /usr/cluster/bin/scstat -W

The output lists the Cluster Transport Paths and their status. Use the /usr/cluster/bin/scsetup command to disable and remove the interface from the active interconnect. Correct removal of the cable, adapter, or junction can also be checked using:

# /usr/cluster/bin/scconf -p | grep cable

NOTE: Refer to the Sun Cluster 3.x System Administration Guide for more information about quorum, global devices, and cluster interconnect administration, and for detailed instructions for the actions previously mentioned.

5) If all the previous tests have been done AND all I/O device activity is stopped AND all the alternate paths to storage and network (MPxIO, IPMP, and so on) are properly set (refer to the appropriate documentation), the PCI card can be DR detached as follows:

# cfgadm -c disconnect <ap_id>

For example:

# cfgadm -c disconnect pcisch7:e02b1slot2

Use the cfgadm -al command to check the state and condition of the card prior to physically removing it. For example:

# cfgadm -al | grep pcisch7:e02b1slot2

Check the logs (/var/adm/messages) to confirm that the DR operation completed successfully. Then the PCI card can be safely removed from the hsPCI I/O Board. These checks are summarized in the sketch below.
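As a consolidated sketch of the pre-detach checks (the device group name nfs-dg and the node name phys-motu-2 are illustrative placeholders; the slot name is the example used above):

# /usr/cluster/bin/scstat -q
(no quorum device may depend on the card; replace the quorum device with scsetup if one does)
# /usr/cluster/bin/scdidadm -L d14
(map the DID instance to its Solaris device paths)
# /usr/cluster/bin/scstat -D
(identify device groups whose primary is this node)
# /usr/cluster/bin/scswitch -z -D nfs-dg -h phys-motu-2
(switch an affected device group to the other node)
# /usr/cluster/bin/scstat -W
(no active transport path may use the card; disable and remove it with scsetup if one does)
# cfgadm -c disconnect pcisch7:e02b1slot2
(detach the card, then check /var/adm/messages)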
DR Attach of a PCI Card

The DR attach of a PCI card is safe and can be done using the following:

# cfgadm -c configure <ap_id>

For example:

# cfgadm -c configure pcisch7:e02b1slot2

Note: Use the cfgadm -al command to check the state and condition of the card before and after attaching it. After the DR attach, the Receptacle/Occupant/Condition fields must indicate connected/configured/ok.

If any changes have been applied to the cluster configuration (quorum device removal, transport path removal, node switch), the system administrator can manually restore the cluster configuration as it was previously.

DR Detach of an hsPCI I/O Board

To DR detach an hsPCI I/O Board from a clustered domain, you must consider all the issues described previously for PCI card removal.

1) Check that none of the four PCI cards on the board pertains to a quorum device, to an active device group on the current primary node, or to an active private interconnect interface (see the PCI card section above).
2) Check that I/O device activity is stopped and that all the alternate paths to storage and to the network (vxdmp, MPxIO, IPMP, and so on) are properly set.

3) Check the state and condition of the hsPCI I/O Board prior to detaching it. The Receptacle/Occupant/Condition fields must indicate disconnected/unconfigured/unknown.

4) Power off the board.

5) Now you can safely remove the hsPCI I/O Board from the configuration.
- From the domain:
# cfgadm -c disconnect IOxx
- From the main SC:
sc:sms-user:> rcfgadm -c disconnect IOxx
or:
sc:sms-user:> deleteboard IOxx

DR Attach of the hsPCI I/O Board

The DR attach of the hsPCI I/O Board is safe.
- From the domain:
# cfgadm -c configure IOxx
- From the main SC:
sc:sms-user:> rcfgadm -c configure IOxx
or:
sc:sms-user:> addboard -d <domain> IOxx

Use the cfgadm -alv command to check that all the resources have been added to the domain. After the DR attach, the Receptacle/Occupant/Condition fields must indicate connected/configured/ok.

BEST PRACTICES

Best Practice 1:

The following describes a special boot method for a clustered domain that improves the usability of the DR feature in such a domain.

Technical Background
After a setkeyswitch on operation, DR will work with Sun Cluster 3.x on all but two boards in the domain: the least and the most significantly numbered boards. The most significantly numbered board is affected because of the presence of the kernel cage; the least significantly numbered board because of the potential presence of the heartbeat threads.

Special Boot Method for a Clustered Domain
As an example, suppose your domain consists of SB0-7 and IO0-3. At boot time, you configure the domain with only SB0 and IO0-3 and key it on; the remaining system boards can then be DR attached once the domain is up.

Benefit and Drawback
Both the kernel cage AND the default "bindings" of the Sun Cluster heartbeat threads will reside on SB0, leaving all but _one_ board, the "boot board," easily detachable using DR. SB0 should then be considered a system-critical resource, like the boot device. Once a system is booted in this manner, if the system crashes/dstops/hangs and sms-svc performs automatic system (domain) recovery, the domain will be brought back up in default mode, with SB7 holding the cage, SB0 the heartbeat threads, and so on.

Prepare a Domain for Using DR with Sun Cluster
As previously stated, the kernel cage will always be located on the board in the domain to which the highest-ordered slice number is assigned. The idea is to force the location of the kernel cage and the heartbeat threads onto the least significant board prior to the installation of the Sun Cluster software. As an example, suppose your domain consists of SB0-7 and IO0-3. Prior to Sun Cluster installation, create a domain with the final configuration SB0-7, and then DR detach each of the other boards in sequence (SB7, then SB6, and so on) until only SB0 remains in the domain. SB0 will then "host" the kernel cage because it has the highest-ordered slice number. You can now DR attach all the system boards back into the domain and install the Sun Cluster software.

Benefit and Drawback
Both the kernel cage AND the default "bindings" of the Sun Cluster heartbeat threads will reside on SB0, leaving all but _one_ board, the "boot board," easily detachable using DR. SB0 should then be considered a system-critical resource, like the boot device. Only a fresh installation of SMS without restoring an SMS backup image (that is, without restoring the platform configuration information) can change the location of the kernel cage, so this placement survives any crashes/dstops/hangs/ASR.

This option must be considered early in the installation process and must be completed prior to Sun Cluster software installation. Nothing should be running in the domain in the way of user jobs and applications at the time you set up the domain. You should examine the PCD using redx after each detach to make sure that the cage has actually ended up on the least significantly numbered board. A sketch of the preparation sequence follows.
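As a sketch, assuming a Sun Fire high-end platform managed by SMS and a domain A built from SB0-SB7 (board and domain names are illustrative), the preparation sequence is:

sc:sms-user:> deleteboard SB7
(repeat for SB6, SB5, ..., down to SB1; check the PCD with redx after each detach)
sc:sms-user:> addboard -d A SB1
(repeat for SB2 through SB7, re-attaching all boards)

SB0 now holds the kernel cage, and the Sun Cluster software can be installed.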
Best Practice 2:

Technical Background
In some rare cases, it has been reported from the field that during the DR detach of a CPU/Memory Board that does not contain the permanent kernel memory, the timeout for fault monitor probes can be reached. This can lead to a stop of the associated Sun Cluster data service. It occurs mainly when ISM segments are located on the board being detached, and it can occur even if Solaris 8 patch 117350-05 or Solaris 9 patch 117171-08 is installed. Note that increasing the probe timeout value (Probe_timeout) does not seem like the right course of action in such a case; see the Sun Cluster Data Services Planning and Administration Guide for Solaris OS for more details.

The Safest Approach
In clustered domains, the safest approach is to fail over the cluster service to the other node, perform the DR operations, and then fail back when desired.

Alternate Option
An alternate solution may be to disable the data service fault monitor during the DR operation and to re-enable it after the operation:

# /usr/cluster/bin/scswitch -M -n -j <resource>

You can then use the scstat command to verify that the resource is reported as "Online but not monitored". To monitor the resource again:

# /usr/cluster/bin/scswitch -M -e -j <resource>

Internal Comments

References and Solutions for Dynamic Reconfiguration:
- Dynamic Reconfiguration for High-End Servers: Part 1 - Planning Phase (Clustered Domains, p. 41) - Sun BluePrint Part No. 817-5949-10
- Sun Fire[TM] 15K Dynamic Reconfiguration Troubleshooting Guide - http://webhome.emea/sdutille/sf15k/DR_TBS_Guide.html
- Document 1001683.1: Sun Fire[TM] 12K/15K: Location and Relocation of Kernel for DR Operations
- Document 1017710.1: Sun Fire[TM] Servers: Dynamic Reconfiguration and Intimate Shared Memory
- Sun Cluster configuration guide - http://suncluster.eng.sun.com/products/SC3.1/config/sc3ConfigurationGuide-5.htm#46677
- Sun Cluster 3.x & Dynamic Reconfiguration (DR) - http://sunweb.germany/SSS/MCSC/ET/suncluster/clusttips/sc3xDR.html
- Sun Cluster 3.0 System Administration Guide - 806-1423
- Sun Cluster 3.1 System Administration Guide - 816-3384
- Sun Cluster Data Services Planning and Administration Guide for Solaris OS
- For VERITAS Cluster software, refer to the "Veritas Cluster Server Application Note: Sun Fire 12K/15K Dynamic Reconfiguration" available at http://seer.support.veritas.com/docs/254514.htm

Previously Published As: 76514

Attachments
This solution has no attachment