Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

1010760.1: Sun Fire[TM] 15K/12K/E20K/E25K Servers: What Happens in a DR Slot0 Attach Operation
Previously Published As 214863

Description

The Dynamic Reconfiguration (DR) feature on the Sun Fire[TM] 15K/12K/E20K/E25K servers lets you change the hardware configuration of a live domain that is running the Solaris[TM] Operating System (OS), without causing machine downtime.

Steps to Follow

DR can also be used in conjunction with hot-swap functionality to physically remove boards from, or add them to, the server. DR operations are executed from the System Controller (SC) with the System Management Services (SMS) commands addboard(1M), moveboard(1M), deleteboard(1M), and rcfgadm(1M). The corresponding command on the domain is cfgadm(1M).

THE DR FRAMEWORK

The following is an architectural overview of the DR infrastructure that is referenced in this document:

         SC                             DR Capable Domain
       ------                   .       -----------------
 -------------                  .
( rcfgadm(1M) )------|          .
 -------------       |          .
                     |          .
 -----------------   |          .
( showdevices(1M) )--|          .
 -----------------   |          .
                     |          .
 ---------------     |          .
( moveboard(1M) )----|          .
 ---------------     |          .
                     |          .
 --------------      |   -----  .  -----    ------------
( addboard(1M) )-----|--( DCA )---( DCS )  ( cfgadm(1M) )
 --------------      |   ----- ^.  -----    ------------
                     |        / .    |           |
 -----------------   |       /  .    -------------
( deleteboard(1M) )--|      /   .          |
 -----------------         /    .   --------------
                network i/f     .  ( libcfgadm(4) )
                                .   --------------
                                .          |
                                .  ----------------    --------
                                . ( cfgadm_plugins )--( librcm )
                                .  ----------------    --------
                                .          |              |
                                .          |         ------------
                                .          |        ( rcm_daemon )
                                .          |         ------------
                                .          |              |
                                .          |       ----------------
                                .          |       |              |
                                .          |   ---------      --------
 --------------------           .          |  ( scripts )    (modules )
( "Hardware Control" )          .          |   ---------      --------
 --------------------           .          |
             \                  . . . . . .|. . . . . . . . . . . . . . . . .
              \                 .          |
               \                .     -----------
                \               .    ( DR driver )--------------|
                 \              .     -----------               |
                  \             .          |           ------------------
                   \            .          |           |        |       |
                    \           .          |     ---------   ------   -----
                     ( hardware i/f )------|    ( NDI/DDI ) ( CPUs ) ( MEM )
                                .                ---------   ------   -----

Domain Configuration Server (DCS)

The DCS listens for incoming DR requests and lets applications on the SC, such as rcfgadm (the remote version of cfgadm), control DR operations on the domain. DCS exports the full functionality of the libcfgadm framework through a secure network protocol.

libcfgadm

This is the main module of the libcfgadm library. It exports the config_admin interface, a generic interface that is used by DR. Under this arrangement, each piece of hardware that supports DR must supply a hardware-specific plug-in library. A primary function of libcfgadm is therefore to locate, load, and call the correct hardware-specific plug-in library for the hardware type involved in the DR operation.

cfgadm_plugins

This document focuses on the libcfgadm plug-in for System Board (slot0) DR, the cfgadm_sbd plug-in (which resides in /usr/platform/sun4u/lib/cfgadm). This library provides DR functionality for connecting, configuring, unconfiguring, and disconnecting class sbd system boards. It also enables you to connect a system board to, or disconnect it from, a running system without rebooting.

DR DRIVER

The DR driver consists of a platform-independent driver (dr) and a platform-specific module (drmach). The DR driver uses standard features of the Solaris OS to control DR operations and calls the platform-specific module as needed. The DR driver is also responsible for maintaining board information and performing state transition checking. The drmach driver provides platform-specific DR functionality. On Sun Fire 15K/12K servers, the drmach driver works in concert with the In-Kernel Probing (IKP) subsystem to identify the devices on a system board. For example, the drmach driver decides that a given device belongs to the system board if the device has a valid port-id that maps to the system board and if the device has a property field name="name".
Devices such as CPUs, AXQs, and memory controllers have this property.

SAFARI CONFIGURATOR

A Safari device is one that is connected to a port of a Safari bus. The Safari Configurator defines the specification for a common interface for manipulating CPUs, I/O, graphics, and memory controllers, which gives these devices a generic way to interface with one another. Because the Safari bus is deployed in many different platforms, one loadable module is needed to mediate between the board drivers and the generic Safari Configurator. On Sun Fire 15K/12K domains, a platform-specific loadable module (sc_gptwocfg) sits between the DR driver and the Safari Configurator (gptwocfg). This module mediates transactions between the DR driver, the Safari Configurator, and the FCODE interpreter. It provides three functions:

- sc_probe_board()
- sc_unprobe_board()
- sc_next_node()

THE DR SB ATTACH OPERATION

The DR attach CLI is initiated from the main SC:

v4u-15ka-sc0:sms-svc:2> addboard -d A SB4

The primary point of note here is the loading of the SMS library libscdr, which provides the Remote DR (RDR) interface to the DR CLI. This lets the CLI send a DR request to the Domain Configuration Agent (DCA).

DCS

A socket connection between DCA and DCS is set up with a TCP/IP three-way handshake over the I1 network link. This socket connection carries the RDR requests and replies. DCS receives the incoming RDR_CONF_CHANGE_STATE request to change the configuration state of an attachment point. In the sample DR operation in this document, the ap_id concerned is "sb4", and the state change command to execute (CFGA_CMD_CONFIGURE) is carried in the state_change_cmd field. For more examples of the configuration state changes that are possible, refer to the man pages for config_admin (Configuration Administration Library Functions).
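
The request described above can be pictured with a short Python sketch. It is illustrative only: the class and function names are hypothetical, the real RDR wire format is not shown in this document, and fake_config_change_state() merely stands in for libcfgadm's config_change_state(). Only the ap_id "sb4", the state_change_cmd field, and the CFGA_CMD_* command names come from the text and from config_admin.

```python
from dataclasses import dataclass
from enum import Enum, auto


class StateChangeCmd(Enum):
    """Illustrative subset of the config_admin state-change commands."""
    CFGA_CMD_CONNECT = auto()
    CFGA_CMD_CONFIGURE = auto()
    CFGA_CMD_UNCONFIGURE = auto()
    CFGA_CMD_DISCONNECT = auto()


@dataclass(frozen=True)
class ConfChangeStateRequest:
    """Hypothetical in-memory view of an RDR_CONF_CHANGE_STATE request."""
    ap_ids: tuple                      # attachment points, e.g. ("sb4",)
    state_change_cmd: StateChangeCmd   # command carried by the request


def handle_request(req, change_state):
    """Pass every ap_id in the request to a libcfgadm-like callback."""
    return [change_state(ap_id, req.state_change_cmd) for ap_id in req.ap_ids]


def fake_config_change_state(ap_id, cmd):
    # Stand-in for config_change_state(); it only reports what it was asked.
    return f"{cmd.name} -> {ap_id}"


request = ConfChangeStateRequest(("sb4",), StateChangeCmd.CFGA_CMD_CONFIGURE)
results = handle_request(request, fake_config_change_state)
print(results)
```

The callback is passed in as a parameter so that the sketch stays independent of any real library binding.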
The CFGA_CMD_CONFIGURE argument is passed on to libcfgadm's config_change_state() function.

libcfgadm

The config_change_state() function processes the CFGA_CMD_CONFIGURE request. For each ap_id named in the state change operation, it finds and loads the correct cfgadm plug-in. Because the ap_id of the SB is of class "sbd", the plug-in loaded for the sample DR SB4 attach operation is sbd.so.1.

cfgadm_plugin (sbd plugin)

The plug-in issues a status ioctl through ap_stat() against the physical ap_id, that is, /devices/pseudo/dr@0:SB4, to ensure that the board number returned by the driver matches the plug-in's notion of the board number, as extracted from the ap_id. It then issues a GetNCM (get number of components) ioctl (SBD_CMD_GETNCM) against the SB involved. This should return a value of 0, because components resident on an SB in a disconnected state should not be visible to the OS. This is followed by a GetSTATUS ioctl (SBD_GET_STATUS). The GetNCM / GetSTATUS operations are facilitated in part by the dr / drmach pair: dr_pre_op() / dr_post_op() with CMD=STATUS or GETNCM direct drmach to initiate a SHOWBOARD request. The data returned in the SHOWBOARD reply determines the "no device present" state reported to the GetNCM operation.

DR DRIVER -- ASSIGN / POWERON / TEST

ap_seq() "exec assign" against the ap_id involved in this DR operation initiates dr_ioctl ( SBD_CMD_ASSIGN ). dr_pre_op ( CMD=ASSIGN ) checks the validity of the requested state transition before drmach initiates an ASSIGN request. The ensuing ASSIGN reply leads to dr_post_op ( CMD=ASSIGN ) and drmach_log_sysevent().

ap_seq() "exec poweron" against the ap_id initiates dr_ioctl ( SBD_CMD_POWERON ). dr_pre_op ( CMD=POWERON ) checks the validity of the requested state transition before drmach initiates a POWERON request.
The ensuing POWERON reply leads to dr_post_op ( CMD=POWERON ) and drmach_log_sysevent().

ap_seq() "exec test" against the ap_id involved in this DR operation initiates dr_ioctl ( SBD_CMD_TEST ). dr_pre_op ( CMD=TEST ) checks the validity of the requested state transition before drmach initiates a TESTBOARD request. The ensuing TESTBOARD reply leads to dr_post_op ( CMD=TEST ) and drmach_log_sysevent().

DR DRIVER -- CONNECT

ap_seq() "exec connect" against the ap_id involved in this DR operation initiates dr_ioctl ( SBD_CMD_CONNECT ). dr_pre_op ( CMD=CONNECT ) checks the validity of the requested state transition before dr_connect() is called. dr_connect() proceeds only if it is called to operate on an entire board that does not already have components present in the domain. drmach_board_connect() is responsible for building the CASM information portion of the subsequent CLAIM request to the SC. The CLAIM request that drmach sends to the SC encloses information from the 18-entry table in the AXQ that defines which expander contains the home memory of each 128 GByte range of the physical address space, as selected by PA[41:37], and whether the slots in this expander have permission to send these transactions.
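
As a rough illustration of the address slicing mentioned above, the following Python sketch derives the 128 GByte slice number from PA[41:37] and looks it up in a per-expander table. The table layout (a valid flag and a slice number per expander) is an assumption modeled loosely on the "val" and "slice" fields of the drmach console messages quoted in this document; it is not the AXQ register format, and the function names are hypothetical.

```python
SLICE_SHIFT = 37      # 2**37 bytes = 128 GByte per slice
SLICE_MASK = 0x1F     # PA[41:37] is a 5-bit field
NUM_EXPANDERS = 18    # the text describes an 18-entry table


def pa_to_slice(pa):
    """Return the slice number held in bits 41:37 of a physical address."""
    return (pa >> SLICE_SHIFT) & SLICE_MASK


def new_table():
    """One hypothetical (val, slice) entry per expander, all invalid."""
    return [{"val": 0, "slice": 0} for _ in range(NUM_EXPANDERS)]


def home_expander(table, pa):
    """Return the expander whose valid entry matches the slice of pa, or None."""
    wanted = pa_to_slice(pa)
    for exp, entry in enumerate(table):
        if entry["val"] and entry["slice"] == wanted:
            return exp
    return None


table = new_table()
table[0] = {"val": 1, "slice": 0}   # pretend EX0 already homes slice 0
table[2] = {"val": 1, "slice": 1}   # pretend EX2 already homes slice 1

pa = (1 << 37) + 0x2000             # an address inside slice 1
print(pa_to_slice(pa), home_expander(table, pa))
print(table[4])                     # EX4 has no valid entry before the attach
```

In this toy model, an expander that has not yet contributed memory to the domain simply has val set to 0, so no physical address resolves to it.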
In the sample DR attach of SB4, the CLAIM request includes information about which expanders house (and do not house) memory resident to this domain at this point in time. That is, it continues to report no memory resident at EX#4:

v4u-15ka-a drmach: exp4: val=0 slice=0x0
v4u-15ka-a drmach: MC 0: MADR[0] =0x0, MADR[1] = 0x0
v4u-15ka-a drmach:     : MADR[2] =0x0, MADR[3] = 0x0
v4u-15ka-a drmach: MC 1: MADR[0] =0x0, MADR[1] = 0x0
v4u-15ka-a drmach:     : MADR[2] =0x0, MADR[3] = 0x0
v4u-15ka-a drmach: MC 2: MADR[0] =0x0, MADR[1] = 0x0
v4u-15ka-a drmach:     : MADR[2] =0x0, MADR[3] = 0x0
v4u-15ka-a drmach: MC 3: MADR[0] =0x0, MADR[1] = 0x0
v4u-15ka-a drmach:     : MADR[2] =0x0, MADR[3] = 0x0

When the SC's CLAIM reply is received through the IOSRAM-facilitated mailbox, drmach initiates the Safari Configurator phase through sc_probe_board() to facilitate the CONFIGURE phase.

DR DRIVER -- CONFIGURE

ap_seq() "exec configure" against the ap_id involved in this DR operation initiates dr_ioctl ( SBD_CMD_CONFIGURE ), which in turn initiates sc_configure(), in the Sun Fire 12K/15K/E20K/E25K specific configurator module sc_gptwocfg, against slot0 at EX4. The ensuing sc_find_axq_node ( id = 0x9e ) against AXQ0 at EX4 steps through the device tree to verify that the AXQ is not already configured into this domain. In the expected case, where sc_find_axq_node ( id = 0x9e ) returns 0, the generic Safari Configurator module, gptwocfg, initiates gptwo_configure_axq ( id = 0x9e ) and adds the device node. Next, it accesses the Global DCD Structure from the golden IOSRAM using sc_gptwocfg's sc_get_common_pcd() and then proceeds to dump_pcd() against CPU IDs 0x80 to 0x84 ( resident at SB4 ), along with the requisite information about their DIMMs / Ecache banks.
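
The IDs quoted above (0x9e for the AXQ at EX4, 0x80 onward for the CPUs on SB4) are consistent with an ID whose upper bits carry the expander number and whose low five bits carry a position within the expander. The Python sketch below decodes IDs under that assumption. The bit split is inferred from the values in this document and is not taken from the platform specification; the function name decode_id() is hypothetical.

```python
AGENT_BITS = 5                        # assumed width of the per-expander field
AGENT_MASK = (1 << AGENT_BITS) - 1    # low five bits


def decode_id(dev_id):
    """Split an ID into (expander, position within the expander)."""
    return dev_id >> AGENT_BITS, dev_id & AGENT_MASK


for dev_id in (0x9E, 0x80, 0x81):
    expander, position = decode_id(dev_id)
    print(f"id=0x{dev_id:02x} expander={expander} position={position}")
```

Under this reading, all three sample IDs decode to expander 4, which matches the EX4 / SB4 references in the text.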
The returned agentID of each processor allows the gptwo_cpu module to configure in the CPU ( using gptwo_configure_cpu() ) and to create the device node for the associated memory controller ( using gptwocfg_create_mc_node() ). The newly configured data is then maintained as "cookies" before the drmach module updates the CASM's slice table and the LPA settings are reprogrammed.

This state allows dr_pre_op ( CMD=CONFIGURE ) to prepare and validate the ensuing state transition, that is, dr_pre_attach_cpu() --> i_ndi_block_device_tree_changes(), so that the drmach module's drmach_configure() can walk the DDI branch and initiate the online operation against the four CPUs. This drives dr_post_attach_cpu()'s COLD START initialization of SB4's processors and transitions them into the expected CONFIGURED state. The same process is repeated to enable the memory controllers that are resident on the same SB.

ap_seq() "exec notify online" against the ap_id involved in this DR operation initiates a GetNCM (get number of components) ioctl (SBD_CMD_GETNCM) against the SB involved. This should return a value of 5 ( four CPUs + one memory-controller ). This is followed by a GetSTATUS ioctl (SBD_GET_STATUS). The GetNCM / GetSTATUS operations are facilitated in part by the dr / drmach pair: dr_pre_op() / dr_post_op() with CMD=STATUS or GETNCM direct drmach to initiate a SHOWBOARD request. The data returned in the SHOWBOARD reply determines the value returned to the GetNCM operation.

The final operation of the DR attach process is to notify the RCM of the current amount of configured memory and the current number of configured CPUs through the sbd class plug-in's ap_rcm_cap_cpu() and ap_rcm_cap_mem().

NOTE: Although mostly concerned with DR detach operations, readers should be familiar with the changes to the kernel cage under Solaris 9 OS KU patch 118558-05.
See Technical Instruction < Solution: 217037 > for more information.

Product
Sun Fire 15K Server
Sun Fire 12K Server
Sun Fire E25K Server
Sun Fire E20K Server

12K, 15K, 20K, 25K, starcat, DR, dynamic reconfiguration, casm, IKP, sc_gptwocfg, drmach, librcm, sbd, cfgadm_sbd, dcs, dca, config_admin, libcfgadm

Previously Published As 76338

Attachments
This solution has no attachment