Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
 Solution Type: Technical Instruction Sure Solution
 1003582.1: Sun Fire[TM] 12K/15K/E20K/E25K: What Happens in a DR Slot0 Detach Operation
 
 Previously Published As 205053
 Applies to:
 Sun Fire E25K Server
 Sun Fire 15K Server
 Sun Fire E20K Server - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
 Sun Fire 12K Server - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
 All Platforms
 Goal
 The dynamic reconfiguration (DR) feature on the Sun Fire 12K/15K/E20K/E25K server enables you to perform hardware configuration changes to a live domain that is running the Solaris[TM] operating environment without causing machine downtime. This document discusses the details of the Slot0 detach operation; the attach operation is described in Document 1010760.1.
 Solution
 DR Framework and Operation
 You can also use DR in conjunction with hot-swap functionality to physically remove boards from, or add them to, the server. You can execute DR operations from the SC (system controller) by using the system management services commands addboard(1M), moveboard(1M), deleteboard(1M), and rcfgadm(1M).
 The DR framework
 Before proceeding to the detach itself, it is useful to give an architectural overview of the DR infrastructure involved in this discussion:
 With reference to the above illustration of the DR framework, the following subsystems are pertinent to this discussion:
 DCS
 The DCS listens for incoming DR requests and enables applications on the SC, such as the remote version of cfgadm (rcfgadm), to control DR operations on the domain. It exports the full functionality of the libcfgadm framework through a secure network protocol.
 libcfgadm
 This is the main module of the libcfgadm library. It exports the config_admin interface, which in turn offers a generic interface that is used by DR. Under this arrangement, each piece of hardware that supports DR must supply a hardware-specific plugin library. Hence, a primary function of libcfgadm is to locate, load, and call the correct hardware-specific plugin library for the hardware type involved in the DR operation.
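 The locate-and-dispatch role described above can be illustrated with a minimal Python sketch. It is not the libcfgadm implementation: the registry, the function names (sbd_plugin, change_state), and the returned strings are all hypothetical, and a real plugin is a shared library loaded at run time rather than an entry in a dictionary.

```python
from typing import Callable, Dict

# Hypothetical plugin signature: (ap_id, command) -> description of the action.
PluginFn = Callable[[str, str], str]


def sbd_plugin(ap_id: str, command: str) -> str:
    """Stand-in for a system board plugin; it only reports what it was asked."""
    return f"sbd plugin: {command} {ap_id}"


# Hypothetical registry keyed by hardware class (assumption for illustration).
PLUGINS: Dict[str, PluginFn] = {"sbd": sbd_plugin}


def change_state(ap_id: str, ap_class: str, command: str) -> str:
    """Find the plugin for the attachment point's class and call it."""
    plugin = PLUGINS.get(ap_class)
    if plugin is None:
        raise LookupError(f"no plugin registered for class {ap_class!r}")
    return plugin(ap_id, command)


if __name__ == "__main__":
    print(change_state("SB4", "sbd", "disconnect"))
```

 The point of the sketch is only the separation of concerns: the generic layer knows how to find a plugin by class, and the plugin knows the hardware.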
 cfgadm_plugins
 For this discussion, we will direct our attention to the libcfgadm plugin for System Board (slot0) DR - the cfgadm_sbd plugin (which resides in /usr/platform/sun4u/lib/cfgadm). This library provides DR functionality for connecting, configuring, unconfiguring, and disconnecting class sbd system boards. It also enables you to connect or disconnect a system board from a running system without having to reboot the system. 
 DR driver
 The dynamic reconfiguration (DR) driver consists of a platform-independent driver (dr) and a platform-specific module (drmach). The dr driver uses standard features of the Solaris[TM] Operating System to control DR operations and calls the platform-specific module as needed. It is also responsible for maintaining board information and performing state transition checking. The drmach driver provides platform-specific DR functionality. In the case of the Sun Fire 15K/12K server, this includes working in concert with the In-Kernel Probing (IKP) subsystem to identify devices on a system board part. For example, it determines whether a given device belongs to the system board by verifying that the device has a valid portid that maps to the system board and that the device has a property field name='name'. Devices such as CPUs, AXQs, and Memory Controllers have this property.
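 The membership test described above (a valid portid that maps to the board, plus a 'name' property) can be sketched in Python as follows. This is an illustration only, not drmach code: the function name, the property dictionary layout, and the portid-to-board table are hypothetical, and the portid values in the usage example are made up for the example.

```python
from typing import Callable, Dict, Optional


def belongs_to_board(
    props: Dict[str, object],
    board: int,
    portid_to_board: Callable[[int], Optional[int]],
) -> bool:
    """Return True if a device node looks like part of the given board.

    props           -- the device node's properties (hypothetical layout)
    board           -- board number being detached
    portid_to_board -- caller-supplied mapping; the real mapping is
                       platform specific and is not modelled here
    """
    portid = props.get("portid")
    if not isinstance(portid, int):
        return False  # no valid portid
    if "name" not in props:
        return False  # devices of interest carry a 'name' property
    return portid_to_board(portid) == board


if __name__ == "__main__":
    # Illustrative table only: these portid values are assumptions.
    table = {128: 4, 129: 4, 160: 5}
    cpu = {"name": "cpu", "portid": 128}
    print(belongs_to_board(cpu, 4, table.get))
```

 Passing the mapping in as a parameter keeps the sketch honest about what it does not know: only the two checks named in the text are encoded.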
 Safari Configurator
 A Safari device is one that is connected to a port of a Safari bus. The Safari Configurator defines the specification that describes a common interface for manipulating CPUs, I/O, Graphics, and Memory Controllers, so that the devices have a generic way to interface with one another. Since the Safari bus is deployed in many different platforms, there is a need for one loadable module that intermediates between board drivers and the generic Safari Configurator. On Sun Fire 12K/15K/E20K/E25K resident domains, a platform-specific loadable module, sc_gptwocfg, sits between the DR driver and the Safari Configurator (gptwocfg). This module is responsible for mediating transactions between the DR driver, the Safari Configurator, and the FCODE interpreter. Three functions will be provided:
 Walking through the DR SB detach operation
 The DR detach CLI initiated from the main SC
 v4u-15ka-sc0:sms-svc:2> deleteboard SB4
 The primary point of note here is the loading of the SMS library libscdr, which in turn provides the Remote DR (RDR) interface to the DR CLI. This facilitates a DR request to DCA.
 DCS
 A socket connection between DCA and DCS is set up by establishing a TCP/IP 3-way handshake over the I1 network link. This socket connection provides the medium over which RDR requests and replies are exchanged. DCS receives an incoming RDR_CONF_CHANGE_STATE request to change the configuration state of an attachment point. In our sample DR operation, the ap_id concerned is "sb4" and the state change command to execute is CFGA_CMD_DISCONNECT (passed through the state_change_cmd field). For more examples of the configuration state changes that are possible, refer to the man pages for config_admin (Configuration Administration Library Functions). The CFGA_CMD_DISCONNECT argument is passed on to libcfgadm's config_change_state() function.
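 The hand-off described above, from an incoming RDR_CONF_CHANGE_STATE request to the config_change_state() call, can be sketched in Python. This is a simplified model, not the DCS code or the RDR wire format: the ChangeStateRequest class and handle_request function are hypothetical, and the callable passed in merely stands in for libcfgadm's config_change_state().

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ChangeStateRequest:
    """Hypothetical in-memory form of a change-state request."""
    ap_ids: Tuple[str, ...]
    state_change_cmd: str


def handle_request(
    op: str,
    req: ChangeStateRequest,
    config_change_state: Callable[[str, List[str]], str],
) -> str:
    """Forward a change-state request to the configuration library stand-in."""
    if op != "RDR_CONF_CHANGE_STATE":
        raise ValueError(f"unsupported operation: {op}")
    return config_change_state(req.state_change_cmd, list(req.ap_ids))


if __name__ == "__main__":
    def fake_change_state(cmd: str, ap_ids: List[str]) -> str:
        # Stand-in only: reports the arguments it received.
        return f"{cmd} on {', '.join(ap_ids)}"

    request = ChangeStateRequest(("sb4",), "CFGA_CMD_DISCONNECT")
    print(handle_request("RDR_CONF_CHANGE_STATE", request, fake_change_state))
```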
 libcfgadm
 The config_change_state() function processes the CFGA_CMD_DISCONNECT request. For each ap_id named in the state change operation, it finds and loads the correct cfgadm plugin. Because the SB part's ap_id is classified as "sbd", the plugin loaded for our sample DR SB4 detach operation is sbd.so.1.
 cfgadm_plugin (i.e., sbd plugin)
 Status ioctl via ap_stat() against the physical ap_id (i.e., /devices/pseudo/dr@0:SB4), to ensure that the board number returned by the driver matches the plugin's notion of the board number as extracted from the ap_id.
 GetNCM (i.e., get number of components) ioctl (SBD_CMD_GETNCM) against the SB part involved. This should return a value of 5: 4 procs + 1 memory controller. This is followed by a GetSTATUS ioctl (SBD_GET_STATUS).
 ap_cm_capacity() against the 4 procs + 1 memory controller involved in this DR operation. This is done in the context of RCM capacity change notifications (via librcm.so).
 ap_suspend_check() is initiated to perform a sanity check on whether the suspend operation is necessary. The returned confirmation executes the RCM request suspend, request delete capacity, and request offline operations as follows:
 CMD_RCM_SUSPEND --> CMD_RCM_CAP_DEL (CPUs) --> CMD_RCM_CAP_DEL (Memory Controller) --> CMD_RCM_OFFLINE
 The following messages would be observed from the SC (where the deleteboard CLI was initiated):
 request delete capacity (4 cpus)
 ioctl operation: ap_ioctl(SBD_CMD_UNCONFIGURE). See "DR Driver - Unconfigure" below.
 Following the successful unconfigure of the devices: ap_ioctl(SBD_CMD_DISCONNECT). See "DR Driver - Disconnect" below.
 DR Driver - Unconfigure
 The dr driver initiates dr_pre_op() with CMD = UNCONFIGURE and proceeds to check for valid state transitions.
 The states of the 4 CPUs then transition in the following order:
 CONFIGURED -> RELEASE -> UNREFERENCED
 The dr driver proceeds to acquire the memlist for the SB part's memory controller.
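 The state transition checking mentioned above can be sketched in Python as a small table-driven check. The state names are the ones used in this walkthrough for the detach direction; the table, the function names, and the rule that each state has exactly one permitted successor are simplifying assumptions for illustration, not the dr driver's actual transition table.

```python
from enum import Enum
from typing import Iterable


class State(Enum):
    CONFIGURED = "CONFIGURED"
    RELEASE = "RELEASE"
    UNREFERENCED = "UNREFERENCED"
    UNCONFIGURED = "UNCONFIGURED"
    EMPTY = "EMPTY"


# Assumed detach-direction successors, one per state (illustration only).
NEXT_STATE = {
    State.CONFIGURED: State.RELEASE,
    State.RELEASE: State.UNREFERENCED,
    State.UNREFERENCED: State.UNCONFIGURED,
    State.UNCONFIGURED: State.EMPTY,
}


def advance(current: State, target: State) -> State:
    """Move to target only if it is the permitted successor of current."""
    if NEXT_STATE.get(current) is not target:
        raise ValueError(f"invalid transition {current.value} -> {target.value}")
    return target


def walk(start: State, targets: Iterable[State]) -> State:
    """Apply a sequence of transitions, validating each step."""
    state = start
    for target in targets:
        state = advance(state, target)
    return state


if __name__ == "__main__":
    final = walk(State.CONFIGURED, [State.RELEASE, State.UNREFERENCED])
    print(final.value)
```

 Rejecting a skipped step, for example CONFIGURED directly to UNREFERENCED, is the behavior the sketch is meant to show.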
Walking through the memlist, it schedules each span for removal with kphysm_del_span(). At this stage, it decides whether a span may intersect an area occupied by the kernel cage (i.e., permanent memory) and whether a copy/rename operation is necessary to release the memory.
NOTE: In Solaris 9 with KU 118558-05 and platmod patch 117124-07, the kernel cage may be split over multiple boards. Please see Document 1012349.1 for more details.
In our simple example, SB4 does not have any permanent memory resident, and the memory's state transitions in the following order:
CONFIGURED -> RELEASE -> UNREFERENCED
The dr driver's dr_pre_detach_cpu() initiates a recursive entering of the device tree so that the drmach driver can call drmach_cpu_poweroff(). This phase correlates with the following messages reported on the domain's console:
May 6 07:56:15 v4u-15ka-a dr: OS unconfigure dr@0:SB4::cpu0
The CPUs' state then transitions in the following order:
UNREFERENCED -> UNCONFIGURED
The SB's non-permanent memory is detached by clearing the memory addresses involved and scrubbing/flushing the ecache involved. In the case involving permanent memory, a copy/rename operation is initiated.
dr_post_op (cmd = UNCONFIGURE) --> drmach_post_op() --> drmach_log_sysevent()
DR Driver - Disconnect
dr_pre_op (cmd = DISCONNECTED) runs and, upon verification of the validity of the state transitions, initiates dr_disconnect() --> drmach_board_disconnect().
As we are attempting to disconnect a slot0 board, the LPA setting for the slot1 board (i.e., IO4) must be reprogrammed: with no memory at the slot0 board there is no cacheable address space, hence the CASM slice field will be zero.
At this stage, the drmach driver calls the platform-dependent safari configurator (sc_gptwocfg) to initiate a "deprobe" operation. This is done because DR makes use of a facility called "In Kernel Probing": devices probed after stod are probed by the in-kernel prober (not OBP) to construct a device tree image.
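Returning briefly to the memory step of the unconfigure phase: the decision of whether a memlist span intersects the kernel cage, and therefore needs a copy/rename rather than a plain delete, can be sketched in Python. This is only a model of the interval test; the function names, the (base, size) tuple layout, and the addresses are hypothetical and do not reflect kphysm_del_span() or the kernel's memlist structures.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (base address, size in bytes); assumed layout


def overlaps(a: Span, b: Span) -> bool:
    """True if the half-open ranges [base, base + size) intersect."""
    a_base, a_size = a
    b_base, b_size = b
    return a_base < b_base + b_size and b_base < a_base + a_size


def plan_removal(memlist: List[Span], cage: List[Span]) -> List[Tuple[Span, str]]:
    """Label each span 'copy-rename' if it touches the cage, else 'delete'."""
    plan = []
    for span in memlist:
        touches_cage = any(overlaps(span, c) for c in cage)
        plan.append((span, "copy-rename" if touches_cage else "delete"))
    return plan


if __name__ == "__main__":
    # Made-up addresses purely for illustration.
    memlist = [(0x0000, 0x1000), (0x1000, 0x1000)]
    cage = [(0x1800, 0x100)]
    for span, action in plan_removal(memlist, cage):
        print(hex(span[0]), hex(span[1]), action)
```

In the SB4 example from this document the cage list would be empty for the board, so every span would simply be scheduled for deletion.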
The deprobe process commences with sc_unprobe_board(), which in turn calls gptwocfg_unconfigure() for each port associated with the board, starting with the root device. It traverses the device tree and identifies devices with valid portids (associated with the SB board).
The drmach driver initiates an UNCLAIM request to the SC via the IOSRAM-facilitated mailbox: i.e.,
Last DR-->SC message:
This UNCLAIM command encloses the reprogrammed CASM info for the SC (i.e., slice field = 0x0). It is followed by an UNCLAIM reply.
At this stage, the state of the CPUs and Memory Controller transitions from UNCONFIGURED to EMPTY.
DR Driver - Poweroff / Unassign
dr_pre_op (cmd = POWEROFF) runs and drmach initiates a SHOWBOARD request/reply to ascertain the board's status (power, assigned, active, t_status etc.) before finally setting up a POWEROFF request/reply to execute the poweroff operation: i.e.,
Last DR-->SC message:
Last DR-->SC message:
dr_pre_op (cmd = UNASSIGN) runs and drmach again initiates a SHOWBOARD request/reply to ascertain the board's status before setting up an UNASSIGN request/reply to ensure that the SB part is no longer in the domain's ACL.
Product
Sun Fire E25K Server
Sun Fire 15K Server
Internal Section
Keywords: starcat, DR, dynamic, reconfiguration, casm, IKP, sc_gptwocfg, dr, drmach, librcm, sbd, cfgadm_sbd, dcs, dca, config_admin, libcfgadm
Previously Published As 75994
Product_uuid
d842dd03-059b-11d8-84cb-080020a9ed93|Sun Fire E25K Server
29e4659c-0a18-11d6-9fa1-e67bbc033df8|Sun Fire 15K Server
Attachments
This solution has no attachment