Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

1012392.1 : How to isolate a processor from a running system?
PreviouslyPublishedAs 217091

Applies to:
Sun Fire 12K Server - Version Not Applicable and later
Sun Fire 15K Server - Version Not Applicable and later
Sun Fire 3800 Server - Version Not Applicable and later
Sun Fire 4800 Server - Version Not Applicable and later
Sun Fire 4810 Server - Version Not Applicable and later
All Platforms

Goal

There are several ways to "remove" a processor from a running system, but these operations have different goals and different consequences. A processor is isolated by changing its operational status: it can be taken off-line or unconfigured.

This document gives an overview of the different ways to isolate a cpu from a running system. It presents the differences between psradm -f, psradm -i and cfgadm -c unconfigure for the UltraSparc II, UltraSparc (III, IV, IV+) and SPARC64 (VI, VII, VII+) processors.

Using the appropriate status and the appropriate command is useful in many cases, such as troubleshooting and performance analysis. For example, a cpu can be offlined to see whether it plays any role in a transient hardware failure. Once a cpu is confirmed to have a hardware issue, it can be isolated using cfgadm. A cpu can also be dedicated to processing user level/system level threads and isolated from processing interrupts. Which command to use depends on the goal.

Fix

How to isolate a processor from a running system?

From a manual pages point of view :

The role of the psradm command is to change the operational status of a processor, for instance to the off-line or no-intr status.
The role of the cfgadm command is to dynamically reconfigure hardware resources, here to unconfigure a processor.

- An off-line processor does not process any LWPs. Usually, an off-line processor is not interruptible by I/O devices in the system.
  On some processors or under certain conditions, it may not be possible to disable interrupts for an off-line processor. Thus, the actual effect of being off-line may vary from machine to machine.
- A no-intr processor processes LWPs but is not interruptible by I/O devices.
- A component is unconfigured when it is not available for use by the Solaris Operating Environment.

The default status of a processor is on-line. Let's see what this means on various CPUs.

UltraSparc II/III :

Offlining a processor :

Both USII and USIII processors are taken off-line in the same way, and the consequences on the processor state are similar. This is done with the psradm -f processor_id command.

Example from a 4 procs SF15K domain :

Initial state :

    # psrinfo
    96      on-line   since 09/14/2004 22:18:00
    97      on-line   since 09/14/2004 22:18:00
    98      on-line   since 09/14/2004 22:18:00
    99      on-line   since 09/14/2004 22:18:00
    # psradm -f 97
    # psrinfo
    96      on-line   since 09/14/2004 22:18:00
    97      off-line  since 09/24/2004 13:11:13
    98      on-line   since 09/14/2004 22:18:00
    99      on-line   since 09/14/2004 22:18:00

When a proc is off-line, it is excluded from scheduling.

    # echo "::cpuinfo -v" | mdb -k
     ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH    THREAD      PROC
     97 3002383e000  2f    0    0  -1   no    no t-1028515 2a100333d40 (idle)
                      |
           RUNNING <--+
             READY
          QUIESCED
            EXISTS
           OFFLINE

compared to an on-line proc :

     ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH    THREAD      PROC
     98 3002383a000  1b    0    0  -1   no    no t-37      2a10032bd40 (idle)
                      |
           RUNNING <--+
             READY
            EXISTS
            ENABLE

As reported in the previous output, the off-line processor is running the idle thread. QUIESCED means that the processor stays in the idle loop: it spins in a tight loop, no longer processes any LWPs and does not handle device interrupts. However, the proc remains in the cpu_ready_set, which means it still receives all xt_all() cross traps (incl. E$ scrubber), xc_all() cross calls and softints.
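The FLG column in the ::cpuinfo outputs above is a bit mask, and the flag names that mdb prints underneath can be recovered from it. The following is a minimal sketch of that decoding. The function name decode_cpu_flags is hypothetical, and the bit values are an assumption inferred from the FLG values and flag lists shown in this document (2f, 1b, b), not taken from the kernel headers.

```shell
#!/bin/sh
# decode_cpu_flags HEX
# Print the flag names for a FLG value as shown by "::cpuinfo -v".
# Assumption: bit values are inferred from the outputs in this document.
decode_cpu_flags() {
    flg=$(( 0x$1 ))          # FLG is printed in hex without a 0x prefix
    out=""
    for pair in 1:RUNNING 2:READY 4:QUIESCED 8:EXISTS 16:ENABLE 32:OFFLINE; do
        bit=${pair%%:*}      # decimal bit value
        name=${pair#*:}      # flag name
        if [ $(( flg & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_cpu_flags 2f   # FLG of an off-line processor
decode_cpu_flags 1b   # FLG of an on-line processor
decode_cpu_flags b    # FLG of a no-intr processor
```

Under these assumptions the three calls print the same flag lists that mdb shows for off-line, on-line and no-intr processors in this document.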
Notes on the no-intr status :

Example from a 4 procs SF15K domain :

    # psradm -i 99
    # psrinfo
    96      on-line   since 09/14/2004 22:18:00
    97      off-line  since 09/24/2004 13:11:13
    98      on-line   since 09/29/2004 11:05:39
    99      no-intr   since 09/29/2004 11:05:43
    # echo "::cpuinfo -v" | mdb -k
     ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH    THREAD      PROC
     99 300238ce000   b    0    0   0   no    no t-0       30047c1c000 sleep
                      |
           RUNNING <--+
             READY
            EXISTS

compared to an on-line proc :

     ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH    THREAD      PROC
     98 3002383a000  1b    0    0  -1   no    no t-37      2a10032bd40 (idle)
                      |
           RUNNING <--+
             READY
            EXISTS
            ENABLE

A no-intr processor is still part of the scheduler; LWPs can be scheduled on the proc.

In both cases, off-line and no-intr, only the status at the Solaris level has changed; these state changes are not relevant in OBP. These cpus also remain totally visible in Solaris, because off-line and no-intr are changes to the state of the cpu inside Solaris.

    # prtconf -vp | grep "name: 'SUNW,UltraSPARC-III"
    name: 'SUNW,UltraSPARC-III+'
    name: 'SUNW,UltraSPARC-III+'
    name: 'SUNW,UltraSPARC-III+'
    name: 'SUNW,UltraSPARC-III+'

Note that offlining a processor may fail for several reasons. man psradm lists the error conditions and the reason for each of them. In particular:

- At least one processor in the system must be able to process LWPs.
- A processor may not be taken off-line if there are LWPs that are bound to the processor.
- At least one processor must also be able to be interrupted.

It's noticeable that, although the memory controller resides on the processor with the USIII architecture, when a processor is off-line, the associated memory is still accessible.

A processor can be unconfigured :

As per the definition, the unconfigure operation consists in removing a resource from the system so that it can no longer be used by Solaris.
Ex from a Sun Fire 4800 :

    # cfgadm -c unconfigure N0.SB4::cpu3
    cfgadm: Hardware specific failure: unconfigure N0.SB4::cpu3: Can't unconfig cpu if mem online: /ssm@0,0/memory-controller@13,400000

Ex from an Enterprise 10000 :

    # cfgadm -c unconfigure SB9::cpu3
    cfgadm: Hardware specific failure: unconfigure SB9::cpu3: Operation not supported

The unconfigure operation can be done via the cfgadm -c unconfigure Ap_Id command.

    # cfgadm -c unconfigure SB3::cpu0
    # cfgadm -alv -s "match=partial,select=type(cpu)"
    Ap_Id        Receptacle   Occupant     Condition  Information
    When         Type         Busy         Phys_Id
    SB3::cpu0    connected    unconfigured ok         cpuid 96, speed 1200 MHz, ecache 8 MBytes
    Sep 24 12:57 cpu          n            /devices/pseudo/dr@0:SB3::cpu0
    SB3::cpu1    connected    configured   ok         cpuid 97, speed 1200 MHz, ecache 8 MBytes
    Sep 24 12:52 cpu          n            /devices/pseudo/dr@0:SB3::cpu1
    SB3::cpu2    connected    configured   ok         cpuid 98, speed 1200 MHz, ecache 8 MBytes
    Sep 24 12:52 cpu          n            /devices/pseudo/dr@0:SB3::cpu2
    SB3::cpu3    connected    configured   ok         cpuid 99, speed 1200 MHz, ecache 8 MBytes
    Sep 24 12:52 cpu          n            /devices/pseudo/dr@0:SB3::cpu3

When you unconfigure a cpu, the cpu is removed from the scope of the Solaris kernel and takes no part in scheduling or interrupt processing. The Solaris device tree no longer contains this cpu (resource).

    # psrinfo
    97      off-line  since 09/24/2004 13:11:13
    98      on-line   since 09/29/2004 11:05:39
    99      no-intr   since 09/29/2004 11:05:43
    # echo "ncpus/D" | mdb -k
    physmem 4ddf90
    ncpus:
    ncpus:          3

Though the proc is no longer available to Solaris, the unconfigured proc is still seen via OBP, because the OBP device tree does not reflect the change.

    # prtconf -vp | grep "name: 'SUNW,UltraSPARC-III"
    name: 'SUNW,UltraSPARC-III+'
    name: 'SUNW,UltraSPARC-III+'
    name: 'SUNW,UltraSPARC-III+'
    name: 'SUNW,UltraSPARC-III+'

Note that cfgadm uses cpu_offline() as part of the removal process.
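The gap between what OBP shows and what Solaris shows can be turned into a quick check for unconfigured processors. The sketch below counts the cpu nodes in prtconf -vp style text and the entries in psrinfo style text, and prints the difference. The function name count_hidden_cpus is hypothetical, the sample text is copied from the outputs above, and the approach assumes single-core UltraSPARC-III chips, where one OBP node corresponds to one Solaris processor ID.

```shell
#!/bin/sh
# count_hidden_cpus OBP_TEXT PSRINFO_TEXT
# Print how many cpu nodes are visible in OBP but absent from psrinfo.
# Assumption: one OBP node per Solaris processor ID (single-core chips).
count_hidden_cpus() {
    obp=$(printf '%s\n' "$1" | awk '/SUNW,UltraSPARC-III/ { n++ } END { print n + 0 }')
    os=$(printf '%s\n' "$2" | awk 'NF > 0 { n++ } END { print n + 0 }')
    echo $(( obp - os ))
}

# Sample text taken from the prtconf and psrinfo outputs in this document.
obp_nodes="name: 'SUNW,UltraSPARC-III+'
name: 'SUNW,UltraSPARC-III+'
name: 'SUNW,UltraSPARC-III+'
name: 'SUNW,UltraSPARC-III+'"

solaris_cpus="97 off-line since 09/24/2004 13:11:13
98 on-line since 09/29/2004 11:05:39
99 no-intr since 09/29/2004 11:05:43"

count_hidden_cpus "$obp_nodes" "$solaris_cpus"
```

On a live domain the two arguments would presumably come from the prtconf -vp | grep and psrinfo commands shown above. Off-line and no-intr processors do not count as hidden, since psrinfo still lists them.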
It's noticeable that, although the memory controller resides on the processor with the USIII architecture, when a processor is unconfigured, the associated memory is still accessible :

Original configuration :

    # prtdiag -v
    System Configuration: Sun Microsystems sun4u Sun Fire 15000
    System clock frequency: 150 MHz
    Memory size: 16384 Megabytes

    ========================= CPUs =========================

                CPU      Run    E$    CPU      CPU
    Slot ID     ID       MHz    MB    Impl.    Mask
    --------    -------  ----   ----  -------  ----
    /SB11/P0    352      1200   8.0   US-III+  11.0
    /SB11/P1    353      1200   8.0   US-III+  11.0
    /SB11/P2    354      1200   8.0   US-III+  11.0
    /SB11/P3    355      1200   8.0   US-III+  11.0

    # cfgadm -alv
    Ap_Id        Receptacle   Occupant     Condition  Information
    When         Type         Busy         Phys_Id
    SB11         connected    configured   ok         powered-on, assigned
    Jun 5 11:44  CPU          n            /devices/pseudo/dr@0:SB11
    SB11::cpu0   connected    configured   ok         cpuid 352, speed 1200 MHz, ecache 8 MBytes
    Jun 5 11:44  cpu          n            /devices/pseudo/dr@0:SB11::cpu0
    SB11::cpu1   connected    configured   ok         cpuid 353, speed 1200 MHz, ecache 8 MBytes
    Jun 5 11:44  cpu          n            /devices/pseudo/dr@0:SB11::cpu1
    SB11::cpu2   connected    configured   ok         cpuid 354, speed 1200 MHz, ecache 8 MBytes
    Jun 5 11:44  cpu          n            /devices/pseudo/dr@0:SB11::cpu2
    SB11::cpu3   connected    configured   ok         cpuid 355, speed 1200 MHz, ecache 8 MBytes
    Jun 5 11:44  cpu          n            /devices/pseudo/dr@0:SB11::cpu3
    SB11::memory connected    configured   ok         base address 0x1e000000000, 16777216 KBytes total, 1040312 KBytes permanent
    Jun 5 11:51  memory       n            /devices/pseudo/dr@0:SB11::memory
    c0           connected    configured   unknown
    [...]

    # psrinfo
    352     on-line   since 06/05/2007 11:44:41
    353     on-line   since 06/05/2007 11:44:41
    354     on-line   since 06/05/2007 11:44:41
    355     on-line   since 06/05/2007 11:44:41
    # cfgadm -c unconfigure SB11::cpu0
    OS unconfigure dr@0:SB11::cpu0
    # psrinfo
    353     on-line   since 06/05/2007 11:44:41
    354     on-line   since 06/05/2007 11:44:41
    355     on-line   since 06/05/2007 11:44:41

No change in the memory configuration :

    # cfgadm -alv
    Ap_Id        Receptacle   Occupant     Condition  Information
    When         Type         Busy         Phys_Id
    SB11         connected    configured   ok         powered-on, assigned
    Jun 5 11:58  CPU          n            /devices/pseudo/dr@0:SB11
    SB11::cpu0   connected    unconfigured ok         cpuid 352, speed 1200 MHz, ecache 8 MBytes
    Jun 5 11:58  cpu          n            /devices/pseudo/dr@0:SB11::cpu0
    SB11::cpu1   connected    configured   ok         cpuid 353, speed 1200 MHz, ecache 8 MBytes
    Jun 5 11:44  cpu          n            /devices/pseudo/dr@0:SB11::cpu1
    SB11::cpu2   connected    configured   ok         cpuid 354, speed 1200 MHz, ecache 8 MBytes
    Jun 5 11:44  cpu          n            /devices/pseudo/dr@0:SB11::cpu2
    SB11::cpu3   connected    configured   ok         cpuid 355, speed 1200 MHz, ecache 8 MBytes
    Jun 5 11:44  cpu          n            /devices/pseudo/dr@0:SB11::cpu3
    SB11::memory connected    configured   ok         base address 0x1e000000000, 16777216 KBytes total, 1040312 KBytes permanent
    Jun 5 11:51  memory       n            /devices/pseudo/dr@0:SB11::memory

The amount of memory available to the domain is still the same :

    # prtconf -pv | grep Memory
    Memory size: 16384 Megabytes
    # prtdiag -v | more
    System Configuration: Sun Microsystems sun4u Sun Fire 15000
    System clock frequency: 150 MHz
    Memory size: 16384 Megabytes
    [...]
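When checking the effect of an unconfigure operation, it can help to compare two psrinfo snapshots mechanically rather than by eye. The following sketch prints the processor IDs that appear in the first snapshot but not in the second. The function name vanished_cpus is hypothetical, and the sample text is copied from the psrinfo outputs above.

```shell
#!/bin/sh
# vanished_cpus BEFORE AFTER
# Print processor IDs listed in the BEFORE psrinfo text but not in AFTER.
vanished_cpus() {
    before=$1
    after=$2
    printf '%s\n' "$before" | while read -r id rest; do
        [ -n "$id" ] || continue
        # awk exits 0 only if some line of AFTER starts with this ID
        if ! printf '%s\n' "$after" |
            awk -v id="$id" '$1 == id { found = 1 } END { exit !found }'; then
            echo "$id"
        fi
    done
}

# Sample text taken from the psrinfo outputs in this document.
before="352 on-line since 06/05/2007 11:44:41
353 on-line since 06/05/2007 11:44:41
354 on-line since 06/05/2007 11:44:41
355 on-line since 06/05/2007 11:44:41"

after="353 on-line since 06/05/2007 11:44:41
354 on-line since 06/05/2007 11:44:41
355 on-line since 06/05/2007 11:44:41"

vanished_cpus "$before" "$after"
```

With the sample text, only processor 352 (SB11::cpu0) is reported. A processor that was merely taken off-line would not be reported, because psrinfo still lists it.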
UltraSparc IV / UltraSparc IV+ :

Reminder : The above reasoning also applies to the UltraSPARC IV+ processor, which has a 2 MB Level-2 cache (on-chip tags and data) and a 32 MB Level-3 cache (on-chip tags, off-chip data) that is exclusive of the L2 cache.

From a 4 procs (8 cores) SF15K domain :

    # prtdiag -v
    System Configuration: Sun Microsystems sun4u Sun Fire 15000
    System clock frequency: 150 MHz
    Memory size: 16384 Megabytes

    ========================= CPUs =========================

                CPU      Run    E$    CPU      CPU
    Slot ID     ID       MHz    MB    Impl.    Mask
    --------    -------  ----   ----  -------  ----
    /SB02/P0    64, 68   1050   16.0  US-IV    2.3
    /SB02/P1    65, 69   1050   16.0  US-IV    2.3
    /SB02/P2    66, 70   1050   16.0  US-IV    2.3
    /SB02/P3    67, 71   1050   16.0  US-IV    2.3

    # psrinfo
    64      on-line   since 09/29/2004 22:29:06
    65      on-line   since 09/29/2004 22:29:06
    66      on-line   since 09/29/2004 22:29:07
    67      on-line   since 09/29/2004 22:29:06
    68      on-line   since 09/29/2004 22:29:06
    69      on-line   since 09/29/2004 22:29:07
    70      on-line   since 09/29/2004 22:29:07
    71      on-line   since 09/29/2004 22:29:06

A core can be off-line :

    # psradm -f 65
    # psrinfo
    64      on-line   since 09/29/2004 22:29:06
    65      off-line  since 09/30/2004 10:45:15
    66      on-line   since 09/29/2004 22:29:07
    67      on-line   since 09/29/2004 22:29:07
    68      on-line   since 09/29/2004 22:29:07
    69      on-line   since 09/29/2004 22:29:07
    70      on-line   since 09/29/2004 22:29:07
    71      on-line   since 09/29/2004 22:29:07

As with USII and USIII processors, an off-line core differs from an on-line core in that it is excluded from scheduling, runs the idle thread, and may still be interrupted by cross traps and cross calls.

In the following example, processors 65 and 69 are the 2 cores of the same USIV cpu.
    # echo "::cpuinfo -v" | mdb -k
     ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH    THREAD      PROC
     65 30019610000  2f    0    0  -1   no    no t-1402587 2a100013d40 (idle)
                      |
           RUNNING <--+
             READY
          QUIESCED
            EXISTS
           OFFLINE

     ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH    THREAD      PROC
     69 3001965e000  1b    0    0  -1   no    no t-37      2a100353d40 (idle)
                      |
           RUNNING <--+
             READY
            EXISTS
            ENABLE

A core cannot be unconfigured, but the cpu (2 cores) can be unconfigured :

Each CPU attachment point represents two CPUID numbers because, from a DR perspective, Solaris treats the two cores of a cpu as a single entity.

    # cfgadm -alv -s "match=partial,select=type(cpu)"
    Ap_Id        Receptacle   Occupant     Condition  Information
    When         Type         Busy         Phys_Id
    SB2::cpu0    connected    configured   ok         cpuid 64 and 68, speed 1050 MHz, ecache 16 MBytes
    Sep 29 22:36 cpu          n            /devices/pseudo/dr@0:SB2::cpu0
    SB2::cpu1    connected    configured   ok         cpuid 65 and 69, speed 1050 MHz, ecache 16 MBytes
    Sep 29 22:36 cpu          n            /devices/pseudo/dr@0:SB2::cpu1
    SB2::cpu2    connected    configured   ok         cpuid 66 and 70, speed 1050 MHz, ecache 16 MBytes
    Sep 29 22:36 cpu          n            /devices/pseudo/dr@0:SB2::cpu2
    SB2::cpu3    connected    configured   ok         cpuid 67 and 71, speed 1050 MHz, ecache 16 MBytes
    Sep 30 10:46 cpu          n            /devices/pseudo/dr@0:SB2::cpu3
    # cfgadm -c unconfigure SB2::cpu3
    # cfgadm -alv -s "match=partial,select=type(cpu)"
    Ap_Id        Receptacle   Occupant     Condition  Information
    When         Type         Busy         Phys_Id
    SB2::cpu0    connected    configured   ok         cpuid 64 and 68, speed 1050 MHz, ecache 16 MBytes
    Sep 29 22:36 cpu          n            /devices/pseudo/dr@0:SB2::cpu0
    SB2::cpu1    connected    configured   ok         cpuid 65 and 69, speed 1050 MHz, ecache 16 MBytes
    Sep 29 22:36 cpu          n            /devices/pseudo/dr@0:SB2::cpu1
    SB2::cpu2    connected    configured   ok         cpuid 66 and 70, speed 1050 MHz, ecache 16 MBytes
    Sep 29 22:36 cpu          n            /devices/pseudo/dr@0:SB2::cpu2
    SB2::cpu3    connected    unconfigured ok         cpuid 67 and 71, speed 1050 MHz, ecache 16 MBytes
    Sep 30 14:27 cpu          n            /devices/pseudo/dr@0:SB2::cpu3

So, 2 cores are now missing from the original configuration :

    # psrinfo
    64      on-line   since 09/29/2004 22:29:06
    65      off-line  since 09/30/2004 10:45:15
    66      on-line   since 09/29/2004 22:29:07
    68      no-intr   since 09/30/2004 10:45:27
    69      on-line   since 09/29/2004 22:29:07
    70      on-line   since 09/29/2004 22:29:07
    # echo "ncpus/D" | mdb -k
    ncpus:
    ncpus:          6

and, as usual, all the cores are visible at the OBP level.

    # prtconf -vp | grep "SUNW,UltraSPARC-IV"
    compatible: 'SUNW,UltraSPARC-IV'
    compatible: 'SUNW,UltraSPARC-IV'
    compatible: 'SUNW,UltraSPARC-IV'
    compatible: 'SUNW,UltraSPARC-IV'
    compatible: 'SUNW,UltraSPARC-IV'
    compatible: 'SUNW,UltraSPARC-IV'
    compatible: 'SUNW,UltraSPARC-IV'
    compatible: 'SUNW,UltraSPARC-IV'

As before, when a processor (2 cores) is unconfigured, the associated memory is still accessible; the amount of memory available to the domain is still the same :

    # prtdiag -v
    System Configuration: Sun Microsystems sun4u Sun Fire 15000
    System clock frequency: 150 MHz
    Memory size: 16384 Megabytes
    [...output omitted]

SPARC64 (VI, VII, VII+) :

Reminder : A SPARC64 cpu offers two or four SPARC V9 cores and two vertical threads (two CMT strands) per core. It has a 5-12 MB on-chip shared L2$ and no external cache. The memory controller (MAC) is off-chip.

The following reasoning is applicable to the M4000, M5000, M8000 and M9000 domains. The processor numbering is based on the Logical System Board mapping; therefore, the numbering is common to the Mid-Range Servers (M4000 + M5000) and High-End Servers (M8000 + M9000). See Document 1005329.1 for more details.

Since Solaris sees each strand as an individual processor, the strands are reported in the psrinfo output :

Example from a M9000-32 domain composed of one CMU : (4 * CPUM) * (2 * cores) * (2 * strands) => 16 processors

    # echo "ncpus/D" | mdb -k
    ncpus:
    ncpus:          16
    # prtdiag -v
    System Configuration: Sun Microsystems sun4u Sun SPARC Enterprise M9000 Server
    System clock frequency: 960 MHz
    Memory size: 32768 Megabytes

    ==================================== CPUs ====================================

          CPU   CPU                    Run    L2$   CPU     CPU
    LSB   Chip  ID                     MHz    MB    Impl.   Mask
    ---   ----  --------------------   ----   ---   -----   ----
    00    0     0, 1, 2, 3             2280   5.0   6       146
    00    1     8, 9, 10, 11           2280   5.0   6       146
    00    2     16, 17, 18, 19         2280   5.0   6       146
    00    3     24, 25, 26, 27         2280   5.0   6       146

    # psrinfo
    0       on-line   since 05/23/2007 16:07:08
    1       on-line   since 05/23/2007 16:07:09
    2       on-line   since 05/23/2007 16:07:09
    3       on-line   since 05/23/2007 16:07:09
    8       on-line   since 05/23/2007 16:07:09
    9       on-line   since 05/23/2007 16:07:09
    10      on-line   since 05/23/2007 16:07:09
    11      on-line   since 05/23/2007 16:07:09
    16      on-line   since 05/23/2007 16:07:09
    17      on-line   since 05/23/2007 16:07:09
    18      on-line   since 05/23/2007 16:07:09
    19      on-line   since 05/23/2007 16:07:09
    24      on-line   since 05/23/2007 16:07:09
    25      on-line   since 05/23/2007 16:07:09
    26      on-line   since 05/25/2007 06:57:15
    27      on-line   since 05/23/2007 16:07:09

Note : To determine the physical location of the component, a 'showboards -v' for the domain can be collected from the active XSCF.

    XSCF> showboards -d 1
    XSB   DID(LSB)  Assignment   Pwr   Conn  Conf  Test     Fault
    ----  --------  -----------  ----  ----  ----  -------  --------
    08-0  01(00)    Assigned     y     y     y     Passed   Normal

In this example, the processors listed in the prtdiag/psrinfo outputs belong to CMU#8, which is associated with LSB#0 of domain 1.

The information about the processors is also available from the main SP :

    XSCF> showdevices -d 1
    CPU:
    ----
    DID  XSB   id  state    speed  ecache
    01   08-0  0   on-line  2280   5
    01   08-0  1   on-line  2280   5
    01   08-0  2   on-line  2280   5
    01   08-0  3   on-line  2280   5
    01   08-0  8   on-line  2280   5
    01   08-0  9   on-line  2280   5
    01   08-0  10  on-line  2280   5
    01   08-0  11  on-line  2280   5
    01   08-0  16  on-line  2280   5
    01   08-0  17  on-line  2280   5
    01   08-0  18  on-line  2280   5
    01   08-0  19  on-line  2280   5
    01   08-0  24  on-line  2280   5
    01   08-0  25  on-line  2280   5
    01   08-0  26  on-line  2280   5
    01   08-0  27  on-line  2280   5
    [...]
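The processor IDs in the prtdiag and psrinfo outputs above follow a regular pattern: each chip starts at a multiple of 8, and each core contributes two consecutive strand IDs. The sketch below splits an ID back into its components on that basis. The function name cpuid_to_location is hypothetical. The chip, core and strand terms match the IDs shown for this domain; the LSB term (32 IDs per LSB) is an assumption, since the example domain only contains LSB 0.

```shell
#!/bin/sh
# cpuid_to_location ID
# Split a Solaris processor ID into LSB, chip, core and strand.
# Assumption: ID = LSB * 32 + chip * 8 + core * 2 + strand.
cpuid_to_location() {
    id=$1
    lsb=$(( id / 32 ))           # assumed: 32 IDs reserved per LSB
    chip=$(( (id % 32) / 8 ))    # each chip starts at a multiple of 8
    core=$(( (id % 8) / 2 ))     # two strands per core
    strand=$(( id % 2 ))
    echo "LSB $lsb chip $chip core $core strand $strand"
}

cpuid_to_location 26
cpuid_to_location 9
```

For ID 26 this yields chip 3, which agrees with the prtdiag row listing IDs 24 to 27 on chip 3 of LSB 00.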
A processor can be off-line :

    # psradm -f 2
    # psrinfo
    0       on-line   since 05/23/2007 16:07:08
    1       on-line   since 05/23/2007 16:07:09
    2       off-line  since 05/25/2007 07:09:05
    3       on-line   since 05/23/2007 16:07:09
    8       on-line   since 05/23/2007 16:07:09
    [...]

As with UltraSparc processors, an off-line processor differs from an on-line one in that it is excluded from scheduling, runs the idle thread, and may still be interrupted by cross traps and cross calls.

In the following example, processors 0, 1, 2 and 3 are the (2 cores * 2 strands) of the same SPARC64 VI CPUM.

    # echo "::cpuinfo -v" | mdb -k
     ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH    THREAD      PROC
      0 0000180c000  1b    0    0  -1   no    no t-58      2a10001fcc0 (idle)
                      |
           RUNNING <--+
             READY
            EXISTS
            ENABLE

     ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH    THREAD      PROC
      1 3000405a000  1b    0    0  -1   no    no t-5961    2a1004c9cc0 (idle)
                      |
           RUNNING <--+
             READY
            EXISTS
            ENABLE

     ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH    THREAD      PROC
      2 3000405e000  2f    0    0  -1   no    no t-7468    2a100551cc0 (idle)
                      |
           RUNNING <--+
             READY
          QUIESCED
            EXISTS
           OFFLINE

As seen with the other SPARC cpus, when a SPARC64 VI processor is off-line, it is still visible in OBP :

    # prtconf -vp | grep "SPARC64"
    compatible: 'FJSV,SPARC64-VI'
    compatible: 'FJSV,SPARC64-VI'
    compatible: 'FJSV,SPARC64-VI'
    compatible: 'FJSV,SPARC64-VI'
    compatible: 'FJSV,SPARC64-VI'
    compatible: 'FJSV,SPARC64-VI'
    compatible: 'FJSV,SPARC64-VI'
    compatible: 'FJSV,SPARC64-VI'

And, of course, the memory remains accessible if one or more processors are off-line :

    # prtdiag -v
    System Configuration: Sun Microsystems sun4u Sun SPARC Enterprise M9000 Server
    System clock frequency: 960 MHz
    Memory size: 32768 Megabytes

A SPARC64 VI processor cannot be unconfigured :

From a cfgadm perspective, we can see each CPUM reported as one entity :

    # cfgadm -a -s "cols=ap_id:info,select=type(cpu)"
    Ap_Id       Information
    SB0::cpu0   cpuid 0, 1, 2, and 3, speed 2280 MHz, ecache 5 MBytes
    SB0::cpu1   cpuid 8, 9, 10, and 11, speed 2280 MHz, ecache 5 MBytes
    SB0::cpu2   cpuid 16, 17, 18, and 19, speed 2280 MHz, ecache 5 MBytes
    SB0::cpu3   cpuid 24, 25, 26, and 27, speed 2280 MHz, ecache 5 MBytes

Dynamic Reconfiguration on the Sun SPARC Enterprise Mx000 Servers is a "SP initiated" model; therefore neither a CPUM, nor a core, nor a strand can be unconfigured from Solaris :

    # cfgadm -c unconfigure SB0::cpu2
    May 25 07:13:48 mammothcar-b drmach: WARNING: Operation not supported
    cfgadm: Hardware specific failure: unconfigure SB0::cpu2: Operation not supported

As a summary :

- Since Solaris does not differentiate cores/strands/cpus, each entity appears to Solaris as a cpu, so the application level command psradm does not behave differently on different CPUs. Whether it is US-II, US-III, US-IV[+] or SPARC64 VI, it's the same: each entity seen as a cpu is handled in the same way.
- cfgadm, on the other hand, gets its information from the low-level device tree and is closer to the hardware. It knows the difference between core/strand/chip/cpu etc., so its behavior depends on the type of the cpu.
- off-line processors (Quiesced) are not completely idle: they still handle cross calls and traps and run the idle thread, and the E$ scrubber continues to cross-trap to an offlined proc. off-line means a processor is not part of the scheduler and does not take device interrupts, but it still takes software interrupts and remains part of the cpu_ready_set, taking part in demap cross calls.
- no-intr means the cpu is part of the scheduler and does not take device interrupts. Again, software interrupts are an exception and the cpu still takes them.
- Unconfigured procs are completely removed from Solaris scope but are still visible at the OBP level. The unconfigure operation is part of the DR process, where a cpu is physically made to go back to a tight loop and is removed from Solaris's device tree. Solaris no longer has any idea about the existence of this cpu. This is totally different from the psradm operations, which are done at user level and are controlled by the OS.
- Possible state combinations are :
    After boot/initialization : RUNNING, READY, EXISTS, ENABLE
    Interrupts disabled : RUNNING, READY, EXISTS
    Offline : RUNNING, READY, QUIESCED, EXISTS, OFFLINE
- In all the cases, UltraSparc II, UltraSparc III, UltraSparc IV and UltraSparc IV+, if the system reboots/crashes/dstops/hangs, and/or if a system recovery occurs, the system will be brought back up in default mode : all processors configured and online.
- All the memory available before offlining/unconfiguring procs remains available after the operation.

What's new with Solaris 10

FMA (Fault Management Architecture) introduces 2 new states:

    FAULTED : processor is offline due to a fault
    SPARE : processor is offline and waiting as a spare

Moreover, psradm introduces 2 new options, -F and -s; for example, -Fs forces a processor into the "spare" state.

The force option forces a processor to be offlined, set to faulted or set to spare even if there are processes bound to that processor. In this case, the binding is revoked for these processes.

In many respects the "spare" state is similar to the "offline" state. The difference is that a processor in this state cannot be changed to a different state unless the user has the appropriate privilege. The "spare" state adds a meaningful semantic to distinguish offline processors on the system for purposes of automated resource management.
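To see at a glance how many processors are in each of the states discussed in this document, the psrinfo output can be tallied by its second column. The following is a minimal sketch; the function name summarize_states is hypothetical, and the "spare" and "faulted" state strings are an assumption about how psrinfo names the Solaris 10 states, since this document shows no output for them.

```shell
#!/bin/sh
# summarize_states
# Read psrinfo-style text on stdin and print a count per processor state.
# Assumption: the state is the second field of every psrinfo line.
summarize_states() {
    awk '
        NF >= 2 { count[$2]++ }
        END {
            # fixed order so the report is stable from run to run
            n = split("on-line off-line no-intr spare faulted", order, " ")
            for (i = 1; i <= n; i++)
                if (order[i] in count)
                    print order[i], count[order[i]]
        }
    '
}

# Sample text taken from the UltraSparc IV psrinfo output in this document.
sample="64 on-line since 09/29/2004 22:29:06
65 off-line since 09/30/2004 10:45:15
66 on-line since 09/29/2004 22:29:07
68 no-intr since 09/30/2004 10:45:27
69 on-line since 09/29/2004 22:29:07
70 on-line since 09/29/2004 22:29:07"

printf '%s\n' "$sample" | summarize_states
```

On a live system the function would presumably be fed directly, as in psrinfo | summarize_states. Unconfigured processors do not appear in the tally because psrinfo no longer lists them.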