Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type Technical Instruction Sure Solution 1020078.1 : Sun SPARC(R) Enterprise Mx000 (OPL) Servers: How to deal with a hung or unresponsive domain ?
PreviouslyPublishedAs 251786
Applies to:
Sun SPARC Enterprise M9000-64 Server - Version: Not Applicable
Sun SPARC Enterprise M3000 Server - Version: Not Applicable and later [Release: N/A and later]
Sun SPARC Enterprise M4000 Server - Version: Not Applicable and later [Release: N/A and later]
Sun SPARC Enterprise M5000 Server - Version: Not Applicable and later [Release: N/A and later]
Sun SPARC Enterprise M8000 Server - Version: Not Applicable and later [Release: N/A and later]
All Platforms

Goal
This document describes the Alive check mechanism and gives guidance on how to manage hang-up situations on OPL domains.
A mechanism monitors the domains and detects hang-up situations : the Alive Checking / Monitoring (also known as the Host Watchdog).
To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - M Series Servers

Solution
From Table 2-27 of the Sun SPARC Enterprise M3000/M4000/M5000/M8000/M9000 XSCF User's Guide:
Host watchdog : Monitoring the Solaris domain via the SCF driver
When Secure Mode is set to “on”, the Alive checking function is enabled.
The scfd driver has a configuration file : /platform/SUNW,SPARC-Enterprise/kernel/drv/scfd.conf.
Note : the mechanism must be enabled on both the XSCF and the Solaris domain for the Alive check function to be enabled.

What happens when a domain hangs while running Solaris ?
The domain did not respond to a keepalive message. The lack of
response is probably due to a software problem on the domain. If this
happens, the XSCF will send a message to the domain asking it to
panic.
The domain did not panic in response to the panic request. The
lack of a panic is probably due to a software problem, although there
may be a hardware problem that caused this.
The domain does not respond to the XIR interrupt. This is likely
to be a hardware problem, although there are software problems that
can cause this situation.
The domain reset did not occur. This is certain to be a hardware
problem (domain software is not involved with a reset request).
Note : The keyswitch position and Secure mode setting are available in the snapshot.

Oct 22 18:53:08.6733 ereport.chassis.domain.panic
Oct 22 19:18:09.0490 ereport.chassis.domain.keepalive.panic-fail

Date: Oct 22 18:53:08 CDT 2008
Code: 60000000-ffffffff-0109001500000000
Status: Warning
Occurred: Oct 22 18:48:08.422 CDT 2008
FRU: /UNSPECIFIED,/UNSPECIFIED
Msg: XSCF command: System status change (OS panic) (DID#00, path: 00)
Diagnostic Code:
    00000000 00000000 00000000 00002140 01000000 00000000
    00000000 00000000 00000000 00000000 00000000
UUID: 8e53ef5d-10d3-4a6b-bd1d-fa757f247f8d
MSG-ID: SCF-8005-PX

Date: Oct 22 19:18:09 CDT 2008
Code: 60000000-fcff0000-0109000700000000
Status: Warning
Occurred: Oct 22 19:18:08.480 CDT 2008
FRU: /DOMAIN#0
Msg: Domain hang-up detected (panic), DID 0
Diagnostic Code:
    00000000 00000000 00000000 00000000 00002000 00000000
    00000000 00000000 00000000 00000000 00000000
UUID: 8be1af8c-313b-4a03-9c22-4215d33ac89c
MSG-ID: SCF-8005-US

Date: Oct 22 19:18:24 CDT 2008
Code: 60000500-ffff0000-0300000800030000
Status: Warning
Occurred: Oct 22 19:18:23.791 CDT 2008
FRU: /UNSPECIFIED
Msg: Externally initiated reset occurred
Diagnostic Code:
    ffffffff ffff0000 00000000 58495200 00000000 00000000
    00000000 00000000 00000000 00000000 00000000
UUID: a54ebfc1-a260-4bff-b108-5f65819bffc3
MSG-ID: SCF-8008-3U

Diagnostic Messages

Monitoring POST/OBP
What happens when a domain hangs while running POST/OBP ?
Note : POST/OBP are monitored regardless of the keyswitch position (Service / Locked).
The mechanism is slightly different, as monitoring POST/OBP does not use the Alive interrupts.
If the domain is running POST or OBP, then msg-fail, panic-fail, and xir-fail cannot occur. Instead, if the keepalive fails, the XSCF immediately performs a domain reset. If this domain reset fails, the reset-fail ereport is issued.
To investigate further after a domain has been forced to XIR/panic/reset by the XSCF, you must collect a full snapshot, a domain explorer and any existing corefiles.
If the Alive check is not enabled, or the XSCF has taken no action to recover from the hang-up situation, a manual operation is required from the user.

Steps to Follow
This section provides a step-by-step procedure to recover from a hang-up situation and to collect the appropriate information for post-mortem analysis.
First of all, check the Secure mode setting (showdomainmode) and the keyswitch position (showhardconf). These may influence the result of the recovery actions.
Note : it's also possible to break the domain by using the "CTRL-\" combination.
Note : in order for the sendbreak to break the domain, the “Secure Mode” for the domain must be set to “Off”. This can be confirmed via the 'showdomainmode' command.
Note : panic does check the auto-boot? OBP variable or Autoboot variable values. This is controlled by the "halt_on_panic" /etc/system parameter on the domain.
At this stage, the domain should restart and a coredump is available for postmortem analysis.
Make sure to collect a full snapshot (snapshot -L F) for a proper analysis as well as the domain explorer and corefiles.

2. Try to force the domain to panic via the reset command
XSCF> reset -d 0 panic
DomainID to panic:00
Continue? [y|n] :y
00 :Panicked
Note : the reset command will panic the domain whatever the value of the “Secure Mode”.
panic[cpu17]/thread=2a100975ca0: System Panel Driver: Emergency panic request detected!
000002a1009dddf0 oplpanel:panel_intr+a0 (6002188f9d8, 10, 7bf37800, 16, 0, 188e800)
  %l0-3: 000002a1009ddda8 000002a1009dddd0 0000000000000037 00000000018e3c00
  %l4-7: 0000000000000000 0000000000000001 000000007009bc00 0000000000000011
000002a1009ddea0 pcicmu:pcmu_intr_wrapper+54 (3000282d348, 0, 30003056a48, 30006c30000, 8000, 1)
  %l0-3: 00000000018e2800 0000000000000001 0000060021818770 0000000000000000
  %l4-7: 0000000000000001 00000000018e3d64 000006002188f9d8 000000007bf376b4
000002a1009ddf50 unix:current_thread+164 (1, 600219b8ca8, f0d0f0f, f0d0f0f, 0, 1b)
  %l0-3: 00000000010076c8 000002a100974fe1 000000000000000f 000000007002c580
  %l4-7: ffffffffffffffff 000006002cd9e6a8 0000000000000000 000002a100975890
000002a100975930 unix:cpu_halt+180 (16, 18baf60, 11, 1, 16, 30006c30000)
  %l0-3: 0000000000000000 0000000000000001 0000000000000001 0000000001266800
  %l4-7: 000000000f0f0f0f 0000000000020000 0000000000000001 0000000000000011
000002a1009759e0 unix:idle+128 (1832000, 0, 30006c30000, ffffffffffffffff, a, 1831000)
  %l0-3: 00000600219b8ca8 000000000000001b 0000000000000000 ffffffffffffffff
  %l4-7: 00000000018e0c00 0000000000000000 000000000000042c 000000000103ed8c
syncing file systems... done
dumping to /dev/dsk/c0t0d0s1, offset 108396544, content: kernel
At this stage, the domain should restart and a coredump is available for postmortem analysis.
Make sure to collect a full snapshot (snapshot -L F) for a proper analysis as well as the domain explorer and corefiles.

3. Try to send an XIR to the CPUs for the domain via the reset command
XSCF> reset -d 0 xir
DomainID to reset:00
Continue? [y|n] :y
00 :Reset
If the domain drops to OBP then force a panic using the 'sync' command.
Note : the reset command will XIR the domain regardless of the value of the “Secure Mode”.
ERROR: Externally Initiated Reset has occurred.
{19} ok sync
panic[cpu25]/thread=2a100b55ca0: sync initiated
sched: trap type = 0x3
pid=0, pc=0x1266894, sp=0x2a100b55131, tstate=0x80001605, context=0x0
g1-g7: 0, 18baf60, 0, 30006c42000, 600219b8b88, 3c, 2a100b55ca0
00000000fdb7bcd0 unix:sync_handler+144 (182e400, 1b, 0, 1, 1, 109bc00)
  %l0-3: 000000000188dc90 00000000018d9aa8 00000000018d9800 0000000000000003
  %l4-7: 00000000018bb000 0000000000000000 00000000018b4800 000000000000001b
00000000fdb7bda0 unix:vx_handler+80 (fdb64000, 183dd10, a00003c3ffbf0066, 0, 183de18, f006d515)
  %l0-3: 000000000183de18 0000000000000000 0000000000000001 0000000000000001
  %l4-7: 000000000182ec00 00000000f0000000 0000000001000000 0000000001019a68
00000000fdb7be50 unix:callback_handler+20 (fdb64000, fdc96400, 0, 0, 0, 0)
  %l0-3: 0000000000000016 00000000fdb7b701 0000000000000002 0000000000000001
  %l4-7: 0000060021a25e20 0000000000000000 0000000000000000 000002a100bbdde8
syncing file systems... done
dumping to /dev/dsk/c0t0d0s1, offset 108396544, content: kernel
At this stage, the domain should restart and a coredump is available for postmortem analysis.
Make sure to collect a full snapshot (snapshot -L F) for a proper analysis as well as the domain explorer and corefiles.

4. Try to power-on-reset the domain via the reset command
The domain will be reset and POST will be invoked for the domain.
XSCF> reset -d 0 por
DomainID to reset:00
Continue? [y|n] :y
00 :Reset
*Note*
This command only issues the instruction to reset.
The result of the instruction can be checked by the "showlogs power".
XSCF> showdomainstatus -a
DID Domain Status
00 Initialization Phase
Note : the reset command will reset the domain whatever the value of the “Secure Mode”.
At this stage, the domain should restart if POST does not detect any further problem.
Note : if the autoboot XSCF parameter or the auto-boot? OBP parameter is set to off/false, the domain will not automatically reboot and will stop at the OK prompt. 'sync' could be invoked to force a core dump when dropped to the OBP.
Note : the keyswitch in the Service position would also abort the boot sequence.
Make sure to collect a full snapshot (snapshot -L F) for a proper analysis as well as the domain explorer.

5. Power cycle the platform
The ultimate action, if none of the previous actions has succeeded, is to power cycle the platform. This will impact all of the running domains in the platform.
Make sure to collect a full snapshot (snapshot -L F) for a proper analysis as well as the domain explorer.

6. Post-mortem analysis
6.1 - Data collection
Whatever procedure was used to recover the domain from the hang-up situation, a post-mortem analysis is required in order to understand what happened to the domain. The minimum data to be collected is :
If a coredump generation was successful from the previous steps, then the corefile must be collected.

Internal Comments
The following information can be useful during internal troubleshooting of a hung or unresponsive domain:
XSCF operates based on the id_code value (the Monitoring Target ID Code, which indicates the component being monitored). This value is updated during the domain poweron sequence.
/******************* Alive id_code ******************/
#define CMEM_ALIVE_ID_POST 0x1 /**< Alive_watch POST */
#define CMEM_ALIVE_ID_OBP 0x2 /**< Alive_watch OBP */
#define CMEM_ALIVE_ID_SCFDRV 0x10 /**< Alive_watch SCF Driver*/
While checking an explorer/snapshot, since scfd.conf is not collected, it is possible to determine whether the Alive check is enabled by dumping the id_code BDB value for the domain. Use the dbdump tool available in the toolset.
Examples :
* Domain 0 is running with scf-alive-check-function="off" or is currently running OBP :
bash-3.00$ dbdump -l cmem.current.current_domain_info[0].id_code = 02
* Domain 1 is running with scf-alive-check-function="on" :
bash-3.00$ dbdump -l cmem.current.current_domain_info[1].id_code = 10
Note : The keyswitch position and Secure mode setting are also available in the snapshot.

When the Alive check is enabled, XSCF uses timeout parameters to monitor the domains. Those parameters are configurable in the scfd.conf file :
Note : it may not be appropriate to change the default settings.
* scf-alive-interval-time : The interval at which the service processor (XSCF) periodically monitors Solaris. Specify this parameter in minutes. The range is 1 - 10 minutes. The default is 2 minutes.
scf-alive-interval-time=2;
Note: The interrupt interval scf-alive-interval-time must be less than the monitoring timeout scf-alive-monitor-time.
* scf-alive-monitor-time : The time after which the service processor (XSCF) detects a Solaris[TM] hang-up. The service processor (XSCF) executes an OS panic when this timer expires. Specify this parameter in minutes. The range is 3 - 30 minutes. The default is 6 minutes.
scf-alive-monitor-time=6;
Note: The value of scf-alive-monitor-time should be bigger than the scf-alive-interval-time value.
* scf-alive-panic-time : The time after which the service processor (XSCF) detects an OS panic hang-up. The service processor (XSCF) executes the system reset (XIR) when this timer expires. Specify this parameter in minutes. The range is 30 - 360 minutes. The default is 30 minutes.
scf-alive-panic-time=30;
More information on which software defects the HCP software can detect is available at http://re.west/menus/SW_Projects/Current/OPL-SP/builds/nightly/ppc/testFF_P/col2sun/build/noarch/docs/fm/scf.html/sw.html

The levels reported in the showlogs error output for a domain hangup are defined as follows:
/************************************ Alive Watch Level ***********************************/
#define CMEM_ALIVE_PATH_CHANGE 0x00 /**< SCFI path change */
#define CMEM_ALIVE_LV_1_PANIC 0x01 /**< Panel Request */
#define CMEM_ALIVE_LV_2_XIR 0x02 /**< Xir */
#define CMEM_ALIVE_LV_3_RESET 0x03 /**< reset */
#define CMEM_ALIVE_LV_4_FPOFF 0x04 /**< F-POFF */
#define CMEM_ALIVE_NO_ERROR 0xFF /**< Alive No error */
So for each step described above (msg-fail, panic-fail etc...), the system hangup level is incremented. This information is also available in the alive_level BDB field.
bash-3.00$ dbdump -l cmem.current.current_domain_info[0].alive_level
cmem.current.current_domain_info[0].alive_level = ff

This section provides some more internal information for Sun employees to investigate hung domains.
The RED log can be very valuable when diagnosing a hang-up situation. In some rare situations, going through the SCF Traces may also help to understand what happened.
Note also that, using the Snapshot Analysis Toolset, you may use the off-platform viewer to read the 'showlogs obp' or 'showlogs detail' output.
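The scfd.conf timer constraints and the Alive Watch Level values described above can be checked mechanically when reviewing a customer configuration or a dbdump value. The following is a minimal Python sketch, not part of any Sun/Oracle toolset: the function names validate_timers and decode_alive_level are hypothetical, and the ranges, defaults and level values are simply copied from the text above.

```python
# Ranges and defaults quoted in this document (all values in minutes).
TIMER_RANGES = {
    "scf-alive-interval-time": (1, 10),
    "scf-alive-monitor-time": (3, 30),
    "scf-alive-panic-time": (30, 360),
}
TIMER_DEFAULTS = {
    "scf-alive-interval-time": 2,
    "scf-alive-monitor-time": 6,
    "scf-alive-panic-time": 30,
}

# Alive Watch Level values, copied from the #define list in this document.
ALIVE_LEVELS = {
    0x00: "SCFI path change",
    0x01: "panic request",
    0x02: "XIR",
    0x03: "reset",
    0x04: "F-POFF",
    0xFF: "no error",
}


def validate_timers(settings):
    """Return a list of problems for a dict of scfd.conf timer settings.

    Hypothetical helper: missing settings fall back to the documented defaults.
    """
    merged = {**TIMER_DEFAULTS, **settings}
    problems = []
    for name, (low, high) in TIMER_RANGES.items():
        if not low <= merged[name] <= high:
            problems.append(f"{name}={merged[name]} is outside {low}-{high} minutes")
    if merged["scf-alive-interval-time"] >= merged["scf-alive-monitor-time"]:
        problems.append(
            "scf-alive-interval-time must be less than scf-alive-monitor-time"
        )
    return problems


def decode_alive_level(hex_text):
    """Map an alive_level value as printed by dbdump (e.g. 'ff') to a label."""
    return ALIVE_LEVELS.get(int(hex_text, 16), "unknown level")


if __name__ == "__main__":
    print(validate_timers({"scf-alive-interval-time": 6}))
    print(decode_alive_level("ff"))
```

With the default values, validate_timers({}) reports no problem, while raising the interval to 6 minutes without also raising scf-alive-monitor-time is flagged, since the interval must stay below the monitoring timeout.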
Using the Snapshot Analysis Toolset, it is also possible to read the Redlog, which may contain helpful information in the context of a Solaris[TM] hang. The Redlog alone may not be very helpful, but combined with a corefile it can be decisive in determining the root cause of a hang.
The redlog records detailed information at the time a RED State exception occurred. Reset traps like WDR, XIR, RED cause CPUs to enter RSTV (Reset Vector). When a RED_state trap or Watchdog Reset is generated while the OS is operating, OBP records the content of various internal CPU and TLB registers in SRAM. XSCF keeps only one generation of the log for each CPU chip, so it is overwritten by the next RED occurrence.
The information collected is equivalent to the following OBP commands :
* show-cpu-registers
* show-regs&stack-all
The RED log is saved in the XSCF filesystem. There is one redlog file per chip (up to 8 strands) : red_log_xx_yy where :
* xx: CMU number (0x00-0x0F)
* yy: CPU Chip number in the CMU (0x00-0x03)
Those files are collected by snapshot.
Note : Only full snapshots (obtained by snapshot -L F) contain redlog information.
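When scripting over an extracted snapshot, the red_log_xx_yy naming convention above can be turned back into a CMU number and chip number. The sketch below is illustrative only: parse_red_log_name is a hypothetical helper, and it assumes that xx and yy are two-digit hexadecimal fields, which the quoted ranges suggest but the text does not state explicitly.

```python
import re

# Assumption: both fields are two hexadecimal digits (xx = CMU, yy = chip).
RED_LOG_NAME = re.compile(r"^red_log_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})$")


def parse_red_log_name(path):
    """Return (cmu, chip) for a redlog file path, or raise ValueError."""
    name = path.rsplit("/", 1)[-1]  # keep only the file name
    match = RED_LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a redlog file name: {name}")
    cmu = int(match.group(1), 16)
    chip = int(match.group(2), 16)
    # Ranges as documented: CMU 0x00-0x0F, chip 0x00-0x03.
    if cmu > 0x0F:
        raise ValueError(f"CMU number out of range: {cmu:#04x}")
    if chip > 0x03:
        raise ValueError(f"chip number out of range: {chip:#04x}")
    return cmu, chip


if __name__ == "__main__":
    cmu, chip = parse_red_log_name("xscf_logs/scf/log/red_log_01_02")
    print(f"CMU#{cmu}, chip {chip} in that CMU")
```

Rejecting out-of-range values early makes it easier to spot a stray file that merely resembles a redlog name.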
Example from a M5000 snapshot :
bash $ ls xscf_logs/scf/log/red_log_0*
xscf_logs/scf/log/red_log_00_00 xscf_logs/scf/log/red_log_00_03 xscf_logs/scf/log/red_log_01_02
xscf_logs/scf/log/red_log_00_01 xscf_logs/scf/log/red_log_01_00 xscf_logs/scf/log/red_log_01_03
xscf_logs/scf/log/red_log_00_02 xscf_logs/scf/log/red_log_01_01
In each file, XSCF adds a header to the data sent by OBP as below:
* 0x00-0x07: Timestamp
* 0x08-0x09: LOG-ID
* 0x0A-0x0F: Reserved
* 0x0010-0x200F: Strand0 RED log data sent by OBP
* 0x2010-0x400F: Strand1 RED log data sent by OBP
* 0x4010-0x600F: Strand2 RED log data sent by OBP
* 0x6010-0x800F: Strand3 RED log data sent by OBP
* 0x8010-0xA00F: Strand4 RED log data sent by OBP
* 0xA010-0xC00F: Strand5 RED log data sent by OBP
* 0xC010-0xE00F: Strand6 RED log data sent by OBP
* 0xE010-0x1000F: Strand7 RED log data sent by OBP
The log type can be :
* WDR_LOG : Watchdog Reset
* RED_LOG : RED state trap
* XIR_LOG : XIR
The snapshot analysis toolset (CLI and Web versions Off-platform/Showlogs/showlogs redlog) provides an off-platform viewer to read these logs.
bash-3.2$ showlogs
usage: showlogs [-t time [-T time]|-p timestamp] [-v|-V|-S] [-r] [-M] error [...]
       showlogs redlog [-chip ] [-d] [-c] [-x] [-l] [-t] [-s] [-v] [-nl] [-nt] [-nx]
  -d (debug)
  -c (cpuid)
  -x (expand TLB)
  -l (show local registers)
  -t (show TLB)
  -s (silent)
  -v (verbose)
  -nl (no local registers)
  -nt (no TLB informations)
  -nx (no expand TLB)
Let's look at an example and see the information provided for the first 2 strands of chip 01_02.
chip 01_02 is the 3rd chip on CMU#1
CMU#1 Status:Normal; Ver:0101h; Serial:PP06446534 ;
+ FRU-Part-Number:CA06620-D002 B1 /371-2214-02 ;
+ Memory_Size:64 GB;
[...]
CPUM#2-CHIP#0 Status:Normal; Ver:0301h; Serial:PP072701K6 ;
+ FRU-Part-Number:CA06620-D024 A1 /371-2216-01 ;
+ Freq:2.400 GHz; Type:16;
+ Core:2; Strand:2;
Where the XSB is assigned to the LSB 01
XSB  R DID(LSB) Assignment  Pwr  Conn Conf Test    Fault    COD
---- - -------- ----------- ---- ---- ---- ------- -------- ----
01-0   00(01)   Assigned    y    y    y    Passed  Normal   n
So the file red_log_01_02 will contain the information from the 4 strands from this CPUM :
bash-3.2$ showlogs redlog -chip 01_02 | grep CPUID
******* shoe_oplredlog version 4.21 *******
CPUID : 030 (cpu48)
CPUID : 031 (cpu49)
CPUID : 032 (cpu50)
CPUID : 033 (cpu51)
Let's dump the log for the first strand.
Note : the SFSR/SFAR values present in the output can be decoded at :
https://cores2-web.oraclecorp.com/cgi-bin/opltools/oplTools.cgi?SFAR=true
https://cores2-web.oraclecorp.com/cgi-bin/opltools/oplTools.cgi?SFSR=true
bash-3.2$ showlogs redlog -chip 01_02
******* shoe_oplredlog version 4.21 *******
File : ./xscf_logs/scf/log/red_log_01_02
File_offset : 0x30
DATE : Oct 08 12:06:04.675 CEST 2008
Log-format : RDLJ_F10
reset-magic : XIR_LOG
cpu-bitmap : [..127] f0f0f0f0f0f0f0f0.f0f0f0f000000000
           : [..255] 0000000000000000.0000000000000000
           : [..383] 0000000000000000.0000000000000000
           : [..511] 0000000000000000.0000000000000000
CPUID : 030 (cpu48)
HOSTID : 00000000.847c8a8a
%tl = 00000000.00000001
%tba = 00000000.01000000
     TT TPC               TNPC              TSTATE
TL1: 03 00000000.01218f74 00000000.01218f78 00000000.80001607
TL2: d8 00000000.010077b4 00000000.010077b8 00000044.00001500
TL3: 68 00000000.01005988 00000000.0100598c 00000008.10001504
TL4: 00 00000000.00000000 00000000.00000000 00000000.00000000
TL5: 01 00000000.00000000 00000000.00000000 00000000.00000000
%ecr[10,4c] = 00000000.00000002 ( WEAK_ED )
%isfsr[18,50] = 00000000.00008008 ( No Error )
%isfpar[78,50] = 00000000.00000000
%dsfsr[18,58] = 00000000.00808007 ( FV OW W TM ASI:80 )
%dsfpar[78,58] = 00000000.00000000
%dsfar[20,58] = 00000000.ff343f08
%dfault-adr[30,58] = 00000601.03214000
%afsr[00,4c] = 00000000.00000000 ( No Error)
%ugesr[08,4c] = 00000000.00000000 ( No Error)
%stchger[18,4c] = 00000000.00000000 ( No Error )
%iiu-insttrap[00,60] = 00000000.00000000
%dev-serial[00,53] = 00009100.a6f0a105 (f-45598-47-3133-0 )
%pstate = 00000000.00000035
%ccr = 00000000.00000044
%asi = 00000000.00000015
%pil = 00000000.00000000
%y = 00000000.00000000
%fprs = 00000000.00000000
%softint = 00000000.00010000
%cwp = 00000000.00000007
%cansave = 00000000.00000005
%canrestore = 00000000.00000001
%otherwin = 00000000.00000000
%wstate = 00000000.0000000e
%cleanwin = 00000000.00000007
%ver = 00040006.92000507
%int-vector0[40,7f] = 00000000.00000416
%int-vector2[50,7f] = 00000000.00000000
%int-vector4[60,7f] = 00000000.00000000
%jb-config[00,4a] = 00000000.0000a030
%pcontext[08,58] = 00000000.00000000
%scontext[18,58] = 00000000.00000000
%eidr[00,6e] = 00000000.00002030
%i8k-tsb-ptr[00,51] = 0000034f.e000b840
%i64k-tsb-ptr[00,52] = 0000034f.e000af00
%d8k-tsb-ptr[00,59] = 0000034f.e00090a0
%d64k-tsb-ptr[00,5a] = 0000034f.e000b210
%dcucr[00,45] = 00000000.00000000
%mcntl[08,45] = 00000000.00002000
%itagtarget[00,50] = 00000000.000001eb
%itsb[28,50] = 0000034f.e0008001
%itsb-pext[48,50] = 7fffffff.fffff00f
%itsb-next[58,50] = 7fffffff.fffff00f
%dtagtarget[00,58] = 00000000.0018040c
%dtsb[28,58] = 0000034f.e0008001
%dtsb-pext[48,58] = 7fffffff.fffff00f
%dtsb-next[58,58] = 7fffffff.fffff00f
%dtsb-sext[50,58] = 7fffffff.fffff00f
%dtsb-direct[00,5b] = 0000038f.e002ba10
%va-wptr[38,58] = 00000000.00000000
%pa-wptr[40,58] = 00000000.00000000
%l2ctrl[10,6a] = 00000000.00000000
%asi-scratch0[00,4f] = ffffffff.ffffffff
%asi-scratch1[08,4f] = 00000000.00000001
%asi-scratch2[10,4f] = 00000000.00000000
%asi-scratch3[18,4f] = 80c003ce.dfc6003f
%asi-scratch4[20,4f] = ffffffff.ffffffff
%asi-scratch5[28,4f] = 00000000.00000007
%asi-scratch6[30,4f] = 00000000.01218f78
%asi-scratch7[38,4f] = 00000000.00000000
     Normal            Alternate         MMU               Vector
%g0: 00000000.00000000 00000000.00000000 00000000.00000000 00000000.00000000
%g1: 00000000.00000000 00000000.00000007 000003cf.fb8190a0 00000000.00000040
%g2: 00000000.018b8dd0 00000000.01218f78 00000601.03214000 00000000.00000006
[...]
%g7: 000002a1.01a87ca0 00000000.80001600 000003cf.fe600000 00000000.00000030
%o0(CWP=7) = 00000600.f561c480 %l0(CWP=7) = 00000000.00000000
%o1(CWP=7) = 00010000.00000000 %l1(CWP=7) = 00000000.00000014
[...]
%o6(CWP=7) = 000002a1.01a87131 %l6(CWP=7) = 00000000.00000030
%o7(CWP=7) = 00000000.0103d924 %l7(CWP=7) = 00000000.018b8f00
[...]
%o0(CWP=0) = 00000a0e.5d30566c %l0(CWP=0) = 00000000.80001607
%o1(CWP=0) = 00000300.08a045b8 %l1(CWP=0) = 00000000.00000016
[...]
%o6(CWP=0) = 000002a1.01a86fe1 %l6(CWP=0) = 00000000.00000000
%o7(CWP=0) = 00000000.0100ab9c %l7(CWP=0) = 000002a1.01a87890
fITLB SNI CC
### ----TAG--------- -----DATA------- CTX -------VA------- VZFEsw2 ----- PA----- swLPVEPWG
000 00000000f0000000 e00003cfffc00064: 0000 00000000f0000000 1300000 03cfffc00000 001100100
001 00000000fe9c205b e000034ff4000020: 005b 00000000fe9c2000 1300000 034ff4000000 000100000
[...]
01e 000000010241410a e000034fe5800020: 010a 0000000102414000 1300000 034fe5800000 000100000
01f 0000000001000000 e00003cffe800064: 0000 0000000001000000 1300000 03cffe800000 001100100
fDTLB SNI CC
### ----TAG--------- -----DATA------- CTX -------VA------- VZFEsw2 ----- PA----- swLPVEPWG
000 ffffffffffffc000 800003cfffbde066: 0000 ffffffffffffc000 1000000 03cfffbde000 001100110
001 00000000f0000000 e00003cfffc00066: 0000 00000000f0000000 1300000 03cfffc00000 001100110
002 00000000fff10000 a00003cfffbf0066: 0000 00000000fff10000 1100000 03cfffbf0000 001100110
[...]
01e 0000000001000000 e00003cffe800064: 0000 0000000001000000 1300000 03cffe800000 001100100
01f 0000000001800000 e00003cffd800066: 0000 0000000001800000 1300000 03cffd800000 001100110
PC: FJSV,SPARC64-VI:cpu_halt_cpu+4
Last leaf: call FJSV,SPARC64-VI:cpu_halt_cpu from unix:cpu_halt+188
0 w %o0-%o7: (600f561c480 1000000000000 f0e0f0f070f0f0e f0e0f0f070f0f0e 0 1b 2a101a87131 103d924 )
jmpl unix:cpu_halt from unix:idle+128
1 w %o0-%o7: (16 18b8dd0 16 30 30008a04000 1 2a101a871e1 105b040 )
jmpl unix:idle from unix:thread_start+4
2 w %o0-%o7: (1832000 0 30008a04000 ffffffffffffffff 19 1831000 2a101a87291 1046f48 )

alive, check, monitor, hang, domain, panic, hung, scfd.conf, reset, por, xir, unresponsive, Mx000, M3000, M4000, M5000, M8000, M9000

Attachments
This solution has no attachment