x86/x64 Systems With Intel Xeon E7 Family Processors (Westmere-EX) May Panic During an MCE Event

Asset ID:	1-72-1482528.1
Update Date:	2012-08-28
Keywords:

Solution Type Problem Resolution Sure


Applies to:

3rd-Party Hardware - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire X4470 Server - Version Not Applicable to Not Applicable [Release N/A]
Solaris x64/x86 Operating System - Version 10 3/05 to 10 8/11 U10 [Release 10.0]
Oracle Solaris on x86-64 (64-bit)

Symptoms

NOTE: This issue only affects x86/x64 systems using the Intel Xeon E7 (Westmere-EX) family of processors. SPARC systems and systems with other Intel or AMD processor families are not affected.

Use "prtdiag -v" to confirm the processor family, for example:

# prtdiag -v
System Configuration: HP ProLiant DL580 G7
BIOS Configuration: HP P65 05/23/2011
BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)

==== Processor Sockets ====================================

Version                          Location Tag
-------------------------------- --------------------------
 Intel(R) Xeon(R) CPU E7- 4870 @ 2.40GHz Proc 1
 Intel(R) Xeon(R) CPU E7- 4870 @ 2.40GHz Proc 2
 Intel(R) Xeon(R) CPU E7- 4870 @ 2.40GHz Proc 3
 Intel(R) Xeon(R) CPU E7- 4870 @ 2.40GHz Proc 4

--- 8< ---

When the issue is encountered, the system panics with messages similar to the following:

  unix: [ID 836849 kern.notice]
  ^Mpanic[cpu32]/thread=fffffe80011b6c60:
  genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fffffe80011b6930 addr=c68b9c88 occurred in module "unix" due to an illegal access to a user address
  unix: [ID 100000 kern.notice]
  unix: [ID 839527 kern.notice] sched:
  unix: [ID 753105 kern.notice] #pf Page fault
  unix: [ID 532287 kern.notice] Bad kernel fault at addr=0xc68b9c88
  unix: [ID 243837 kern.notice] pid=0, pc=0xfffffffffb804520, sp=0xfffffe80011b6a28, eflags=0x10282
  unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
  unix: [ID 354241 kern.notice] cr2: c68b9c88 cr3: 1bcba000 cr8: d
  unix: [ID 592667 kern.notice]     rdi:         c68b9c48 rsi:                9 rdx:                0
  unix: [ID 592667 kern.notice]     rcx:                0  r8:                0  r9:                0
  unix: [ID 592667 kern.notice]     rax: fffffe80011b6aec rbx:         c68b9c48 rbp: fffffe80011b6a30
  unix: [ID 592667 kern.notice]     r10: fffffffffb837d98 r11:                0 r12:                0
  unix: [ID 592667 kern.notice]     r13:                0 r14:                9 r15: fffffec307cd2420
  unix: [ID 592667 kern.notice]     fsb:                0 gsb: ffffffffc68b5800  ds:               43
  unix: [ID 592667 kern.notice]      es:               43  fs:                0  gs:              1c3
  unix: [ID 592667 kern.notice]     trp:                e err:                0 rip: fffffffffb804520
  unix: [ID 592667 kern.notice]      cs:               28 rfl:            10282 rsp: fffffe80011b6a28
  unix: [ID 266532 kern.notice]      ss:                0
  unix: [ID 100000 kern.notice]
  genunix: [ID 655072 kern.notice] fffffe80011b6840 unix:die+da ()
  genunix: [ID 655072 kern.notice] fffffe80011b6920 unix:trap+5e6 ()
  genunix: [ID 655072 kern.notice] fffffe80011b6930 unix:_cmntrap+140 ()
  genunix: [ID 655072 kern.notice] fffffe80011b6a30 unix:cmi_hdl_getspecific+0 ()
  genunix: [ID 655072 kern.notice] fffffe80011b6a60 unix:cms_ereport_detector+2c ()
  genunix: [ID 655072 kern.notice] fffffe80011b6b80 cpu.generic:gcpu_ereport_post+182 ()
  genunix: [ID 655072 kern.notice] fffffe80011b6bb0 cpu.generic:gcpu_mca_drain+57 ()
  genunix: [ID 655072 kern.notice] fffffe80011b6bf0 genunix:errorq_drain+f6 ()
  genunix: [ID 655072 kern.notice] fffffe80011b6c00 genunix:errorq_intr+9 ()
  genunix: [ID 655072 kern.notice] fffffe80011b6c40 unix:av_dispatch_softvect+62 ()
  genunix: [ID 655072 kern.notice] fffffe80011b6c50 unix:intr_thread+b4 ()
  unix: [ID 100000 kern.notice]

 
Refer to 'Oracle Solaris Crash Analysis Tool(SCAT) - Information Center (Doc ID 1381679.1)' for download links and further information about this tool.
Another example of a panicking thread, viewed using the Solaris Crash Analysis Tool (SCAT):

CAT(vmcore.1/10X)> panic
panic on CPU 44
panic string:   BAD TRAP: type=e (#pf Page fault) rp=fffffe80015c0930 addr=807b7828 occurred in module "unix" due to an illegal access to a user address
==== panic interrupt thread: 0xfffffe80015c0c60  PID: 0  on CPU: 44  affinity CPU: 44 (last_swtch: -0.18s)  PIL: 1 ====
cmd: sched
t_procp: 0xfffffffffbc27460(proc_sched)
  p_as: 0xfffffffffbc290e0(kas)
  zone: global
t_stk: 0xfffffe80015c0c50  sp: 0xfffffe80015c06f0  t_stkbase: 0xfffffe80015bc000
t_pri: 160(SYS)  pctcpu: 0.000000
t_lwp: 0x0  psrset: 0  last CPU: 44
idle: 6714853 ticks (18 hours 39 minutes 8.53 seconds)
start: Thu May 31 16:44:21 2012
age: 585494 seconds (6 days 18 hours 38 minutes 14 seconds)
interrupted (pinned) thread: 0xfffffe800158ac60
tstate: TS_ONPROC - thread is being run on a processor
tflg:   T_INTR_THREAD - thread is an interrupt thread
        T_TALLOCSTK - thread structure allocated from stk
        T_PANIC - thread initiated a system panic
tpflg:  none set
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
        TS_SIGNALLED - thread was awakened by cv_signal()
pflag:  SSYS - system resident process

pc:      unix:vpanic_common+0x165:  addq   $0xf0,%rsp
startpc: genunix:thread_create_intr+0x0:  pushq  %rbp

unix:vpanic_common+0x165()
unix:0xfffffffffb85c902()
unix:die+0xda(, , , )
unix:trap+0x5e6(, , )
unix:cmntrap+0x140()
-- panic trap data  type: 0xe (Page fault)
  addr        0x807b7828  rp   0xfffffe80015c0930
  trapno     0xe (Page fault)
  err          0 (page not present,read,supervisor)
  %rfl   0x10282 (negative|interrupt enable|resume)
  savbp 0xfffffe80015c0a30
 savip unix:cmi_hdl_getspecific+0x0: movq 0x40(%rdi),%rax

  %rbp  0xfffffe80015c0a30  %rsp  0xfffffe80015c0a28
  %rip  unix:cmi_hdl_getspecific+0x0:  movq   0x40(%rdi),%rax

  0%rdi         0x807b77e8  1%rsi                0x9  2%rdx                  0
  3%rcx                  0  4%r8                   0  5%r9                   0

  %rax  0xfffffe80015c0aec  %rbx          0x807b77e8
  %r10  0xfffffffffb837d98  %r11                   0  %r12                   0
  %r13                   0  %r14                 0x9  %r15  0xfffffec6178126f8
  %cs       0x28 (KCS_SEL)        %ds       0x43 (UDS_SEL)
  %es       0x43 (UDS_SEL)        %ss       0x30 (KDS_SEL)
  %fs          0 (KFS_SEL)        %gs      0x1c3 (LWPGS_SEL)
  fsbase                  0
  gsbase 0xffffffffc81fe000(*unix(data):panic_bound_cpu)
<trap>unix:cmi_hdl_getspecific+0x0()
unix:cms_ereport_detector+0x2c(, , , )
cpu.generic:gcpu_ereport_post+0x182(, , , , )
cpu.generic:gcpu_mca_drain+0x57(, , )
genunix:errorq_drain+0xf6()
genunix:errorq_intr+0x9()
unix:av_dispatch_softvect+0x62()
unix:dosoftint+0x32()
-- end of interrupt thread's stack --

It is also possible to see stack corruption, for example:


CAT(vmcore.0/10X)> panic
panic on CPU 77
panic string:   BAD TRAP: type=e (#pf Page fault) rp=fffffe80020207e0 addr=fba1620a occurred in module "<unknown>" due to an illegal access to a user address
==== panic interrupt thread: 0xfffffe8002020c60  PID: 0  on CPU: 77  affinity CPU: 77 (last_swtch: 0s)  PIL: 1 ====
cmd: sched
t_procp: 0xfffffffffbc27460(proc_sched)
  p_as: 0xfffffffffbc290e0(kas)
  zone: global
t_stk: 0xfffffe8002020c50  sp: 0xfffffe80020205a0  t_stkbase: 0xfffffe800201c000
t_pri: 160(SYS)  pctcpu: 0.000000
t_lwp: 0x0  psrset: 0  last CPU: 77  
idle: 0 ticks (0 seconds)
start: Tue Apr 17 08:31:23 2012
age: 122206 seconds (1 days 9 hours 56 minutes 46 seconds)
tstate: TS_ONPROC - thread is being run on a processor
tflg:   T_INTR_THREAD - thread is an interrupt thread
        T_TALLOCSTK - thread structure allocated from stk
        T_PANIC - thread initiated a system panic
tpflg:  none set
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
        TS_SIGNALLED - thread was awakened by cv_signal()
pflag:  SSYS - system resident process

pc:      unix:vpanic_common+0x165:  addq   $0xf0,%rsp
startpc: genunix:thread_create_intr+0x0:  pushq  %rbp

unix:vpanic_common+0x165()
unix:0xfffffffffb85c902()
unix:die+0xda(, , , )
unix:trap+0x5e6(, , )
unix:_cmntrap+0x140()
-- panic trap data  type: 0xe (Page fault)
  addr        0xfba1620a  rp   0xfffffe80020207e0
  trapno     0xe (Page fault)
  err       0x10 (exec on NX page,read,supervisor)
  %rfl   0x10246 (parity|zero|interrupt enable|resume)
  savbp 0xfffffe80020208f0
 savip 0xfba1620a (invalid text addr) <<--- This is the address we fail on 

  %rbp  0xfffffe80020208f0  %rsp  0xfffffe80020208d0
  %rip  0xfba1620a (invalid text addr)

  0%rdi 0xfffffe80020209f0  1%rsi 0xfffffec30e433460  2%rdx                  0
  3%rcx 0xfffffec30e433460  4%r8                 0x2  5%r9  0xfffffec3069cdf08

  %rax                   0  %rbx  0xfffffec30bb66178
  %r10  0xfffffe8002020a48  %r11  0xfffffffffbd14760  %r12  0xfffffe80020209f0
  %r13  0xfffffe8002020908  %r14  0xfffffe8002020a48  %r15  0xfffffec3069cdf08
  %cs       0x28 (KCS_SEL)        %ds       0x43 (UDS_SEL)
  %es       0x43 (UDS_SEL)        %ss       0x30 (KDS_SEL)
  %fs          0 (KFS_SEL)        %gs      0x1c3 (LWPGS_SEL)
  fsbase                  0
  gsbase 0xffffffffc7403800(*unix(data):panic_bound_cpu)
<trap>0xfba1620a()
genunix:nvs_native_nvp_size+0x67(, , )
genunix:nvs_getsize_pairs+0x33(, , )
genunix:nvs_operation+0x8a(, , )
genunix:nvs_native+0x5e(, , , )
-- end of interrupt thread's stack --

Using MDB, the correct stack can be found:

> *panic_thread::findstack
stack pointer for thread fffffe8002020c60: fffffe80020209e0
  fffffe8002020a10 nvlist_common+0xb6()
  fffffe8002020a20 nvlist_size+0x16()
  fffffe8002020a60 fm_ereport_post+0x24()
  fffffe8002020b80 gcpu_ereport_post+0x2ee()
  fffffe8002020bb0 gcpu_mca_drain+0x57()
  fffffe8002020bf0 errorq_drain+0xf6()
  fffffe8002020c00 errorq_intr+9()
  fffffe8002020c40 av_dispatch_softvect+0x62()
  fffffe8002020c50 dosoftint+0x32()

In both scenarios, the faulting address (the addr= value in the panic string; in the second example it is also the saved instruction pointer, savip) appears to be a 32-bit value rather than the 64-bit kernel address that would be expected, that is, its upper 32 bits are all zeros. In either case, the important part of the stack is:

gcpu_ereport_post()
gcpu_mca_drain()
errorq_drain()
errorq_intr()

 
Leading up to the panic, the system reports Machine Check (MC) events:

# fmdump -e

May 31 00:01:32.5310 ereport.cpu.intel.mc

May 31 00:06:46.2355 ereport.cpu.intel.mc

May 31 00:19:49.2460 ereport.cpu.intel.mc

May 31 00:20:44.2791 ereport.cpu.intel.mc

May 31 00:25:30.6849 ereport.cpu.intel.mc

May 31 00:50:46.8517 ereport.cpu.intel.mc

May 31 00:50:46.8517 ereport.cpu.intel.mc

May 31 00:50:46.8518 ereport.cpu.intel.mc

May 31 00:50:46.8522 ereport.cpu.intel.mc

The MC events will all be of the same kind:

# fmdump -eV

May 31 2012 00:50:46.852268480 ereport.cpu.intel.mc

nvlist version: 0

        class = ereport.cpu.intel.mc

        ena = 0x2ab437a482230001

        detector = (embedded nvlist)

        nvlist version: 0

                version = 0x0

                scheme = hc

                hc-list = (array of embedded nvlists)

                (start hc-list[0])

                nvlist version: 0

                        hc-name = motherboard

                        hc-id = 0

                (end hc-list[0])

                (start hc-list[1])

                nvlist version: 0

                        hc-name = chip

                        hc-id = 3

                (end hc-list[1])

                (start hc-list[2])

                nvlist version: 0

                        hc-name = core

                        hc-id = 8

                (end hc-list[2])

                (start hc-list[3])

                nvlist version: 0

                        hc-name = strand

                        hc-id = 0

                (end hc-list[3])



        (end detector)



        compound_errorname = MC_CH_GEN_ERR

        IA32_MCG_STATUS = 0x0

        machine_check_in_progress = 0

        bank_number = 0x9

        bank_msr_offset = 0x424

        IA32_MCi_STATUS = 0xd0000080000a008f

        overflow = 1

        error_uncorrected = 0

        error_enabled = 1

        processor_context_corrupt = 0

        error_code = 0x8f

        model_specific_error_code = 0xa

        threshold_based_error_status = No tracking

        cap_support_recovery = 1

        signal_mce = 0

        attention_to_recover = 0

        __ttl = 0x1

        __tod = 0x4fc641d6 0x32cc95c0

NOTE: This type of MCE does not indicate a hardware fault, and no hardware needs to be replaced. These MCEs occur when a CPU changes C-state, which is expected behaviour from Intel for this processor family. HP has published an advisory for its systems; see http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c03282091

Changes

The described issue occurs more frequently on idle systems or systems with a very low workload. A low workload causes the CPUs to switch between C-state levels frequently. Each switch generates an MCE, so the number of MCEs rises and, with it, the chance of the system panicking.

Cause

Without the fix described in the 'Solution' section below, Solaris issues an asynchronous cross-call to the other CPUs during a Machine Check Event (MCE) in order to read a remote CPU's IA32_MSR_MCG_CAP value. This can lead to a race condition: the local CPU passes the addresses of two of its stack variables to the remote CPU, which stores the MSR value and the function's return result into them. Because the cross-call is asynchronous, these stores can land on the stack long after the original stack frame is no longer valid, corrupting whatever now occupies that stack location. If the corrupted location holds a pointer, the next use of that pointer causes a kernel panic. In the observed panics the data structure pointer on the stack is truncated (its upper 32 bits are zero), so dereferencing it accesses an invalid kernel address, raising a page fault trap and panicking the system.

Solution

The issue was identified as <SunBug 6991949>. Because all CPUs within the system have the same capabilities, the fix changes the behaviour so that IA32_MSR_MCG_CAP is read once, during system boot, and stored in a global kernel variable for later reference. This prevents the panics by removing the need for cross-calls, and also improves performance, because cross-calls are expensive operations.

<SunBug 6991949> is fixed in the following releases:

SPARC Platform

Not applicable

x86 Platform

Solaris 10 with patch 144501-10 or later

Unified Storage Appliances (S7000)

Fishworks OS ak-2011.04.24 or later

NOTE: For HP ProLiant DL580 G7 systems, HP recommends a minimum BIOS version of 2012.04.20 in addition to the Solaris patches.
