Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Technical Instruction
Sure Solution 1012598.1: Understanding and Decoding Machine Check Errors on Opteron systems running Solaris[TM] Operating System for x86 Platforms
Previously Published As: 217340
Applies to:
Sun Java Workstation W1100z - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
Sun Fire X2100 M2 Server - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
Sun Java Workstation W2100z - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
Sun Fire X2100 Server - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
Sun Fire V20z Server - Version: Not Applicable to Not Applicable [Release: N/A to N/A]
All Platforms

Goal
The machine check mechanism on the AMD64 processor allows it to detect and report hardware errors. When an unrecoverable machine check error is detected, the processor generates a Machine Check Exception. When this occurs, the Solaris[TM] Operating System for x86 Platforms can panic with a trap of type 0x12 (Machine check exception). When the Solaris[TM] Operating System (OS) receives a machine check exception, it prints MCE warning messages just before the panic. Example of the warning:

    WARNING: MCE: Bank 0: error code 15:addr = cf53c000, model errcode = 0

The meaning of MCE messages depends on the processor. This document explains how to interpret MCE messages on AMD Opteron based systems.

Solution
Understanding and Decoding Machine Check Errors on Opteron systems running Solaris[TM] Operating System for x86 Platforms.

An MCE message has three or four values: bank, error code, address (not always present) and model error code. All values are displayed as hexadecimal numbers, whether or not they carry the "0x" prefix.

    WARNING: MCE: Bank 0: error code 15:addr = cf53c000, model errcode = 0

- Bank (THIS HAS NOTHING TO DO WITH MEMORY BANKS)
  Opteron processors have five error reporting banks, each associated with a specific hardware block.
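  These banks correspond to Bank 0-4, in this order:
    Bank 0  Data Cache (DC)
    Bank 1  Instruction Cache (IC)
    Bank 2  Bus Unit (BU)
    Bank 3  Load/Store Unit (LS)
    Bank 4  Northbridge (NB)

- Error code
  The error code field shows the 16-bit MCA (Machine Check Architecture) error code contained in the Machine Check Status Registers. The MCA error code has the following format:

    Error Type      Error Value (binary)
    TLB error       0000 0000 0001 TTLL
    Memory error    0000 0001 RRRR TTLL
    Bus error       0000 1PPT RRRR IILL

    Field  Description
    TT     Transaction Type Bits: 00 = Instruction, 01 = Data, 10 = Generic, 11 = Reserved
    LL     Cache Level Bits: 00 = Level 0, 01 = Level 1, 10 = Level 2, 11 = Generic
    PP     Participation Processor Bits: 00 = local node originated the request, 01 = local node responded to the request, 10 = local node observed as third party, 11 = Generic
    T      Time-out Bit: 0 = no time-out, 1 = time-out
    RRRR   Memory Transaction Type Bits: 0000 = Generic, 0001 = Generic read, 0010 = Generic write, 0011 = Data read, 0100 = Data write, 0101 = Instruction fetch, 0110 = Prefetch, 0111 = Evict, 1000 = Snoop (probe)
    II     Memory or I/O Bits: 00 = Memory, 01 = Reserved, 10 = I/O, 11 = Generic

- Address
  The address field shows the address at which the machine check error occurred. It is not present in every message.

- Model error code
  The model error code field shows the 4-bit extended error code contained in the Machine Check Status Registers. Its meaning depends on which unit reported the error (DC/IC, BU, LS or NB).

Detailed information for decoding the message can be found in AMD's document "BIOS and Kernel Developer's Guide for AMD Athlon 64 and AMD Opteron Processors".

Decode examples:

    WARNING: MCE: Bank 0: error code 15:addr = cf53c000, model errcode = 0
  TLB error at L1 cache detected by Data Cache Unit.
  (Error code 0x0015 = 0000 0000 0001 0101: TLB error, TT = 01 (Data), LL = 01 (Level 1).)

    WARNING: MCE: Bank 2: error code 152:addr = e748, model errcode = 2
  Tag parity error during an instruction fetch at L2 cache detected by Bus Unit.
  (Error code 0x0152 = 0000 0001 0101 0010: memory error, RRRR = 0101 (Instruction fetch), TT = 00 (Instruction), LL = 10 (Level 2); the tag parity error is indicated by model error code 2.)

    WARNING: MCE: Bank 4: error code 0xf0f, mserrcode = 0x7
  Watchdog error detected by Northbridge.
  (Error code 0x0f0f = 0000 1111 0000 1111: bus error, PP = 11 (Generic), T = 1 (time-out), RRRR = 0000 (Generic), II = 11 (Generic), LL = 11 (Generic); the watchdog error is indicated by model error code 7.)

References
- BIOS and Kernel Developer's Guide for AMD Athlon 64 and AMD Opteron Processors
- AMD64 Architecture Programmer's Manual Volume 2: System Programming

Other MCE Related InfoDocs
- Technical Instruction - How to analyze Memory Errors on x64 Servers running Linux using HERD
- Technical Instruction - Sun Fire[TM] V20z/V40z Northbridge Gart TLB Errors in Red Hat/SuSE

Product
Solaris 9 Operating System for x86 Platforms
Solaris 10 Operating System for x86 Platforms
Sun Fire X4200 Server
Sun Fire X4100 Server
Sun Fire X2100 Server
Sun Fire V40z Server
Sun Fire V20z Server
Sun Java Workstation W2100z
Sun Ultra 20 Workstation

Keywords: MCE, x86, opteron, x64, amd64
Previously Published As 82833

Change History
Date: 2005-12-14 User Name: 97961 Action: Approved Comment: Publishing.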
No further edits required. Version: 7
Date: 2005-12-14 User Name: 97961 Action: Accept Comment: Version: 0
Date: 2005-12-14 User Name: 105028 Action: Approved Comment: Checked all links, they all work now. Version: 0

References
<NOTE:1007700.1> - Sun Fire[TM] V20z/V40z Northbridge Gart TLB Errors in Red Hat/SuSE
<NOTE:1019683.1> - How to analyze Memory Errors on x64 Servers running Linux using HERD

Attachments
This solution has no attachment
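The decoding procedure in the Solution section can be sketched in Python as a rough aid for reading panic messages. This is a minimal illustration, not a Sun or AMD tool, and it rests on stated assumptions: the function names (decode_mce_line, decode_error_code) are hypothetical; the regular expression assumes only the message layouts quoted in this document (including the "mserrcode" variant); bank numbering assumes the Opteron order Data Cache, Instruction Cache, Bus Unit, Load/Store Unit, Northbridge; and the field encodings follow the MCA error code formats in the AMD64 Architecture Programmer's Manual Volume 2. The model error code is returned as a raw number because its meaning depends on the reporting unit.

```python
import re

# Bank number -> hardware block (assumed AMD Opteron ordering, Bank 0-4).
BANKS = {
    0: "Data Cache (DC)",
    1: "Instruction Cache (IC)",
    2: "Bus Unit (BU)",
    3: "Load/Store Unit (LS)",
    4: "Northbridge (NB)",
}

# Sub-field encodings of the 16-bit MCA error code.
TT = {0: "instruction", 1: "data", 2: "generic", 3: "reserved"}
LL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}
PP = {0: "local node originated request", 1: "local node responded",
      2: "local node observed as third party", 3: "generic"}
II = {0: "memory", 1: "reserved", 2: "I/O", 3: "generic"}
RRRR = {0: "generic", 1: "generic read", 2: "generic write", 3: "data read",
        4: "data write", 5: "instruction fetch", 6: "prefetch", 7: "evict",
        8: "snoop (probe)"}

# Assumed message layout, based only on the examples in this document:
# the ":addr = ..." part is optional, and the last field may be labelled
# "model errcode" or "mserrcode". All numbers are hex, "0x" prefix optional.
MCE_RE = re.compile(
    r"Bank (?P<bank>[0-9a-fA-F]+): error code (?:0x)?(?P<code>[0-9a-fA-F]+)"
    r"(?::\s*addr = (?:0x)?(?P<addr>[0-9a-fA-F]+))?"
    r",\s*(?:model errcode|mserrcode) = (?:0x)?(?P<ext>[0-9a-fA-F]+)"
)


def decode_error_code(code):
    """Split a 16-bit MCA error code into its TLB, memory or bus fields."""
    ll = LL[code & 0x3]                   # LL sits in bits 1:0 in every format
    if (code & 0xFFF0) == 0x0010:         # 0000 0000 0001 TTLL
        return {"type": "TLB", "tt": TT[(code >> 2) & 0x3], "ll": ll}
    if (code & 0xFF00) == 0x0100:         # 0000 0001 RRRR TTLL
        return {"type": "memory",
                "rrrr": RRRR.get((code >> 4) & 0xF, "reserved"),
                "tt": TT[(code >> 2) & 0x3], "ll": ll}
    if (code & 0xF800) == 0x0800:         # 0000 1PPT RRRR IILL
        return {"type": "bus",
                "pp": PP[(code >> 9) & 0x3],
                "timeout": bool((code >> 8) & 0x1),
                "rrrr": RRRR.get((code >> 4) & 0xF, "reserved"),
                "ii": II[(code >> 2) & 0x3], "ll": ll}
    return {"type": "unknown"}            # not one of the three formats


def decode_mce_line(line):
    """Parse one Solaris MCE warning line and decode its fields."""
    m = MCE_RE.search(line)
    if m is None:
        raise ValueError(f"not an MCE warning line: {line!r}")
    bank = int(m.group("bank"), 16)
    return {
        "bank": bank,
        "unit": BANKS.get(bank, "unknown"),
        "error": decode_error_code(int(m.group("code"), 16)),
        "addr": int(m.group("addr"), 16) if m.group("addr") else None,
        # 4-bit extended code; its meaning depends on the reporting unit,
        # so it is returned raw (look it up in the BKDG).
        "model_errcode": int(m.group("ext"), 16),
    }


if __name__ == "__main__":
    for msg in (
        "WARNING: MCE: Bank 0: error code 15:addr = cf53c000, model errcode = 0",
        "WARNING: MCE: Bank 2: error code 152:addr = e748, model errcode = 2",
        "WARNING: MCE: Bank 4: error code 0xf0f, mserrcode = 0x7",
    ):
        print(decode_mce_line(msg))
```

Run on the three decode examples above, the sketch reports a data-side L1 TLB error in bank 0 (DC), an L2 instruction-fetch memory error in bank 2 (BU) and a timed-out generic bus error in bank 4 (NB). The tag parity and watchdog interpretations come from model error codes 2 and 7, which the sketch deliberately leaves undecoded.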