Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1007700.1
Update Date:2009-03-17
Keywords:

Solution Type  Problem Resolution Sure

Solution  1007700.1 :   Sun Fire[TM] V20z/V40z Northbridge Gart TLB Errors in Red Hat/SuSE  


Related Items
  • Sun Fire V20z Server
  • Red Hat Enterprise Linux x64
  • Red Hat Enterprise Linux x86
  • Sun Fire V40z Server
Related Categories
  • GCS>Sun Microsystems>Servers>x64 Servers

Previously Published As
210670


Symptoms
The customer is running Red Hat Linux and is getting Northbridge errors that point to the GART TLB.

The system remains functional and the errors repeat throughout the syslog.

The errors are similar to the following example:

Oct 13 11:19:14 srdopt3 kernel: CPU 0: Silent Northbridge MCE
Oct 13 11:19:14 srdopt3 kernel: Northbridge status a60000010005001b
Oct 13 11:19:14 srdopt3 kernel:     GART TLB error generic level generic
Oct 13 11:19:14 srdopt3 kernel:     extended error gart error
Oct 13 11:19:14 srdopt3 kernel:     link number 0
Oct 13 11:19:14 srdopt3 kernel:     err cpu1
Oct 13 11:19:14 srdopt3 kernel:     processor context corrupt
Oct 13 11:19:14 srdopt3 kernel:     error address valid
Oct 13 11:19:14 srdopt3 kernel:     error uncorrected
Oct 13 11:19:14 srdopt3 kernel:     previous error lost
Oct 13 11:19:14 srdopt3 kernel:     error address 00000000e7f6a050
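
The status word on the "Northbridge status" line can be unpacked by hand to cross-check the decoded lines that follow it. The following is a minimal Python sketch, not a Sun or Red Hat tool. The function names are hypothetical, and the bit positions are assumptions based on the generic machine check status register layout: flag bits in positions 63 to 57, an extended error code in bits 19 to 16, and a low 16-bit error code in which the pattern 0000 0000 0001 TTLL denotes a TLB error. That extended code 5 means a GART error is inferred from the example output above.

```python
import re

# Assumed flag positions in the 64-bit status word (generic MCA layout).
FLAGS = {
    63: "valid",
    62: "overflow",
    61: "uncorrected",
    60: "enabled",
    59: "misc_valid",
    58: "addr_valid",
    57: "context_corrupt",
}
# Assumed encodings of the TT (transaction type) and LL (level) fields.
TT = {0: "instruction", 1: "data", 2: "generic"}
LL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic"}

STATUS_RE = re.compile(r"Northbridge status ([0-9a-fA-F]{16})")


def decode_nb_status(status):
    """Split a 64-bit Northbridge status value into named fields."""
    fields = {name: bool((status >> bit) & 1) for bit, name in FLAGS.items()}
    fields["ext_code"] = (status >> 16) & 0xF
    code = status & 0xFFFF
    fields["error_code"] = code
    # TLB errors use the pattern 0000 0000 0001 TTLL.
    fields["is_tlb"] = (code & 0xFFF0) == 0x0010
    if fields["is_tlb"]:
        fields["tt"] = TT.get((code >> 2) & 3, "reserved")
        fields["ll"] = LL[code & 3]
    return fields


def statuses_in_log(lines):
    """Yield every status value found in an iterable of syslog lines."""
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            yield int(match.group(1), 16)


if __name__ == "__main__":
    sample = [
        "Oct 13 11:19:14 srdopt3 kernel: Northbridge status a60000010005001b",
    ]
    for value in statuses_in_log(sample):
        print(hex(value), decode_nb_status(value))
```

The flag names are the sketch's own and are not guaranteed to match the kernel's message wording line for line. To scan a real log, pass an open file such as /var/log/messages to `statuses_in_log` instead of the sample list.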


Resolution
Red Hat has accepted this as a bug and fixed it in kernel version 2.4.21-19, which corresponds to Red Hat Enterprise Linux 3 (Update 3). Make sure that the machine in question is running at least this kernel version.
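
The version check can be sketched in Python as follows. This is an illustration under stated assumptions, not a supported tool: the function names are hypothetical, the release string is assumed to have the Red Hat form "2.4.21-20.EL" (as printed by `uname -r`), and any release that compares higher than 2.4.21-19 is assumed to contain the fix. SuSE kernels are numbered differently and are not covered.

```python
import platform
import re

MINIMUM = (2, 4, 21, 19)  # kernel 2.4.21-19, the version named in this article


def parse_release(release):
    """Turn a release string such as '2.4.21-20.EL' into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def kernel_is_fixed(release, minimum=MINIMUM):
    """Return True if the release is at or above the minimum version."""
    return parse_release(release) >= minimum


if __name__ == "__main__":
    running = platform.release()  # the same string that `uname -r` prints
    try:
        print(running, "ok" if kernel_is_fixed(running) else "needs update")
    except ValueError as exc:
        print(exc)
```

The functions can also be run on a workstation against a release string copied from the server, for example `kernel_is_fixed("2.4.21-15.EL")`.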

If the error message still appears in the "messages" log after the kernel update, check the "No Spec. TLB Reload" option in the BIOS Advanced menu. By default, this setting is disabled, which permits speculative TLB reloads.

If the above error message is encountered, ensure that "No Spec. TLB Reload" is ENABLED in the Advanced BIOS menu.

To accomplish this, reboot the server and press F2 to enter BIOS setup.

Then proceed to the Advanced -> Chipset Configuration BIOS menu.

Finally, use the arrow keys to scroll down and set the "No Spec. TLB Reload" field to Enabled. Press Esc and then F10 to save your changes.

  PhoenixBIOS Setup Utility
Advanced
+------------------------------------------------------------------------------+
|              Chipset Configuration                 |   Item Specific Help    |
|----------------------------------------------------+-------------------------|
|                                                    |                         |
|   Setting items on this menu to incorrect         ^| Override                |
|   values may cause your system to malfunction.    :| Speculative TLB Reload  |
|                                                   :| (Disable to permit TLB  |
|   SRAT Table             [Enabled]                :| speculative reloads).   |
|   Node Interleave:       [Disabled]               :|                         |
|   Bank Interleave:       [Auto]                   :| For best performance,   |
|                                                   :| leave this disabled.    |
|   ECC:                   [Enabled]                :|                         |
|   Dram ECC:              [Enabled]                :|                         |
|   ECC Scrub Redirection  [Enabled]                :|                         |
|   Chip-Kill:             [Enabled]                :|                         |
|   DCACHE ECC Scrub CTL:  [5.12 us]                :|                         |
|   L2 ECC Scrub CTL:      [10.2 us]                :|                         |
|   Dram ECC Scrub CTL:    [163.8 us]               :|                         |
|   No Spec. TLB Reload:   [Enabled]                :|                         |
|                                                    |                         |
+------------------------------------------------------------------------------+
Esc  Exit  <>  Select Menu  Enter  Select > Sub-Menu  F10  Save and Exit

This disallows speculative TLB reloads and avoids the error message.

If the errors still appear in the "messages" log after the kernel has been updated and the BIOS setting has been changed, the kernel needs to be booted with the IOMMU's AGP support disabled.

Add "iommu=noagp" (no quotes) as a parameter to the kernel boot options (either at boot time or in /boot/grub/menu.lst), then reboot the server and check that the message has disappeared.
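
The menu.lst change can be sketched in Python as follows. This is a hedged illustration, not a supported procedure: the function names and the sample menu entry are hypothetical, and the sketch only transforms text in memory. It assumes that each boot entry in menu.lst has a line whose first word is `kernel`, and that the active boot options can be read back from /proc/cmdline after the reboot. Back up menu.lst before editing it by any means.

```python
PARAM = "iommu=noagp"


def add_kernel_param(menu_text, param=PARAM):
    """Append param to every 'kernel' line that does not already carry it."""
    out = []
    for line in menu_text.splitlines():
        words = line.split()
        # Commented-out entries start with '#', so they are left untouched.
        if words and words[0] == "kernel" and param not in words[1:]:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"


def param_active(cmdline, param=PARAM):
    """Return True if param is present in a /proc/cmdline style string."""
    return param in cmdline.split()


if __name__ == "__main__":
    # Hypothetical menu.lst fragment, for illustration only.
    sample = (
        "title Red Hat Enterprise Linux\n"
        "\troot (hd0,0)\n"
        "\tkernel /vmlinuz-2.4.21-20.EL ro root=LABEL=/\n"
        "\tinitrd /initrd-2.4.21-20.EL.img\n"
    )
    print(add_kernel_param(sample))
```

After the reboot, passing the contents of /proc/cmdline to `param_active` shows whether the option took effect. The absence of new GART TLB entries in the "messages" log remains the actual test.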



Additional Information
Always ensure that the latest BIOS is installed.

For more information on decoding machine check errors, see "Understanding and Decoding Machine Check Errors on Sun Fire[TM] V20z/Sun Fire[TM] V40z running Red Hat OS".



Product
Sun Fire V20z Server
Sun Fire V40z Server
SuSE Linux Enterprise Server 9 x64/x86 Software
SuSE Linux Enterprise Server 8 x86 Software
Red Hat Enterprise Linux 3 x64
Red Hat Enterprise Linux 4
Red Hat Enterprise Linux 3 x86

Internal Comments
For the internal use of Sun employees.

Several different but connected solutions for this problem exist.

Please try one after the other for a sure solution.


Please see technical instruction <Document: 1018142.1>.

V20z, V40z, Red Hat, Northbridge, GART TLB
Previously Published As
80746

Change History
Date: 2005-05-04
User Name: 71396
Action: Approved
Comment: Performed final review of article.

No changes required.

Publishing.
Version: 7
Date: 2005-04-27
User Name: 71396
Action: Accept
Comment:
Version: 0

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.