Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Problem Resolution Sure

Solution 1007700.1: Sun Fire[TM] V20z/V40z Northbridge GART TLB Errors in Red Hat/SuSE
PreviouslyPublishedAs 210670

Symptoms

The customer is running Red Hat Linux and is getting Northbridge errors that point to the GART TLB. The system remains functional, and the errors repeat throughout the syslog. The errors are similar to the following example:

  Oct 13 11:19:14 srdopt3 kernel: CPU 0: Silent Northbridge MCE
  Oct 13 11:19:14 srdopt3 kernel: Northbridge status a60000010005001b
  Oct 13 11:19:14 srdopt3 kernel: GART TLB error generic level generic
  Oct 13 11:19:14 srdopt3 kernel: extended error gart error
  Oct 13 11:19:14 srdopt3 kernel: link number 0
  Oct 13 11:19:14 srdopt3 kernel: err cpu1
  Oct 13 11:19:14 srdopt3 kernel: processor context corrupt
  Oct 13 11:19:14 srdopt3 kernel: error address valid
  Oct 13 11:19:14 srdopt3 kernel: error uncorrected
  Oct 13 11:19:14 srdopt3 kernel: previous error lost
  Oct 13 11:19:14 srdopt3 kernel: error address 00000000e7f6a050

Resolution

Red Hat has accepted this as a bug and fixed it in kernel version 2.4.21-19, which corresponds to Red Hat Enterprise Linux 3 (Update 3). Make sure that the machine in question is running at least this version of Red Hat.

If the error message still appears in "messages" after the kernel update, check the option named "No Spec. TLB Reload" in the BIOS Advanced menu. By default, this setting is disabled, which allows speculative TLB reloads. If the above error message is encountered, ensure that "No Spec. TLB Reload" is ENABLED in the Advanced BIOS settings.

To accomplish this, reboot the server and press F2 to enter BIOS setup. Then proceed to the Advanced -> Chipset Configuration BIOS menu. Finally, use the arrow keys to scroll down and change the "No Spec. TLB Reload" field to Enabled. Press Esc and then F10 to save your changes.
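The symptom check and the kernel version check described above can be sketched in a few lines of Python. This is a minimal sketch and not part of the original solution: it assumes Python 3 is available (for example on an administration workstation with a copy of the log), and the helper names `release_key`, `has_kernel_fix` and `count_gart_errors` are hypothetical. Comparing the first four numeric fields of the release string is also an assumption; it is only meaningful for Red Hat Enterprise Linux 3 style kernel names such as 2.4.21-19.EL.

```python
import platform
import re
import sys

# First fixed kernel according to this article (Red Hat Enterprise Linux 3).
FIXED_KERNEL = "2.4.21-19"


def release_key(release):
    """Return the first four numeric fields of a kernel release as a tuple."""
    return tuple(int(n) for n in re.findall(r"\d+", release)[:4])


def has_kernel_fix(release):
    """True if the release is numerically at or above FIXED_KERNEL (assumed ordering)."""
    return release_key(release) >= release_key(FIXED_KERNEL)


def count_gart_errors(log_text):
    """Count syslog lines that contain the 'GART TLB error' message."""
    return sum(1 for line in log_text.splitlines() if "GART TLB error" in line)


if __name__ == "__main__":
    release = platform.release()
    print("kernel", release, "has fix:", has_kernel_fix(release))
    if len(sys.argv) > 1:  # optional path to a syslog file, e.g. /var/log/messages
        with open(sys.argv[1], errors="replace") as log:
            print("GART TLB error lines:", count_gart_errors(log.read()))
```

Run with the path to a syslog file as the only argument to also count the error lines. A non-zero count on a kernel that already has the fix suggests moving on to the BIOS setting.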
                           PhoenixBIOS Setup Utility
   Advanced
+------------------------------------------------------------------------------+
| Chipset Configuration                              | Item Specific Help      |
|----------------------------------------------------+-------------------------|
|                                                    |                         |
| Setting items on this menu to incorrect           ^| Override                |
| values may cause your system to malfunction.      :| Speculative TLB Reload  |
|                                                   :| (Disable to permit TLB  |
| SRAT Table              [Enabled]                 :| speculative reloads).   |
| Node Interleave:        [Disabled]                :|                         |
| Bank Interleave:        [Auto]                    :| For best performance,   |
|                                                   :| leave this disabled.    |
| ECC:                    [Enabled]                 :|                         |
| Dram ECC:               [Enabled]                 :|                         |
| ECC Scrub Redirection   [Enabled]                 :|                         |
| Chip-Kill:              [Enabled]                 :|                         |
| DCACHE ECC Scrub CTL:   [5.12 us]                 :|                         |
| L2 ECC Scrub CTL:       [10.2 us]                 :|                         |
| Dram ECC Scrub CTL:     [163.8 us]                :|                         |
| No Spec. TLB Reload:    [Enabled]                 :|                         |
|                                                    |                         |
+------------------------------------------------------------------------------+
 Esc Exit   <> Select Menu   Enter Select > Sub-Menu   F10 Save and Exit

This disallows speculative TLB reloading and avoids the error message.

If the errors still appear in "messages" after the kernel has been updated and the BIOS setting has been changed, the IOMMU needs to be run without AGP support. Add "iommu=noagp" (no quotes) as a parameter to the kernel boot options, either at boot time or in /boot/grub/menu.lst, then reboot the server and check that the message has disappeared.

Additional Information

Always ensure that the latest BIOS is installed.

For more information on decoding machine check errors, see "Understanding and Decoding Machine Check Errors on Sun Fire[TM] V20z/Sun Fire[TM] V40z running Red Hat OS".

Product

Sun Fire V20z Server
Sun Fire V40z Server
SuSE Linux Enterprise Server 9 x64/x86 Software
SuSE Linux Enterprise Server 8 x86 Software
Red Hat Enterprise Linux 3 x64
Red Hat Enterprise Linux 4
Red Hat Enterprise Linux 3 x86

Internal Comments

For the internal use of Sun employees.

Several different but connected solutions for this problem exist.
Please see technical instruction <Document: 1018142.1>.

V20z, V40z, Red Hat, Northbridge, GART TLB

Previously Published As 80746

Change History

Date: 2005-05-04
User Name: 71396
Action: Approved
Comment: Performed final review of article. No changes required. Publishing.
Version: 7

Date: 2005-04-27
User Name: 71396
Action: Accept
Comment:
Version: 0

Attachments

This solution has no attachment
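After the reboot described in the Resolution section, it may help to confirm that the "iommu=noagp" parameter actually reached the running kernel. The following is a minimal sketch, not part of the original solution: it assumes Python 3 and a Linux system that exposes the boot command line in /proc/cmdline, and the helper name `has_boot_option` is hypothetical.

```python
def has_boot_option(cmdline, option="iommu=noagp"):
    """True if the option appears as a whole whitespace-separated boot parameter."""
    return option in cmdline.split()


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    with open("/proc/cmdline") as f:
        active = has_boot_option(f.read())
    print("iommu=noagp active:", active)
```

Matching whole parameters rather than substrings avoids a false positive from a longer or differently spelled option.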