Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: FAB (standard) Sure Solution

1019279.1: False CPU Thermal Trip errors are being seen on Sun Blade X6250.

PreviouslyPublishedAs: 238104
Bug Id: <SUNBUG: 6671702>, <SUNBUG: 6691825>
Product: Sun Blade X6250 Server Module
Date of Resolved Release: 23-May-2008

Thermal Trip errors on Sun Blade X6250 Modules (see details below).

Affected X-Options:
    X4511A  1.86 GHz CPU, Xeon E5320 Quad-Core, 80W
    X4512A  1.60 GHz CPU, Xeon L5310 Quad Core, 50W
    X4513A  2.33 GHz CPU, Xeon E5345 Quad Core, 80W
    X4514A  2.66 GHz CPU, Xeon X5355 Quad Core, 120W
    X4515A  3.00 GHz CPU, Xeon X5365 Quad Core, 120W
    X4517A  2.50 GHz CPU, Xeon L5420 Quad-Core, 50W
    X4518A  2.33 GHz CPU, Xeon E5410 Quad Core, 80W
    X4519A  2.83 GHz CPU, Xeon E5440 Quad Core, 80W
    X4520A  3.16 GHz CPU, Xeon X5460 Quad Core, 120W

Affected Parts:
    All Intel 5300/5400 series (Clovertown/Harpertown) CPUs are affected.

Impact

Systems are exhibiting false Thermal Trip errors on CPU modules. On some systems the only sign is false errors in the ELOM SEL log; on others the CPU is also falsely shown as disabled in the ELOM CLI/GUI. Note that when a system is experiencing this issue, all of the signs above are false, and the host OS still shows all CPUs and cores online and functioning normally.

To help distinguish a real event from a false one, the following describes how a system behaves when a real thermal event has taken place. The ELOM error messages are the same as for a false Thermal Trip, but the host system reboots right after the time of the Thermal Trip error, not before. In the false events, the Thermal Trip errors occur just after a normal reboot of the host, or randomly while the host is up and running with no associated reboot. In a real event the host reboots unexpectedly, and the OS does not show the CPUs as available if the CPU was truly disabled by the BIOS.

This has become a service issue because engineers have been replacing Blades incorrectly due to these false failure signs.
This has in turn caused a shortage of X6250 Blades in some Regions.

Contributing Factors

All Sun Blade X6250 Server Modules running firmware/BIOS earlier than SW1.3 are impacted by this issue.

Symptoms

Example SEL log error:

    Nonrecoverable ,2008/03/06 15:04:01 ,Processor 0 thermal trip detected

Example ipmitool sel elist output:

    # ipmitool -H x.x.x.x -U root sel elist
    53 | 12/05/2007 | 14:15:43 | Processor Processor 0 | Thermal Trip | Asserted
    8b | 01/28/2008 | 08:46:14 | Processor Processor 0 | Thermal Trip | Asserted
    c6 | 02/26/2008 | 11:47:15 | Processor Processor 0 | Thermal Trip | Asserted

On systems that show CPUs disabled within the ELOM you will see:

    -> show CPU0
    /SYS/CPU/CPU0
    Targets:
    Properties:
    Designation = CPU 0
    Manufacturer = Intel
    Name = Clovertown
    Speed = 2333MHz
    Status = disabled

Root Cause

There are two root causes of this issue.

The first, which covers Thermal Trips on Clovertown CPUs, is an ELOM firmware issue. During power on/off of the system there is signal noise on LN93, which the ELOM reads as a false thermal trip. To resolve this, a temperature judgement has been added to the ELOM to fix the false thermal trip readings.

The second, which covers Thermal Trips on Harpertown CPUs, is a BIOS issue. Recent changes to the ELOM command structure have left the BIOS out of sync with that structure, which causes the CPUs to show as disabled. This will be resolved by syncing up the BIOS and ELOM command structures.

Corrective Action

Workaround: If it is determined that the Thermal Trip errors being experienced are false, the errors can be ignored and nothing should be done until the SW release resolving this issue is available.

Resolution: This issue will be resolved with the next release of firmware/BIOS, SW1.3.

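The ipmitool sel elist output shown under Symptoms, together with the reboot-timing rule described under Impact, can be screened with a short script. The following is a minimal sketch only, not a Sun-provided tool. The function names (parse_sel_elist, thermal_trips, followed_by_reboot) are hypothetical, the five-minute window is an arbitrary assumption, and the host reboot times are assumed to come from the host OS logs, which this sketch does not collect. It checks only the reboot timing; whether the host OS still shows all CPUs online must be confirmed separately.

```python
from datetime import datetime, timedelta

# Sample lines copied from the ipmitool output in the Symptoms section.
SAMPLE = """\
53 | 12/05/2007 | 14:15:43 | Processor Processor 0 | Thermal Trip | Asserted
8b | 01/28/2008 | 08:46:14 | Processor Processor 0 | Thermal Trip | Asserted
c6 | 02/26/2008 | 11:47:15 | Processor Processor 0 | Thermal Trip | Asserted
"""


def parse_sel_elist(text):
    """Parse `ipmitool sel elist` lines into dicts; lines that do not fit are skipped."""
    events = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 6:
            continue  # command echo, blank line, or an unexpected layout
        record_id, date, time, sensor, event, state = fields[:6]
        try:
            # Dates in the sample output are MM/DD/YYYY.
            when = datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S")
        except ValueError:
            continue
        events.append({"id": record_id, "time": when, "sensor": sensor,
                       "event": event, "state": state})
    return events


def thermal_trips(events):
    """Keep only asserted Thermal Trip events."""
    return [e for e in events
            if e["event"] == "Thermal Trip" and e["state"] == "Asserted"]


def followed_by_reboot(trip_time, reboot_times, window=timedelta(minutes=5)):
    """True if a host reboot was seen within `window` AFTER the trip.

    The window length is an assumption made for this sketch; the bulletin only
    says a real event is followed by a reboot "right after" the error.
    """
    return any(timedelta(0) <= r - trip_time <= window for r in reboot_times)


if __name__ == "__main__":
    host_reboots = []  # assumption: fill in from the host OS logs
    for trip in thermal_trips(parse_sel_elist(SAMPLE)):
        verdict = ("reboot followed, investigate as possibly real"
                   if followed_by_reboot(trip["time"], host_reboots)
                   else "no reboot followed, consistent with a false trip")
        print(trip["id"], trip["time"], trip["sensor"], verdict)
```

A trip with no reboot following it matches the false-event pattern described in this bulletin, but the result is only a screening aid and does not replace checking the CPU status in the host OS.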
References:
Escalation ID: 65875681, 1-23543493

For information about FAB documents, its release processes, implementation strategies and billing information, go to the following URL:
For Sun Authorized Service Providers go to:
In addition to the above you may email:

Internal Contributor/submitter: [email protected]
Internal Eng Responsible Engineer: [email protected]
Responsible Manager: [email protected]
Internal Services Knowledge Engineer: [email protected]
Internal Eng Business Unit Group: NSG (Network Systems Group)

Internal Sun Alert & FAB Admin Info
21-May-2008: Completed draft and sent to Extended Review.
23-May-2008: Incorporated feedback from Ext Rvw and sending to Publish.
17-Dec-2009: Replaced Product with Swordfish Nomenclature

Attachments
This solution has no attachment