Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: FAB (standard) Sure Solution

1239993.1: SPARC Enterprise T5440 with at least one memory module having 12 DIMMs may experience intermittent silent HOST hardware hangs at the OS level.
Oracle Confidential (PARTNER). Do not distribute to customers

Applies to:
Sun SPARC Enterprise T5440 Server - Version: Not Applicable to Not Applicable - Release: NA to NA
Information in this document applies to any platform.

SUNBUG 6856773
SUNBUG 6844624
SUNBUG 6925700

Affected Parts:
- 541-2551-04 - Memory Module, T5440
- 541-3791-02 - Memory Module, T5440, 800MHz
- 541-3908-03 - Service Processor Assembly (SP+)
- 541-2751-09 - Service Processor Assembly (SP)

Symptoms

The HOST stops responding and all applications stop.
- Ping Host (no response)
- Try logging into host (no response)

Service Processor (SP) operations are not affected.
- Ping Service Processor will succeed
- Service Processor login will succeed

There will be no indication of this type of failure in the output of Service Processor commands such as:
- sc>showpower
- sc>showenvironment
- sc>showfmerptlog
- sc>showlogs
- sc>showfaults

A HOST power cycle is required to return the system to normal operation.

Impact

Intermittent silent HOST hardware hangs at the OS level. These silent hangs are random and cause the HOST to stop functioning (the Service Processor (SP) is not affected). No data that indicates a hardware or software problem can be captured from any ILOM or host log file.

Changes

Contributing Factors

Only SPARC Enterprise T5440 systems with at least one memory module having 12 DIMMs are impacted by this issue. The two configurations known to date to have experienced this failure mode are:
- SEVPBJF1Z (602-4158-0x): 2 x 1.2GHz, 8 core, 64GB (16 x 2GB) 667 MHz FB-DIMMs, 2 x 146GB SAS 2.5" HDD, 4 x 1120W PSU, Slim SATA DVD RW, 2 Memory Expansion Boards plus 16x SESY2C1Z to get to a fully loaded 256GB.
- SEVPGSF1U (602-4183-0x): 4 x 1.2GHz, 8 core, 128GB (32 x 4GB) 667 MHz FB-DIMMs, 2 x 146GB SAS 2.5" HDD, 4 x 1120W PSU, Slim SATA DVD RW, 4 Memory Expansion Boards plus 16x SESY2C1Z to get to a fully loaded 256GB.
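
The symptom pattern described in the Symptoms section (HOST unreachable while the Service Processor stays reachable) can be written down as a small decision function. This is only an illustrative sketch: the function name and its four boolean inputs are hypothetical, and it assumes the ping and login checks have already been carried out by hand or by other tooling. A match only means the observed behaviour is consistent with this FAB; it does not confirm the root cause.

```python
def matches_silent_hang_symptoms(host_ping_ok, host_login_ok,
                                 sp_ping_ok, sp_login_ok):
    """Return True when the observations fit the symptom pattern in this FAB.

    All four arguments are booleans supplied by the caller (hypothetical
    inputs; how they are gathered is outside the scope of this sketch).
    """
    # HOST side: neither ping nor login gets a response.
    host_unresponsive = (not host_ping_ok) and (not host_login_ok)
    # SP side: both ping and login still succeed.
    sp_healthy = sp_ping_ok and sp_login_ok
    return host_unresponsive and sp_healthy


if __name__ == "__main__":
    # HOST silent, SP fine: consistent with the symptoms described above.
    print(matches_silent_hang_symptoms(False, False, True, True))
```

If the function returns False because the SP is also unreachable, the situation falls outside the symptom description in this document and normal troubleshooting applies.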

Cause

Root Cause

A small number of the memory modules listed in the Affected Parts section above have been seen to generate an intermittent memory module Power OK (POK) fault, which could lead to the HOST system silently hanging.

The root cause of the issue has been isolated to the DC-DC converters (DC208) on the memory module intermittently reporting false POK glitches, which can in turn lead to the system being reset.

Engineering has improved the reporting of, and resilience to, the false POK glitches by making changes to the FPGA code (4.1.7.4) and SysFW (7.2.9.a) firmware that resides on the system's SP module. New revisions of the two SP modules, F541-3908-04 (SP+) and F541-2751-10 (SP), containing the new firmware are now available, although the SysFW for module F541-2751-10 will need to be upgraded manually once this SP is installed.

Engineering has also updated the memory module DC-DC converters. The new versions of the memory module are F541-2551-05 and F541-3791-03.

Solution

Workaround

No workaround available - see the Resolution section below.

Resolution

If the HOST is experiencing a hang as described, one of the following Service Processor modules is required to perform fault isolation: part number F541-3908-04 (SP+) or part number F541-2751-10 (SP). The latter (F541-2751-10) requires a manual update to FW 7.2.9.a.

If a system hang is still observed once F541-3908-04 (SP+), or F541-2751-10 (SP) with FW 7.2.9.a, is installed, then in addition to normal troubleshooting pay close attention to the output of the 'showlogs' command on the SP, i.e.:

    sc>showlogs
    Mar 22 15:35:25: Chassis |major : "Host has been powered on"
    Mar 22 15:38:59: Chassis |major : "Host is running"
    Mar 23 15:13:32: Chassis |minor : "POK Glitch: /SYS/MB/MEM3"

If the system hang occurred at the same time as the "POK Glitch" message above was reported, then memory module #3 (/SYS/MB/MEM3) should be replaced.
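
The check described above, looking for a "POK Glitch" entry that coincides with the hang, can be sketched in a few lines of Python for use on saved 'showlogs' output. This is a rough illustration, not a supported tool. The line format is inferred solely from the sample shown in this document; the function names, the `year` parameter (the sample timestamps carry no year) and the 5-minute matching window are assumptions made for the example. Month names are parsed with `%b`, which assumes an English/C locale.

```python
import re
from datetime import datetime, timedelta

# Pattern modeled on the sample line in this document, e.g.
#   Mar 23 15:13:32: Chassis |minor : "POK Glitch: /SYS/MB/MEM3"
POK_RE = re.compile(
    r'^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}):\s+'
    r'Chassis\s*\|\s*\w+\s*:\s*"POK Glitch:\s*(?P<target>/SYS/MB/MEM\d+)"'
)

SAMPLE = '''Mar 22 15:35:25: Chassis |major : "Host has been powered on"
Mar 22 15:38:59: Chassis |major : "Host is running"
Mar 23 15:13:32: Chassis |minor : "POK Glitch: /SYS/MB/MEM3"'''


def find_pok_glitches(log_text, year):
    """Return a list of (timestamp, target) for every POK Glitch line.

    `year` must be supplied by the caller (assumption: the log lines
    themselves do not include one).
    """
    events = []
    for line in log_text.splitlines():
        match = POK_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group('stamp')}", "%Y %b %d %H:%M:%S"
            )
            events.append((stamp, match.group("target")))
    return events


def modules_near_hang(events, hang_time, window=timedelta(minutes=5)):
    """Return the sorted targets whose glitch falls within `window` of the hang.

    The 5-minute default is an arbitrary illustrative value, not a figure
    taken from this FAB.
    """
    return sorted({target for stamp, target in events
                   if abs(stamp - hang_time) <= window})


if __name__ == "__main__":
    # The year and the hang time below are hypothetical example values.
    glitches = find_pok_glitches(SAMPLE, year=2010)
    print(modules_near_hang(glitches, datetime(2010, 3, 23, 15, 14, 0)))
```

Any module reported by such a script is only a candidate for replacement; the judgement on whether the glitch and the hang actually coincide remains with the engineer handling the case.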

In this particular example the memory module should be replaced with either an F541-2551-05 or an F541-3791-03 module, whichever is required for the specific customer configuration.

If you have any questions about implementing this FAB for a customer under an existing Service Request, please open a CollaborationTask for GL-VSP.

Comments

This issue was fully evaluated and determined not to meet FCO criteria due to the extremely low failure rate that has been experienced to date.

References

BugID: 6856773, 6844624, 6925700
Escalation ID: IBIS SR 71048600
Resolution Patches: SysFw 7.2.9.a 139446-11
Reference Manual: SPARC Enterprise T5440 Server Service Manual 820-3801-11
ECO: 42858, 42844, 42934
GSAP: 5252, 5253, 5268, 5282

Related URL(s):
https://support.us.oracle.com/handbook_internal/Devices/Memory/MEM_SE_T5440_Memory_Module.html
https://support.us.oracle.com/handbook_internal/Systems/SE_T5440/components.html#SystemServiceProcessor

For information about FAB documents, their release processes, implementation strategies and billing information, go to the following URL:
* http://tns.central/fab

In addition to the above you may email:
* [email protected]

Contacts

Contributor: [email protected], [email protected], [email protected], [email protected], [email protected]
Responsible Engineer: [email protected]
Responsible Manager: [email protected]
Business Unit Group: Systems Group-SVS (SPARC Volume Systems, Horizontal Systems, (includes T2000/Ontario)

Attachments

This solution has no attachment