Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1386616.1: Sun Storage 7000 Unified Storage System: Thermal Events and Ongoing Fan issues on 7410/7110 storage arrays
Created from <SR 3-5023454061>
Applies to:
Sun Storage 7410 Unified Storage System - Version: Not Applicable to Not Applicable - Release: N/A to N/A
Sun Storage 7110 Unified Storage System - Version: Not Applicable to Not Applicable - Release: N/A to N/A
7000 Appliance OS (Fishworks)

Symptoms
Thermal shutdown events are reported on Sun Storage 7410 Unified Storage System and Sun Storage 7110 Unified Storage System arrays.

The array has a previous history of ongoing fan issues being reported (usually via ASR). Service Processor resets were attempted to resolve the situation, but the issues persist. In extreme cases, the Sun Storage 7000 Unified Storage System experiences a thermal shutdown event.

High temperature alerts are logged in the storage controller iLOM or ipmp logs, available via a SupportBundle or the BUI:

    [logmgr: ID = 7401 : Wed Dec 7 07:02:21 2011 : IPMI : Log : critical : ID = 3a4 : 12/07/2011 : 07:02:21 : Temperature : sys.t_amb : Upper Critical going high : reading 44 > threshold 40 degrees C]

Reported incidents of the Sun Storage 7000 Unified Storage System controller automatically shutting down on an over-temperature warning, to prevent the system from overheating, can be confirmed by collecting the following data:
iLOM prompt: reset /SP/

Changes
Previous SRs dealt with fan issues and a reported Service Processor memory leak that was resolved with an SP reset. The customer will need to confirm the current LED status of the fans on this unit and confirm whether the actual environmental temperature in their data center is within an acceptable range. An iLOM snapshot would also be useful at this time. I have included a link to the iLOM Service Processor sunservice data collection script, collectDebugInfo.sh.

To find the SP IP address, go to the Web GUI: Maintenance -> Hardware -> select the head -> "Show Details" -> "SP". Log in as sunservice@SP_ip-address (root password).

Cause
A number of known issues exist for the 7x10 array relating to memory leaks on the Service Processor. Over time, memory becomes depleted and the Service Processor becomes unresponsive or hangs. When present, the issue surfaces somewhere between 30 and 60 days of uptime if the Service Processor firmware is below BIOS43; therefore, these BIOS versions require an SP reboot every 30 days.

In reported cases of thermal shutdown, resetting the Service Processor directly from iLOM with # reset /SP/ fails to bring the SP back up, and the array stays down. Proceed to reset the system with # reset /SYS, then try # start /SP/console.

Note that # reset /SP only restarts the Service Processor, and # reset /SYS only restarts the system part of the chassis.

Typical ipmp log data from a SupportBundle:

    389 | 12/06/2011 | 21:46:52 | Temperature #0x03 | Upper Critical going high

Solution
Initial system boot issues were worked around by performing a # reset /SYS, after which the Sun Storage 7410 or Sun Storage 7110 Unified Storage System rebooted successfully. However, within 10 minutes the customer reported further overheating issues, followed by another shutdown. Ongoing thermal issues occur on the Sun Storage 7410 Unified Storage System regardless of whether it is clustered.
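When reviewing a SupportBundle, the two alert formats quoted in this note (the logmgr IPMI entry under Symptoms and the ipmp line under Cause) can be scanned for upper-critical temperature crossings. The Python sketch below is only an illustration: the regular expression, the field order of the pipe-separated line, and the `TempEvent` type are assumptions inferred from those two samples, not a documented log schema.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class TempEvent(NamedTuple):
    source: str                 # "logmgr" or "ipmp"
    timestamp: datetime
    sensor: str
    event: str
    reading: Optional[float]    # only the logmgr sample carries a reading
    threshold: Optional[float]


# Assumed layout of the logmgr temperature entry, inferred from one sample.
LOGMGR_RE = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{4}) : (?P<time>\d{2}:\d{2}:\d{2}) : Temperature : "
    r"(?P<sensor>[\w.]+) : (?P<event>[^:]+?) : reading (?P<reading>[\d.]+) > "
    r"threshold (?P<threshold>[\d.]+) degrees C"
)


def _ts(date_s: str, time_s: str) -> datetime:
    # Both samples use MM/DD/YYYY and 24-hour HH:MM:SS.
    return datetime.strptime(f"{date_s} {time_s}", "%m/%d/%Y %H:%M:%S")


def parse_logmgr(line: str) -> Optional[TempEvent]:
    m = LOGMGR_RE.search(line)
    if m is None:
        return None
    return TempEvent("logmgr", _ts(m["date"], m["time"]), m["sensor"],
                     m["event"], float(m["reading"]), float(m["threshold"]))


def parse_ipmp(line: str) -> Optional[TempEvent]:
    # Assumed five fields: id | MM/DD/YYYY | HH:MM:SS | sensor | event
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 5 or not fields[3].startswith("Temperature"):
        return None
    try:
        ts = _ts(fields[1], fields[2])
    except ValueError:
        return None
    return TempEvent("ipmp", ts, fields[3], fields[4], None, None)


def is_upper_critical(ev: Optional[TempEvent]) -> bool:
    return ev is not None and "Upper Critical going high" in ev.event


sample_logmgr = ("[logmgr: ID = 7401 : Wed Dec 7 07:02:21 2011 : IPMI : Log : critical : "
                 "ID = 3a4 : 12/07/2011 : 07:02:21 : Temperature : sys.t_amb : "
                 "Upper Critical going high : reading 44 > threshold 40 degrees C]")
sample_ipmp = "389 | 12/06/2011 | 21:46:52 | Temperature #0x03 | Upper Critical going high"

for ev in (parse_logmgr(sample_logmgr), parse_ipmp(sample_ipmp)):
    if is_upper_critical(ev):
        print(ev.source, ev.timestamp, ev.sensor, ev.reading, ev.threshold)
```

Keeping the record ID out of the parsed result is deliberate: the note does not say whether IDs such as 389 are decimal or hexadecimal.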
If the array is part of a clustered pair, the other node in the cluster may be confirmed as located below the problem array in the same rack, yet operating in an optimal state. All fan and power components are reported as optimal via SupportBundles, but temperature warnings received via the iLOM lead to repeated shutdowns of the affected system. Previous temperature events were only recovered by performing a # reset /SYS; the array then boots OK, but 10 minutes later it shuts down again with temperature warnings from the iLOM.

A Field Engineer is required on site to carry out the following:
Firstly, carry out a full environmental inspection of the array in its current location.
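The recurring pattern in this Solution, where the array boots after # reset /SYS and then shuts down on temperature warnings again about 10 minutes later, can be confirmed from event timestamps before an engineer is dispatched. A minimal Python sketch, assuming the events have already been extracted into (timestamp, label) pairs; the labels `RESET` and `SHUTDOWN` and the function `quick_relapses` are hypothetical names, not part of any appliance tool.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical labels for events already extracted from iLOM/SupportBundle logs.
RESET = "reset /SYS"
SHUTDOWN = "thermal shutdown"


def quick_relapses(events: List[Tuple[datetime, str]],
                   window: timedelta = timedelta(minutes=10)) -> List[timedelta]:
    """Return the reset-to-shutdown delay for every # reset /SYS that is
    followed by a thermal shutdown within `window`."""
    relapses: List[timedelta] = []
    last_reset: Optional[datetime] = None
    for ts, label in sorted(events):
        if label == RESET:
            last_reset = ts
        elif label == SHUTDOWN and last_reset is not None:
            delay = ts - last_reset
            if delay <= window:
                relapses.append(delay)
            last_reset = None  # pair each reset with at most one shutdown
    return relapses


t0 = datetime(2011, 12, 7, 7, 0, 0)
events = [
    (t0, RESET),
    (t0 + timedelta(minutes=9), SHUTDOWN),  # relapse inside the window
    (t0 + timedelta(hours=1), RESET),
    (t0 + timedelta(hours=3), SHUTDOWN),    # too late to count as a relapse
]
print(quick_relapses(events))
```

The 10-minute default comes from this note; widening `window` trades precision for catching slower relapses.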
This is NOT the known memory leak situation, as the customer's array already has the SP/BIOS fixes for it:

    sp_version: '2.0.2.16', fw_version: '0ABMN080', os_version: 'ak/[email protected],1-1.21'

If the SP reset has NO effect and the array is still reporting temperature and fan issues, possible connector issues on the fan boards may have to be considered:

- 541-2211, FRU Fan Board, RoHS:Y (Culprit)
- 541-2213, FRU Connector Board Assembly, PATA DVD (Victim)

The most likely resolution for this situation is to replace both parts, the fan board and the connector board, simultaneously. The fan modules themselves remain unaffected.

References
<NOTE:1004226.1> - ILOM Service Processor sunservice commands for Sun Fire[TM] X4100 Server (applies to SFX4200/SFX4500/SFX4600 also)
<NOTE:1267544.1> - Older versions of the Service Processor firmware on Sun Storage 7110, 7210, 7310 and 7410 can leak memory.
<BUG:6925325> - ELWOOD (2U) CHASSIS FAN BOARD INTERMITTENTS CAUSING FANS TO "DISSAPPEAR"& TACHOMETER READING ERRORS

Attachments
This solution has no attachment
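The fan board decision in this Solution can be captured in a small helper: read the version fields as quoted from the array, then apply the rule that the fan board and connector board are replaced together only when an SP reset has had no effect and temperature or fan issues persist. The parsing regex, function names and boolean inputs are assumptions for illustration; the sketch does not judge whether a given sp_version or fw_version contains the memory-leak fix, because the note gives no minimum version to compare against.

```python
import re
from typing import Dict, List

# Matches key: 'value' pairs such as sp_version: '2.0.2.16'
VERSION_RE = re.compile(r"(\w+):\s*'([^']*)'")

# Part numbers quoted in this note; they are replaced as a pair.
FAN_BOARD = "541-2211"        # FRU Fan Board, RoHS:Y (culprit)
CONNECTOR_BOARD = "541-2213"  # FRU Connector Board Assembly, PATA DVD (victim)


def parse_versions(text: str) -> Dict[str, str]:
    """Turn "sp_version: '2.0.2.16', fw_version: '0ABMN080', ..." into a dict."""
    return dict(VERSION_RE.findall(text))


def parts_to_replace(sp_reset_had_effect: bool,
                     still_reporting_issues: bool) -> List[str]:
    """Hypothetical encoding of the note's rule; fan modules are left in place."""
    if not sp_reset_had_effect and still_reporting_issues:
        return [FAN_BOARD, CONNECTOR_BOARD]
    return []


versions = parse_versions("sp_version: '2.0.2.16', fw_version: '0ABMN080'")
print(versions)
print(parts_to_replace(sp_reset_had_effect=False, still_reporting_issues=True))
```

Returning both part numbers together, rather than one at a time, mirrors the note's advice that replacing them simultaneously is the most likely resolution.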