Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Problem Resolution Sure
Solution 1003308.1: Sun Fire[TM] 12K/15K/E20K/E25K: esmd warning; A power failure has been detected on a redundant power supply at ...
Previously Published As 204588
Applies to:
Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server
All Platforms

Symptoms
The following error messages appear in the system controller platform messages file ($SMSLOGGER/platform/messages):

Mar 10 11:33:57 2002 orlocn01-sc0 esmd[506]: [2000 171363006794824 ERR SysControl.cc 1371]

The same failure can be reported on an Expander Board:

esmd[...]: [2000 4153696025929986 ERR SysControl.cc 1581]

or

esmd[...]: [0 8353244896562214 ERR SysControl.cc 1579]

If the problem occurs on an IO Board, the message will be:

esmd[....]: [2000 3006117574587 ERR SysControl.cc 2772]

Cause
Expander Boards, Centerplane Support Boards and IO Boards each have two redundant power supplies, reported in the previous messages as ps0_power_good_l and ps1_power_good_l. The esmd message identifies which one is failing and the action that needs to be taken.

The system can survive the loss of one of these power supplies, because the remaining one provides enough power to run. Even so, replacement of the component needs to be scheduled as soon as possible.

Solution
The power supply warning message itself prescribes the course of action to take as soon as possible. In this case, "schedule replacement of CSB at CS1 as soon as possible." or "schedule replacement of EXB at EX7 as soon as possible".

Additional notes:
When such a failure occurs, the component is automatically added to the ASR Blacklist file.

esmd[...]: [0 47269795990296515 NOTICE SysControl.cc 5296]

This can be confirmed with the 'showcomponent -a' command. Remember to remove the component from the ASR Blacklist file with the 'enablecomponent -a' command after the replacement.

In the case of a defective CSB, the component can be proactively removed from the system with the 'setbus' command and then powered off to prevent any further fatal impact on the system. Refer to the manual page for 'setbus' or to the Sun Fire[TM] 12K/15K/E20K/E25K Systems Service Manual to make sure this command is used properly.
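The message headers quoted in this document share a common bracketed layout (two numbers, a severity, a source file and a line number). As an illustration only, the following Python sketch scans log lines for ERR entries whose text names one of the power-good signals. It is not part of SMS. The names SmsEntry, parse_entry and power_supply_errors are hypothetical, and the field names "code" and "serial" are placeholders, since this document does not state what the two leading numbers mean.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Optional, Tuple


class SmsEntry(NamedTuple):
    daemon: str    # e.g. "esmd" or "poweron"
    code: str      # first number in the bracket (placeholder name)
    serial: str    # second number in the bracket (placeholder name)
    severity: str  # e.g. "ERR" or "NOTICE"
    source: str    # e.g. "SysControl.cc"
    line: int      # source line number, e.g. 1371
    text: str      # whatever follows the closing bracket


# Matches headers such as:
#   esmd[506]: [2000 171363006794824 ERR SysControl.cc 1371]
HEADER = re.compile(
    r"(?P<daemon>\w+)\[(?P<pid>[^\]]*)\]:\s*"
    r"\[(?P<code>\d+)\s+(?P<serial>\d+)\s+(?P<severity>[A-Z]+)\s+"
    r"(?P<source>\S+)\s+(?P<line>\d+)\]"
    r"\s*(?P<text>.*)"
)

# The two signal names given in the Cause section.
SIGNAL = re.compile(r"\bps([01])_power_good_l\b")


def parse_entry(raw: str) -> Optional[SmsEntry]:
    """Return the parsed header of one log line, or None if it does not match."""
    m = HEADER.search(raw)
    if m is None:
        return None
    return SmsEntry(
        m["daemon"], m["code"], m["serial"], m["severity"],
        m["source"], int(m["line"]), m["text"].strip(),
    )


def power_supply_errors(
    lines: Iterable[str],
) -> Iterator[Tuple[SmsEntry, List[str]]]:
    """Yield (entry, signal names) for ERR entries that name a power-good signal."""
    for raw in lines:
        entry = parse_entry(raw)
        if entry is None or entry.severity != "ERR":
            continue
        numbers = sorted(set(SIGNAL.findall(entry.text)))
        if numbers:
            yield entry, [f"ps{n}_power_good_l" for n in numbers]
```

An open file object for $SMSLOGGER/platform/messages can be passed directly to power_supply_errors, since it iterates line by line. The severity filter assumes that NOTICE entries, such as the ASR Blacklist message, are not of interest for this check.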
Important note
The behavior is different if the failure is detected during the power-on (or setkeyswitch) operation of the component. In this case, the failure of one of the power supplies is considered fatal and the power-on aborts.

poweron[...]: [6121 1491355281854766 ERR L2PowerControl.cc 325]

The replacement of the component (EX13 in this example) must then be scheduled as soon as possible.

Internal comments
The following is strictly for the use of Sun employees:

*Note: ps0_power_good_l and ps1_power_good_l are the output signals of the 1+1 redundant D116 DC-DC converters; they are set to HIGH when an output fails to deliver power or while the DC outputs are out of spec.

Reference: Troubleshooting <Document 1017705.1> Sun Fire[TM] 12K/15K: Expander Board Power Supply Failure May Cause System or IO Boards to Lose Power

Product
Sun Fire 15K Server
Sun Fire 12K Server
Sun Fire E25K Server
Sun Fire E20K Server

Keywords
poweron, ps1_power_good_l, ps0_power_good_l, power failure, exb, csb, io

Previously Published As
47302

Change History
Date: 2010-04-26 User Name: Volkmar Grote 117021 Action: Reviewed for Content Team Comment: minor formatting and typo corrections

Attachments
This solution has no attachment