Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1003552.1: Sun Fire[TM] 12K/15K/E20K/E25K: SC POST results: 'Power On Selftest not run on last reset'

Previously Published As 204998

Applies to:
Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server
All Platforms

Symptoms
The following message is reported in the $SMSVAR/adm/platform/messages (/var/opt/SUNWSMS/adm/platform/messages) file on a Sun Fire[TM] 12K/15K/E20K/E25K System Controller (SC):

Aug 04 16:24:46 2004 SC1 ssd[381]: [0 66349511944 NOTICE SSDWorkArea.cc 38] SC POST results: 'Power On Selftest not run on last reset'

Cause
This message means exactly what it states. The System Controller did not run Power On Selftest (POST) upon its last reset or reboot, so the SC has not executed basic hardware testing of its own components during that reboot or reset.

Solution
A reboot or reset may be the result of someone manually rebooting an SC, or of a more serious issue in which the SC panicked and rebooted or was forced down because of a problem. In either case, if an SC reboots or is reset and comes back up without running basic hardware diagnostics, a possibly bad component will not be detected and the SC can become the MAIN SC again. The platform may then be monitored and controlled by a possibly defective SC.

The SCs must run the basic hardware diagnostics in SSCPOST so that any errors detected on the SC's components are reported. SMS can then report those errors to the $SMSVAR/adm/platform/messages file as it starts up in /etc/rc3.d/S99sms, and also report them to the remote SC. SMS can then take action against the SC startup as needed, which may include preventing SMS from starting up on an SC that has problems reported by sscpost. If the system does not run hardware tests on an SC when it reboots or resets, it bypasses the checks built into SMS that may keep a suspect SC from managing the platform.
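
One way to check whether an SC has logged this notice is to search the platform messages file for the exact message text. The following is a minimal sketch only: the function name sc_post_not_run is hypothetical and is not part of SMS, and the default path is the one quoted in the Symptoms text.

```shell
# Hypothetical helper (not an SMS command): print every
# "Power On Selftest not run" notice found in a messages file.
# $1 is optional; it defaults to the path quoted in this article.
sc_post_not_run() {
    # The message text contains no regular-expression metacharacters,
    # so a plain grep pattern matches it literally.
    grep "SC POST results: 'Power On Selftest not run on last reset'" \
        "${1:-/var/opt/SUNWSMS/adm/platform/messages}"
}

# Example use on an SC:
#   sc_post_not_run && echo "SC came up without running POST"
```

The function returns the exit status of grep, so it can be used directly in a conditional: status 0 means at least one matching notice was found.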

Relief/Workaround
System Controllers will execute extended POST upon reboot or reset when the following OBP variables are set as shown:

diag-level=pmax-epvmax
diag-switch?=true
post-on-sir?=true

NOTE: SC1 may have diag-level=pmax-epvmax, while SC0 is set to pmax-epmax. The difference is that epvmax selects extended diagnostics and epmax selects normal diagnostics. They are set differently so that when both SCs are powered on and run POST at the same time, SC0 completes the normal diagnostics before SC1 finishes, and SC0 therefore becomes the MAIN SC in SMS. It is a race to become MAIN, and SC0 is given a head start.

To enable SSCPOST from the OBP prompt and then execute it:

ok setenv diag-level pmax-epvmax
ok setenv diag-switch? true
ok setenv post-on-sir? true
ok reset

To enable SSCPOST from multi-user mode and then execute it (make sure SC failover is disabled before rebooting the MAIN SC, otherwise the reboot will cause SMS to fail over to the SPARE):

# eeprom diag-level=pmax-epvmax
# eeprom diag-switch?=true
# eeprom post-on-sir?=true
# reboot

Additional Information
When an SC has rebooted or been reset without running SSCPOST, you may also notice I2c bus address warnings in $SMSVAR/adm/messages, in addition to the following message:

SC POST results: 'Power On Selftest not run on last reset'

For example:

Aug 4 17:14:31 2004 SC1 hwad[438]: [1123 5036434911859 ERR I2cComm.cc 410] I2c read time out - bus: 23, address: 25
Aug 4 17:15:25 2004 SC1 hwad[438]: [1123 5090842384614 ERR I2cComm.cc 410] I2c read time out - bus: 23, address: 22

Bus 23 maps to System Controller 1. Address 25 and Address 22 are LED control registers.

NOTE: Messages may be on Bus 22 if the SC you have just rebooted is SC0.

A side effect of not running sscpost on an SC upon a reboot or reset is that the warning LED registers for the SC may start showing false Ambers, and the I2c messages may appear and be quite numerous.
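
Because these I2c messages can be numerous, it may help to summarize them by bus and address before deciding whether they match the pattern described here. The following is a rough sketch, assuming the log lines have the same layout as the examples in this section; the function name i2c_timeout_summary is hypothetical, and the messages file must be passed explicitly.

```shell
# Hypothetical helper (not an SMS command): count "I2c read time out"
# messages per bus/address pair in the messages file given as $1.
i2c_timeout_summary() {
    grep 'I2c read time out' "$1" |
        # Keep only the "bus: N, address: M" part of each line.
        sed 's/.*I2c read time out - \(bus: [0-9]*, address: [0-9]*\).*/\1/' |
        # Group identical pairs and prefix each with its count.
        sort | uniq -c
}

# Example use on an SC:
#   i2c_timeout_summary "$SMSVAR/adm/messages"
```

Each output line is a count followed by a bus and address pair. Per the mapping given in this section, timeouts on bus 23 (SC1) or bus 22 (SC0) at addresses 25 and 22 correspond to the LED control registers.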

After enabling sscpost and rebooting the SC (which runs sscpost), these warning messages and false Ambers go away.

Product
Sun Fire 15K Server
Sun Fire 12K Server
Sun Fire E25K Server
Sun Fire E20K Server

Internal Comments
The following is strictly for the use of Sun employees:

References: This solution was written based on Radiance case ID 37097773, which exhibited this behavior as well as the additional behavior described in the Additional Information section.

Bug ID 4621045 details that sscpost is responsible for resetting the LED registers on the SC. If sscpost is not executed, the LED registers are not reset, which could result in false LED warnings (Amber) or even I2c warnings.

Technical Solution Problem Resolution
starcat, 12k, 15k, 20k, 25k SC, POST, System Controller, sscpost, SSCPOST, I2c, SMS

Previously Published As 75093

Updated by the ESG Knowledge Content Team 4/2010

Attachments
This solution has no attachment