Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1003552.1
Update Date: 2011-11-21

Solution Type  Problem Resolution Sure

Solution  1003552.1 :   Sun Fire[TM] 12K/15K/E20K/E25K: SC POST results: 'Power On Selftest not run on last reset'  


Related Items
  • Sun Fire E25K Server
  • Sun Fire E20K Server
  • Sun Fire 12K Server
  • Sun Fire 15K Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-Exxk
  • .Old GCS Categories>Sun Microsystems>Servers>High-End Servers

PreviouslyPublishedAs
204998


Applies to:

Sun Fire E20K Server
Sun Fire E25K Server
Sun Fire 12K Server
Sun Fire 15K Server
All Platforms

Symptoms

The following message is reported in the $SMSVAR/adm/platform/messages (/var/opt/SUNWSMS/adm/platform/messages) file on a Sun Fire[TM] 12K/15K/E20K/E25K System Controller (SC):
Aug 04 16:24:46 2004 SC1 ssd[381]: [0 66349511944 NOTICE SSDWorkArea.cc 38] SC POST results:  'Power On Selftest not run on last reset'
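
To check whether an SC has logged this message, the platform log can be searched directly. The following is a minimal sketch, not an official SMS tool: the function names sc_post_results and post_was_skipped are hypothetical, and the default path is simply the one given above.

```shell
# Hypothetical helper: list every "SC POST results" line in an SMS platform log.
# The default path is the one named in this document; pass another file to override.
sc_post_results() {
    logfile="${1:-/var/opt/SUNWSMS/adm/platform/messages}"
    grep 'SC POST results:' "$logfile"
}

# Hypothetical helper: exit status 0 only if the log records a reset without POST.
# Output goes to /dev/null because not every grep supports -q.
post_was_skipped() {
    logfile="${1:-/var/opt/SUNWSMS/adm/platform/messages}"
    grep 'Power On Selftest not run on last reset' "$logfile" >/dev/null
}

# Usage on an SC (not executed here):
#   post_was_skipped && echo "SC came up without running POST"
```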


Cause

The message means exactly what it says: the System Controller did not run Power On Selftest (POST) on its last reset or reboot. The SC has therefore not performed basic hardware testing of its own components.

Solution

A reboot or reset may be the result of someone rebooting the SC manually, or it may indicate a more serious issue, such as an SC that panicked and rebooted or was forced down because of a problem.

In either case, if an SC reboots or is reset and comes back up without running basic hardware diagnostics, a bad component may go undetected and the SC can become the MAIN SC again. The platform may then be monitored and controlled by a possibly defective SC.

The SCs must run the basic hardware diagnostics in SSCPOST so that any errors detected on the SC's components are reported. When SMS starts up from /etc/rc3.d/S99sms, it logs those errors to the $SMSVAR/adm/platform/messages file and reports them to the remote SC. SMS can then take action against the SC startup as needed, which may include preventing SMS from starting on an SC for which sscpost reported problems.

So, if an SC does not run hardware tests when it reboots or resets, it bypasses the checks built into SMS that might keep a suspect SC from managing the platform.


Relief/Workaround
System Controllers execute extended POST upon reboot or reset when the OBP variables are set as follows:
diag-level=pmax-epvmax
diag-switch?=true
post-on-sir?=true
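
Before changing anything, the current values can be compared against the settings listed above. The sketch below is one possible way to do that from multi-user mode; it relies only on the name=value lines that the Solaris eeprom command prints. The function name check_sscpost_vars is hypothetical, and the expected diag-level is a parameter because the two SCs may legitimately use different values.

```shell
# Hypothetical helper: read "name=value" lines (as printed by eeprom) on stdin
# and report any of the three variables that is not set to the expected value.
# $1 (optional) is the expected diag-level; it defaults to pmax-epvmax.
# Written for a POSIX shell.
check_sscpost_vars() {
    want_level="${1:-pmax-epvmax}"
    status=0
    input=$(cat)
    for pair in "diag-level=$want_level" "diag-switch?=true" "post-on-sir?=true"; do
        # -F: fixed string (the "?" is literal), -x: match the whole line
        if ! printf '%s\n' "$input" | grep -F -x -- "$pair" >/dev/null; then
            echo "not set as expected: $pair"
            status=1
        fi
    done
    return $status
}

# Usage on an SC (not executed here):
#   eeprom | check_sscpost_vars
#   eeprom | check_sscpost_vars pmax-epmax
```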


NOTE: SC1 may have diag-level=pmax-epvmax, while SC0 is set to pmax-epmax.

The difference between these settings is that epvmax selects extended diagnostics and epmax selects normal diagnostics. They are set differently so that when both SCs are powered on and run POST at the same time, SC0 completes its normal diagnostics before SC1 finishes its extended diagnostics, and SC0 therefore becomes MAIN SC in SMS. It's a race to become MAIN, and SC0 is given a head start.

To enable SSCPOST from the OBP prompt and then execute it:
ok setenv diag-level pmax-epvmax
ok setenv diag-switch? true
ok setenv post-on-sir? true
ok reset


To enable SSCPOST from multi-user and then execute it (make sure SC failover is disabled before rebooting the MAIN SC; otherwise the reboot will cause SMS to fail over to the SPARE):
# eeprom diag-level=pmax-epvmax
# eeprom diag-switch?=true
# eeprom post-on-sir?=true
# reboot
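
The failover precaution above can be expressed as a small guard in front of the reboot. This is only a sketch under stated assumptions: it assumes the SMS commands showfailover and setfailover are available on the SC, and it assumes showfailover prints a status line of the form "SC Failover Status: DISABLED" when failover is off. Check the actual output on your SMS release before relying on it. The function name failover_is_disabled is hypothetical.

```shell
# Hypothetical guard: read showfailover output on stdin and succeed only
# when failover is reported as disabled.
# ASSUMPTION: the status line looks like "SC Failover Status: DISABLED".
failover_is_disabled() {
    grep 'SC Failover Status:' | grep 'DISABLED' >/dev/null
}

# Usage on the MAIN SC (not executed here):
#   setfailover off
#   showfailover | failover_is_disabled && reboot
```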



Additional Information

When an SC has rebooted or been reset without running SSCPOST, you may see I2c bus address warnings in $SMSVAR/adm/messages in addition to the following message:

SC POST results: 'Power On Selftest not run on last reset'

For example:
Aug  4 17:14:31 2004 SC1 hwad[438]: [1123 5036434911859 ERR I2cComm.cc 410] I2c read time out -  bus: 23, address: 25
Aug 4 17:15:25 2004 SC1 hwad[438]: [1123 5090842384614 ERR I2cComm.cc 410] I2c read time out - bus: 23, address: 22


Bus 23 maps to System Controller 1.
Address 25 and Address 22 are LED control registers.
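
Because these warnings can be numerous, it may help to summarise them per bus and address instead of reading them line by line. The following sketch uses only standard grep, sed, sort and uniq; the function name i2c_timeout_summary is hypothetical, and the pattern assumes the message layout shown in the examples above.

```shell
# Hypothetical helper: count "I2c read time out" messages per bus/address pair.
# Reads the files given as arguments, or stdin when none are given.
i2c_timeout_summary() {
    grep 'I2c read time out' "$@" |
        sed -n 's/.*bus: *\([0-9][0-9]*\), *address: *\([0-9][0-9]*\).*/bus \1 address \2/p' |
        sort | uniq -c
}

# Usage on an SC (not executed here), with the log file of interest:
#   i2c_timeout_summary /path/to/messages
```

Each output line is a count followed by the bus and address, so a large count on bus 23 or 22 for the LED register addresses points at the condition described here.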

NOTE: Messages may be on Bus 22 if the SC you have just rebooted is SC0.

A side effect of not running sscpost on an SC after a reboot or reset is that the SC's warning LED registers may show false Ambers, and the I2c messages may appear in large numbers.

After sscpost is enabled and the SC is rebooted (which runs sscpost), these warning messages and false Ambers go away.


Product
Sun Fire 15K Server
Sun Fire 12K Server
Sun Fire E25K Server
Sun Fire E20K Server


Internal Section

References:
  • This document is based on Radiance case ID 37097773, which exhibited this behavior together with the behavior described in the Additional Information section.
  • Bug ID 4621045 explains that sscpost is responsible for resetting the LED registers on the SC.  If sscpost isn't executed, the LED registers aren't reset, which could result in false LED warnings (Amber) or even I2c warnings.
  • Technical Instruction Document 1006092.1: "Sun Enterprise[TM] 12K/15K: EIS standard EEPROM settings"
  • Problem Resolution Document 1008879.1: "Sun Fire[TM] 15K: Running diagnostics on System Controller"
Keywords: starcat, 12k, 15k, 20k, 25k SC, POST, System Controller, sscpost, SSCPOST, I2c, SMS

Previously Published As 75093



Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.