Asset ID: 1-71-1002259.1
Update Date: 2010-10-13
Solution Type: Technical Instruction
Sure Solution 1002259.1: Sun Enterprise[TM] xx00 (Sun Fire[TM] Classic): WARNING: AC Power failure detected
Related Items:
- Sun Enterprise 3000 Server
- Sun Enterprise 4000 Server
Related Categories:
- GCS>Sun Microsystems>Servers>Midrange Servers
Previously Published As: 203183
Applies to:
Sun Enterprise 3000 Server
Sun Enterprise 4000 Server
All Platforms
Goal
Understanding the expected behavior of a Sun Enterprise[TM] xx00 (Sun Fire[TM] Classic) server during an AC power failure
Solution
Description
The purpose of this document is to describe the expected behavior of a Sun Enterprise[TM] xx00 (Sun Fire[TM] Classic)
server when it encounters an AC power failure. It covers several possible causes of the failure and explains how to
determine which one is most likely the root cause of the outage. The error message that commonly appears in
such an event is:
sysctrl: [ID 712134 kern.warning] WARNING: AC Power failure detected
Unfortunately, this message alone doesn't tell us much. It only confirms what you already know: a power
failure occurred. Why or how it occurred is the question that needs answering. This document provides
hints about the common causes of a power failure on this platform. The common causes we investigated were:
1. Power supply failure: simulated by physically removing the unit.
2. Removing the power cord: simulates loss of the power feed, as well as accidental or purposeful removal of the cord.
3. Flipping off the power rocker switch: simulates loss of the power feed, as well as an accidental or purposeful flip of the switch.
4. Changing the keyswitch position to OFF: demonstrates a power-down that does not involve a power interruption.
At times, it is suspected that power was accidentally or purposely removed from the server
(pulling a power cord, tripping over the cord, flipping off the switch, etc.), but this is hard to prove unless
someone admits it or saw it happen.
As this document will show, depending on the behavior an outage exhibits, it is possible to come close to proving
that the cause was a manual one.
NOTE: All observations were made in two different Sun internal labs by two separate teams of Sun personnel,
using two different servers. The two teams' results were identical, so we assume that all servers of this
class behave in the same manner. However, we cannot know with 100% certainty that this is true in EVERY case.
This document only applies to situations where a single server has encountered a power failure.
If multiple servers in the environment have had a loss of power, the cause is most likely the power feed to the environment itself.
This document describes situations where a single server mysteriously loses power while others nearby remain unaffected.
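To tell a single-server outage from a shared-feed outage, one approach is to collect the timestamp of each host's "AC Power failure detected" warning and check whether several hosts lost power at nearly the same moment. The Python sketch below is a minimal illustration, not a Sun tool. It assumes you have already gathered the relevant /var/adm/messages lines from each host into one list. The function names, the 60-second window and the need to pass a year (syslog timestamps omit it) are all hypothetical choices.

```python
import re
from datetime import datetime

# Matches the syslog prefix ("Dec 30 12:17:52 host") followed by the sysctrl
# warning quoted in this document. Other message wordings are not recognized.
AC_FAIL = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) sysctrl: .*AC Power failure detected"
)


def ac_failure_times(lines, year):
    """Return (host, datetime) pairs for each AC power failure warning.

    syslog timestamps carry no year, so the caller must supply one (assumption).
    """
    events = []
    for line in lines:
        m = AC_FAIL.match(line)
        if m:
            stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((m.group(2), stamp))
    return events


def hosts_sharing_outage(events, window_seconds=60):
    """Group events that fall within window_seconds of a group's first event."""
    groups = []
    for host, stamp in sorted(events, key=lambda e: e[1]):
        if groups and (stamp - groups[-1]["start"]).total_seconds() <= window_seconds:
            groups[-1]["hosts"].add(host)
        else:
            groups.append({"start": stamp, "hosts": {host}})
    return groups


def feed_problem_suspected(groups):
    """True if any outage window contains more than one host."""
    return any(len(g["hosts"]) > 1 for g in groups)
```

For example, suppose you pass in the warning from v4u-4500b together with a matching warning from a second (hypothetical) host logged three seconds later. The result is a single group that contains both hosts. That points to the shared power feed rather than to either server.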
Steps to Follow
Removing a Peripheral Power Supply
The console reported the loss of the PPS unit and, ultimately, its re-insertion.
NOTE: /var/adm/messages logged the same information.
# Dec 30 12:15:31 v4u-4500b sysctrl: NOTICE: Core Power Supply 1 Removed
Dec 30 12:15:31 v4u-4500b sysctrl: WARNING: Redundant power lost
Dec 30 12:16:05 v4u-4500b sysctrl: NOTICE: Core Power Supply 1 Installed
Dec 30 12:16:09 v4u-4500b sysctrl: NOTICE: Core Power Supply 1 OK
# Dec 30 12:16:09 v4u-4500b sysctrl: NOTICE: Redundant power available
Dec 30 12:16:32 v4u-4500b sysctrl: NOTICE: Core Power Supply 3 Removed
Dec 30 12:16:32 v4u-4500b sysctrl: WARNING: Redundant power lost
Dec 30 12:16:39 v4u-4500b sysctrl: NOTICE: Core Power Supply 3 Installed
Dec 30 12:16:43 v4u-4500b sysctrl: NOTICE: Core Power Supply 3 OK
Dec 30 12:16:43 v4u-4500b sysctrl: NOTICE: Redundant power available
#
At no point did the system go down or reboot.
Assuming enough redundant power is available, the system would remain in operation in such a situation.
Messages like these (or similar ones) would be expected if a power supply had actually failed.
In addition, after a reboot, prtdiag should show a PPS unit failure, and messages during boot-up would
indicate problems with the unit.
An example of prtdiag output for a failed power supply unit:
Detected System Faults
======================
Key Switch Fan failure
    Detected Wed May 26 08:44:45 2004
AC Box Fan failure
    Detected Wed May 26 08:44:45 2004
AC Power failure
    Detected Wed May 26 08:44:45 2004
System 5.0 Volt Precharge failure
    Detected Wed May 26 08:44:45 2004
System 3.3 Volt Precharge failure
    Detected Wed May 26 08:44:45 2004
Peripheral 12 Volt Precharge failure
    Detected Wed May 26 08:44:45 2004
Peripheral 12 Volt Power failure
    Detected Wed May 26 08:44:45 2004
Unit 0 Peripheral Power Supply failure
    Detected Wed May 26 08:44:45 2004
PROM detected failure
    Detected Wed May 26 08:44:45 2004
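The power supply notices in the console capture for this test follow a simple pattern: removed, installed, then OK. That pattern can be checked mechanically. The Python sketch below is only an illustration. It replays the sysctrl messages using the exact wording shown in this document and reports any supply whose last recorded state is not "OK", along with whether redundant power was restored. The function names are hypothetical. The sketch does not handle messages worded differently from the capture, such as those for other supply types.

```python
import re

# Matches notices such as "sysctrl: NOTICE: Core Power Supply 1 Removed".
PS_EVENT = re.compile(r"sysctrl: NOTICE: (Core Power Supply \d+) (Removed|Installed|OK)\b")


def unresolved_supplies(lines):
    """Return {supply: last_state} for supplies whose last state is not OK."""
    last_state = {}
    for line in lines:
        m = PS_EVENT.search(line)
        if m:
            last_state[m.group(1)] = m.group(2)
    return {ps: state for ps, state in last_state.items() if state != "OK"}


def redundancy_restored(lines):
    """True/False per the last redundancy message; None if there is none."""
    state = None
    for line in lines:
        if "Redundant power lost" in line:
            state = False
        elif "Redundant power available" in line:
            state = True
    return state
```

A healthy hot-swap, like the one captured above, ends with an empty result and redundancy restored. A supply that was removed and never reported "OK" stays in the result. That matches the expectation that a truly failed supply remains bad.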
Removing a power cord
As soon as the power cord was pulled from the system, simulating tripping on the cord,
or simply removing it, the console reported:
# Dec 30 12:17:52 v4u-4500b sysctrl: WARN}Hardware Power ON
# Hardware Power ON
The server immediately rebooted, ran POST, and booted back up (this assumes the OpenBoot auto-boot? variable is set to true).
Once back in Solaris, the /var/adm/messages file contained only one message reflecting the incident:
Dec 30 12:17:52 v4u-4500b sysctrl: [ID 712134 kern.warning] WARNING: AC Power failure detected
So, when power to the server is instantly disrupted, as when someone trips over or removes the power cord,
the only symptoms should be the single warning message and an immediate reboot.
Flipping off the power rocker switch
Flipping off the rocker switch behaved identically to removing the power cord.
The result was an instant domain reboot and a single message to both the console and the /var/adm/messages file.
The console reported:
# Dec 30 12:31:36 v4u-4500b sysctrl: WARN Hardware Power ON
# Hardware Power ON
The domain reboots and /var/adm/messages shows:
Dec 30 12:31:36 v4u-4500b sysctrl: [ID 712134 kern.warning] WARNING: AC Power failure detected
Changing the keyswitch position to OFF
As might be expected, changing the keyswitch position does not log any messages about a power failure. After all, this is
a standard procedure on this platform, unrelated to power distribution. The console reported the following immediately
after the keyswitch was turned to OFF:
# Hardware Power ON
When the keyswitch was returned to ON, the domain ran through POST, and /var/adm/messages logged only the following message
(nothing related to power):
Dec 30 12:48:24 v4u-4500b sysctrl: [ID 273467 kern.info] sysctrl0: Key switch is not in the secure position
Summary
A power supply unit failure should leave evidence beyond a single "AC Power failure detected" message.
A bad supply should remain bad after a reboot, and stay bad until it is replaced. Assuming the system still
has enough redundant power without the defective unit, it will continue operating without crashing.
A normal keyswitch operation logs nothing about power in /var/adm/messages.
It merely reports that the keyswitch status has changed.
If the only message reported to the console or the /var/adm/messages file is
"WARNING: AC Power failure detected", it is quite likely that the power feed was instantly disrupted.
If all systems attached to the same power feed were instantly disrupted, the root cause is the power source.
If only one machine on a specific power feed was affected, the instant disruption is most likely the result
of accidental or purposeful disruption of power.
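The decision rules in this summary can be written as a small classifier. The Python sketch below rests on several assumptions. It looks only at the message types quoted in this document. The function name and verdict strings are hypothetical. Whether other hosts on the same feed were affected must be determined separately, for example by comparing AC failure timestamps across hosts. Like the conclusions above, its verdict indicates a likely cause; it does not prove one.

```python
def likely_cause(lines, other_hosts_on_feed_affected=False):
    """Suggest a likely cause from one host's console or /var/adm/messages lines.

    Applies the summary rules in order: supply evidence first, then a bare
    AC power failure warning, then a keyswitch change.
    """
    ac_failures = [line for line in lines if "AC Power failure detected" in line]
    supply_events = [
        line for line in lines if "Power Supply" in line or "Redundant power" in line
    ]
    keyswitch = [line for line in lines if "Key switch" in line]

    if supply_events:
        # A failed supply leaves more than the single AC warning behind.
        return "power supply event: check prtdiag for a failed supply"
    if ac_failures:
        if other_hosts_on_feed_affected:
            return "power source: shared feed disrupted"
        return "instant disruption of this host's feed (cord or switch)"
    if keyswitch:
        return "keyswitch operation: not a power failure"
    return "no power-related messages found"
```

Checking for supply evidence first mirrors the summary. Only when the AC warning stands alone does it point to an instant disruption of the feed.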
Product:
Sun Enterprise 4000 Server
Sun Enterprise 3000 Server
Internal Comments
Escalation 1-6082530, Radiance case ID 64404887 were the source of this document.
Attachments
This solution has no attachment