FIN #: I0417-1
SYNOPSIS: Powering off Alt. Control Brd. in a UE10000 May Result in Total
Platform Shutdown.
DATE: Aug/26/98
KEYWORDS: Powering off Alt. Control Brd. in a UE10000 May Result in Total
Platform Shutdown.
- Sun Proprietary/Confidential: Internal Use Only -
(For Authorized Distribution by SunService)
SYNOPSIS: Powering off Inactive Alternate Control Board in
a UE10000 May Result in Total Platform Shutdown.
PRODUCT_REFERENCE: Ultra Enterprise 10000 Secondary Control Board
PRODUCT CATEGORY: Server / System Board
Systems Affected
----------------
Mkt_ID   Platform   Model   Description                      Serial Number
------   --------   -----   -----------                      -------------
  -      E10000     All     Ultra Enterprise 10000 Server          -

X-Options Affected
------------------
Mkt_ID   Platform   Model   Description                      Serial Number
------   --------   -----   -----------                      -------------
X2720A      -         -     E10000 Control Board with              -
                            Ethernet Hub

Part Number   Description   Model
-----------   -----------   -----
     -             -          -
BugId: 4149225, 4135766
MANUAL: 805-2917-14 Sun Enterprise 10000 System Service Manual
Ultra Enterprise 10000 systems with an (unused) alternate Control Board
and the Event Detector Daemon (EDD) enabled may be susceptible to total
platform shutdown if the alternate control board is powered off.
Shown below is an example of a warning from the platform messages file
of an E10000 that encountered an alternate control board power down
with the EDD enabled:
procestemp: Warning: The Temperature has exceed 911 temp on control board 0
procestemp: Temperature data for board 0 : cbStarfire5VDCTemp.0 226.88 C,
cbStarfire5VDCPerTemp.0 226.88 C, cbStarfire5VDCFanTemp.0 226.88 C,
poweroff: Shutting down entire system...
procesvolt: Warning: Voltage readings have exceeded the thresholds on
control board 0
procesvolt: Voltage data for board 0 : cbStarfire3p3VDCHK.0 5.00 V,
cbStarfire5VDC.0 10.03 V, cbStarfire5VDCHK.0 10.03 V,
cbStarfire5VDCPer.0 10.03 V, cbStarfire5VDCFan.0 10.03 V,
The warning shown above is followed by a total platform shutdown, because
the false temperature detection immediately trips the breakers.
The delay between the time that the Control Board was powered off and the
time that the breakers tripped was about 2 minutes.
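When reviewing a platform messages file for this symptom, it can help to pull
the sensor readings out of the warning text. The following is a minimal Python
sketch, not an SSP tool. The regular expression is inferred only from the
sample lines shown above, and the function names are hypothetical.

```python
import re

# Matches "name value unit" triples such as "cbStarfire5VDCTemp.0 226.88 C".
# The pattern is an assumption based on the sample messages in this FIN.
READING = re.compile(r"(cbStarfire\w+\.\d+)\s+(-?\d+(?:\.\d+)?)\s+([CV])\b")


def parse_readings(text):
    """Return {sensor_name: (value, unit)} for each reading found in text."""
    return {name: (float(value), unit)
            for name, value, unit in READING.findall(text)}


def has_911_warning(text):
    """Return True if text contains the '911 temp' warning quoted in the FIN."""
    return "911 temp" in text


sample = """\
procestemp: Warning: The Temperature has exceed 911 temp on control board 0
procestemp: Temperature data for board 0 : cbStarfire5VDCTemp.0 226.88 C,
cbStarfire5VDCPerTemp.0 226.88 C, cbStarfire5VDCFanTemp.0 226.88 C,
"""

readings = parse_readings(sample)
```

Identical, implausibly high values on every temperature sensor of one control
board, as in the sample, are consistent with the powered-down-board condition
this FIN describes rather than with a genuine overtemperature event.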
A minor problem on UE10000 Control Boards causes some boards to be latched in
reset when powered down (correct behavior), while other boards are not
latched (incorrect behavior). Control Boards that are not latched in reset
when powered down still return valid JBC CIDs, which the SSP monitoring
software (via EDD) currently uses to validate temperatures measured on board.
Because main power is off for that control board, the temperatures are
actually invalid, but the SSP software believes them to be valid because
the JBC CID can be read. In this scenario, the bogus temperatures exceed
the "911 temp" values and the platform is systematically shut down.
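The failure mode can be illustrated with a small Python model. This is a
sketch under stated assumptions and does not reproduce the SSP or EDD source:
the class, the function names, and the threshold value are hypothetical. The
second check mirrors the approach this FIN attributes to the patch, which
further qualifies temperatures by examining the power ring voltages.

```python
from dataclasses import dataclass

# Hypothetical threshold for illustration; the real "911 temp" value
# is not stated in this FIN.
LIMIT_911_C = 100.0


@dataclass
class BoardSample:
    cid_readable: bool     # the JBC CID could be read from the board
    ring_voltage_ok: bool  # assumed flag: power ring voltages look valid
    temp_c: float          # temperature reported by the board


def trusted_by_cid_only(sample):
    """Pre-patch behavior as described: a readable CID validates the reading."""
    return sample.cid_readable


def trusted_with_voltage_check(sample):
    """Patched behavior as described: also require valid power ring voltages."""
    return sample.cid_readable and sample.ring_voltage_ok


def should_shut_down(sample, trusted):
    """Shut down only when a trusted reading exceeds the limit."""
    return trusted(sample) and sample.temp_c > LIMIT_911_C


# A powered-off board that is not latched in reset: CID readable, bogus temp.
powered_off = BoardSample(cid_readable=True, ring_voltage_ok=False, temp_c=226.88)
# A powered-on board that is really overheating.
overheating = BoardSample(cid_readable=True, ring_voltage_ok=True, temp_c=226.88)
```

With the CID-only check, the powered-off board triggers a shutdown. With the
additional voltage check, the bogus reading is ignored while a genuine
overtemperature condition still triggers a shutdown.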
| | MANDATORY (Fully Pro-Active)
| | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
| X | REACTIVE (As Required)
Enterprise Services field personnel can avoid the problem described above by
following either the workaround or the recommendation shown below:

1. Before powering down the alternate control board of an E10000, disable
   the Event Detector Daemon (EDD).
   As user 'ssp' on the SSP, execute the following command:

      ssp% edd_cmd -x stop

   a. Once the alternate control board has been removed or replaced,
      re-enable EDD.
      As user 'ssp' on the SSP, execute the following command:

      ssp% edd_cmd -x start

   *Failure to stop EDD before powering down the control board may cause the
   entire platform to be shut down because of the false temperatures that
   are read on the powered-down control board.

2. Install Patch ID# 105683 or higher, which fixes the problem by qualifying
   temperatures on a powered-down control board by further examining the
   power ring voltages.

   *Be sure to follow the 'Special Install Instructions' of the patch README.
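
The stop/start sequence of workaround 1 can be sketched as a Python context
manager that always restarts EDD, even if the service step fails partway
through. This is an illustrative sketch, not a supported SSP utility: the
function names are hypothetical, and only the `edd_cmd -x stop` and
`edd_cmd -x start` commands come from this FIN. It assumes it is run as user
'ssp' on the SSP with `edd_cmd` on the PATH.

```python
import subprocess
from contextlib import contextmanager


def run_edd_cmd(action):
    """Run `edd_cmd -x <action>`; raises CalledProcessError on failure."""
    subprocess.run(["edd_cmd", "-x", action], check=True)


@contextmanager
def edd_stopped(run=run_edd_cmd):
    """Stop EDD for the duration of the block, then start it again."""
    run("stop")
    try:
        yield
    finally:
        # Re-enable EDD even if the work inside the block raised an error.
        run("start")


# Example use on the SSP (commented out so the sketch has no side effects):
# with edd_stopped():
#     input("Power off and service the alternate control board, then press Enter: ")
```

The `run` parameter exists only so the sequence can be exercised without a
real `edd_cmd`; in normal use the default is left in place.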
NOTE: For complete instructions on Control Board Replacement reference:
The Sun Enterprise 10000 System Service Manual p# 805-2917-14
Chapter 2: Component Replacement Procedures,
Section 2.9 Control Board Replacement,
Subsection 2.9.3 Powering Off a Control Board
page 2-26
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to contact
all affected customers to recommend implementation of the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their respective
accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the need
arises.
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
* Access the top level URL of http://cte.corp/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "SunService Documentation",
click on "FIN & FCO attachments", then choose the appropriate folder,
FIN or FCO. This will display supporting directories/files for FINs or FCOs.
Internet Access:
* Access the top level URL of https://infoserver.Sun.COM
Copyright (c) 1997-2003 Sun Microsystems, Inc.