Document Audience: | INTERNAL |
Document ID: | I0958-1 |
Title: | Replacement of 900MHz System Boards by 1200MHz boards in Sun Fire 12K/15K platforms may fail if the proper installation procedure is not followed. |
Copyright Notice: | Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | 2003-05-02 |
---------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
FIN #: I0958-1
Synopsis: Replacement of 900MHz System Boards by 1200MHz boards in Sun Fire 12K/15K platforms may fail if the proper installation procedure is not followed.
Create Date: May/02/03
SunAlert: No
Top FIN/FCO Report: No
Products Reference: Sun Fire 12K/15K
Product Category: Server / Service
Product Affected:
Systems Affected:
-----------------
Mkt_ID   Platform   Model   Description      Serial Number
------   --------   -----   -----------      -------------
  -        F12K      ALL    Sun Fire 12000        -
  -        F15K      ALL    Sun Fire 15000        -
X-Options Affected:
-------------------
Mkt_ID   Platform   Model   Description                     Serial Number
------   --------   -----   -----------                     -------------
X4006A      -         -     ASSY CPU 2PROC USIIIP 900+MHZ        -
X4007A      -         -     ASSY CPU 4PROC USIIIP 900+MHZ        -
Parts Affected:
Part Number   Description                     Model
-----------   -----------                     -----
540-5051-05   ASSY CPU 2PROC USIIIP 900+MHZ     -
540-5052-06   ASSY CPU 4PROC USIIIP 900+MHZ     -
References:
N/A
Issue Description:
If the physical replacement of a 900MHz System Board by a 1200MHz
System Board in a Sun Fire 12K/15K system is performed too quickly, the
System Management Services (SMS) 'esmd' daemon will not be able to
properly acknowledge the change. This will result in the 'esmd' daemon
failing the 1200MHz board. This failure could be interpreted as a
hardware fault, resulting in unnecessary replacement of the 1200MHz
System Board.
This issue can occur with any Sun Fire 12K/15K system where a 900MHz
System Board is being upgraded to a 1200MHz System Board.
Upon removal of an existing 900MHz System Board, 'esmd' will
acknowledge and log the event in /var/adm/platform/messages. Upon
insertion of the 1200MHz System Board, 'esmd' will acknowledge and log
the event. The timeframe required by 'esmd' to acknowledge each
individual event is thirty (30) seconds.
For example:
. 900MHz System Board is removed, and the event is logged:
esmd[7167]: [0 4824421445907014 NOTICE Boards.cc 1646] CPU at
SB16 removed
. 1200MHz System Board is inserted, and the event is logged:
esmd[7167]: [0 4824886762342552 NOTICE Cabinet.cc 860] CPU at
SB16 inserted
If the 900MHz System Board is removed, and the 1200MHz board is
inserted in less than 30 seconds, the two events will be neither
acknowledged nor logged by 'esmd'. After 'poweron' of the new System
Board, 'esmd' will report the following errors:
esmd[23597]: [1919 4873257798654128 ERR DetectorV.cc 448] A low
voltage or power supply has been detected on Core0, located on CPU
at SB2. The voltage detected is 1.36v; should be 1.53v to 1.70v.
PROCPAIR at SB2/PP0 is being removed from the domain and powered
off. Check all hardware for the cause.
esmd[23597]: [1919 4873258029915842 ERR DetectorV.cc 448] A low
voltage or power supply has been detected on Core1, located on CPU
at SB2. The voltage detected is 1.37v; should be 1.53v to 1.70v.
PROCPAIR at SB2/PP0 is being removed from the domain and powered
off. Check all hardware for the cause.
esmd[23597]: [1919 4873258399829778 ERR DetectorV.cc 448] A low
voltage or power supply has been detected on Core2, located on CPU
at SB2. The voltage detected is 1.37v; should be 1.53v to 1.70v.
PROCPAIR at SB2/PP1 is being removed from the domain and powered
off. Check all hardware for the cause.
esmd[23597]: [1919 4873258489747583 ERR DetectorV.cc 448] A low
voltage or power supply has been detected on Core3, located on CPU
at SB2. The voltage detected is 1.36v; should be 1.53v to 1.70v.
PROCPAIR at SB2/PP1 is being removed from the domain and powered
off. Check all hardware for the cause.
esmd[23597]: [0 4873258534792163 NOTICE SysControl.cc 3358]
Component PROCPAIR at SB2/PP0 has been blacklisted
esmd[23597]: [0 4873258549903380 NOTICE SysControl.cc 3358]
Component PROCPAIR at SB2/PP1 has been blacklisted
esmd[23597]: [1930 4873258557519333 NOTICE SysControl.cc 4162]
PROCPAIR at SB2/PP0 has been powered off: ecode=0
POST will confirm that the components have been blacklisted by ASR:
-------------------------------------------------------------------
Reading system ASR blacklist file
/etc/opt/SUNWSMS/config/asr/blacklist ...
portpair 2.0.0 # ESMD Low-Minumum Voltage 0321.0216.42
portpair 2.0.1 # ESMD Low-Minumum Voltage 0321.0216.42
slot 2.0 # ESMD Sensor Read Failure 0321.0230.57
-------------------------------------------------------------------
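The blacklist entries shown above can be pulled apart mechanically when
reviewing a case. The following is a minimal Python sketch, assuming only
the three-part layout visible in the POST excerpt (component type,
location, and a comment after '#'). The function name parse_blacklist is
hypothetical and is not part of SMS.

```python
def parse_blacklist(text):
    """Return (component, location, reason) tuples from blacklist text.

    Assumes the layout seen in the POST excerpt:
        <component> <location> # <reason>
    Lines that do not fit that layout are skipped.
    """
    entries = []
    for line in text.splitlines():
        spec, _, comment = line.partition("#")
        fields = spec.split()
        # Keep only lines of the form "<word> <location>".
        if len(fields) != 2 or not fields[0].isalpha():
            continue
        entries.append((fields[0], fields[1], comment.strip()))
    return entries


if __name__ == "__main__":
    sample = (
        "portpair 2.0.0 # ESMD Low-Minumum Voltage 0321.0216.42\n"
        "portpair 2.0.1 # ESMD Low-Minumum Voltage 0321.0216.42\n"
        "slot 2.0 # ESMD Sensor Read Failure 0321.0230.57\n"
    )
    for component, location, reason in parse_blacklist(sample):
        print(component, location, reason)
```

Entries whose reason mentions an ESMD voltage failure immediately after a
900MHz to 1200MHz board swap are candidates for the scenario described in
this FIN rather than for a hardware fault.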
The Environmental Status Monitoring Daemon (esmd) maintains a look-up
table that is populated with the Vcore values of the resident System
Boards. These values are:
 ----------------------------------------------------
| CH   | US III 750MHz, 900MHz             | 1.7100  |
|------+-----------------------------------+---------|
| CH+  | US III Cu 900MHz, 1050MHz         | 1.6150  |
|------+-----------------------------------+---------|
| CH++ | US III Cu 1050MHz, 1200MHz        | 1.3775  |
 ----------------------------------------------------
When a 900MHz System Board is removed without 'esmd' acknowledgment,
followed by insertion of a 1200MHz System Board which is also not
logged, the Vcore value in the look-up table is retained at 1.6150.
This will result in the newly installed System Board failing with "The
voltage detected is 1.37v; should be 1.53v to 1.70v."
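The arithmetic behind this error can be illustrated with a short Python
sketch. The nominal Vcore values are taken from the look-up table above.
The 5% tolerance is an assumption, inferred only from the fact that the
reported range of 1.53v to 1.70v matches 1.6150 plus or minus 5%; the
actual rule used by 'esmd' is not documented here. All names in the
sketch are hypothetical.

```python
# Nominal Vcore values from the look-up table in this FIN.
VCORE_NOMINAL = {"CH": 1.7100, "CH+": 1.6150, "CH++": 1.3775}

# ASSUMPTION: +/-5% window, inferred from "should be 1.53v to 1.70v".
TOLERANCE = 0.05


def vcore_window(cpu_type):
    """Return the (low, high) acceptable voltage for a table entry."""
    nominal = VCORE_NOMINAL[cpu_type]
    return nominal * (1 - TOLERANCE), nominal * (1 + TOLERANCE)


def in_window(measured, cpu_type):
    """True if the measured voltage lies inside the entry's window."""
    low, high = vcore_window(cpu_type)
    return low <= measured <= high


if __name__ == "__main__":
    measured = 1.37  # value reported for the new 1200MHz board
    for cpu_type in ("CH+", "CH++"):
        low, high = vcore_window(cpu_type)
        print(cpu_type, "%.2f-%.2f" % (low, high), in_window(measured, cpu_type))
```

Under that assumption, a 1.37v reading is rejected against the stale CH+
entry and accepted against the CH++ entry, which is consistent with the
board itself being healthy.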
The failure scenario can be avoided by allowing 'esmd' 30 seconds to
acknowledge each event separately: first the System Board removal, then
the new System Board insertion. Both events can be verified in
/var/adm/platform/messages:
esmd[7167]: [0 4824421445907014 NOTICE Boards.cc 1646] CPU at
SB16 removed
esmd[7167]: [0 4824886762342552 NOTICE Cabinet.cc 860] CPU at
SB16 inserted
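The verification above can also be scripted. The following Python sketch
scans log text for the two 'esmd' notices and reports whether a removal
followed by an insertion was logged for a given slot. It assumes the
message layout shown in the excerpts in this FIN and tolerates entries
that wrap across lines. The function names are hypothetical.

```python
import re

# Layout assumed from the excerpts in this FIN:
#   esmd[PID]: [N TIMESTAMP NOTICE File.cc LINE] CPU at SBnn removed|inserted
EVENT_RE = re.compile(
    r"esmd\[\d+\]: \[\d+ (\d+) NOTICE \S+ \d+\] CPU at (SB\d+) (removed|inserted)"
)


def board_events(log_text, slot):
    """Return (timestamp, action) pairs for one slot, in log order."""
    # Collapse whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return [
        (int(stamp), action)
        for stamp, found_slot, action in EVENT_RE.findall(flat)
        if found_slot == slot
    ]


def swap_acknowledged(log_text, slot):
    """True if a removal and a later insertion were both logged."""
    actions = [action for _, action in board_events(log_text, slot)]
    if "removed" not in actions:
        return False
    return "inserted" in actions[actions.index("removed") + 1:]


if __name__ == "__main__":
    sample = (
        "esmd[7167]: [0 4824421445907014 NOTICE Boards.cc 1646] CPU at\n"
        "SB16 removed\n"
        "esmd[7167]: [0 4824886762342552 NOTICE Cabinet.cc 860] CPU at\n"
        "SB16 inserted\n"
    )
    print(swap_acknowledged(sample, "SB16"))
```

In practice the text would be read from /var/adm/platform/messages on the
System Controller; the sample string here only reuses the entries quoted
in this FIN.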
Implementation:
---
| | MANDATORY (Fully Proactive)
---
---
| | CONTROLLED PROACTIVE (per Sun Geo Plan)
---
---
| X | REACTIVE (As Required)
---
Corrective Action:
The following recommendation is provided as a guideline for authorized
Sun Services Field Representatives who may encounter the above
mentioned problem.
Use the following example procedure to correctly replace a 900MHz
System Board with a 1200MHz System Board:
1. From the SMS command line interface enter the command:
poweroff sb16
2. SB16 is now powered off and ready for removal.
3. Physically remove the System Board from the platform. 'esmd'
will acknowledge and log this event in /var/adm/platform/messages.
It takes 'esmd' 30 seconds to recognize and log this event.
The line below will be logged:
esmd[7167]: [0 4824421445907014 NOTICE Boards.cc 1646] CPU at
SB16 removed
4. Physically install the 1200MHz System Board. 'esmd' will acknowledge
and log this event in /var/adm/platform/messages. It takes 'esmd'
30 seconds to recognize and log this event. The entry below will be
logged:
esmd[7167]: [0 4824886762342552 NOTICE Cabinet.cc 860] CPU at
SB16 inserted
5. Board replacement is complete.
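Rather than timing each step by hand, the wait in steps 3 and 4 can be
tied to the log entry itself. The following Python sketch polls a log
file until the expected "CPU at <slot> removed" or "CPU at <slot>
inserted" text appears, or until a timeout expires. The function name,
the 60-second default timeout, and the start_offset parameter are all
hypothetical choices for illustration; only the log path and the message
text come from this FIN.

```python
import re
import time


def wait_for_board_event(path, slot, action, timeout=60.0, poll=1.0,
                         start_offset=0):
    """Poll a log file until 'CPU at <slot> <action>' appears.

    start_offset is a byte offset recorded before the step began, so
    that older events for the same slot are ignored. Returns True if
    the event was seen, False if the timeout expired first.
    """
    pattern = re.compile(
        r"CPU at %s %s\b" % (re.escape(slot), re.escape(action))
    )
    deadline = time.monotonic() + timeout
    while True:
        with open(path, "rb") as handle:
            handle.seek(start_offset)
            text = handle.read().decode("ascii", "replace")
        # Collapse whitespace so wrapped entries still match.
        if pattern.search(" ".join(text.split())):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


if __name__ == "__main__":
    # Example: after pulling the board in step 3, wait for the notice.
    seen = wait_for_board_event("/var/adm/platform/messages", "SB16",
                                "removed")
    print("removal logged" if seen else "removal not logged yet")
```

The new board would be inserted only after the removal check returns
True, and the same call with "inserted" would confirm step 4 before the
board is powered on.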
If the failure scenario described above does occur, do not assume the
newly inserted 1200MHz System Board is faulty. Instead, remove the
System Board, verify that 'esmd' has logged the removal event, then
insert the System Board and verify that 'esmd' has logged the insertion
event. Finally, use the 'enablecomponent' command to remove the
components from the ASR blacklist.
Comments:
None.
============================================================================
Implementation Footnote:
i) In case of MANDATORY FINs, Sun Services will attempt to contact
all affected customers to recommend implementation of the FIN.
ii) For CONTROLLED PROACTIVE FINs, Sun Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Sun Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.central/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Central/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
----------------
* Access the top level URL of https://spe.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to [email protected]
--------------------------------------------------------------------------