Document Audience: | INTERNAL |
Document ID: | I0834-2 |
Title: | Domains on Sun Fire 12K/15K systems may suffer domain stops (DSTOP) with a signature of "CP arbiter lockstep consistency check error". Sun Alert: Yes |
Copyright Notice: | Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | 2002-12-24 |
---------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by Enterprise Services)
FIN #: I0834-2
Synopsis: Domains on Sun Fire 12K/15K systems may suffer domain stops (DSTOP) with a signature of "CP arbiter lockstep consistency check error".
Create Date: Dec/20/02
SunAlert: Yes
Top FIN/FCO Report: Yes
Products Reference: Sun Fire 15K/12K
Product Category: Server / SW Admin
Product Affected:
Systems Affected
----------------
Mkt_ID   Platform   Model   Description    Serial Number
------   --------   -----   -----------    -------------
  -        F12K      ALL    Sun Fire 12K         -
  -        F15K      ALL    Sun Fire 15K         -
X-Options Affected
------------------
Mkt_ID   Platform   Model   Description    Serial Number
------   --------   -----   -----------    -------------
  -         -         -          -               -
Parts Affected:
Part Number   Description   Model
-----------   -----------   -----
     -             -          -
References:
BugId: 4671526 - libPower needs to clear board test status when boards
are reset.
4671531 - libKeyswitch needs to deconfigure L1 boards before
the expander.
4724771 - LibPower should send events synchronously.
4712287 - EXB asic LBIST needs to be skipped when CP is in use.
4699827 - Deconfigure L1 boards should reset Darb ports if
necessary.
FIN: I0834-1
PatchId: 112481-06 (or higher): SMS 1.2: fomd, hwad, esmd, pcd patch.
112488-06 (or higher): SMS 1.2: hpost, redx, libxcpost Patch.
ESC: 536181 - F15K domain was dstoped on SMS1.2.
536183 - SF15K ???#1 dstop??????.
536638 - F15K SB1 will not power up after adding SB0 to domain
sapr0115.
537720 - Dstop still occurs after applying 112481-02/112827-01;
(lockstep consistency err).
537670 - When hpost in domain C; Domain D dstop.
537851 - SF15k: at least 4 of 7 domains crashed, one remained up
and running.
Sun Alert: 44627
URL: http://sunsolve.Central/cgi/retrieve.pl?doc=intsrdb%2F48223
http://pts-americas.west/esg/hsg/starcat/tools/cp-ports-dl.html
Issue Description:
CHANGE HISTORY:
---------------
I0834-2
DATE MODIFIED: Dec/20/2002
UPDATE: REFERENCES, PROBLEM DESCRIPTION, CORRECTIVE ACTION
. PROBLEM DESCRIPTION: modified to reflect the revision change of
patchId 112481 from -02 to -06, and the
replacement of patchId 112827-01 with
112488-06.
. CORRECTIVE ACTION: modified to reflect the revision change of
patchId 112481 from -02 to -06, and the
replacement of patchId 112827-01 with
112488-06.
. REFERENCES: patchId 112481 revision changed from -02 to -06, and
patchId 112827-01 replaced with 112488-06.
------------------------------
One or more domains in a multiple domain F12K or F15K system may suffer
a service interruption due to a Domain Stop (DSTOP) or a failure during
POST. This issue can be recognized by the error message "CP arbiter
lockstep consistency check error".
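As a rough illustration of recognizing this signature, the Python sketch below scans captured text (for example, saved redx output) for the error message. The function name and the idea of feeding it saved text are assumptions made for this example; it is not part of SMS or redx.

```python
# Illustrative helper only; not an SMS or redx tool.
SIGNATURE = "CP arbiter lockstep consistency check error"


def find_signature_lines(text):
    """Return (line_number, line) pairs for lines containing the signature.

    Matching is case-insensitive; line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if SIGNATURE.lower() in line.lower():
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = (
        "SDI EX02/S0 CP_Error0[31:0] = 2000A000 Mask = 580067FF\n"
        "CPErr0[29]: D 1E CP arbiter lockstep consistency check error (M)\n"
    )
    for number, line in find_signature_lines(sample):
        print(number, line)
```

A match only indicates that the message text is present; the dump still needs to be reviewed to confirm the failure.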
This issue occurs on Sun Fire 12K/15K systems running SMS 1.1 software,
or on systems running SMS 1.2 software without patches 112481-06 (or
higher) and 112488-06 (or higher).
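The "(or higher)" revision rule can be sketched in Python as follows. This is a minimal example that assumes the installed patch IDs have already been collected from a System Controller as strings of the form nnnnnn-nn; the helper names are hypothetical and the sketch does not query a system itself.

```python
import re

# Minimum revisions named in this FIN for SMS 1.2.
REQUIRED = {"112481": 6, "112488": 6}


def parse_patch_id(patch_id):
    """Split a patch ID such as '112481-06' into (base, revision)."""
    match = re.fullmatch(r"(\d{6})-(\d{2})", patch_id.strip())
    if match is None:
        raise ValueError("unrecognized patch ID: %r" % patch_id)
    return match.group(1), int(match.group(2))


def missing_patches(installed):
    """Return the required patches that are absent or below the minimum."""
    best = {}
    for patch_id in installed:
        base, revision = parse_patch_id(patch_id)
        best[base] = max(revision, best.get(base, -1))
    return sorted(
        "%s-%02d" % (base, minimum)
        for base, minimum in REQUIRED.items()
        if best.get(base, -1) < minimum
    )


if __name__ == "__main__":
    # Patch levels mentioned in escalation 537720 as still failing.
    print(missing_patches(["112481-02", "112827-01"]))
```

An empty result means both minimum revisions are met for the list supplied. The check would need to be repeated for each System Controller.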
In cases observed thus far, failing configurations have multiple
domains. At the time of failure, one or more domains were executing a
'setkeyswitch standby' or 'setkeyswitch on' process, or an Expander
Board was being installed into the platform.
The failure signature can be seen by using redx to examine the
DStop/hardware dump state file. The failure signature is similar to
the following:
SDI EX02/S0 Master_Stop_Status0[31:0] = 3000000F
MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
SDI EX02/S0 Dstop0[31:0] = 00428040
Dstop0[17]: D DARB texp requests Slot0 Dstop (M)
Dstop0[22]: D 1E SDI internal CP port requested Dstop
SDI EX02/S0 CP_Error0[31:0] = 2000A000 Mask = 580067FF
CPErr0[29]: D 1E CP arbiter lockstep consistency check error (M)
cp0_{dembusp,texp,unload,demand[1:0]} = 00
cp1_{dembusp,texp,unload,demand[1:0]} = 14
FAIL EXB EX2: Dstop/Rstop detected by SDI EX2/S0.
Primary service FRU is EXB EX2.
The failure above could be either a Dstop of a running domain or an
xcstate dump during a POST run.
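When reading a dump like the one above, it can help to see which bit positions are set in each 32-bit register value. The Python sketch below does only that arithmetic. The regular expression assumes register lines look like the "Name[31:0] = XXXXXXXX" lines in the sample, and the function names are hypothetical. It does not interpret what any bit means; those meanings come from redx.

```python
import re

# Matches lines such as "SDI EX02/S0 Dstop0[31:0] = 00428040".
REGISTER_LINE = re.compile(r"(\w+)\[31:0\]\s*=\s*([0-9A-Fa-f]{8})")


def set_bits(value, width=32):
    """Return the positions of the bits set in value, lowest first."""
    return [bit for bit in range(width) if (value >> bit) & 1]


def parse_register(line):
    """Return (register_name, value) from a dump line, or None."""
    match = REGISTER_LINE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2), 16)


if __name__ == "__main__":
    for line in (
        "SDI EX02/S0 Dstop0[31:0] = 00428040",
        "SDI EX02/S0 CP_Error0[31:0] = 2000A000 Mask = 580067FF",
    ):
        name, value = parse_register(line)
        print(name, set_bits(value))
```

For the sample values, bits 17 and 22 of Dstop0 and bit 29 of CP_Error0 are among the set bits, which agrees with the annotated lines in the dump.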
There are two different contributors to the failures:
o Insufficient cleanup of DARB ports when an Expander is no longer
active in a domain. Later, when boards are tested again, the stale
error state is seen and causes the DARB to report errors in a
fashion that results in a loss of lockstep.
o The LBIST tests on the SDI (executed by POST) leave the logic
structures in a state where after LBIST completes, the SDI can
drive signals to the CP ASICs such that the DARBs detect a
"Tslot parity error from SDI" or "GDTransID parity error from
SDI", leading to a "CP arbiter lockstep" DStop on another domain.
This issue is addressed by SMS 1.2 patches 112481-06 (or higher) and
112488-06 (or higher). No patches are planned for SMS 1.1. Customers
are advised to upgrade to SMS 1.2 and to apply the patches.
Implementation:
---
| | MANDATORY (Fully Proactive)
---
---
| X | CONTROLLED PROACTIVE (per Sun Geo Plan)
---
---
| | REACTIVE (As Required)
---
Corrective Action:
The following recommendation is provided as a guideline for authorized
Sun Services Field Representatives who may encounter the above
mentioned problem.
1) Apply SMS 1.2 patches 112481-06 (or higher) and 112488-06 (or higher)
to both System Controllers. Refer to the patch READMEs for special
installation instructions.
2) To identify the specific cause of one of these failures, refer to
Knowledge Base Article 88000 on SunSolve for detailed steps. The
'cp-ports' utility may also be used for this purpose, and can be
downloaded from:
http://pts-americas.west/esg/hsg/starcat/tools/cp-ports-dl.html
Comments:
None
============================================================================
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to [email protected]
--------------------------------------------------------------------------