Document Audience: INTERNAL
Document ID: A0210-1
Title: 400MHz and 464/466MHz UltraSPARC II Modules in Enterprise Server platforms may experience system reboots or system panics.
Copyright Notice: Copyright © 2007 Sun Microsystems, Inc. All Rights Reserved
Update Date: Mon Sep 08 00:00:00 MDT 2003
----------------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
----------------------------------------------------------------------------
FIELD CHANGE ORDER
(For Authorized Distribution by Sun Services)
FCO #: A0210-1
Status: inactive
Synopsis: 400MHz and 464/466MHz UltraSPARC II Modules in Enterprise Server platforms may experience system reboots or system panics.
Date: Sep/08/2003
SunAlert: Yes
Top FIN/FCO Report: No
Products Reference: 400/464/466MHz UltraSPARC II Modules
Product Category: Server / System Component
Product Affected:
Systems Affected:
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
- E10000 ALL Ultra Enterprise 10000 -
- E3500 ALL Ultra Enterprise 3500 -
- E4500 ALL Ultra Enterprise 4500 -
- E5500 ALL Ultra Enterprise 5500 -
- E6500 ALL Ultra Enterprise 6500 -
X-Options Affected:
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
X2590A - ALL 464/466MHz UltraSPARC II Module -
X2580A - ALL 400MHz UltraSPARC II Module -
Parts Affected:
Part Number Description Model
----------- ----------- -----
501-5816-xx CPU Module 464/466MHz UltraSPARC II -
501-6009-xx CPU Module 400MHz UltraSPARC II -
501-5814-xx CPU Module 400MHz UltraSPARC II -
501-5798-xx CPU Module 464/466MHz UltraSPARC II -
501-5815-xx CPU Module 400MHz UltraSPARC II -
(SCSI Devices)
Type Vendor Model SerialNumber(Min) SerialNumber(Max) Firmware
---- ------ ------- ----------------- ----------------- --------
N/A
References:
ECO: WO_25950
ECO: WO_25900
ECO: WO_25898
ECO: WO_25949
ECO: WO_25971
ECO: WO_25957
LEAP: 2211
WWStopShip: P001-20091
BugID: 4795832
FIN: I0896-1
FIN: I0755-1
FIN: I0616-1
SunAlert: 50474
Issue Description:
Certain 400MHz and 464/466MHz UltraSPARC II Modules supported on
Ultra Enterprise 10000 and Ultra Enterprise 3500 through 6500 systems
may experience early life failures resulting in system reboots or
system panics due to Uncorrectable Memory Errors. This issue is
related to a socket problem on the module.
A limited number of systems manufactured between August of 2002 and
December 23, 2002 may contain affected CPU modules. The probability
of experiencing the described issue is considered low at less than
or equal to 10%.
Modules installed before August of 2002 tend not to be affected by
this issue. Therefore, customers who match this criterion should
not take any action unless their system is experiencing errors as
described in this FCO.
Modules that were installed after August of 2002 and before December
of 2002, but have been in operation for 120 days or more, tend not
to be affected by this issue. Affected modules will typically show
symptoms within the first 30 to 90 days of operation. Customers
who have seen no failures after 120 days of operation should not take
any action. Therefore, as of November 30, 2003, all systems should
be past this 120-day window and will no longer be subject to this
situation.
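The criteria above can be sketched as a small triage helper. This is
only an illustration under stated assumptions: the function name, the
category labels, and the exact day-of-month boundaries are hypothetical
(the FCO gives only "August of 2002" and "December 23, 2002"), and the
sketch applies the window to the module install date.

```python
from datetime import date

# Boundaries are assumptions: the FCO states only "August of 2002"
# and "December 23, 2002" (the Stopship date).
WINDOW_START = date(2002, 8, 1)
WINDOW_END = date(2002, 12, 23)
BURN_IN_DAYS = 120  # modules running this long tend not to be affected


def triage(installed: date, days_in_operation: int, has_errors: bool) -> str:
    """Classify a module using the FCO's criteria (labels are hypothetical)."""
    if has_errors:
        # This FCO is implemented upon failure, regardless of dates.
        return "replace-upon-failure"
    in_window = WINDOW_START <= installed <= WINDOW_END
    if in_window and days_in_operation < BURN_IN_DAYS:
        # Still inside the 120-day window described above.
        return "at-risk"
    return "no-action"


if __name__ == "__main__":
    print(triage(date(2002, 9, 15), 45, False))
    print(triage(date(2002, 9, 15), 130, False))
```

A module classified as "at-risk" is not known to be defective; per the
text above, the probability of failure is considered low.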
When this issue occurs the system may reboot with the following type of error:
p5 UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0203 UDBL.ESYND 0x03
p5 UDBL Syndrome 0x3 Memory Module Board 6 J3101 J3201 J3301
J3401 J3501 J3601 J3701 J3801
p5 unix: WARNING: [AFT1] errID 0x00027f15.a5d8bc56 Syndrome 0x3
indicates that this may not be a memory module problem
p5 unix: [AFT2] errID 0x00027f15.a5d8bc56 PA=0x00000003.3d5ade08
p5 E$tag 0x00000000.18c067ab E$State: Exclusive E$parity 0x0c
p5 unix: [AFT2] E$Data (0x00): 0x20202020.20202020
p5 unix: [AFT2] E$Data (0x08): 0x20202020.20202022 *Bad* PSYND=0x00ff
p5 unix: [AFT2] E$Data (0x10): 0x20202020.20202020
p5 unix: [AFT2] E$Data (0x18): 0x20202020.20202020
p5 unix: [AFT2] E$Data (0x20): 0x20202020.20202020
p5 unix: [AFT2] E$Data (0x28): 0x20202020.20202020
p5 unix: [AFT2] E$Data (0x30): 0x20202020.20202020
p5 unix: [AFT2] E$Data (0x38): 0x20202020.20202020
p5 unix: NOTICE: Scheduling clearing of error on page 0x00000003.3d5ac000
p5 unix: [AFT3] errID 0x00027f15.a5d8bc56 Above Error is in User Mode
p5 and is fatal: will reboot
p5 unix: WARNING: [AFT1] initiating reboot due to above error in pid 9744
(java)
Systems may also experience panics due to Uncorrectable Memory Errors:
WARNING: [AFT1] EDP event on CPU1 Data access at TL=0, errID
0x00000093.6323e6f8
AFSR 0x00000000.80408000 AFAR 0x00000000.06901980
AFSR.PSYND 0x8000(Score 95) AFSR.ETS 0x00 Fault_PC 0x78128a84
UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
panic[cpu1]/thread=30000ae5000: [AFT1] errID 0x00000093.6323e6f8 EDP Error(s)
Other Possible Error Messages - (THIS IS NOT AN EXHAUSTIVE LIST):
Error Message Examples:
- EDP Event - Ecache Data Parity Event
- WP Event - Writeback Data Parity Error
- CP Event - Copyout Data Parity Error
- UE Event - Uncorrectable Memory Error
- BERR Event - Bus Error
- CE Event - Correctable Memory Error
E10K specific error examples:
Arbstop dump
- UPA Fatal Error
- ETP Event
Recordstop
- WP Event - Writeback Data Parity Error
- CP Event - Copyout Data Parity Error
- LDP Event - Read Data Parity Error
- UE ECC Error - Uncorrectable memory error
- ldat error
For additional detail and examples of symptoms, please see the
following Sun internal URL:
http://onestop/ecache
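As a rough aid for spotting the event types listed above in console or
messages logs, the following sketch counts "[AFTn] <EVENT> event on
CPUn" lines per event type and CPU. It assumes that the other event
types are logged in the same form as the EDP sample shown in this FCO,
which is not confirmed here; the function name is hypothetical.

```python
import re
from collections import Counter

# Event abbreviations taken from the lists in this FCO.
# Assumption: all are logged like the EDP sample,
# e.g. "[AFT1] EDP event on CPU1 ...".
EVENT_RE = re.compile(
    r"\[AFT\d\]\s+(EDP|WP|CP|UE|BERR|CE|ETP|LDP)\s+[Ee]vent\s+on\s+CPU(\d+)"
)


def count_events(lines):
    """Return a Counter keyed by (event type, CPU number)."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "WARNING: [AFT1] EDP event on CPU1 Data access at TL=0, errID",
        "0x00000093.6323e6f8",
        "panic[cpu1]/thread=30000ae5000: [AFT1] errID 0x00000093.6323e6f8 EDP Error(s)",
    ]
    for (event, cpu), n in count_events(sample).items():
        print(f"{event} on CPU{cpu}: {n}")
```

Repeated events on the same CPU are only a hint; the log does not by
itself prove that the module socket is at fault.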
Through failure and root cause analysis, it has been determined
that the socket manufacturing process variation widened, resulting in
a small, but unacceptable, number of sockets being manufactured outside
of the product design specification. This process drift, and the
resulting variation in quality of the CPU socket, has been identified
as the root cause of the CPU module symptoms listed above.
It is important to note that:
1) This is not a CPU, CPU socket or CPU module design problem.
2) While this variation in quality did widen, the vast majority of sockets
are still within design specification and will function as expected.
3) The widened variation in the manufacturing process has resulted in
the potential of a small number of out of specification parts
in any batch or lot of sockets rather than some good batches and
some bad batches. Therefore, Sun is not able to track the out of
specification sockets to specific CPU modules or date codes.
All affected products were placed on StopShip as of December 23, 2002.
According to the StopShip/Purge document (P001-20091.A11), all affected
systems and X-options were taken off StopShip and began shipping again
as of March 6, 2003.
Corrective action was officially made available by ECO releasing new
part numbers (as shown in the Corrective Action section of this FCO)
which incorporated the new socket.
Corrective action was made available in Sun Services by releasing
LEAP 2211 on February 28, 2003.
Parts Affected:
November 30, 2003
Implementation:
---
| | MANDATORY (Fully Pro-Active)
---
---
| | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| X | UPON FAILURE
---
Replacement Time Estimate:
0.25 hours
Special Considerations:
For additional detail and examples of symptoms please see http://onestop/ecache
which is a Sun internal webpage.
Corrective Action:
Upon Failure, or upon Gold/Platinum customer need based on the 120-day
criteria listed above, replace as follows:
- replace 501-6009-xx with 501-6619-01 (or above) or with 501-6622-01 (or above)
or with 501-6624-01 (or above)
- replace 501-5814-xx with 501-6619-01 (or above) or with 501-6622-01 (or above)
or with 501-6624-01 (or above)
- replace 501-5815-xx with 501-6619-01 (or above) or with 501-6622-01 (or above)
or with 501-6624-01 (or above)
- replace 501-5816-xx with 501-6620-01 (or above) or with 501-6623-01 (or above)
- replace 501-5798-xx with 501-6620-01 (or above) or with 501-6623-01 (or above)
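The replacement list above can be expressed as a lookup table. This is
a minimal sketch, not an official tool: the names REPLACEMENTS and
replacement_options are hypothetical, and the listed part numbers are
the minimum revisions ("or above") given in this FCO.

```python
# Affected base part number -> minimum replacement parts ("or above"),
# copied from the Corrective Action list in this FCO.
_400MHZ = ("501-6619-01", "501-6622-01", "501-6624-01")
_464_466MHZ = ("501-6620-01", "501-6623-01")

REPLACEMENTS = {
    "501-6009": _400MHZ,
    "501-5814": _400MHZ,
    "501-5815": _400MHZ,
    "501-5816": _464_466MHZ,
    "501-5798": _464_466MHZ,
}


def replacement_options(part_number: str) -> tuple:
    """Return replacement parts for an affected module, or () if not listed."""
    # Drop the dash-level suffix ("-xx") so any revision matches.
    base = "-".join(part_number.strip().split("-")[:2])
    return REPLACEMENTS.get(base, ())


if __name__ == "__main__":
    print(replacement_options("501-5816-03"))
    print(replacement_options("501-6619-01"))
```

An empty result means only that the part number is not one of the five
affected parts listed in this FCO.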
Comments:
None
Billing Type:
Warranty: Sun will provide parts at no charge under Warranty
Service. On-Site Labor Rates are based on how the
system was initially installed.
Contract: Sun will provide parts at no charge. On-Site Labor Rates
are based on the type of service contract.
Non Contract: Sun will provide parts at no charge. Installation by
Sun is available based on the On-Site Labor Rates
defined in the Price List.
--------------------------------------------------------------------------
Implementation Footnote:
________________________
i) In case of Mandatory FCOs, Sun Services will attempt to contact
all known customers to recommend the part upgrade.
ii) For controlled proactive swap FCOs, Sun Services mission critical
support teams will initiate proactive swap efforts for their respective
accounts, as required.
iii) For Replace upon Failure FCOs, Sun Services partners will implement
the necessary corrective actions as and when they are required.
--------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
______________
* Access the top level URL of http://sdpsweb.Central/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
_______________________
* Access the SunSolve Online URL at http://sunsolve.Central/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
_______________
* Access the top level URL of https://spe.sun.com
--------------------------------------------------------------------------
General:
________
Send questions or comments to [email protected]
---------------------------------------------------------------------------