Document Audience: | INTERNAL |
Document ID: | A0258-1 |
Title: | Mitsubishi 256MB DIMMs (Sun p/n 501-5658) showing significantly lower than expected reliability. |
Copyright Notice: | Copyright © 2007 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | Fri Feb 10 00:00:00 MST 2006 |
__________________________________________________________________
*** Sun Confidential: Internal Use and Authorized VARs Only ***
__________________________________________________________________
This message including any attachments is confidential information
of Sun Microsystems, Inc. Disclosure, copying or distribution is
prohibited without permission of Sun. If you are not the intended
recipient, please reply to the sender and then delete this message
__________________________________________________________________
FIELD CHANGE ORDER
(For Authorized Distribution by Sun Services)
FCO #: A0258-1
Status: active
Synopsis: Mitsubishi 256MB DIMMs (Sun p/n 501-5658) showing significantly lower than expected reliability.
Date: Feb/10/2006
Top FIN/FCO Report: No
PRODUCT REFERENCE: Memory / DIMM
Product Category: Server / System component
Product Affected:
Platform Description
-------- -----------
E3000 Ultra Enterprise 3000
E3500 Sun Enterprise 3500
E4000 Ultra Enterprise 4000
E4500 Sun Enterprise 4500
E5000 Ultra Enterprise 5000
E5500 Sun Enterprise 5500
E6000 Ultra Enterprise 6000
E6500 Sun Enterprise 6500
X-Options Affected
--------- -------
Mkt_ID Platform Model Description
------ -------- ----- -----------
Parts Affected:
Part Number Description
----------- -----------
501-5658-xx 256MB DIMM, DRAM, 32MX72, 60NS
(Mitsubishi only)
References:
Escalations: 1-9854659, 1-9043236
DPCO: 486
GSAP: 3149
Issue Description:
Mitsubishi 256MB DIMMs (Sun p/n 501-5658) are much less reliable than
other vendors' DIMMs under the same Sun part number. This reliability
issue manifests itself in the form of UEs (Uncorrectable Errors), and
sometimes CEs (Correctable Errors), originating from the DIMMs.
MTBFR and MCF analysis indicate higher cumulative failure rate trends
over time. No specific ship vintages are performing significantly worse
than others.
The root cause of this issue is related to three specific buffer chips on
the DIMMs.
This issue produces no unique symptom other than a higher than expected
number of UEs and CEs, which can result in system panics.
Sample error messages include, but are not restricted to:
Example 1
---------
May 2 17:05:01 E4000 SUNW,UltraSPARC-II: [ID 244460 kern.warning]
WARNING: [AFT1] Uncorrectable Memory Error on CPU14
Instruction access at TL=0, errID 0x0018b3b9.ecc18f2b
Example 2
---------
May 11 18:50:56 E5000 SUNW,UltraSPARC-II: [ID 757754 kern.warning]
WARNING: [AFT1] Uncorrectable Memory Error on CPU9 Instruction access
at TL=0, errID 0x00000193.36bfd291
Example 3
---------
WARNING: [AFT1] Uncorrectable Memory Error on CPU4 Data access at TL=0, errID 0x00000065.0025e610
AFSR 0x00000000.80200000 AFAR 0x00000001.e6dda800
AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x1014f6c8
UDBH 0x02d3 UDBH.ESYND 0xd3 UDBL 0x0000 UDBL.ESYND 0x00
UDBH Syndrome 0xd3 Memory Module Board 0 J3100 J3200 J3300 J3400 J3500 J3600 J3700 J3800
[AFT2] errID 0x00000065.0025e610 PA=0x00000001.e6dda800
E$tag 0x00000000.1ec03cdb E$State: Exclusive E$parity 0x0f
[AFT2] E$Data (0x00): 0x0006dc00.00000000
[AFT2] E$Data (0x08): 0x000700b0.00000030
[AFT2] E$Data (0x10): 0x0004efb4.00000001
[AFT2] E$Data (0x18): 0x00000000.00000000
[AFT2] E$Data (0x20): 0x000700b0.0006dab0
[AFT2] E$Data (0x28): 0x00000030.00000000
[AFT2] E$Data (0x30): 0x0006df24.0006df4c
[AFT2] E$Data (0x38): 0xffbee888.00018cf8
panic[cpu4]/thread=30002739100: [AFT1] errID 0x00000065.0025e610 UE Error(s)
See previous message(s) for details
Example 4
---------
During POST
2,0> Data Access Error from address 00000000.03fffff0
2,0> tl tt tstate tpc tnpc
2,0> 01 32 00000000.15000500 000001ff.f0001bc0 000001ff.f0001bc0
2,0> AFSR 00000000.88000000 AFAR 00000000.03fffff0
2,0> (PRIV) Privileged Code
2,0> (TO) Time Out Error
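The [AFT1] messages in Examples 1 to 3 follow a recognizable pattern, so
a saved messages file can be scanned for them. The following is a minimal
sketch only, not a Sun-supplied tool: the function name find_ue_events and
the 200-character search window for the errID are assumptions made for
illustration, and the patterns are derived solely from the sample
messages above.

```python
import re

# Patterns derived from the sample messages in Examples 1-3 (assumption:
# other releases may word these messages differently).
UE_RE = re.compile(r"\[AFT1\] Uncorrectable Memory Error on CPU(\d+)")
ERRID_RE = re.compile(r"errID (0x[0-9a-fA-F]{8}\.[0-9a-fA-F]{8})")


def find_ue_events(log_text):
    """Return a list of (cpu, errid) tuples; errid is None if not found."""
    events = []
    for match in UE_RE.finditer(log_text):
        # The errID may be on the same line or the next one, so search a
        # short window (hypothetical size: 200 characters) after the match.
        errid = ERRID_RE.search(log_text, match.end(), match.end() + 200)
        events.append((int(match.group(1)),
                       errid.group(1) if errid else None))
    return events


sample = (
    "May  2 17:05:01 E4000 SUNW,UltraSPARC-II: [ID 244460 kern.warning]\n"
    "WARNING: [AFT1] Uncorrectable Memory Error on CPU14\n"
    "Instruction access at TL=0, errID 0x0018b3b9.ecc18f2b\n"
)
print(find_ue_events(sample))  # [(14, '0x0018b3b9.ecc18f2b')]
```

A match only shows that a UE was logged. It does not show that a
Mitsubishi DIMM caused it; that still requires visual inspection of the
DIMMs.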
A corrective action, replacing the three affected buffer chips on the
DIMMs, was made available via DPCO #486 on January 7, 2005.
IMPLEMENTATION TYPE:
---
| | MANDATORY (Fully Pro-Active)
---
---
| | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| X | UPON FAILURE
---
IMPLEMENTATION TARGET COMPLETION DATE: September 30, 2007
Replacement Time Estimate:
less than 2 hours
(depending on platform)
Special Considerations:
The aim of this FCO is to create awareness of this issue in the field
and to provide the repair vendors with the instructions to repair the
DIMMs when they are cycled through the repair loop. A severe shortage
of spare DIMMs in logistics makes pro-active remediation impossible,
and mass remediations are likely to hamper normal FRU remediation.
Therefore, an exception process has been set up for cases where more
than 8 DIMMs are required to remediate a single platform.
In such cases you are NOT allowed to order the parts without first
receiving approval.
Ultimately, it is the responsibility of the Timezone Services
Vice Presidents (VPs) to approve any exception to the FCO. Whenever
the account team feels an exception is warranted, they should...
. For EMEA and APAC: contact the local services escalation manager.
This escalation manager will check material availability with local
logistics, and then decide whether or not to bring forward a request
for approval by the time zone Services Vice President.
. For US and LACR: send the following information to the alias
[email protected]:
1) Customer Name
2) Case History
3) System S/N(s)
4) Business Justification
The criteria for approving exceptions will include but not be limited
to the following:
1. Demonstrated failure rate of greater than 1.5% for the last 6 months
2. Key customer + business needs
3. Potential loss of business
Upon receiving approval to remediate a large customer, the account
team is to then work with their local Logistics Analysts to plan for
and support the quantity needed. This will ensure that the Timezones
have an opportunity to proactively plan for these remediations without
affecting Level of Availability (LOA).
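The two numeric thresholds in this section (more than 8 DIMMs per
platform, and a failure rate greater than 1.5% over the last 6 months)
can be expressed as simple checks. This is a sketch under stated
assumptions: the FCO does not define how the failure rate is calculated,
so the ratio of failed DIMMs to installed DIMMs used here is an
assumption, and all function names are hypothetical.

```python
MAX_DIMMS_WITHOUT_APPROVAL = 8   # from the FCO: more than 8 needs approval
FAILURE_RATE_THRESHOLD = 0.015   # from the FCO: greater than 1.5%


def needs_exception_approval(dimms_required):
    """True if remediating one platform needs prior approval."""
    return dimms_required > MAX_DIMMS_WITHOUT_APPROVAL


def failure_rate(failed_dimms, installed_dimms):
    """Assumed definition: DIMMs failed in the last 6 months divided by
    DIMMs installed. The FCO itself does not specify the formula."""
    if installed_dimms <= 0:
        raise ValueError("installed_dimms must be positive")
    return failed_dimms / installed_dimms


def meets_failure_rate_criterion(failed_dimms, installed_dimms):
    """True if the assumed failure rate exceeds the 1.5% criterion."""
    return failure_rate(failed_dimms, installed_dimms) > FAILURE_RATE_THRESHOLD


print(needs_exception_approval(9))           # True
print(meets_failure_rate_criterion(2, 64))   # True
```

Meeting the failure rate criterion does not by itself grant an
exception. Approval still rests with the time zone Services VP, and the
other criteria listed above also apply.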
Affected System Identification:
------------------------------
The only way to identify if a system is experiencing this issue is
through visual inspection of the DIMMs. Each DIMM has three buffer
chips, two on the front and one on the back side.
A picture of a buffer chip can be found via the below URL...
http://webhome.holland/remcol/FCO/mitsubishi/buffer_chip.jpg
Pictures of Mitsubishi DIMMs can be found via the below URL...
http://webhome.holland/remcol/FCO/mitsubishi/backside.jpg
http://webhome.holland/remcol/FCO/mitsubishi/frontside.jpg
256MB Mitsubishi DIMMs with the following markings can be considered
suspect...
Device Top Marking P01XX to P06XX
                   P07XX to P15XX
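Once a device top marking has been read off a buffer chip, the range
check can be sketched as below. This is illustrative only and rests on
an assumption about the marking format: that it is the letter P, a
two-digit number from 01 to 15, and two further characters. The function
name is_suspect_marking is hypothetical. Confirm against the pictures
referenced above before relying on it.

```python
import re

# Assumed marking shape: "P", two digits, then two more characters.
MARKING_RE = re.compile(r"P(\d{2})[0-9A-Za-z]{2}", re.IGNORECASE)


def is_suspect_marking(marking):
    """True if the marking falls in the P01XX to P15XX ranges."""
    match = MARKING_RE.fullmatch(marking.strip())
    if match is None:
        return False
    # The two listed ranges (P01-P06 and P07-P15) are contiguous.
    return 1 <= int(match.group(1)) <= 15


print(is_suspect_marking("P0312"))  # True
print(is_suspect_marking("P1600"))  # False
```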
Corrective Action:
Hot Swappable: No
Upon Failure replace 501-5658-xx with 501-5658-xx.
NOTE: Do NOT use the 501-6901 as a replacement part for this FCO,
even though it is an authorized substitute and can be used
to replace the 501-5658 in non-FCO related remedial service.
If using Mitsubishi replacement DIMMs for this remediation, these DIMMs
must have been reworked to remove the suspect components. This can be
determined by finding the DPCO 486 or GSAP 3149 label on the DIMMs. If
using non-Mitsubishi DIMMs, the label is not relevant.
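The replacement rules above can be summarized as a single check. This
is a sketch, not an official validation procedure: the function name,
the argument names, and the representation of labels as plain strings
are assumptions made for illustration.

```python
# Labels that indicate a reworked Mitsubishi DIMM, per this FCO.
REWORK_LABELS = {"DPCO 486", "GSAP 3149"}


def is_acceptable_replacement(part_number, vendor, labels=()):
    """Apply the FCO replacement rules to a candidate DIMM.

    part_number: e.g. "501-5658-02"
    vendor:      DIMM manufacturer name
    labels:      labels found on the DIMM (assumed to be plain strings)
    """
    # Only 501-5658-xx is allowed; 501-6901 must NOT be used for this FCO.
    if not part_number.startswith("501-5658"):
        return False
    # For non-Mitsubishi DIMMs the rework label is not relevant.
    if vendor.strip().lower() != "mitsubishi":
        return True
    # Mitsubishi DIMMs must carry a DPCO 486 or GSAP 3149 label.
    return any(label in REWORK_LABELS for label in labels)


print(is_acceptable_replacement("501-5658-02", "Mitsubishi", ["DPCO 486"]))  # True
print(is_acceptable_replacement("501-6901-01", "Other"))                     # False
```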
Example DPCO label photos can be found at:
http://webhome.holland/remcol/FCO/DPCO486/
Due to limited availability of these parts, only failing DIMMs
are to be replaced. Other non-failing systems or DIMMs should
*not* be remediated without explicit approval through the
exception process noted above in the "Special Considerations"
section.
It is understood that in case of UE failures the specific failing
DIMM can be extremely difficult to identify. In such cases, and
only if the specific failing DIMM can NOT be identified, the full
failing bank of DIMMs can be replaced.
It is also recommended that the customer be informed of system
upgrade options instead of the remediation due to the EOL status
of these DIMMs. Cost-effective upgrade options are available to
migrate from UltraSPARC II based servers to UltraSPARC IV, with
trade-in discounts of up to 20% as part of the standard UAP trade-in
program. Please contact your local sales person or go to the
following web site for more detail on such upgrades:
http://www.sun.com/ibb/promos/USIVpromo.html?cid=119
Send email to [email protected] for questions or comments about
this Field Change Order.
Comments:
Important! Write "FCO A0258-1" on all Defective Material Tags (DMT)
to ensure proper return processing and always quote the
FCO number in the Radiance case entry.
CHANGE HISTORY:
Sep/23/05 - Changed Implementation Type from Controlled Proactive
to Upon Failure (editing error). Document should have
been released as Upon Failure.
Oct/19/05 - Changed the exception process in Special Considerations
section.
Feb/10/06 - Added a Note in the Corrective Action section not to use
the 501-6901 as a replacement part in the implementation
of this FCO.
Feb/10/06 - Added mention of the "GSAP 3149 label" as a means to
determine if the DIMM has been repaired to the Corrective
Action section.
________________________________________________________________________
NOTE: FCO Tracking Instructions for Radiance/SPWeb:
--------------------------------------------------
If a Radiance case involves the application of an FCO to solve a customer
issue, please complete the following steps in Radiance/SPWeb prior to
closing the case:
o Select "Field Change Order" in the REFERENCE TYPE field.
o Enter FCO ID number in the REFERENCE ID field.
For example: A0222-1.
If possible, include additional details in the REFERENCE SUMMARY field
(e.g. upgrade complete, customer declined, etc.)
________________________________________________________________________
Implementation Notes
--------------------
In case of "Mandatory" FCOs, Sun Services will attempt to contact
all known customers to recommend proactive implementation.
For "Controlled Proactive" FCOs, Sun Services mission critical
support teams will initiate proactive implementation efforts for
their respective accounts, as required.
For "Upon Failure" FCOs, Sun Services and partners will implement
the necessary corrective actions as the need arises.
The CIC process must be used for proactive hardware replacement
requests when an FCO is classified as "Upon Failure".
Billing Information
-------------------
Warranty: Sun will provide parts at no charge under Warranty
Service. On-Site Labor Rates are based on specified
Warranty deliverables for the affected product.
Contract: Sun will provide parts at no charge. On-Site Labor Rates
are based on the type of service contract.
Non Contract: Sun will provide parts at no charge. Installation by
Sun is available based on the On-Site Labor Rates
defined in the Price List.
________________________________________________________________________
All FCO documents are accessible via Internal SunSolve. Type "sunsolve"
in a browser and follow the prompts to Search Collections.
For questions on this document, please email:
[email protected]
The FCO homepage is available at:
http://tns.central/FCO/
For more information on how to submit an FCO, go to:
http://pronto.central/fco.html
To access the Service Partner Exchange, use:
https://spe.sun.com
________________________________________________________________________