Document Audience: | INTERNAL |
Document ID: | A0253-1 |
Title: | A sub-population of DIMMs that shipped between 2001 and 2002 on the below platforms are showing significantly lower reliability than expected. |
Copyright Notice: | Copyright © 2007 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | Thu Aug 18 00:00:00 MDT 2005 |
__________________________________________________________________
*** Sun Confidential: Internal Use and Authorized VARs Only ***
__________________________________________________________________
This message including any attachments is confidential information
of Sun Microsystems, Inc. Disclosure, copying or distribution is
prohibited without permission of Sun. If you are not the intended
recipient, please reply to the sender and then delete this message.
__________________________________________________________________
FIELD CHANGE ORDER
(For Authorized Distribution by Sun Services)
FCO #: A0253-1
Status: active
Synopsis: A sub-population of DIMMs that shipped between 2001 and 2002 on the below platforms are showing significantly lower reliability than expected.
Date: Aug/18/2005
Top FIN/FCO Report: Yes
PRODUCT REFERENCE: Memory / DIMM
Product Category: Server / Desktop / System Component
Product Affected:
Mkt_ID Platform Model Description
------ -------- ----- -----------
- F12K All Sun Fire 12K
- F15K All Sun Fire 15K
- S8 All Sun Fire 3800
- S12/S12i All Sun Fire 4800/4810
- S24 All Sun Fire 6800
- A40 All Sun Fire V1280
- A35 All Sun Fire SF280R
- N28 All Netra 20
- A28 All Sun Blade 1000
- A29 All Sun Blade 2000
- A37 All Sun Fire V480
- A30 All Sun Fire V880
Parts Affected:
Part Number Description
----------- -----------
501-5401-xx ASSY,SDRAM,DIMM,256MB,18X8MX16
501-6175-xx ASSY,WS NGDIMM,256MB
501-5030-xx ASSY,SDRAM,DIMM,512MB
501-6174-xx ASSY,WS NGDIMM,512MB
501-5031-xx ASSY,SDRAM,DIMM,1GB
501-6109-xx ASSY,SDRAM,DIMM,1GB SF HES
501-6173-xx ASSY,WS NGDIMM,1GB
References:
Sun Alerts: 57757
BugIDs: 5034665
Escalations: 1-833911, 1-1482139
DPCOs/LEAPs: DPCO #483, GSAP #3037, GSAP #3111.B
URL: FAQ http://onestop/qco/pllDIMM/index_pllDIMM.shtml
Issue Description:
Sun has determined that a limited subset of DIMMs shipped in 2001 and
2002 (less than one percent of the installed base) may begin to show
reduced reliability after approximately two years of operation. This
reliability issue manifests itself in the form of UEs (Uncorrectable
Errors), sometimes with CEs (Correctable Errors), originating from the
DIMMs. The reliability of these DIMMs is normal for approximately the
first two years of use, after which they may start to degrade below the
expected level.
The root cause of this issue is related to a PLL device on the DIMMs.
This sub-population of DIMMs has PLL devices with date codes from 0049
through 0215 inclusive (year 2000 week 49 through year 2002 week 15).
This issue produces no unique symptom other than a higher than
expected rate of UEs and CEs. A DIMM lookup tool has been developed
to assist in identifying suspect DIMMs.
Impacted Platforms
------------------
It has been determined that the following platforms, if shipped
between Jan/01/2001 and Dec/31/2002, could be impacted:
F12K, F15K, 3800, 4800, 4810, 6800, V1280, V480, V880,
SF 280R, Netra 20, Sun Blade 1000 and 2000
Example System Messages
-----------------------
WARNING: [AFT1] Uncorrectable system bus (UE) Event detected by CPU0
Privileged Data Access at TL=0, errID 0x00000019.4558db40
AFSR 0x00100004.0000000c AFAR 0x00000040.e78fe750
Fault_PC 0x10033c24 Esynd 0x000c Slot B: J7900 J7901 J8001 J8000
[AFT1] errID 0x00000019.4558db40 Two Bits were in error
WARNING: [AFT1] EDU Event detected by CPU0 at TL=0, errID 0x00000019.4558db40
AFSR 0x00200028.0000000c AFAR 0x00000040.e78fe750
AMBIGUOUS
Fault_PC 0x10033c24 Esynd 0x000c AMBIGUOUS
[AFT1] errID 0x00000019.4558db40 Two Bits were in error
NOTICE: Scheduling clearing of error on page 0x00000040.e78fe000
WARNING: [AFT1] WDU Event detected by CPU0 at TL=0, errID 0x00000019.4558db40
AFSR 0x00200028.0000000c AFAR 0x00000040.e78fe750
AMBIGUOUS
Fault_PC 0x10033c24 Esynd 0x000c AMBIGUOUS
[AFT1] errID 0x00000019.4558db40 Two Bits were in error
NOTICE: Scheduling clearing of error on page 0x00000040.e78fe000
panic[cpu0]/thread=30002e32b20: [AFT1] errID 0x00000019.4558db40 UE EDU WDU Error(s)
Target Completion Date:
AMER: August 30, 2007
APAC: December 31, 2007
EMEA: January 31, 2008
Implementation:
---
| | MANDATORY (Fully Pro-Active)
---
---
| X | CONTROLLED PRO-ACTIVE (proactively implement on systems
--- under Gold or above contracts)
---
| | UPON FAILURE
---
Replacement Time Estimate:
less than 2 hours
(depending on platform type)
Special Considerations:
This FCO will have a time zone phased release based on material
readiness as follows:
Readiness Date
--------------
US/Canada READY
Ltn America READY
EMEA Sep/01/2005
APAC READY
ANZO READY
Japan READY
The above dates represent when each time zone has determined that
it will be materially ready to support this FCO. All dates are
estimates. Please check with your Logistics Representative for
more information with regard to material availability.
Note: To order DIMMs for remediation, follow your TZ FCO parts ordering process.
Due to limited parts availability, when a failure is identified in the
field, replace only the failed DIMM at that time. During the failure
replacement, run the lookup tool to identify any other affected DIMMs
in the system. Schedule replacement of the non-failed, affected DIMMs
for a later date and order the parts through the FCO process. For more
information or questions, contact your TZ or Area FCO Representative
as follows:
. EMEA: Contact the FCO country manager per the following list;
http://finfco.emea/ORGANIZATION/EMEA/country_fco.html
. North America: use the following alias to communicate proactive requirements;
[email protected]
. Latin America: use the following alias to communicate proactive requirements;
[email protected]
. APAC: Please follow the standard process to order parts. If in doubt, please
contact your local country FCO Representative or Tech Ops Mgr.
UEs and CEs in memory have many possible causes. In fix-on-failure
situations, if the system no longer experiences the errors after a
monitoring period, you can assume they were caused by the PLL issue.
If the system continues to experience errors, it likely has other
DIMMs with this or some other issue. If the lookup tool flags no
affected DIMMs, the Field Engineer should continue with the normal
debug process.
Corrective Action:
Hot Swappable: No
Replace according to the following part swap table, and per the
Details section below:
Replace With
----------- -----------
501-5401-xx 501-5401-03 (or above)
501-6175-xx 501-6175-02 (or above) or 501-5401-03 (or above)
501-5030-xx 501-5030-03 (or above)
501-6174-xx 501-6174-02 (or above)
501-5031-xx 501-6109-02 (or above)
501-6109-xx 501-6109-02 (or above)
501-6173-xx 501-6109-02 (or above)
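The swap table above can be expressed as a small lookup structure. The
Python sketch below is illustrative only and is not a Sun tool. It assumes
dash levels are two-digit numbers, reads "(or above)" as "numerically
greater than or equal to the listed dash level", and does not model the
RoHS substitutes described in the note that follows (for example, 501-7385
for 501-5030). The name is_acceptable_replacement is hypothetical.

```python
import re

# Swap table from this FCO: affected base part number ->
# list of (replacement base part number, minimum dash level).
SWAP_TABLE = {
    "501-5401": [("501-5401", 3)],
    "501-6175": [("501-6175", 2), ("501-5401", 3)],
    "501-5030": [("501-5030", 3)],
    "501-6174": [("501-6174", 2)],
    "501-5031": [("501-6109", 2)],
    "501-6109": [("501-6109", 2)],
    "501-6173": [("501-6109", 2)],
}

# Affected part: a concrete dash level ("501-5401-01") or the "-xx" wildcard.
AFFECTED_RE = re.compile(r"^(\d{3}-\d{4})-(\d{2}|xx)$")
# Replacement part: must carry a concrete two-digit dash level (assumption).
CANDIDATE_RE = re.compile(r"^(\d{3}-\d{4})-(\d{2})$")


def is_acceptable_replacement(affected: str, candidate: str) -> bool:
    """Return True if `candidate` satisfies the swap table for `affected`."""
    a = AFFECTED_RE.match(affected)
    c = CANDIDATE_RE.match(candidate)
    if a is None or c is None or a.group(1) not in SWAP_TABLE:
        return False
    cand_base, cand_dash = c.group(1), int(c.group(2))
    # "(or above)" is read here as "dash level >= the listed minimum".
    return any(cand_base == base and cand_dash >= min_dash
               for base, min_dash in SWAP_TABLE[a.group(1)])
```

For example, a 501-5401-04 is accepted for an affected 501-6175-xx, while a
501-5030-02 is rejected for a 501-5030-01 because it is below dash level -03.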
Note: If the above non-RoHS part numbers are not available, RoHS parts may
be used in their place. Please refer to the Sun System Handbook for
all RoHS part numbers. For example, the 501-7385 may be used in
place of the 501-5030.
More information about the RoHS Program in general can be viewed by going
to Field Information Notice (FIN) 102250.
You may also reference the Worldwide Sub-List which can be viewed using
your Sun employee number and LDAP password via the below URL;
http://roca.central/clrepair/lists/WWsublist.txt
Identifying Suspect DIMMs
-------------------------
To determine if a system has suspect DIMMs within the affected
date code range:
- First, use the PLL DIMM Lookup Tool. Run Explorer ('prtfru -x') on
the system to be tested, then check the output file for affected
DIMMs using the lookup tool at...
http://pts-appl-z1.holland:8080/PLLManualLookup/
and the Prtfru Scanner at...
http://pts-appl-z1.holland/pll.html
The command-line version of the PLL Lookup tool is available at...
http://pts-appl-z1.holland/pll_commandline.html
You can find additional information and an FAQ (item 9 on that page)
via the below URL...
http://onestop/qco/pllDIMM/index_pllDIMM.shtml
Note: Every effort has been made to ensure the lookup tool has a
complete list of the suspect DIMMs. However, due to issues
with traceability of DIMM serial numbers, the lookup tool also
flags a small set of DIMMs that may not have a PLL within the
suspect range. Therefore, the following step is necessary to
ensure that only DIMMs with the target PLLs are remediated.
- Second, after the system is shut down, the Sun Field Representative
should verify that every DIMM to be replaced has a PHILIPS PLL device
with a date code in the range 0049 through 0215 inclusive. DIMMs that
were identified by the PLL DIMM Lookup Tool but do not meet both
conditions should be installed back in the system. The Sun Field
Engineer needs to capture the part numbers and serial numbers of the
DIMMs that were flagged by the tool but did not have the suspect PLLs,
and e-mail this list to the feedback alias [email protected]
so the lookup tool can be updated.
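The two-step check above amounts to a simple triage: a tool-flagged DIMM is
replaced only if the visual check confirms a Philips PLL with a date code
from 0049 through 0215 inclusive; every other flagged DIMM is reinstalled
and reported to the feedback alias. The Python sketch below is a
hypothetical illustration of that rule, not an official tool. It assumes
the date code has already been read and normalized to a YYWW integer (for
example, 204 for "0204"); the FlaggedDimm record, the triage and
feedback_lines names, and the one-line-per-DIMM report layout are all
assumptions.

```python
from dataclasses import dataclass

# Suspect PLL date code range, YYWW 0049 through 0215 inclusive.
SUSPECT_FIRST, SUSPECT_LAST = 49, 215


@dataclass
class FlaggedDimm:
    """A DIMM flagged by the lookup tool plus the visual-check findings."""
    part_number: str      # e.g. "501-5030-02"
    serial_number: str
    philips_pll: bool     # PLL top line reads PCK953BD (Philips)?
    date_code_yyww: int   # PLL date code as YYWW, e.g. 204 for "0204"


def triage(flagged: list[FlaggedDimm]) -> tuple[list[FlaggedDimm], list[FlaggedDimm]]:
    """Split tool-flagged DIMMs into (replace, reinstall_and_report)."""
    replace: list[FlaggedDimm] = []
    reinstall: list[FlaggedDimm] = []
    for dimm in flagged:
        confirmed = (dimm.philips_pll
                     and SUSPECT_FIRST <= dimm.date_code_yyww <= SUSPECT_LAST)
        if confirmed:
            replace.append(dimm)
        else:
            reinstall.append(dimm)
    return replace, reinstall


def feedback_lines(reinstall: list[FlaggedDimm]) -> list[str]:
    """Part and serial number list for the feedback e-mail (layout is hypothetical)."""
    return [f"{d.part_number} {d.serial_number}" for d in reinstall]
```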
Note: Whenever a system architecture and the customer setup allow
it, Dynamic Reconfiguration (DR) can be used to remove a
board from a system to inspect and replace DIMMs flagged
by the tool.
Whenever possible, remediation efforts related to this FCO should
be coordinated with remediation efforts for FCO A0248. If you do
so, it is recommended that you replace faulty Uniboards and DIMMs
in one step.
Details:
-------
- For Gold and above accounts, recommend proactive check and
replacement of all affected DIMMs based on the date code range
of 0049 through 0215 inclusive.
Note: The PLL lookup tool should be used to identify if the system
has any of the suspect DIMMs.
Note: If the tool reports one or more DIMMs as suspect, it is
highly recommended to verify the date code of all the DIMMs
on that board and replace the DIMMs that are in the affected
date code range.
- For all others, upon failure of one DIMM, use the lookup tool to
check the system for affected DIMMs. The lookup tool will identify
the affected DIMMs. Use the instructions in the Special
Considerations section above to replace all affected DIMMs in ONLY
the system that has experienced the UE or CE event. The replacement
should extend beyond the failed DIMM, but not beyond the affected
system.
Note: What to do if a machine shows memory errors that could point to "PLL"
related problems, but the tools do not find any suspect DIMMs:
- Before replacing any other parts, first visually inspect the DIMMs
for suspect PLLs that the tool missed.
- Replace the DIMMs that fall within the suspect date code range (see
the instructions for visual inspection below).
- Visual inspection of DIMMs in systems and boards that do not
display problems is discouraged, to minimize handling of parts.
** Findaft/FindUE can be used to help determine whether possible
"PLL" related issues are ongoing, and can be found via the
below URLs;
http://systems-tsc/twiki/bin/view/Tools/ToolPageFindaft
http://systems-tsc/twiki/bin/view/Tools/ToolPageFindUE
** See the FAQ for possible reasons for the tools missing suspect
DIMMs, which can be found via the below (internal only) URL;
http://onestop/qco/pllDIMM/docs/PLL_FAQ.pdf
** Additional information may be found via the below (internal
only) URL;
http://onestop/qco/pllDIMM/index_pllDIMM.shtml
To determine which DIMMs are within the suspect date code range, you must
find the PLL chip on the DIMMs themselves. This is a small square chip on
the DIMM side facing the outside of the system, near the center of the
DIMM. See the first URL for an example with the chip marked in red, and
the other two URLs for close-ups of the chip itself.
Internal only links:
http://pts-platform/twiki/pub/Products/ProdIssuesSunFireV880/NG_Dimm_front_PLL.pdf
http://pts-platform/twiki/pub/Products/ProdIssuesSunFireV880/PLL_closer_look.jpg
http://pts-platform/twiki/pub/Products/ProdIssuesSunFireV880/PLL_up_close.jpg
http://onestop/qco/pllDIMM/docs/PLL_FAQ.pdf
http://onestop/qco/pllDIMM/index_pllDIMM.shtml
Partner viewable links:
http://sdpsweb.central/FIN_FCO/FCO/A0253-1/SPE/NG_Dimm_front_PLL.pdf
http://sdpsweb.central/FIN_FCO/FCO/A0253-1/SPE/PLL_closer_look.jpg
http://sdpsweb.central/FIN_FCO/FCO/A0253-1/SPE/PLL_up_close.jpg
The only suspect PLL chips are those manufactured by Philips. If a DIMM
has a PLL chip made by any other vendor (Motorola, Agere, etc.), it is
not suspect.
To determine who manufactured the PLL chip, look at the markings on the chip
itself. A Philips PLL chip will have three to four lines of information
similar to the below information.
PCK953BD
CA6936
TS204B
or
PCK953BD
CA6936
TS
0204B
The top line is the Philips P/N (always PCK953BD). Therefore the first
step in the visual check will be to look for the Philips P/N (PCK953BD)
on the chip markings.
The second line is the wafer lot number (for example, CA6936), and the
bottom line (the third or fourth line) contains a number that includes
the date code (for example, TS204B, or 0204B on the four-line version).
NOTE: The printed information is quite small and reading it may require
the use of a magnifying glass.
The date code is read from the bottom line of the Philips chip marking,
for example "TS204B" or "0204B". Ignore the letters and read only the
digits. Both versions above are read the same way: in "204", "2" is the
manufacturing year (2002) and "04" is the week (week 4); in "0204", "02"
is the year (2002) and "04" is the week (week 4). Both date code
formats, "YWW" and "YYWW", may be seen in the field.
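The reading rule above can be sketched in Python as follows. This is an
illustration only, not a Sun tool: the function name decode_date_code is
hypothetical, and the sketch assumes manufacturing years between 2000 and
2009, which covers the date code range of this FCO.

```python
import re


def decode_date_code(marking_line: str) -> tuple[int, int]:
    """Decode the bottom marking line of a Philips PLL chip to (year, week).

    Letters are ignored; the remaining digits are read as "YWW" (3 digits)
    or "YYWW" (4 digits). Assumes a year in 2000-2009.
    """
    digits = re.sub(r"\D", "", marking_line)  # "TS204B" -> "204"
    if len(digits) == 3:
        year, week = int(digits[0]), int(digits[1:])
    elif len(digits) == 4:
        year, week = int(digits[:2]), int(digits[2:])
    else:
        raise ValueError(f"unexpected date code line: {marking_line!r}")
    if not 1 <= week <= 53:
        raise ValueError(f"week out of range in {marking_line!r}")
    return 2000 + year, week
```

Both "TS204B" and "0204B" decode to year 2002, week 4.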
Any Philips PLL called out by the lookup tool that falls within the
manufacturing date code range of year 2000 week 49 through year 2002
week 15 inclusive should be replaced as suspect. The example chips
above must be removed from the system and replaced because they fall
within the year 2000 week 49 ("049" or "0049") to year 2002 week 15
("215" or "0215") date code range. A few more examples follow.
PCK953BD
CA6936
TS209B
or
PCK953BD
CA6936
TS
0209B
This chip should also be replaced as it falls within the year 2002
"2" or "02" week 9 "09" range.
PCK953BD
CA6936
TS322B
or
PCK953BD
CA6936
TS
0322B
This chip is not within the suspect range as it falls in the year
2003 "3" or "03" week 22 "22" range.
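The complete visual check (Philips P/N PCK953BD on the top line, date code
on the bottom line, range 0049 through 0215 inclusive) can be sketched as
below. The Python helper is_suspect_pll is hypothetical and is not part of
the PLL Lookup tool. It assumes the markings are transcribed top to bottom
as separate strings, and that a three-digit "YWW" code belongs to a year
between 2000 and 2009, so it can be padded to "YYWW". Because week numbers
never exceed 53, comparing YYWW values as integers preserves chronological
order.

```python
import re

PHILIPS_PN = "PCK953BD"
SUSPECT_FIRST, SUSPECT_LAST = 49, 215  # YYWW 0049 through 0215 inclusive


def is_suspect_pll(marking_lines: list[str]) -> bool:
    """Return True if the chip markings indicate a suspect Philips PLL.

    marking_lines: the lines read off the chip, top to bottom, e.g.
    ["PCK953BD", "CA6936", "TS204B"] or ["PCK953BD", "CA6936", "TS", "0204B"].
    """
    lines = [line.strip().upper() for line in marking_lines if line.strip()]
    if not lines or lines[0] != PHILIPS_PN:
        return False  # not a Philips PCK953BD PLL, so not suspect
    digits = re.sub(r"\D", "", lines[-1])  # date code is on the bottom line
    if len(digits) == 3:
        digits = "0" + digits  # "YWW" -> "YYWW", assuming a 200x year
    if len(digits) != 4:
        raise ValueError(f"cannot read date code from {lines[-1]!r}")
    return SUSPECT_FIRST <= int(digits) <= SUSPECT_LAST
```

Applied to the examples above, the TS204B/0204B and TS209B/0209B chips are
reported as suspect and the TS322B/0322B chips are not.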
Comments:
Note: Some DIMMs are manufactured by Elpida and/or Hitachi. These DIMMs have
a metal cover on them that covers the PLL device. These DIMMs are not
impacted by this FCO.
Important! Put an "x" in the Purge/FCO box and write "FCO A0253-1" on all
Defective Material Tags (DMT) to ensure proper return processing,
and always quote the FCO number in the Radiance case entry. When a
DIMM is proactively replaced under this FCO (that is, it did not fail
prior to replacement), also clearly write "proactive replacement"
on the tag.
When mass remediating multiple DIMMs proactively, you may package
all DIMMs into one shipping box, and only mark the box ONCE with
one DMT labeled as noted above.
Send email to [email protected] for questions or comments about
this Field Change Order.
CHANGE HISTORY:
Aug/18/2005 - change all affected part number dash levels to -xx.
- added GSAP 3111.B to References section.
Aug/22/2005 - moved Identifying Suspect DIMMs section from Special Considerations
to Corrective Action section.
- Added TZ contact information in the Special Considerations section.
Aug/25/2005 - republished to distribution alias with above changes and date
change.
Oct/25/2005 - corrected outdated link to EMEA Contact information in SPECIAL
CONSIDERATIONS section.
Dec/15/2005 - added sentence to the end of Important! section under COMMENTS
requesting field put "proactive replacement" on DMT when DIMM
is proactively replaced and wasn't a failed unit.
Jan/25/2006 - added partner viewable links to DIMM pictures in the Corrective
Action section.
Apr/07/2006 - added information in Corrective Action section that RoHS parts
could be used when non-RoHS is not available.
Nov/15/2006 - additional instructions, notes and URLs added to the "Details"
under "Identifying Suspect DIMMs" in the Corrective Action
section.
May/03/2007 - Updated Target Completion Dates by Timezone.
________________________________________________________________________
NOTE: FCO Tracking Instructions for Radiance/SPWeb:
--------------------------------------------------
If a Radiance case involves the application of an FCO to solve a customer
issue, please complete the following steps in Radiance/SPWeb prior to
closing the case:
o Select "Field Change Order" in the REFERENCE TYPE field.
o Enter FCO ID number in the REFERENCE ID field.
For example; A0222-1.
If possible, include additional details in the REFERENCE SUMMARY field
(e.g., upgrade complete, customer declined, etc.)
________________________________________________________________________
Implementation Notes
--------------------
In case of "Mandatory" FCOs, Sun Services will attempt to contact
all known customers to recommend proactive implementation.
For "Controlled Proactive" FCOs, Sun Services mission critical
support teams will initiate proactive implementation efforts for
their respective accounts, as required.
For "Upon Failure" FCOs, Sun Services and partners will implement
the necessary corrective actions as the need arises.
The CIC process must be used for proactive hardware replacement
requests when an FCO is classified as "Upon Failure".
Billing Information
-------------------
Warranty: Sun will provide parts at no charge under Warranty
Service. On-Site Labor Rates are based on specified
Warranty deliverables for the affected product.
Contract: Sun will provide parts at no charge. On-Site Labor Rates
are based on the type of service contract.
Non Contract: Sun will provide parts at no charge. Installation by
Sun is available based on the On-Site Labor Rates
defined in the Price List.
________________________________________________________________________
All FCO documents are accessible via Internal SunSolve. Type "sunsolve"
in a browser and follow the prompts to Search Collections.
For questions on this document, please email:
[email protected]
For more information on the FCO Program, go to:
http://tns.central/fco
To access the Service Partner Exchange, use:
https://spe.sun.com