Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Solution Type FAB (standard) Sure Solution 1019389.1 : Hardware: A limited number of AMD Opteron "Rev F" CPUs in certain systems can cause instability in some specific configurations.
PreviouslyPublishedAs 239146

Bug Id <SUNBUG: 6587404>, <SUNBUG: 6466941>, <SUNBUG: 6651341>

Product
Sun Ultra 40 M2 Workstation
Sun Fire X4100 M2 Server
Sun Fire X4200 M2 Server
Sun Netra X4200 M2 Server
Sun Fire X4600 M2 Server
Sun Blade X6220 Server Module
Sun Blade X8420 Server Module
Sun Blade X8440 Server Module

Date of Resolved Release 20-Jun-2008

A limited number of AMD Opteron CPUs can cause instability in system operation (see details below).

Impact
A limited number of AMD Opteron "Rev F" Model 2xxx and 8xxx series (e.g. 2210, 8222, etc.) CPUs manufactured prior to March 2008, under very specific conditions, can cause instability in system operation. Instability in this case refers to the following symptoms:
AMD CPUs contain a PowerNow feature which, when enabled and activated, may cause some systems to manifest the above symptoms. Disabling the PowerNow feature has proven to stabilize systems affected by this issue.

IMPORTANT NOTE: The symptoms above can also be caused by problems other than the one addressed by this FAB. For example, DIMM problems can generate the same or similar symptoms. The steps outlined in the Corrective Action section MUST be followed to determine whether the system instability is caused by the issue addressed by this FAB.

Contributing Factors
Sun system designs took advantage of certain memory/CPU performance tuning features in the CPU architecture. However, the AMD manufacturing process did not adequately screen for this issue in all tuning configurations until March 2008.

Symptoms
Systems could experience any or all of the following symptoms if PowerNow is enabled. (Disabling the PowerNow feature has proven to stabilize systems affected by this issue.)
Root Cause
PowerNow is a power-saving technology within AMD processors. The CPU speed and Vcore are decreased while the system is idle or under low load, to save power and to reduce heat and noise. PowerNow must be enabled and must become activated to encounter this issue, but PowerNow is not the source of the issue.

The AMD Opteron processor makes use of DLLs (Delay Locked Loops) to control the precise timing of memory address, command and control signals relative to memory clocks. This guarantees optimal timing margin across processor, voltage, timing and frequency variations. Sun is one of very few companies using these settings to optimize memory performance for its customers. It was discovered that AMD manufacturing changes, made to optimize processor yields, encroached on those DLL settings, reducing the margin within which the system could be guaranteed to function optimally. In certain memory configurations with a unique combination of selected timing characteristics, some systems may experience instability when PowerNow is enabled and becomes activated. Not all systems will experience this problem.

In March 2008, AMD implemented a factory screen to recover the margins expected by Sun's DLL tuning implementations. This factory screen ultimately prevents the issue.

Note: Because of the timing margin aspects of this issue, and the interaction with motherboards and memory, not all CPUs manufactured prior to March 2008 will manifest this issue. CPUs manufactured prior to March 2008 are only more susceptible to encountering the issue; the projected rate of encounter within the entire field population is actually quite low. Stable systems tend to remain stable. Disabling the PowerNow feature has proven to stabilize systems with the above symptoms, regardless of when the CPU(s) were manufactured or whether they have been factory screened.
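The relationship between BIOS level, PowerNow, and the applicability of this FAB, as laid out in the Corrective Action steps, can be read as a small decision procedure. The following Python sketch is illustrative only: the names `Action` and `next_action` and the way the inputs are modeled are assumptions made for this example, not part of any Sun or AMD tooling. It also assumes the system is already showing the instability symptoms with PowerNow enabled.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical labels for the outcomes of resolution steps 1 to 3."""
    UPDATE_BIOS = "step 1: update the system BIOS"
    DISABLE_POWERNOW = "step 2: disable PowerNow in the system BIOS"
    LEAVE_POWERNOW_DISABLED = "step 2.1: leave PowerNow disabled, stop"
    FAB_NOT_APPLICABLE = "step 2.2: this FAB does not apply, stop"
    REQUEST_REPLACEMENT_CPUS = "step 3: request advance replacement CPUs"


def next_action(
    bios_is_latest: bool,
    stable_with_powernow_disabled: Optional[bool],
    customer_accepts_powernow_disabled: bool,
) -> Action:
    """Pick the next step for a system that is unstable with PowerNow enabled.

    stable_with_powernow_disabled is None when the system has not yet been
    observed with PowerNow disabled.
    """
    if not bios_is_latest:
        # The latest BIOS is a precondition for any further remediation.
        return Action.UPDATE_BIOS
    if stable_with_powernow_disabled is None:
        # Latest BIOS, still unstable, PowerNow not yet ruled in or out.
        return Action.DISABLE_POWERNOW
    if not stable_with_powernow_disabled:
        # Still unstable without PowerNow: some other cause (for example DIMMs).
        return Action.FAB_NOT_APPLICABLE
    if customer_accepts_powernow_disabled:
        return Action.LEAVE_POWERNOW_DISABLED
    return Action.REQUEST_REPLACEMENT_CPUS
```

CPU replacement is reached only on the last branch, which matches the FAB's stated preference for a BIOS update plus disabling PowerNow over swapping CPUs.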
Corrective Action
Replacement Time Estimate: 10 minutes (per CPU)
Hot Swappable: No

Resolution:

1. First, if the BIOS is not up to date, update the system BIOS to the latest version available from the sun.com website. There have been many recent modifications to memory timing on various platforms, and a BIOS update may provide the only correction that the system needs. Systems must be running the latest BIOS version to be considered eligible for further remediation under this FAB. If the customer has experienced service interruption as described in this FAB, and the BIOS needs updating, Sun's recommendation is to update the BIOS and also disable PowerNow per resolution item #2, below.

2. If the system is already running the latest BIOS and the instability issue(s) are still occurring, disable PowerNow in the system BIOS. Refer to the applicable product documentation for procedures describing how to do this on your specific platform. Refer to the COMMENTS section of this FAB for links to Product Documentation.

2.1. If disabling PowerNow stabilizes the system, suggest that the customer leave PowerNow disabled as a long-term solution. Disabling PowerNow may be preferable to the inconvenience of swapping CPUs. Sun's recommendation is to update the system BIOS and disable PowerNow to avoid this issue wherever possible. If the customer agrees to leave PowerNow disabled, STOP; the rest of the instructions in this FAB are not necessary.

2.2. If the system remains unstable after updating the BIOS and disabling PowerNow, STOP; this FAB does not apply to your situation.

3. If updating the BIOS and disabling PowerNow resulted in system stability but the customer refuses to disable PowerNow as a long-term solution, follow the instructions below to replace the CPUs in the problematic system.

3.1. Verify that the system is programmed with the most recent BIOS revision.

3.2.
Verify that the system exhibits any or all of the symptom(s) with PowerNow enabled, and verify that the system DID NOT exhibit the symptom(s) with PowerNow disabled. (Capture system event/error logs as evidence.)

3.3. Request "Advance Replacement" CPUs per the procedure outlined below.

Important: Do not remove CPUs or heatsinks from the system until you have received replacement CPUs.

Note: It may not always be possible to identify a specific problematic CPU on multi-CPU platforms. If that is the case, replacements may be ordered for the total number of CPUs installed on the platform.

3.3.1. Complete the applicable 'CPU Request' and 'CPU Tracking Data' portions of the DLL-CPU-tracker.ods template, which is available via the link below:
http://sdpsweb.central/FIN_FCO/FAB/239146/SPE/DLL-CPU-tracker.ods

3.3.2. Create an email with the following information:
Address the email to [email protected]
Enter Subject line: 'DLL RMA Request for [customer name, case ID#]'
Enter in email body:
- Customer Company Name
- Customer Contact Name
- Customer Location
- Sun Contact Name
- Sun Contact Phone
- Sun Contact email
- Complete Ship-to Address
- Complete OPN Part Number & Quantity of Affected CPUs (*)
- Complete OPN Part Number & Quantity of CPUs Requested (*)
(*) Note: In some cases, the replacement CPU will differ from the original CPU. Refer to the 'Replacement Matrix' tab in the DLL CPU Tracker to determine which replacement CPU to order.

3.3.3. Attach the partially completed DLL CPU Tracker document.

3.3.4. Attach supporting event/error logs.

3.3.5. Send the email.

3.3.6. Upon validation of your event/error feedback, AMD will ship:
- Replacement CPUs that have had the screen applied
- thermal grease
- alcohol wipes
- Return shipping instructions & documentation via email response, including
-- RMA number
-- An updated DLL CPU Tracker template

3.3.7. Once new CPUs arrive on-site:

3.3.8.
Read and adhere to the AMD packaging and handling guidelines, AMD-DLL-CPU-HandlingPackagingGuidelines.pdf, available via the link below:
http://sdpsweb.central/FIN_FCO/FAB/239146/SPE/AMD-DLL-CPU-HandlingPackagingGuidelines.pdf

3.3.9. Identify 'potentially affected' versus 'not-affected' CPUs:
It is the FE's responsibility to ensure that only "potentially affected" CPUs are replaced in accordance with this FAB. CPUs that have already been factory screened will bear one or both of the following markings on the cover of the CPU (refer to the CPU Reference 'Photos' tab in the DLL CPU Tracker):
- If the CPU bears a "-" etch mark following the OPN (on the first line of alphanumeric text on the CPU cover), the CPU has been screened and is not affected by this issue. Do not replace CPUs that bear this mark.
- If the CPU bears a "P" as the first character in the second line of alphanumeric text on the CPU cover, the CPU has been screened and is not affected by this issue. Do not replace CPUs that bear this mark.
If the CPU does not bear either of the markings described above, it has not been factory screened and it is potentially affected by this issue. This CPU may be replaced in a problematic system.

3.3.10. Remove suspect CPUs from the system (refer to the applicable product documentation for instructions).

3.3.11. Use the alcohol wipes provided by AMD to thoroughly clean the used thermal grease from the bottom of the heatsink and the lid of the CPU. Each thermal grease syringe provided by AMD has sufficient grease for one (1) CPU.

3.3.12. Capture all CPU and slot information in the 'CPU Tracking Data' tab in the DLL CPU Tracker.

3.3.13. Apply thermal grease, re-attach the heatsink to the replacement CPU and reinstall the CPU in the system.

3.3.14. Return the suspect CPU, any other replaced CPUs taken from other slots and any unused CPUs to AMD per RMA instructions.
Be sure to follow the packaging, handling and labeling guidelines in the AMD CPU Handling & Packaging document. To ship suspect CPUs back to AMD, label the package with the RMA number provided by AMD and ship to:

AMD
5900 East Ben White Blvd, M/S 574
Austin, TX 78741
DLL RMA#: _______ (provided earlier by AMD)
QTY: ____
Attention: Ed Zahradnik

3.3.15. Send an email containing the AWB number and the updated DLL CPU Tracker file to [email protected]. It is recommended to reply to the previous email thread for continuity and ease of tracking.

Note: All original suspect CPUs and unused replacements MUST be returned on a 1 for 1 basis. Failure to return CPUs will result in the appropriate Sun Field Service organization being billed for the advance replacement CPUs.

Comments & Special Considerations
Product Documentation Links:
Sun Blade X6220 Server Module Documentation
Sun Blade 8000 Modular System Documentation
Sun Fire X4100 M2 Server Documentation
Sun Fire X4200 M2 Server Documentation
Sun Fire X4600 M2 Server Documentation
Netra X4200 M2 Server Documentation
Sun Ultra 40 M2 Workstation Documentation

This issue was evaluated as, and determined not to meet the criteria for, an FCO because of the low potential of exposure, which involves very specific configurations, and because all CPUs are to be acquired through AMD.

In some cases, the customer may be a Sun Field Engineer responsible for servicing the customer account who is handling the receipt/return of CPUs. AMD will pay the freight for each movement to and from the customer site. There will be no charge to customers for any on-site activities or materials used related to this Field Action Bulletin.

Replacement CPUs will not be stored at Sun RSLs. Instead, AMD will provide the logistics support and ship CPUs directly to/from customer sites. This FAB will remain effective and AMD will provide CPUs in support of this activity until June 30, 2009.
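The marking rules in step 3.3.9 reduce to a simple check on the two lines of text on the CPU cover. The Python sketch below is a hedged illustration rather than an official tool: the function name `is_factory_screened` is hypothetical, the cover text is assumed to have been transcribed by hand, and the sample strings are made-up placeholders, not real AMD markings.

```python
def is_factory_screened(cover_line1: str, cover_line2: str, opn: str) -> bool:
    """Return True if either screening mark described in step 3.3.9 is present.

    cover_line1 and cover_line2 are the first and second lines of alphanumeric
    text on the CPU cover; opn is the Ordering Part Number expected on line 1.
    """
    line1 = cover_line1.strip()
    line2 = cover_line2.strip()
    opn = opn.strip()

    # Mark 1: a "-" etch mark following the OPN on the first line.
    dash_after_opn = False
    if opn:
        start = line1.find(opn)
        if start >= 0:
            remainder = line1[start + len(opn):].lstrip()
            dash_after_opn = remainder.startswith("-")

    # Mark 2: "P" as the first character of the second line.
    p_prefix = line2.startswith("P")

    return dash_after_opn or p_prefix


# Placeholder cover text, for illustration only.
print(is_factory_screened("EXAMPLEOPN -", "ABC 0801", "EXAMPLEOPN"))  # dash mark
print(is_factory_screened("EXAMPLEOPN", "PABC 0801", "EXAMPLEOPN"))   # "P" mark
print(is_factory_screened("EXAMPLEOPN", "ABC 0801", "EXAMPLEOPN"))    # no mark
```

A CPU for which this returns False is only potentially affected; per the FAB, it is a replacement candidate only in a system that has already been shown to be problematic.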
For replacement materials sent from AMD to the customer site:
Shipment terms: CIP (Carriage and Insurance Paid to customer destination)
Exporter of record: AMD
Importer of record*: Sun Microsystems
Declared value of the shipment: AMD's current market price for the respective Ordering Part Number (OPN)

For replaced material returning to AMD:
Shipment terms: FCA (Free Carrier - customer pick up location)
Exporter of record: Sun Microsystems
Importer of record*: AMD
Declared value of the shipment: AMD's current market price for the respective OPN

* Importer of Record pays VAT, if applicable

References:
Escalation ID: 44303982
Radiance Cases: 38069399
Other FABs: 231245
Sun Alerts: 201246
Stop Ship Purge: P001-20507

Related URL(s):
http://sdpsweb.central/FIN_FCO/FAB/239146/SPE/DLL-CPU-tracker.ods
http://sdpsweb.central/FIN_FCO/FAB/239146/SPE/AMD-DLL-CPU-HandlingPackagingGuidelines.pdf

For information about FAB documents, their release processes, implementation strategies and billing information, go to the following URL:
For Sun Authorized Service Providers go to:
In addition to the above you may email:

Internal Contributor/submitter [email protected], [email protected]
Internal Eng Responsible Engineer [email protected]
Internal Services Knowledge Engineer [email protected]
Internal Eng Business Unit Group SSG WGS

Internal Sun Alert & FAB Admin Info
18-Jun-2008: Finalized FAB draft and sent to Extended Review.
20-Jun-2008: Incorporated feedback from Ext Rvw - sending to Publish.
17-Dec-2009: Replaced Product with Swordfish Nomenclature

Attachments
This solution has no attachment