Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: FAB (standard) Sure Solution

1463634.1: A small population of T4-4 systems may experience C2C FMA SUN4V-8002-KQ faults that can be repaired with updated coherency link tuning values.
Affected X-Options: 7101695 - Processor Module, 3.0GHz, T4-4
Affected Parts: 7019789 - FRU, Processor Module Assy, UltraSPARC T4, 8-Core 3.0GHz (7015550)
Oracle Confidential (PARTNER). Do not distribute to customers.

Applies to:
SPARC T4-4 - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Symptoms
A small population of shipped systems may experience an elevated rate of chip-to-chip (C2C) link replays that will trigger an FMA SUN4V-8002-KQ fault. The fault can be seen by executing "fmadm faulty"; the resulting fault will appear as in the sample that follows:

--------------- ------------------------------------ -------------- ---------
Host : ssccn4-m1
Action : Use 'fmadm faulty' to provide a more detailed view of this event.
------------------------------------- end of fault.cpu.generic-sparc.c2c report ---------------------------------

The above is a system-generated report and references an outdated/dead URL. For more information on this subject, refer to the following internal-only links:
http://events2.us.oracle.com/msg/FMA/SUN4V/82
https://support.us.oracle.com/oip/faces/secure/km/DocumentDisplay.jspx?id=1452064.1

NOTE: The presence of ereports for C2C replays is an expected part of normal system operation. Ereports for C2C replays will appear as follows:
ereport.cpu.generic-sparc.c2c-link
The existence of the above C2C ereports does not indicate improper or unexpected system operation. FMA will assess the rate of C2C replays and post a SUN4V-8002-KQ fault as noted above should the rate of replays become excessive.

Impact
The system will remain operational. C2C replays are link retries that are successful and therefore pose no issue with data integrity. The elevated level of replays that results in an FMA SUN4V-8002-KQ fault only indicates a link whose performance has degraded to a level not normally expected in a properly running system, and not an actual failure.
The FMA SUN4V-8002-KQ fault is triggered in an attempt at pre-emptive hardware failure detection. In systems with sub-optimal tuning, the issue is not that the hardware is actually degrading, but that the link tuning is marginal. The Sun_SPARC_T4-4_PM_E0010556.pkg installs optimized tuning parameters.

Changes
Contributing Factors
This issue is not specific to any particular configuration. The rate of C2C replays may vary with the system configuration (2P vs 4P) and from power cycle to power cycle.

In addition, every Processor Module (PM) replacement, done for any reason, also requires the Sun_SPARC_T4-4_PM_E0010556.pkg to be applied to ensure that all installed processor modules have the new tuning values. This is particularly important for 4P systems with two PMs, to ensure that the tuning values for links that run between the two processor modules are identical.

Cause
The root cause of this fault is that the link tuning values originally set were non-optimal and did not allow the C2C link circuitry to make the needed dynamic adjustments as material characteristics changed across production lots. The originally programmed tuning parameters proved unable to handle component variation within the expected design margins, stressing the dynamic tuning capabilities of the hardware and resulting in an increase in C2C replays. The new link tuning values offer more margin, allowing operation across the entire process environment as originally intended.

Solution
Workaround
No workaround available - see Resolution section below.
Resolution
Installation of the Sun_SPARC_T4-4_PM_E0010556.pkg will resolve the C2C FMA SUN4V-8002-KQ fault. Below are step-by-step instructions for applying the patch, which is available in Reference DocID 1452064.1 via the following URL:
https://support.us.oracle.com/oip/faces/secure/km/DocumentDisplay.jspx?id=1452064.1

STEP #1: (Applying Sun_SPARC_T4-4_PM_E0010556.pkg)
The Sun_SPARC_T4-4_PM_E0010556.pkg is applied as follows:
a) Transfer the patch to a local FTP or HTTP server.
b) Log in to the Service Processor via the ILOM CLI.
c) The host must be powered off to apply the patch. (From ILOM CLI: stop /SYS)
d) Load the patch using the ILOM CLI "load" command. From ILOM CLI:
load -source tftp://localFTPserver/Sun_SPARC_T4-4_PM_E0010556.pkg
- or -
e) The load command will automatically restart/reboot the service processor (ILOM) with

Once a PM module has been updated, the presence of the patch can be checked as follows:
-> show /SP/logs/event/list
NOTE: The above is an example of the ILOM event log output where 2 PMs were updated with

STEP #3: (Clear all prior FMA SUN4V-8002-KQ Faults)
Once the Sun_SPARC_T4-4_PM_E0010556.pkg has been successfully loaded and verified, clear any existing SUN4V-8002-KQ faults from the OS level. Use fmadm to obtain the uuid of any SUN4V-8002-KQ faults.
Using ILOM, clear any faults on PM0 and PM1:
-> set /SYS/PM1 clear_fault_action=true

NOTE 1: Applying the Sun_SPARC_T4-4_PM_E0010556.pkg will resolve C2C FMA SUN4V-8002-KQ faults that stem from marginal link tuning. However, it is possible that C2C FMA SUN4V-8002-KQ faults are due to degraded hardware, in which case they will not be remedied by the Sun_SPARC_T4-4_PM_E0010556.pkg. Therefore, if a C2C FMA SUN4V-8002-KQ fault is seen after applying the Sun_SPARC_T4-4_PM_E0010556.pkg, the normal hardware debug and replacement process should be followed. Mention in the SR that the C2C FMA SUN4V-8002-KQ fault was seen after applying the new C2C link tuning patch and that hardware replacement is therefore planned.
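The "use fmadm to obtain the uuid" part of STEP #3 can be scripted when many hosts need checking. The following is a minimal sketch, not an Oracle-supplied tool: it assumes the summary rows of "fmadm faulty" place the EVENT-ID (a UUID) immediately before the MSG-ID column, and the function name, the sample text, and the second message ID in the sample are all made up for illustration.

```python
import re

# Assumed row layout: "<time> <EVENT-ID uuid>  <MSG-ID>  <severity>".
# Adjust the pattern if your Solaris release prints the columns differently.
ROW = re.compile(
    r"\b([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\s+(\S+)"
)


def c2c_fault_uuids(fmadm_output, msg_id="SUN4V-8002-KQ"):
    """Return the EVENT-ID of every row whose MSG-ID equals msg_id."""
    return [uuid for uuid, found_id in ROW.findall(fmadm_output)
            if found_id == msg_id]


# Hypothetical sample text; the UUIDs and the OTHER-0000-XX ID are invented.
SAMPLE = """\
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Jan 01 00:00:00 11111111-2222-3333-4444-555555555555  SUN4V-8002-KQ  Major
Jan 02 00:00:00 aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee  OTHER-0000-XX  Major
"""

if __name__ == "__main__":
    for uuid in c2c_fault_uuids(SAMPLE):
        print(uuid)
```

In practice the input would be the captured text of "fmadm faulty" from the host. An empty result after the package has been applied and the old faults cleared is the expected outcome; a new SUN4V-8002-KQ entry falls under NOTE 1 above.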
Identification of Affected Parts (how to)
All T4-4 Processor Modules with the following part numbers that are experiencing C2C FMA SUN4V-8002-KQ faults require the Sun_SPARC_T4-4_PM_E0010556.pkg to be applied:
7015550 FRU,PM-Module,3.0G,T4-4

A Processor Module part number can be identified by typing the following at the ILOM prompt:
-> show /SYS/PM0 fru_part_number
/SYS/PM0

Note: PMs with later production part numbers will NOT have Sun_SPARC_T4-4_PM_E0010556.pkg entries in the ILOM event log output, as they were shipped from the factory with the new link training tuning settings and hence do not require updating.
Processor Modules with new link training tuning settings: 7051795

References
BugID: 7110931: SSC RQT Fault: fault.cpu.generic-sparc.c2c
MOS DocID: 1452064.1
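The part-number check described under Identification of Affected Parts can be expressed as a small lookup. This is a sketch under stated assumptions: the function name and return labels are hypothetical, the two part numbers are the only ones this document lists, and the caller is assumed to have already extracted the bare fru_part_number value from the ILOM output.

```python
# Part numbers as listed in this document; any other value is reported
# as unknown rather than guessed at.
AFFECTED_PARTS = {"7015550"}   # PMs that need the tuning package if faulting
UPDATED_PARTS = {"7051795"}    # PMs shipped with the new tuning settings


def classify_pm(fru_part_number):
    """Classify a PM part number as 'affected', 'factory-updated' or 'unknown'."""
    part = fru_part_number.strip()
    if part in AFFECTED_PARTS:
        return "affected"
    if part in UPDATED_PARTS:
        return "factory-updated"
    return "unknown"


if __name__ == "__main__":
    for part in ("7015550", "7051795"):
        print(part, classify_pm(part))
```

An "unknown" result means only that the part number is not named in this document; it says nothing about whether the module carries the new tuning values.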