Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Technical Instruction Sure

Solution 1002526.1: Sun SPARC Enterprise[TM] M3000/M4000/M5000/M8000/M9000 Servers: FMA Specified FRU Replacements
Previously Published As: 203504

Applies to:
Sun SPARC Enterprise M5000 Server - Version Not Applicable and later
Sun SPARC Enterprise M3000 Server - Version Not Applicable and later
Sun SPARC Enterprise M4000 Server - Version Not Applicable and later
Sun SPARC Enterprise M9000-32 Server - Version Not Applicable and later
Sun SPARC Enterprise M9000-64 Server - Version Not Applicable and later
All Platforms

Goal

Investigating a Sun SPARC[TM] Enterprise M3000/M4000/M5000/M8000/M9000 FMA specified FRU indictment.

This document details how to initiate a Service Action Plan to investigate whether a hardware component should be replaced as implicated by the Predictive Self-Healing Diagnosis Engine (FMA DE) on a Sun SPARC Enterprise M3000/M4000/M5000/M8000/M9000 system.

NOTE: The implicated hardware component(s) is referred to as a Field Replaceable Unit (FRU) throughout this document.
Fix

This document makes a few assumptions:
To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - M Series Servers
1. Collect the FMA Fault Message. The output can be displayed using fmdump -m on the XSCF console. Example output is as follows:

    MSG-ID: SCF-8001-4X, TYPE: Fault, VER: 1, SEVERITY: Major
    EVENT-TIME: Tue Mar 20 21:23:54 UTC 2007
    PLATFORM: SPARC-Enterprise, CSN: 7860000772, HOSTNAME: genericff2
    SOURCE: sde, REV: 1.12
    EVENT-ID: d93a7654-f414-46aa-aaf5-21f70a5af931
    DESC: The number of uncorrectable and correctable errors on single DIMM exceeds an acceptable threshold. This fault is detected while running POST.
    Refer to http://www.sun.com/msg/SCF-8001-4X for more information.
    AUTO-RESPONSE: The memory associated with the memory bank containing the errors is deconfigured.
    IMPACT: POST is restarted after the memory associated with the memory bank has been deconfigured.
    REC-ACTION: Schedule a repair action to replace the affected Field Replaceable Unit (FRU), the identity of which can be determined using fmdump -v -u EVENT_ID. Please consult the detail section of the knowledge article for additional information.

2. Collect the "fmdump -v -u Event-ID" output relating to the fault event. Example (uses the same event ID as the example in Step 1):

    xscf> fmdump -v -u d93a7654-f414-46aa-aaf5-21f70a5af931
    TIME UUID MSG-ID
    Mar 20 21:23:54.0192 d93a7654-f414-46aa-aaf5-21f70a5af931 SCF-8001-4X
    100% fault.chassis.SPARC-Enterprise.memory.bank.err
    Problem in: hc:///chassis=0/cmu=0/mem=0
    Affects: hc:///chassis=0/cmu=0/mem=0
    FRU: hc://:product-id=SPARC-Enterprise:chassis-id=7860000772:server-id=san-ff2-21-0:serial=04126711:part=72T128000HR3.7A:revision=252b/component=/MBU_B/MEMB#0/MEM#3A

3. Collect the fault information to prepare to log a Service Request:
4. Contact Oracle Support Services or your local service representative and open a "Service Request".
5. Review FRU Replacement Methods information to prepare your configuration for the FRU replacement.
6. An Oracle Support Services engineer may need additional data to be collected. If so, they will specify the data to collect. Please assist in capturing the requested data so Oracle can resolve your issue with as little delay as possible. The most likely data requested will be:
If you need help with either Explorer or Snapshot data requests, refer to <Document: 1008229.1> Running explorer on Sun SPARC Enterprise[TM] M3000/M4000/M5000/M8000/M9000 (OPL) Servers.
FMA data is maintained on the XSCFU to provide error history and to enhance troubleshooting of current issues. There are no regular mode utilities provided to clear the information.
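
When preparing a Service Request, it can help to pull the key fields out of the collected "fmdump -v -u Event-ID" output. The following is a minimal sketch, not an Oracle-supplied tool: it assumes the output has been saved as text in the layout shown in the Step 2 example, and the function names parse_fmdump_verbose and fru_component are hypothetical.

```python
import re

# Sample text copied from the Step 2 example in this document.
# Assumption: the FRU value is a single line in the real output.
SAMPLE = """\
TIME UUID MSG-ID
Mar 20 21:23:54.0192 d93a7654-f414-46aa-aaf5-21f70a5af931 SCF-8001-4X
100% fault.chassis.SPARC-Enterprise.memory.bank.err
Problem in: hc:///chassis=0/cmu=0/mem=0
Affects: hc:///chassis=0/cmu=0/mem=0
FRU: hc://:product-id=SPARC-Enterprise:chassis-id=7860000772:server-id=san-ff2-21-0:serial=04126711:part=72T128000HR3.7A:revision=252b/component=/MBU_B/MEMB#0/MEM#3A
"""

# Event UUID followed by the message ID on the summary line.
UUID_RE = re.compile(r"\b([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+(\S+)")
# A suspect line: certainty percentage followed by the fault class.
SUSPECT_RE = re.compile(r"^\s*(\d+)%\s+(\S+)")
# Detail lines that belong to the most recent suspect.
FIELD_RE = re.compile(r"^\s*(Problem in|Affects|FRU):\s*(.*)$")


def parse_fmdump_verbose(text):
    """Parse saved 'fmdump -v -u' text into a dict (hypothetical helper)."""
    event = {"uuid": None, "msg_id": None, "suspects": []}
    current = None
    for line in text.splitlines():
        m = SUSPECT_RE.match(line)
        if m:
            current = {"certainty": int(m.group(1)), "fault_class": m.group(2)}
            event["suspects"].append(current)
            continue
        m = FIELD_RE.match(line)
        if m and current is not None:
            key = m.group(1).lower().replace(" ", "_")
            current[key] = m.group(2).strip()
            continue
        m = UUID_RE.search(line)
        if m and event["uuid"] is None:
            event["uuid"], event["msg_id"] = m.group(1), m.group(2)
    return event


def fru_component(fru):
    """Return the part of an FRU string after '/component=', or None."""
    _, sep, component = fru.partition("/component=")
    return component if sep else None


if __name__ == "__main__":
    parsed = parse_fmdump_verbose(SAMPLE)
    for suspect in parsed["suspects"]:
        print(parsed["uuid"], parsed["msg_id"], suspect["certainty"],
              fru_component(suspect["fru"]))
```

The parser keeps every suspect in the order it appears, so an event that indicts more than one FRU retains the listing order of the original output.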
1. Confirm the fault event message, fmdump output, and all data are from the same date and implicate the same FRU component.
2. Verify that "fmdump -v -u Event-ID" contains the list of FRU indictments for this fault event. The list of FRUs is displayed in the order in which they are intended to be replaced (by percentage of likelihood).
3. Verify the FRU replacement method that can be used for the specific FRU requiring service and the configuration in question. The customer may have specified a desired method, so verify whether that method is possible.
4. Create the Service Action Plan and report the recommendations to the Customer/End User. Use the Action Plan Creator Tool to create the Service Action Plan.
5. Dispatch the replacement to the appropriate field resources and choose the appropriate Canned Action Plan in ATR. Reference the Service Manual for the Platform type and FRU in question if needed:
6. Confirm that the FRU replacement resolved the issue and no errors have repeated for at least 24 hours.
7. If the exact same fault event repeats, go back to step 2 and replace the next likeliest FRU listed in the fmdump output. If the same error persists and all FRUs in the list have been replaced, or you are unsure of the next steps, collaborate with the next level of support for further investigation.

Additional troubleshooting Information
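
The replacement order described in steps 2 and 7 above can be sketched as a small selection routine. This is an illustration only: the Suspect type, the next_fru_to_replace function, and the FRU names and percentages in the usage example are hypothetical and do not come from a real fault event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suspect:
    fru: str        # FRU component path as listed in the fmdump output
    certainty: int  # percentage of likelihood reported for this FRU


def next_fru_to_replace(suspects, already_replaced):
    """Return the likeliest suspect not yet replaced, or None.

    None means every listed FRU has been replaced, which per step 7 is
    the point to collaborate with the next level of support.
    """
    # sorted() is stable, so FRUs with equal certainty keep their listed order.
    ordered = sorted(suspects, key=lambda s: -s.certainty)
    for suspect in ordered:
        if suspect.fru not in already_replaced:
            return suspect
    return None


if __name__ == "__main__":
    # Hypothetical two-suspect event; values are made up for illustration.
    suspects = [
        Suspect("/MBU_B/MEMB#0/MEM#3A", 60),
        Suspect("/MBU_B/MEMB#0", 40),
    ]
    print(next_fru_to_replace(suspects, set()))
    print(next_fru_to_replace(suspects, {"/MBU_B/MEMB#0/MEM#3A"}))
```

Returning None rather than raising an error keeps the "all FRUs replaced" case explicit, so the caller can map it directly to the escalation path.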
References

<NOTE:1332409.1> - How to repair FMA module errors seen in 'fmadm faulty'
<NOTE:1008229.1> - Gathering diagnostic data for SPARC Enterprise M3000/M4000/M5000/M8000/M9000 (OPL) Servers
<NOTE:1012818.1> - Sun SPARC[R] Enterprise M3000/M4000/M5000/M8000/M9000 (OPL): Information & Troubleshooting certain MAC faults
<NOTE:1012820.1> - Sun SPARC[R] Enterprise M3000/M4000/M5000/M8000/M9000 (OPL): Information & Troubleshooting certain rci faults
<NOTE:1012821.1> - Sun SPARC(R) Enterprise M3000/M4000/M5000/M8000/M9000: Information & Troubleshooting certain software (sw) faults
<NOTE:1012954.1> - Sun SPARC Enterprise[TM] M3000/M4000/M5000/M8000/M9000: Information & Troubleshooting fmsp faults
<NOTE:1007101.1> - Sun SPARC(R) Enterprise M3000/M4000/M5000/M8000/M9000 (OPL) Servers: Fault clearing and LEDs behavior
<NOTE:1008208.1> - Sun SPARC[R] Enterprise M3000/M4000/M5000/M8000/M9000 (OPL): Information & Troubleshooting certain FLP faults
<NOTE:1008211.1> - Sun SPARC[R] Enterprise M8000/M9000: Information & Troubleshooting certain XB (Crossbar chip) faults
<NOTE:1017763.1> - Sun SPARC[R] Enterprise M3000/M4000/M5000/M8000/M9000: Information & Troubleshooting certain JTAG faults
<NOTE:1002809.1> - Sun SPARC Enterprise M3000/M4000/M5000/M8000/M9000: Information & Troubleshooting certain MBC faults
<NOTE:1003993.1> - Sun SPARC Enterprise[TM] M3000/M4000/M5000/M8000/M9000: Field Replaceable Unit (FRU) Replacement Methods
<NOTE:1004117.1> - Sun SPARC(R) Enterprise M3000/M4000/M5000/M8000/M9000: Information & Troubleshooting certain DIMM faults
<NOTE:1004122.1> - Sun SPARC[R] Enterprise M3000/M4000/M5000/M8000/M9000 (OPL): Information & Troubleshooting certain Thermal faults
<NOTE:1005335.1> - Sun SPARC[R] Enterprise M3000/M4000/M5000/M8000/M9000 (OPL): Information & Troubleshooting certain MADM faults
<NOTE:1006871.1> - Sun SPARC[R] Enterprise M3000/M4000/M5000/M8000/M9000 (OPL): Information & Troubleshooting certain IOC faults
<NOTE:1006872.1> - Sun SPARC Enterprise(R) M3000/M4000/M5000/M8000/M9000: Information & Troubleshooting certain Power faults
<NOTE:1006992.1> - Sun SPARC[R] Enterprise M3000/M4000/M5000/M8000/M9000 (OPL): Information & Troubleshooting certain CPU Faults
<NOTE:1002629.1> - Sun SPARC Enterprise M4000/M5000/M8000/M9000: Information & Troubleshooting certain SC chip faults
<NOTE:1002730.1> - Sun SPARC[R] Enterprise M3000/M4000/M5000/M8000/M9000 (OPL): Information & Troubleshooting certain Clock Unit faults

Attachments

This solution has no attachment