Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1009467.1
Update Date:2011-06-07
Keywords:

Solution Type  Problem Resolution Sure

Solution 1009467.1: How to clear faults in FMA after component replacement on Sun Fire[TM] servers.


Related Items
  • Sun Fire E6900 Server
  • Sun Fire V480 Server
  • Sun Fire 280R Server
  • Sun Blade 2000 Workstation
  • Sun Fire V880z Visualization Server
  • Sun Fire 3800 Server
  • Sun Fire 6800 Server
  • Sun Netra 1280 Server
  • Sun Fire V890 Server
  • Sun Fire E4900 Server
  • Solaris SPARC Operating System
  • Sun Fire V880 Server
  • Sun Fire 4800 Server
  • Sun Fire V1280 Server
  • Sun Fire E2900 Server
  • Sun Fire V490 Server
  • Sun Fire 4810 Server
  • Sun Blade 1000 Workstation
Related Categories
  • GCS>Sun Microsystems>Desktops>Workstations
  • GCS>Sun Microsystems>Servers>Entry-Level Servers
  • GCS>Sun Microsystems>Servers>Midrange Servers
  • GCS>Sun Microsystems>Operating Systems>Solaris Operating System
  • GCS>Sun Microsystems>Servers>NEBS-Certified Servers

Previously Published As
213078


Applies to:

Solaris SPARC Operating System - Version: 8.0 and later   [Release: 8.0 and later ]
Sun Netra 1280 Server
Sun Fire V1280 Server
Sun Fire V480 Server
Sun Fire V490 Server
All Platforms

Symptoms

The Solaris[TM] 10 Fault Management Daemon (fmd) reports a failed or suspect component (a field-replaceable unit, or FRU). After the component is replaced, it may still be reported as faulty or suspect in the fmadm output, or the system may still print a self-healing message during boot.

Cause

In the following three cases the fault is not cleared automatically after the replacement, and you have to clear it manually:

  1. The component has no FRUID/serial number support (for example, PCI cards).
  2. FRUID/serial number support for this part was not implemented in FMA on the given platform (for example, memory on Sun Fire 3800 through Sun Fire[TM] E25K servers).
  3. A self-healing message is printed during boot even though the fmadm faulty list is empty. This is caused by CR 6369961 ("fmd emits identical diagnosis after repair when case was never closed").

Solution

Procedure:

As the root user on the domain in question, run the following commands:

  • fmadm faulty
    • Displays the components that FMA has categorized as faulty or degraded, together with the resource (FMRI) and UUID associated with each one.
    • You need the resource or UUID to clear the fault tags.
  • fmadm repair <resource or UUID>
    • Clears the suspect or fault tags associated with the given resource or UUID, removing the entry from the faulty list.
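When the faulty list is long, the two steps above can be partly scripted. The following is a minimal sketch, assuming you have saved the fmadm faulty output to a text file (for example with fmadm faulty > /var/tmp/faulty.txt) and have Python 3 available on some host. FaultRecord, parse_fmadm_faulty and repair_commands are hypothetical names, not part of Solaris. The sketch only prints candidate fmadm repair commands so that you can pick the ones for components that were actually replaced; it does not run them.

```python
from typing import List, NamedTuple


class FaultRecord(NamedTuple):
    """One entry from the fmadm faulty listing."""
    state: str     # e.g. "degraded"
    resource: str  # resource FMRI, e.g. "dev:////ssm@0,0/pci@19,700000"
    uuid: str      # UUID printed on the line below the resource


def _is_heading(line: str) -> bool:
    """True for blank lines, shell prompts, the column heading and dashed separators."""
    s = line.strip()
    return (not s or s.startswith("#") or s.startswith("STATE")
            or set(s) <= {"-", " "})


def parse_fmadm_faulty(text: str) -> List[FaultRecord]:
    """Parse a saved copy of Solaris 10 fmadm faulty output.

    Assumes the layout shown in this article: each entry is a
    "<state> <resource>" line followed by a line holding only the UUID.
    """
    records: List[FaultRecord] = []
    pending = None  # (state, resource) still waiting for its UUID line
    for line in text.splitlines():
        if _is_heading(line):
            continue
        fields = line.split()
        if len(fields) >= 2:
            pending = (fields[0], fields[1])
        elif pending is not None:
            records.append(FaultRecord(pending[0], pending[1], fields[0]))
            pending = None
    return records


def repair_commands(records: List[FaultRecord]) -> List[str]:
    """Build one fmadm repair command per distinct resource, for manual review."""
    commands: List[str] = []
    for rec in records:
        cmd = "fmadm repair " + rec.resource
        if cmd not in commands:
            commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Sample in the format of the example output in this article.
    sample = """\
STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded dev:////ssm@0,0/pci@19,700000
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
-------- ----------------------------------------------------------------------
"""
    for cmd in repair_commands(parse_fmadm_faulty(sample)):
        print(cmd)  # prints: fmadm repair dev:////ssm@0,0/pci@19,700000
```

Printing instead of executing keeps the decision with the administrator, because only components that have really been replaced should be marked as repaired.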

The following example shows how to clear the fault tags on a Host Bus Adapter (HBA) in a Sun Fire[TM] 6800 server. The HBA has been replaced, but FMA still reports it as degraded:

 

# fmadm faulty
STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded dev:////ssm@0,0/pci@19,700000
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
-------- ----------------------------------------------------------------------
degraded dev:////ssm@0,0/pci@19,700000/lpfc@1
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
-------- ----------------------------------------------------------------------
degraded mod:///mod-name=lpfc/mod-id=54
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
-------- ----------------------------------------------------------------------
degraded mod:///mod-name=pcisch/mod-id=25
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
-------- ----------------------------------------------------------------------
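In the example output above, all four resources carry the same UUID, which means they belong to a single diagnosis. On a longer list, grouping resources by UUID makes this easier to see. The following is a minimal sketch, assuming the output has been saved as text and Python 3 is available; group_by_uuid is a hypothetical helper, not a Solaris command.

```python
from typing import Dict, List


def group_by_uuid(text: str) -> Dict[str, List[str]]:
    """Map each UUID in saved fmadm faulty output to the resources listed under it.

    Assumes each entry is a "<state> <resource>" line followed by a line
    holding only the UUID; headings, separators and prompts are skipped.
    """
    groups: Dict[str, List[str]] = {}
    resource = None  # resource still waiting for its UUID line
    for line in text.splitlines():
        s = line.strip()
        if not s or s.startswith("#") or s.startswith("STATE") or set(s) <= {"-", " "}:
            continue
        fields = s.split()
        if len(fields) >= 2:
            resource = fields[1]
        elif resource is not None:
            groups.setdefault(fields[0], []).append(resource)
            resource = None
    return groups


if __name__ == "__main__":
    # Two entries in the format of the example output above.
    sample = """\
degraded dev:////ssm@0,0/pci@19,700000
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
-------- ----------------------------------------------------------------------
degraded mod:///mod-name=lpfc/mod-id=54
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
"""
    for uuid, resources in group_by_uuid(sample).items():
        print(uuid, len(resources), "resource(s)")
        # prints: 47b86ff0-6743-ceff-ba0d-b452d09b0b65 2 resource(s)
```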

NOTE: Once you have identified the faulty components, run the fmadm repair command to clear the fault:

 

# fmadm repair dev:////ssm@0,0/pci@19,700000

NOTE: After you have run the repair command for each component that has been replaced, re-run the fmadm faulty command to confirm that the fault has been cleared. If there are no faults, the output contains only the column headings:

# fmadm faulty
STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
#
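When this check is repeated across many domains, it can be automated on saved output. The following is a minimal sketch, assuming the fmadm faulty output has been saved as text and Python 3 is available. faulty_list_is_empty is a hypothetical helper that treats the output as clean when it contains nothing but the column headings, dashed separators, blank lines and shell prompts, matching the empty listing shown above.

```python
def faulty_list_is_empty(text: str) -> bool:
    """Return True if saved fmadm faulty output lists no fault entries.

    Skips blank lines, shell prompt lines (starting with "#"), the
    "STATE RESOURCE / UUID" heading and dashed separator lines; any other
    line is treated as part of a fault entry.
    """
    for line in text.splitlines():
        s = line.strip()
        if not s or s.startswith("#"):
            continue
        if s.startswith("STATE") or set(s) <= {"-", " "}:
            continue
        return False  # a state/resource or UUID line: a fault is still listed
    return True


if __name__ == "__main__":
    # Sample copied from the empty listing shown in this article.
    clean = """\
# fmadm faulty
STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
#
"""
    print(faulty_list_is_empty(clean))  # prints: True
```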



Product
Solaris 10 Operating System
Solaris 10 01/06 Operating System
Solaris 10 3/05 Hardware 1 Operating System



Internal Comments

For Internal Oracle users only

See Bug ID 6229087 for more information about the missing FMA implementation on Serengeti and Starcat regarding DIMMs (fixed in Nevada).
FMA, fmd, Sun Fire, Solaris 10, fault, management, architecture, fmadm faulty, replaced, component, still failed, failed, faulty, suspect, swapped, offline, disabled, missing
Previously Published As
82357

Change History
Date: 2011-05-19
User name: Silvana Villamil Merlini
Action: process comments
Comments: audited/updated by Silvana Villamil Merlini, Entry Level SPARC Content Team Member



  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.