Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1018813.1
Update Date: 2012-07-30
Keywords:

Solution Type  Problem Resolution Sure

Solution  1018813.1 :   Sun Fire [TM] SF3800/SF4800/SF4810/SF6800 - E4900/E6900 Server: Domains running firmware 5.15.x or later with hang-policy set to "notify" may lose critical troubleshooting data  


Related Items
  • Sun Fire 6800 Server
  • Sun Fire 3800 Server
  • Sun Fire E6900 Server
  • Sun Fire 4800 Server
  • Sun Fire E4900 Server
  • Sun Fire 4810 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Exx00
  • .Old GCS Categories>Sun Microsystems>Servers>Midrange Servers

PreviouslyPublishedAs
230603


Applies to:

Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
Sun Fire 6800 Server
Sun Fire E4900 Server
All Platforms

Symptoms

Starting with firmware level 5.15.0, ScApp detects hung domains and, depending
on the setting of the domain hang-policy variable, can attempt to reset them.
Systems initially installed with 5.15.0 or later have hang-policy set to
"reset" by default, so the SC attempts to reset a hung domain.

The hang-policy variable was also present in earlier firmware versions.
However, systems that were initially installed with an earlier firmware version
have hang-policy set to "notify" by default. When these systems are upgraded
to 5.15.0 or later, the current value of hang-policy and all other existing
domain and platform settings are left intact. This causes two issues.

First, the SC will not attempt to automatically reset a domain with
hang-policy=notify, negating the effect of this new feature in ScApp.

Second, and possibly more importantly, with 5.15.x the SC logs a notice that it
detected the hung domain each time it polls the domain to determine whether it
is active. The SC writes these notices both to the loghost server and to its
internal log buffers, which supply the data displayed by the showlogs command.
The internal buffer is circular: each new entry displaces the oldest entry
still present in the buffer. As a result, a domain hang with hang-policy set to
"notify" overflows the circular buffer and removes from "showlogs -d x" any
data that would indicate the initial condition that caused the hang. An example
of these messages:
...
Aug 09 07:55:12 sunfire-sc0 Domain-C.SC: [ID 180731 local0.notice] Domain C is active again
Aug 09 07:55:12 sunfire-sc0 Domain-C.SC: [ID 690470 local0.error] Domain watchdog timer expired.
Aug 09 07:55:12 sunfire-sc0 Domain-C.SC: [ID 398807 local0.notice] hang-policy is NOTIFY. Not resetting domain.
Aug 09 07:55:13 sunfire-sc0 Domain-C.SC: [ID 180731 local0.notice] Domain C is active again
Aug 09 07:55:13 sunfire-sc0 Domain-C.SC: [ID 690470 local0.error] Domain watchdog timer expired.
Aug 09 07:55:13 sunfire-sc0 Domain-C.SC: [ID 398807 local0.notice] hang-policy is NOTIFY. Not resetting domain.
...

If no working loghost is configured for the domain, the original failure
cannot be diagnosed.
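
The circular-buffer behavior described above can be illustrated with a minimal
Python sketch. It is not ScApp code: the buffer capacity, the function name
simulate_hang, and the "root-cause" placeholder message are assumptions chosen
for illustration, and the real size of the SC log buffer is not stated here.
Only the three repeated notices are taken from the example messages.

```python
from collections import deque

# Hypothetical capacity; the real SC log buffer size is not given in this note.
BUFFER_ENTRIES = 8


def simulate_hang(buffer_entries, polls):
    """Return the buffer contents after a hung domain has been polled `polls` times."""
    # A deque with maxlen silently drops its oldest entry when a new one is
    # appended to a full buffer, which is the circular behavior described above.
    log = deque(maxlen=buffer_entries)
    log.append("root-cause message (placeholder for the event that led to the hang)")
    for _ in range(polls):
        # Three notices per poll, as in the example messages.
        log.append("Domain C is active again")
        log.append("Domain watchdog timer expired.")
        log.append("hang-policy is NOTIFY. Not resetting domain.")
    return list(log)


if __name__ == "__main__":
    for polls in (2, 3):
        entries = simulate_hang(BUFFER_ENTRIES, polls)
        kept = any(e.startswith("root-cause") for e in entries)
        print(f"{polls} polls: root cause still in buffer = {kept}")
```

With the assumed capacity of 8 entries, the placeholder root-cause message
survives two polls and is gone after the third. A real buffer is larger, but
notices that repeat every poll fill it in the same way.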

Cause

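Systems initially installed with firmware earlier than 5.15.0 default to
hang-policy "notify", and an upgrade to 5.15.0 or later leaves that setting
unchanged. With hang-policy set to "notify", the SC logs a notice on every poll
of a hung domain instead of resetting it, and these repeated notices overwrite
older entries in the SC's circular log buffer. Systems initially installed with
5.15.0 or later have hang-policy default to "reset", which will attempt to
reset a hung domain.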

Solution


Resolution
Use the "setupdomain" command to set hang-policy to "reset" for each domain on all platforms that are upgraded to 5.15.x or later. Always configure a working loghost for all Sun Fire 3800/4800/4810/6800 platforms and domains.
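
Because the loghost keeps the messages that the SC's internal buffer loses, the
entries logged before a hang can still be recovered there. The following Python
sketch shows one way to pull them out of a saved loghost file. It is not a Sun
tool: the function name context_before_hang is hypothetical, the line layout is
inferred from the example messages in this document, and the first sample line
is invented for illustration.

```python
import re

# Message texts taken from the example messages in this document; any other
# message is treated as potentially useful context.
HANG_NOISE = (
    "is active again",
    "Domain watchdog timer expired.",
    "hang-policy is NOTIFY. Not resetting domain.",
)

# Assumed syslog layout, modelled on the sample lines:
#   "Aug 09 07:55:12 sunfire-sc0 Domain-C.SC: [ID 690470 local0.error] text"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"Domain-(?P<domain>[A-Z])\.SC: \[ID \d+ \S+\] ?(?P<text>.*)$"
)


def context_before_hang(lines, domain, keep=20):
    """Return up to `keep` non-noise messages logged for `domain` before the
    first watchdog expiry, oldest first."""
    context = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m or m.group("domain") != domain:
            continue  # not an SC message for this domain
        text = m.group("text")
        if "Domain watchdog timer expired." in text:
            break  # first hang notice reached; stop collecting
        if not any(noise in text for noise in HANG_NOISE):
            context.append(f"{m.group('stamp')} {text}")
    return context[-keep:]


if __name__ == "__main__":
    sample = [
        # Invented line standing in for whatever was logged before the hang.
        "Aug 09 07:50:01 sunfire-sc0 Domain-C.SC: [ID 111111 local0.error] hypothetical earlier error",
        "Aug 09 07:55:12 sunfire-sc0 Domain-C.SC: [ID 180731 local0.notice] Domain C is active again",
        "Aug 09 07:55:12 sunfire-sc0 Domain-C.SC: [ID 690470 local0.error] Domain watchdog timer expired.",
        "Aug 09 07:55:12 sunfire-sc0 Domain-C.SC: [ID 398807 local0.notice] hang-policy is NOTIFY. Not resetting domain.",
    ]
    for entry in context_before_hang(sample, "C"):
        print(entry)
```

To run it against a real file, pass an open file object in place of the sample
list. Lines that do not match the assumed layout are skipped and not guessed
at, so adjust the pattern if the loghost records a different format.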

For reference, see "Sun Fire Midframe & Entry-Level Servers Best Practices Update for Firmware 5.20.x".


Product
Sun Fire 6800 Server
Sun Fire 4810 Server
Sun Fire 4800 Server
Sun Fire 3800 Server






Bug 4906714 has been submitted regarding hang-policy=notify behavior rolling over the SC logs.

An RFE may be forthcoming to have hang-policy on each domain changed to "reset" upon the first upgrade to firmware level 5.15.x or later.





  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.