Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Technical Instruction
Sure Solution 1009071.1: Sun Fire[TM] 12K/15K/20K/25K: During POST Cycle, "lock_retries" Messages Appear
Previously Published As 212508
Applies to:
Sun Fire 12K Server
Sun Fire 15K Server
Sun Fire E20K Server
Sun Fire E25K Server
All Platforms

Goal
The HOST POST (hpost) application is responsible for probing, testing, and configuring the hardware of a Sun Fire[TM] 12K-25K domain, preparing it for use by the OpenBoot[TM] PROM and the Solaris[TM] Operating Environment (Solaris[TM] OS). The Sun Fire 12K/15K/20K/25K platform's /opt/SUNWSMS/bin/hpost executable houses both the power-on self-test (POST) and the logic that sequences POST's operations.

Solution
On occasion, during a POST run on a specific domain or system board (SB), invoked through hpost's "-d <Domain_Id_or_Tag>" and/or "-H<exp>.<slot>" options, the following messages are captured in the resulting POST logs (located in /var/opt/SUNWSMS/SMS/adm/<domain-tag>/post):

stage cpu_lpost: Test all L1 CPU boards...

In general, the System Management Services (SMS) subsystem uses the hardware access daemon (hwad) to access specific hardware, and is normally expected to lock the JTAG/I2C/Bootbus master, board, or system with which it is currently communicating. This prevents multiple SMS services (such as POST and the environmental status monitoring daemon (ESMD)) from interfering with each other. The locking is usually implemented with software mutexes provided by the SMS hwad libraries.

The "lock_retries" messages (listed above) arise when hpost attempts to acquire a lock (controlled by a software mutex in the System Controller's SMS subsystem) on a specific hardware resource (for example, SB0). All SMS utilities and services, including hpost, must adhere to these software-controlled locks (maintained as a hierarchy of mutex operations) to prevent inter-process deadlocks.
In the log excerpt above, hpost requested that its lock requests be serviced in NOWAIT mode; that is, if it cannot acquire the bootbus lock on a specific proc, hpost checks another proc instead of waiting. This not only helps to optimize the polling cycle but also avoids serializing behind other processes that are holding the lock, or locks, at the same time. In addition, hpost maintains a retry count on these NOWAIT requests and can 'back off' to a TIMEOUT request for the lock if the same proc fails its NOWAIT requests five times. The default timeout of three minutes is tunable in .postrc: the hwad_lock_timeout_secs value specifies the timeout for requests to lock resources through hwad, overriding the default. A request using the default timeout almost never fails in a typical environment, and further tuning of this value is strongly discouraged.

In conclusion, the "lock_retries" messages were reported when hpost attempted to acquire access to SB0 while another process (for example, the routine board temperature/voltage reporting tasks of SMS's ESMD) was accessing the same resource. The sample POST log excerpt shows that hpost successfully backed off to a TIMEOUT request, and the POST run completed successfully.

RESOLUTION: In cases such as the one described above, the "lock_retries" messages can be safely disregarded.

Product
Sun Fire E25K Server
Sun Fire E20K Server
Sun Fire 15K Server
Sun Fire 12K Server

Internal Section
Keywords: hpost, post, lock_retries, starcat, hwad, mutex, libraries
Previously Published As 75786

Attachments
This solution has no attachment.
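The NOWAIT-then-TIMEOUT lock strategy described in the solution above can be sketched in Python as follows, assuming one threading.Lock per proc stands in for the hwad bootbus lock. Only the five-failure threshold and the three-minute (180-second) default timeout come from this document; the function poll_procs, its parameters, and the round-robin retry order are hypothetical illustrations, not hpost's actual code, and the sketch does not reproduce hpost's log messages.

```python
import threading
from collections import deque

NOWAIT_LIMIT = 5           # from the text: back off after five NOWAIT failures on one proc
DEFAULT_TIMEOUT_S = 180.0  # from the text: default timeout of three minutes


def poll_procs(locks: dict, work, nowait_limit: int = NOWAIT_LIMIT,
               timeout_s: float = DEFAULT_TIMEOUT_S) -> dict:
    """Run work(proc) for every proc while holding that proc's lock.

    locks maps a (hypothetical) proc name to a threading.Lock.
    Returns the number of failed NOWAIT attempts recorded per proc.
    """
    pending = deque(locks)                 # procs still waiting to be processed
    retries = {proc: 0 for proc in locks}  # failed NOWAIT attempts per proc
    while pending:
        proc = pending.popleft()
        lock = locks[proc]
        if retries[proc] < nowait_limit:
            # NOWAIT request: returns False at once if someone else holds the lock.
            acquired = lock.acquire(blocking=False)
        else:
            # Back-off: this proc failed too often, so wait up to timeout_s for it.
            acquired = lock.acquire(timeout=timeout_s)
            if not acquired:
                raise TimeoutError(f"{proc}: lock not acquired within {timeout_s} s")
        if not acquired:
            retries[proc] += 1
            pending.append(proc)  # move on to another proc instead of waiting
            continue
        try:
            work(proc)
        finally:
            lock.release()
    return retries
```

Rotating busy procs to the back of the queue is what lets a free proc (SB1 in a test where SB0's lock is held by another thread) be processed while the other holder finishes; only after the fifth NOWAIT failure does the sketch block, and then only for at most timeout_s seconds.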