Asset ID: 1-72-1019625.1
Update Date: 2010-08-02
Keywords:
Solution Type: Problem Resolution Sure
Solution 1019625.1: fmd event transport unable to support more than 6 running domains
Related Items
- Sun SPARC Enterprise M8000 Server
- Sun SPARC Enterprise M9000-32 Server
- Sun SPARC Enterprise M9000-64 Server
Related Categories
- GCS>Sun Microsystems>Servers>OPL Servers
Previously Published As: 242526
Symptoms
Booting a 7th domain causes the XSCF event-transport module to fail.
See CR 6716103.
Impact:
----------
When the fma 'event-transport' module has failed, fault events are no longer exchanged between the domains and the XSCF. This prevents the XSCF from taking the appropriate action for faults diagnosed on a domain (e.g., offlining a CPU or deconfiguring a DIMM because of too many correctable errors).
- On the XSCF
- fma module "event-transport" in failed status
- Example:
XSCF> fmadm config
MODULE               VERSION  STATUS  DESCRIPTION
case-close           1.0      active  Case-Close Agent
event-transport      2.0      failed  Event Transport Module
faultevent-post      1.0      active  Gate Reaction Agent for errhandd
flush                1.10     active  Resource Cache Flush Agent
fmd-self-diagnosis   1.0      active  Fault Manager Self-Diagnosis
iox_agent            1.0      active  IO Box Recovery Agent
reagent              1.16     active  Reissue Agent
sde                  1.16     active  Simple Diagnosis Engine
snmp-trapgen         1.0      active  SNMP Trap Generation Agent
sysevent-transport   1.0      active  SysEvent Transport Agent
syslog-msgs          1.0      active  Syslog Messaging Agent
- ereport.fm.fmd.module with msg = event-transport request to create an auxiliary thread exceeds module thread limit (8)
- Example:
From 'fmdump -eV errlog':
Sep 09 2008 15:53:17.274465420 ereport.fm.fmd.module
nvlist version: 0
version = 0x0
class = ereport.fm.fmd.module
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = fmd
authority = (embedded nvlist)
nvlist version: 0
version = 0x0
product-id = SPARC Enterprise M9000
chassis-id = 2020643005
server-id = san-dc2-1-0
(end authority)
mod-name = event-transport
mod-version = 2.0
(end detector)
ena = 0x5090126c9883c001
msg = event-transport request to create an auxiliary thread exceeds module thread limit (8)
__ttl = 0x1
__tod = 0x48c6fe5d 0x105c028c
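The failed module can be picked out of a captured 'fmadm config' listing mechanically. The following is a minimal sketch, assuming the listing has been saved or piped from a terminal session on a workstation (it is not meant to run on the XSCF itself); 'failed_modules' is a hypothetical helper name, not an XSCF command.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every module whose STATUS column
# reads "failed". Expects text in the 'fmadm config' layout on standard input
# (field 1 = MODULE, field 3 = STATUS); the prompt and header lines are
# ignored because their third field is not "failed".
failed_modules() {
    awk '$3 == "failed" { print $1 }'
}

# Example run against an excerpt of the listing shown above.
failed_modules <<'EOF'
XSCF> fmadm config
MODULE VERSION STATUS DESCRIPTION
case-close 1.0 active Case-Close Agent
event-transport 2.0 failed Event Transport Module
flush 1.10 active Resource Cache Flush Agent
EOF
```

For the excerpt above the helper prints event-transport; empty output means no module is listed as failed.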
- On the Domains
- hundreds of ereport.fm.fmd.module with msg = Failed to read S_HELLO from dev:///sp0: Resource temporarily unavailable
and/or ereport.fm.fmd.module with msg = Failed to write C_HELLO to dev:///sp0: Transport endpoint is not connected
- Example:
From "fmdump -eV":
TIME CLASS
Sep 10 2008 17:33:01.422924350 ereport.fm.fmd.module
nvlist version: 0
version = 0x0
class = ereport.fm.fmd.module
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = fmd
authority = (embedded nvlist)
nvlist version: 0
version = 0x0
product-id = SUNW,SPARC-Enterprise
chassis-id = 2020643005
server-id = san-dc2-1-g
(end authority)
mod-name = event-transport
mod-version = 2.0
(end detector)
ena = 0x5e60bdf957002401
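Because the domain symptom is a large number of handshake failures, counting them is a quick way to confirm it. The following is a rough sketch, assuming the 'fmdump -eV' output is piped in or read from a saved file; 'count_hello_failures' is a hypothetical helper name, and the match strings are taken from the two messages quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count ereport lines that carry either of the two
# handshake failure messages. Reads 'fmdump -eV' text on standard input.
count_hello_failures() {
    # "n + 0" forces a numeric 0 when nothing matched.
    awk '/msg = Failed to (read S_HELLO|write C_HELLO)/ { n++ } END { print n + 0 }'
}

# On a domain this would typically be used as:
#   fmdump -eV | count_hello_failures
# Example run against three sample msg lines (two handshake failures):
count_hello_failures <<'EOF'
msg = Failed to read S_HELLO from dev:///sp0: Resource temporarily unavailable
msg = Failed to write C_HELLO to dev:///sp0: Transport endpoint is not connected
msg = event-transport request to create an auxiliary thread exceeds module thread limit (8)
EOF
```

A count in the hundreds on a domain, together with a failed event-transport module on the XSCF, matches the symptoms described here.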
- In the TWO files (on the active AND standby XSCF),
/hcp1/scfprog/init/scf_initrc/11cmemready/S29setfmconf
/hcp0/scfprog/init/scf_initrc/11cmemready/S29setfmconf
change the last line of S29setfmconf (using the 'vi' command):
exit 0
to
${EGREP} -n 'setprop *client.thrlim' ${FILE} >/dev/null 2>&1
if [ $? -ne 0 ]; then
echo "setprop client.thrlim 48" >> ${FILE}
fi
exit 0
- Run rebootxscf on the active XSCF unit.
- Check that the thread limit is now 48 with the following command:
XSCF> fmstat -a -m event-transport
NAME               VALUE  DESCRIPTION
error_drop_read    0      Dropped read messages
error_post_filter  0      Post filter errors
...
fmd.thrlimit       48     limit on number of auxiliary threads
- Reboot all running domains.
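The lines added to S29setfmconf follow a guard-then-append pattern: the property is written only when the target file does not already set it, so running the script on every boot does not pile up duplicate entries. The sketch below reproduces that pattern against a scratch file and also shows one way to read the resulting limit out of captured 'fmstat' output. It is an illustration under assumptions: 'ensure_thrlim' and 'thrlimit_value' are hypothetical helper names, the scratch file stands in for ${FILE} (which, like ${EGREP}, is presumably defined earlier in the real script), and nothing here touches an XSCF path.

```shell
#!/bin/sh
# Hypothetical helper mirroring the guard added to S29setfmconf:
# append "setprop client.thrlim <limit>" only if no such line exists yet.
ensure_thrlim() {
    conf_file=$1
    limit=$2
    if ! grep 'setprop *client.thrlim' "$conf_file" >/dev/null 2>&1; then
        echo "setprop client.thrlim $limit" >> "$conf_file"
    fi
}

# Hypothetical helper: print the VALUE column of the fmd.thrlimit row from
# 'fmstat -a -m event-transport' text on standard input.
thrlimit_value() {
    awk '$1 == "fmd.thrlimit" { print $2 }'
}

# Demonstration on a scratch file (not a real XSCF configuration file).
tmp=$(mktemp)
ensure_thrlim "$tmp" 48
ensure_thrlim "$tmp" 48   # second call finds the line and appends nothing
cat "$tmp"
rm -f "$tmp"
```

After both calls the scratch file holds a single setprop line. Piping saved 'fmstat' output through thrlimit_value should print 48 once the change has taken effect.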
Product
Sun SPARC Enterprise M8000
Sun SPARC Enterprise M9000
Keywords
event-transport, ereport.fm.fmd.module, thread limit, S_HELLO
Product_uuid
2eb6b8a2-ce94-11db-9135-080020a9ed93
51e8feab-ce93-11db-9135-080020a9ed93