Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1004922.1
Update Date: 2010-07-06

Solution Type: Technical Instruction

Solution 1004922.1: Sun Fire[TM] servers: Troubleshooting RCM failure events in DR operations


Related Items
  • Sun Fire E6900 Server
  • Sun Fire 3800 Server
  • Sun Fire 6800 Server
  • Sun Netra 1280 Server
  • Sun Fire E4900 Server
  • Sun Fire 4800 Server
  • Sun Fire V1280 Server
  • Sun Fire E2900 Server
  • Sun Netra 1290 Server
  • Sun Fire 4810 Server
Related Categories
  • GCS>Sun Microsystems>Servers>Midrange V and Netra Servers
  • GCS>Sun Microsystems>Servers>Entry-Level Servers
  • GCS>Sun Microsystems>Servers>Midrange Servers

Previously Published As
206907


Description
This document describes how to troubleshoot Reconfiguration Coordination
Manager (RCM) failure events encountered during dynamic reconfiguration
(DR) operations on Sun Fire[TM] servers.

Steps to Follow
Dynamic reconfiguration (DR), which is provided as part of the Solaris[TM]
Operating Environment, enables you to safely add and remove CPU/Memory
boards and I/O assemblies while the system is running. DR controls the
software aspects of dynamically changing the hardware used by a domain,
with minimal disruption to user processes running in the domain.
The DR software uses the cfgadm command, a command-line interface for
configuration administration. Specifically, the cfgadm_sbd plugin provides
dynamic reconfiguration functionality for connecting, configuring,
unconfiguring, and disconnecting class sbd system boards. For example:
On a platform employing UltraSPARC(R) III / UltraSPARC(R) III+ CPUs:
# cfgadm -s "select=class(sbd)"
Ap_Id                          Type         Receptacle   Occupant     Condition
N0.IB8                         PCI_I/O_Boa  connected    configured   ok
N0.SB2                         CPU_V2       connected    configured   ok
N0.SB4                         CPU_V2       connected    configured   ok
On a platform employing UltraSPARC(R) IV CPUs:
# cfgadm -s "select=class(sbd)"
Ap_Id                          Type         Receptacle   Occupant     Condition
N0.IB6                         PCI_I/O_Boa  connected    configured   ok
N0.SB0                         CPU_V3       connected    configured   ok
N0.SB2                         CPU_V3       connected    configured   ok
Inherent to the above DR architecture is the Reconfiguration Coordination
Manager (RCM), a framework that facilitates the interaction of 'external'
software applications (device consumers) with DR operations during
Solaris[TM] dynamic reconfiguration.
The purpose of this document is to detail a procedure by which a user can
troubleshoot RCM failure events in the course of a DR operation -- i.e.,
collect the appropriate data pertaining to the RCM fault that is causing
the DR operation to fail. The RCM interface allows device consumers, such
as application vendors or site administrators, to act before and after DR
operations take place by providing RCM scripts. For example, RCM scripts
can be used to shut down applications, or to cleanly release devices from
your applications during dynamic remove operations. An RCM script is an
executable perl script, shell script, or binary.
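For illustration, a minimal RCM script skeleton in Bourne shell might look
as follows. This is a sketch only: the device path and application name
are hypothetical, and the exact commands and name=value pairs should be
verified against the rcmscript(4) man page. Site-provided scripts are
typically installed under /etc/rcm/scripts.
#!/bin/sh
# Sketch of an RCM script; resource and application names are examples only.
cmd=$1
case "$cmd" in
scriptinfo)
        # Describe this script to the rcm_daemon
        echo "rcm_script_func_info=Example application RCM script"
        exit 0 ;;
register)
        # Name the resource(s) this script wants to be consulted about
        echo "rcm_resource_name=/dev/dsk/c1t1d0s0"
        exit 0 ;;
resourceinfo)
        echo "rcm_resource_usage_info=/dev/dsk/c1t1d0s0 is used by exampleapp"
        exit 0 ;;
queryremove|preremove)
        # Stop the application cleanly here; to block the DR operation
        # instead, emit rcm_failure_reason=<reason> and exit 1.
        exit 0 ;;
postremove|undoremove)
        exit 0 ;;
*)
        # Unsupported command
        exit 2 ;;
esac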
A simple example of an RCM fault resulting in the failure of a DR
operation follows:
# cfgadm -s "select=class(sbd)"
Ap_Id                          Type         Receptacle   Occupant     Condition
N0.IB6                         PCI_I/O_Boa  connected    configured   ok
N0.SB0                         CPU_V3       connected    configured   ok
N0.SB2                         CPU_V3       connected    configured   ok
# cfgadm -v -c disconnect N0.SB2
request delete capacity (8 cpus)
notify add capacity (8 cpus)
cfgadm: Library error: RCM request delete capacity failed for N0.SB2
A useful approach to troubleshooting the above RCM faults and their
corresponding DR failures is as follows:
a. The libcfgadm plugin for system board (slot0) DR -- the cfgadm_sbd
plugin (which resides in /usr/platform/sun4u/lib/cfgadm) provides DR
functionality for connecting, configuring, unconfiguring, and
disconnecting class sbd system boards. It enables you to connect or
disconnect a system board from a running system without having to reboot.
SBD plugin debugging is enabled by the environment variable SBD_DEBUG. In
general, the value assigned to the variable is the name of the file to
which debugging information is directed; if no value is assigned, debug
data is directed to stdout.
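For example (the debug file name below is arbitrary):
# setenv SBD_DEBUG                                (csh/tcsh; debug to stdout)
# setenv SBD_DEBUG /var/tmp/sbd_debug.out         (csh/tcsh; debug to a file)
# SBD_DEBUG=/var/tmp/sbd_debug.out; export SBD_DEBUG    (sh/ksh equivalent)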
b. In concert with the debug data returned by the SBD plugin, it is also
useful to start the rcm_daemon in debug mode via the following command in
a separate window:
/usr/lib/rcm/rcm_daemon -d100
Note: any existing rcm_daemon process(es) must be killed before the above
command is run.
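For example, to verify that no rcm_daemon is running and then start one in
debug mode (pkill(1) is one way to terminate an existing daemon):
# ps -ef | grep rcm_daemon
# pkill rcm_daemon
# /usr/lib/rcm/rcm_daemon -d100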
The following session reconstructs the above troubleshooting approach for
a DR failure event originating from an RCM fault:
# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
N0.IB6                         PCI_I/O_Boa  connected    configured   ok
N0.IB6::pci0                   io           connected    configured   ok
N0.IB6::pci1                   io           connected    configured   ok
N0.IB6::pci2                   io           connected    configured   ok
N0.IB6::pci3                   io           connected    configured   ok
N0.SB0                         CPU_V3       connected    configured   ok
N0.SB0::cpu0                   cpu          connected    configured   ok
N0.SB0::cpu1                   cpu          connected    configured   ok
N0.SB0::cpu2                   cpu          connected    configured   ok
N0.SB0::cpu3                   cpu          connected    configured   ok
N0.SB0::memory                 memory       connected    configured   ok
N0.SB2                         CPU_V3       connected    configured   ok
N0.SB2::cpu0                   cpu          connected    configured   ok
N0.SB2::cpu1                   cpu          connected    configured   ok
N0.SB2::cpu2                   cpu          connected    configured   ok
N0.SB2::cpu3                   cpu          connected    configured   ok
N0.SB2::memory                 memory       connected    configured   ok
N0.SB4                         unknown      empty        unconfigured unknown
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 disk         connected    configured   unknown
c0::dsk/c0t6d0                 CD-ROM       connected    configured   unknown
c1                             scsi-bus     connected    unconfigured unknown
# cfgadm -v -c disconnect N0.SB2
request delete capacity (8 cpus)
notify add capacity (8 cpus)
cfgadm: Library error: RCM request delete capacity failed for N0.SB2
Upon the above DR failure, initiate the troubleshooting approach
documented above:
1. SBD plugin debug data --
# setenv SBD_DEBUG
# cfgadm -v -c disconnect N0.SB2
Debug started, pid=1535
path=</devices/ssm@0,0:N0.SB2> drv=<ssm> inst=0 minor=<N0.SB2> target=<N0.SB2>
cid=<> cname=<> cnum=-1
tgt=1 opts=80000000
ap_stat(/devices/ssm@0,0:N0.SB2)
open(/devices/ssm@0,0:N0.SB2)
ioctl(3 SBD_CMD_GETNCM, 0x26c38)
ncm(0)=5
ncm=5
ioctl(3 SBD_CMD_STATUS, sc=0x27080 sz=5892 flags=2)
ap_stat()=0
tgt=1
ap_rcm_init(267b0)
Looking for /usr/lib/librcm.so
/usr/lib/librcm.so found
ap_capinfo(267b0)
ap_cm_capacity(0)=(8 520 5)
ap_cm_capacity(1)=(9 521 5)
ap_cm_capacity(2)=(10 522 5)
ap_cm_capacity(3)=(11 523 5)
ap_cm_capacity(4)=(2097152 5)
cmd=disconnect(13) tmask=0x2 cmask=0x2 omask=0x80000189
ap_seq(3, 5, 13, ffbff01c, ffbff018) = (7, 15)
exec suspend check
ap_ioc(8)
ap_ioc(8)=0x0
ap_ioc(9)
ap_ioc(9)=0x0
ap_ioc(10)
ap_ioc(10)=0x0
ap_ioc(11)
ap_ioc(11)=0x445208
ap_ioc(12)
ap_ioc(12)=0x0
ap_ioc(13)
ap_ioc(13)=0x445209
ap_ioc(14)
ap_ioc(14)=0x445204
ap_ioc(15)
ap_ioc(15)=0x445202
exec request suspend
exec request delete capacity
ap_rcm_ctl(267b0)
ap_rcm_request_cap(267b0)
ap_rcm_cap_cpu(267b0)
getsyscpuids
syscpuids: 0 1 2 3 512 513 514 515 8 520 9 521 10 522 11 523
oldcpuids: 0 1 2 3 512 513 514 515 8 520 9 521 10 522 11 523
change   : 8 520 9 521 10 522 11 523
newcpuids: 0 1 2 3 512 513 514 515
ap_msg(267b0)
<0><request delete capacity><(8 cpus)>
request delete capacity (8 cpus)
ap_err(267b0)
<request delete capacity><N0.SB2>ap_rcm_info(267b0)
<Interrupted system call><><>
ap_seq_exec: rcm_cap_del failed
Sequencing recovery: first = 6, last = 6
exec notify add capacity
ap_rcm_ctl(267b0)
ap_rcm_notify_cap(267b0)
ap_capinfo(267b0)
ap_cm_capacity(0)=(8 520 5)
ap_cm_capacity(1)=(9 521 5)
ap_cm_capacity(2)=(10 522 5)
ap_cm_capacity(3)=(11 523 5)
ap_cm_capacity(4)=(2097152 5)
cm=0 valid=1 type=5, prevos=5 os=5
cm=1 valid=1 type=5, prevos=5 os=5
cm=2 valid=1 type=5, prevos=5 os=5
cm=3 valid=1 type=5, prevos=5 os=5
cm=4 valid=1 type=3, prevos=5 os=5
ap_rcm_cap_cpu(267b0)
getsyscpuids
syscpuids: 0 1 2 3 512 513 514 515 8 520 9 521 10 522 11 523
ap_rcm_cap_cpu: CPU capacity, old = 8, new = 16
oldcpuids: 0 1 2 3 512 513 514 515
change   : 8 520 9 521 10 522 11 523
newcpuids: 0 1 2 3 512 513 514 515 8 520 9 521 10 522 11 523
ap_msg(267b0)
<0><notify add capacity><(8 cpus)>
notify add capacity (8 cpus)
ap_err(267b0)
recovery complete!
ap_rcm_fini(267b0)
cfgadm: Library error: RCM request delete capacity failed for N0.SB2
From the above, we can clearly observe the fault that originated the DR
detach failure event:
ap_seq_exec: rcm_cap_del failed    <--
That is, the DR failure originated from an RCM fault: an internal library
call into the RCM framework, requesting a delete of current CPU capacity,
failed.
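Note: if SBD_DEBUG was pointed at a file rather than stdout, the failing
step can be located quickly by searching the captured output (the file
name below matches the hypothetical example given earlier):
# grep failed /var/tmp/sbd_debug.out
ap_seq_exec: rcm_cap_del failed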
2. rcm_daemon debug data --
Based on the SBD plugin debug information acquired above, we can
reasonably assume that the RCM framework itself is originating the DR
failure. The next phase of data collection typically involves the
following operations:
# ps -ef|grep -i rcm_daemon
root  1547   907  0 11:45:57 console  0:00 grep -i rcm_daemon
#
# /usr/lib/rcm/rcm_daemon -d100
enter_daemon_lock: lock file = /var/run/rcm_daemon_lock
rcm_daemon started, debug level = 100
rcmd_db_init(): initialize database
rn_alloc(/, 0)
search directory /usr/lib/rcm/modules/
cli_module_hold(SUNW_cluster_rcm.so)
module_load(name=SUNW_cluster_rcm.so)
module_attach(name=SUNW_cluster_rcm.so)
cli_module_rele(name=SUNW_cluster_rcm.so)
cli_module_hold(SUNW_dump_rcm.so)
module_load(name=SUNW_dump_rcm.so)
module_attach(name=SUNW_dump_rcm.so)
add_resource_client(SUNW_dump_rcm.so, /dev/dsk/c0t0d0s1, 0, 0x1000)
rn_node_find(/dev/dsk/c0t0d0s1, 0x1)
rn_find_child(parent=/, child=SYSTEM, 0x1, 0)
rn_alloc(SYSTEM, 0)
rn_find_child(parent=SYSTEM, child=devices, 0x1, 1)
rn_alloc(devices, 1)
rn_find_child(parent=devices, child=ssm@0,0, 0x1, 1)
rn_alloc(ssm@0,0, 1)
rn_find_child(parent=ssm@0,0, child=pci@18,700000, 0x1, 1)
rn_alloc(pci@18,700000, 1)
rn_find_child(parent=pci@18,700000, child=pci@1, 0x1, 1)
rn_alloc(pci@1, 1)
rn_find_child(parent=pci@1, child=scsi@2, 0x1, 1)
rn_alloc(scsi@2, 1)
rn_find_child(parent=scsi@2, child=sd@0,0, 0x1, 1)
rn_alloc(sd@0,0, 1)
rn_find_child(parent=sd@0,0, child=b, 0x1, 1)
rn_alloc(b, 1)
rsrc_client_find(SUNW_dump_rcm.so, 0, 30f70)
rsrc_node_add_user(b, /dev/dsk/c0t0d0s1, SUNW_dump_rcm.so, 0, 0x1000)
rsrc_client_find(SUNW_dump_rcm.so, 0, 30f70)
rsrc_client_alloc(/dev/dsk/c0t0d0s1, SUNW_dump_rcm.so, 0)
cli_module_hold(SUNW_dump_rcm.so)
rsrc_client_add: /dev/dsk/c0t0d0s1, SUNW_dump_rcm.so, 0
registered /dev/dsk/c0t0d0s1
cli_module_rele(name=SUNW_dump_rcm.so)
cli_module_hold(SUNW_filesys_rcm.so)
module_load(name=SUNW_filesys_rcm.so)
module_attach(name=SUNW_filesys_rcm.so)
FILESYS: register()
FILESYS: registering /dev/dsk/c0t0d0s0
add_resource_client(SUNW_filesys_rcm.so, /dev/dsk/c0t0d0s0, 0, 0x1000)
rn_node_find(/dev/dsk/c0t0d0s0, 0x1)
rn_find_child(parent=/, child=SYSTEM, 0x1, 0)
rn_find_child(parent=SYSTEM, child=devices, 0x1, 1)
rn_find_child(parent=devices, child=ssm@0,0, 0x1, 1)
rn_find_child(parent=ssm@0,0, child=pci@18,700000, 0x1, 1)
rn_find_child(parent=pci@18,700000, child=pci@1, 0x1, 1)
rn_find_child(parent=pci@1, child=scsi@2, 0x1, 1)
rn_find_child(parent=scsi@2, child=sd@0,0, 0x1, 1)
rn_find_child(parent=sd@0,0, child=a, 0x1, 1)
rn_alloc(a, 1)
rsrc_client_find(SUNW_filesys_rcm.so, 0, 31010)
rsrc_node_add_user(a, /dev/dsk/c0t0d0s0, SUNW_filesys_rcm.so, 0, 0x1000)
rsrc_client_find(SUNW_filesys_rcm.so, 0, 31010)
rsrc_client_alloc(/dev/dsk/c0t0d0s0, SUNW_filesys_rcm.so, 0)
cli_module_hold(SUNW_filesys_rcm.so)
rsrc_client_add: /dev/dsk/c0t0d0s0, SUNW_filesys_rcm.so, 0
cli_module_rele(name=SUNW_filesys_rcm.so)
cli_module_hold(SUNW_ip_rcm.so)
module_load(name=SUNW_ip_rcm.so)
IP: mod_init
module_attach(name=SUNW_ip_rcm.so)
IP: register
IP: update_cache
IP: scanning IPv4 interfaces
IP: update_ipifs
IP: update_pif(lo0)
IP: DLPI style2 (lo0)
IP: if ignored (lo0)
IP: update_pif(ce0)
IP: DLPI style2 (ce0)
IP: cache lookup(SUNW_network/ce0)
IP: adding lifs to ce0
IP: update_pif: (SUNW_network/ce0) success
IP: scanning IPv6 interfaces
IP: update_ipifs
IP: update_pif(lo0)
IP: DLPI style2 (lo0)
IP: if ignored (lo0)
IP: update_pif(ce0)
IP: DLPI style2 (ce0)
IP: cache lookup(SUNW_network/ce0)
IP: cache lookup success(SUNW_network/ce0)
IP: adding lifs to ce0
IP: update_pif: (SUNW_network/ce0) success
add_resource_client(SUNW_ip_rcm.so, SUNW_network/ce0, 0, 0x1000)
rn_node_find(SUNW_network/ce0, 0x1)
rn_find_child(parent=/, child=ABSTRACT, 0x1, 0)
rn_alloc(ABSTRACT, 0)
rn_find_child(parent=ABSTRACT, child=SUNW_network, 0x1, 3)
rn_alloc(SUNW_network, 3)
rn_find_child(parent=SUNW_network, child=ce0, 0x1, 3)
rn_alloc(ce0, 3)
rsrc_client_find(SUNW_ip_rcm.so, 0, 310f0)
rsrc_node_add_user(ce0, SUNW_network/ce0, SUNW_ip_rcm.so, 0, 0x1000)
rsrc_client_find(SUNW_ip_rcm.so, 0, 310f0)
rsrc_client_alloc(SUNW_network/ce0, SUNW_ip_rcm.so, 0)
cli_module_hold(SUNW_ip_rcm.so)
rsrc_client_add: SUNW_network/ce0, SUNW_ip_rcm.so, 0
IP: registered SUNW_network/ce0
add_resource_client(SUNW_ip_rcm.so, SUNW_event/resource/new/network, 0, 0x2000)
rn_node_find(SUNW_event/resource/new/network, 0x1)
rn_find_child(parent=/, child=ABSTRACT, 0x1, 0)
rn_find_child(parent=ABSTRACT, child=SUNW_event, 0x1, 3)
rn_alloc(SUNW_event, 3)
rn_find_child(parent=SUNW_event, child=resource, 0x1, 3)
rn_alloc(resource, 3)
rn_find_child(parent=resource, child=new, 0x1, 3)
rn_alloc(new, 3)
rn_find_child(parent=new, child=network, 0x1, 3)
rn_alloc(network, 3)
rsrc_client_find(SUNW_ip_rcm.so, 0, 31170)
rsrc_node_add_user(network, SUNW_event/resource/new/network,
SUNW_ip_rcm.so, 0, 0x2000)
rsrc_client_find(SUNW_ip_rcm.so, 0, 31170)
rsrc_client_alloc(SUNW_event/resource/new/network, SUNW_ip_rcm.so, 0)
cli_module_hold(SUNW_ip_rcm.so)
rsrc_client_add: SUNW_event/resource/new/network, SUNW_ip_rcm.so, 0
IP: registered SUNW_event/resource/new/network
cli_module_rele(name=SUNW_ip_rcm.so)
cli_module_hold(SUNW_mpxio_rcm.so)
module_load(name=SUNW_mpxio_rcm.so)
MPXIO: rcm_mod_init()
module_attach(name=SUNW_mpxio_rcm.so)
MPXIO: register()
MPXIO: found 0 clients.
cli_module_rele(name=SUNW_mpxio_rcm.so)
cli_module_hold(SUNW_network_rcm.so)
module_load(name=SUNW_network_rcm.so)
module_attach(name=SUNW_network_rcm.so)
add_resource_client(SUNW_network_rcm.so, SUNW_resource/new, 0, 0x2000)
rn_node_find(SUNW_resource/new, 0x1)
rn_find_child(parent=/, child=ABSTRACT, 0x1, 0)
rn_find_child(parent=ABSTRACT, child=SUNW_resource, 0x1, 3)
rn_alloc(SUNW_resource, 3)
rn_find_child(parent=SUNW_resource, child=new, 0x1, 3)
rn_alloc(new, 3)
rsrc_client_find(SUNW_network_rcm.so, 0, 31250)
rsrc_node_add_user(new, SUNW_resource/new, SUNW_network_rcm.so, 0, 0x2000)
rsrc_client_find(SUNW_network_rcm.so, 0, 31250)
rsrc_client_alloc(SUNW_resource/new, SUNW_network_rcm.so, 0)
cli_module_hold(SUNW_network_rcm.so)
rsrc_client_add: SUNW_resource/new, SUNW_network_rcm.so, 0
NET: /devices/ssm@0,0/pci@18,700000/pci@1/network@0 is new resource
NET: /devices/ssm@0,0/pci@18,700000/pci@1/network@1 is new resource
NET: ignoring pseudo device /pseudo/clone@0
NET: ignoring pseudo device /pseudo/clone@0
NET: registering /devices/ssm@0,0/pci@18,700000/pci@1/network@1
add_resource_client(SUNW_network_rcm.so,
/devices/ssm@0,0/pci@18,700000/pci@1/network@1, 0, 0x1000)
rn_node_find(/devices/ssm@0,0/pci@18,700000/pci@1/network@1, 0x1)
rn_find_child(parent=/, child=SYSTEM, 0x1, 0)
rn_find_child(parent=SYSTEM, child=devices, 0x1, 1)
rn_find_child(parent=devices, child=ssm@0,0, 0x1, 1)
rn_find_child(parent=ssm@0,0, child=pci@18,700000, 0x1, 1)
rn_find_child(parent=pci@18,700000, child=pci@1, 0x1, 1)
rn_find_child(parent=pci@1, child=network@1, 0x1, 1)
rn_alloc(network@1, 1)
rsrc_client_find(SUNW_network_rcm.so, 0, 312f0)
rsrc_node_add_user(network@1,
/devices/ssm@0,0/pci@18,700000/pci@1/network@1, SUNW_network_rcm.so, 0, 0x1000)
rsrc_client_find(SUNW_network_rcm.so, 0, 312f0)
rsrc_client_alloc(/devices/ssm@0,0/pci@18,700000/pci@1/network@1,
SUNW_network_rcm.so, 0)
cli_module_hold(SUNW_network_rcm.so)
rsrc_client_add: /devices/ssm@0,0/pci@18,700000/pci@1/network@1,
SUNW_network_rcm.so, 0
NET: registered /devices/ssm@0,0/pci@18,700000/pci@1/network@1 (as
SUNW_network/ce1)
NET: registering /devices/ssm@0,0/pci@18,700000/pci@1/network@0
add_resource_client(SUNW_network_rcm.so,
/devices/ssm@0,0/pci@18,700000/pci@1/network@0, 0, 0x1000)
rn_node_find(/devices/ssm@0,0/pci@18,700000/pci@1/network@0, 0x1)
rn_find_child(parent=/, child=SYSTEM, 0x1, 0)
rn_find_child(parent=SYSTEM, child=devices, 0x1, 1)
rn_find_child(parent=devices, child=ssm@0,0, 0x1, 1)
rn_find_child(parent=ssm@0,0, child=pci@18,700000, 0x1, 1)
rn_find_child(parent=pci@18,700000, child=pci@1, 0x1, 1)
rn_find_child(parent=pci@1, child=network@0, 0x1, 1)
rn_alloc(network@0, 1)
rsrc_client_find(SUNW_network_rcm.so, 0, 31310)
rsrc_node_add_user(network@0,
/devices/ssm@0,0/pci@18,700000/pci@1/network@0, SUNW_network_rcm.so, 0, 0x1000)
rsrc_client_find(SUNW_network_rcm.so, 0, 31310)
rsrc_client_alloc(/devices/ssm@0,0/pci@18,700000/pci@1/network@0,
SUNW_network_rcm.so, 0)
cli_module_hold(SUNW_network_rcm.so)
rsrc_client_add: /devices/ssm@0,0/pci@18,700000/pci@1/network@0,
SUNW_network_rcm.so, 0
NET: registered /devices/ssm@0,0/pci@18,700000/pci@1/network@0 (as
SUNW_network/ce0)
cli_module_rele(name=SUNW_network_rcm.so)
cli_module_hold(SUNW_swap_rcm.so)
module_load(name=SUNW_swap_rcm.so)
module_attach(name=SUNW_swap_rcm.so)
add_resource_client(SUNW_swap_rcm.so, /dev/dsk/c0t0d0s1, 0, 0x1000)
rn_node_find(/dev/dsk/c0t0d0s1, 0x1)
rn_find_child(parent=/, child=SYSTEM, 0x1, 0)
rn_find_child(parent=SYSTEM, child=devices, 0x1, 1)
rn_find_child(parent=devices, child=ssm@0,0, 0x1, 1)
rn_find_child(parent=ssm@0,0, child=pci@18,700000, 0x1, 1)
rn_find_child(parent=pci@18,700000, child=pci@1, 0x1, 1)
rn_find_child(parent=pci@1, child=scsi@2, 0x1, 1)
rn_find_child(parent=scsi@2, child=sd@0,0, 0x1, 1)
rn_find_child(parent=sd@0,0, child=b, 0x1, 1)
rsrc_client_find(SUNW_swap_rcm.so, 0, 30f70)
rsrc_node_add_user(b, /dev/dsk/c0t0d0s1, SUNW_swap_rcm.so, 0, 0x1000)
rsrc_client_find(SUNW_swap_rcm.so, 0, 30f70)
rsrc_client_alloc(/dev/dsk/c0t0d0s1, SUNW_swap_rcm.so, 0)
cli_module_hold(SUNW_swap_rcm.so)
rsrc_client_add: /dev/dsk/c0t0d0s1, SUNW_swap_rcm.so, 0
registered /dev/dsk/c0t0d0s1
cli_module_rele(name=SUNW_swap_rcm.so)
cli_module_hold(SUNW_ttymux_rcm.so)
module_load(name=SUNW_ttymux_rcm.so)
TTYMUX: mod_init:
no node for ttymux
module_attach(name=SUNW_ttymux_rcm.so)
cli_module_rele(name=SUNW_ttymux_rcm.so)
cli_module_hold(SUNW_svm_rcm.so)
module_load(name=SUNW_svm_rcm.so)
SVM: cache_all_devices,max sets = 4
SVM: cache_all_devices_in_set
SVM: cache_all_devices no set: setno 1
SVM: exit cache_all_devices
module_attach(name=SUNW_svm_rcm.so)
SVM: register
cli_module_rele(name=SUNW_svm_rcm.so)
cli_module_hold(SUNW_pool_rcm.so)
module_load(name=SUNW_pool_rcm.so)
Segmentation Fault (core dumped)
Two observations can reasonably be drawn from the above data:
a. The rcm_daemon dumping core on a segmentation fault originated the DR
detach failure; and
b. The most likely trigger of the rcm_daemon SEGV is indicated by the
final entries in the debug output:
SVM: register
cli_module_rele(name=SUNW_svm_rcm.so)
cli_module_hold(SUNW_pool_rcm.so)
module_load(name=SUNW_pool_rcm.so)
Segmentation Fault (core dumped)
i.e., although the fault fires while the next module (SUNW_pool_rcm.so)
is being loaded, the bundled Sun[TM] Volume Manager RCM module
(SUNW_svm_rcm.so), which registered immediately beforehand, is most
likely responsible for the rcm_daemon core dump (see the Bug ID in the
Internal Comments below).
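One way to test this hypothesis -- a generic isolation technique, not part
of the original procedure -- is to temporarily move the suspect module out
of the module directory shown in the debug output, restart the rcm_daemon
in debug mode, and observe whether the segmentation fault persists.
Restore the module afterwards, since RCM coordination for SVM devices
depends on it:
# pkill rcm_daemon
# mv /usr/lib/rcm/modules/SUNW_svm_rcm.so /var/tmp
# /usr/lib/rcm/rcm_daemon -d100
(verify whether all modules now load without a core dump, then restore)
# mv /var/tmp/SUNW_svm_rcm.so /usr/lib/rcm/modules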
In conclusion, the pertinent information available for furthering any
investigation includes the SBD plugin debug data, the rcm_daemon debug
data, and the core file from the rcm_daemon SEGV -- i.e.,
# pwd
/usr/lib/rcm
# file core
core:           ELF 32-bit MSB core file SPARC Version 1, from 'rcm_daemon'
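The standard Solaris proc tools can also extract a first-pass stack trace
from this core file, which helps identify the faulting module before
escalating (output omitted here, as it varies with the fault):
# cd /usr/lib/rcm
# pstack core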


Product
Sun Fire 6800 Server
Sun Fire 3800 Server
Sun Fire 4810 Server
Sun Fire 4800 Server
Sun Fire V1280 Server
Sun Fire E2900 Server
Sun Fire E4900 Server
Sun Fire E6900 Server
Sun Netra 1280 Server
Sun Netra 1290 Server


Internal Comments
For the internal use of Sun employees.

See Bug ID: 5052373
Keywords: debug, rcm_daemon, SBD_DEBUG, SUNW_svm_rcm.so, DR, dynamic, reconfiguration, rcm_cap_del, librcm, sbd, cfgadm_sbd, cfgadm
Previously Published As
76613

Change History
Date: 2009-11-30
User Name: Josh Freeman
Action: Rubber Stamp
Comment: Made no changes to the article at all - just a "rubber stamp".
Date: 2004-06-29
User Name: 25440
Action: Approved
Comment: Publishing
Version: 0
Date: 2004-06-29
User Name: 25440
Action: Accepted
Comment:
Version: 0

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.