Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1384448.1
Update Date:2012-04-24
Keywords:

Solution Type  Problem Resolution Sure

Solution  1384448.1 :   Cellsrv fails to start ORA-07445: exception encountered: core dump _Z17raiseStartupErrorP5kgsmpPKcj10assertType11restartType()  


Related Items
  • Exadata Database Machine V2
Related Categories
  • PLA-Support>Database Technology>Engineered Systems>Oracle Exadata>DB: Exadata_EST
  • .Old GCS Categories>ST>Server>Engineered Systems>Exadata>Hardware




In this Document
Symptoms
Changes
Cause
Solution


Created from <SR 3-5007576011>

Applies to:

Exadata Database Machine V2 - Version Not Applicable and later
Information in this document applies to any platform.

Symptoms

One of the Exadata storage cells is unable to start the cellsrv service.


/etc/init.d/celld status
rsStatus: running
msStatus: running
cellsrvStatus: stopped
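
The status report above can also be checked mechanically. The following is only a minimal sketch: the function name stopped_services is an illustrative assumption, not part of the cell software, and it assumes the "name: state" layout shown in the output above.

```shell
#!/bin/sh
# Print the name of every service that a "celld status" style report
# does not show as "running". Input is read from stdin, one
# "<name>Status: <state>" pair per line, as in the output above.
# (stopped_services is a hypothetical helper name.)
stopped_services() {
    awk -F': *' '/Status:/ && $2 != "running" { print $1 }'
}

# Possible use on a cell, assuming the status command shown above:
#   /etc/init.d/celld status | stopped_services
```

An empty result would mean that all reported services are running.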


Cell alert history shows:

2011-12-02T19:45:28+01:00 critical "ORA-07445: exception encountered: core dump [_Z17raiseStartupErrorP5kgsmpPKcj10assertType11restartType()+317] [11] [0x000000000] [] [] []"


Cell alert log shows:

[RS] Started monitoring process /opt/oracle/cell11.2.2.3.2_LINUX.X64_110520/cellsrv/bin/cellrssmt with pid 9922
Fri Dec 02 19:45:28 2011
Successfully setting event parameter -
CELLSRV process id=9919
CELLSRV cell host name=fraartou01cel05.de.db.com
CELLSRV version=11.2.2.3.2,label=OSS_11.2.0.3.0_LINUX.X64_110520,Fri_May_20_05:23:52_PDT_2011
No Infiniband device found.
CELLSRV could not initialize infiniband context
CELLSRV failed to start due to the error 15 ((null))
Errors in file /opt/oracle/cell11.2.2.3.2_LINUX.X64_110520/log/diag/asm/cell/fraartou01cel05/trace/svtrc_9919_0.trc (incident=17):
ORA-07445: exception encountered: core dump [_Z17raiseStartupErrorP5kgsmpPKcj10assertType11restartType()+317] [11] [0x000000000] [] [] []
Incident details in: /opt/oracle/cell11.2.2.3.2_LINUX.X64_110520/log/diag/asm/cell/fraartou01cel05/incident/incdir_17/svtrc_9919_0_i17.trc


The diag trace shows:


Trace file /opt/oracle/cell11.2.2.3.2_LINUX.X64_110520/log/diag/asm/cell/fraartou01cel05/trace/svtrc_9919_0.trc
ORACLE_HOME = /opt/oracle/cell11.2.2.3.2_LINUX.X64_110520
System name: Linux
Node name: fraartou01cel05.de.db.com
Release: 2.6.18-194.3.1.0.4.el5
Version: #1 SMP Sat Feb 19 03:38:37 EST 2011
Machine: x86_64
CELL SW Version: OSS_11.2.0.3.0_LINUX.X64_110520

*** 2011-12-02 19:45:28.441
skgzib_load_ib_symbols: loaded infiniband libraries from libibumad.so.2.0.0 and libibmad.so.2.2.0
skgzib_ini: SKGZIB is running in debug mode.
skgzib_query_node_guid: No Infiniband device found.
skgzib_ini: Fail to query GUIDs on the current system.
ossp_initskgzib: Fail to init skgzib context, retcode = 56881
Writing message type OSS_PIPE_ERR_DNR to OSS->RS pipe
Writing message type OSS_PIPE_ERR_FAILED_STARTUP_RESTART to OSS->RS pipe
DDE: Flood control is not active
Incident 17 created, dump file: /opt/oracle/cell11.2.2.3.2_LINUX.X64_110520/log/diag/asm/cell/fraartou01cel05/incident/incdir_17/svtrc_9919_0_i17.trc
ORA-07445: exception encountered: core dump [_Z17raiseStartupErrorP5kgsmpPKcj10assertType11restartType()+317] [11] [0x000000000] [] [] []
Writing message type OSS_PIPE_ERR_FAILED_STARTUP_RESTART to OSS->RS pipe



/var/log/messages :


Dec 2 21:50:41 fraartou01cel05 kernel: Registered RDS/iwarp transport
Dec 2 21:50:41 fraartou01cel05 kernel: Registered RDS/infiniband transport
Dec 2 21:50:41 fraartou01cel05 kernel: Ethernet Channel Bonding Driver: v3.2.3 (December 6, 2007)
Dec 2 21:50:41 fraartou01cel05 kernel: bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details.
Dec 2 21:50:41 fraartou01cel05 kernel: bonding: bondib0 is being created...
Dec 2 21:50:41 fraartou01cel05 kernel: bonding: bondib0: setting mode to active-backup (1).
Dec 2 21:50:41 fraartou01cel05 kernel: bonding: bondib0: Setting MII monitoring interval to 100.
Dec 2 21:50:41 fraartou01cel05 kernel: bonding: bondib0: Setting down delay to 5000.
Dec 2 21:50:41 fraartou01cel05 kernel: bonding: bondib0: Setting up delay to 5000.
Dec 2 21:50:41 fraartou01cel05 kernel: ADDRCONF(NETDEV_UP): bondib0: link is not ready
Dec 2 21:50:41 fraartou01cel05 kernel: bonding: bondib0: Adding slave ib0.
Dec 2 21:50:41 fraartou01cel05 kernel: bonding: bondib0: Interface ib0 does not exist!
Dec 2 21:50:41 fraartou01cel05 kernel: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
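
The key line in the excerpt above is the bonding driver reporting that a slave interface does not exist. As a rough sketch, such lines could be extracted from a log as follows. The function name missing_bond_slaves is an assumption made for illustration, and the pattern assumes the exact message wording shown above.

```shell
#!/bin/sh
# Print the name of each bonding slave interface that the kernel
# reported as missing, based on messages of the form
#   "bonding: <bond>: Interface <slave> does not exist!"
# Log text is read from stdin. (missing_bond_slaves is a hypothetical name.)
missing_bond_slaves() {
    sed -n 's/.*bonding: [^:]*: Interface \([^ ]*\) does not exist!.*/\1/p'
}

# Possible use, assuming the log location shown above:
#   missing_bond_slaves < /var/log/messages
```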


InfiniBand diagnostic commands fail:



# ibstatus
Fatal error: device '*': sys files not found (/sys/class/infiniband/*/ports)
# ibv_devinfo
No IB devices found
# ibdiagnet
Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
-W- Topology file is not specified.
Reports regarding cluster links will use direct routes.
Loading IBDM from: /usr/lib64/ibdm1.2

-E- IBIS: No HCA was found on local machine.
Exiting.
# ibhosts
ibwarn: [10306] mad_rpc_open_port: can't open UMAD port ((null):0)
/usr/sbin/ibnetdiscover: iberror: failed: Failed to open (null) port 0


No InfiniBand device is found on the system:

# lspci | grep InfiniBand
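
The ibstatus error earlier refers to /sys/class/infiniband, so the same condition can be checked directly in sysfs. This is a minimal sketch under that assumption; count_ib_devices is a hypothetical helper name, and the directory argument exists only so the function can be exercised against a test directory.

```shell
#!/bin/sh
# Count the InfiniBand devices registered with the kernel.
# $1 is the sysfs class directory to inspect; it defaults to
# /sys/class/infiniband, the path named in the ibstatus error.
# (count_ib_devices is a hypothetical helper name.)
count_ib_devices() {
    dir=${1:-/sys/class/infiniband}
    n=0
    for d in "$dir"/*; do
        # An unmatched glob stays literal, so confirm the entry exists.
        if [ -e "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Possible use on the problem node:
#   count_ib_devices
```

A count of 0 is consistent with the "No IB devices found" output above. It does not by itself distinguish an unseated or failed adapter from a driver that was not loaded.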



Changes

The HBA battery was recently replaced.

Cause

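The OS InfiniBand configuration files are correct, but the OS startup messages show:
kernel: ADDRCONF(NETDEV_UP): bondib0: link is not ready
kernel: bonding: bondib0: Interface ib0 does not exist!
Together with the failing IB commands, this indicates a possible cabling issue or Host Channel Adapter (HCA) issue.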

Solution

There is likely a cabling or hardware fault, which will most likely require a field visit.
Please raise a Service Request with Support and include the following command output and files from the problem node:
# /etc/init.d/celld status
# cd $ADR_BASE
# tar -cvf `hostname`cell.tar .
# cellcli -e list alerthistory
# /var/log/messages*
# ifconfig -a
# /etc/modprobe.conf
# /etc/sysconfig/network-scripts/ifcfg-ib1
# /etc/sysconfig/network-scripts/ifcfg-ib0
# /etc/sysconfig/network-scripts/ifcfg-bondib0
# ibstatus
# ibv_devinfo
# lspci -tv
# lspci -xxvvv
# dmesg
# ls /etc/rc3.d/
# cat /etc/init.d/openibd
# lspci | grep -i infi
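
Several of the commands above are expected to fail on the problem node (ibstatus, for example), so a collection script should record failures rather than stop. The following is a sketch only: the helper name collect, the output directory, and the label names are assumptions for illustration, not an Oracle-supplied tool.

```shell
#!/bin/sh
# Run one command and save its stdout and stderr to <outdir>/<label>.txt.
# A non-zero exit status is appended to the file instead of aborting,
# so the remaining items are still collected.
# (collect is a hypothetical helper; arguments: outdir, label, command...)
collect() {
    outdir=$1
    label=$2
    shift 2
    "$@" > "$outdir/$label.txt" 2>&1 || echo "exit status: $?" >> "$outdir/$label.txt"
}

# Example driver, left commented out so that sourcing this file has no
# side effects. The directory name is an arbitrary choice.
#   out=/tmp/$(hostname)_ibdiag
#   mkdir -p "$out"
#   collect "$out" celld_status /etc/init.d/celld status
#   collect "$out" alerthistory cellcli -e list alerthistory
#   collect "$out" ifconfig     ifconfig -a
#   collect "$out" ibstatus     ibstatus
#   collect "$out" ibv_devinfo  ibv_devinfo
#   collect "$out" lspci_tv     lspci -tv
#   collect "$out" dmesg        dmesg
```

The configuration files and /var/log/messages* listed above would still need to be copied separately.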


Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.