Asset ID: 1-75-1348904.1
Update Date: 2012-07-23
Solution Type: Troubleshooting Sure Solution
1348904.1: Useful Network Diagnostics Commands for Oracle Exalogic Machine
Related Items
- Oracle Exalogic Elastic Cloud X2-2 Half Rack
- Oracle Exalogic Elastic Cloud Software
Network Diagnostics information for Oracle Exalogic machine
Applies to:
Oracle Exalogic Elastic Cloud Software - Version 1.0.0.0.0 and later
Oracle Exalogic Elastic Cloud X2-2 Half Rack - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.
Purpose
This document describes diagnostic commands that are useful for troubleshooting network-related issues on an Exalogic machine.
Troubleshooting Steps
ping and rds-ping commands:
The ping and rds-ping commands test whether a remote node is reachable: ping over IP (IPoIB and Ethernet interfaces) and rds-ping over RDS.
The rds-ping command tests whether a remote node is reachable over RDS. Its interface is designed to operate much like the standard ping(8) utility, even though it works quite differently. rds-ping opens several RDS sockets and sends packets to port 0 on the indicated host. This is a special port number to which no socket is bound; instead, the kernel processes incoming packets and responds to them.
Note: Reliable Datagram Sockets (RDS) is a reliable-socket off-load driver and inter-processor communication (IPC) protocol with low overhead, low latency, and high bandwidth. RDS enables enhanced application performance and cluster scalability. The RDS protocol provides reliable datagram services by multiplexing UDP packets over an InfiniBand connection, which gives Oracle RAC a high-performance cluster interconnect.
Below is the syntax for the rds-ping command.
# rds-ping -c 5 <IPoIB>
The ping command is used to test the IPoIB and eth interfaces. Below is the syntax for this command.
# ping -c 5 <IPoIB>
# ping -c 5 <eth0>
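To check several nodes in one pass, the commands above can be wrapped in a small loop. The following is a minimal sketch and not part of the original note: the function name check_reachability, the PING_CMD variable, and the example addresses are hypothetical. Only the -c 5 option is taken from the syntax above.

```shell
#!/bin/bash
# Sketch: probe a list of addresses with ping (default) or rds-ping.
# PING_CMD is a hypothetical variable introduced for this example;
# set it to "rds-ping" to test reachability over RDS instead of IP.
PING_CMD="${PING_CMD:-ping}"

check_reachability() {
    local failed=0 host
    for host in "$@"; do
        # -c 5 matches the count used in the syntax shown above.
        if "$PING_CMD" -c 5 "$host" >/dev/null 2>&1; then
            echo "$host reachable"
        else
            echo "$host NOT reachable"
            failed=$((failed + 1))
        fi
    done
    # Return non-zero if any address failed, so callers can test the result.
    [ "$failed" -eq 0 ]
}

# Example calls (addresses are placeholders):
# check_reachability 192.168.10.1 192.168.10.2
# PING_CMD=rds-ping check_reachability 192.168.10.1
```

The function returns a non-zero status when at least one address does not answer, which makes it easy to use in a larger collection script.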
Tasks to perform in case of network issues on any Exalogic compute node:
Log on to the compute node where the issue occurs and run the diagnostic commands below.
Run the infinicheck command
This command verifies the InfiniBand network connectivity of the rack. Run the infinicheck command from any of the compute nodes. Below is the syntax for this command.
# /opt/exalogic.tools/tools/infinicheck -g ib_group |tee /tmp/infinicheck_`hostname`.log
where ib_group contains the hostnames or IPoIB addresses of the compute nodes that are part of the cluster. These entries must be on the IB subnet.
Note: infinicheck may impact the performance of a running system. Because network diagnostics cannot always be collected while the system is broken, running applications may be impacted.
Run the verify-topology command
This command verifies the InfiniBand topology of the rack. Below is the syntax for this command.
Usage: /opt/exalogic.tools/tools/verify-topology [-h|--help] [-v|--verbose] [-t|--topology [fullrack | halfrack | quarterrack]]
The default topology is fullrack.
# /opt/exalogic.tools/tools/verify-topology -t halfrack > /tmp/verifytop_`hostname`.log
Note: For a half rack use -t halfrack. For a quarter rack use -t quarterrack.
Gathering troubleshooting data from NM2 switches:
To gather troubleshooting information from the NM2 switches, run the commands listed below on each of the Nano Magnum 2 (NM2) switches and provide the output.
ibchecknet:
This command is a simplified version of the ibcheckerrors command. This InfiniBand command is a script that uses the topology file created by the ibnetdiscover command to scan the InfiniBand fabric, validate the connectivity, and report errors from the port counters. Below is the syntax of the ibchecknet command.
ibchecknet [-h][-N][topology|-C ca_name -P ca_port -t timeout]
where:
topology is the topology file.
ca_name is the channel adapter name.
ca_port is the channel adapter port.
timeout is the timeout in milliseconds.
nm2version:
This hardware command displays the hardware and software versions and date information for the switch and management controller.
Below is sample output of this command, which displays the version information.
# nm2version
NM2-72p version: 0.1.0-1
Build time: Aug 24 2009 16:41:03
FPGA version: 0x94
ComExpress info:
Board Name: "NOW1"
Manufacturer Name: "JUMP"
Manufacturing Date: 2009.02.19
Last Repair Date: 1980.01.01
Serial Number: "NCD2S0240"
Hardware Revision: 0x0100
Firmware Revision: 0x0102
Jida Revision: 0x0103
Feature Number: 0x0001
Note: The output of the nm2version command contains extraneous information relevant only to the manufacturer of the management controller.
sminfo:
This InfiniBand command conducts a query of the Subnet Manager and outputs the information in a human readable format. The target Subnet Manager is identified in the local port information, or it is specified by the smlid or smdr_path.
Note: Using the sminfo command for other than simple queries might fault the target Subnet Manager.
Below is the syntax for the sminfo command.
sminfo [-d][-e] -s state -p priority -a activity [-D][-G][-h][-V][-C ca_name][-P ca_port][-t timeout] smlid|smdr_path
where:
state is the state for the Subnet Manager.
priority is the priority.
activity is the activity count.
ca_name is the channel adapter name.
ca_port is the channel adapter port.
timeout is the timeout in milliseconds.
smlid is the Subnet Manager local identifier.
smdr_path is the directed path for the Subnet Manager.
Options
The following list describes the options to the sminfo command and their purposes:
-d : Sets the debug level. Can be used several times to increase the debug level.
-D : Uses the directed path address. The path is a comma-delimited sequence of out ports.
-e : Displays send and receive errors.
-s : Sets the Subnet Manager state. The state codes are:
0 – Not active.
1 – Discovering.
2 – Standby.
3 – Master.
-p : Sets the priority (0–13).
-a : Sets the activity count.
-G : Uses the port GUID address.
-h : Provides help.
-V : Displays the version information.
-C : Uses the specified channel adapter name.
-P : Uses the specified channel adapter port.
-t : Overrides the default timeout.
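When state codes are recorded in logs or scripts, translating them back to the names listed under the -s option makes the output easier to read. The following is a small illustrative sketch; the function name sm_state_name is hypothetical, and the mapping is taken only from the state codes listed above.

```shell
#!/bin/bash
# Sketch: map a Subnet Manager state code (as listed for the -s option)
# to its name. The function name is hypothetical.
sm_state_name() {
    case "$1" in
        0) echo "Not active" ;;
        1) echo "Discovering" ;;
        2) echo "Standby" ;;
        3) echo "Master" ;;
        *) echo "Unknown"; return 1 ;;   # any code not listed above
    esac
}

# Example:
# sm_state_name 3
```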
dcsport:
This hardware command displays the mapping between I4 switch chip ports, BridgeX chip ports, and QSFP connectors. You can specify either a port or a connector.
The dcsport command is available from the /SYS/Switch_Diag, /SYS/Gateway_Mgmt, and /SYS/Fabric_Mgmt Linux shell targets of the Oracle ILOM CLI interface.
Below is the syntax for this command.
dcsport [-guid guid|-type DCS-gw -ibdev ibdev] -port port|-connector connector|-printconnectors|-printinternal
where:
guid is the global unique identifier.
ibdev is the InfiniBand device name (Switch, Bridge-0-0, Bridge-0-1, Bridge-1-0, Bridge-1-1).
port is the number of the port (1–36).
connector is the name of the connector (0A – 15A, 0A-ETH, 1A-ETH, 0B – 15B).
Options
The following list describes the options to the dcsport command and their purposes:
Note: If no guid or ibdev is specified, the command defaults to the local I4 switch chip or BridgeX chips, inferred from the port number or connector name.
-guid : Identifies the GUID of the IB device for mapping.
-ibdev : Identifies the name of the IB device for mapping.
-port : Identifies the port to provide the connector mapping.
-connector : Identifies the connector to provide the port mapping.
-printconnectors : Displays mapping for all connectors.
-printinternal : Displays I4 switch chip to BridgeX chip internal mapping.
The following example shows how to display the mapping for connector 0A-ETH with the dcsport command.
# dcsport -connector 0A-ETH
Connector 0A-ETH maps to:
0A-ETH-1 Bridge-0-1 port 0A-ETH-1
0A-ETH-2 Bridge-0-1 port 0A-ETH-2
0A-ETH-3 Bridge-0-0 port 0A-ETH-3
0A-ETH-4 Bridge-0-0 port 0A-ETH-4
listlinkup:
This hardware command lists the presence of links and the up-down state of the associated ports on the switch chip. The following example shows how to display link presence and associated ports with the listlinkup command.
# listlinkup
Connector 0A Present <-> I4-A Ports 22 up 21 up 20 up
Connector 1A Not present
Connector 2A Not present
Connector 3A Not present
Connector 4A Not present
.
.
Connector 10B Not present
Connector 11B Not present
Link I4-A 01 <-> I4-E 09 up
Link I4-A 02 <-> I4-F 08 up
Link I4-A 03 <-> I4-F 07 up
.
.
.
Link I4-D 18 <-> I4-E 16 up
#
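On a switch with many connectors the listlinkup output is long, so a short summary can help when comparing switches. The following is a minimal sketch under stated assumptions: the function name summarize_links is hypothetical, and it assumes that a link that is not up is reported with a last word other than "up" on its "Link" line, which the sample output above does not show.

```shell
#!/bin/bash
# Sketch: summarize listlinkup output read from standard input.
# Assumption: the state of an internal link is the last word on its line.
summarize_links() {
    awk '
        /^Connector .* Not present/ { absent++;  next }
        /^Connector .* Present/     { present++; next }
        /^Link / {
            if ($NF == "up") up++
            else { down++; print "NOT UP: " $0 }
        }
        END {
            printf "connectors present=%d absent=%d; links up=%d not-up=%d\n",
                   present, absent, up, down
        }
    '
}

# Example, run on the switch:
# listlinkup | summarize_links
```

Any "Link" line that does not end in "up" is echoed before the summary line, so the links that need attention are visible without scrolling through the full output.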
env_test:
This hardware command performs a series of hardware and environmental tests of the switch. It combines the following commands:
checkpower
checkvoltages
showtemps
getfanspeed
connector
checkboot
The command output provides voltage and temperature values, pass-fail results, and error messages. The following example shows how to display the hardware and environmental status of the switch with the env_test command.
# env_test
NM2 Environment test started:
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.28 V
Measured 3.3V Standby = 3.40 V
Measured 12V = 12.06 V
Measured 5V = 5.10 V
Measured VBAT = 3.17 V
Measured 1.8V = 1.78 V
Measured 1.2V Standby = 1.21 V
Measured 1.8V Standby = 1.80 V
Measured 2.5VA = 2.51 V
Measured 2.5VB = 2.51 V
Measured 1.2VA = 1.22 V
Measured 1.2VB = 1.22 V
Measured 1.2VC = 1.21 V
Measured 1.2VD = 1.21 V
Measured 1.2VB = 1.21 V
Measured 1.2VE = 1.21 V
Measured 1.2VF = 1.21 V
Voltage test returned OK
Starting PSU test:
PSU 0 present
PSU 1 present
PSU test returned OK
Starting Temperature test:
Back temperature 23.00
Front temperature 32.62
ComEx temperature 26.12
I4-A temperature 55, maxtemperature 56
I4-B temperature 48, maxtemperature 49
I4-C temperature 53, maxtemperature 53
I4-D temperature 48, maxtemperature 49
I4-E temperature 53, maxtemperature 54
I4-F temperature 53, maxtemperature 54
Temperature test returned OK
Starting FAN test:
Fan 0 running at rpm 12433
Fan 1 running at rpm 12311
Fan 2 running at rpm 12311
Fan 3 running at rpm 12433
Fan 4 running at rpm 12433
FAN test returned OK
Starting Connector test:
Connector test returned OK
Starting I4 test:
I4-A OK
I4-B OK
I4-C OK
I4-D OK
I4-E OK
I4-F OK
All I4s OK
I4 test returned OK
NM2 Environment test PASSED
#
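When env_test output is collected from several switches, the per-test results and the final verdict can be extracted with a short filter. The following is a minimal sketch, not an Oracle-provided tool: the function name env_test_summary is hypothetical, and it assumes that a failing test reports a word other than "OK" at the end of its "test returned" line, a format that the passing sample above does not show.

```shell
#!/bin/bash
# Sketch: reduce env_test output (read from standard input) to the
# per-test result lines plus an overall verdict.
# Assumption: a failed test ends its "... test returned ..." line with
# something other than "OK".
env_test_summary() {
    awk '
        / test returned /              { print "  " $0; if ($NF != "OK") bad++ }
        /^NM2 Environment test PASSED/ { passed = 1 }
        END {
            if (passed && !bad) { print "env_test: PASSED"; exit 0 }
            print "env_test: check output (no PASSED line or a test not OK)"
            exit 1
        }
    '
}

# Example, run on the switch:
# env_test | env_test_summary
```

The exit status is zero only when the final "NM2 Environment test PASSED" line is present and every listed test returned OK.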
Note: All diagnostic logs are located in the /var/log directory. The scp command can be used to copy the logs from an NM2 switch to any compute node of the Exalogic machine.