Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-75-1380443.1
Update Date:2012-07-13
Keywords:

Solution Type  Troubleshooting Sure

Solution  1380443.1 :   Troubleshooting OHMP (Oracle Hardware Management Pack) Issues on Oracle X86 Systems  


Related Items
  • Sun Fire X4470 Server
  • Sun Fire X4600 M2 Server
  • Sun Fire X4640 Server
  • Sun Blade X6220 Server Module
  • Sun Blade X6440 Server Module
  • Sun Blade X6275 Server Module
  • Sun Fire X4170 Server
  • Sun Fire X4600 Server
  • Sun Server X3-2
  • Sun Netra X4200 M2 Server
  • Sun Fire X4450 Server
  • Sun Blade X6270 Server Module
  • Sun Fire X4440 Server
  • Sun Blade X6450 Server Module
  • Sun Blade X6240 Server Module
Related Categories
  • PLA-Support>Sun Systems>x64>Server>SN-x64: MISC-SERVER
  • .Old GCS Categories>Sun Microsystems>Servers>x64 Servers




Applies to:

Sun Blade X6240 Server Module - Version Not Applicable to Not Applicable [Release N/A]
Sun Blade X6270 Server Module - Version Not Applicable to Not Applicable [Release N/A]
Sun Blade X6275 Server Module - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire X4170 Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire X4440 Server - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Purpose

This document describes how to troubleshoot basic OHMP (Oracle Hardware Management Pack) issues on Oracle x64 systems.

Troubleshooting Steps

In addition to presenting an overview of OHMP, this article covers a few tips for troubleshooting basic OHMP issues relating to installation, configuration, general SNMP issues on x64 systems, the types of traps that are generated on x64 platforms, why traps might not be received, storage/RAID monitoring using OHMP, etc.  Given the scope of this topic, we have to limit ourselves to the most commonly encountered issues and questions.


OHMP Documentation:


http://www.oracle.com/technetwork/documentation/sys-mgmt-networking-190072.html

Please check the OHMP user guide included in the OHMP download.

http://www.oracle.com/technetwork/server-storage/servermgmt/tech/hardware-management-pack/support-matrix-2-1-a-423457.html


Architectural Overview:

OHMP comprises two components: an SNMP monitoring agent, and a family of cross-platform command-line (CLI) tools (ilomconfig, biosconfig, raidconfig, fwupdate, etc.) for managing and configuring your Oracle Sun Fire x86 servers.

OHMP provides in-band monitoring to complement the existing out-of-band monitoring provided by ILOM itself. This is a matter of choice rather than one approach being correct: some users prefer out-of-band monitoring over private management LANs direct to ILOM, while others prefer in-band communication with agent(s) in the host OS. We know that a significant proportion of users do not connect Service Processors to the network. In addition, users may already use other monitoring agents on the host OS, so the host may be the preferred point of monitoring.

There are two fundamental components, plus the KCS driver that provides a path to ILOM from the host OS: a daemon, hwagentd, and the SNMP agent itself.  OHMP also installs a daemon that monitors the storage hardware on the server and provides status information through several interfaces: in-band to ILOM, via the CLI tools (raidconfig, hwmgmtcli), and via SNMP plug-in modules to the host OS's native SNMP agent. This daemon also monitors ILOM (in-band) and presents the status of the hardware via hwmgmtcli and via the SNMP plug-in modules.  Details below.




First, the hwagentd daemon's role. As illustrated in the figure above, hwagentd polls ILOM via IPMI over the KCS driver for events, which it then communicates to the SNMP agent (and potentially to any other future agents, for example WS-MAN). In addition, it can write events to syslog. hwagentd polls every 30 seconds and caches the data. Because of its architecture, hwagentd could in the future also call other libraries and utilities, extending both its own functionality and that of ILOM.

The agent supports a pair of MIBs (Management Information Bases): SUN-HW-MONITORING-MIB defines the SNMP GET interface, and SUN-HW-TRAP-MIB defines the SNMP traps (events/alerts) generated by the agent for net-snmp. The agent can both receive and request events from the daemon. It does not itself communicate directly with ILOM but rather makes calls via hwagentapi and reads the hwagentd cache. It can then propagate these events to an external management platform or tool, or respond to SNMP requests.

The Oracle Hardware Management Pack (OHMP) allows ILOM events to be captured by the Host and forwarded through the Host network connection. This eliminates the need to network the Service Processor. The host must be configured and activated for ASR to properly forward ILOM telemetry. The OHMP is only available on certain systems.

Note:

The OHMP for ASR is only available for certain systems using Solaris 10. For more information about specific systems visit the Oracle ASR product page (http://www.oracle.com/asr).
The host must be activated for ASR and trap destinations configured.

Configure the host trap destination to the ASR Manager as described in "Enable FMA Telemetry".

Download and install the latest Oracle Hardware Management Pack.

Install the required components of OHMP manually rather than using the OHMP installer. The procedure for manual component installation is included in the Oracle Hardware Management Pack Installation Guide. The packages required to support OHMP telemetry are:

ORCLhmp-libs
ORCLhmp-snmp
ORCLhmp-hwmgmt

How to Manually Install Components on a Solaris Server:

http://download.oracle.com/docs/cd/E19960-01/html/821-2486/gkiae.html

Installing and Uninstalling Components Manually on a Linux Server:

http://download.oracle.com/docs/cd/E19960-01/html/821-2486/gkich.html#scrolltoc

Installing and Uninstalling Components Manually on a Windows Server:

http://download.oracle.com/docs/cd/E19960-01/html/821-2486/gkidm.html#scrolltoc

Installing Drivers Manually:

http://download.oracle.com/docs/cd/E19960-01/html/821-2486/gkibg.html#scrolltoc



Oracle Hardware Management Agents:

Oracle Hardware Management Agents provide operating-system-specific agents to enable management of your Oracle server.

The Oracle Hardware Management Agents component provides the following software:

Oracle Hardware Management Agent

Oracle Hardware SNMP Plugins

Oracle Hardware Storage Management Agent

Oracle Hardware Storage Access Libraries


Oracle Hardware Management Agent:

The Oracle Hardware Management Agent (Hardware Management Agent) and the associated Oracle Hardware SNMP Plugins and Oracle Hardware Storage SNMP Plugins (SNMP Plugins) provide a way to monitor the hardware of your servers and server modules. With the Hardware Management Agent SNMP Plugins you can use SNMP to monitor the Oracle servers and server modules in your data center without having to connect the management port of the ILOM service processor to the network. This in-band functionality enables you to use a single IP address (the host's IP) for monitoring your servers and server modules.

The Hardware Management Agent SNMP Plugins run on the host operating system of Oracle servers. The Oracle Hardware SNMP Plugin uses the keyboard controller-style (KCS) interface to communicate with the service processor, and the Oracle Hardware Storage SNMP Plugins use the Oracle Hardware Storage Access Libraries to communicate with the service processor. By regularly polling the service processor, information about the current state of the server is fetched automatically by the Hardware Management Agent. This information is then made available through SNMP, using the SNMP Plugins.

The Hardware Management Agent polls the service processor for hardware information over the KCS interface or Oracle Hardware Storage Access Libraries. The Hardware Management Agent is visible on the network through the SNMP Plugins. The SUN-HW-MONITORING-MIB Net-SNMP plugin communicates over a socket to the Hardware Management Agent daemon service, called hwmgmtd. The Hardware Management Agent also communicates over a socket to the SUN-HW-TRAP-MIB Net-SNMP plugin, sending SNMP traps via the Net-SNMP agent. In addition, the Hardware Management Agent provides sensor and indicator readings, as well as System Event Log records.

The System Event Log (SEL) is stored on the service processor and is used for recording hardware events such as temperatures crossing a threshold. The Hardware Management Agent reads the service processor's SEL records, writes this information to the host operating system's syslog, and sends the SUN-HW-TRAP-MIB traps. For storage information, the Hardware Management Agent uses the Oracle Hardware Storage Access Library. Finally, the Hardware Management Agent also maintains a separate log that contains information about the Hardware Management Agent status, which can be used for troubleshooting.

Previous versions of Hardware Management Pack have included a separate Storage Management Agent, but starting with Oracle Hardware Management Pack 2.1, the Storage Management Agent has been merged with the functionality of the Hardware Management Agent.

System storage information is now available via SNMP with the sunStorageMIB.


Oracle Hardware SNMP Plugins:

The Oracle Hardware SNMP Plugins consist of two Net-SNMP plugins. These Net-SNMP plugins are compiled versions of three Oracle-specific hardware Management Information Bases (MIBs) that have been designed to enable you to monitor your Oracle servers effectively. The Sun HW Monitoring MIB is a newly developed MIB that provides the following information:

Overall system alarm status

Aggregate alarm status by device type

FRU Alarm status

Lists of sensors, sensor types, sensor readings, and sensor thresholds

Indicator states

System locator control

Inventory including basic manufacturing information

Product and chassis inventory information (such as serial number and part numbers)

Per-sensor alarm status

The HW Trap MIB describes a set of traps for hardware events that can be generated by an Oracle server and provides the following information:

Conditions affecting the environmental state of the server (such as temperature, voltage, and current out-of-range conditions)

Error conditions affecting the hardware components in the server such as FRU insertion and removal and security intrusion notification

The Storage MIB provides the following information about system storage:

Basic manufacturing information, properties, and alarm status for controllers

Properties and alarm status for disks

Properties and alarm status for RAID volumes

Status of logical components


OHMP Troubleshooting:


Version 2.1.1a is the latest version for x64:
http://www.oracle.com/technetwork/server-storage/servermgmt/tech/hardware-management-pack/support-matrix-2-1-a-423457.html

By support, we mean that the Oracle x64 support team would check that the OHMP components are installed in a supported configuration and are installed correctly as per the procedures described in the Installation Guide.  If the issue is with SNMP traps that are not being generated, malformed OIDs, MIBs and such, assistance from the Network and/or OS support teams would become necessary.

If the issue is suspected to be a bug in the HW agent or the MIBs, or snmpwalk does not work, etc., the Oracle x64 support team would engage platform/sustaining engineering through our normal channels.

Some tips:

1. Check SNMP configuration on the ILOM and host OS (in the case of OHMP) -- trap destination address, SNMP version that is configured.

2. Check if the communities are configured correctly.  An SNMP community is the group that devices and management stations running SNMP belong to. It helps define where information is sent. The community name is used to identify the group. An SNMP device or agent may belong to more than one SNMP community. It will not respond to requests from management stations that do not belong to one of its communities. The default SNMP communities are Write (private) and Read (public).  It is easier to debug SNMP issues when the community is public.  Private communities can result in a failed SNMP GET operation.  An SNMP GET is a message that the network management system initiates when it wants to retrieve some data from a network element. For example, the network management system might query a system every 5 minutes.

The "community strings" in the first two versions of the SNMP protocol (SNMPv1 and SNMPv2c) are effectively passwords sent in clear text.  SNMPv3 supports authentication and encryption; however, most Oracle x64 platforms support only v1 or v2c.  Third-party devices such as Cisco or InfiniBand switches, etc., may support v3.

3. Check if the user has made any changes to the SNMP configuration files.  Net-SNMP applications use similar configuration file structures. Global configuration that affects every SNMP application is usually placed in an snmp.conf file, and application-specific configuration in application-specific files such as snmpd.conf and snmptrapd.conf.  Check for syntax and typographical errors in these files, as these can result in traps not being sent, malformed traps, etc.
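
As a quick way to review what an agent configuration file actually defines, the sketch below prints only the trap-destination and community directives, with their line numbers. The function name show_snmp_targets is hypothetical, and the location of snmpd.conf differs between operating systems and releases, so the path is passed as an argument. The directive names it looks for (trapsink, trap2sink, informsink, trapcommunity, rocommunity, rwcommunity) are standard Net-SNMP snmpd.conf directives; a given installation may use others.

```shell
# show_snmp_targets FILE
# Print "LINE: directive ..." for each trap destination or community
# directive in a Net-SNMP snmpd.conf-style file. Comment lines are skipped.
# (Hypothetical helper; the path to snmpd.conf depends on the OS.)
show_snmp_targets() {
    awk '
        /^[ \t]*#/ { next }    # ignore commented-out directives
        $1 == "trapsink" || $1 == "trap2sink" || $1 == "informsink" ||
        $1 == "trapcommunity" || $1 == "rocommunity" || $1 == "rwcommunity" {
            printf "%d: %s\n", FNR, $0
        }
    ' "$1"
}
```

Comparing this short listing with the trap destination and community the customer expects is often faster than reading the whole file, and a directive that is missing from the listing may point to a typo in its name.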

4. Check if the user has a custom MIB or has made modifications to Oracle-supported MIBs.  We do not support modifications to any of the MIBs we provide, nor do we support MIBs that Oracle does not provide.  Adding a MIB does not mean that our agents will automatically return values from this MIB. The agent needs to be explicitly extended to support the new MIB objects, which typically involves writing new code.  A customer therefore uses custom MIBs at their own risk.  Check if any third-party MIBs are involved.  Oracle does not support third-party MIBs unless those MIBs are part of our software distribution.

5. Check if the OHMP agents (hwmgmtd, etc.) are running on the host OS.  Check if snmptrapd is running on the trap destination.  A trap server that is running a very old version of Net-SNMP can run into issues with receiving or displaying traps.

6. If the customer's issue is that a particular trap is not being sent, find out which one and check the relevant MIB (more below) to see whether that event results in a trap.  Not all events reported in the ILOM or host OS logs will result in a trap being generated.

7. Check if basic SNMP commands like snmpwalk are working.

Here's an example of how to run snmpwalk on the host OS (Solaris in this case).

Create the following soft links.  The agent port is the one configured in the snmpd.conf file:

# ln -s /opt/sun-ssm/lib/mibs/SUN-HW-MONITORING-MIB.mib /etc/sma/snmp/mibs/SUN-HW-MONITORING-MIB.mib
# ln -s /opt/sun-ssm/lib/mibs/SUN-HW-TRAP-MIB.mib /etc/sma/snmp/mibs/SUN-HW-TRAP-MIB.mib
# ln -s /opt/sun-ssm/lib/mibs/SUN-STORAGE-MIB.mib /etc/sma/snmp/mibs/SUN-STORAGE-MIB.mib
root@nssa1bep1-03 # /usr/sfw/bin/snmpwalk -v2c -c public -mALL localhost:1161 SUN-HW-MONITORING-MIB::sunHwMonProductGroup
SUN-HW-MONITORING-MIB::sunHwMonProductName.0 = STRING: NETRA X6270 M2 SERVER MODULE
SUN-HW-MONITORING-MIB::sunHwMonProductType.0 = INTEGER: blade(4)
SUN-HW-MONITORING-MIB::sunHwMonProductPartNumber.0 = STRING: KDU1400048-1R1A
SUN-HW-MONITORING-MIB::sunHwMonProductSerialNumber.0 = STRING: 1110FMN064
SUN-HW-MONITORING-MIB::sunHwMonProductManufacturer.0 = STRING: ORACLE CORPORATION
SUN-HW-MONITORING-MIB::sunHwMonProductSlotNumber.0 = INTEGER: 1
SUN-HW-MONITORING-MIB::sunHwMonProductUUID.0 = STRING: 080020FFFFFFFFFFFFFF002128BBD200
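
When the output above needs to be compared between systems or pasted into a service request, it can be easier to read as plain object=value pairs. The following is a minimal sketch, assuming the default Net-SNMP output format shown above (MODULE::object = TYPE: value). The function name walk_to_kv is hypothetical. Lines that do not match that shape are passed through unchanged.

```shell
# walk_to_kv
# Read snmpwalk/snmpget output on stdin and print "object=value" lines,
# dropping the MIB module prefix and the type tag (STRING:, INTEGER:, ...).
# (Hypothetical helper; assumes the default Net-SNMP output format.)
walk_to_kv() {
    sed -E 's/^[^ ]*::([^ ]+) = [A-Za-z0-9-]+: ?(.*)$/\1=\2/'
}
```

For example, piping the snmpwalk command shown above through walk_to_kv would turn the sunHwMonProductType.0 line into sunHwMonProductType.0=blade(4).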


A few things to be aware of with respect to running snmpwalk against sunHwMonProductGroup.  Ensure that the customer's platform supports the HW Monitoring MIB/plugin on the host OS.  Oracle x64 platforms do not support this MIB from an ILOM perspective.  This MIB is, however, supported on the host OS for most x64 platforms shown at:

http://www.oracle.com/technetwork/server-storage/servermgmt/tech/hardware-management-pack/support-matrix-2-1-a-423457.html

Look under "Supported Servers" and verify that this plugin is supported for the customer's system if they want to do anything with this MIB from the host OS side.

8.  Check the relevant MIB for the trap that the customer is expecting to see.  If the user is trying to send test traps using snmptrap, check the command that is being used to generate the trap.  Check the snmptrapd configuration and log files.  Any change to the configuration files requires a restart of the daemon.

9. If traps are not being received, try running snmpget to check whether the information for the specified OID can be retrieved.

% snmpget -v 2c -c demopublic test.net-snmp.org SNMPv2-MIB::sysUpTime.0
 SNMPv2-MIB::sysUpTime.0 = Timeticks: (586731977) 67 days, 21:48:39.77


In this example, test.net-snmp.org is the host name of the agent to query, using version 2 of the SNMP protocol, and the community string "demopublic". The OID being requested is sysUpTime.0 from the MIB module SNMPv2-MIB.
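
A TimeTicks value such as the one returned above is a count of hundredths of a second; the number in parentheses is the raw value. As a small sketch, assuming the default Net-SNMP output format, the hypothetical helper below extracts that raw value and prints it in whole seconds, which is convenient when comparing uptimes in a script.

```shell
# ticks_to_seconds
# Read snmpget output for a TimeTicks object on stdin and print the
# value in whole seconds (TimeTicks are hundredths of a second).
# (Hypothetical helper; assumes "Timeticks: (N) ..." in the output.)
ticks_to_seconds() {
    sed -n 's/.*Timeticks: (\([0-9][0-9]*\)).*/\1/p' | while read -r ticks; do
        echo $((ticks / 100))
    done
}
```

For the sysUpTime.0 line shown above, this prints 5867319, which agrees with the "67 days, 21:48:39" rendering in the same line.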

The same basic command can also be used to retrieve a single element from within a table:
 

% snmpget -v 2c -c demopublic test.net-snmp.org SNMPv2-MIB::sysORDescr.1
 SNMPv2-MIB::sysORDescr.1 = STRING: The Mib module for SNMPv2 entities


Note that you can also refer to MIB objects by name (rather than having to use the numeric OIDs). This allows the results to be displayed in a more immediately meaningful fashion: not just giving the object names, but also showing named enumeration values and interpreting table indexes properly (particularly for string and OID index values).

The following example shows the use of snmpget to retrieve current sensor information from the SUN-PLATFORM-MIB on an Oracle Sun Fire X4150:

% snmpget -v3 -a MD5 -l authNoPriv -A [auth] -u [user] -m ALL -O qv 193.164.144.109 \
SUN-PLATFORM-MIB::sunPlatNumericSensorCurrent.113 \
SUN-PLATFORM-MIB::sunPlatNumericSensorCurrent.114 \
SUN-PLATFORM-MIB::sunPlatNumericSensorCurrent.115 \
SUN-PLATFORM-MIB::sunPlatNumericSensorCurrent.116 \
SUN-PLATFORM-MIB::sunPlatNumericSensorCurrent.140 \
SUN-PLATFORM-MIB::sunPlatNumericSensorCurrent.157 \
SUN-PLATFORM-MIB::sunPlatNumericSensorCurrent.171



This returns:

45
43
42
42
38500
39000
22

You can also do the same with the snmpwalk command below:

% snmpwalk -v3 -a MD5 -l authNoPriv -A 12345678 -u


10. If traps are not being sent, try sending a test trap with snmptrap, using the relevant object name or OID.  You will need to know the OID or object name from the relevant MIB.  Running commands like snmptrap requires privileged access on the ILOM or host OS.


Receiving Traps:

A simple procedure to receive traps on a Solaris host. This setup gives us the text translation when viewing the trap.

 On the trap destination host (Solaris), start snmptrapd.

/usr/sfw/sbin/snmptrapd -m /etc/sma/snmp/mibs/SUN-HW-TRAP-MIB.mib -o /var/tmp/test


 The command would be similar for a Linux host.

 Now, if you fail a fan or pull a power cord on the system, the trap output shows the trap with sunHwTrapChassisId.0 set to the 10-character product serial number.

For blades, sunHwTrapChassisId would be the concatenation of the chassis product serial number and the blade's product serial number.


A Brief Look at Traps:

The basic components of a trap are:

Enterprise: Identifies which object sent the trap.
Agent address: Gives the object's address.
Generic Trap Type: Provides generic types of traps.
Specific Trap Code: Provides specific codes for traps.
Time stamp: Gives the time between the last reinitialization and the trap generation.

Enterprise OID:
Everything in the trap's OID from the initial .1 up to the enterprise number, including any subtrees within the enterprise but not the specific trap number. For example, if your enterprise number is 2789, you've further subdivided your enterprise to include a group of traps numbered 5000, and you want to send specific trap 1234, the enterprise OID would be .1.3.6.1.4.1.2789.5000.

If you have some reason to send a generic trap, you can set the enterprise ID to anything you want -- but it's probably best to set the enterprise ID to your own enterprise number, if you have one.  A customer, of course, is not allowed to modify any of these in an Oracle-supported MIB.

Generic trap number
A number in the range 0-6. The true generic traps have numbers 0-5; if you're sending an enterprise-specific trap, set this number to 6.

Specific trap number
A number indicating the specific trap you want to send. If you're sending a generic trap, this parameter is ignored -- you're probably better off setting it to zero. If you're sending a specific trap, the trap number is up to you. For example, if you send a trap with the OID .1.3.6.1.4.1.2500.3003.0, 3003 is the specific trap number.
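
The enterprise OID, generic trap number and specific trap number of an SNMPv1 trap together determine the single notification OID that an SNMPv2c receiver reports in snmpTrapOID.0. The sketch below illustrates the standard coexistence mapping described in RFC 3584; the function name v1_trap_to_oid is hypothetical, and the example enterprise number is the illustrative one used above, not an Oracle value.

```shell
# v1_trap_to_oid ENTERPRISE_OID GENERIC SPECIFIC
# Print the SNMPv2 notification OID that corresponds to an SNMPv1 trap,
# following the v1-to-v2 mapping in RFC 3584. (Hypothetical helper.)
v1_trap_to_oid() {
    enterprise=$1
    generic=$2
    specific=$3
    case "$generic" in
        [0-5])
            # Generic traps map to snmpTraps.(generic + 1):
            # 0 coldStart, 1 warmStart, 2 linkDown, 3 linkUp,
            # 4 authenticationFailure, 5 egpNeighborLoss.
            echo ".1.3.6.1.6.3.1.1.5.$((generic + 1))"
            ;;
        6)
            # Enterprise-specific: enterprise OID, then 0, then the
            # specific trap number.
            echo "${enterprise}.0.${specific}"
            ;;
        *)
            echo "invalid generic trap number: $generic" >&2
            return 1
            ;;
    esac
}
```

With the example values above, v1_trap_to_oid .1.3.6.1.4.1.2789.5000 6 1234 prints .1.3.6.1.4.1.2789.5000.0.1234, while a generic warmStart (generic trap number 1) prints .1.3.6.1.6.3.1.1.5.2 regardless of the enterprise OID.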

The basic syntax for snmptrap is:

$ snmptrap -v 1 [COMMON OPTIONS] [-Ci] destination enterprise-oid agent generic-trap specific-trap uptime [OID TYPE VALUE]
$ snmptrap -v 1 -c public host UCD-TRAP-TEST-MIB::demotraps "" 6 17 "" \
       SNMPv2-MIB::sysLocation.0 s "Just here"


An SNMPv2 or SNMPv3 notification takes the OID of the trap to send:

$ snmptrap -v 2c -c public localhost "" UCD-SNMP-MIB::ucdStart
$ snmptrap -v 2c -c public localhost "" .1.3.6.1.4.1.2021.251.1



(These two are equivalent ways of specifying the same trap). The empty parameter "" will use a suitable default for the relevant value (sysUptime).

The following examples generate the generic trap 'warmStart(1)' and a (dummy) enterprise specific trap '99' respectively:

       snmptrap -v 1 -c public localhost "" "" 1 0  ""
       snmptrap -v 1 -c public localhost "" "" 6 99 ""

The empty parameters "" will use suitable defaults for the relevant values (enterprise OID, address of sender and current sysUptime).

Example 1:

Here's a real-world example on an Oracle Sun Fire X4170 M3 system where the snmptrap command was used to generate the traps shown below.  These trap IDs are generated on a hot removal/insertion of a power supply on an X4170 M3.

The sunHwTrapFruInserted and sunHwTrapFruRemoved traps are driven by hot-plug events and should be generated whenever hard drives, power supplies, etc. are removed or inserted.

Hot Removal:

SNMPv2-MIB::snmpTrapOID.0 = OID: SUN-HW-TRAP-MIB::sunHwTrapComponentError
iso.3.6.1.4.1.42.2.175.103.2.1.1.0 = ""
iso.3.6.1.4.1.42.2.175.103.2.1.14.0 = "1135FML00K".....
iso.3.6.1.4.1.42.2.175.103.2.1.15.0 = "SUN FIRE X4170 M3"......
iso.3.6.1.4.1.42.2.175.103.2.1.2.0 = "/SYS/PS1/STATE"
iso.3.6.1.4.1.42.2.175.103.2.1.9.0 = "Presence detected"..
iso.3.6.1.4.1.42.2.175.103.2.1.10.0 = iso.3.6.1.2.1.47.1.1.1.1.2.109


SNMPv2-MIB::snmpTrapOID.0 = OID: SUN-HW-TRAP-MIB::sunHwTrapFruRemoved
iso.3.6.1.4.1.42.2.175.103.2.1.1.0 = ""
iso.3.6.1.4.1.42.2.175.103.2.1.14.0 = "1135FML00K".....
iso.3.6.1.4.1.42.2.175.103.2.1.15.0 = "SUN FIRE X4170 M3"......
iso.3.6.1.4.1.42.2.175.103.2.1.2.0 = "/SYS/PS1"
iso.3.6.1.4.1.42.2.175.103.2.1.10.0 = ccitt.0


Hot Insertion:

SNMPv2-MIB::snmpTrapOID.0 = OID: SUN-HW-TRAP-MIB::sunHwTrapFruInserted
iso.3.6.1.4.1.42.2.175.103.2.1.1.0 = ""
iso.3.6.1.4.1.42.2.175.103.2.1.14.0 = "1135FML00K".....
iso.3.6.1.4.1.42.2.175.103.2.1.15.0 = "SUN FIRE X4170 M3"......
iso.3.6.1.4.1.42.2.175.103.2.1.2.0 = "/SYS/PS1"
iso.3.6.1.4.1.42.2.175.103.2.1.10.0 = iso.3.6.1.2.1.47.1.1.1.1.2.131


SNMPv2-MIB::snmpTrapOID.0 = OID: SUN-HW-TRAP-MIB::sunHwTrapComponentOk
iso.3.6.1.4.1.42.2.175.103.2.1.1.0 = ""
iso.3.6.1.4.1.42.2.175.103.2.1.14.0 = "1135FML00K".....   
iso.3.6.1.4.1.42.2.175.103.2.1.15.0 = "SUN FIRE X4170 M3"......     
iso.3.6.1.4.1.42.2.175.103.2.1.2.0 = "/SYS/PS1/STATE"
iso.3.6.1.4.1.42.2.175.103.2.1.9.0 = "Presence detected"..          
iso.3.6.1.4.1.42.2.175.103.2.1.10.0 = iso.3.6.1.2.1.47.1.1.1.1.2.138
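
When a capture contains many traps like the ones above, a quick count of trap names shows whether each removal was matched by an insertion. The following is a minimal sketch, assuming the receiver output contains snmpTrapOID.0 lines in the format shown above. The function name trap_summary is hypothetical.

```shell
# trap_summary
# Read trap receiver output on stdin and print "COUNT NAME" for each
# trap name found in snmpTrapOID.0 lines, sorted by name.
# (Hypothetical helper; assumes "snmpTrapOID.0 = OID: MODULE::name".)
trap_summary() {
    sed -n 's/^.*snmpTrapOID\.0 = OID: [^ ]*::\([A-Za-z0-9]*\).*$/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}
```

Run against the four traps in this example, it would report one each of sunHwTrapComponentError, sunHwTrapComponentOk, sunHwTrapFruInserted and sunHwTrapFruRemoved.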


You can configure ILOM (through /SP/alertmgmt/rules) to send out SNMP traps or IPMI PETs.  An IPMI PET is essentially an SNMP trap whose structure and interface formats conform to the IPMI standard.  An example of an IPMI PET logged in snmptrapd.log is:

snmptrapd.log:

|2009-11-11 08:53:43|UDP: [10.8.147.98]:1060|TRAP, SNMP v1, community public|SUN-ILOM-PET-MIB::petTrapFanLowerNonCriticalGoingHigh|1257947622|Enterprise Specific|SUN-ILOM-PET-MIB::petTrapData = "FF 20 00 08 FF FF FF FF FF FF B2 FF CA 4F 14 00
01 00 66 E5 4F 16 FF FF 20 20 00 40 51 00 00 51
FF FF 00 00 00 00 00 19 2A 00 00 00 37 01 80 00
03 80 0D 03 58 34 35 37 30 20 53 45 52 56 45 52
00 C1 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 "|
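
When reading a raw petTrapData dump like this one, it can help to pick out any printable text embedded in the bytes. The sketch below does only that: it prints the printable ASCII characters found in a run of hex bytes and ignores everything else. It does not decode the IPMI PET field layout, and hex_ascii is a hypothetical helper name.

```shell
# hex_ascii
# Read whitespace-separated two-digit hex bytes on stdin and print the
# printable ASCII characters among them on one line.
# (Hypothetical helper; it does not interpret the PET field layout.)
hex_ascii() {
    tr -s ' \t' '\n\n' | while read -r byte || [ -n "$byte" ]; do
        case "$byte" in
            [0-9A-Fa-f][0-9A-Fa-f]) ;;   # accept two hex digits only
            *) continue ;;
        esac
        code=$((0x$byte))
        # Keep printable ASCII (space through tilde) and drop the rest.
        if [ "$code" -ge 32 ] && [ "$code" -le 126 ]; then
            printf '%b' "\\0$(printf '%o' "$code")"
        fi
    done
    echo
}
```

Feeding it the fourth row of the dump above (03 80 0D 03 58 34 35 37 30 20 53 45 52 56 45 52) prints X4570 SERVER. Feeding it the whole dump also prints stray characters for any other bytes that happen to fall in the printable range, so it is best applied to a selected span.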


Example 2:

A simulated trap from OEL 5/NET-SNMP version: 5.3.2.2:

snmptrap -d -mALL -IR -v2c -cpublic 10.152.36.20 0 sunHwTrapFaultDiagnosed  \
sunHwTrapFaultDiagnosed ''                                                  \
sunHwTrapEventTime.0 s "Mon May 16 16:19:49 2011"                           \
sunHwTrapFaultMessageID.0 s "SPT-8000-6E"                                   \
sunHwTrapFaultUUID.0 s "9074ff09-7f93-688b-afe8-a21a3a629098"               \
sunHwTrapKaUrl.0 s "http://www.sun.com/msg/SPT-8000-6E"                     \
sunHwTrapFaultDescription.0 s ""                                            \
sunHwTrapSeverity.0 i 0                                                     \
sunHwTrapProductManufacturer.0 s "Oracle Corporation"                       \
sunHwTrapProductName.0 s "SPARC T3-2"                                       \
sunHwTrapProductSn.0 s "BDL103255A"                                         \
sunHwTrapDiagEntity.0 i 1                                                   \
sunHwTrapSystemIdentifier.0 s ""                                            \
sunHwTrapHostname.0 s "sca-solana-9-sp"                                     \
sunHwTrapSuspectCnt.0 i 1                                                   \
sunHwTrapSuspectFruFaultCertainty.1 i 100                                   \
sunHwTrapSuspectFruFaultClass.1 s "fault.chassis.env.temp.over-fail"        \
sunHwTrapSuspectFruName.1 s ""                                              \
sunHwTrapSuspectFruLocation.1 s /SYS                                        \
sunHwTrapSuspectFruChassisId.1 s BDL103255A                                 \
sunHwTrapSuspectFruManufacturer.1 s ""                                      \
sunHwTrapSuspectFruPn.1 s ""                                                \
sunHwTrapSuspectFruSn.1 s ""                                                \
sunHwTrapSuspectFruRevision.1 s ""                                          \
sunHwTrapSuspectFruStatus.1 i 3                                             \
sunHwTrapSuspectFruFaultCertainty.2 i 0                                     \
sunHwTrapSuspectFruFaultClass.2 s ""                                        \
sunHwTrapSuspectFruName.2 s ""                                              \
sunHwTrapSuspectFruLocation.2 s ""                                          \
sunHwTrapSuspectFruChassisId.2 s ""                                         \
sunHwTrapSuspectFruManufacturer.2 s ""                                      \
sunHwTrapSuspectFruPn.2 s ""                                                \
sunHwTrapSuspectFruSn.2 s ""                                                \
sunHwTrapSuspectFruRevision.2 s ""                                          \
sunHwTrapSuspectFruStatus.2 i 0                                             \
sunHwTrapSuspectFruFaultCertainty.3 i 0                                     \
sunHwTrapSuspectFruFaultClass.3 s ""                                        \
sunHwTrapSuspectFruName.3 s ""                                              \
sunHwTrapSuspectFruLocation.3 s ""                                          \
sunHwTrapSuspectFruChassisId.3 s ""                                         \
sunHwTrapSuspectFruManufacturer.3 s ""                                      \
sunHwTrapSuspectFruPn.3 s ""                                                \
sunHwTrapSuspectFruSn.3 s ""                                                \
sunHwTrapSuspectFruRevision.3 s ""                                          \
sunHwTrapSuspectFruStatus.3 i 0                                             \
sunHwTrapSuspectFruFaultCertainty.4 i 0                                     \
sunHwTrapSuspectFruFaultClass.4 s ""                                        \
sunHwTrapSuspectFruName.4 s ""                                              \
sunHwTrapSuspectFruLocation.4 s ""                                          \
sunHwTrapSuspectFruChassisId.4 s ""                                         \
sunHwTrapSuspectFruManufacturer.4 s ""                                      \
sunHwTrapSuspectFruPn.4 s ""                                                \
sunHwTrapSuspectFruSn.4 s ""                                                \
sunHwTrapSuspectFruRevision.4 s ""                                          \
sunHwTrapSuspectFruStatus.4 i 0                                             \
sunHwTrapSuspectFruFaultCertainty.5 i 0                                     \
sunHwTrapSuspectFruFaultClass.5 s ""                                        \
sunHwTrapSuspectFruName.5 s ""                                              \
sunHwTrapSuspectFruLocation.5 s ""                                          \
sunHwTrapSuspectFruChassisId.5 s ""                                         \
sunHwTrapSuspectFruManufacturer.5 s ""                                      \
sunHwTrapSuspectFruPn.5 s ""                                                \
sunHwTrapSuspectFruSn.5 s ""                                                \
sunHwTrapSuspectFruRevision.5 s ""                                          \
sunHwTrapSuspectFruStatus.5 i 0                                             \


(-d output attached as "recv_dump")
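
Most of the command above consists of the same ten varbinds repeated for unused suspect slots 2 through 5. As a sketch of how that repetition could be generated rather than typed by hand, the hypothetical helper below prints the "empty" varbind arguments for one table index, using the object names and the order from the command above. It only prints text for review or pasting; it does not send a trap.

```shell
# empty_suspect_varbinds INDEX
# Print the ten "empty" suspect-FRU varbind arguments for one unused
# suspect slot, one per line, in the order used in the command above.
# (Hypothetical helper; it prints text only and sends nothing.)
empty_suspect_varbinds() {
    idx=$1
    echo "sunHwTrapSuspectFruFaultCertainty.$idx i 0"
    for field in FaultClass Name Location ChassisId Manufacturer Pn Sn Revision; do
        echo "sunHwTrapSuspectFru$field.$idx s \"\""
    done
    echo "sunHwTrapSuspectFruStatus.$idx i 0"
}
```

Running it for indexes 2 through 5, for example with for i in 2 3 4 5; do empty_suspect_varbinds "$i"; done, reproduces the repeated portion of the argument list and makes a typo in one of the forty lines less likely.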

As received on Ubuntu/NET-SNMP version: 5.4.2.1, with the following command line:

  

snmptrapd -f -t -OT -OX -e -d \
       -F "sysUpTime:%H.%J.%K\ntrap descr/type:%W %w.%q\nv1 only agent-addr:%A\n %V\n\t% %v\n" \
       -Le -mALL


(-d output attached as "recv_dump")

sysUpTime:20.55.6
trap descr/type:Cold Start 0.0
v1 only agent-addr:0.0.0.0
 DISMAN-EXPRESSION-MIB::sysUpTimeInstance = Timeticks: (0) 0:00:00.00
       SNMPv2-MIB::snmpTrapOID.0 = OID: SUN-HW-TRAP-MIB::sunHwTrapFaultDiagnosed
       SUN-HW-TRAP-MIB::sunHwTrapEventTime.0 = STRING: Mon May 16 16:19:49 2011
       SUN-HW-TRAP-MIB::sunHwTrapFaultMessageID.0 = STRING: SPT-8000-6E
       SUN-HW-TRAP-MIB::sunHwTrapFaultUUID.0 = STRING: 9074ff09-7f93-688b-afe8-a21a3a629098
       SUN-HW-TRAP-MIB::sunHwTrapKaUrl.0 = STRING: http://www.sun.com/msg/SPT-8000-6E
       SUN-HW-TRAP-MIB::sunHwTrapFaultDescription.0 = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSeverity.0 = INTEGER: 0
       SUN-HW-TRAP-MIB::sunHwTrapProductManufacturer.0 = STRING: Oracle Corporation
       SUN-HW-TRAP-MIB::sunHwTrapProductName.0 = STRING: SPARC T3-2
       SUN-HW-TRAP-MIB::sunHwTrapProductSn.0 = STRING: BDL103255A
       SUN-HW-TRAP-MIB::sunHwTrapDiagEntity.0 = INTEGER: fdd(1)
       SUN-HW-TRAP-MIB::sunHwTrapSystemIdentifier.0 = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapHostname.0 = STRING: sca-solana-9-sp
       SUN-HW-TRAP-MIB::sunHwTrapSuspectCnt.0 = INTEGER: 1
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruFaultCertainty[1] = INTEGER: 100
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruFaultClass[1] = STRING: fault.chassis.env.temp.over-fail
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruName[1] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruLocation[1] = STRING: /SYS
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruChassisId[1] = STRING: BDL103255A
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruManufacturer[1] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruPn[1] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruSn[1] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruRevision[1] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruStatus[1] = INTEGER: faulted(3)
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruFaultCertainty[2] = INTEGER: 0
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruFaultClass[2] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruName[2] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruLocation[2] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruChassisId[2] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruManufacturer[2] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruPn[2] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruSn[2] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruRevision[2] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruStatus[2] = INTEGER: 0
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruFaultCertainty[3] = INTEGER: 0
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruFaultClass[3] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruName[3] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruLocation[3] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruChassisId[3] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruManufacturer[3] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruPn[3] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruSn[3] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruRevision[3] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruStatus[3] = INTEGER: 0
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruFaultCertainty[4] = INTEGER: 0
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruFaultClass[4] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruName[4] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruLocation[4] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruChassisId[4] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruManufacturer[4] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruPn[4] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruSn[4] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruRevision[4] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruStatus[4] = INTEGER: 0
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruFaultCertainty[5] = INTEGER: 0
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruFaultClass[5] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruName[5] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruLocation[5] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruChassisId[5] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruManufacturer[5] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruPn[5] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruSn[5] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruRevision[5] = STRING:
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruStatus[5] = INTEGER: 0
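
Most of the suspect-FRU rows in the listing above are empty placeholders. As a minimal sketch (not part of OHMP or NET-SNMP), the following Python groups the sunHwTrapSuspectFru* varbinds by index and keeps only rows with a non-zero fault certainty. It assumes the text layout printed by snmptrapd above; the names VARBIND, suspect_frus and SAMPLE are made up for this illustration.

```python
import re

# Matches lines such as:
#   SUN-HW-TRAP-MIB::sunHwTrapSuspectFruLocation[1] = STRING: /SYS
# The layout is assumed to be the snmptrapd text output shown above.
VARBIND = re.compile(
    r"^\s*SUN-HW-TRAP-MIB::sunHwTrapSuspectFru(\w+)\[(\d+)\]\s*=\s*\w+:\s*(.*)$"
)

def suspect_frus(trap_text):
    """Group sunHwTrapSuspectFru* varbinds by index; keep rows with certainty > 0."""
    rows = {}
    for line in trap_text.splitlines():
        m = VARBIND.match(line)
        if m:
            field, index, value = m.group(1), int(m.group(2)), m.group(3).strip()
            rows.setdefault(index, {})[field] = value
    return {i: r for i, r in rows.items()
            if int(r.get("FaultCertainty", "0")) > 0}

# A few lines copied from the trap listing above.
SAMPLE = """\
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruFaultCertainty[1] = INTEGER: 100
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruFaultClass[1] = STRING: fault.chassis.env.temp.over-fail
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruLocation[1] = STRING: /SYS
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruStatus[1] = INTEGER: faulted(3)
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruFaultCertainty[2] = INTEGER: 0
       SUN-HW-TRAP-MIB::sunHwTrapSuspectFruFaultClass[2] = STRING:
"""

for index, fru in suspect_frus(SAMPLE).items():
    print(index, fru["Location"], fru["FaultClass"], fru["Status"])
```

For the listing above, only suspect 1 (/SYS) remains after filtering.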



Decoding OIDs:

OIDs are crucial in the assembly of SNMP messages. An SNMP OID functions as an address that identifies the location of a specific element within the entire SNMP network. The translation of OIDs allows the SNMP manager to determine values for these objects. The MIB assigns readable labels to each OID, which allows the manager to interpret and assemble SNMP messages. Without the OID, the message cannot be translated into a form that is readable to humans.

When the SNMP manager requests the value of any object, it assembles a message with the OID, which is sent to the MIB for decoding. If the OID is listed within the MIB at that particular management station, a message is sent back to the manager including the value requested for that particular OID.

If an object does not have an OID within a MIB, the SNMP manager cannot interpret it. For example, if an SNMP RTU has a built-in component to monitor battery charge levels, but the battery charge sensor does not have an OID listed in the MIB file, the RTU will be unable to send and receive traps that contain battery-charge-level data.

While each SNMP OID is unique, the first several components of each OID are almost always the same. These upper levels are defined by a series of standard references within the MIB.

Here's an example of how to decode an OID. The OID 42.2.175.103.2.0.61 (relative to enterprises, 1.3.6.1.4.1) comes from the SUN-HW-TRAP-MIB.

sunHwTrapSecurityIntrusion NOTIFICATION-TYPE
    OBJECTS { sunHwTrapSystemIdentifier,
              sunHwTrapChassisId,
              sunHwTrapProductName,
              sunHwTrapAdditionalInfo
    }
    STATUS    current
    DESCRIPTION
        "An intrusion sensor has detected that someone may have physically
         tampered with the system."
--#TYPE       "An intrusion sensor has detected that someone may have physically tampered with the system."
--#SUMMARY    "An intrusion sensor has detected that someone may have physically tampered with the system."
--#ARGUMENTS  {}
--#SEVERITY   MAJOR
    ::= { sunHwTrapPrefix 61 }

The OID 1.3.6.1.4.1.42.2.175 is defined as ilom

sun                 OBJECT IDENTIFIER ::= { enterprises 42 }
products            OBJECT IDENTIFIER ::= { sun 2 }
ilom                OBJECT IDENTIFIER ::= { products 175 }

and

1.3.6.1.4.1.42.2.175.103.2.0 as

sunHwTrapMIB             MODULE-IDENTITY    ::= { ilom 103 }
sunHwTraps                 OBJECT IDENTIFIER  ::= { sunHwTrapMIB 2 }
sunHwTrapPrefix            OBJECT IDENTIFIER  ::= { sunHwTraps 0 }

So, 1.3.6.1.4.1.42.2.175.103.2.0.61 is { sunHwTrapPrefix 61 }, which is the intrusion sensor trap (sunHwTrapSecurityIntrusion) shown above.

Similarly, another OID, “1.3.6.1.4.1.42.2.175.103.2.0.44”, decodes to sunHwTrapHardDriveError, which is defined in the SUN-HW-TRAP-MIB.
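
The decoding steps above can be sketched in a few lines of Python. This is only an illustration: the arcs and the two trap numbers are copied from the MIB excerpts above, the lookup table is deliberately incomplete, and the names ARCS, TRAPS and decode_trap_oid are made up for this sketch. In practice a MIB-aware tool would do this lookup from the loaded MIB files.

```python
# Standard prefix for iso.org.dod.internet.private.enterprises
ENTERPRISES = (1, 3, 6, 1, 4, 1)

# (label, arc) pairs copied from the MIB excerpts above.
ARCS = [
    ("sun", 42),
    ("products", 2),
    ("ilom", 175),
    ("sunHwTrapMIB", 103),
    ("sunHwTraps", 2),
    ("sunHwTrapPrefix", 0),
]

# Only the two traps named in the text; the real MIB defines many more.
TRAPS = {
    61: "sunHwTrapSecurityIntrusion",
    44: "sunHwTrapHardDriveError",
}

def decode_trap_oid(oid):
    """Return the trap name for an OID under sunHwTrapPrefix, or None."""
    parts = tuple(int(p) for p in oid.strip(".").split("."))
    prefix = ENTERPRISES + tuple(arc for _, arc in ARCS)
    # The OID must be exactly the sunHwTrapPrefix OID plus one trap number.
    if len(parts) != len(prefix) + 1 or parts[:len(prefix)] != prefix:
        return None
    return TRAPS.get(parts[-1])

print(decode_trap_oid("1.3.6.1.4.1.42.2.175.103.2.0.61"))
print(decode_trap_oid("1.3.6.1.4.1.42.2.175.103.2.0.44"))
```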

What types of traps are in what MIBs:


SUN-ILOM-PET-MIB.mib contains the following trap categories.

--* CHASSIS INTRUSION TRAPS
--* CHASSIS POWER SUPPLY TRAPS
--* CHASSIS TEMPERATURE TRAPS
--* I/O SENSOR TRAPS
--* POWER SUPPLY SENSOR TRAPS
--* ENTITY PRESENCE SENSOR TRAPS
--* FRONT/BACK PANEL LED TRAPS
--* SLOT/CONNECTOR TRAPS
--* TEMPERATURE SENSOR TRAPS
--* CPU/MAINBOARD VOLTAGE SENSOR TRAPS
--* CURRENT SENSOR TRAPS
--* PROCESSOR FAULT TRAPS
--* PROCESSOR DIMM FAULT LED TRAPS
--* FAN FAILURE SENSOR TRAPS
--* FAN SPEED SENSOR TRAPS
--* OEM EVENT TRAPS
--* OTHER SENSOR-SPECIFIC IPMI PLATFORM EVENT TRAPS


The SUN-HW-MONITORING-MIB provides hardware inventory, status,  version and power consumption information related to the Oracle server or blade implementing this MIB. SNMP Traps associated with this server are defined in a separate SUN-HW-TRAP-MIB.  The SUN-HW-MONITORING-MIB can be used to monitor entities like the service LED for the platform.  In the example shown below, we look in the ENTITY MIB to find the index for the specific LED.

OHMP supports the SUN-HW-MONITORING-MIB; it does not support the SUN-PLATFORM-MIB.

The SUN-HW-TRAP-MIB includes trap definitions for various types of faults that are monitored on the system. These include traps such as sunHwTrapComponentFault, sunHwTrapPowerSupplyFault, sunHwTrapTempCritThresholdExceeded, sunHwTrapTempFatalThresholdExceeded, sunHwTrapSecurityIntrusion, sunHwTrapFanSpeedCritThresholdExceeded, etc.

Platform Event Trap (PET) events are defined in the SUN-ILOM-PET-MIB.

Please note that a trap being defined in a MIB does not mean that the trap is actually generated on a given platform.

In the case of blades, we don't look for environmental alarms from the blades themselves.  We use the telemetry from the enclosure's CMM (Chassis Monitoring Module).

SNMP set operations are not supported on the ENTITY and SUN-PLATFORM MIBs.


Supported MIBs :

The following MIBs are supported on Oracle x64 Systems.

ENTITY-MIB.txt
SUN-HW-TRAP-MIB.mib
SUN-ILOM-CONTROL-MIB.mib
SUN-ILOM-PET-MIB.mib
SUN-PLATFORM-MIB.txt

For Exadata, please refer to MOS Doc ID 1315086.1

MIBs can be downloaded from MOS (My Oracle Support). 

You can also download the ILOM MIBs directly from the system's ILOM.

Log in to the ILOM web interface, go to Configuration -> System Management Access -> SNMP, and click "Download" under the "MIBs" section. The downloaded zip file contains the following files:

  $ unzip -l ilom-mibs.zip
  Archive:  ilom-mibs.zip
    Length     Date   Time    Name
   --------    ----   ----    ----
     146859  07-30-10 01:29   SUN-ILOM-CONTROL-MIB.mib
      52607  07-30-10 01:29   ENTITY-MIB.mib
     109303  07-30-10 01:29   SUN-PLATFORM-MIB.mib
      21218  07-30-10 01:29   SUN-HW-CTRL-MIB.mib
      84314  07-30-10 01:29   SUN-HW-TRAP-MIB.mib

For Exadata/Exalogic InfiniBand switches, you can obtain the switch MIB files by logging in to the ILOM, following the path Configuration -> System Management Access -> SNMP, and then clicking Download.

OHMP Capabilities:

OHMP can monitor and send out traps for various events.  Please refer to the SUN-HW-TRAP-MIB.mib which includes trap definitions for various types of faults -- fans and power supplies etc.

Traps are sent to whichever trap destination is configured.

OHMP can get sensor and indicator readings that are available on the Service Processor as well as the SEL records from the SP.

OHMP can generate traps for FRU insertion and removal.  These traps are generated based on the differences observed in the output of 'ipmitool fru print'.  If the 'ipmitool fru print' output shows the FRU after a hot-plug and does not show it after a simulated hot-unplug, then the OS SNMP agent should be expected to generate FRU insert/removal traps.  No traps are generated for device present/absent events: a device (a DIMM, for example) being present or absent is not an error or an event.  When a server boots, traps are not sent for every component that is present or absent.
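
The comparison described above can be illustrated with a short Python sketch. This is not how OHMP is implemented; it only shows the idea of diffing two 'ipmitool fru print' snapshots. It assumes the usual "FRU Device Description : name (ID n)" header lines, and the FRU names ps0.fru and ps1.fru in the sample data are hypothetical.

```python
import re

# Header line printed by 'ipmitool fru print' for each FRU device, e.g.
#   FRU Device Description : Builtin FRU Device (ID 0)
DESC = re.compile(r"^FRU Device Description\s*:\s*(.+?)\s*\(ID\s+\d+\)\s*$")

def fru_names(fru_print_output):
    """Collect the FRU names found in one 'ipmitool fru print' snapshot."""
    names = set()
    for line in fru_print_output.splitlines():
        m = DESC.match(line.strip())
        if m:
            names.add(m.group(1))
    return names

def fru_changes(before, after):
    """Return (inserted, removed) FRU names between two snapshots."""
    old, new = fru_names(before), fru_names(after)
    return sorted(new - old), sorted(old - new)

# Hypothetical snapshots: ps1.fru disappears after a hot-unplug.
BEFORE = """\
FRU Device Description : Builtin FRU Device (ID 0)
FRU Device Description : ps0.fru (ID 20)
FRU Device Description : ps1.fru (ID 21)
"""
AFTER = """\
FRU Device Description : Builtin FRU Device (ID 0)
FRU Device Description : ps0.fru (ID 20)
"""

inserted, removed = fru_changes(BEFORE, AFTER)
print("inserted:", inserted, "removed:", removed)
```

If the two snapshots show no difference for a given hot-plug or hot-unplug, no insert/removal trap should be expected for that device.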


Example of how to monitor the service LED on an X4470 Server using SNMP:


Here's an example of how to use the ENTITY and SUN-PLATFORM MIBs to monitor the service LED on a server.

Suppose a customer wants to monitor the service LED using SNMP on an X4470 running ILOM v3.x.
Sensors and indicators can be monitored via tables defined in the SUN-PLATFORM-MIB. These tables
are indexed by entPhysicalIndex, which is defined in the ENTITY-MIB.

1) Find the index of the service LED by walking the entPhysicalTable
$ snmpwalk 10.153.55.24 entPhysicalName | grep '/SYS/SERVICE'
ENTITY-MIB::entPhysicalName.203 = STRING: /SYS/SERVICE
$
    entPhysicalIndex of service LED is 203 on 10.153.55.24

2) Monitor service LED via xxxxtable in SUN-PLATFORM-MIB using the index
$ snmpget 10.153.55.24 sunPlatAlarmState.203
SUN-PLATFORM-MIB::sunPlatAlarmState.203 = INTEGER: steady(3)

You can verify from the ILOM CLI that the Service LED is lit:
-> show /SYS/SERVICE

 /SYS/SERVICE
    Targets:

    Properties:
        type = Indicator
        ipmi_name = SERVICE
        value = On

    Commands:
        cd
        show

You can also use OHMP (which includes the SUN-HW-MONITORING-MIB) to monitor
the service LED.  This only requires OHMP to be installed on the host OS.

Note there are basically two ways to monitor x64 servers using SNMP:

  • via SP/ILOM - supports SUN-PLATFORM-MIB. Does not support SUN-HW-MONITORING-MIB.
  • via Host OS - with OHMP installed on the OS. OHMP does not support SUN-PLATFORM-MIB. It does support SUN-HW-MONITORING-MIB.


The SUN-HW-MONITORING-MIB defines the SNMP GET interface and SUN-HW-TRAP-MIB
defines the SNMP Traps (event/alert) generated by the agent.

The SUN-PLATFORM, ENTITY and SUN-HW-TRAP mibs are part of the ILOM MIBs pack.


FAQ

1. When I execute the ipmitool commands (absent/present), why do I see the events in the hwmgmtd log even though no traps are sent?


http://download.oracle.com/docs/cd/E19428-01/821-1610/p56.html#scrolltoc

The SNMP traps are categorized into three groups. Any SNMP trap name ending in Ok or Error, as well as any SNMP trap name containing Threshold, is reporting a change in a sensor value.
Any SNMP trap name ending in Fault is reporting a problem detected by the system's fault management subsystem, if such a subsystem is available on the server.

The final group is the status SNMP traps, which report the environmental state and any hardware information that is not covered by the two previous groups.

Notice that it doesn't say that the traps will be generated for absent/present messages.

Removing/reinserting fans results in an error/fault which is why a trap is generated.

The HMA logs informational messages as well as errors, warnings, alarms, etc.  See

http://download.oracle.com/docs/cd/E19428-01/821-1610/p36.html#scrolltoc

So we would expect the present/absent messages to be logged in the hwmgmtd log.
However that doesn't mean that everything that gets logged in the hwmgmtd log will result
in a trap being generated. 
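
The naming rule quoted above can be written down as a small Python function. This is only a sketch of the documented naming convention, not of how the agent decides which traps to send; the function name trap_group and the group labels are made up here.

```python
def trap_group(name):
    """Classify an SNMP trap name using the naming rule quoted above."""
    # Names ending in Ok or Error, or containing Threshold,
    # report a change in a sensor value.
    if name.endswith(("Ok", "Error")) or "Threshold" in name:
        return "sensor"
    # Names ending in Fault come from the fault management subsystem.
    if name.endswith("Fault"):
        return "fault"
    # Everything else is a status trap.
    return "status"

for trap in (
    "sunHwTrapHardDriveError",
    "sunHwTrapFanSpeedCritThresholdExceeded",
    "sunHwTrapPowerSupplyFault",
    "sunHwTrapSecurityIntrusion",
):
    print(trap, "->", trap_group(trap))
```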

2. Would it be possible to send traps for Ethernet link failures using OHMP?

OHMP provides in-band monitoring using the IPMI drivers to get to the ILOM.  The HMA uses the SUN-HW-MONITORING-MIB and the SUN-HW-TRAP-MIB.  Anything that is not defined in those MIBs is out of scope as far as OHMP is concerned.

In case of a component failure (say, a DIMM failure), there would normally be a trap associated with the failure, and the service LED would be lit as well.  All of this can be monitored over SNMP using the SUN-PLATFORM-MIB on the SP.  The SUN-HW-MONITORING-MIB cannot monitor the DIMM present/absent status either, as there are no entity presence events defined there.  Entity presence sensors would be in the ENTITY or SUN-PLATFORM MIBs.  A customer would have to use one of those two MIBs to monitor entity presence events.  Check those MIBs to see whether a specific trap is generated for the entity presence event of that device (e.g., a DIMM).  If there is none, no trap will be seen for this event.

3.  How can I monitor the status of controllers and disks (Storage/RAID monitoring using OHMP):


After you have installed OHMP, you will be able to see the controllers and disks information in the Storage tab in the ILOM web interface. Go there once you've got the hardware management pack installed and you should be able to see the controllers and disks enumerated.

On the system itself, the raidconfig command is very useful.

 
raidconfig list all

will give you a device summary, and:

 
raidconfig list disk -c c0 -v


will give a verbose listing of the disks on controller c0. (And, just to remind you, the c0 in c0t6006016021B02C00F22A3EED6CADE011d0s2 doesn't refer to physical controller 0.)

4.  Customer is seeing Unknown Object Identifier

In general, the "Unknown Object Identifier" error suggests that the MIB may not be properly loaded.  Check whether the SunHwMonMIB .so file has an entry in snmp.conf.

5. How do you upgrade the SAS expander firmware using OHMP with ESX running on x6240s
with Hydra (VMF NEM) NEMs in the chassis? 


You can use the normal methods to upgrade the NEM/expander firmware through the NEM ILOM, the CMM, or CAM.

To upgrade using OHMP, fwupdate is required.  With HMP 2.1, fwupdate is supported on VMware ESX 3.5 (for older platforms) and 4.0 (for new platforms: X4800, X4470, X4170 M2, X6270 M2, X4270 M2).  You would install HMP on the ESX console just as on any other host OS.
Refer to http://www.sun.com/systemmanagement/managementpack_supportmatrix.jsp

6. Can I monitor chassis alarms (e.g. fan faulted/power supply fault) from the blade OS's SNMP agent after having installed OHMP on the blades?

Yes, OHMP can monitor and send out traps for these events.  Have a look at the SUN-HW-TRAP-MIB.mib, which includes trap definitions for fan and power supply faults.  Traps will be sent to whichever trap destination is configured.

7. Can OHMP be used for memory/cpu usage monitoring/threshold alarms as well?

CPU/Memory usage monitoring is best done with the tools that the host OS offers (such as sar, cpustat, mpstat, vmstat etc).
OHMP doesn't help with end-user level monitoring of CPU/Memory usage.  CPU/memory faults and temperature, current and voltage thresholds are monitored by the SP, and in some cases by the host, and are available to the host through OHMP.  OHMP can get sensor and indicator readings that are available on the SP as well as the SEL records from the SP.

8. If a disk that was part of an LSI RAID setup is hot removed, would it result in a trap being sent out?

If a disk drive is not part of a RAID setup, the LSI controller acts as a passive pass-through. In that case, any non-RAID disks should respond to the HDD traps defined in the "--* I/O SENSOR TRAPS" section of the SUN-ILOM-PET-MIB.mib file.

Disk drives that are part of a RAID configuration are managed by the LSI RAID controller.  You will have to look for the specific trap types, if any, that are defined in the MIBs for the specific controller for the event that you are interested in.

9. Why doesn't the trap clearly identify the FRU part number or serial number?


It is likely that the system is running an older version of the ILOM.

The sunHwTrapFaultDiagnosed notification is a new trap type implemented in recent versions of the ILOM that supports platform and FRU identification from ILOM fdd telemetry.  This new notification also has support for multi-suspect lists, i.e. the ability to identify multiple suspect FRUs in a fault that could not be diagnosed to a single component.  This trap was backported to Xxx7x platforms in later firmware versions.  Newer Xxx7x platforms were released with support for the new trap at product release; these include the X4370 M2 and X4800 M2.

SUN-HW-TRAP-MIB contains Product objects such as sunHwTrapProductManufacturer, sunHwTrapProductName, sunHwTrapProductSn, sunHwTrapProductPn that support the sunHwTrapFaultDiagnosed Notification.

SUN-HW-TRAP-MIB also contains the sunHwTrapSuspectFruChassisId (chassis serial number), in addition to the other properties listed below, that are part of the sunHwTrapFaultDiagnosed Notification:
 - Problem status: sunHwTrapStatus
 - Top level product manufacturer: sunHwTrapTopProdMfg
 - Top level product model name: sunHwTrapTopProdName
 - Top level product part number: sunHwTrapTopProdPartNumber
 - Top level product serial number: sunHwTrapTopProdSerialNumber
 - Product Part number: sunHwTrapProductPn
 - Chassis manufacturer: sunHwTrapSuspectFruChassisMfg
 - Chassis part number: sunHwTrapSuspectFruChassisPn
 - Chassis model name: sunHwTrapSuspectFruChassisName


To have the trap clearly identify the chassis, system, and FRU, it is highly recommended that the platform run the most recent version of the ILOM.

10. How does a customer monitor disks on an X6270 which has an LSI controller through SNMP from a Windows system?


Refer to:

Sun LSI 106x RAID Users Guide
http://download.oracle.com/docs/cd/E19658-01/820-4933-15/index.html

Installing LSI SNMP on a Remote Station
http://download.oracle.com/docs/cd/E19658-01/820-4933-15/lsi_snmp.html#50487332_pgfId-1035648

The customer will have to enable SNMP services on Windows.  Note that getting basic SNMP services
configured and enabled has nothing to do with LSI MegaRAID.  The README in the LSI SW bits and the MegaRAID user guide list the steps necessary to accomplish this.  Refer to Microsoft documentation on how to enable and configure SNMP for the specific flavor of Windows.

Once SNMP services have been configured, you have to install an LSI SNMP agent (SAS-IR_SNMP_Win_Installer-3.xx-xxxx.zip)  on Windows.  This agent will report information about the RAID controller, virtual drives, physical devices, enclosures, and  other items
using SNMP.   Refer to the MegaRAID SAS Software User's Guide which has all the instructions necessary for you to get the SNMP setup going.  Any issues with the LSI provided agent or the LSI provided MIBs will have to be addressed with LSI.


Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.