Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-71-1008702.1
Update Date:2012-07-30
Keywords:

Solution Type  Technical Instruction Sure

Solution  1008702.1 :   Console Logging Options to capture Fatal Reset output for Sun systems  


Related Items
  • Sun Enterprise 4500 Server
  • Sun Fire 4810 Server
  • Sun Enterprise 5500 Server
  • Sun Fire 280R Server
  • Sun Fire 3800 Server
  • Sun Enterprise 3500 Server
  • Sun Netra 1290 Server
  • Sun Fire V480 Server
  • Sun Fire V490 Server
  • Sun Fire V880z Visualization Server
  • Sun Enterprise 6500 Server
  • Sun Fire V880 Server
  • Sun Fire 6800 Server
  • Sun Fire E6900 Server
  • Sun Fire 4800 Server
  • Sun Fire E2900 Server
  • Sun Fire V1280 Server
  • Sun Fire V890 Server
  • Sun Fire E4900 Server
  • Sun Netra 1280 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Exx00
  • .Old GCS Categories>Sun Microsystems>Servers>Midrange Servers

PreviouslyPublishedAs
211946


Applies to:

Sun Enterprise 3500 Server
Sun Enterprise 4500 Server
Sun Enterprise 5500 Server
Sun Enterprise 6500 Server
Sun Fire 280R Server - Version: Not Applicable and later    [Release: N/A and later]
All Platforms

Goal

Purpose of logging.
The purpose of console logging is to capture console messages, which improve the quality and timeliness of problem diagnosis. By default, Fatal Reset details and POST output after a Fatal Reset are directed to serial port A (ttya). Some families of systems incorporate a system controller (SC): an integrated system board running its own operating system that is designed to act as a console and, in some systems, to capture and store console activity. See Implementation by Platform below.

In many system failures, console data is the only output available. In some failure modes the Solaris[TM] Operating Environment has already terminated, and no software is running in the system that is capable of logging messages to traditional file system locations. For this reason, capturing diagnostic/failure data via serial console logging provides additional diagnostic information and reduces the number of "unexplained system reboots".

Fatal Resets bring a system down extremely fast. Components other than the failing item often detect the error as well, and the speed of the crash often leaves these "error artifacts" in component registers. The PROM can subsequently misinterpret these artifacts, indicating the wrong component as the cause of the reset, and may offline a good component as a result. Serial console logging allows analysis of the Fatal Reset output to help ensure that the actual defective FRU is replaced, and not a good component incorrectly reported as failed.

The last section of this document outlines other aids in console logging. Note that other software and hardware vendors may offer equivalent products, and their functionality is likely similar to what is discussed below. When selecting a console server, ensure that it supports port buffering and that the buffer size is at least 200K per port. The larger the buffer, the less likely it is that important data will be overwritten.
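
As a rough illustration of why buffer size matters, the sketch below estimates how long a port buffer lasts under continuous output. It assumes that 200K means 200 x 1024 bytes, that the line runs at 9600 baud 8n1 (10 bits on the wire per character), and that the console never pauses. The function name buffer_seconds is hypothetical, and the $(( )) arithmetic needs ksh or bash rather than the original Bourne shell.

```shell
# Estimate how many seconds of continuous console output fit in a port buffer.
# $1 = buffer size in bytes, $2 = line speed in baud.
# Assumes 8n1 framing: 1 start + 8 data + 1 stop = 10 bits per character.
buffer_seconds() {
    chars_per_second=$(( $2 / 10 ))
    echo $(( $1 / chars_per_second ))
}

buffer_seconds $(( 200 * 1024 )) 9600   # prints 213
```

Under these assumptions a 200K buffer holds roughly three and a half minutes of nonstop output, so a long, verbose POST run after a reset could overwrite the Fatal Reset messages that preceded it.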


Solution

Implementation by Platform:
Sun SPARC Enterprise Server E3000/E3500/E4000/E4500/E5000/E5500/E6000/E6500
Sun Fire[TM] E2900, 3800, 4800, 4810, E4900, 6800, E6900, V1280 and Netra[TM] 1280, 1290
Sun Fire[TM] 280R, V480/V490 and V880/V890
Generic Tools and Console Logging Information


Sun SPARC Enterprise Server E3000/E3500/E4000/E4500/E5000/E5500/E6000/E6500
This family of servers is the most basic. There is only a simple system serial port, /dev/console or ttya. There is no system controller to capture reset data or POST data, so any system with nothing attached to the system serial port will lose valuable troubleshooting data. A simple dumb terminal with a printer attached and running can be useful for collecting data, but an electronic capture is easier to send in for analysis in the event of a system fault. It is strongly advised that console monitoring be accomplished by either a tip(1) session or a logging terminal server, as explained below.


Sun Fire[TM] E2900, 3800, 4800, 4810, E4900, 6800, E6900, V1280 and Netra[TM] 1280, 1290
This family of servers incorporates a system controller that attaches to the console of the Solaris Operating System. The 3800/4800/4810/E4900/E6900 allow for several domains, so the SC serves as console for up to 4 domains. The E2900/V1280 and the Netra[TM] servers support only a single domain. Memory on this system controller is limited, and it does not automatically capture and store console data. It will, however, capture reset data automatically after an XIR or RED state reset. There are several options, depending on server type, for collecting the data.

All servers in this family allow reset data to be collected from the SC using the showresetstate command. The SC generally stores the last 2 resets, if there have been resets since the system was last powered on. See the Command Reference Guide for your specific system for details on showresetstate.

On the Sun Fire[TM] 3800, 4800, 4810, E4900, 6800 and E6900, the SC can automatically forward reset data to a loghost. This is accomplished by setting up a reset loghost with the setupdomain command. The parameters used are log-reset-data, verbose-reset-data and reset-data-ftp-url. See the Command Reference Manual for the version of System Controller software you are running.

In addition to reset data collection, a hung domain will, if properly configured, be reset by the SC, and the data from this XIR will be sent automatically to the loghost for the domain if one is configured. Only the 3800, 4800, 4810, E4900, 6800 and E6900 support a domain loghost. Again, refer to the Command Reference Manual for the version of System Controller software you are running. For additional information about collecting data from a hung system, see:

<Document 1001778.1> Instructions on How to Gather Data from a Hung Domain on a Sun Fire[TM] 3800, 4800/4810, 6800, E2900, E4900, E6900, V1280 or Netra[TM] 1280, 1290 server

For setting up Persistent Console Logging:
<Document 1011212.1> Sun Fire[TM] Midrange Servers: How to setup Persistent Console logging?

There are several options on this platform to set up Persistent console logging. Refer to the section on conserver in the Server Best Practices Guide:

<Document 1297294.1> Sun Fire Midframe & Entry-Level Servers Best Practices




Sun Fire[TM] 280R, V480/V490 and V880/V890
This family of servers incorporates a system controller that attaches to the console. Some documentation refers to the SC as the LOM or RSC. These servers also have a serial port for the Solaris instance to use. The serial port is the default console device. Using the SC/LOM/RSC is optional, but highly recommended. The SC allows an administrator to connect to it using telnet if the SC is configured on the network. These SCs provide some limited storage for console errors, reboot information and reset data. The SC also monitors various environmentals in addition to allowing remote console access. For additional information on capturing error data see:

<Document 1012454.1> How to capture errors via RSC


For information on configuring the SC on this platform:

<Document 1011888.1> How to set up and disable the RSC console on Sun Fire[TM] 280R, V480, V490, V880, V890 and V880z servers.


Generic Tools and Console Logging Information


***Console Logging Options - Data Logging Console Servers
A replacement for traditional terminal servers, which do not have console logging capability, is a console server device from Lantronix. The Lantronix console server is the equivalent to a traditional network based terminal server, except that the Lantronix device has memory added which is used as a "wrap around" message buffer.

As console messages are output from a Sun system, they are stored in this memory. As the memory fills up, the oldest messages are overwritten. One can connect to this console server via the network and display the contents of the memory buffer for a specific system in order to retrieve the stored console messages.

More information can be found at: http://www.lantronix.com/products/cs/index.html


***Console Logging Options - Centralized Console Control
A centralized console control solution is available from SIE Computing Solutions.

This solution allows a single Sun workstation to serve as a console access and logging point. Hardware that supports multiple serial ports is installed in the workstation, and system consoles are controlled and monitored via these ports. The workstation can both grant console access and log all console activity on its local disk for review at any time.

More information can be found at:

http://sie-cs.com/products/details.php?Product=114

***Console Logging Options - Tip line to ttya

This may be one of the least expensive console logging options, but can create challenges when attempting to monitor multiple systems. The system that is performing the monitoring function must be up and operational, or logging of the other system's console is lost.

To enable this console logging mode, take a null modem cable (see below) and connect one end to the monitored system's ttya port, then connect the other end of the cable to any serial port on the monitoring system.

Once the cable is connected, a user on the monitoring system can issue the tip command and be connected to the other system's console. Note that prior to issuing the tip command, the user must enable some form of logging, for instance the log-to-file option of an xterm session.

Using TIP
Have the system console of the monitored system redirected to another system.

The basic steps:
Hook a null modem cable between serial port A of the monitored machine and one of the serial ports of the healthy machine. The port (a or b) on the healthy machine depends on the hardwire entry in the /etc/remote file on the healthy system.

Here is a hardwire entry in /etc/remote that uses port b on the healthy machine.

hardwire: :dv=/dev/term/b:br#9600:el=^C^S^Q^U^D:ie=%$:oe=^D:

A null modem cable in its most basic form is an RS-232 serial cable with minimal pin connections as follows:

2 ------ 3
3 ------ 2
7 ------ 7

A standard serial cable with a null modem adapter from an electronics store will work as well.

There should be an entry for hardwire already in /etc/remote. It comes with the default OS. If one is not there, you can always copy it from another Solaris system.
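
One quick way to confirm which serial port the hardwire entry uses is to pull the dv= field out of the file. This is a minimal sketch, not a Sun-supplied tool: the function name hardwire_device is hypothetical, and it assumes the entry begins at the start of a line with hardwire: and may continue onto following lines.

```shell
# Print the serial device (dv= capability) of the hardwire entry.
# $1 = path to a remote(4) style file, normally /etc/remote.
# From the hardwire: line onward, print the first dv= value found;
# this also covers entries continued onto the next line with a backslash.
hardwire_device() {
    sed -n '/^hardwire:/,$ s/.*:dv=\([^:]*\):.*/\1/p' "$1" | sed 1q
}

# Example: hardwire_device /etc/remote
```

With the entry shown above, the function would print /dev/term/b, which tells you to cable port b of the healthy machine.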

Now open a command-tool on the healthy system. Sometimes tip behaves better with a shell-tool, but you lose scrolling (this window will be your buffer).

Type in: tip hardwire

You should see a connected message in this command-tool window.

NOTE: you will get the connected message regardless of the presence of the serial cable. Connected just means your tip session is talking to the serial port, not to another system.
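
If the window in use has no log-to-file option, the script(1) command can record the tip session instead. The following is a sketch under stated assumptions: the function names, the log directory and the host label are all hypothetical, and script is started interactively so that tip hardwire is typed inside the recorded shell.

```shell
# Build a timestamped log file name for one monitored system.
# $1 = directory for logs, $2 = label for the monitored system (both hypothetical).
console_log_path() {
    echo "$1/$2.console.$(date +%Y%m%d-%H%M%S).log"
}

# Start a recorded shell. Inside it, run "tip hardwire"; leave tip with "~."
# and then type "exit" to end script and close the log file.
log_tip_session() {
    logfile=$(console_log_path "$1" "$2")
    mkdir -p "$1" && script -a "$logfile"
}

# Example: log_tip_session /var/tmp/console-logs e4500-a
```

Because the log only grows while the recorded shell is open, the session has to be left running on the monitoring system for a later Fatal Reset to be captured.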

***Serial console logging using RMC

RMC - Remote Management Control
Sun products use several RMC cards. These include:
ALOM Advanced Lights Out Manager
RSC Remote System Control

These cards allow serial as well as network access to the console ports. They have built-in buffers, so some console output may be captured. For pinouts and additional information on how to use these cards, see Document 1005844.1.

***Serial console logging using non-Sun system
Serial console logging can also be done using a laptop or other PC type system running a terminal emulator program. The cabling requirements are identical to those for a tip session (see "Using TIP"). Serial parameters are 9600 8n1; i.e. 9600 baud, 8 data bits, no parity bit and 1 stop bit. Set the terminal program to emulate a VT100 or similar terminal. Logging to disk is configured within the emulator program, where it is usually called either session logging or session capture. For systems running Windows, a program named Tera Term is available that works with fewer problems than HyperTerminal.

Recommended NVRAM settings :

Bring the system to the OBP level from the command line using the "shutdown" or "init 0" commands (either will run all RC shutdown scripts, sync the file systems and then drop the system to the OK prompt). DO NOT use a Stop+A key press. The following commands can be executed from the OK prompt or from the command line using the "eeprom <variable=parameter>" command.



at OK prompt                       # eeprom                     Description
---------------------------------  ---------------------------  ----------------------------------------------
setenv diag-level max              diag-level=max               system will run extended POST
printenv boot-device               boot-device                  determine what your boot device is
setenv diag-device                 diag-device=                 prevent attempting net boot with diags on
setenv error-reset-recovery sync   error-reset-recovery=sync    force sync reboot if system drops to OK
setenv diag-switch? true           diag-switch?=true
reset-all                          reboot or init 6             system has to reset for changes to take effect
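
The eeprom column can be applied from a root shell in one step, as sketched below. The function name and the EEPROM override are hypothetical conveniences of this example, not part of the Sun procedure. OBP spells the diagnostic switch variable diag-switch?, so it is quoted to keep the shell from treating the question mark as a wildcard. diag-device is left out because its value depends on the boot device of the individual system.

```shell
# Apply the recommended settings from a root shell on the running system.
# EEPROM can be overridden (for example EEPROM=echo) to preview the commands
# without changing NVRAM; that override is a convenience of this sketch.
EEPROM=${EEPROM:-eeprom}

apply_console_nvram() {
    "$EEPROM" diag-level=max &&
    "$EEPROM" 'diag-switch?=true' &&
    "$EEPROM" error-reset-recovery=sync
}

# Usage (as root):  apply_console_nvram && init 6
```

Running it first with EEPROM=echo prints the three assignments, which makes it easy to review them before the NVRAM is actually changed.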




 Internal Comments

Also see web based decoder for diagnosis http://panacea.uk.oracle.com/twiki/bin/view/Tools/ToolDecodeFatalResetDecoder for a
  tip, console, logging, capture, null, modem, alom, rsc, rmc, serial port


References

<NOTE:1004222.1> - How to setup console logging and obtain diagnostic data from different types of SPARC servers

Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.