Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1453917.1
Update Date:2012-05-02
Keywords:

Solution Type  Problem Resolution Sure

Solution  1453917.1 :   On lw8 systems (V1280, E2900, Netra 1280, Netra 1290), the lom>console command may report that Solaris is not active when Solaris is actually running


Related Items
  • Sun Netra 1290 Server
  • Sun Fire E2900 Server
  • Sun Fire V1280 Server
  • Sun Netra 1280 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-x8x0/Ex900


lom>console command returns Solaris is not active when Solaris is active
Messages like these are produced:
lom: [ID 336549 local0.error] sun.serengeti.TunnelSwitchFailedException: Tunnel Switch Failed: Invalid IO Board Selected
picld[2051]: [ID 383862 daemon.error] led ioctl failed: Connection timed out
sgsbbc: [ID 394589 kern.info] NOTICE: Solaris failed to send a message (0x6/0x6005) to the System Controller. Error: 145, Message Status: 145

In this Document
Symptoms
Cause
Solution


Created from <SR 3-5597807418>

Applies to:

Sun Fire E2900 Server - Version Not Applicable and later
Sun Netra 1290 Server - Version Not Applicable and later
Sun Fire V1280 Server - Version Not Applicable and later
Sun Netra 1280 Server - Version Not Applicable and later
Information in this document applies to any platform.

Symptoms

On lw8 systems (V1280, E2900, Netra 1280, Netra 1290), the lom>console command may report that Solaris is not active when Solaris is in fact running:

lom>console
Solaris is not active

From showlogs on the lom, we see:
lom: [ID 336549 local0.error] sun.serengeti.TunnelSwitchFailedException: Tunnel Switch Failed: Invalid IO Board Selected

From showsc -v on the lom, we see:
Solaris Host Status: Powered Off

However, Solaris is known to be up and running on the server.

From Solaris /var/adm/messages, we see:
picld[2051]: [ID 383862 daemon.error] led ioctl failed: Connection timed out
sgsbbc: [ID 394589 kern.info] NOTICE: Solaris failed to send a message (0x6/0x6005) to the System Controller. Error: 145, Message Status: 145

Additionally, these messages, which normally appear on the Solaris side at boot time, are missing:
lw8: [ID 190882 kern.notice] Unretrieved lom log history follows ...
lw8: [ID 653806 kern.notice] Main, up 9 days 12:37:27, Memory 6,574,848 

Cause

Solaris and ScApp (the lom) communicate through a mailbox, which tunnels lom messages, error information, environmental information, and console traffic.  The mailbox is implemented as an SRAM on the IB portion of the lw8 (V1280, E2900, Netra 1280, Netra 1290) IB_SSC.  The SRAM hangs off the Serengeti Boot Bus Controller (SBBC), which appears to Solaris as a PCI device and to the System Controller as an internal command bus device.  The System Controller is the SSC portion of the IB_SSC; it runs ScApp, which implements the lw8 lom.

The Tunnel Switch is an operation that can be attempted on other Serengeti class servers (3800-E6900) because they have multiple I/O Boats.  If the mailbox dies, or if the I/O Boat currently used for the SRAM tunnel is Dynamically Reconfigured (DR'ed) out, an attempt is made to switch the tunnel to the same SBBC/SRAM hardware on one of the remaining I/O Boats.  On lw8 this always fails, because the only I/O board is IB6 and there is nothing to switch to.
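
The reasoning above can be illustrated with a toy model. This is only a sketch of the selection logic as described in this document, not ScApp code: the function name pick_tunnel_board is hypothetical, and IB8 is used purely as an example of a second board on a larger Serengeti class server.

```shell
# pick_tunnel_board CURRENT BOARD...
# Hypothetical model: print the first installed I/O board that is not the
# one currently carrying the tunnel. Fail if no other board exists.
pick_tunnel_board() {
    current=$1
    shift
    for ib in "$@"; do
        if [ "$ib" != "$current" ]; then
            echo "$ib"
            return 0
        fi
    done
    # No alternative board: this mirrors the error seen in showlogs.
    echo "Tunnel Switch Failed: Invalid IO Board Selected" >&2
    return 1
}

# Multi-board server (example board names): prints IB8
#   pick_tunnel_board IB6 IB6 IB8
# lw8, where IB6 is the only board: fails with the error above
#   pick_tunnel_board IB6 IB6
```

With a single board in the list, the loop never finds a candidate, which is why a Tunnel Switch cannot succeed on lw8.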
 

Solution

First, check to make sure that the Solaris instance we are communicating with is in fact running on THIS server.

From the lom, issue showsc -v to obtain the Chassis HostID
Chassis HostID: 845a3402

From Solaris, issue the hostid command; its output should match the lom Chassis HostID
$ /usr/bin/hostid
845a3402
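
The comparison can also be scripted on the Solaris side. The following is a minimal sketch, assuming the Chassis HostID has been copied by hand from the showsc -v output on the lom; same_hostid is a hypothetical helper name, not a Solaris command.

```shell
# same_hostid LOM_HOSTID SOLARIS_HOSTID
# Hypothetical helper: succeed if the two IDs are equal, ignoring the
# case of the hex digits.
same_hostid() {
    lom_id=`printf '%s' "$1" | tr '[A-F]' '[a-f]'`
    sol_id=`printf '%s' "$2" | tr '[A-F]' '[a-f]'`
    [ "$lom_id" = "$sol_id" ]
}

# Usage on the Solaris host, pasting the value read from the lom:
#   if same_hostid 845a3402 "`/usr/bin/hostid`"; then
#       echo "HostID matches"
#   else
#       echo "HostID MISMATCH: lom and Solaris are on different servers"
#   fi
```

A mismatch means the lom and the Solaris instance being examined belong to different servers, and the remaining steps should be carried out on the correct pair.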

Next, try rebooting the lom and restarting the Solaris picl daemon.

Reboot the lom with
lom>resetsc

Wait three minutes, then restart the picl daemon.

For Solaris 8 and 9
# /etc/init.d/picld stop
# /etc/init.d/picld start

For Solaris 10
# /usr/sbin/svcadm restart picl
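
Where the same procedure is used across several hosts, the choice between the two restart methods can be keyed on the output of uname -r (5.8, 5.9, or 5.10). This is a sketch only; picld_restart_cmd is a hypothetical helper name, and it prints the command rather than running it so that it can be reviewed first.

```shell
# picld_restart_cmd RELEASE
# Hypothetical helper: print the picl daemon restart command for the
# given "uname -r" value, using the commands listed in this document.
picld_restart_cmd() {
    case "$1" in
        5.8|5.9)
            echo "/etc/init.d/picld stop && /etc/init.d/picld start"
            ;;
        5.10)
            echo "/usr/sbin/svcadm restart picl"
            ;;
        *)
            # Releases not covered by this document are rejected.
            echo "unsupported release: $1" >&2
            return 1
            ;;
    esac
}

# Usage (as root on the Solaris host):
#   rel=`uname -r`
#   picld_restart_cmd "$rel"
```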

Wait one minute...

Then try the console command again
lom>console

And look at /var/adm/messages again
# egrep 'lw8|picld|sgsbbc' /var/adm/messages
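
To judge the result at a glance, the matching lines can be counted per message type. The sketch below assumes the message texts quoted in the Symptoms section; count_lines and summarize_mailbox_log are hypothetical helper names.

```shell
# count_lines PATTERN FILE
# grep -c prints 0 but exits non-zero when nothing matches, so the exit
# status is deliberately ignored.
count_lines() {
    grep -c "$1" "$2" || true
}

# summarize_mailbox_log FILE
# Hypothetical helper: count the two failure messages and the lw8 boot
# message that is missing while the mailbox is broken.
summarize_mailbox_log() {
    log=$1
    led=`count_lines 'led ioctl failed' "$log"`
    send=`count_lines 'Solaris failed to send a message' "$log"`
    hist=`count_lines 'Unretrieved lom log history' "$log"`
    echo "led_ioctl_failed=$led send_failed=$send lom_history=$hist"
}

# Usage:
#   summarize_mailbox_log /var/adm/messages
```

The counts cover the whole file, including entries logged before resetsc and the picld restart, so only the timestamps of the most recent lines show whether the errors have stopped.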

If resetting each side of the mailbox (resetsc on the lom and a picld restart on Solaris) does not resolve the problem, a Service Request (SR) needs to be opened for further investigation.

Internal Section:

If the previous action plan did not fix the issue, the IB_SSC probably needs to be replaced.

See SR 3-5597807418 for further details.

 


Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.