Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-71-1007568.1
Update Date: 2012-07-31
Keywords:

Solution Type  Technical Instruction Sure

Solution  1007568.1 :   Sun Fire[TM] 12K/15K/E20K/E25K: Testing a single slot 0 board when no slot 1 board is present in domain and/or when slot 0 board is COD  


Related Items
  • Sun Fire E25K Server
  • Sun Fire 12K Server
  • Sun Fire 15K Server
  • Sun Fire E20K Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-Exxk
  • .Old GCS Categories>Sun Microsystems>Servers>High-End Servers

PreviouslyPublishedAs
210473


Applies to:

Sun Fire 12K Server - Version Not Applicable and later
Sun Fire E20K Server - Version Not Applicable and later
Sun Fire E25K Server - Version Not Applicable and later
Sun Fire 15K Server - Version Not Applicable and later
All Platforms

Goal

Before putting a replacement or new System Board into a production environment, run high-level HPOST hardware tests against the board and its components to confirm that they are sound.

To reduce production domain downtime during these tests, configure the new or replacement board in its own "test" domain for extended HPOST testing.

The method below tests a single Slot0 board (System Board) in a "test" domain without a Slot1 (I/O or MaxCPU) board in the configuration. After testing has completed successfully, the new or replacement board can be added to the production domain by dynamic reconfiguration (DR) with confidence that it does not contain bad hardware.

Fix

A slot 0 board can be tested in a domain that has no slot 1 board. The following steps show how. The example uses the unused domain 'R' and tests system board SB15.

Edit the postrc file for domain R ($SMSETC/config/R/.postrc) so that it contains the three directives shown below. Make sure the file is world-readable (chmod 644), and remove this temporary postrc file when your work is completed.

level 64
no_ioadapt_ok
no_obp_handoff
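
The step above can be sketched as a short shell script. This is only an illustration, not a Sun-supplied procedure: the POSTRC_DIR variable is an assumption introduced here, and it defaults to a scratch directory so the sketch can be tried away from the SC. On the SC it would be set to $SMSETC/config/R.

```shell
#!/bin/sh
# Sketch: write the temporary .postrc for the test domain.
# POSTRC_DIR is a hypothetical variable for illustration; on the SC it
# would point at $SMSETC/config/R. It defaults to a scratch directory.
POSTRC_DIR="${POSTRC_DIR:-${TMPDIR:-/tmp}/postrc-sketch.$$}"
POSTRC="$POSTRC_DIR/.postrc"

mkdir -p "$POSTRC_DIR" || exit 1

# The three directives from this article, one per line.
# Note: this overwrites any file already at $POSTRC.
cat > "$POSTRC" <<'EOF'
level 64
no_ioadapt_ok
no_obp_handoff
EOF

# World-readable, writable only by the owner.
chmod 644 "$POSTRC"

# Show the result for a visual check.
ls -l "$POSTRC"
cat "$POSTRC"
```

Because the sketch overwrites the target file, save a copy of any existing .postrc for the domain first, and delete the temporary file once testing is finished.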


NOTES:

  • Do not run hpost manually from the command line.
  • Sun strongly encourages running POST at level 64 or higher on newly inserted hardware. If new memory has been inserted, level 96 or higher is advised.


To perform the test, assign the board to domain R and turn the domain keyswitch on:

% addboard -d R -c assign SB15
% setkeyswitch -d R on

Powering on: CSB at CS1
Already powered on: CSB at CS1
Powering on: CSB at CS0
Already powered on: CSB at CS0
Powering on: EXB at EX15
Already powered on: EXB at EX15
Powering on: CPU at SB15
Already powered on: CPU at SB15
NOTE: There are no Slot 1 system boards assigned to this domain.
Significant contents of .postrc (domain)
/etc/opt/SUNWSMS/SMS1.5/config/R/.postrc:
no_ioadapt_ok
no_obp_handoff
level 64
no_obp_handoff in .postrc. COD CPU license requests will be skipped
to facilitate offline hardware testing.
.
.
stage final_config: Final configuration...
Skipping OBP handoff as requested
Key to resource status value codes:
=Unknown       p=Present       c=Crunched      _=Undefined     m=Missing
i=Misconfig     o=FailedOBP     f=Failed        b=Blacklisted   r=Redlisted
x=NotInDomain   u=G,unconfig    P=Passed        ==G,lockstep    l=NoLicense
e=EmptyCasstt
CPU_Brds:  Proc  Mem P/B: 3/1 3/0  2/1 2/0  1/1 1/0  0/1 0/0
Slot  Gen  3210        /L: 10  10   10  10   10  10   10  10     CDC
SB15:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P
Exitcode = 36: Non-configuration special hpost mode successful
POST (level=64, verbose=20) execution time 66:48
[5304] Domain failed by hpost: ecode=36
Resetting and deconfiguring: CPU at SB15
Resetting and deconfiguring: EXB at EX15
Powering on: CSB at CS0
Powering on: CSB at CS1
%

 

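To determine whether the tested System Board is OK, check the resource status value codes listed at the very end of the POST run. Only one System Board is listed, which is what you expect when testing a single board. In the example above, every component of SB15 shows P (Passed) and hpost ends with exit code 36, "Non-configuration special hpost mode successful".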
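
If the setkeyswitch output has been saved to a file, the check of the final status row can be sketched in shell as follows. This is an illustrative helper, not part of SMS: the function name board_passed and the log file are assumptions, and the row layout is taken from the example run in this article. The awk -v option is needed, so on Solaris use nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk.

```shell
#!/bin/sh
# board_passed LOGFILE BOARD   (hypothetical helper, for illustration)
# Returns 0 if the last BOARD row in LOGFILE contains only "P" (Passed)
# status codes, 1 otherwise (including when no such row is found).
board_passed() {
    awk -v row="$2:" '
        $1 == row {
            found = 1
            bad = 0                      # the last matching row wins
            for (i = 2; i <= NF; i++)
                if ($i !~ /^P+$/) bad = 1
        }
        END { exit ((found && !bad) ? 0 : 1) }
    ' "$1"
}

# Demo using the status row from the example run in this article.
LOG="${TMPDIR:-/tmp}/hpost-sketch.$$"
cat > "$LOG" <<'EOF'
SB15:  P   PPPP            PP  PP   PP  PP   PP  PP   PP  PP      P
Exitcode = 36: Non-configuration special hpost mode successful
EOF

if board_passed "$LOG" SB15; then
    echo "SB15: all listed components Passed"
else
    echo "SB15: review the status value codes"
fi
```

The helper only looks at the board row. It does not replace reading the full POST output, and any code other than P in the row (see the key to resource status value codes) makes it return 1.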

Do not be misled by the following side effects of having no_obp_handoff in domain R's .postrc file:

  • The message file reports that setkeyswitch failed:

Apr 30 08:34:54 2006 s4u-12ka-sc0 setkeyswitch[20826]-R(): [5304 78389629078374 ERR KeyswitchUtls.cc 1963] Domain failed by hpost: ecode=36

  • Regardless of the POST test result, showboards reports Unknown in the Test Status column. This happens because the PCD is not updated with the test result, and showboards queries the PCD:

  Location Pwr Type of Board Board Status Test Status Domain
  -------- --- ------------- ------------ ----------- ------
  SB15     On  CPU           Assigned     Unknown     R
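
The first side effect can be separated from a genuine failure by looking at the ecode in the logged message. The following shell sketch is illustrative only: the function name hpost_ecodes and the sample file are assumptions, the message format is taken from the example in this article, and the only exit code it knows about is 36, the one documented here.

```shell
#!/bin/sh
# hpost_ecodes FILE   (hypothetical helper, for illustration)
# Prints one line per "Domain failed by hpost" entry in FILE, marking
# ecode=36 as the expected outcome of a no_obp_handoff run.
hpost_ecodes() {
    sed -n 's/.*Domain failed by hpost: ecode=\([0-9][0-9]*\).*/\1/p' "$1" |
    while read code; do
        if [ "$code" -eq 36 ]; then
            echo "ecode=$code: expected with no_obp_handoff"
        else
            echo "ecode=$code: review the POST output"
        fi
    done
}

# Demo using the message from the example in this article.
MSGS="${TMPDIR:-/tmp}/msgs-sketch.$$"
cat > "$MSGS" <<'EOF'
Apr 30 08:34:54 2006 s4u-12ka-sc0 setkeyswitch[20826]-R(): [5304 78389629078374 ERR KeyswitchUtls.cc 1963] Domain failed by hpost: ecode=36
EOF

hpost_ecodes "$MSGS"
```

An ecode of 36 only shows that POST finished in the special no-handoff mode. The board status codes at the end of the POST run still decide whether the hardware passed.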
 

Additional information regarding COD boards.

Generally speaking, POST cannot test unlicensed COD processors when it is invoked by setkeyswitch. As a result, the hardware on one or more CPU/Memory boards cannot be tested completely if the processors in the configuration are not COD licensed.
To allow setkeyswitch to run POST on all processors physically present in a domain configuration, even if some of them are unlicensed COD processors, add the no_obp_handoff directive to the domain's .postrc file (this requires SMS 1.3 or later).

Product

Sun Fire 15K Server
Sun Fire 12K Server
Sun Fire E25K Server
Sun Fire E20K Server

Internal Section

Additional References:

Additional Information regarding COD boards to be tested.


1. The no_obp_handoff directive is also useful when you do not want control to be passed to OBP after POST finishes (the domain is not actually booted).


2. no_obp_handoff (Intended Use = Service/Ops)
Suppresses the normal operation of creating the GDCD and LDCD in SRAMs in the domain, and setting certain PCD attributes, as part of the domain boot process. This directive causes an otherwise successful run of POST to exit with POST_EXIT_NOCONFIG instead of a golden SRAM number.
Starting with SMS 1.3, if no_obp_handoff is specified, POST does not attempt to acquire COD licenses for processors and runs even if use of these processors is unlicensed.


3. This is an example of what setkeyswitch does on a domain with unlicensed processors when the directive described above is not used (level 32):

.....
stage_cpu_lpost(): No NMB Boards in config. Skipping Stage nmb_cpu_lpost.
Acquiring licenses for all good processors...
Proc SB17/P3 deconfigured: no license available. 
Proc SB17/P1 deconfigured: no license available. 
stage wib_lpost: Wildcat interface board tests...
stage_wib_lpost(): No good Wcis; Skipping Stage wib_lpost
stage pci_lpost: Test all L1 I/O boards...
Performing ASIC config with bus config a/d/r = 333...
Slot0 in domain: 20000
Slot1 in domain: 20000
EXBs in use: 1FFFF
pcilpost.elf Version 5.17.0 Build 6.4 I/F 12 is newest supported
stage exp_lpost: Domain-level board and system tests...
explpost.elf Version 5.17.0 Build 6.4 I/F 12 is newest supported
stage cpu_lpost_II: CPU L1 domain/system tests... 
sgcpu.flash file: Version 5.17.0 Build 6.4 I/F 12 is newest supported
Fprom SB17/F0: Version 5.17.0 Build 6.4 I/F 12 is newest supported
Fprom SB17/F1: Version 5.17.0 Build 6.4 I/F 12 is newest supported
stage pci_lpost_Q: Init all L1 I/O boards under -Q...
stage cpu_lpost_II_Q: CPU L1 domain/system init under -Q... 
stage final_config: Final configuration...
Creating CPU SRAM handoff structures...
Creating GDCD IOSRAM handoff structures in Slot IO17...
Writing domain information to PCD
.....


As you can see, processors SB17/P3 and SB17/P1 were deconfigured because no license was available, so the remaining tests were not run on them.


Keywords: HPOST, slot0, slot1, DR, diagnostic, post, test, 12k, 15k, 20k, 25k

Previously Published As 47497


Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.