Asset ID: 1-71-1017491.1
Update Date: 2010-07-16
Solution Type: Technical Instruction Sure
Solution 1017491.1: Sun Fire[TM] 3800/48x0/6800/E4900/E6900/E2900/v1280 or Netra[TM] 1280/1290 server: showcomponent indicates something is disabled
Related Items
- Sun Fire E6900 Server
- Sun Fire 3800 Server
- Sun Fire 6800 Server
- Sun Fire E4900 Server
- Sun Netra 1280 Server
- Sun Fire 4800 Server
- Sun Fire V1280 Server
- Sun Fire E2900 Server
- Sun Fire 4810 Server
Related Categories
- GCS>Sun Microsystems>Servers>Midrange Servers
- GCS>Sun Microsystems>Servers>Entry-Level Servers
Previously Published As: 228613
Applies to:
Sun Fire 3800 Server
Sun Fire V1280 Server
Sun Netra 1280 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
All Platforms
Goal
This document applies to the Sun Fire 3800, 4800, 4810, 6800, E4900, E6900, E2900, and V1280 servers and to the Netra 1280 and 1290 servers.
- SC refers to the System Controller. Some systems have a single SC, while others have two.
- Commands to be run "on the SC" are entered at the "SC>" or "lom>" prompt of the Main SC for the server in question.
There are three reasons a component might show as disabled in the output of showcomponent:
- Someone has manually disabled the component.
- No Capacity on Demand (COD) license exists for the component.
- The component has been disabled as a result of a fault.
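The three cases can be told apart from the Status and POST columns of a showcomponent row, as the examples in the Solution section illustrate. The following Python sketch is a hypothetical helper (disable_reason is not an SC or LOM command) that classifies one row; it assumes the column values are passed exactly as printed and that a plain "disabled" status without chs means disablecomponent was used.

```python
def disable_reason(status: str, post: str) -> str:
    """Classify one showcomponent row (hypothetical helper, not an SC command).

    Assumes the Status and POST values are passed exactly as printed by
    showcomponent, e.g. ("disabled", "chs") or ("Cod-dis", "untest").
    """
    status = status.lower()
    post = post.lower()
    if post == "chs":
        # Marked faulty by the diagnosis engine (Component Health Status).
        return "fault"
    if status == "cod-dis":
        # Disabled because no Capacity on Demand license exists.
        return "no COD license"
    if status == "disabled":
        # Disabled without chs or COD: most likely disablecomponent was used.
        return "manually disabled"
    return "not disabled"
```

For example, the row "/N0/SB2/P2 disabled - chs ..." classifies as a fault, while the SB4 rows in the first example classify as manually disabled.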
Solution
In this example (from a V1280, E2900, Netra 1280, or Netra 1290 system), SB4 has been disabled with the command disablecomponent SB4:
lom>showcomponent SB4
Component Status Pending POST Description
--------- ------ ------- ---- -----------
/N0/SB4/P0 disabled - untest UltraSPARC-IV, 1200MHz, 16M ECache
/N0/SB4/P1 disabled - untest UltraSPARC-IV, 1200MHz, 16M ECache
/N0/SB4/P2 disabled - untest UltraSPARC-IV, 1200MHz, 16M ECache
/N0/SB4/P3 disabled - untest UltraSPARC-IV, 1200MHz, 16M ECache
/N0/SB4/P0/B0/L0 disabled - untest 2048M DRAM
/N0/SB4/P0/B0/L2 disabled - untest 2048M DRAM
/N0/SB4/P1/B0/L0 disabled - untest 2048M DRAM
/N0/SB4/P1/B0/L2 disabled - untest 2048M DRAM
/N0/SB4/P2/B0/L0 disabled - untest 2048M DRAM
/N0/SB4/P2/B0/L2 disabled - untest 2048M DRAM
/N0/SB4/P3/B0/L0 disabled - untest 2048M DRAM
/N0/SB4/P3/B0/L2 disabled - untest 2048M DRAM
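When showcomponent output covers many boards, it can help to filter the captured text for rows that are not enabled. The following is a minimal Python sketch, assuming the output has been saved as text (for example from a console log) and that only the Description column contains spaces; parse_showcomponent, disabled_components, and ComponentRow are hypothetical names.

```python
from typing import List, NamedTuple


class ComponentRow(NamedTuple):
    component: str
    status: str
    pending: str
    post: str
    description: str


def parse_showcomponent(text: str) -> List[ComponentRow]:
    """Parse captured showcomponent output into rows (hypothetical helper).

    Assumes each data row starts with the component path (e.g. /N0/SB4/P0)
    and that only the Description column may contain spaces.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip the prompt, the column header and the dashed separator line.
        if not line.startswith("/"):
            continue
        parts = line.split(None, 4)
        if len(parts) == 5:
            rows.append(ComponentRow(*parts))
    return rows


def disabled_components(text: str) -> List[str]:
    """Return the paths of all rows whose Status is not 'enabled'."""
    return [r.component for r in parse_showcomponent(text) if r.status != "enabled"]
```

Splitting with a maximum of four splits keeps descriptions such as "UltraSPARC-IV, 1200MHz, 16M ECache" intact in the last field.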
First, confirm that there is no reason to keep the component disabled. For example, an application license may have required you to limit the number of CPUs, and this may have been forgotten. If the component(s) can be re-enabled, use the following command at the SC or lom prompt:
lom>enablecomponent sb4
In the following example, SB0 contains no licensed COD CPUs, so its components show the Cod-dis status:
lom>showcomponent SB0
Component Status Pending POST Description
--------- ------ ------- ---- -----------
/N0/SB0/P0 Cod-dis - untest UltraSPARC-IV, 1350MHz, 16M ECache
/N0/SB0/P1 Cod-dis - untest UltraSPARC-IV, 1350MHz, 16M ECache
/N0/SB0/P2 Cod-dis - untest UltraSPARC-IV, 1350MHz, 16M ECache
/N0/SB0/P3 Cod-dis - untest UltraSPARC-IV, 1350MHz, 16M ECache
/N0/SB0/P0/B0/L0 Cod-dis - untest 1024M DRAM
/N0/SB0/P0/B0/L2 Cod-dis - untest 1024M DRAM
/N0/SB0/P1/B0/L0 Cod-dis - untest 1024M DRAM
/N0/SB0/P1/B0/L2 Cod-dis - untest 1024M DRAM
/N0/SB0/P2/B0/L0 Cod-dis - untest 1024M DRAM
/N0/SB0/P2/B0/L2 Cod-dis - untest 1024M DRAM
/N0/SB0/P3/B0/L0 Cod-dis - untest 1024M DRAM
/N0/SB0/P3/B0/L2 Cod-dis - untest 1024M DRAM
To enable these CPUs, contact the licensing center.
See 1007945.1 for further information on Capacity on Demand.
If the word chs (Component Health Status) appears in the POST column of showcomponent output, the component has been marked as faulty by the automated diagnosis engine built into Solaris and/or ScApp (System Controller Application). Such components are not available to the system. The chs status remains with the component until it is serviced (replaced, or its status reset).
If a parent FRU has a chs value in the POST column, all of its child FRUs also have a chs value (either Suspect or Faulty). The status output alone does not show whether the problem lies with the parent or with the child FRU(s); the errors must be investigated.
- As a result, when a component is to be disabled, the parent-child relationship groups the related components together in whatever action the system software eventually takes (that is, either all of them are disabled or none are).
- The most common parent-to-child FRU relationship is a CPU (parent) and its DIMMs (children).
- Note: DIMM failures during POST can result in both the DIMM and the CPU showing chs in the POST column, because both can be marked faulty.
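The parent-child grouping can be illustrated with the component paths printed by showcomponent. The following Python sketch assumes (as in the SB2 example in this document) that a child FRU's path extends its parent's path, such as /N0/SB2/P2 and /N0/SB2/P2/B0/L0; the real relationship is maintained by the SC firmware, and children_of and group_for_action are hypothetical names.

```python
from typing import List


def children_of(parent: str, components: List[str]) -> List[str]:
    """Return the child FRU paths nested under a parent path.

    Assumption: a child's path extends its parent's path, as with
    /N0/SB2/P2 (CPU) and /N0/SB2/P2/B0/L0 (DIMM) in showcomponent output.
    """
    prefix = parent.rstrip("/") + "/"
    return [c for c in components if c.startswith(prefix)]


def group_for_action(parent: str, components: List[str]) -> List[str]:
    """Parent plus children: the set that is disabled (or not) together."""
    return [parent] + children_of(parent, components)
```

Appending "/" to the prefix prevents a path such as /N0/SB2/P2 from matching an unrelated sibling whose name merely starts with the same characters.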
In ScApp 5.20.x (114527-xx), you can view the chs status of components with the showchs command. In ScApp 5.20.15 or higher, you can also reset this status if needed (described later in this document).
When ScApp detects an error, the diagnosis engine analyzes the event and produces its advice; the showchs command then shows only Faulty or Suspect against the component. It does not show the date the status changed (you can view the full error details with the SC command showlogs -v).
- A faulty component is disabled at the next reboot. Most faults that mark a component faulty automatically trigger a recovery reboot, so the component may already be out of the configuration when you view its status.
- A suspect component remains available in the domain; the status indicates that its errors may need to be investigated (for example, because the errors are correctable in nature or there are many different FRU suspects).
It is quite common for systems to contain components in a suspect state, and in many cases the suspect status is very old and can simply be cleared. If you find components marked suspect or faulty, contact Support Services to have the events diagnosed.
In the following example, CPU P2 on SB2 and its DIMMs are disabled with chs in the POST column:
sun_fire-sc1:SC> showcomponent sb2
Component Status Pending POST Description
--------- ------ ------- ---- -----------
/N0/SB2/P0 enabled - pass UltraSPARC-III+, 900MHz, 8M ECache
/N0/SB2/P1 enabled - pass UltraSPARC-III+, 900MHz, 8M ECache
/N0/SB2/P2 disabled - chs UltraSPARC-III+, 900MHz, 8M ECache
/N0/SB2/P3 enabled - pass UltraSPARC-III+, 900MHz, 8M ECache
/N0/SB2/P0/B0/L0 enabled - pass 512M DRAM
/N0/SB2/P0/B0/L2 enabled - pass 512M DRAM
/N0/SB2/P0/B1/L1 enabled - pass 512M DRAM
/N0/SB2/P0/B1/L3 enabled - pass 512M DRAM
/N0/SB2/P1/B0/L0 enabled - pass 512M DRAM
/N0/SB2/P1/B0/L2 enabled - pass 512M DRAM
/N0/SB2/P1/B1/L1 enabled - pass 512M DRAM
/N0/SB2/P1/B1/L3 enabled - pass 512M DRAM
/N0/SB2/P2/B0/L0 disabled - chs 512M DRAM
/N0/SB2/P2/B0/L2 disabled - chs 512M DRAM
/N0/SB2/P2/B1/L1 disabled - chs 512M DRAM
/N0/SB2/P2/B1/L3 disabled - chs 512M DRAM
/N0/SB2/P3/B0/L0 enabled - pass 512M DRAM
/N0/SB2/P3/B0/L2 enabled - pass 512M DRAM
/N0/SB2/P3/B1/L1 enabled - pass 512M DRAM
/N0/SB2/P3/B1/L3 enabled - pass 512M DRAM
sun_fire-sc1:SC> showchs
Component Status
--------------- --------
/N0/SB2/P2 Faulty
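To review showchs output from a captured log, the rows can be mapped to the effect each status has, as described in this document (a faulty component is disabled at the next reboot; a suspect component stays available). The following is a minimal Python sketch, assuming each data row has the form "<path> <status>"; parse_showchs, report, and EFFECT are hypothetical names.

```python
from typing import Dict

# Expected effect of each showchs status, as described in this document.
EFFECT = {
    "faulty": "disabled at the next reboot (may already be out of the configuration)",
    "suspect": "still available to the domain; its errors should be investigated",
}


def parse_showchs(text: str) -> Dict[str, str]:
    """Map component path -> CHS status from captured showchs output.

    Hypothetical helper; assumes each data row is '<path> <status>'.
    """
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Header, dashed separator and prompt lines do not start with '/'.
        if len(parts) == 2 and parts[0].startswith("/"):
            result[parts[0]] = parts[1]
    return result


def report(text: str) -> Dict[str, str]:
    """Attach the expected effect to every component listed by showchs."""
    return {
        path: EFFECT.get(status.lower(), "unknown status: " + status)
        for path, status in parse_showchs(text).items()
    }
```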
To service this component, contact your Support Services provider. If the Support Services engineer directs you to reset the CHS status of a component, use 1004879.1 to perform this procedure.
Note: See 1009358.1 if showcomponent indicates a Pending "disabled" status for a device.
The information required to troubleshoot the fault depends on the platform.
- For the V1280, E2900, Netra 1280, and Netra 1290, collect an explorer (see 1019066.1) with the following options:
  - /opt/SUNWexplo/bin/explorer -w default,1280extended
- For the 3800, 4800, 4810, E4900, 6800, and E6900, collect an explorer (see 1019066.1) with the following options, as well as the loghost data (see 1008676.1):
  - /opt/SUNWexplo/bin/explorer -w default,scextended,fru
- For Sun Fire 12K, 15K, E20K, or E25K platforms, an explorer from the Main System Controller is required.
  - No additional explorer options are required.
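The platform-to-command mapping above can be kept in a small lookup, which helps when data is collected from several systems. The following Python sketch only restates the options listed above; the platform labels used as keys (such as "netra1280") are illustrative assumptions, not official identifiers, and explorer_command is a hypothetical name.

```python
from typing import List

EXPLORER = "/opt/SUNWexplo/bin/explorer"

# Platform labels below are illustrative keys, not official identifiers.
ENTRY_LEVEL = {"v1280", "e2900", "netra1280", "netra1290"}
MIDRANGE = {"3800", "4800", "4810", "e4900", "6800", "e6900"}
HIGH_END = {"12k", "15k", "e20k", "e25k"}


def explorer_command(platform: str) -> List[str]:
    """Return the explorer argument list recommended above for a platform."""
    p = platform.lower()
    if p in ENTRY_LEVEL:
        return [EXPLORER, "-w", "default,1280extended"]
    if p in MIDRANGE:
        # Also collect the loghost data (see 1008676.1).
        return [EXPLORER, "-w", "default,scextended,fru"]
    if p in HIGH_END:
        # Run on the Main System Controller; no additional options.
        return [EXPLORER]
    raise ValueError("platform not covered by this document: " + platform)
```

Returning an argument list rather than a single string means the result could be passed directly to subprocess.run on the host where explorer is installed.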
Note: Explorer 5.6 and later captures the required data much faster, because of bugs fixed in the scextended and sf15k modules (see 1002383.1).
Sun Shared Shell
If you require assistance in collecting the data recommended in this article, or help in diagnosing a system issue, a collaborative service tool called Sun Shared Shell allows Sun Service engineers to remotely view and diagnose customer systems. Consider using this option to reduce problem resolution time.
Internal Comments
Investigating CHS Status from prtfru -x outputs
The chs status of a component is recorded in the component's FRUID and is captured in the prtfru -x output. To reconstruct the showchs output from a prtfru_-x.out file, you can use the internal tool showfru:
################################################################################
Latest version 1.16 /net/cores.uk/export/hotline/hotlocal/bin/showfru
Report bugs, RFEs or if you have questions email [email protected]
Further info from http://panacea/twiki/bin/view/Tools/ToolPageShowfru
################################################################################
... Removing all the FRU information ...
################################################################################
CHS History of currently disabled Components, use -v to see full history
################################################################################
Component : N0.SB2.P2
Time Stamp : Mon Jul 20 19:16:53 YEKST 2006
New Status : FAULTY
Old Status : FAULTY
Event Code : *** UNKNOWN Invalid Value ***: 0x0000000003000000
Initiator : SCAPP
Message : SF4800.VCMON.1.03.1425
Investigating CHS status with physical access to the SC
1. Use the showchs -b command at the platform shell. For example:
sun_fire-sc1:SC> showchs -b
Component Status
--------------- --------
/N0/SB2/P2 Faulty
In the above example, SB2/P2 is marked faulty and must be serviced.
2. On the Main SC, collect the following data in service mode (if the ScApp version is earlier than 5.20.15) or in normal mode (if ScApp 5.20.15 or higher is installed):
v4u-6800c-sc0:SC[service]> showchs -v -c sb2
Total # of records: 2
Component : N0/SB2/P2
Time Stamp : Wed Jul 19 19:16:29 YEKST 2006
New Status : FAULTY
Old Status : OK
Event Code : Other
Initiator : SCAPP
Message : SF4800.VCMON.1.03.1424
Component : N0/SB2/P2
Time Stamp : Thu Jul 20 19:16:53 YEKST 2006
New Status : FAULTY
Old Status : FAULTY
Event Code : Other
Initiator : SCAPP
Message : SF4800.VCMON.1.03.1425
What to do next
With the date stamp and the fault code (SF4800.VCMON.1.03.1425 in the above example), you can determine whether the fault is valid and the part should be replaced, or whether a known bug has been hit and the component can be re-enabled.
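When showchs -v returns several records for the same component, the most recent one carries the date stamp and fault code to investigate. The following is a minimal Python sketch that parses the record layout shown in step 2 (the same layout appears in the showfru CHS history); it assumes every field is printed as "Name : value", and the names parse_chs_records, latest_per_component, and parse_timestamp are hypothetical. The weekday and time-zone abbreviation (such as YEKST) are deliberately ignored.

```python
from datetime import datetime
from typing import Dict, List, Optional

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
FIELDS = {"Component", "Time Stamp", "New Status", "Old Status",
          "Event Code", "Initiator", "Message"}


def parse_timestamp(ts: str) -> datetime:
    """Parse e.g. 'Thu Jul 20 19:16:53 YEKST 2006'.

    The weekday and the time-zone abbreviation are ignored, because zone
    names printed by the SC are not reliably understood by strptime.
    """
    tokens = ts.split()
    month = MONTHS.index(tokens[1]) + 1
    day = int(tokens[2])
    hour, minute, second = (int(x) for x in tokens[3].split(":"))
    year = int(tokens[-1])
    return datetime(year, month, day, hour, minute, second)


def parse_chs_records(text: str) -> List[Dict[str, str]]:
    """Split showchs -v output into one dict per 'Component :' record."""
    records: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in FIELDS:
            continue  # prompt, 'Total # of records' line, blank lines
        if key == "Component":
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value.strip()
    return records


def latest_per_component(records: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Keep the most recent record (date stamp and fault code) per component."""
    latest: Dict[str, Dict[str, str]] = {}
    for rec in records:
        comp = rec["Component"]
        if comp not in latest or (parse_timestamp(rec["Time Stamp"])
                                  > parse_timestamp(latest[comp]["Time Stamp"])):
            latest[comp] = rec
    return latest
```

Applied to the output in step 2, the latest record for N0/SB2/P2 carries the message SF4800.VCMON.1.03.1425.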
In this example, check whether CR 6353053 ("false VCMON failures due to ramping system load") applies, then check the replacement rules in 1010919.1 to determine how to resolve the issue.
If you are not sure whether the component needs to be replaced, ask for confirmation in the GL-ESG IM room.
If component status must be reset
If needed, see 1004879.1 for instructions on resetting a component's CHS status with setchs.
Keywords: chs, showchs, setchs, OK, SUSPECT, FAULTY, normalized
Previously Published As: 70181