Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-75-1484421.1
Update Date: 2012-10-04
Keywords:

Solution Type  Troubleshooting Sure

Solution  1484421.1 :   T2000 shows BOTH power supplies as failed but system still Operational  


Related Items
  • Sun SPARC Enterprise T2000 Server
  • Sun Fire T2000 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: Tx000




In this Document
Purpose
Troubleshooting Steps


Applies to:

Sun SPARC Enterprise T2000 Server - Version All Versions and later
Sun Fire T2000 Server - Version All Versions and later
Sun SPARC Sun OS

Purpose

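This document provides information to assist in diagnosing power supply faults on a T2000 that reports both power supplies as failed while the system remains operational.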

Troubleshooting Steps

The first step in diagnosing this type of fault is to gather the correct data.

Use the following commands from the ALOM prompt to collect the relevant information.

sc> showfaults -v
ID Time FRU Fault
0 SEP 11 09:22:18 PS0 PSU at PS0 has FAILED.
1 SEP 11 09:22:18 PS1 PSU at PS1 has FAILED.
sc>
SC Alert: PSU at PS0 has FAILED.

SC Alert: PSU at PS1 has FAILED.

sc> showfru -s

FRU_PROM at PDB/SEEPROM is not present

FRU_PROM at SASBP/SEEPROM is not present

FRU_PROM at PS0/SEEPROM
failed to read ipmi header data.
Failed to print information

FRU_PROM at PS1/SEEPROM
failed to read ipmi header data.
Failed to print information
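
When this output is captured to a text file, the unreadable FRU PROMs can be picked out with a short script. The following is a minimal Python sketch, not a Sun-supplied tool: the function name `fru_prom_status` and the three status labels are hypothetical, and it assumes the output has the same shape as the sample above (what a successful read looks like is not shown in this document, so any entry without an error line is only labeled "no error reported").

```python
import re


def fru_prom_status(text):
    """Map each FRU_PROM location in 'showfru -s' output to a status label.

    Labels (hypothetical, for illustration only):
      "not present"       - the header line says the PROM is not present
      "unreadable"        - a "failed to read ..." line follows the header
      "no error reported" - neither of the above was seen
    """
    status = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"FRU_PROM at (\S+)(.*)", line)
        if header:
            current = header.group(1)
            if "is not present" in header.group(2):
                status[current] = "not present"
            else:
                status[current] = "no error reported"
        elif current and line.lower().startswith("failed to read"):
            status[current] = "unreadable"
    return status


if __name__ == "__main__":
    sample = (
        "FRU_PROM at PDB/SEEPROM is not present\n"
        "FRU_PROM at PS0/SEEPROM\n"
        "failed to read ipmi header data.\n"
        "Failed to print information\n"
    )
    for location, state in fru_prom_status(sample).items():
        print(location, "->", state)
```

Seeing both PSU SEEPROMs unreadable together with the PDB and SAS backplane PROMs reported as not present is the pattern shown in the sample output above.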

sc> showenvironment

=============== Environmental Status ===============

--------------------------------------------------------------------------------
System Temperatures (Temperatures in Celsius):
--------------------------------------------------------------------------------
Sensor            Status    Temp  LowHard  LowSoft  LowWarn  HighWarn  HighSoft  HighHard
--------------------------------------------------------------------------------
PDB/T_AMB         UNKNOWN     --       --       --       --        --        --        --
MB/T_AMB          OK          17      -10       -5        0        50        55        60
MB/CMP0/T_TCORE   OK          30      -10       -5        0        85        90        95
MB/CMP0/T_BCORE   OK          29      -10       -5        0        85        90        95
IOBD/IOB/T_CORE   OK          32      -10       -5        0        95       100       105
IOBD/T_AMB        OK          22      -10       -5        0        52        57        62

--------------------------------------------------------
System Indicator Status:
--------------------------------------------------------
SYS/LOCATE        SYS/SERVICE       SYS/ACT
OFF               ON                ON
--------------------------------------------------------
SYS/REAR_FAULT    SYS/TEMP_FAULT    SYS/TOP_FAN_FAULT
ON                OFF               OFF
--------------------------------------------------------

DisplayDisk: Error reading disk(s)
---------------------------------------------------

------------------------------------------------------------------------------
Power Supplies:
------------------------------------------------------------------------------
Supply  Status  Underspeed  Overtemp  Overvolt  Undervolt  Overcurrent
------------------------------------------------------------------------------
PS0     FAILED  OFF         ON        ON        ON         ON
PS1     FAILED  OFF         ON        ON        ON         ON
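
The Power Supplies table above can be checked mechanically when reviewing saved `showenvironment` output. The sketch below is illustrative only: the function names are hypothetical, the column names are taken from the table header above, and the "every supply reports FAILED" test is an assumed heuristic for spotting this symptom, not an official diagnostic rule.

```python
import re

# Column names as they appear in the Power Supplies table header.
PSU_COLUMNS = ["Status", "Underspeed", "Overtemp", "Overvolt",
               "Undervolt", "Overcurrent"]


def parse_psu_table(text):
    """Return {"PS0": {"Status": "FAILED", ...}, ...} from showenvironment text."""
    supplies = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row is a PS<n> name followed by one value per column.
        if len(fields) == len(PSU_COLUMNS) + 1 and re.fullmatch(r"PS\d+", fields[0]):
            supplies[fields[0]] = dict(zip(PSU_COLUMNS, fields[1:]))
    return supplies


def all_supplies_failed(supplies):
    """Assumed heuristic: True when at least one PSU is listed and all report FAILED.

    If the host is still running while this returns True, the readings cannot
    all be literally correct, which matches the symptom this document covers.
    """
    return bool(supplies) and all(
        row["Status"] == "FAILED" for row in supplies.values()
    )


if __name__ == "__main__":
    sample = (
        "Supply Status Underspeed Overtemp Overvolt Undervolt Overcurrent\n"
        "PS0 FAILED OFF ON ON ON ON\n"
        "PS1 FAILED OFF ON ON ON ON\n"
    )
    table = parse_psu_table(sample)
    print(table)
    print("all supplies failed:", all_supplies_failed(table))
```

The header and separator lines are skipped automatically because they do not start with a `PS<n>` name.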

On Solaris, the following messages are logged in /var/adm/messages:

Dec 11 13:43:57 brudapp1 SC Alert: [ID 437756 daemon.error] Input power unavailable for PSU at PS0.
Dec 11 15:23:41 brudapp1 SC Alert: [ID 438624 daemon.error] PSU at PS1 has FAILED.
Dec 11 15:23:52 brudapp1 SC Alert: [ID 438608 daemon.error] PSU at PS0 has FAILED.
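
To build a timeline of these alerts from a copy of /var/adm/messages, a small filter such as the following can help. This is a hedged sketch: the function name `psu_alerts` is hypothetical, and the regular expression assumes the line layout shown in the three sample messages above (timestamp, hostname, "SC Alert:", a bracketed syslog ID, then the message text).

```python
import re

# Assumed layout, based on the sample lines in this document:
#   <Mon DD HH:MM:SS> <host> SC Alert: [ID <n> <facility.level>] <message>
ALERT_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+SC Alert:\s+\[ID\s+\d+\s+\S+\]\s+"
    r"(?P<msg>.*PSU at (?P<psu>PS\d+).*)$"
)


def psu_alerts(lines):
    """Return a list of (time, host, psu, message) tuples for PSU-related SC Alerts."""
    events = []
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match:
            events.append((match.group("time"), match.group("host"),
                           match.group("psu"), match.group("msg")))
    return events


if __name__ == "__main__":
    # The path is the one named in this document; adjust it for a saved copy.
    with open("/var/adm/messages", errors="replace") as log:
        for event in psu_alerts(log):
            print(" | ".join(event))
```

Listing the events in order makes it easier to see, for example, whether an "Input power unavailable" alert preceded the FAILED alerts for the same supply.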

 

Troubleshooting:

1. Re-seat both power supply assemblies.

2. Check and re-seat the cables from the PDB to the MB and the SAS backplane.

3. Re-seat the SP (service processor).

4. Upgrade the SysFW to the latest version (it contains fixes related to i2c bus noise and false readings).

5. If the above actions do not resolve the problem, open a service request.

Open the service request using the My Oracle Support interface, or call 1-800-223-1711 and follow the prompts.

 

Please do not distribute anything below this line to customers; it is internal-only information.

We are seeing this issue in our SoD/APQ defect analysis, and many times the FW upgrade helps, so parts do not have to be replaced. As you know, CR 6607953 is not fixed in FW, but there are other fixes related to i2c bus noise and false readings that probably help.

Bug 6607953: T2000 displays PSUs as 'FAILED'. System is running

I2C bus: The i2c bus is not communicating to the PSU due to path breakage or corruption.

NOTE: Always try RE-SEATING the Power Supplies first, before replacing any parts (PSUs are hot-swappable and can be reseated one at a time without any downtime). If that doesn't help, proceed with replacement of the following:

OSP module
PS0
PS1
PDB
SAS backplane

Replace these devices one at a time, starting with the OSP (the Ontario Service Processor is the most likely culprit, as the order of the list shows), to determine the cause of the failure. After each replacement, boot the system and check the PS status to see whether the problem is resolved.

 

 


Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.