Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1321710.1
Update Date:2011-05-25
Keywords:

Solution Type  Troubleshooting Sure

Solution  1321710.1 :   Sun Enterprise[TM] 10000: Troubleshooting Power Puck Issues  


Related Items
  • Sun Enterprise 10000 Server
Related Categories
  • GCS>Sun Microsystems>Servers>High-End Servers




In this Document
  Purpose
  Last Review Date
  Instructions for the Reader
  Troubleshooting Details


Applies to:

Sun Enterprise 10000 Server - Version: Not Applicable to Not Applicable - Release: N/A to N/A
Information in this document applies to any platform.

Purpose

This document describes how to identify and resolve unexpected domain outages caused by failing or failed power pucks on the various boards in the E10000.

Last Review Date

May 12, 2011

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Background and Manifestation of Domain Outages Caused by Power Puck Failures

A power puck is a DC-DC converter on a system board, centerplane support board, or control board. A persistent power puck failure prevents the affected board from powering on. In other cases, the failing power puck still supplies enough voltage to power on the board, but not enough voltage for the CPUs. Both types of power puck failure have caused domain outages.

Power puck failures on a board have caused domain Arbstop events with wfail signatures matching the following list. NOTE: The list is not all-inclusive, and power puck failures are not the only cause of these Arbstops. Use the recommendations in the next section to confirm that the cause is a power puck failure on a board.

wfail reports  Illegal Coherent condition/access proc 0
wfail reports  Port 0 UPA fatal error
wfail reports  Sysboard Request Parity Error Mask
wfail reports  MC Timeout: waiting for data to match address
wfail reports  MC Timeout: waiting for address to match data
wfail reports  Port 0 unexpected foreign PIO queue p_reply received
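When many wfail reports need to be checked, the comparison against this list can be scripted. The following Python sketch assumes the wfail output has been saved as text; the function name matching_signatures is hypothetical. It only looks for the signature strings listed above, so a match is a reason to check the board voltages, not a confirmation of a power puck failure.

```python
# Signatures from the list above. The list is not all-inclusive, and a match
# does not by itself confirm a power puck failure.
KNOWN_SIGNATURES = [
    "Illegal Coherent condition/access proc 0",
    "Port 0 UPA fatal error",
    "Sysboard Request Parity Error Mask",
    "MC Timeout: waiting for data to match address",
    "MC Timeout: waiting for address to match data",
    "Port 0 unexpected foreign PIO queue p_reply received",
]


def matching_signatures(wfail_text: str) -> list[str]:
    """Return the known signatures that appear anywhere in saved wfail output."""
    # Collapse runs of whitespace so spacing differences do not hide a match.
    normalized = " ".join(wfail_text.split())
    return [sig for sig in KNOWN_SIGNATURES if sig in normalized]


if __name__ == "__main__":
    sample = "wfail reports  Port 0 UPA fatal error\n"
    print(matching_signatures(sample))
```

Matching plain substrings keeps the sketch independent of the rest of the wfail report layout, which is not shown in this document.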


Identifying and Confirming Power Puck Failures

Voltage problems caused by power puck failures are logged to the platform logs on the SSP. Look for messages like the following example.

From /var/opt/SUNWssp/adm/messages:


May  2 11:17:59 ssp procesvolt: Warning: Voltage readings have exceeded the thresholds on system board 4
May  2 11:17:59 ssp procesvolt: Voltage data for board 4, range trap: sysBrdStarfire3p3VDC.0 0.68 V
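To pull range-trap readings out of a large messages file, a short Python sketch such as the following can help. The regular expression is an assumption inferred only from the two sample lines above, and the names range_traps and SAMPLE_LINES are hypothetical; procesvolt messages in any other layout would simply be skipped.

```python
import re

# Assumed message layout, inferred only from the sample lines above.
RANGE_TRAP = re.compile(
    r"procesvolt: Voltage data for board (?P<board>\d+), "
    r"range trap: (?P<sensor>\S+) (?P<volts>\d+(?:\.\d+)?) V"
)


def range_traps(log_lines):
    """Yield (board, sensor, volts) for each procesvolt range-trap message."""
    for line in log_lines:
        match = RANGE_TRAP.search(line)
        if match:
            yield int(match["board"]), match["sensor"], float(match["volts"])


# The two sample lines from /var/opt/SUNWssp/adm/messages shown above.
SAMPLE_LINES = [
    "May  2 11:17:59 ssp procesvolt: Warning: Voltage readings have exceeded "
    "the thresholds on system board 4",
    "May  2 11:17:59 ssp procesvolt: Voltage data for board 4, "
    "range trap: sysBrdStarfire3p3VDC.0 0.68 V",
]

if __name__ == "__main__":
    for board, sensor, volts in range_traps(SAMPLE_LINES):
        print(f"board {board}: {sensor} = {volts:.2f} V")
```

On the SSP, the lines of /var/opt/SUNWssp/adm/messages (for example, an open file object) can be passed to range_traps in place of SAMPLE_LINES.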


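Running the power command with no options on the main SSP also confirms power puck failures on the various boards. Look for abnormally low readings, marked with >>> <<< in the following example: the 3.3VDC Vdd Core readings for CSB 1 are about 1.08 V instead of about 3.29 V, and the Vdd Core reading for system board 4 is 0.681 V instead of about 1.90 V.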


ssp% power

Good 48V Bulk Power Supplies:                     0 1 2 3 4 6 7
Number of Good 48V Bulk Power Supplies:           7 (N+1 redundancy ok)
Required 48V Power Supplies for 14 System Boards: 6
Number of Good Peripheral Cabinet Power Supplies: 0

 Centerplane Support Board Average Voltages (V):
 CSB#     5VDC Vcc HK     3.3VDC Vdd HK       3.3VDC Vdd Core
 ----  -----------------  -------------  -------------------------
  0     4.988     5.022       3.373       3.296    3.295    3.292
  1     5.017     4.998       3.502   >>> 1.079    1.079    1.080 <<<

 System Board Average Voltages (V):
      3.3VDC    5VDC     3.3VDC       VDC        5VDC
 SB#   Vdd     Vcc HK    Vdd HK     Vdd Core     Vcc
 ---  -------  -------  --------   ----------  --------
  0    3.295    4.976    3.381       1.904       5.005
  1    3.295    4.993    3.417       1.902       4.995
  2    3.300    5.000    3.407       1.903       4.998
  3    3.300    5.000    3.402       1.895       4.998
  4    3.301    5.030    3.395   >>> 0.681 <<<   5.005
  5    3.297    4.978    3.419       1.904       5.005
  6    3.300    5.015    3.409       1.908       4.998
  8    3.306    5.008    3.417       1.904       5.003
  9    3.297    5.008    3.417       1.906       5.005
 10    3.304    4.993    3.417       1.902       4.993
 11    3.297    4.978    3.418       1.906       4.995
 12    3.293    5.008    3.416       1.903       4.998
 14    3.298    4.915    3.406       1.904       4.995
 15    3.293    5.015    3.417       1.909       4.993

 Control Board Average Voltages (V):
        5VDC        5VDC       3.3VDC      5VDC Vcc     5VDC
 CB#    Vcc        Vcc HK      Vdd HK     Peripheral   Vcc Fans
 ---   --------   --------    ---------   ----------  ---------
  0     5.039      5.071       3.427       5.105       5.348
  1     5.089      5.049       3.425       5.125       5.348
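Comparing each reading with the same column on the other boards, as done by eye in the example above, can also be automated. The Python sketch below parses only the System Board section of saved power output and flags readings below 80% of the column median. The 80% cut-off, the median comparison, and the function names are hypothetical choices, not Sun thresholds, and the approach assumes most system boards in the cabinet are healthy. The >>> <<< annotations are stripped in case they are present.

```python
from statistics import median

# Hypothetical cut-off: flag readings below this fraction of the column median.
LOW_FRACTION = 0.8

# Column order of the System Board table in the power output shown above.
COLUMNS = ["3.3VDC Vdd", "5VDC Vcc HK", "3.3VDC Vdd HK", "Vdd Core", "5VDC Vcc"]


def parse_sysboard_rows(power_output: str) -> dict[int, list[float]]:
    """Map each system board number to its five voltage readings."""
    rows: dict[int, list[float]] = {}
    in_section = False
    for line in power_output.splitlines():
        if "System Board Average Voltages" in line:
            in_section = True
            continue
        if not in_section:
            continue
        if "Average Voltages" in line:  # the Control Board section begins
            break
        # Strip the >>> <<< annotations used in the example, if present.
        fields = line.replace(">>>", " ").replace("<<<", " ").split()
        # Data rows are the SB# followed by five readings; header rows are skipped.
        if len(fields) == 6 and fields[0].isdigit():
            rows[int(fields[0])] = [float(v) for v in fields[1:]]
    return rows


def low_readings(rows: dict[int, list[float]], fraction: float = LOW_FRACTION):
    """Return (board, column, volts) for readings well below the column median."""
    if not rows:
        return []
    flagged = []
    for col, name in enumerate(COLUMNS):
        col_median = median(r[col] for r in rows.values())
        for board, readings in sorted(rows.items()):
            if readings[col] < fraction * col_median:
                flagged.append((board, name, readings[col]))
    return flagged


# Excerpt of the System Board table from the example above.
SAMPLE = """\
 System Board Average Voltages (V):
      3.3VDC    5VDC     3.3VDC       VDC        5VDC
 SB#   Vdd     Vcc HK    Vdd HK     Vdd Core     Vcc
 ---  -------  -------  --------   ----------  --------
  3    3.300    5.000    3.402       1.895       4.998
  4    3.301    5.030    3.395   >>> 0.681 <<<   5.005
  5    3.297    4.978    3.419       1.904       5.005
"""

if __name__ == "__main__":
    for board, column, volts in low_readings(parse_sysboard_rows(SAMPLE)):
        print(f"system board {board}: {column} = {volts:.3f} V")
```

On the excerpt, this prints "system board 4: Vdd Core = 0.681 V". The median of the other boards is used instead of fixed nominal values because the power output header does not state a nominal Vdd Core voltage.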




Resolution

Replace the board whose power puck has been confirmed as failed or failing.



  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.