Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Solution Type: Troubleshooting Sure
Solution 1321710.1: Sun Enterprise[TM] 10000: Troubleshooting Power Puck Issues

Applies to:
Sun Enterprise 10000 Server - Version: Not Applicable to Not Applicable - Release: N/A to N/A
Information in this document applies to any platform.

Purpose
This document describes how to identify and resolve unexpected domain outages caused by failing or failed power pucks on various boards in the E10000.

Last Review Date
May 12, 2011

Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Background and Manifestation of Domain Outages Caused by Power Puck Failures

A power puck is a DC-DC converter on a system board, centerplane support board, or control board. A persistent power puck failure leaves the affected board unable to power on. In other cases, the power puck continues to provide enough voltage to power on the affected board but not enough for the CPUs. Both types of power puck failure have caused domain outages.

Power puck failures on a board have caused domain Arbstop events with wfail signatures matching the following list.

NOTE: The list should not be considered all-inclusive, and power puck issues are not the only cause of the observed Arbstops. Use the recommendations in the next section to confirm that the cause is a power puck failure on a board.

    wfail reports Illegal Coherent condition/access proc 0
    wfail reports Port 0 UPA fatal error
    wfail reports Sysboard Request Parity Error Mask
    wfail reports MC Timeout: waiting for data to match address
    wfail reports MC Timeout: waiting for address to match data
    wfail reports Port 0 unexpected foreign PIO queue p_reply received

Identifying and Confirming Power Puck Failures

Voltage issues caused by power puck failures are logged to the platform logs on the SSP. Look for messages like the example below.

From /var/opt/SUNWssp/adm/messages:

    May 2 11:17:59 ssp procesvolt: Warning: Voltage readings have exceeded the thresholds on system board 4
    May 2 11:17:59 ssp procesvolt: Voltage data for board 4, range trap: sysBrdStarfire3p3VDC.0 0.68 V

Running the power command with no options from the Main SSP will also confirm power puck failures on the various boards. Look for extremely low values in the columns marked with >>> in the example below.

    ssp% power
    Good 48V Bulk Power Supplies: 0 1 2 3 4 6 7
    Number of Good 48V Bulk Power Supplies: 7 (N+1 redundancy ok)
    Required 48V Power Supplies for 14 System Boards: 6
    Number of Good Peripheral Cabinet Power Supplies: 0

    Centerplane Support Board Average Voltages (V):
    CSB#  5VDC Vcc HK        3.3VDC Vdd HK      3.3VDC Vdd Core
    ----  -----------------  -------------      -------------------------
    0     4.988  5.022       3.373              3.296  3.295  3.292
    1     5.017  4.998       3.502          >>> 1.079  1.079  1.080 <<<

    System Board Average Voltages (V):
         3.3VDC   5VDC     3.3VDC        VDC         5VDC
    SB#  Vdd      Vcc HK   Vdd HK        Vdd Core    Vcc
    ---  -------  -------  --------      ----------  --------
    0    3.295    4.976    3.381         1.904       5.005
    1    3.295    4.993    3.417         1.902       4.995
    2    3.300    5.000    3.407         1.903       4.998
    3    3.300    5.000    3.402         1.895       4.998
    4    3.301    5.030    3.395     >>> 0.681 <<<   5.005
    5    3.297    4.978    3.419         1.904       5.005
    6    3.300    5.015    3.409         1.908       4.998
    8    3.306    5.008    3.417         1.904       5.003
    9    3.297    5.008    3.417         1.906       5.005
    10   3.304    4.993    3.417         1.902       4.993
    11   3.297    4.978    3.418         1.906       4.995
    12   3.293    5.008    3.416         1.903       4.998
    14   3.298    4.915    3.406         1.904       4.995
    15   3.293    5.015    3.417         1.909       4.993

    Control Board Average Voltages (V):
         5VDC      5VDC      3.3VDC     5VDC Vcc    5VDC
    CB#  Vcc       Vcc HK    Vdd HK     Peripheral  Vcc Fans
    ---  --------  --------  ---------  ----------  ---------
    0    5.039     5.071     3.427      5.105       5.348
    1    5.089     5.049     3.425      5.125       5.348

Resolution

Replace the board with the confirmed failed or failing power puck.

Attachments

This solution has no attachment
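
The two checks described under "Identifying and Confirming Power Puck Failures" can be sketched in Python for use on a saved copy of the platform log and of the power output. This is an illustrative sketch only, not an SSP tool. The function names `find_range_traps` and `flag_low_readings` are hypothetical, the log pattern is modeled on the single procesvolt message shown in this document (other message variants may exist), and the 0.8 cutoff is an arbitrary illustrative value, not a threshold used by the SSP.

```python
import re
from statistics import median

# Pattern modeled on the one example message in this document (assumption:
# real logs may contain other procesvolt variants that this does not match).
RANGE_TRAP = re.compile(
    r"procesvolt: Voltage data for board (?P<board>\d+), range trap: "
    r"(?P<sensor>\S+)\s+(?P<volts>\d+(?:\.\d+)?) V"
)


def find_range_traps(lines):
    """Return (board, sensor, volts) for each range-trap line found."""
    hits = []
    for line in lines:
        m = RANGE_TRAP.search(line)
        if m:
            hits.append(
                (int(m.group("board")), m.group("sensor"), float(m.group("volts")))
            )
    return hits


def flag_low_readings(rows, min_fraction=0.8):
    """Flag readings far below the other boards' values in the same column.

    rows maps board number -> list of voltages, one per table column.
    A reading is flagged when it is below min_fraction of that column's
    median across all boards. min_fraction is a hypothetical cutoff.
    Returns (board, column_index, volts) tuples.
    """
    medians = [median(col) for col in zip(*rows.values())]
    flagged = []
    for board, volts in rows.items():
        for idx, value in enumerate(volts):
            if value < min_fraction * medians[idx]:
                flagged.append((board, idx, value))
    return flagged


# Sample input copied from the examples in this document.
sample_log = [
    "May 2 11:17:59 ssp procesvolt: Warning: Voltage readings have exceeded the thresholds on system board 4",
    "May 2 11:17:59 ssp procesvolt: Voltage data for board 4, range trap: sysBrdStarfire3p3VDC.0 0.68 V",
]
# System boards 3, 4 and 5; columns in the order shown in the power output.
sample_rows = {
    3: [3.300, 5.000, 3.402, 1.895, 4.998],
    4: [3.301, 5.030, 3.395, 0.681, 5.005],
    5: [3.297, 4.978, 3.419, 1.904, 5.005],
}

for board, sensor, volts in find_range_traps(sample_log):
    print(f"log: board {board} sensor {sensor} read {volts} V")
for board, column, volts in flag_low_readings(sample_rows):
    print(f"power: board {board} column {column} read {volts} V")
```

Comparing each reading against the median of its own column avoids hard-coding nominal voltages, which the power output does not state for every column. A flagged reading is only a hint; confirm it against the marked columns in the power output before replacing a board.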