Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-73-1222158.1
Update Date: 2011-08-22
Keywords:

Solution Type  FAB (standard) Sure

Solution  1222158.1 :   NEMHydra's Main Power Shuts Down Unexpectedly  


Related Items
  • Sun Blade 6000 System
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun FAB




In this Document
  Symptoms
  Changes
  Cause
  Solution
  References


Oracle Confidential (PARTNER). Do not distribute to customers
Reason: FABs available to Internals and Partners only

Applies to:

Sun Blade 6000 System - Version: Not Applicable to Not Applicable - Release: N/A to N/A
Information in this document applies to any platform.
__________



Escalation ID: 41143985
_________

Affected Parts: (FRU/CRU Part Number / Description)

540-7695 - 16-Port Virtualized Multi-Fabric Network Express Module (X4238)

Symptoms

SPARC blades in the Sun Blade 6000 will panic, and x86 blades will lose communication with the affected NEM.

Example panic string:
    panic[cpu100]/thread=2a104663ca0:
    Fatal error has occured in: PCIe fabric.(0x0)(0x41)

Check the FMA errors after the blade reboots. Look for the following signature to determine whether the panic was caused by a surprise down event on the NEM, which occurs when the NEM processes a KILLALL signal. The FMA event will be reported against one of the NEM modules:

    grep pcie_ue_status */fma/*fmdump*
    pcie_ue_status = 0x20 = surprise down
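
The check above can also be scripted. The following is a minimal sketch, not a supported tool: it assumes the collected fmdump output contains lines of the form `pcie_ue_status = 0x...`, as in the line shown above, and it uses the bit positions of the standard PCIe AER Uncorrectable Error Status register, in which bit 5 (0x20) is Surprise Down. The function names `decode_ue_status` and `find_surprise_down` are hypothetical.

```python
import re

# Bit positions of the PCIe AER Uncorrectable Error Status register.
UE_STATUS_BITS = {
    4: "data link protocol error",
    5: "surprise down",
    12: "poisoned TLP",
    13: "flow control protocol error",
    14: "completion timeout",
    15: "completer abort",
    16: "unexpected completion",
    17: "receiver overflow",
    18: "malformed TLP",
    19: "ECRC error",
    20: "unsupported request",
}

# Assumed line layout, based on the sample line in this document.
UE_LINE = re.compile(r"pcie_ue_status\s*=\s*(0x[0-9a-fA-F]+)")


def decode_ue_status(value):
    """Return the names of the known error bits set in value, lowest bit first."""
    return [name for bit, name in sorted(UE_STATUS_BITS.items())
            if value & (1 << bit)]


def find_surprise_down(text):
    """Return each pcie_ue_status value in text that has bit 5 (0x20) set."""
    values = [int(m.group(1), 16) for m in UE_LINE.finditer(text)]
    return [v for v in values if v & 0x20]


# Hypothetical sample text standing in for collected fmdump output.
sample = "pcie_ue_status = 0x20\npcie_ue_status = 0x0\n"
print(decode_ue_status(0x20))
print(find_surprise_down(sample))
```

A non-empty result from `find_surprise_down` corresponds to the surprise down signature described above. Confirming that the event belongs to a NEM module still requires reading the full FMA record.
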

Impact

The main power is turned off, causing the NEMHydra to power off. The operating system on each blade then reacts to the loss of a network device.

Changes

Contributing Factors

Sun Blade 6000 Virtualized Multi-Fabric 10GbE Network Express Module.

Increased I2C activity could result in a corrupted read or write to the ADM1066.

The ADM1066 is a standalone power sequencer and monitoring device. It monitors multiple voltage rails and is also responsible for initiating power-down of the NEM when the KILLALL signal is asserted. The NEM contains two ADM1066 devices.

Cause

Root Cause

Because this failure cannot be reproduced consistently, we do not know which device asserts KILLALL on the NEM's ADM1066. The SAS expander is a possible suspect because, by design, it asserts KILLALL when the I2C temperature readings (ambient and junction) exceed 75C and 120C, respectively. Although these temperatures were never observed in a failing environment, a corrupted temperature read could cause this effect.

With the KILLALL signal blocked on the ADM1066, the SAS expander can no longer shut down the NEM because of a false overtemperature reading. However, the SAS expander will still turn on the LED when the warning threshold (65C, 100C) is actually reached. In addition, when a real NEM overtemperature occurs, the voltage would increase, and the main power sequencer will detect it and shut down the NEM.
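
To illustrate how a corrupted temperature read could trigger a false shutdown, here is a minimal sketch of the threshold logic described above, as it would behave before the fix. It is a model for discussion only, not the SAS expander firmware. It assumes the first value in each pair (65C, 75C) applies to the ambient sensor and the second (100C, 120C) to the junction sensor, and that a reading is a single byte holding whole degrees C. The encoding and the names `expander_action` and `flip_bit` are hypothetical.

```python
# Thresholds quoted in this document; the ambient/junction pairing is assumed.
WARN_AMBIENT_C, WARN_JUNCTION_C = 65, 100
KILL_AMBIENT_C, KILL_JUNCTION_C = 75, 120


def expander_action(ambient_c, junction_c):
    """Model the pre-fix reaction to one pair of temperature readings."""
    if ambient_c > KILL_AMBIENT_C or junction_c > KILL_JUNCTION_C:
        return "killall"   # would power off the NEM
    if ambient_c >= WARN_AMBIENT_C or junction_c >= WARN_JUNCTION_C:
        return "warning"   # LED only
    return "ok"


def flip_bit(raw, bit):
    """Simulate a single-bit corruption of a one-byte reading (hypothetical encoding)."""
    return (raw ^ (1 << bit)) & 0xFF


good_ambient = 40                          # a normal ambient reading
corrupted = flip_bit(good_ambient, 6)      # 40 ^ 0x40 = 104
print(expander_action(good_ambient, 60))   # ok
print(expander_action(corrupted, 60))      # killall
```

In this model a single flipped bit turns a normal 40C ambient reading into 104C, which exceeds the shutdown threshold even though the real temperature never changed. This is consistent with the observation that the threshold temperatures were never seen in a failing environment.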

Solution

Workaround

No workaround available - see Resolution section.

Resolution

Patch 11884187: SUN BLADE 6000 10GBE VMF NEM SW 2.2.1 TOOLS AND DRIVERS
contains the Power Sequencer code update and the SAS firmware update.

Patch 11883817: SUN BLADE 6000 10GBE VMF NEM SW 2.2.1 FIRMWARE contains the
firmware for the NEM/SP.

This ILOM package contains fixes for the following reported issues:

   1.  7017229 - Blades fail to attach NEM Hydra on reboot or after crash
   2.  7010225 - Onbox legal notices need to be updated to 2011

Refer to the attached document for the Power Sequencer firmware update instructions. The procedure requires ILOM "escalation mode" and an FE on-site to perform it.

References


For information about FAB documents, their release processes, implementation strategies, and billing information, go to the following URL:

* http://tns.central.sun.com/fab

In addition to the above you may email:

* [email protected]


Contacts

Contributor: [email protected]
Responsible Engineer: [email protected]
Responsible Manager: [email protected]
Business Unit Group: Systems Group-x64 (X4100-X4600 (and M2), V20z/V40z/V60z/V65z, @Ultra20/40 (and M2) Workstations), Systems Group-SVS (SPARC Volume Systems, Horizontal @Systems,(includes T2000/Ontario)

References

<PATCH:11884187> - SUN BLADE 6000 10GBE VMF NEM SW 2.2.1 TOOLS AND DRIVERS
<PATCH:11883817> - SUN BLADE 6000 10GBE VMF NEM SW 2.2.1 FIRMWARE

Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.