Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1021661.1
Update Date: 2010-06-24
Keywords:

Solution Type  FAB (standard) Sure

Solution  1021661.1 :   J4400 SIM cards randomly failing due to heartbeat timeout.  


Related Items
  • Sun Storage J4400 Array
  • Sun Storage 7310 Unified Storage System
  • Sun Storage 7410 Unified Storage System
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Reactive

PreviouslyPublishedAs
273189


Oracle Confidential (PARTNER). Do not distribute to customers
Reason: FABs available to Partners and Internals only.

Applies to:

Sun Storage 7310 Unified Storage System
Sun Storage 7410 Unified Storage System
Sun Storage J4400 Array
All Platforms
__________

SUNBUG 6803801

Affected Parts:

375-3584 - J4400 SAS Interface Module (SIM)

Symptoms

This SIM failure is indicated by a lit blue LED on the failed SIM (visible from the rear of the chassis). The failure is also visible in the number of paths associated with a particular JBOD in the BUI "Maintenance->Hardware" view: a JBOD with a failed SIM reports only 1 path instead of the usual 2. The combination of a lit blue LED on the SIM and the missing path in the "Maintenance->Hardware" view is the definitive symptom of this condition. Additionally, the Back view of the JBOD chassis shows the failed SIM as missing. At the time of failure, the appliance logs an alert such as the following:

   The component 'SIM (0|1)' has been removed from chassis 'XYZ'
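
For anyone triaging exported alert text, the symptom check above can be sketched in a few lines of Python. This is only an illustration: the regular expression is built from the example alert in this document (reading 'SIM (0|1)' as "SIM 0 or SIM 1"), and the function names, the list-of-strings input and the path-count/LED arguments are hypothetical, not part of any appliance interface.

```python
import re

# Pattern derived from the example alert in this document. The exact wording
# in a real appliance log is an assumption.
ALERT_RE = re.compile(
    r"The component 'SIM (?P<sim>[01])' has been removed "
    r"from chassis '(?P<chassis>[^']+)'"
)


def find_removed_sims(log_lines):
    """Return (chassis, sim_number) tuples for every matching alert line."""
    hits = []
    for line in log_lines:
        match = ALERT_RE.search(line)
        if match:
            hits.append((match.group("chassis"), int(match.group("sim"))))
    return hits


def is_definitive_symptom(path_count, blue_led_lit):
    """True when a JBOD shows one path AND the SIM's blue LED is lit."""
    return path_count == 1 and blue_led_lit
```

Both observations still have to be collected by hand from the "Maintenance->Hardware" view and from the rear of the chassis; the sketch only encodes how they combine.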

Impact

A random heartbeat timeout causes one of the two SIMs in a JBOD to go offline, which is indicated by a blue light on the failed SIM. Once a SIM has failed, the JBOD has only one path available to connect the appliance head to the disks. Re-seating the failed SIM clears the failure.

Changes

Contributing Factors

The products listed above are subject to this issue when running SIM firmware earlier than 3R24.
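
A minimal sketch of that exposure check in Python follows. It assumes SIM firmware strings always take the form digits, the letter R, digits (as in "3R24") and that releases order numerically on those two fields; neither assumption is confirmed by this document, and the function names are hypothetical.

```python
import re

FIXED_RELEASE = "3R24"  # first release reported to resolve the issue


def parse_sim_firmware(version):
    """Split a string such as '3R24' into (3, 24). The format is assumed."""
    match = re.fullmatch(r"(\d+)R(\d+)", version.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized SIM firmware string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_exposed(version, fixed=FIXED_RELEASE):
    """True when the given firmware is earlier than the fixed release."""
    return parse_sim_firmware(version) < parse_sim_firmware(fixed)
```

For example, `is_exposed("3R21")` evaluates to True and `is_exposed("3R24")` to False; "3R21" is a made-up value used only to exercise the comparison.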

The SIM failure condition is sporadic in nature. Customers with larger configurations tend to see this issue more often than customers with smaller configurations because each additional JBOD adds exposure. Among large configurations, some customers see this problem more often than others. Because manual intervention (re-seating the SIM) is required to clear the failure, customers who do not notice it tend to accumulate failures on multiple JBODs over time.

Cause

Root Cause

The SIM failure is caused by a missed heartbeat signal. The SIM that detects the heartbeat timeout disables its peer, on the assumption that the peer is hung or otherwise non-functional. See CR 6803801 for more details. Sun engineering has very strong evidence to suggest that upgrading the SIM firmware to 3R24 resolves this issue.
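
The general pattern described above (a watchdog that disables its peer after a missed heartbeat, with the peer staying disabled until someone intervenes) can be illustrated with a short Python model. This is a generic sketch, not the SIM firmware logic: the class name, the timeout value and the explicit timestamps are all hypothetical.

```python
class PeerWatchdog:
    """Toy model of one SIM watching the heartbeat of its peer."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s    # hypothetical timeout, in seconds
        self.last_heartbeat = None    # time the last heartbeat was seen
        self.peer_enabled = True

    def heartbeat(self, now):
        """Record a heartbeat received from the peer at time `now`."""
        self.last_heartbeat = now

    def poll(self, now):
        """Disable the peer if its heartbeat is overdue; return its state."""
        overdue = (
            self.last_heartbeat is not None
            and now - self.last_heartbeat > self.timeout_s
        )
        if self.peer_enabled and overdue:
            self.peer_enabled = False  # peer presumed hung
        return self.peer_enabled

    def reseat(self, now):
        """Model the manual re-seat that brings the peer back online."""
        self.peer_enabled = True
        self.last_heartbeat = now
```

Note that in this model a late heartbeat does not re-enable the peer; only `reseat()` does, mirroring the manual intervention this document describes.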

Solution

Implementation: Reactive

Workaround


Manually re-seat the failed SIM card. This may be done while the system is running, but care should be taken not to disturb the cabling to the remaining SIM or to other JBODs in the chain.

Resolution

Firmware 3R24 must be loaded on each attached JBOD SIM card in order to resolve the "Blue Light Special" issue. Firmware 3R24 is bundled with Appliance SW 2010.Q1 or later and is automatically updated once the Appliance SW is installed.

The Release Notes for installing Sun Storage 7000 Software Update 2010.Q1.1.0 or later can be found here:

   http://wikis.sun.com/display/FishWorks/ak-2010.02.09.1.0+Release+Notes

and the release itself is linked from the Software Updates page:

   http://wikis.sun.com/display/FishWorks/Software+Updates


Identification of Affected Parts (how to)

As noted in the "Symptoms" section, SIM status is indicated by the number of paths associated with a JBOD chassis. A lit blue LED on the rear of a SIM indicates a failure.

References

Bug Id: 6803801


For information about FAB documents, their release processes, implementation strategies and billing information, go to the following URL:

* http://tns.central/fab

In addition to the above you may email:

* [email protected]



Internal Contributor/submitter
[email protected]

Internal Eng Responsible Engineer
[email protected]

Responsible Manager:
[email protected]

Internal Services Knowledge Engineer
[email protected]

Internal Eng Business Unit Group
NWS (Network Storage)

Internal Sun Alert & FAB Admin Info
20-Nov-2009: Completed draft and sent to Extended Review.
24-Nov-2009: No feedback from Ext Rvw - sending to Publish.
23-Jun-2010: Major rewrite of the Solution section.

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.