Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1163816.1
Update Date:2012-03-20
Keywords:

Solution Type  Problem Resolution Sure

Solution  1163816.1 :   Sun Storage 7000 Unified Storage System: SAS Interconnect Module (SIM) failure with blue LED  


Related Items
  • Sun Storage 7410 Unified Storage System
  • Sun Storage 7210 Unified Storage System
  • Sun Storage 7310 Unified Storage System
  • Sun Storage 7110 Unified Storage System
Related Categories
  • PLA-Support>Sun Systems>DISK>NAS>SN-DK: 7xxx NAS
  • .Old GCS Categories>Sun Microsystems>Storage - Disk>Unified Storage




In this Document
  Symptoms
  Changes
  Cause
  Solution
  References


Applies to:

Sun Storage 7410 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]
Sun Storage 7210 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]
Sun Storage 7110 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]
Sun Storage 7310 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]
Information in this document applies to any platform.

Symptoms

  • Blue LED lit on the failed SIM (visible from the rear of the chassis).
  • One or more JBODs with fewer than two paths listed in the BUI 'Maintenance->Hardware' view.
  • Alert, log message or active problem related to the loss of a path:
    • Alert example: The component 'SIM (0|1)' has been removed from chassis 'XYZ'
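
As an illustration of the alert symptom above, the following is a minimal sketch of how such a message could be recognized in exported alert text. The pattern is modeled only on the example alert shown here; the exact wording of real alerts may differ between software releases, and the function name parse_sim_removed is hypothetical.

```python
import re

# Pattern modeled on the alert example in this document (assumption:
# real alerts use the same wording; verify against your own logs).
SIM_REMOVED = re.compile(
    r"The component 'SIM (?P<sim>[01])' has been removed "
    r"from chassis '(?P<chassis>[^']+)'"
)


def parse_sim_removed(message):
    """Return (sim_number, chassis_name) for a SIM-removed alert, else None."""
    match = SIM_REMOVED.search(message)
    if match is None:
        return None
    return int(match.group("sim")), match.group("chassis")


if __name__ == "__main__":
    sample = "The component 'SIM 1' has been removed from chassis 'XYZ'"
    print(parse_sim_removed(sample))
```

A match identifies which SIM (0 or 1) and which chassis to inspect for the blue LED.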

Changes

N/A

Cause

The SIM failure is caused by a missed heartbeat signal. The SIM that detects the heartbeat timeout
disables its peer, on the assumption that the peer is hung or otherwise non-functional.
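
The mechanism described above can be pictured with a small model. This is only a conceptual sketch, not the SIM firmware logic: the Sim class, the check_peer function and the timeout value are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sim:
    """Toy model of one SIM in a JBOD (hypothetical, for illustration only)."""
    name: str
    last_heartbeat_seen: float = 0.0  # when this SIM last saw its peer's heartbeat
    enabled: bool = True


def check_peer(observer, peer, now, timeout):
    """Disable `peer` if `observer` has seen no heartbeat from it within `timeout`.

    Returns True if this call disabled the peer.
    """
    if not observer.enabled or not peer.enabled:
        return False
    if now - observer.last_heartbeat_seen > timeout:
        # The observer assumes the peer is hung and takes it offline.
        peer.enabled = False
        return True
    return False


if __name__ == "__main__":
    sim0 = Sim("SIM 0", last_heartbeat_seen=10.0)
    sim1 = Sim("SIM 1")
    print(check_peer(sim0, sim1, now=10.5, timeout=1.0))  # within the timeout
    print(check_peer(sim0, sim1, now=12.0, timeout=1.0))  # heartbeat missed
    print(sim1.enabled)
```

The model shows why a single missed heartbeat is enough to take a healthy SIM offline: the observer cannot distinguish a delayed heartbeat from a hung peer.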


For SAS-1 systems, see <SUNBUG: 6803801> for more details.  Sun engineering has
strong evidence that upgrading the SIM firmware to 3R24 resolves this issue.

For SAS-2 systems, see <SUNBUG: 7017185> for more details.  This is fixed by
upgrading to the 2010.Q3.4 release.
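
To check whether a given release name already includes the SAS-2 fix, release names can be compared numerically. The sketch below assumes names of the form YYYY.Qn or YYYY.Qn.m as used in this document; the appliance may report its version in a different format, and the helper names parse_release and has_sas2_fix are hypothetical.

```python
import re


def parse_release(name):
    """Turn a release name such as '2010.Q3.4' into a comparable tuple (2010, 3, 4)."""
    match = re.fullmatch(r"(\d{4})\.Q([1-4])((?:\.\d+)*)", name.strip())
    if match is None:
        raise ValueError(f"unrecognized release name: {name!r}")
    year, quarter, rest = match.groups()
    micro = tuple(int(part) for part in rest.split(".") if part)
    return (int(year), int(quarter)) + micro


def has_sas2_fix(release):
    """True if `release` is 2010.Q3.4 or later (the fix level named in this document)."""
    return parse_release(release) >= parse_release("2010.Q3.4")


if __name__ == "__main__":
    for name in ("2010.Q1", "2010.Q3.3", "2010.Q3.4", "2011.Q1.2"):
        print(name, has_sas2_fix(name))
```

Tuple comparison treats 2010.Q3 as earlier than 2010.Q3.4, which matches the intent of "2010.Q3.4 or later".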

See also FAB 1021661.1 (J4400 SIM cards randomly failing due to heartbeat timeout).


Solution

Steps to follow:

1. Physically re-seat the SIM. The SIM is hot-pluggable and, because the Appliance
    already regards the failed SIM as not present, it is safe to re-seat.

2. If possible, upgrade the system software to 2010.Q1 or later. If not, continue with step 4.

    To download software from the My Oracle Support Release Updates page:
      1. Sign in to My Oracle Support at https://support.oracle.com
      2. Select the "Patches & Updates" tab.
      3. Search by the Sun ZFS Storage Appliance product family.
      4. Download the zip file to your local system and unzip it.
    [ For SAS-2 systems, upgrade system software to 2010.Q3.4 or later. ]


3. Wait for the SIM firmware update to complete. If there's no progress monitor available, allow 15 minutes per JBOD.

4. Navigate in the BUI to Maintenance->Problems.  Select any path faults, if present.  Click the 'Mark Repaired' button.
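
For planning purposes, the allowance in step 3 scales with the number of JBODs. The following sketch simply multiplies the 15 minutes per JBOD stated above; it assumes the allowances add up across JBODs, and the function name firmware_wait_allowance is hypothetical.

```python
from datetime import timedelta

MINUTES_PER_JBOD = 15  # allowance stated in step 3 of this document


def firmware_wait_allowance(jbod_count):
    """Return the total time to allow for SIM firmware updates across all JBODs."""
    if jbod_count < 0:
        raise ValueError("jbod_count must not be negative")
    return timedelta(minutes=MINUTES_PER_JBOD * jbod_count)


if __name__ == "__main__":
    # Four JBODs: allow one hour when no progress monitor is available.
    print(firmware_wait_allowance(4))
```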


You may lose access to your storage pool for the duration of the SIM failure, which in turn could cause a reboot and/or failure.
This would generally happen only if the system is incorrectly cabled (i.e. no alternate path is available) or if multiple SIMs fail.
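
The redundancy condition above can be expressed as a simple check: a JBOD is at risk when it has fewer than two paths. The sketch below works on path counts transcribed by hand from the BUI 'Maintenance->Hardware' view; the input format and the function name jbods_without_redundancy are hypothetical and not an appliance interface.

```python
def jbods_without_redundancy(path_counts):
    """Return the sorted names of JBODs that have fewer than two paths.

    `path_counts` maps a JBOD name to the number of paths listed for it.
    """
    return sorted(name for name, count in path_counts.items() if count < 2)


if __name__ == "__main__":
    # Example values are made up for illustration.
    observed = {"jbod-a": 2, "jbod-b": 1, "jbod-c": 2}
    print(jbods_without_redundancy(observed))
```

Any JBOD returned by this check would lose pool access if its remaining path also failed.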


NOTE:  Under no circumstances should you attempt to update the SIM firmware
            or anything on the appliance other than the system software without the
            direct involvement of Technical Support.


Additional Resources:

Appliance help under Installation for diagrams of correct cabling for 7310 and 7410 systems.
Appliance help under Maintenance:System:Updates for software upgrade procedure and related information.
Appliance help under Maintenance:Problems for help with the Fault Management (FMA) subsystem.




References

<NOTE:1307224.1> - FAB: Standard: Reactive: RW2 SIM heartbeat timeout (aka White Light Special) scenario with RW2, which will cause one of the SIMs to go offline and show up as one of the paths to the JBOD removed.

  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.