Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-75-1017618.1
Update Date: 2012-10-12
Keywords:

Solution Type  Troubleshooting Sure

Solution  1017618.1 :   All connectivity (data and management) is lost to a Sun StorEdge 35XX / 33XX array. Both controller status LEDs are blinking green


Related Items
  • Sun Storage 3511 SATA Array
  • Sun Storage 3510 FC Array
  • Sun Storage 3310 Array
  • Sun Storage 3320 SCSI Array
Related Categories
  • PLA-Support>Sun Systems>DISK>Arrays>SN-DK: SE31xx_33xx_35xx
  • .Old GCS Categories>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays

PreviouslyPublishedAs
228794
This document describes how to resolve a RAID controller "race condition" on a Storage 3310,

Applies to:

Sun Storage 3510 FC Array - Version Not Applicable and later
Sun Storage 3511 SATA Array - Version Not Applicable and later
Sun Storage 3310 Array - Version Not Applicable and later
Sun Storage 3320 SCSI Array - Version Not Applicable and later
All Platforms

Purpose

This document explains the symptoms of, and the resolution for, a total loss of connectivity to a Sun StorEdge 3X00 array. The problem has become known as a "race condition": the array goes into a state where both controllers assume the role of primary. The main indicator is that both controller status LEDs are blinking green. The full set of symptoms is:

  • BOTH RAID controller status LEDs are flashing green.
  • The TCP/IP (Ethernet) connection may not respond.
  • The serial console (Console Menu Interface) sends garbled characters.
  • You cannot reach the array via sccli.
  • I/O to the host may stop as well.

 

This document does not apply to 35X0 and 33X0 arrays with a single RAID controller.


Explanation:

In this situation, the RAID controllers have gotten into a "race condition" in which both are functioning as the Primary Controller. Only one controller may be the primary; the other must be the secondary. The best way to determine the primary and secondary roles is a visual inspection of the status LED on each controller. In normal operation, the Primary status LED blinks green and the Secondary status LED is solid green.
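
The LED rule above can be written down as a small lookup. The following is only a sketch for note-taking during a site visit: the function name classify_leds and the labels "blinking" and "solid" are made up for this example and are not part of sccli or any Sun tool.

```sh
#!/bin/sh
# Hypothetical helper: classify the array state from the two status LEDs
# as observed by eye. Arguments: state of the TOP controller LED, then
# state of the BOTTOM controller LED, each "blinking" or "solid".
classify_leds() {
    top=$1
    bottom=$2
    case "$top/$bottom" in
        blinking/blinking) echo "race condition: both controllers act as primary" ;;
        blinking/solid)    echo "normal: top is primary, bottom is secondary" ;;
        solid/blinking)    echo "normal: bottom is primary, top is secondary" ;;
        *)                 echo "unknown: not covered by this document" ;;
    esac
}

# Usage example:
#   classify_leds blinking blinking
```

Any combination other than the three described in this document is reported as unknown rather than guessed at.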

In a race condition, both controllers try to:

  • Service requests to the LUNs on all of the host channels.
  • Send and receive on the serial port (which is a common bus for both controllers).
  • Send and receive data on the Ethernet port using the same IP and MAC addresses.


Cause:

This problem can be attributed either to an improper controller firmware upgrade or to the installation of a 3.2x controller in a 4.x array. In both cases, there is a mismatch somewhere between the NVRAM on each controller and the NVRAM data on the disks.
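
As a precaution before installing a replacement controller, the firmware families can be compared by hand or with a few lines of shell. This is a minimal sketch, assuming the firmware versions are already known as strings of the form major.minor (the values in the usage example are illustrative only). It does not read anything from the array, and it does not detect an NVRAM mismatch; it only flags the 3.x versus 4.x case named above. The function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print the major part of a firmware version string,
# that is, everything before the first dot ("3.27R" gives "3").
fw_major() {
    echo "${1%%.*}"
}

# Succeeds (exit status 0) only if both version strings share the same
# major number. A 3.x controller paired with a 4.x array fails this check.
same_fw_family() {
    [ "$(fw_major "$1")" = "$(fw_major "$2")" ]
}

# Usage example (illustrative version strings):
#   same_fw_family 3.27R 4.21E || echo "firmware families differ"
```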



 

Troubleshooting Steps

The following procedure is used to resolve race conditions. It requires:

  • Downtime (roughly 1 hour).
  • sccli access (preferably out-of-band).
  • Console access. (You will need to tip to the serial port of the controller.)
  • A previous show_configuration.xml file from either explorer or se3kxtr.
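
Two of these requirements can be checked on the host before the downtime starts. The sketch below is an assumption-laden convenience, not part of the procedure: the function name check_prereqs is hypothetical, the default sccli path is the one used later in this document, and the script only confirms that the files exist. It cannot confirm that the saved configuration is current or complete.

```sh
#!/bin/sh
# Hypothetical pre-flight check. Arguments:
#   $1  path to the saved show_configuration.xml file
#   $2  path to the sccli binary (defaults to /usr/sbin/sccli)
# Returns 0 only if the file is readable and non-empty and sccli is executable.
check_prereqs() {
    config_file=$1
    sccli_bin=${2:-/usr/sbin/sccli}
    status=0
    if [ ! -r "$config_file" ] || [ ! -s "$config_file" ]; then
        echo "missing or empty configuration file: $config_file" >&2
        status=1
    fi
    if [ ! -x "$sccli_bin" ]; then
        echo "sccli not found or not executable: $sccli_bin" >&2
        status=1
    fi
    return $status
}

# Usage example:
#   check_prereqs /var/tmp/show_configuration.xml && echo "ready to start"
```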

 

Steps to Follow:

  • Stop all host I/O. The downtime starts here.
  • Power off the RAID chassis using the power switches on the two PCUs.
  • Pull the BOTTOM RAID controller module part way out of the RAID chassis.
  • Wait 10 seconds, then power the RAID chassis on.
  • Wait for the TOP RAID controller module to boot up (approx. 90 seconds).
  • From the host, start an sccli session.
    In-band:      /usr/sbin/sccli
    Out-of-band:  /usr/sbin/sccli <ip-address>
  • Select (or verify that sccli displays) the correct S/N for the RAID chassis you are working on.
    S/N  The serial number of the RAID chassis can be found on a sticker on the
         bottom left side of the disk drive bay of the RAID chassis. If the sticker has
         the number 0451-0408008DB3, the S/N of the RAID chassis is 008DB3.
  • Run the sccli command   sccli> Reset Nvram   When it completes, exit sccli.
  • Power off the RAID chassis using the power switches on the two PCUs.
  • Fully insert the BOTTOM RAID controller module (ensure it is fully seated).
  • Pull the TOP RAID controller module part way out of the RAID chassis.
  • Power the RAID chassis on.
  • Wait for the BOTTOM RAID controller module to boot up (approx. 90 seconds).
  • From the host, start an sccli session.
    In-band:      /usr/sbin/sccli
    Out-of-band:  /usr/sbin/sccli <ip-address>
  • Select (or verify that sccli displays) the correct S/N for the RAID chassis you are working on.
    S/N  The serial number of the RAID chassis can be found on a sticker on the
         bottom left side of the disk drive bay of the RAID chassis. If the sticker has
         the number 0451-0408008DB3, the S/N of the RAID chassis is 008DB3.
  • Run the sccli command   sccli> Reset Nvram   When it completes, exit sccli.
  • Power off the RAID chassis using the power switches on the two PCUs.
  • Reinsert the TOP RAID controller module.
  • Power up the RAID array.
  • From sccli, issue the   sccli> show redundancy   command and verify that the Status is "Enabled".
  • Verify that the IP address is set correctly on the array with   sccli> show ip   If it is incorrect, reset the IP address from the console: in the serial Console Menu Interface, select "view and Edit Configuration Parameters" -> Communication Parameters -> TCP/IP Address.
  • Restore the configuration. In this example, the show_configuration.xml file is restored via the in-band sccli device c4t4d0:
    /opt/SUNWsscs/sbin/s3kdlres /var/tmp/show_configuration.xml --device=/dev/rdsk/c4t4d0s2
    
  • See the Sun StorEdge 3000 Family RAID Controller Firmware Migration Guide for more details on the s3kdlres utility.
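
The S/N rule used in the steps above (sticker 0451-0408008DB3 gives S/N 008DB3) can be sketched in shell for use when matching the chassis against what sccli displays. The document gives only that one example, so reading the rule as "the last six characters of the sticker number" is an assumption, and the function name chassis_sn is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: derive the chassis S/N from the sticker number.
# Assumption: the S/N is the last six characters of the sticker number,
# inferred from the single example given in this document.
chassis_sn() {
    sticker=$1
    if [ "${#sticker}" -lt 6 ]; then
        echo "sticker number too short: $sticker" >&2
        return 1
    fi
    prefix=${sticker%??????}     # everything except the last six characters
    sn=${sticker#"$prefix"}      # strip that prefix, leaving the last six
    echo "$sn"
}

# Usage example:
#   chassis_sn 0451-0408008DB3    # prints 008DB3
```

Compare the printed value with the S/N that sccli reports before issuing any command that changes the array.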
    
    

Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.