Asset ID: 1-75-1017618.1
Update Date: 2012-10-12
Solution Type: Troubleshooting Sure
Solution 1017618.1: All connectivity (data and management) is lost to a Sun StorEdge 35XX / 33XX array. Both controller status LEDs are blinking green
Related Items
- Sun Storage 3511 SATA Array
- Sun Storage 3510 FC Array
- Sun Storage 3310 Array
- Sun Storage 3320 SCSI Array
Related Categories
- PLA-Support>Sun Systems>DISK>Arrays>SN-DK: SE31xx_33xx_35xx
- .Old GCS Categories>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays
Previously Published As: 228794
This document describes how to resolve a RAID controller "race condition" on a Storage 3310,
Applies to:
Sun Storage 3510 FC Array - Version Not Applicable and later
Sun Storage 3511 SATA Array - Version Not Applicable and later
Sun Storage 3310 Array - Version Not Applicable and later
Sun Storage 3320 SCSI Array - Version Not Applicable and later
All Platforms
Purpose
This document explains the symptoms of, and the resolution for, a total loss of connectivity to a Sun StorEdge 3X00 array. The problem has become known as a "race condition": the array enters a state in which both controllers assume the primary role. You know this has occurred when both controller status LEDs are blinking green. Other criteria include:
- BOTH RAID controller status LEDs are flashing green.
- The TCP/IP (Ethernet) connection may not respond.
- The serial console (Console Menu Interface) sends garbled characters.
- You cannot reach the array via sccli.
- I/O to the host may stop as well.
This document does not apply to single RAID controller 35X0 and 33X0 arrays.
Explanation:
In this situation, the RAID controllers have entered a "race condition" in which both are functioning as the Primary Controller. Only one controller may be the primary; the other must be the secondary. The best way to determine the primary and secondary roles is a visual inspection of the status LED on each controller. In normal operation, the Primary status LED blinks green and the Secondary status LED is solid green.
In a race condition, both controllers try to:
- Service requests to the LUNs on all of the host channels.
- Send and receive on the serial port (which is a common bus for both controllers).
- Send and receive data on the Ethernet port using the same IP and MAC addresses.
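The LED rule from the explanation can be written down as a small decision table. The shell sketch below is illustrative only: the function name classify_leds and the state words "blinking" and "solid" are assumptions made for this example, and the LED states still have to be read by eye from the chassis.

```shell
#!/bin/sh
# Hypothetical helper: classify the controller roles from the two status LEDs.
# Arguments: state of the TOP controller LED, state of the BOTTOM controller LED.
# Accepted state words ("blinking", "solid") are assumptions for this sketch.
classify_leds() {
    cl_top=$1
    cl_bottom=$2
    case "$cl_top/$cl_bottom" in
        # Both controllers claim the primary role.
        blinking/blinking) echo "race-condition" ;;
        # Exactly one primary (blinking) and one secondary (solid).
        blinking/solid|solid/blinking) echo "normal" ;;
        # Anything else is outside the rule described in this document.
        *) echo "unknown" ;;
    esac
}
```

For example, classify_leds blinking blinking reports the race condition, while classify_leds blinking solid reports the normal primary/secondary pairing.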
Cause:
This problem can be attributed to an improper controller firmware upgrade, or it may occur when a 3.2x controller is installed in a 4.x array. Specifically, there is an NVRAM mismatch somewhere among the NVRAM on each controller and the NVRAM on the disks.
Troubleshooting Steps
The following procedure is used to resolve race conditions. It requires:
- Downtime (roughly 1 hour).
- sccli access (preferably out-of-band).
- Console access. (You will need to tip to the serial port of the controller.)
- A previous show_configuration.xml file from either explorer or se3kxtr.
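Before starting the downtime, it may be worth confirming that the saved configuration file and the sccli binary are actually present on the host. The sketch below is one way to do that; the function name preflight is hypothetical, and the paths in the usage note are simply the ones that appear in this document, so adjust them to your installation.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of sccli or any Sun tool).
# $1: path to the saved show_configuration.xml
# $2: path to the sccli binary
# Returns 0 when both checks pass, 1 otherwise.
preflight() {
    pf_config=$1
    pf_sccli=$2
    pf_rc=0
    # The configuration file must exist and be non-empty.
    if [ ! -s "$pf_config" ]; then
        echo "missing or empty configuration file: $pf_config" >&2
        pf_rc=1
    fi
    # The sccli binary must exist and be executable.
    if [ ! -x "$pf_sccli" ]; then
        echo "sccli not found or not executable: $pf_sccli" >&2
        pf_rc=1
    fi
    return $pf_rc
}
```

A typical call would be preflight /var/tmp/show_configuration.xml /usr/sbin/sccli. The check only confirms that the files exist; it does not validate the contents of the XML file.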
Steps to Follow:
- Stop all Host I/O. Start downtime.
- Power off the RAID chassis using the power switches on the two PCUs.
- Pull the BOTTOM RAID controller module part way out of the RAID chassis.
- Wait 10 seconds, then power the RAID chassis on.
- Wait for the TOP RAID controller module to boot up (approx. 90 seconds).
- From the host, start an sccli session: in-band -> /usr/sbin/sccli, out-of-band (oob) -> /usr/sbin/sccli <ip-address>
- Select (or verify that sccli displays) the correct S/N for the RAID chassis you are working on.
S/N: The serial number of the RAID chassis can be found on a sticker on the
bottom left side of the disk drive bay of the RAID chassis. If the sticker has
the number 0451-0408008DB3, the S/N of the RAID chassis would be 008DB3.
- Run the sccli command sccli> reset nvram and, when it completes, exit sccli.
- Power off the RAID chassis using the power switches on the two PCUs.
- Fully insert the BOTTOM RAID controller module (ensure it is fully seated).
- Pull the TOP RAID controller module part way out of the RAID chassis.
- Power the RAID chassis on.
- Wait for the BOTTOM RAID controller module to boot up (approx. 90 seconds).
- From the host, start an sccli session: in-band -> /usr/sbin/sccli, out-of-band (oob) -> /usr/sbin/sccli <ip-address>
- Select (or verify that sccli displays) the correct S/N for the RAID chassis you are working on.
S/N: The serial number of the RAID chassis can be found on a sticker on the
bottom left side of the disk drive bay of the RAID chassis. If the sticker has
the number 0451-0408008DB3, the S/N of the RAID chassis would be 008DB3.
- Run the sccli command sccli> reset nvram and, when it completes, exit sccli.
- Power off the RAID chassis using the power switches on the two PCUs.
- Reinsert the TOP RAID controller.
- Power up the RAID array.
- From sccli, issue the sccli> show redundancy command and verify that the Status is "Enabled".
- Verify that the IP address is set correctly on the array with sccli> show ip. If it is incorrect, reset the IP address from the console: using the serial Console Menu Interface, select "view and Edit Configuration Parameters" -> Communication Parameters -> TCP/IP Address.
- Restore the configuration. In this example, the show_configuration.xml file is restored via the in-band sccli device c4t4d0:
/opt/SUNWsscs/sbin/s3kdlres /var/tmp/show_configuration.xml --device=/dev/rdsk/c4t4d0s2
See the Sun StorEdge 3000 Family RAID Controller Firmware Migration Guide for more details on the s3kdlres utility.
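Two of the checks in the steps above lend themselves to small shell helpers, sketched here under stated assumptions. chassis_sn assumes the chassis S/N is always the last six characters of the sticker number, which matches the single example in this document (0451-0408008DB3 gives 008DB3) but is not confirmed for every sticker format. redundancy_enabled assumes the show redundancy output contains a line with the word "Status" followed by "Enabled", as the verification step describes; the exact output layout is not shown in this document. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: derive the chassis S/N from the sticker number.
# Assumption: the S/N is the last six characters of the sticker number.
chassis_sn() {
    cs_sticker=$1
    if [ "${#cs_sticker}" -lt 6 ]; then
        echo "sticker number too short: $cs_sticker" >&2
        return 1
    fi
    # Greedy match leaves only the final six characters.
    printf '%s\n' "$cs_sticker" | sed 's/.*\(......\)$/\1/'
}

# Hypothetical helper: read "show redundancy" output on stdin and succeed
# only if a line mentions a status of Enabled.
# Assumption: the status and the word "Enabled" appear on the same line.
redundancy_enabled() {
    grep -i 'status.*enabled' >/dev/null
}
```

chassis_sn 0451-0408008DB3 prints 008DB3, which can be compared with the S/N that sccli displays before running reset nvram. redundancy_enabled is meant to be fed the captured output of the show redundancy command, for example from a saved session log; if the real output is laid out differently, the pattern needs to be adapted.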
Attachments
This solution has no attachment