Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-71-1450121.1
Update Date:2012-06-21
Keywords:

Solution Type  Technical Instruction Sure

Solution  1450121.1 :   How to Resolve a Missing Hot Spare Drive  


Related Items
  • Sun Storage 6580 Array
  • Sun Storage Flexline 380 Array
  • Sun Storage Flexline 240 Array
  • Sun Storage 2540-M2 Array
  • Sun Storage 6130 Array
  • Sun Storage Flexline 280 Array
  • Sun Storage 2510 Array
  • Sun Storage 6180 Array
  • Sun Storage 2540 Array
  • Sun Storage 6540 Array
  • Sun Storage 6780 Array
  • Sun Storage 2530-M2 Array
  • Sun Storage 2530 Array
  • Sun Storage Flexline 210 Array
  • Sun Storage 6140 Array
Related Categories
  • PLA-Support>Sun Systems>DISK>Arrays>SN-DK: 6140_6180


How to recover from and avoid a missing Hot Spare condition.

In this Document
Goal
Fix


Applies to:

Sun Storage 6140 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 6780 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 6130 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage Flexline 280 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2530-M2 Array - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Goal

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community, Storage Disk 2000, 3000, 6000 RAID Arrays & JBODs Community.

The intent of this document is to provide the steps to resolve the alarm xx.66.1203 - Missing Hot Spare Drive, which can be raised by the array management software. Although this problem can occur with both 6.x and 7.x controller firmware, it is much more prevalent with 7.x. If you see this problem on an array running 6.x controller firmware, please contact Oracle Support for a resolution. The remainder of this document assumes 7.x controller firmware.

There are two primary ways this problem can be created:

  1. Physically removing a drive that is assigned as a Hot Spare Drive.
  2. Replacing a failed Hot Spare Drive without first unassigning it.

Because the first cause is easily fixed by reinserting the removed Hot Spare Drive, this document focuses on the second cause. The second cause is easily avoided by unassigning the failed Hot Spare Drive from the Hot Spare list before attempting to replace it.

Fix

 

  1. Collect a supportData bundle.

    1. Reference <Document 1002514.1> Collecting Support Data for Arrays using Sun StorageTek[TM] Common Array Manager.
    2. Reference <Document 1014074.1> Collecting Support Data from Arrays using Sun StorageTek[TM] SANtricity Storage Manager.

     

  2. Verify the Missing Hot Spare Drive condition.

    1. Unzip the supportData bundle from step 1.
    2. For Sun Storage Common Array Manager (CAM) users, look at the alarms.txt file for an alarm Gridcode of xx.66.1203.
    3. For SANtricity users, look in recoveryGuruProcedures.html for HOT_SPARE_MISSING-Recovery Failure Type Code: 203.
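
The check in step 2 can be scripted once the bundle is unzipped. The following is a minimal sketch, not an Oracle-supplied tool: the function name find_missing_hot_spare is hypothetical, the internal layout of the bundle is not assumed (the files are located by name anywhere under the directory), and the exact line format of alarms.txt and recoveryGuruProcedures.html is not assumed either, so the script only searches each line for the two strings named above.

```python
import re
from pathlib import Path

# Gridcode named in this document: any prefix (the "xx") followed by .66.1203.
CAM_ALARM = re.compile(r"\b\w+\.66\.1203\b")
# Recovery Guru failure name given in this document.
SANTRICITY_FAILURE = "HOT_SPARE_MISSING"


def find_missing_hot_spare(bundle_dir):
    """Return (file name, matching line) pairs found in an unzipped bundle."""
    checks = {
        "alarms.txt": lambda line: CAM_ALARM.search(line) is not None,
        "recoveryGuruProcedures.html": lambda line: SANTRICITY_FAILURE in line,
    }
    hits = []
    for name, matches in checks.items():
        # rglob is used because the directory layout of the bundle is not assumed.
        for path in Path(bundle_dir).rglob(name):
            text = path.read_text(errors="replace")
            for line in text.splitlines():
                if matches(line):
                    hits.append((path.name, line.strip()))
    return hits
```

An empty result means neither string was found; it does not by itself prove that the array is healthy.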

     

  3. Attempt to unassign the hot spare drive.

    CAM Browser interface:

    1. Select the array in the navigation tree on the left.
    2. Expand Physical Devices.
    3. Select Disks.
    4. Navigate to and select the hot spare disk.
    5. From the Disk Details page, select the "Unassign Hot-Spare" button if it is enabled.

    NOTE: The fault should clear within a few minutes of making this change. If the operation returns an error or the fault does not clear within five minutes, please contact Oracle Support.


    CAM command line:

    # service -d <array_name> -c unassign -t tNNdYY


    where NN is the tray number and YY is the slot number of the drive in question.
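
As an illustration of the tNNdYY target, the sketch below builds the argument list for the service command from a tray and slot number. The helper names are hypothetical, and zero-padding each field to two digits is an assumption inferred from the tNNdYY placeholder; confirm the exact disk name against what CAM reports for the array before running anything.

```python
def cam_disk_target(tray, slot):
    """Format a tray/slot pair as tNNdYY (two-digit padding is an assumption)."""
    if tray < 0 or slot < 0:
        raise ValueError("tray and slot must be non-negative")
    return f"t{tray:02d}d{slot:02d}"


def cam_unassign_command(array_name, tray, slot):
    """Return the service command from this document as an argument list."""
    return ["service", "-d", array_name, "-c", "unassign",
            "-t", cam_disk_target(tray, slot)]


# Example with a hypothetical array name, tray 85, slot 3.
print(" ".join(cam_unassign_command("array01", 85, 3)))
```

The sketch only assembles the command; it does not execute it.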

    The service command is located in:
    Solaris: /opt/SUNWsefms/bin
    Linux: /opt/sun/cam/private/fms/bin
    Windows: \Program Files\Sun\Common Array Manager\Component\fms\bin

    If the unassign option is not available, upgrade CAM to the latest version and try again.

    NOTE: The fault should clear within a few minutes of running this command. If the command returns an error or the fault does not clear within five minutes, please contact Oracle Support.


    SANtricity:

    1. Open the Array Management Window for the array.
    2. Select Drive in the Menu bar.
    3. Select Hot Spare Coverage...
    4. Select View/change Hot Spare Coverage and click OK.
    5. Select the Drive in the right pane that has (Missing) next to it.
    6. Click Unassign.

    SANtricity command line:

    # SMcli -n array_name -c 'set drive[trayID,slotID] hotSpare=FALSE;'
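
When the SMcli command is launched from a script rather than typed at a shell, passing it as an argument list avoids shell quoting problems with the brackets and the semicolon in the script string. A minimal sketch follows; the function name is hypothetical and the array name and tray/slot values in the example are placeholders. The command string itself is copied from the line above.

```python
def smcli_unassign_command(array_name, tray_id, slot_id):
    """Return the SMcli invocation from this document as an argument list."""
    # int() rejects non-numeric tray or slot values before the string is built.
    script = f"set drive[{int(tray_id)},{int(slot_id)}] hotSpare=FALSE;"
    return ["SMcli", "-n", array_name, "-c", script]


# Example with a hypothetical array name, tray 85, slot 3.
print(smcli_unassign_command("array01", 85, 3))
```

The resulting list can be handed to a process launcher such as subprocess.run on a host where SMcli is installed.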


    NOTE: The fault should clear within a few minutes of making this change. If the command returns an error or the fault does not clear within five minutes, please contact Oracle Support.


Internal Only:

If the resolution for this issue requires the use of serial port commands, an L2 collaboration is required.  If the array is running 7.x firmware on the controllers, it is a relatively quick process that does not require an outage but still needs to be confirmed by L2.  If the array is running 6.x firmware, please escalate to L2, as the steps to resolve are unique to each individual case and implementing the fix requires an outage.  In either case, the recovery steps can be performed either remotely or on site, but must be done by an Oracle-badged person.

    1. Establish a serial connection to one of the controllers (see <Document 1400311.1>). Access to the shell requires a collaboration with L2.
    2. Load the debug module and then use the command vdmDrmShowMgr to identify the missing or phantom hot spare.  The offending devnum(s) will consist of some combination of up to eight 0's and f's:

      -> loadDebug
      value = 0 = 0x0
      -> vdmDrmShowMgr
      =================
      m_HSDrives in DRM
      =================

      Drive:0x453b408 devnum:0xffff role:Standby
      Drive:0x453b81c devnum:0x1030f role:Standby
      Drive:0x434da1c devnum:0x1050f role:Standby
      Drive:0x46f9800 devnum:0x1010f role:Standby
      ->

      Depending on how it came into existence, the phantom may also be seen in the output of vdmShowDriveList.  Using the devnum(s) found above, examine the matching line(s) from vdmShowDriveList.  The Tray/Slot column will have a 0/0 entry and on the left side of the State column there will be an NP.
    3. In the example above, the first entry is the missing or phantom hot spare drive.  It can be removed by using the deassignDrivesAsHotSpares_MT command and the devnum of the phantom:

      -> deassignDrivesAsHotSpares_MT 1,0xffff

      Multiple instances of phantom spares should be removed with a single execution of the command.
    4. Re-run the first command to confirm that the missing or phantom Hot Spare Drive is no longer listed and unload the debug module:

      -> vdmDrmShowMgr
      =================
      m_HSDrives in DRM
      =================
      Drive:0x453b81c devnum:0x1030f role:Standby
      Drive:0x434da1c devnum:0x1050f role:Standby
      Drive:0x46f9800 devnum:0x1010f role:Standby

      -> unld "Debug"
      value = 0 = 0x0
      ->
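
The rule for spotting a phantom entry in the vdmDrmShowMgr output (a devnum made up only of 0's and f's, at most eight digits long) can be expressed as a short filter over a captured serial log. This is a sketch under stated assumptions: the function name is hypothetical, and the line format is taken solely from the sample output in this document. It only reports candidates; it does not build or issue any controller shell command.

```python
import re

# Matches lines such as "Drive:0x453b408 devnum:0xffff role:Standby".
HS_LINE = re.compile(
    r"Drive:(0x[0-9a-fA-F]+)\s+devnum:0x([0-9a-fA-F]+)\s+role:(\S+)"
)


def phantom_devnums(output):
    """Return devnums whose hex digits are only 0 and f (up to eight digits)."""
    phantoms = []
    for match in HS_LINE.finditer(output):
        digits = match.group(2).lower()
        if len(digits) <= 8 and set(digits) <= {"0", "f"}:
            phantoms.append("0x" + digits)
    return phantoms


sample = """
Drive:0x453b408 devnum:0xffff role:Standby
Drive:0x453b81c devnum:0x1030f role:Standby
Drive:0x434da1c devnum:0x1050f role:Standby
Drive:0x46f9800 devnum:0x1010f role:Standby
"""
print(phantom_devnums(sample))
```

For the sample output shown in this document, the only devnum that satisfies the rule is 0xffff. Any candidate should still be confirmed with L2 before it is removed.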


Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.