Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1021055.1
Update Date:2011-06-09
Keywords:

Solution Type  Troubleshooting Sure

Solution  1021055.1 :   Troubleshooting Sun Storage[TM] 2500 and 6000 RAID Array Disk Failures  


Related Items
  • Sun Storage 6180 Array
  • Sun Storage 6580 Array
  • Sun Storage 2540-M2 Array
  • Sun Storage 2540 Array
  • Sun Storage 6780 Array
  • Sun Storage 2510 Array
  • Sun Storage 6140 Array
  • Sun Storage 2530-M2 Array
  • Sun Storage 2530 Array
  • Sun Storage 6540 Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 6xxx Arrays
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 2xxx Arrays

PreviouslyPublishedAs
270029


Applies to:

Sun Storage 2510 Array - Version: Not Applicable and later    [Release: N/A and later]
Sun Storage 2530 Array - Version: Not Applicable and later    [Release: N/A and later]
Sun Storage 2540 Array - Version: Not Applicable and later    [Release: N/A and later]
Sun Storage 6180 Array - Version: Not Applicable and later    [Release: N/A and later]
Sun Storage 6540 Array - Version: Not Applicable and later    [Release: N/A and later]
All Platforms

Purpose

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join, or start a discussion in the My Oracle Support Community - 6000 and 2500 Series RAID Arrays.

The purpose of this document is to help troubleshoot disk failure symptoms on StorageTek, Sun StorEdge, Sun StorageTek, and Sun Storage arrays.

Symptoms
  • SANtricity Storage Manager shows an alert for Failed Hot Spare or Unassigned Drive.
  • Common Array Manager shows an alarm for Failed Drive (alarm ID xx.66.1023).
  • SANtricity Storage Manager shows an alert for Volume Degraded - Failed Drive.
  • Common Array Manager shows an alarm for Volume Degraded (alarm ID xx.66.1013).
  • SANtricity Storage Manager shows an alert for Volume Failed.
  • Common Array Manager shows an alarm for Volume Failed (alarm ID xx.66.1017).
  • SANtricity Storage Manager shows an alert for Volume Failed - Interrupted Write.
  • Common Array Manager shows an alarm for Volume Failed Interrupted Write (alarm ID xx.66.1014).
  • SANtricity Storage Manager shows an alert for Volume Failed - Awaiting Initialization.
  • Common Array Manager shows an alarm for Volume Failed - Awaiting Initialization (alarm ID xx.66.1020).
  • SANtricity or Common Array Manager shows a critical fault for Impending Drive Failure Risk Low (xx.66.1026).
  • SANtricity or Common Array Manager shows a critical fault for Impending Drive Failure Risk Medium (xx.66.1025).
  • SANtricity or Common Array Manager shows a critical fault for Impending Drive Failure Risk High (xx.66.124).
  • SANtricity or Common Array Manager shows a critical fault for Drive Bypassed, reason not specified (xx.66.1064).
  • SANtricity or Common Array Manager shows a critical fault for Drive Bypassed, Single Port (xx.66.1119).
  • SANtricity or Common Array Manager shows a critical fault for Drive Path Degraded (xx.66.1076).
  • An amber LED is lit on one or more drives in the storage system.
Please validate each troubleshooting step below against your environment. Each step links to a document with instructions for validating the step and taking corrective action as necessary. The steps are ordered to isolate the issue and identify the proper resolution, so please do not skip a step.
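
The alarm IDs in the symptom list above can be kept in a small lookup table when sorting through alarm output. The following is only a sketch: the function name and the example prefix "57" are hypothetical, and the codes are copied as printed in this document. Because the leading "xx" component varies, only the last two fields are matched.

```python
# Alarm-code suffixes copied from the symptom list in this document.
# The leading "xx" component varies, so it is ignored by the lookup.
ALARM_SUFFIXES = {
    "66.1023": "Failed Drive",
    "66.1013": "Volume Degraded",
    "66.1017": "Volume Failed",
    "66.1014": "Volume Failed - Interrupted Write",
    "66.1020": "Volume Failed - Awaiting Initialization",
    "66.1026": "Impending Drive Failure Risk Low",
    "66.1025": "Impending Drive Failure Risk Medium",
    "66.124": "Impending Drive Failure Risk High",  # code as printed in this document
    "66.1064": "Drive Bypassed, reason not specified",
    "66.1119": "Drive Bypassed, Single Port",
    "66.1076": "Drive Path Degraded",
}


def describe_alarm(alarm_id):
    """Return the symptom name for an alarm ID such as '57.66.1023', or None."""
    parts = alarm_id.strip().split(".")
    if len(parts) != 3:
        return None  # not in the expected xx.66.nnnn form
    return ALARM_SUFFIXES.get(".".join(parts[1:]))


if __name__ == "__main__":
    print(describe_alarm("57.66.1023"))  # "57" is a made-up prefix for illustration
```

An ID that is not in the table returns None, which corresponds to a symptom this document does not cover.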

Last Review Date

June 14, 2010

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

1.  Verify whether there are multiple critical faults seen by the array

Use the user interface to verify the list of critical faults, and the details of each fault.  Verify that there is only a single disk drive failure on the array.  For the purposes of this investigation, other critical alerts can be ignored for now, although you may want to review them after troubleshooting your drive fault.

Reference <Document: 1021057.1>  Verify Sun StorageTek[TM] 2500 and Sun Storage[TM] 6000 Critical Faults via the User Interface.

Check whether one or more of the following critical faults are listed for your array:

Failed Unassigned Drive or Hot Spare, or Drive Tray.XX.Drive.YY failed
Degraded Volume detected or Degraded Volume
Hot Spare In Use
Failed Volume - Drive Failure or Failed Volume detected
Drive Bypassed, reason unknown/REC_DRIVE_BYPASSED_CAUSE_UNKNOWN
Impending Drive Failure Risk Medium/REC_IMPENDING_DRIVE_FAILURE_RISK_MED
Impending Drive Failure Risk Low/REC_IMPENDING_DRIVE_FAILURE_RISK_LOW
Impending Drive Failure Risk High/REC_IMPENDING_DRIVE_FAILURE_RISK_HIGH
Channel Path xx is Degraded for Drive/REC_PATH_DEGRADED
Drive Bypassed, Single Port (xx.66.1119)
  • If there is more than one fault, go to Step 2.
  • If there is only a single fault, go to Step 3.
  • If ANY of these are for Failed Volume, go to Step 7.
  • If NO faults are listed as above, go to Step 7.
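
The routing rules for Step 1 can be sketched as a small function. This is an illustration only: the function name is hypothetical, and it assumes that the "Failed Volume" and "no faults listed" rules take precedence over the fault count, which the list above implies but does not state outright.

```python
def step1_next(fault_names):
    """Return the next step number for Step 1.

    fault_names: names of the drive-related critical faults found on the
    array, restricted to the fault types listed in Step 1.
    """
    if not fault_names:
        return 7  # none of the listed faults are present
    if any(name.startswith("Failed Volume") for name in fault_names):
        return 7  # any Failed Volume fault needs further analysis
    if len(fault_names) > 1:
        return 2  # several faults: check whether they share one drive
    return 3      # exactly one fault


if __name__ == "__main__":
    print(step1_next(["Degraded Volume", "Hot Spare In Use"]))
```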
2.  Verify whether the critical faults are for the same drive

There may be three or four faults for the same drive depending on firmware revisions, which is normal.

Compare the following list of faults to the list of faults that you have:

Impending Drive Failure Risk Medium
Impending Drive Failure Risk Low
Impending Drive Failure Risk High
Degraded Volume detected or Degraded Volume
Hot Spare in Use
Drive Bypassed, reason unknown, or Drive Bypassed
Channel Path xx is Degraded for Drive/REC_PATH_DEGRADED
Drive Bypassed, Single Port

Each fault type listed above should appear at most once, and every fault should report the same drive location.

For the faults in the array, look at the details of the fault and determine the failed drive.
  • If the faults are for the same disk drive, the drive should be replaced.  Continue to Step 5.
  • If the faults are for different disk drives, then you have multiple drive faults on your array.  This requires further analysis.  Continue to Step 7.
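
The same-drive check in Step 2 amounts to comparing the drive location recorded in each fault. The sketch below assumes the faults have already been copied by hand into (fault name, drive location) pairs; the function name and the sample locations, written in the Tray.XX.Drive.YY pattern, are hypothetical.

```python
def step2_next(faults):
    """Return the next step number for Step 2.

    faults: list of (fault_name, drive_location) tuples taken from the
    details of each critical fault.
    """
    locations = {location for _name, location in faults}
    # One distinct location means every fault points at the same drive.
    return 5 if len(locations) == 1 else 7


if __name__ == "__main__":
    same_drive = [
        ("Degraded Volume", "Tray.85.Drive.03"),
        ("Hot Spare in Use", "Tray.85.Drive.03"),
    ]
    print(step2_next(same_drive))
```

An empty list falls through to Step 7 in this sketch, on the assumption that a case with no usable fault details needs further analysis.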
3.  Verify the critical fault seen by the array

Use the user interface to verify the list of critical faults, and the details of each fault.  Verify that there is only a single disk drive failure on the array.  For the purposes of this investigation, other critical alerts can be ignored for now, although you may want to review them after troubleshooting your drive fault.

Reference <Document: 1021057.1>  Verify Sun StorageTek[TM] 2500 and Sun Storage[TM] 6000 Critical Faults via the User Interface.

  • If there is a single Hot Spare in Use fault, but you have already replaced your drive and the data has not copied back from its Global Hot Spare, go to Step 6.
  • If there are one or more Impending Failure faults, reference <Document: 1103184.1> Troubleshooting Sun Storage[TM] Array Impending Drive Failures.
  • If there is a single critical fault of Failed Volume - Drive Failure or Failed Volume detected, go to Step 7.
  • If there is a single critical fault listed as:  Failed Unassigned Drive or Hot Spare or Drive Tray.XX.Drive.YY failed, a drive failure occurred due to the array's periodic media scan.  Go to Step 5.
  • If there is a single critical fault for Drive Bypassed, Single Port, or Channel Path xx is Degraded for Drive, you will need to manually fail the drive prior to replacement.  Go to Step 5.
  • If there is a single critical fault listed as:  Degraded Volume detected or Degraded Volume, the data on the volume is accessible, but the volume has sustained one or more drive faults.  Go to Step 4.
  • If there is a single critical fault stating Hot Spare In Use, go to Step 5.
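
The Step 3 branches above can be summarized as a dispatch on the fault name. This is a sketch under stated assumptions: the function and parameter names are hypothetical, matching is done on the fault text as it appears in this document, and sending unrecognized faults to Step 7 is an assumption borrowed from the "no faults listed" rule in Step 1.

```python
def step3_next(fault_name, replaced_without_copyback=False):
    """Return where to go next for a single critical fault (Step 3).

    replaced_without_copyback: True if the drive was already replaced but
    the data has not copied back from the hot spare.
    """
    name = fault_name.lower()
    if "hot spare in use" in name:
        return "Step 6" if replaced_without_copyback else "Step 5"
    if "impending drive failure" in name:
        return "Document 1103184.1"
    if name.startswith("failed volume"):
        return "Step 7"
    if name.startswith("degraded volume"):
        return "Step 4"
    if name.startswith(("drive bypassed", "channel path")):
        return "Step 5"  # fail the drive manually before replacement
    if "failed unassigned drive" in name or (
        name.startswith("drive tray") and name.endswith("failed")
    ):
        return "Step 5"
    return "Step 7"  # assumption: escalate anything not listed


if __name__ == "__main__":
    print(step3_next("Degraded Volume detected"))
```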

4.  Verify that there are no other assigned drives failed in the Degraded Volume Fault

There will be ONE degraded volume fault for each VDisk or Volume Group affected by the drive failure. For RAID 1 and RAID 6 configurations, this may mean that multiple drives are listed in the fault. Make sure that only one drive has failed in your VDisk/Volume Group.

  • If there is a Degraded Volume detected or Degraded Volume fault, but more than one drive listed in the fault, go to Step 7.
  • If there is a Degraded Volume detected or Degraded Volume fault, and only a single drive listed in the fault, go to Step 5.
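
The Step 4 check can be sketched as a count of the drives listed in each degraded volume fault. The function name, group names, and drive locations below are hypothetical, and the sketch assumes at least one degraded volume fault has been recorded.

```python
def step4_next(drives_by_group):
    """Return the next step number for Step 4.

    drives_by_group: dict mapping each affected VDisk/Volume Group name to
    the list of drive locations shown in its degraded volume fault.
    """
    if any(len(drives) > 1 for drives in drives_by_group.values()):
        return 7  # more than one drive listed in a fault
    return 5      # a single drive listed: proceed to replacement


if __name__ == "__main__":
    print(step4_next({"vdisk-1": ["Tray.85.Drive.03"]}))
```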

5.  Identify Drive Model for Alert 1300555.1

To identify the drive model, reference <Document: 1021060.1> Verify Sun Storage[TM] Array Drive Model Information via the User Interface.
  • If the model is ST330055SSUN300G or ST330055FSUN300G, please reference Alert <Document: 1300555.1> Replacement of Drives with Mechanical Positioning Errors May Cause RAID Controllers Reset or Lockdown Unexpectedly for instructions on how to handle these drives.  Contact Oracle for the drive replacement.
  • Otherwise, contact Oracle for the drive replacement.
Reference <Document:1002514.1> Collecting Support Data for Arrays Using Sun StorageTek[TM] Common Array Manager.
Reference <Document:1014074.1> Collecting Support Data for Arrays Using Sun StorageTek[TM] SANtricity Storage Manager.

6.  Verify your firmware revision, and review against document 1020689.1

If your drive has not copied back from Hot Spare, the reason may depend on the revision of firmware and the circumstances of why the drive was failed.  Verify your array firmware through the user interface.

Then check this against <Document: 1020689.1> Global Hot Spare Copyback Function Changes for Sun StorageTek[TM] 2500, 6140, 6540, 6580, 6780 and StorageTek[TM] Flexline 380.
  • If the document did not resolve your issue, go to Step 7.
7. Open a call for further analysis

Reaching this step indicates that your array may be suffering from multiple drive failures or from a fault that requires deeper analysis.
Please supply:
  • Critical Faults.
  • Support Data Collection.
Reference <Document:1002514.1> Collecting Support Data for Arrays Using Sun StorageTek[TM] Common Array Manager.
Reference <Document:1014074.1> Collecting Support Data for Arrays Using Sun StorageTek[TM] SANtricity Storage Manager.

At this point, if you have validated that each troubleshooting step above is true for your environment, and the issue still exists, further troubleshooting is required. Please contact Oracle Support.



Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.