Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1019953.1
Update Date:2009-10-11
Keywords:

Solution Type  Troubleshooting Sure

Solution  1019953.1 :   Troubleshooting Sun StorEdge 6320[TM] Raid Controller Problems  


Related Items
  • Sun Storage 6320 System
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 6xxx Arrays

PreviouslyPublishedAs
249667


Description
This document describes how to identify a failed or failing RAID controller in the array from the symptoms listed below.
Symptoms:
  • Sun StorEdge Remote Response (SSRR) reports a failed RAID controller
  • StorADE reports a Device Alert on a failed RAID controller
  • Email alert received from the array
  • RAID controller fault LED lit
  • Global fault LED lit


Steps to Follow
Please validate that each troubleshooting step below is true for your environment. The steps will provide instructions or a link to a document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.
1.  Validate the symptom from the symptom list above.

1a. If you received an email from the array, or SSRR opened a Service Request, go to Step 2.

1b. If you see a Device Alert, as described in the symptom list above, go to Step 5.

1c. If you noticed a fault LED on your RAID controller, or the global fault LED is lit, go to Step 2.

1d. If none of the above apply, skip to Step 9.

2.  Validate the ability to log into your 6320.

Go to https://<array_IP>:9443 and log in.
See chapter 2 of the Sun StorEdge 6320 System 1.2 Reference and Service Manual for details on logging into the array.

If you have trouble logging into the service processor on the 6320, refer to Solution 249670: Troubleshooting Sun StorEdge 6320[TM] Loss Of management Access Faults.
If you can log in, continue to Step 3.
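
Before opening a browser, it can help to confirm that the management port answers at all. The following is a minimal Python sketch and is not part of the Sun documentation: the function name is invented here, the host address is whatever you would substitute for <array_IP>, and a successful TCP connection only shows that port 9443 is reachable, not that the login itself will succeed.

```python
import socket


def management_port_reachable(host: str, port: int = 9443,
                              timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 9443 is the Configuration Service port named in Step 2. The check
    says nothing about credentials or the state of the web application.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If this returns False from a host that should have management access, the loss-of-management-access solution referenced above is the more likely next stop than the rest of this procedure.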

3. Validate the "Overall Health Status" box on the Configuration Service page.

3a. If the status shows "Error", go to Step 4.
3b. If the status shows "Ok", go to Step 5.

4. Validate the existence of a StorADE alarm against the RAID controller by logging into the Storage Automated Diagnostic Environment (StorADE).

Use the following URL:
https://system_ip_address:7443

From the "Home" window that comes up, check the "Device Health Summary" to see if there are alerts listed. To verify that an alarm is related to a RAID controller issue, select the "Alerts" link in the "Device Health Summary" and search for RAID controller alarms. Selecting the alarm link displays details about the alarm.

4a.  If there is an alarm, go to Step 5.
4b.  If there is no alarm and no fault LED is lit, go to Step 9.
4c.  If there is no alarm but a fault LED is lit on the RAID controller or the Global Fault Indicator is lit, go to Step 5.
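
The branching in Steps 3 and 4 can be summarized as a small decision function. This is an illustrative sketch only: the function and parameter names are invented here, and the inputs are values you read by hand from the Configuration Service and StorADE pages and from the LEDs.

```python
def next_step_after_health_status(overall_health: str,
                                  alarm_present: bool,
                                  fault_led_lit: bool) -> int:
    """Return the next step number according to Steps 3 and 4.

    overall_health is the text of the "Overall Health Status" box
    ("Ok" or "Error"); the two flags are observations from Step 4.
    """
    status = overall_health.strip().lower()
    if status == "ok":
        return 5                      # Step 3b
    if status != "error":
        raise ValueError(f"unexpected health status: {overall_health!r}")
    # Step 3a sends us to Step 4, which branches on alarm and LED state.
    if alarm_present:
        return 5                      # Step 4a
    if fault_led_lit:
        return 5                      # Step 4c
    return 9                          # Step 4b
```

For example, a status of "Error" with no StorADE alarm but a lit controller fault LED yields Step 5.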

5.  Validate the RAID controller status from a Detailed FRU Report.
  a) Log into StorADE (see Step 4 for details) and select the "Reports" tab.
  b) In the "Reports" window, select "General Reports" and then "Fru Reports".
  c) In the "Fru Reports" window, under "Select report to display or Email", select the "Display" link in the "Detailed Fru Report" row.



5a.  If status is ready-disabled, go to Step 9
5b.  If status is ready-enabled, go to Step 6

6.  Validate the existence of an LED and/or alarm against a controller in the ready-enabled state.

6a.  If there is an alarm AND a fault LED lit for the RAID controller, go to Step 9.
6b.  If there is an alarm and NO fault LED lit for the RAID controller, go to Step 8.
6c.  If there is a fault LED lit and NO alarm for the RAID controller, go to Step 9.
6d.  If there are no alarms and no LEDs lit, go to Step 9.
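
Steps 5 and 6 together form a small decision table in which only one combination leads to the firmware check in Step 8. The sketch below restates that table in Python; the function and parameter names are invented for illustration, and the status strings are the two values named in Step 5.

```python
def next_step_from_fru_status(controller_status: str,
                              alarm_present: bool,
                              fault_led_lit: bool) -> int:
    """Return the next step number according to Steps 5 and 6."""
    status = controller_status.strip().lower()
    if status == "ready-disabled":
        return 9                      # Step 5a
    if status != "ready-enabled":
        raise ValueError(f"unexpected FRU status: {controller_status!r}")
    # Step 5b leads to Step 6. Only "alarm without fault LED" (6b)
    # goes to Step 8; 6a, 6c, and 6d all go to Step 9.
    if alarm_present and not fault_led_lit:
        return 8
    return 9
```

Written this way, it is easy to see that a ready-enabled controller with an alarm but no fault LED is the only case routed to the firmware version check.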

8. Validate the RAID controller firmware version against minimums.

a) Log into Configuration Services
b) Click on the Administration Tab
c) Click on the General Link
d) The RAID controller firmware is in the lower portion of this screen for all configured arrays.

8a. If the RAID controller firmware is below the version listed for T3+/6120 arrays in Solution 200077: Minimum supported releases for the Sun StorageTek T3+, 6120, 6320 and 6920, clear the alarm for the controller in StorADE and contact Sun Support to schedule an array upgrade for your unit.
8b. If the RAID controller firmware is at or above the version listed in Solution 200077: Minimum supported releases for the Sun StorageTek T3+, 6120, 6320 and 6920, go to Step 9.
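
Comparing firmware versions by eye is error-prone when the strings differ in length. The following Python sketch compares two dotted numeric version strings; it assumes the firmware version is displayed in a purely numeric dotted form, and it does not supply the minimum itself, which must still be taken from Solution 200077. The function names are invented here.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Split a dotted numeric version string into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def firmware_meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if installed is at or above minimum (Step 8b)."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so "3.2" and "3.2.0" compare equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

Comparing integer tuples rather than raw strings avoids the classic mistake of treating "3.10" as lower than "3.9".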

9.  At this point, if you have validated that each troubleshooting step above is true for your environment, and the issue still exists, further troubleshooting is required.  Please open a Service Request with Sun Microsystems.

Please include:

    * StorADE Alarm text if available
    * Statement of the symptoms you see that pertain to the RAID controller
    * Solution Extract. Refer to Solution 230665: Sun StorEdge[TM] 6320:How To:How to collect 6320 extractor output using StorADE command line or GUI
    * Status of the RAID controller as shown in the Detailed FRU Report in Step 5
    * Email text received from the 6320 storage system.


Product
Sun StorageTek 6320 System
Sun StorageTek 6120/6320 Controller Firmware 3.2

Internal Comments
This is a continuation of steps that are not possible using the customer user interface to troubleshoot. If you have not performed steps 1-9 above, please do so first with the customer, then continue to step 10 below.

10. Validate message entries for ready-disabled controller

Review the <extractor>/Arrays/<array>/filesystem/syslog file, or the <extractor>/Sp/messages/var_adm/messages.array file, for the array in question, and check whether there were any disk media errors prior to the controller fault.  Reference <Document: 1019954.1> : Troubleshooting Sun StorEdge 6320[TM] Disk Faults
  •   If there were disk media errors prior to the controller failing, enable the controller and replace the disk drive that reported errors.
  •   If there were NO disk media errors prior to the controller failing, verify that the controller firmware is at or above the version listed in Solution 200077: Minimum supported releases for the Sun StorageTek T3+, 6120, 6320 and 6920. Enable the controller and, if the firmware is below the minimum supported release, schedule an upgrade. Enabling the controller requires either that an engineer be sent on site or, if SSRR is enabled, that you log into the affected array and issue "enable u<N>", where "N" is the affected controller.
  •   If there were NO disk media errors prior to the controller failing AND the array firmware is at the minimum supported release, go to Step 11.
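
The log review in Step 10 amounts to collecting disk media error lines that appear before the first controller fault line. The Python sketch below does this over any iterable of lines. It is a hedged illustration only: the function name is invented, and the two regular expressions are parameters that you must supply yourself, because the exact wording of the array syslog messages is not reproduced in this document.

```python
import re
from typing import Iterable, List, Tuple


def media_errors_before_fault(lines: Iterable[str],
                              media_pattern: str,
                              fault_pattern: str) -> Tuple[bool, List[str]]:
    """Scan log lines in order and stop at the first controller fault.

    Returns (fault_found, media_error_lines). media_error_lines holds the
    lines matching media_pattern that appeared before the first line
    matching fault_pattern, or all such lines if no fault line was found.
    """
    media_re = re.compile(media_pattern, re.IGNORECASE)
    fault_re = re.compile(fault_pattern, re.IGNORECASE)
    media_lines: List[str] = []
    for line in lines:
        if fault_re.search(line):
            return True, media_lines
        if media_re.search(line):
            media_lines.append(line.rstrip("\n"))
    return False, media_lines
```

A typical call would pass an open file object for the extractor syslog together with your own patterns. A result of (True, []) corresponds to the "NO disk media errors prior to the controller failing" branches above.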
11.  Validate the array against Sun Alert <Document: 1019245.1> : T3B and Sun StorEdge 6120 Arrays may go down unexpectedly and lose Host Connectivity after 994 days of Continuous Operation.


If the array does not have the symptoms described in Sun Alert 237605, continue to Step 12.
If the array does have the symptoms described in Sun Alert 237605, follow the resolution path provided by the Sun Alert.

12.  Validate that the controller can boot successfully.

a) Enable the controller
b) Check the status of the controller with "fru stat".  It should go to "ready enabled".
c) Wait 5 minutes to ensure that the array controller is stable and stays in an optimal state.

If the controller fails to reach a "ready enabled" state, replace the controller.
If the controller fails to reach a "ready enabled" state, and the controller has been replaced, continue to Step 13.
If the controller reaches "ready enabled", but does not stay that way for at least 5 minutes, continue to Step 13.
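
The three outcomes of Step 12 can be told apart from a series of timed status readings. The Python sketch below classifies such a series; it is an illustration under stated assumptions, not a Sun tool. The function name and the result labels are invented, the samples are (seconds, status) pairs that you record yourself from repeated "fru stat" checks, and the readings must span enough time for a 5 minute window to be observable.

```python
from typing import Iterable, Tuple


def classify_controller_boot(samples: Iterable[Tuple[float, str]],
                             required_seconds: float = 300.0,
                             ready_status: str = "ready enabled") -> str:
    """Classify time-ordered (seconds, status) samples for Step 12.

    Returns "stable" if the controller stayed in ready_status for at least
    required_seconds without interruption, "not-stable" if it reached
    ready_status but no such window was observed, and "never-ready" if it
    never reached ready_status.
    """
    reached_ready = False
    ready_since = None
    for seconds, status in samples:
        if status == ready_status:
            reached_ready = True
            if ready_since is None:
                ready_since = seconds
            if seconds - ready_since >= required_seconds:
                return "stable"
        else:
            # Any other status interrupts the continuous ready window.
            ready_since = None
    return "not-stable" if reached_ready else "never-ready"
```

In terms of the procedure, "never-ready" corresponds to replacing the controller (or Step 13 if it has already been replaced), and "not-stable" corresponds to continuing to Step 13.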

13. At this point, if you have validated that each troubleshooting step above is true for your environment, and the issue still exists, further troubleshooting is required.
Please escalate to your next level of support with the following information:

  • Solution Extract.  Reference <Document: 1018865.1> : Sun StorEdge[TM] 6320:How to Collect an Extractor
  • Results of each step above
  • Array name
  • Whether a controller has been replaced
  • Location of controller
  • Current Status of controller
This document contains normalized content and is managed by the Domain Lead(s) of the respective domains. To notify content owners of a knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this document via the “Document Feedback” alias(es) listed below:

[email protected]

The Knowledge Work Queue for this article is KNO-STO-MIDRANGE_DISK

6320,controller,reset,failed,RAID,read-disabled,normalized

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.