Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1147857.1
Update Date:2010-08-16
Keywords:

Solution Type  Troubleshooting Sure

Solution  1147857.1 :   Troubleshooting Sun StorEdge[TM] 6920 Disk Faults


Related Items
  • Sun Storage 6920 System
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 6xxx Arrays




In this Document
  Purpose
  Last Review Date
  Instructions for the Reader
  Troubleshooting Details


Applies to:

Sun Storage 6920 System - Version: Not Applicable and later   [Release: NA and later ]
Information in this document applies to any platform.

Purpose

This document describes how to identify a failed or failing disk drive in the array, starting from the symptoms listed below.

Symptoms:

  • SSRR disk.u#d# in 6020 array## is 'fault-disabled'
  • SSRR disk.u2d3 in sp0-array03 Not-Available ready-enabled->fault-enabled
  • SSRR disk.u2d2.fruDiskPort2State on sp0-array00 change 'ready' to 'bypass'
  • StorADE reports disk fault
  • Performance degraded
  • Disk Fault LED lit/on
  • Global Fault LED lit/on
  • Email of SSRR alarm
  • SSRR reports PFA (predictive failure)

Last Review Date

July 12, 2010

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

1.  Validate that you can log in to the array user interface through the browser, and collect a Solution Extract.

Reference <Document: 1003756.1> How to Collect and Extractor from a Sun StorEdge 6920[TM]

  • If you cannot log in to the Service Processor via the web, reference <Document: 1012833.1> Troubleshooting Sun Storedge 6920[TM] Browser and sscs(1M) Access Problems
  • Otherwise continue to Step 2.

2.  Verify the drive status in the extractor.

a) Review the extractor:

  1. Log in to the SP (Service Processor)
  2. Click on Storage Automated Diagnostic Environment
  3. Click on Administration Tab -> Utilities Tab -> Solution Extract Tab
  4. Your Solution Extract from Step 1 should be listed. Click on the View Content link.
  5. Look at the Arrays section (this is usually at the top).
  6. Click on the arrayNN/commands/fru_stat output.


The Arrays section comprises output from each 6020 tray in the 6920 Array.  You should be able to get the array name in question from the Alarm, or email notification you were sent.
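
As an illustration of pulling the disk and array name out of an alarm, the following is a minimal Python sketch. The regular expression and the function name disk_and_array are assumptions based only on the symptom examples at the top of this document; other alarm wordings may need a different pattern.

```python
import re

# Assumed pattern: "disk.<uNdN>[.<property>] in|on <array name ending in arrayNN>".
# Derived from the symptom examples in this document, not from a product spec.
ALARM_RE = re.compile(r"disk\.(u\d+d\d+)\S*\s+(?:in|on)\s+(\S*array\d+)")


def disk_and_array(alarm_text):
    """Return (disk, array) from an alarm line, or None if it does not match."""
    match = ALARM_RE.search(alarm_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    alarm = "SSRR disk.u2d3 in sp0-array03 Not-Available ready-enabled->fault-enabled"
    print(disk_and_array(alarm))
```

The array name returned here is the one to look for under arrayNN in the Arrays section of the extract.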


b) Check the drive status:


CTLR    STATUS   STATE       ROLE        PARTNER    TEMP
------  -------  ----------  ----------  -------    ----
u1ctr   ready    enabled     master      u2ctr      41.0
u2ctr   ready    enabled     alt master  u1ctr      37.0

DISK    STATUS   STATE       ROLE        PORT1      PORT2      TEMP  VOLUME
------  -------  ----------  ----------  ---------  ---------  ----  ------
u1d1    ready    enabled     data disk   ready      ready      26    v0
u1d2    ready    enabled     data disk   ready      ready      35    v0
u1d3    ready    enabled     data disk   ready      ready      42    v0


  •   If the status is ready-enabled, go to Step 5.
  •   If the status is substituted, go to Step 7.
  •   If the status is ready-disabled, go to Step 3.
  •   If the status is fault-disabled, go to Step 4.
  •   If more than one drive is in a state other than ready-enabled, please open a service call with Oracle, and supply the Solution Extract.  Reference <Document: 1003756.1> How to Collect and Extractor from a Sun StorEdge 6920[TM]

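
The decision list above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the parser assumes the DISK table layout shown in the fru_stat example, and it assumes that "substituted" may appear in either the STATUS or the STATE column. The fault-disabled row in the sample is invented for the demonstration.

```python
import re

# Next step for each abnormal STATUS/STATE pair, copied from the list above.
NEXT_STEP = {
    ("ready", "disabled"): 3,
    ("fault", "disabled"): 4,
}

# A disk row starts with a name such as u1d3, followed by STATUS and STATE.
DISK_ROW = re.compile(r"^\s*(u\d+d\d+)\s+(\S+)\s+(\S+)")


def parse_disks(fru_stat_text):
    """Return {disk: (status, state)} for each disk row in fru_stat output."""
    disks = {}
    for line in fru_stat_text.splitlines():
        match = DISK_ROW.match(line)
        if match:
            disks[match.group(1)] = (match.group(2), match.group(3))
    return disks


def triage(fru_stat_text):
    """Return (escalate, steps).

    steps maps every drive that is not ready-enabled to its next step number
    (None if the list above does not cover the combination). An empty steps
    dict means all drives are ready-enabled, so continue at Step 5.
    escalate is True when more than one drive is not ready-enabled.
    """
    steps = {}
    for disk, (status, state) in parse_disks(fru_stat_text).items():
        if (status, state) == ("ready", "enabled"):
            continue
        if "substituted" in (status, state):  # assumption: either column
            steps[disk] = 7
        else:
            steps[disk] = NEXT_STEP.get((status, state))
    return len(steps) > 1, steps


if __name__ == "__main__":
    # Hypothetical sample: the u1d2 row is invented for illustration.
    sample = """\
DISK    STATUS   STATE       ROLE        PORT1      PORT2      TEMP  VOLUME
------  -------  ----------  ----------  ---------  ---------  ----  ------
u1d1    ready    enabled     data disk   ready      ready      26    v0
u1d2    fault    disabled    data disk   ready      ready      35    v0
"""
    print(triage(sample))
```
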
3. Validate local and/or global hot spare presence and state.

Verify the presence and status of a hot spare by looking at BOTH:
a) vol_list to confirm the existence of a local hot spare under the column "standby"
b) global_standby_list_uN to confirm the existence of a global hot spare


Example of vol list:


============================
| COMMAND: vol list
============================
volume        capacity     raid  data      standby
tray0_pool1   202.335 GB   0     u1d01-06  none
tray1_pool1   202.335 GB   0     u2d01-06  none
tray2_pool1   168.613 GB   5     u3d01-06  u3d14
tray3_pool1   168.613 GB   5     u4d01-06  u4d14


Example of global_standby list:


============================
| COMMAND: global_standby list u1
============================
Global Standby   Substituted Drive
u1d14            -
-- end



    ▪    If a hot spare is present, go to Step 4 to verify whether a reconstruction is in progress.
    ▪    If there are no hot spares configured, go to Step 5.
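
The hot spare check above can be sketched in Python as follows. The function names are hypothetical, and the parsers assume the layouts shown in the two examples: in vol list the volume name is the first field and the standby drive the last, and in global_standby list each spare appears as a row that starts with a drive name such as u1d14.

```python
import re

DRIVE = re.compile(r"^u\d+d\d+$")


def local_standby(vol_list_text):
    """Return {volume: standby drive or None} from "vol list" output."""
    spares = {}
    for line in vol_list_text.splitlines():
        fields = line.split()
        # Skip banners, the header row, and anything too short to be a volume row.
        if len(fields) < 5 or fields[0] == "volume":
            continue
        spares[fields[0]] = fields[-1] if DRIVE.match(fields[-1]) else None
    return spares


def global_standbys(global_list_text):
    """Return the drives listed in "global_standby list" output."""
    drives = []
    for line in global_list_text.splitlines():
        fields = line.split()
        if fields and DRIVE.match(fields[0]):
            drives.append(fields[0])
    return drives


def next_step(volume, vol_list_text, global_list_text):
    """Return 4 if the volume has a local or global hot spare, otherwise 5."""
    has_local = local_standby(vol_list_text).get(volume) is not None
    has_global = bool(global_standbys(global_list_text))
    return 4 if has_local or has_global else 5
```

For the example output above, tray0_pool1 has no local standby, so the result depends on whether a global standby is listed; tray2_pool1 has the local standby u3d14.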


4.  Verify whether a reconstruction is in progress.

Look at the proc_list output to confirm whether a vol recon process exists.


Example:
myarray:/:<1>proc list
VOLUME          CMD_REF PERCENT    TIME COMMAND
 tray0_pool1             21568      74 53928:47 vol verify
 tray1_pool2             25666      27  178:04 vol recon  <--- reconstruction process.


  • If there is no reconstruction process, AND the drive is ready-disabled, please open a service call with Oracle, and supply the Solution Extract.  Reference <Document: 1003756.1> How to Collect and Extractor from a Sun StorEdge 6920[TM]
  • If there is no reconstruction process, AND the drive is fault-substituted, go to Step 7.
  • If a reconstruction is ongoing, allow it to complete before proceeding, and re-evaluate the drive status in Step 2.
  • If there is no ongoing reconstruction and the drive isn't in a fault-substituted state, please open a service call with Oracle, and supply the Solution Extract.  Reference <Document: 1003756.1> How to Collect and Extractor from a Sun StorEdge 6920[TM]
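
The reconstruction check and the decisions above can be sketched in Python as follows. The function names are hypothetical, and the regular expression assumes the column order shown in the proc list example (VOLUME, CMD_REF, PERCENT, TIME, COMMAND).

```python
import re

# volume, CMD_REF, PERCENT, TIME, then the command "vol recon"
RECON_ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+(\d+)\s+(\S+)\s+vol recon\b")


def reconstructions(proc_list_text):
    """Return {volume: value of the PERCENT column} for each "vol recon" row."""
    found = {}
    for line in proc_list_text.splitlines():
        match = RECON_ROW.match(line)
        if match:
            found[match.group(1)] = int(match.group(3))
    return found


def step4_action(proc_list_text, status, state):
    """Map the proc list output and the drive status to the next action."""
    if reconstructions(proc_list_text):
        return "wait for the reconstruction to complete, then repeat Step 2"
    if (status, state) == ("fault", "substituted"):
        return "go to Step 7"
    # No reconstruction and not fault-substituted (includes ready-disabled).
    return "open a service call with Oracle"
```

With the example output above, reconstructions() reports tray1_pool2 at 27, so the action is to wait and then re-evaluate the drive in Step 2.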

5.  Check the status of the volume associated with the drive by running the vol stat command.


myarray:/:<3>vol stat

v0            u1d1   u1d2   u1d3   u1d4   u1d5   u1d6   u1d7   u1d8   u1d9
mounted        0      0      0      0      0      0      0      0      0  
myarray:/:<4>


  • If the volume is "mounted", but more than one drive has a non-zero status, please open a service call with Oracle, and supply the Solution Extract.  Reference <Document: 1003756.1> How to Collect and Extractor from a Sun StorEdge 6920[TM]
  • If the volume is "unmounted", you have sustained a drive failure beyond the capabilities of your RAID level for the volume.
  • Otherwise continue to Step 6.
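
The vol stat check above can be sketched in Python as follows. The function names are hypothetical. The parser assumes each volume is printed as a pair of lines (volume and drive names, then mount state and per-drive status values), as in the example, and it treats any line containing ":/:<" as a shell prompt to be skipped.

```python
def parse_vol_stat(vol_stat_text):
    """Return {volume: (mount_state, {drive: status value})} from "vol stat" output."""
    rows = [
        line.split()
        for line in vol_stat_text.splitlines()
        if line.strip() and ":/:<" not in line  # drop blanks and prompt lines
    ]
    volumes = {}
    # Rows come in pairs: names first, values second.
    for names, values in zip(rows[0::2], rows[1::2]):
        volumes[names[0]] = (values[0], dict(zip(names[1:], values[1:])))
    return volumes


def step5_action(mount_state, drive_status):
    """Apply the three rules listed above to one volume."""
    if mount_state == "unmounted":
        return "drive failure beyond the RAID level of the volume"
    nonzero = [drive for drive, value in drive_status.items() if value != "0"]
    if len(nonzero) > 1:
        return "open a service call with Oracle"
    return "go to Step 6"
```

For the example output above, v0 is mounted and every drive reports 0, so the result is to continue with Step 6.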

6.  Check the LEDs on the disk drive that reports a ready-enabled state.

  • If an amber fault LED or any other LED is lit for the disk drive, please open a service call with Oracle, and supply the Solution Extract.  Reference <Document: 1003756.1> How to Collect and Extractor from a Sun StorEdge 6920[TM]
  • If no LEDs are lit, you have verified that the disk drive is healthy.


7.  You have validated that a drive has failed in the array, and requires replacement. 

Please collect a Solution Extract and supply it to Oracle.

Reference <Document: 1003756.1> How to Collect and Extractor from a Sun StorEdge 6920[TM]

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.