Troubleshooting Sun StorEdge [TM] 33x0/351x Disk Failures

Asset ID:	1-75-1008190.1
Update Date:	2011-05-13
Keywords:

Solution Type: Troubleshooting Sure

Solution 1008190.1 : Troubleshooting Sun StorEdge [TM] 33x0/351x Disk Failures

Applies to:

Sun Storage 3320 SCSI Array
Sun Storage 3310 Array
Sun Storage 3510 FC Array
Sun Storage 3511 SATA Array
All Platforms

Symptoms:

'show disk' shows failed or missing drive(s)
'show logical drive' indicates failures
'show events' reports a 'Drive Recovered Error Reported' message
Drive replacement failures
Amber drive LED indicating failure
Multiple disk drive failures

Purpose/scope:

This document addresses troubleshooting disk devices in a Sun StorEdge[TM] 33x0/351x array.

Last Review Date

May 13, 2011

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Steps to Follow

NOTE: This is a sub-set of DocID 1011431.1: "Troubleshooting Sun StorEdge[TM] 33x0/351x Hardware."

Before replacing a failed drive, save the configuration settings to NVRAM as described in Section 6.2, Saving the NVRAM Configuration Settings in the:

Sun StorEdge 3000 Family FRU Installation Guide

or refer to:

Sun StorEdge[TM] 3000: Saving and Restoring NVRAM and Logical Drive Configuration (Doc ID 1012254.1)

Note: This document references steps for an array at 4.x firmware revision.

If the array has 3.2x controller firmware, refer to the appropriate revision of the Firmware User's Guide.

Troubleshooting Steps

Step 1 - Verify that the physical disk status is ONLINE by issuing the sccli> show disks command or by following the firmware interface steps described in:

Viewing the Status of a Physical Drive in the Sun StorEdge[TM] 3000 Family RAID Firmware 4.2x User's Guide
Refer to the Physical Drive Status Table for a list of all possible drive statuses.

Step 2 - Verify that the logical drive status is GOOD, INITING, or REBUILDING by issuing the sccli> show logical drive command, or by following the firmware interface steps described in:

Viewing the Logical Drive Status Table in the Sun StorEdge[TM] 3000 Family RAID Firmware 4.2x User's Guide, which also includes a list of possible logical drive states.
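
The checks in Steps 1 and 2 can be sketched as a small Python helper. This is only an illustration, not part of sccli: the function names are hypothetical, and the sketch assumes the status strings have already been read by hand from the 'show disks' and 'show logical drive' output. Only the status values themselves come from this document.

```python
# Acceptable states taken from Steps 1 and 2 of this document.
HEALTHY_DISK_STATES = {"ONLINE"}
HEALTHY_LD_STATES = {"GOOD", "INITING", "REBUILDING"}


def unhealthy(statuses, healthy_states):
    """Return the statuses that are not in the healthy set (case and padding ignored)."""
    return [s for s in statuses if s.strip().upper() not in healthy_states]


def statuses_ok(disk_statuses, ld_statuses):
    """True only when every disk and every logical drive is in an acceptable state."""
    return (not unhealthy(disk_statuses, HEALTHY_DISK_STATES)
            and not unhealthy(ld_statuses, HEALTHY_LD_STATES))


if __name__ == "__main__":
    # Hypothetical values copied by hand from the CLI output.
    print(unhealthy(["ONLINE", "BAD", "ONLINE"], HEALTHY_DISK_STATES))
    print(statuses_ok(["ONLINE", "ONLINE"], ["GOOD"]))
```

Any status the helper reports still needs to be looked up in the Physical Drive Status Table or the Logical Drive Status Table before acting on it, because not every status other than those listed here indicates a failure.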

Step 3 - View the event log or persistent event log to confirm the 'show disk' and 'show logical' output, and to identify any failed disk drives, by running:

sccli> show events

or

sccli> show persistent-events

Step 4 - If more than one drive is in a MISSING or BAD state, or a logical drive is in a FATAL FAIL state:

Confirm that this is not a Redundant Loop Failure, as described in:

Troubleshooting StorEdge [TM] 351x Redundant Loop Failures (Doc ID 1006856.1)

For 3510 arrays only, issue the sccli> show disks command to determine whether the drives are Fujitsu drives and meet all of the following criteria:

Media scan is enabled and running
A controller was just reseated, replaced, or failed
Drive firmware revisions are older than: MAP3147FC: , MAP3147FC: 1701, MAS3735FC: 0901, MAS3367FC: 0901

If all of these conditions are met, the resolution is to upgrade the drive firmware to the latest revision available from My Oracle Support (MOS).
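
The Step 4 firmware criteria can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the model and revision values are assumed to have been read by hand from the 'show disks' output, and revisions are assumed to be all-digit strings that can be compared numerically. The first entry in the revision list above has no value in this document, so it is not represented.

```python
# Minimum firmware revisions listed in Step 4. The first entry in the
# source list has no revision value, so it is deliberately left out.
MIN_REVISION = {
    "MAP3147FC": "1701",
    "MAS3735FC": "0901",
    "MAS3367FC": "0901",
}


def firmware_is_older(model, revision):
    """Assumption: revisions are all-digit strings that compare numerically."""
    minimum = MIN_REVISION.get(model)
    if minimum is None or not revision.isdigit():
        return False  # model or revision format not covered by this sketch
    return int(revision) < int(minimum)


def needs_drive_firmware_upgrade(drives, media_scan_running, controller_event):
    """drives: iterable of (model, revision) pairs read from 'show disks'.

    All three Step 4 conditions must hold at the same time.
    """
    any_old = any(firmware_is_older(m, r) for m, r in drives)
    return bool(media_scan_running and controller_event and any_old)


if __name__ == "__main__":
    # Hypothetical drive list for illustration only.
    print(needs_drive_firmware_upgrade([("MAS3735FC", "0801")], True, True))
```

The sketch returns False for models it does not know about, so a False result does not prove that the drive firmware is current.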

Step 5 - If more than one drive is in a MISSING or BAD state, or the logical drive is in a FATAL FAIL state, follow the steps described in:

Recovering From Fatal Drive Failure in the Sun StorEdge 3000 Family Installation, Operation, and Service Manual.

Step 6 - For multiple drives with an amber LED status, or if a logical drive is in an INCOMPLETE or DRV ABSENT state with more than two drives missing or failed, power cycle the array by following the procedures in:

Section 2.2.5.2 Checking and Performing the Correct Power-up Sequence in the:

Sun StorEdge 3000 Family FRU Installation Guide

Step 7 - If the logical drive is in a degraded (DRV FAIL) state and you have one failed drive (BAD or ABSENT):

- For a RAID array (not JBOD): identify the failed drive by following the steps in Identifying the Defective Disk Drive in a RAID Array in the:

Sun StorEdge 3000 Family FRU Installation Guide

- For a JBOD: identify the failed drive by following the steps in Identifying the Defective Disk Drive in a JBOD Array in the:

Sun StorEdge 3000 Family FRU Installation Guide

Replace the identified failed drive, following the instructions in:

Removing a Defective Disk Drive in a RAID or JBOD Array

followed by:

Installing a New Disk Drive in a RAID or JBOD Array.
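
The branching in Steps 4 through 7 can be summarized with a short Python sketch. It is a reading aid only, not a replacement for the referenced manuals: the function name and the amber_led_count parameter are hypothetical, and the status strings are assumed to have been read by hand from the CLI output or the front panel.

```python
def applicable_steps(disk_statuses, ld_status, amber_led_count=0):
    """Return the step numbers from this document that match the observed state."""
    disks = [s.strip().upper() for s in disk_statuses]
    ld = ld_status.strip().upper()

    missing_or_bad = sum(s in {"MISSING", "BAD"} for s in disks)
    missing_or_failed = sum(s in {"MISSING", "BAD", "ABSENT"} for s in disks)
    bad_or_absent = sum(s in {"BAD", "ABSENT"} for s in disks)

    steps = []
    # Steps 4 and 5: more than one MISSING/BAD drive, or a FATAL FAIL logical drive.
    if missing_or_bad > 1 or ld == "FATAL FAIL":
        steps += [4, 5]
    # Step 6: multiple amber LEDs, or INCOMPLETE/DRV ABSENT with more than two drives down.
    if amber_led_count > 1 or (ld in {"INCOMPLETE", "DRV ABSENT"} and missing_or_failed > 2):
        steps.append(6)
    # Step 7: degraded logical drive with exactly one failed drive.
    if ld == "DRV FAIL" and bad_or_absent == 1:
        steps.append(7)
    return steps


if __name__ == "__main__":
    # Hypothetical observation: one BAD drive in a degraded logical drive.
    print(applicable_steps(["ONLINE", "BAD", "ONLINE"], "DRV FAIL"))
```

An empty result means that none of the conditions in Steps 4 through 7 matched, in which case the remaining steps of this document still apply.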

Step 8 - Verify that the state of the new disk is FRMT, NEW, USED, or GOOD by following Step 1 above.

Step 9 - Following Step 2 above, verify that the logical drive status is either GOOD or REBUILDING.

- If the target logical drive status is GOOD, the spare disk has been successfully integrated into the logical drive and the logical drive is protected again. The replacement disk drive is available to be assigned as a global spare.

See: Assigning a Disk Drive as a Spare.

- If the target logical drive status is DEGRADED, follow the steps in:

Assigning a Disk Drive as a Spare, and then initiate a REBUILD operation.
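
The Step 9 outcomes can be written down as a small Python lookup. This is a sketch under assumptions: the function name is hypothetical, the action for REBUILDING (waiting and re-checking) is an inference rather than something this document states, and the fallback for any other status is likewise an assumption.

```python
def post_replacement_action(ld_status):
    """Map the logical drive status seen in Step 9 to the follow-up action."""
    status = ld_status.strip().upper()
    if status == "GOOD":
        return "assign the replacement disk as a global spare"
    if status == "DEGRADED":
        return "assign the replacement disk as a spare, then initiate a rebuild"
    if status == "REBUILDING":
        # Assumption: not stated in this document.
        return "wait for the rebuild to finish, then repeat Step 9"
    # Assumption: any other status is treated as unexpected.
    return "unexpected status, repeat Step 2"


if __name__ == "__main__":
    print(post_replacement_action("degraded"))
```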

Step 10 - Run the command

sccli> show events

to determine whether there are "Drive Recovered Error Reported" messages.

If there are, refer to:

Sun StorEdge [TM] 351x FC: How to Handle "Drive Recovered Error Reported" and Other Disk Drive Messages (Doc ID 1008255.1)
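
When the event log is long, the Step 10 search can be done on a saved copy of the 'show events' output. The Python sketch below assumes the output has been captured to a text string; the function name and the sample lines are hypothetical, and no particular event log layout is assumed beyond the message text quoted in this document.

```python
MESSAGE = "Drive Recovered Error Reported"


def recovered_error_lines(event_text):
    """Return the lines of captured event output that contain the message."""
    return [
        line.strip()
        for line in event_text.splitlines()
        if MESSAGE.lower() in line.lower()
    ]


if __name__ == "__main__":
    # Hypothetical sample text, not real sccli output.
    sample = "first sample line\n  sample: Drive Recovered Error Reported\nlast sample line"
    for line in recovered_error_lines(sample):
        print(line)
```

The match is a plain case-insensitive substring search, so it keeps working even if the surrounding layout of each event entry differs from what is expected.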

Step 11 - If disk problems persist, refer back to:

Troubleshooting Sun StorEdge[TM] 33x0/351x Hardware (Doc ID 1011431.1)

3510, 3511, Drive Failure, disk failure, double drive failure, normalized, audited
Previously Published As
89045

Change History
Date: 2010-01-14
User Name: [email protected]
Action: Currency & Update
