On StorageTek 2501/2530/2540 arrays, I/O from one or both controllers to disk drives may time out until the drives are disabled.

Asset ID:	1-73-1001057.1
Update Date:	2010-11-03
Keywords:

Solution Type FAB (standard) Sure

Solution 1001057.1: On StorageTek 2501/2530/2540 arrays, I/O from one or both controllers to disk drives may time out until the drives are disabled.

Related Items


Sun Storage 2530 Array
Sun Storage 2540 Array

Related Categories


GCS>Sun Microsystems>Sun FAB>Standard>Reactive

PreviouslyPublishedAs
201384

Product
Sun StorageTek 2530 Array
Sun StorageTek 2501
Sun StorageTek 2540 Array

Bug Id
<SUNBUG: 6544466>

Impact

This issue results in the loss of one or more drives, which at a minimum puts the associated volumes into a degraded state. The loss of several drives can take the associated volumes offline, causing a loss of availability.

During 25xx beta testing, one customer experienced a drive disabled event. Based on analysis by Sun Product Engineering, it is anticipated that Sun Service may encounter 1-3 customers during the first quarter of shipments who may experience this particular issue.

Contributing Factors

Products:

StorageTek 2501
StorageTek 2530
StorageTek 2540

These are new products (FCS expected in mid-May) in the Sun StorageTek Entry Disk Portfolio. The Sun System Handbook (SSH) product pages for them will not be available until late May 2007. In the interim, refer to the following TSC Backline web page for these products, and to the SSH page when it becomes available:

http://pts-storage.west/products/ST25xx/

https://support.us.oracle.com/handbook_internal/Systems/2540/2540.html

Symptoms

The issue can be identified by the following symptoms:

For MEL Events:

1) Clusters of the following event types occur repeatedly:

A) Check condition events coming back from the drive(s):

Event type: 100A
Event category: Error
Priority: Informational
Description: Drive returned CHECK CONDITION
Event specific codes: 6/2a/2
Component type: Drive

B) Drive-side timeout events:

Event type: 100D
Event category: Error
Priority: Informational
Description: Timeout on drive side of controller
Event specific codes: 0/0/0

2) Eventually the drive is failed, and at least one of the following events is logged:

Event type: 2217
Event category: Notification
Priority: Informational
Description: Piece failed
Event specific codes: 0/0/0
Component type: Drive

Event type: 2216
Event category: Notification
Priority: Informational
Description: Piece taken out of service
Event specific codes: 0/0/0
Component type: Drive

Event type: 2215
Event category: Notification
Priority: Informational
Description: Drive marked failed
Event specific codes: 0/0/0
Component type: Drive
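
As an illustration only, the pattern described above (repeated 100A/100D events followed by a 2215, 2216, or 2217 event) could be checked with a short script. The sketch below assumes the MEL has been exported as plain text in the same "Key: value" layout as the excerpts in this document. The function names and the min_precursors threshold are hypothetical and are not part of any Sun tool, and the script is not a substitute for analysis of the support data by TSC-Storage Backline.

```python
import re

# Event types quoted in this document.
PRECURSOR_TYPES = {"100A", "100D"}        # CHECK CONDITION, drive-side timeout
FAILURE_TYPES = {"2215", "2216", "2217"}  # drive marked failed, piece out of service, piece failed

# Matches "Key: value"; the key is everything before the first colon.
FIELD_RE = re.compile(r"^\s*([^:]+):\s*(.*)$")


def parse_events(text):
    """Group 'Key: value' lines into events; each 'Event type' line starts a new event."""
    events = []
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # skip blank lines and headings such as "A) ..."
        key, value = match.group(1).strip(), match.group(2).strip()
        if key == "Event type":
            events.append({})
        if events:
            events[-1][key] = value
    return events


def matches_symptoms(events, min_precursors=5):
    """Return True if a failure event follows at least min_precursors 100A/100D events.

    min_precursors is an assumed threshold; the document only says "clusters".
    """
    precursor_count = 0
    for event in events:
        event_type = event.get("Event type", "").upper()
        if event_type in PRECURSOR_TYPES:
            precursor_count += 1
        elif event_type in FAILURE_TYPES and precursor_count >= min_precursors:
            return True
    return False
```

A caller would read the exported log into a string, pass it to parse_events(), and then pass the result to matches_symptoms(). A True result only suggests that the log resembles the symptoms listed here; it does not confirm the root cause.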

Root Cause

Engineering is still determining which conditions cause the array to enter this state. It currently appears that one of the back-end SAS drive channels is functioning marginally, causing the array's error recovery procedures to execute at an abnormally high frequency.

Workaround

The recovery requires the drives to be reconstructed.

Collect support data and escalate to TSC-Storage Backline, which maintains an onsite service procedure that may be required for recovery and that is carried out with live support and guidance from TSC Backline. Do not power cycle or otherwise modify the state of the array. Based on the support data collected, TSC-Storage Backline will provide service personnel with the steps to:

1) Clear the condition

2) Recover any volumes that were taken offline due to the condition

3) Reinstate and rebuild any drives that were failed due to the condition

Resolution

A final resolution is pending completion. Please use CR 6544466 to track the final resolution as this document may not be updated.

Previously Published As
102907

Internal Contributor/submitter
[email protected]

Internal Eng Business Unit Group
NWS (Network Storage)

Internal Eng Responsible Engineer
[email protected]

Internal Services Knowledge Engineer
[email protected]

Internal Kasp FAB Legacy ID
102907

Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date: 2007-05-08
Avoidance: Service Procedure
Responsible Manager: [email protected]
Original Admin Info: WF submitted on 02 May 2007. I will send to review today - karen.

Attachments

This solution has no attachment