Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Solution Type FAB (standard) Sure Solution

1001057.1 : On StorageTek 2501/2530/2540, I/O from one or both controllers to disk drives may time out until the drives are disabled.
Previously Published As 201384

Product
Sun StorageTek 2530 Array
Sun StorageTek 2501
Sun StorageTek 2540 Array

Bug Id
<SUNBUG: 6544466>

Impact
This issue results in the loss of drive(s) which would, at a minimum, put the associated volumes into a degraded state. The loss of several drives can cause the associated volumes to be taken offline, leading to a loss of availability.

During 25xx beta testing, one customer experienced a drive disabled event. Based on analysis by Sun Product Engineering, it is anticipated that Sun Service may encounter 1-3 customers during the first quarter of shipments who may experience this particular issue.

Contributing Factors
Products:
This is a new product (FCS expected in mid-May) in the Sun StorageTek Entry Disk Portfolio. The Sun System Handbook product page for these will not be available until late May 2007. In the interim, please reference the following TSC Backline webpage for these new products, along with the SSH page when it becomes available:

http://pts-storage.west/products/ST25xx/
https://support.us.oracle.com/handbook_internal/Systems/2540/2540.html

Symptoms
These are the symptoms and how to identify this issue:

For MEL Events:

1) Clusters of the following event types occur repeatedly:

   A) Check condition events coming back from the drive(s):

      Event type: 100A
      Event category: Error
      Priority: Informational
      Description: Drive returned CHECK CONDITION
      Event specific codes: 6/2a/2
      Component type: Drive

   B) Drive side timeout events:

      Event type: 100D
      Event category: Error
      Priority: Informational
      Description: Timeout on drive side of controller
      Event specific codes: 0/0/0

2) Eventually the drive gets failed and at least one of the following events will be logged:

      Event type: 2217
      Event category: Notification
      Priority: Informational
      Description: Piece failed
      Event specific codes: 0/0/0
      Component type: Drive

      Event type: 2216
      Event category: Notification
      Priority: Informational
      Description: Piece taken out of service
      Event specific codes: 0/0/0
      Component type: Drive

      Event type: 2215
      Event category: Notification
      Priority: Informational
      Description: Drive marked failed
      Event specific codes: 0/0/0
      Component type: Drive
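The MEL pattern listed under Symptoms can be checked mechanically once the events have been extracted from the support data. The following is only a minimal sketch, not a Sun-supplied tool. It assumes each event has already been parsed into a Python dict with the hypothetical keys "type" and "codes" and that the list is ordered oldest first. The threshold of three retry events is likewise an assumption, since this document does not state how many events make up a cluster, and the sketch counts 100A and 100D events together rather than requiring both.

```python
# Retry-type events from symptom 1: (event type, event specific codes).
RETRY_SIGNATURES = {("100A", "6/2a/2"), ("100D", "0/0/0")}

# Drive failure events from symptom 2.
FAILURE_EVENTS = {"2215", "2216", "2217"}


def matches_symptom(events, min_cluster=3):
    """Return True if a failure event follows a cluster of retry events.

    `events` is a list of dicts ordered oldest first. The keys "type" and
    "codes" are hypothetical names, and `min_cluster` is an assumed
    threshold; neither comes from the array firmware or its tools.
    """
    retries = 0
    for ev in events:
        key = (ev["type"].upper(), ev.get("codes", "").lower())
        if key in RETRY_SIGNATURES:
            retries += 1
        elif key[0] in FAILURE_EVENTS and retries >= min_cluster:
            # Failure event preceded by enough 100A/100D events.
            return True
    return False


if __name__ == "__main__":
    sample = [
        {"type": "100A", "codes": "6/2a/2"},
        {"type": "100D", "codes": "0/0/0"},
        {"type": "100A", "codes": "6/2a/2"},
        {"type": "2215", "codes": "0/0/0"},
    ]
    print(matches_symptom(sample))
```

A positive match only suggests that the log resembles this issue. Diagnosis and recovery still go through TSC-Storage Backline as described under Workaround.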
Root Cause
Engineering is still determining which conditions cause the array to enter this state. It currently appears that one of the back-end SAS drive channels is functioning marginally, causing the array's error recovery procedures to run at an abnormally high frequency.

Workaround
The recovery requires the drives to be reconstructed. Collect support data and escalate to TSC-Storage Backline, which maintains an onsite service procedure that may be required for recovery and would be implemented with live support and guidance from TSC Backline.

Do not power cycle or otherwise modify the state of the array.

Based upon the support data collected, TSC-Storage Backline will provide service personnel with the steps to:

1) Clear the condition
2) Recover any volumes that were taken offline due to the condition
3) Reinstate and rebuild any drives that were failed due to the condition

Resolution
A final resolution is pending. Please use CR 6544466 to track the final resolution, as this document may not be updated.

Previously Published As 102907

Internal Contributor/submitter [email protected]
Internal Eng Business Unit Group NWS (Network Storage)
Internal Eng Responsible Engineer [email protected]
Internal Services Knowledge Engineer [email protected]
Internal Kasp FAB Legacy ID 102907

Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date: 2007-05-08
Avoidance: Service Procedure
Responsible Manager: [email protected]
Original Admin Info: WF submitted on 02 May 2007. I will send to review today - karen.

Attachments
This solution has no attachment