Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Sun Alert Sure Solution

1000370.1: Sun StorEdge 3510 Arrays May Mark Disks as "bad" After Reporting Disk Errors
Previously Published As: 200492
Product: Sun StorageTek 3510 FC Array
Bug Id: <SUNBUG: 6357118>
Date of Workaround Release: 28-APR-2006
Date of Resolved Release: 28-Mar-2008

One or more disk drive(s) may become disabled and the logical drive may transition to a "Fatal Fail" status. (see below for details)

1. Impact

One or more disk drive(s) may become disabled and the logical drive may transition to a "Fatal Fail" status.

Cached data may still be waiting to be written to the logical drive. If this occurs, pending write cache contents may be lost when the array is reset/power cycled. If the array is running 4.15F firmware, "Cache purged" messages will be logged. For previous firmware versions, cache contents may be lost without notification.

2. Contributing Factors

This issue can occur on the following platform:

SPARC Platform, for all current releases of controller firmware.

3. Symptoms

If the described issue occurs, one or more disks may be disabled, perhaps in quick succession, especially under conditions of heavy I/O load.

If running firmware 4.15F, there may be "0B/47" SCSI parity error messages in the event log. For previous firmware versions there are no specific error messages to identify this issue.

4. Workaround

For array firmware 4.15F:

On Sun StorEdge 3510 FC arrays with firmware 4.15F, an array reset could clear this issue. After a proper array shutdown and reset, the transient error condition that caused disturbances in the disk drive loop may no longer be present. In this case the disks could participate in array operations, provided the disks are good and the error was transient in nature. The documented procedure can then be followed to force the logical drive to become available.

Note: Appropriate care should be taken to verify data consistency if the "cache purged" message was logged.

For additional details on recovering a logical drive from a "Fatal Fail" state, see the "Sun StorEdge 3000 Family Installation, Operation, and Service Manual" and reference section 8.5 "Recovering From Fatal Drive Failure".

***IMPORTANT NOTE***

For array firmware prior to 4.15F:

The "cache purged" warning message is only available in firmware 4.15F. Therefore, for firmware versions prior to 4.15F, data consistency must be checked after array shutdown and reset for any logical drive that has been recovered from a "Fatal Fail" state. Pending write cache data may have been lost without any warning message if the cache was set to "write back" mode.

Note: Array users should regularly monitor their arrays for messages in the "persistent event log" and take action to replace any faulty components.

5. Resolution

There are no further updates planned for this Sun Alert document.

This Sun Alert notification is being provided to you on an "AS IS" basis.
This Sun Alert notification may contain information provided by third parties. The issues described in this Sun Alert notification may or may not impact your system(s). Sun makes no representations, warranties, or guarantees as to the information contained herein. ANY AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN.

This Sun Alert notification contains Sun proprietary and confidential information. It is being provided to you pursuant to the provisions of your agreement to purchase services from Sun, or, if you do not have such an agreement, the Sun.com Terms of Use. This Sun Alert notification may only be used for the purposes contemplated by these agreements.

Copyright 2000-2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.

Modification History

28-Mar-2008: Resolved

Previously Published As: 102329

Internal Comments
The Sun StorEdge 3510 array uses dual FC loops for communicating with disks. If any loop disturbance is observed, including low signal quality or error conditions on the disks that disrupt loop transmission, the disks will report a SCSI parity error (0B/47) condition.

Current array firmware error recovery does not include using the alternate path to the disk drive. The current algorithm disables the drive upon an error condition on one path (e.g., SCSI parity error conditions on that path alone) without trying the alternate path. Current array firmware needs improvement in drive error handling. The array development group is working on the best possible approach to address this issue.

Notes on recovery options and procedure:

For array firmware 4.15F:

1. Depending on the type of logical drive (e.g., RAID 5), when one or more drives are disabled, the logical drive could go into a "Fatal Fail" state. This means the logical drive is not usable and has crossed its failure tolerance limit (e.g., in a RAID 5, more than one disk failure results in the array not being able to provide access to data). Cached data that has already been acknowledged to the host as received may still be waiting to be written to this logical drive. In this case, "cache purged" messages will be recorded to notify that the cached data belonging to the logical drive will be discarded when the system is reset/power cycled.

2. The array reset could clear the issue. After a proper array shutdown and reset, the transient error condition that caused disturbances in the disk drive loop may no longer be present. In this case the disks could participate in the array operation, provided the disks are good and the error was transient in nature. The user can then follow the documented procedure to force the logical drive to become available. Appropriate care should be taken to verify data consistency if the "cache purged" message was logged.
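The recovery guidance in this alert reduces to two simple rules, which can be sketched in Python as follows. This is only an illustrative sketch: every name in it (RecoveredDrive, is_fatal_fail, needs_consistency_check) is hypothetical and is not part of any array CLI or firmware interface, and the failure tolerance is passed in by the caller rather than derived from the RAID level.

```python
from dataclasses import dataclass

# Per this alert, 4.15F is the firmware that logs "cache purged" messages.
CACHE_PURGE_FIRMWARE = "4.15F"


@dataclass
class RecoveredDrive:
    """Hypothetical record for a logical drive recovered from "Fatal Fail"."""
    firmware: str              # controller firmware version string
    cache_purged_logged: bool  # True if "cache purged" appears in the event log


def is_fatal_fail(disabled_drives: int, tolerance: int) -> bool:
    """The logical drive is unusable once disabled drives exceed its tolerance.

    For RAID 5 the tolerance is 1, as stated in this alert; other values are
    the caller's assumption.
    """
    return disabled_drives > tolerance


def needs_consistency_check(drive: RecoveredDrive) -> bool:
    """Decide whether data consistency must be verified after recovery."""
    if drive.firmware == CACHE_PURGE_FIRMWARE:
        # 4.15F reports lost cache contents, so check when the message exists.
        return drive.cache_purged_logged
    # Earlier firmware may lose write cache silently, so always check.
    return True
```

For example, a RAID 5 logical drive with two disabled disks gives is_fatal_fail(2, 1) == True, and a drive recovered on firmware older than 4.15F always returns True from needs_consistency_check, regardless of what the event log shows.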
In summary, the following is the behavior of the array with 4.15 firmware when a logical drive has gone into a "Fatal Fail" state.
Internal Contributor/submitter: [email protected]
Internal Eng Business Unit Group: NWS (Network Storage)
Internal Eng Responsible Engineer: [email protected]
Internal Services Knowledge Engineer: [email protected]
Internal Sun Alert Kasp Legacy ID: 102329

Internal Sun Alert & FAB Admin Info
Critical Category: Data Loss, Availability ==> Severe
Significant Change Date: 2006-04-28
Avoidance: Workaround
Responsible Manager: [email protected]
Original Admin Info:
[WF 28-Apr-2006, Jeff Folla: Sent for release.]
[WF 27-Apr-2006, Jeff Folla: Sent for review.]
[WF 26-Apr-2006, Jeff Folla: Sent to submitter and responsible engineer for review.]

Product_uuid: 58553d0e-11f4-11d7-9b05-ad24fcfd42fa|Sun StorageTek 3510 FC Array

Attachments: This solution has no attachment