Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Sun Alert Sure Solution

1000319.1: Data Inconsistencies May Occur When Persistent SCSI Parity Errors are Generated Between the Host and the SE33x0 Array

Previously Published As: 200437
Product: Sun StorageTek 3310 SCSI Array, Sun StorageTek 3320 SCSI Array
Bug Id: <SUNBUG: 6363490>, <SUNBUG: 6378796>
Date of Workaround Release: 12-JAN-2006
Date of Resolved Release: 13-Mar-2008

1. Impact

When the connection between the SE33x0 array and the host has degraded to the point that WRITE requests cannot be completed due to connectivity issues, persistent SCSI parity errors may be generated between the host and the SE33x0 array and data inconsistencies may occur.

2. Contributing Factors

This issue can occur on the following platforms:
SCSI parity errors can cause invalid data to be written into the array's cache. Prior to firmware version 4.15, this data is eventually flushed to the disk media, permanently storing the invalid data on the volume. Firmware version 4.15 was modified to discard the corrupted data rather than write it to disk media. This reduces the probability of corrupting the volume. However, in the rare case where the failed write command overlaps a prior write command's data that still resides in cache, that data is also discarded.

Single Path Configurations

Configurations in which a host has only one path to one or more logical units on the array are exposed to this problem. Because there is no redundant path between the host and the SE33x0 array, a failed request cannot be retried over a second path. When using firmware version 4.15 in this configuration, if any write commands fail due to parity errors, write data in cache may be lost if the application or file system issued writes to overlapping LBAs. When using older firmware in this configuration, the LBAs of any WRITE request that cannot be completed as a result of a PARITY ERROR returned by the SE33x0 should be considered to contain invalid data.

Multi Path/High Availability Configurations

The exposure for a properly configured High Availability configuration using a host multi-pathing driver and multiple separate connections between the host(s) and the SE33x0 array is very small. In this configuration, the multi-pathing driver in the host uses the second, non-compromised path to the array controller to retry the WRITE request. A successful retry writes the intended data to the correct LBAs, with the following exceptions:

  1. If the SE33x0 array or the host experiences a power failure between the failed WRITE request and the successful completion of the retry down the second path, the data for the failed WRITE request should be considered invalid.
  2. If the host OS experiences a crash or a multi-path driver error between the failed WRITE request and the successful completion of the retry down the second path, the data for the failed WRITE request should be considered invalid.

3. Symptoms

Should the described issue occur, persistent SCSI parity errors will be generated between the host and the SE33x0 array. The SE33x0 array will return a SCSI status of "Parity Error" to the host SCSI Host Bus Adapter (HBA). Typically, the host SCSI HBA will retry the WRITE request several times (most drivers attempt between 2 and 6 retries) before returning the WRITE request to the application with a FAILURE status.

4. Workaround

There is no workaround for this issue. Please see the Resolution section below.

5. Resolution

The issue described in BugID 6363490 is addressed on the following platforms:
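To illustrate the cache behavior described under Contributing Factors, the following minimal Python sketch contrasts the two firmware policies. It is a toy model, not the SE33x0 firmware: the class `ArrayCacheModel`, the function `simulate`, the LBA numbers, and the per-LBA granularity of the discard are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ArrayCacheModel:
    """Hypothetical toy model of an array write cache (not actual firmware)."""
    discard_on_parity_error: bool                 # True models the 4.15 policy
    media: dict = field(default_factory=dict)     # LBA -> data on disk
    cache: dict = field(default_factory=dict)     # LBA -> data awaiting flush

    def write(self, lba_start, blocks, parity_error=False):
        lbas = range(lba_start, lba_start + len(blocks))
        if parity_error and self.discard_on_parity_error:
            # Assumed 4.15-style policy: drop the corrupted transfer together
            # with any earlier cached data for the same LBAs.
            for lba in lbas:
                self.cache.pop(lba, None)
            return "PARITY ERROR"
        # Older policy: the (possibly corrupted) data still lands in cache.
        for lba, data in zip(lbas, blocks):
            self.cache[lba] = data
        return "PARITY ERROR" if parity_error else "GOOD"

    def flush(self):
        # Commit everything in cache to the disk media.
        self.media.update(self.cache)
        self.cache.clear()


def simulate(discard_on_parity_error):
    """Run one overlapping-write scenario and return the final media contents."""
    array = ArrayCacheModel(discard_on_parity_error,
                            media={100: "v0", 101: "v0"})
    array.write(100, ["v1", "v1"])                  # good write, stays in cache
    array.write(101, ["garbled"], parity_error=True)  # overlapping failed write
    array.flush()
    return array.media


if __name__ == "__main__":
    print("pre-4.15 policy:", simulate(False))
    print("4.15 policy:    ", simulate(True))
```

Under the older policy the model ends with garbled data on LBA 101; under the 4.15-style policy LBA 101 keeps its previous contents because the earlier, valid cached write to that LBA was discarded along with the failed one. In both cases the host must treat the failed WRITE as not completed.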
Note: Ensure that SCSI connections are reliable and properly configured to minimize the probability of parity errors, and use multiple SCSI connections with failover drivers.

Because the changes would require a major redesign, the issue described in BugID 6378796 was closed as "will not fix."

This Sun Alert notification is being provided to you on
an "AS IS"
basis. This Sun Alert notification may contain information provided by
third parties. The issues described in this Sun Alert notification may
or may not impact your system(s). Sun makes no representations,
warranties, or guarantees as to the information contained herein. ANY
AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR
NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU
ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT
OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN. This
Sun Alert notification contains Sun proprietary and confidential
information. It is being provided to you pursuant to the provisions of
your agreement to purchase services from Sun, or, if you do not have
such an agreement, the Sun.com Terms of Use. This Sun Alert
notification may only be used for the purposes contemplated by these
agreements.
Modification History
14-Jun-2006: Updated Contributing Factors and Resolution Sections
13-Mar-2008: Updated Resolution section - RESOLVED

References
<SUNPATCH: 113722-15>
<SUNPATCH: 113730-01>

Previously Published As: 102128

Internal Comments

Please send technical questions to the following email: [email protected] and CC the following persons: Internal Contributor/Submitter, Internal Eng Responsible Engineer, Internal Services Knowledge Engineer

The following Sun Alerts have information about other known issues for the 3000 series products:

102011 - Sun StorEdge 33x0/3510 Arrays May Report a Higher Incidence of Drive Failures With Firmware 4.1x SMART Feature Enabled
102067 - Sun Cluster 3.x Nodes May Panic Upon Controller Failure/Replacement Within Sun StorEdge 3510/3511 Arrays
102086 - Failed Controller Condition May Cause Data Integrity Issues
102098 - Insufficient Information for Recovery From Double Drive Failure for Sun StorEdge 33x0/35xx Arrays
102126 - Recovery Behavior From Fatal Drive Failure May Lead to Data Integrity Issues
102127 - Performance Degradation Reported in Controller Firmware Releases 4.1x on Sun StorEdge 3310/351x Arrays for All RAID Types and Certain Patterns of I/O
102128 - Data Inconsistencies May Occur When Persistent SCSI Parity Errors are Generated Between the Host and the SE33x0 Array
102129 - Disks May be Marked as Bad Without Explanation After "Drive Failure," "Media Scan Failed" or "Clone Failed" Events

Note: One or more of the above Sun Alerts may require a Sun Spectrum Support Contract to log in to a SunSolve Online account.

As referenced in bug 6363490, this issue may occur with a faulty cable, possibly where a bit on the upper 8 bits of the cable has been toggled. A firmware release is scheduled that will ensure that data that has been compromised in transit from the host SCSI HBA to the SE33x0 array controller will not be flushed to media.
When this fix is delivered, there are cases in which stale data may remain in the SE33x0 array for LBAs that correspond to the failed WRITE request. A subsequent firmware release is scheduled that will ensure that data that has been compromised in transit from the host SCSI HBA to the SE33x0 array will not be flushed to media AND EITHER:

A. The most current version of data successfully written to the SE33x0 array by the host may be read from the SE33x0 array

OR:

B. A host READ request for data that could not be recovered will be returned with an error indicating MEDIA ERROR for the LBAs that cannot be recovered. A successful WRITE request to the LBAs corresponding to the MEDIA ERROR will allow these LBAs to be read correctly again.

Internal Contributor/Submitter: [email protected]
Internal Eng Responsible Engineer: [email protected]
Internal Services Knowledge Engineer: [email protected]
Internal Eng Business Unit Group: NWS (Network Storage)
Internal Escalation ID: 1-12074354
Internal Resolution Patches: 113722-15, 113730-01

Internal Sun Alert & FAB Admin Info
Critical Category: Data Loss
Significant Change Date: 2006-01-12
Avoidance: Patch
Responsible Manager: [email protected]
Original Admin Info:
[WF 14-Jun-2006, Dave M: updating for FW patches]
[WF 15-May-2006, Dave M: pending patch number given]
[WF 05-Jan-2006, Dave M: sent for review 04-Jan, reviews returned, Chessin suggestions/changes made, all docs on hold until Exec approval pending 1/12]
[WF 04-Jan-2006, Dave M: final edits before sending to tech review]
[WF 02-Jan-2006, Dave M: draft created]

Internal Sun Alert Kasp Legacy ID: 102128
Product_uuid:
3db30178-43d7-4d85-8bbe-551c33040f0d|Sun StorageTek 3310 SCSI Array
95288bce-56d3-11d8-9e3a-080020a9ed93|Sun StorageTek 3320 SCSI Array

References
SUNPATCH:113722-15
SUNPATCH:113730-01

Attachments
This solution has no attachment