Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Sun Alert Sure Solution

1000681.1: Failed Controller Condition May Cause Data Integrity Issues
Previously Published As: 200893

Product:
Sun StorageTek 3310 SCSI Array
Sun StorageTek 3510 FC Array
Sun StorageTek 3320 SCSI Array
Sun StorageTek 3511 SATA Array

Bug Id: <SUNBUG: 6355818>
Date of Workaround Release: 13-DEC-2005
Date of Resolved Release: 15-JUN-2006

Impact

On a Sun StorEdge 33x0/35xx array, when a failed RAID controller condition exists and the array is power cycled, data integrity issues may occur.

Contributing Factors

This issue can occur on the following platforms:
Note: This issue can occur with all current firmware revisions available for the Sun StorEdge 33x0/35xx arrays.

This issue can occur when the default primary RAID controller is in a failed condition and the array is power cycled during that time, which causes stale cache data held in the failed controller to be written unexpectedly to disk.

Note: The default primary RAID controller is the controller with the higher serial number. Determine the serial number via the CLI with the sccli syntax "sccli> show redundancy"; it is not the serial number printed on the back of the FRU.

Symptoms

Upon power cycling the array, the failed controller comes online and the existing filesystems on the array report fsck(1M) or other data integrity issues.

Workaround

Either the failed controller's cache must be discarded or the failed controller must be removed from the array before the array is reset or power cycled.

Note: Always replace the failed controller with the power on (the array can be power cycled with just one controller).

Scenario 1 - Spare controller available:

With the power on and the array operational on one controller, install the replacement controller for the failed controller. (Removing the failed controller with the power on, before the array is power cycled, prevents any stale data from the failed controller from being written out.)

Scenario 2 - Spare controller unavailable:

Option 1: This option assumes you have sccli in-band or out-of-band access to the array. Unfail the controller using the sccli syntax "sccli> unfail". The failed controller's cache is discarded when the controller is put back online as a Secondary controller. If this command fails, follow Option 2.

Option 2: Prior to resetting or power cycling the array, remove the failed controller, then remove the battery module from the failed controller for at least 5 seconds and reinsert the battery to invalidate the cache on the failed controller.
To maintain proper air flow, partially reinsert the controller until it is 1 inch from its fully seated position.

Please refer to Sun documentation at: http://docs.sun.com/app/docs/doc/816-7326-20

For the firmware levels required to identify failed controllers, refer to the "Sun StorEdge 3000 Family Installation, Operation, and Service Manual" collection at http://www.sun.com/products-n-solutions/hardware/docs/Network_Storage_Solutions/Workgroup/index.html and the "Sun StorEdge 3000 Family CLI User's Guide" at http://www.sun.com/products-n-solutions/hardware/docs/html/817-4951-14

Resolution

This issue is addressed on the following platforms:
Modification History

12-Jan-2006: Updated Contributing Factors and Relief/Workaround
25-Apr-2006: Updated Contributing Factors and Resolution sections
15-Jun-2006: Updated Contributing Factors and Resolution sections
22-Jul-2010: Document republished as originally posted (2006)

References

<SUNPATCH: 113723-15>
<SUNPATCH: 113722-15>
<SUNPATCH: 113730-01>
<SUNPATCH: 113724-09>

Previously Published As: 102086 and 200893 (IBIS)

Internal Comments

Patches for these firmware releases were developed across all products (all arrays impacted by these issues). Therefore, some of the patch READMEs may not reflect the BugID listed in the Sun Alert, but the firmware patch listed for each product does in fact remedy the issue for the platforms specified.

The following Sun Alerts have information about other known issues for the 3000 series products:

102011 - Sun StorEdge 33x0/3510 Arrays May Report a Higher Incidence of Drive Failures With Firmware 4.1x SMART Feature Enabled
102067 - Sun Cluster 3.x Nodes May Panic Upon Controller Failure/Replacement Within Sun StorEdge 3510/3511 Arrays
102086 - Failed Controller Condition May Cause Data Integrity Issues
102098 - Insufficient Information for Recovery From Double Drive Failure for Sun StorEdge 33x0/35xx Arrays
102126 - Recovery Behavior From Fatal Drive Failure May Lead to Data Integrity Issues
102127 - Performance Degradation Reported in Controller Firmware Releases 4.1x on Sun StorEdge 3310/351x Arrays for All RAID Types and Certain Patterns of I/O
102128 - Data Inconsistencies May Occur When Persistent SCSI Parity Errors are Generated Between the Host and the SE33x0 Array
102129 - Disks May be Marked as Bad Without Explanation After "Drive Failure," "Media Scan Failed" or "Clone Failed" Events

Note: One or more of the above Sun Alerts may require a Sun Spectrum Support Contract to log in to a SunSolve Online account.

Use best judgement from system response and logs to either replace or re-insert the failed controller.

Note: If the I/O Module on the surviving controller has failed, as determined by issuing the CLI command "sccli> show ses", then do not pull the failed controller.
You will need to escalate this to backline support.

Internal Eng Responsible Engineer: [email protected]

Internal Resolution Patches: 113723-15, 113722-15, 113730-01, 113724-09

Attachments

This solution has no attachment
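
The workaround decision path described in this alert (Scenario 1, Scenario 2 Options 1 and 2, and the I/O Module escalation note) can be summarized in a short Python sketch. This is only an illustrative checklist helper, not a Sun tool: the ArrayState class, its field names, and the workaround_steps function are hypothetical, the operator supplies every observation by hand, and nothing here invokes sccli. Checking the I/O Module condition before everything else is an assumption drawn from the internal note above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArrayState:
    """Operator-supplied observations. All field names are hypothetical."""
    surviving_io_module_failed: bool  # judged from "sccli> show ses" output
    spare_controller_available: bool
    sccli_access: bool                # in-band or out-of-band access
    unfail_failed: bool = False       # True once "sccli> unfail" was tried and failed


def workaround_steps(state: ArrayState) -> List[str]:
    """Return the actions to take BEFORE any reset or power cycle."""
    # Assumed precedence: the escalation note overrides every other path.
    if state.surviving_io_module_failed:
        return ["Do not pull the failed controller; escalate to backline support."]

    # Scenario 1: a spare controller is on hand.
    if state.spare_controller_available:
        return ["With power on, install the replacement controller "
                "in place of the failed one."]

    # Scenario 2, Option 1: discard the stale cache through the CLI.
    if state.sccli_access and not state.unfail_failed:
        return ['Run "sccli> unfail"; if the command fails, follow Option 2.']

    # Scenario 2, Option 2: invalidate the cache by pulling the battery.
    return [
        "Remove the failed controller.",
        "Remove its battery module for at least 5 seconds, then reinsert the battery.",
        "Partially reinsert the controller, 1 inch short of fully seated, "
        "to maintain air flow.",
    ]


if __name__ == "__main__":
    # Example: no spare, CLI access available, but unfail has already failed.
    state = ArrayState(False, False, True, unfail_failed=True)
    for step in workaround_steps(state):
        print(step)
```

Returning plain strings keeps the sketch usable as a printed checklist; a site that scripts around sccli would need to verify actual command output formats against the CLI User's Guide before automating any of these checks.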