Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

Solution 1008108.1: Sun StorEdge[TM] 3310, 3510 and 3511: Avoiding double drive failure conditions on 3.x firmware
Previously Published As: 211152

Symptoms

When a disk fails on a Sun StorEdge[TM] 33x0, 3510 or 3511 array and a bad block is then encountered on another disk of the same logical drive during the rebuild, the rebuild fails with events such as the following:

    Tue Jan 31 15:38:30 2006 [1113] #5: StorEdge Array SN#8040967 CH2 ID7: SCSI Drive ALERT: bad block encountered (02h, 03h,11/00)
    Tue Jan 31 15:38:30 2006 [2103] #6: LD-ID 6CC584FE on StorEdge Array SN#8040967: ALERT: rebuild failed

This is known as a "double disk error", and when this happens, data

Resolution

This problem is caused by latent disk errors: bad blocks in areas of a disk that are rarely or never read go undetected until a rebuild has to read them. It is common to all RAID arrays. The general solution is disk scrubbing, which was introduced on Sun StorEdge[TM] 3310, 3510 and 3511 arrays with 4.x firmware in the form of an automatic media scan. To minimize the chance of this problem occurring, upgrade to 4.x firmware.

Before upgrading to 4.x firmware, it is advisable to run the procedure in the Relief/Workaround section once while still on 3.x, to avoid seeing too many drive failures after the upgrade (see Sun Alert 102011 for more details).

Examples of drive-related Sun Alerts for 4.x firmware:

    Sun Alert ID: 102098
    Synopsis: Insufficient Information for Recovery From Double Drive Failure for Sun StorEdge 33x0/35xx Arrays

    Sun Alert ID: 102129
    Synopsis: Disks May be Marked as Bad Without Explanation After "Drive Failure," "Media Scan Failed" or "Clone Failed" Events

    Sun Alert ID: 102011
    Synopsis: Sun StorEdge 33x0/3510 Arrays May Report a Higher Incidence of Drive Failures With Firmware 4.1x SMART Feature Enabled

Relief/Workaround

While on 3.x firmware, run the Parity Regenerate operation at least once a month. This operation reads the data and compares it with the parity for all disk blocks, so no block goes unread for long periods and latent errors are found before a rebuild depends on them. It also helps ensure that data and parity are consistent, and takes the necessary action if they are not.

This can be done in two ways: through the telnet/serial interface or through the SSCS GUI.
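The effect of a periodic Parity Regenerate run can be illustrated with a toy model. The Python sketch below is not the array firmware: it assumes a single RAID 5 logical drive with parity kept on the last disk (real RAID 5 rotates parity), represents a latent bad block as an unreadable block (`None`), and uses hypothetical helper names (`build_raid5`, `rebuild`, `scrub`). It shows why a rebuild fails when a surviving disk holds a latent bad block, and why reading every block beforehand exposes that block while the stripe can still be reconstructed. The repair step in `scrub` (rewriting a reconstructed block) is an assumption of the model; this article does not describe how 3.x firmware handles a media error found during parity regeneration. Parity mismatches are only reported, never overwritten, in line with the advice to disable "Overwrite Inconsistent Parity".

```python
from typing import Dict, List, Optional

# A block is raw bytes; None models an unreadable block (a latent media error).
Block = Optional[bytes]
Disk = List[Block]


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Return the byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)


def build_raid5(data_stripes: List[List[bytes]]) -> List[Disk]:
    """Lay out stripes as disks[d][s], adding one XOR parity block per stripe.

    Simplification: parity always sits on the last disk (real RAID 5 rotates it).
    """
    ndisks = len(data_stripes[0]) + 1
    disks: List[Disk] = [[] for _ in range(ndisks)]
    for stripe in data_stripes:
        for d, blk in enumerate(stripe + [xor_blocks(stripe)]):
            disks[d].append(blk)
    return disks


def rebuild(disks: List[Disk], failed: int) -> List[int]:
    """Recreate every block of the failed disk from the surviving disks.

    Returns the stripes that could not be rebuilt because a surviving disk
    also returned an unreadable block (the "double disk error" case).
    """
    lost = []
    for s in range(len(disks[0])):
        survivors = [disks[d][s] for d in range(len(disks)) if d != failed]
        if any(b is None for b in survivors):
            lost.append(s)
        else:
            disks[failed][s] = xor_blocks(survivors)
    return lost


def scrub(disks: List[Disk], repair: bool = True) -> Dict[str, list]:
    """Read every block of every disk and check data against parity.

    Parity mismatches are only reported, never overwritten. ASSUMPTION of
    this model: if exactly one block in a stripe is unreadable, it is rebuilt
    from the other members and rewritten when repair=True.
    """
    report: Dict[str, list] = {"unreadable": [], "mismatch": [], "unrecoverable": []}
    for s in range(len(disks[0])):
        stripe = [disk[s] for disk in disks]
        bad = [d for d, b in enumerate(stripe) if b is None]
        if len(bad) > 1:
            report["unrecoverable"].append(s)
        elif len(bad) == 1:
            report["unreadable"].append((bad[0], s))
            if repair:
                disks[bad[0]][s] = xor_blocks([b for b in stripe if b is not None])
        elif any(xor_blocks(stripe)):
            # In a consistent stripe, data XOR parity is all zero bytes.
            report["mismatch"].append(s)
    return report


if __name__ == "__main__":
    # 4 stripes of 3 data blocks (4 disks in total), 4 bytes per block.
    data = [[bytes([s, d, s ^ d, 7]) for d in range(3)] for s in range(4)]

    # No scrub: disk 1 has a latent bad block in stripe 2, then disk 0 fails.
    a = build_raid5(data)
    a[1][2] = None
    a[0] = [None] * 4
    print("no scrub, stripes lost in rebuild:", rebuild(a, failed=0))

    # Scrub first: the latent block is found while the stripe is still
    # redundant, so the later rebuild succeeds.
    b = build_raid5(data)
    b[1][2] = None
    print("scrub report:", scrub(b))
    b[0] = [None] * 4
    print("after scrub, stripes lost in rebuild:", rebuild(b, failed=0))
```

In this model, the rebuild without a prior scrub reports stripe 2 as lost, while the rebuild after a scrub reports no lost stripes. The sketch does not model the 4.x media scan or SMART behaviour.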
Telnet/Serial interface

From the telnet/serial interface to the array, select the RAID 5 logical drive, then select the second-to-last menu option, "reGenerate parity". This gives two options:

    Execute Regenerate Logical Drive Parity
    Overwrite Inconsistent Parity - Enabled

By default, "Overwrite Inconsistent Parity" is enabled. Disable it, because when it is enabled it overwrites the parity whenever there is a mismatch between the data and the parity.

After disabling "Overwrite Inconsistent Parity", select "Execute Regenerate Logical Drive Parity" to start the parity regeneration. You can also track its progress.

Note: Run only one parity regeneration at a time.

SSCS GUI interface

Launch the SSCS GUI with /usr/sbin/ssconsole. Refer to the Sun StorEdge[TM] 3000 Family Configuration Service User's Guide for more details.

Product

    Sun StorageTek 3511 SATA Array
    Sun StorageTek 3510 FC Array
    Sun StorageTek 3320 SCSI Array
    Sun StorageTek 3310 SCSI Array

Keywords: parity, scrubbing, latent, disk, 3310, 3510, 3511, 3.x, 4.x, 3.27, 3.25, 4.11, 4.13, regen, rebuild, drive-failure

Previously Published As: 84456

Change History

    Date: 2010-01-11
    User Name: [email protected]
    Action: Currency & Update

    Date: 2006-03-15
    User Name: 7058
    Action: Approved
    Comment: Trademarked where appropriate. Reworded sentences throughout document for reader clarity. Made grammar and punctuation fixes as needed. Enabled STM to bold section headers and offset preformatted text for clarity.

Attachments

This solution has no attachment.