Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Sun Alert Sure Solution

1300555.1 : Replacement of Drives with Mechanical Positioning Errors May Cause RAID Controllers Reset or Lockdown Unexpectedly
Applies to:
Sun Storage 6180 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 6540 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2510 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage Flexline 380 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 6140 Array - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.
This issue also applies to Sun Storage 2510, 2530, 2540, 6140, 6180, 6540, 6580, and 6780 Arrays.

Date of Resolved Release: 02-Mar-2011

Description
Drives that have mechanical positioning errors may cause a RAID controller in the identified products to reboot when the controllers attempt to fail that drive. The drive will be marked as failed when the controller completes SOD.

Note: Attempting to manually fail an affected drive may cause a lockdown. The rebooting controller is unable to verify the DACstore on the drive, which is still reporting optimal to the surviving controller, and an outage results.
Occurrence
This issue can occur on the following system:
This issue only occurs on arrays with one of the following drive models:
Note: Arrays with 6.xx firmware are not affected by this issue. With that firmware, only a single controller issues the command to fail the drive, so no race condition exists.
Symptoms
The aforementioned disk drives fail due to a mechanical positioning error, as seen in array event log entries similar to the following:

B:10/27/10 7:11:30 AM : 4050 : 0/0/0 : 6008 : Internal : Drive : Tray 33, Slot 16 : Stable storage drive unusable
A:10/27/10 7:10:47 AM : 4051 : 4/15/1 : 100A : Error : Drive : Tray 33, Slot 16 : Drive returned CHECK CONDITION : Mechanical Positioning Error
A:10/27/10 7:10:49 AM : 4053 : 0/0/0 : 6008 : Internal : Drive : Tray 33, Slot 16 : Stable storage drive unusable

Sense: 04 HARDWARE ERROR
ASC/ASCQ: 15/01 MECHANICAL POSITIONING ERROR

This sense data indicates a drive mechanical positioning error. These drives, in particular, have an existing issue with the heads staying in one position after a drive error log update. This reduces the lubrication in the drive heads, leading to a head crash indicated by the codes mentioned above.

Manual failure of drives in an Optimal state, which results in one or more volumes in the array failing, typically leads to one or both controllers resetting, and possibly being held in a Lockdown state, because of problems accessing and updating the metadata on the disk drive that is reporting the error.

The lockdown state may show as LU, 88, or SD on one controller of a 6140, 6540, or Flexline 380. On a 6180, 6580, or 6780, the lockdown state may show as a flashing display of OE+ LU+ blank-

Note: Controllers in a lockdown or offline state should be serviced immediately by Oracle support for correction.
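When reviewing a large event log, it can help to scan for this signature automatically. The following is a minimal sketch, not an Oracle-supplied tool. It assumes the colon-delimited field order seen in the excerpt above (controller, timestamp, sequence number, a sense key/ASC/ASCQ triple, event code, priority, component, location, description) and assumes the triple is written in hexadecimal. The names `LINE_RE` and `mechanical_positioning_drive` are hypothetical.

```python
import re

# Field layout is an assumption inferred from the excerpt in this document:
# ctrl:timestamp : seq : SK/ASC/ASCQ : event code : priority : component : Tray N, Slot M : text
LINE_RE = re.compile(
    r"^(?P<ctrl>[AB]):(?P<ts>.+?) : (?P<seq>\d+) : "
    r"(?P<sk>[0-9A-Fa-f]+)/(?P<asc>[0-9A-Fa-f]+)/(?P<ascq>[0-9A-Fa-f]+) : "
    r"(?P<code>[0-9A-Fa-f]{4}) : (?P<prio>\w+) : (?P<comp>\w+) : "
    r"Tray (?P<tray>\d+), Slot (?P<slot>\d+) : (?P<desc>.+)$"
)

# SCSI sense key 04h (HARDWARE ERROR) with ASC/ASCQ 15h/01h
# (MECHANICAL POSITIONING ERROR), as quoted in the Symptoms text.
MECH_POSITIONING = (0x04, 0x15, 0x01)


def mechanical_positioning_drive(line):
    """Return (tray, slot) if the line reports sense 04 / 15 / 01, else None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # line does not follow the assumed layout
    triple = tuple(int(m.group(k), 16) for k in ("sk", "asc", "ascq"))
    if triple == MECH_POSITIONING:
        return int(m.group("tray")), int(m.group("slot"))
    return None


if __name__ == "__main__":
    sample = ("A:10/27/10 7:10:47 AM : 4051 : 4/15/1 : 100A : Error : Drive : "
              "Tray 33, Slot 16 : Drive returned CHECK CONDITION : "
              "Mechanical Positioning Error")
    print(mechanical_positioning_drive(sample))
```

Matching on the numeric triple rather than the description text is a design choice for this sketch. It avoids depending on the exact wording of the message, at the cost of depending on the assumed field position.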
B Sat Jan 01 16:30:54 PST 2011 54527 4/15/1 100A Error Drive Tray.01.Drive.03 Drive returned CHECK CONDITION - Mechanical Positioning Error
B Sat Jan 01 16:30:54 PST 2011 54528 204/15/1 1012 Error Drive Tray.01.Drive.03 Destination driver event - Mechanical Positioning Error
B Sat Jan 01 16:30:54 PST 2011 54529 0/0/0 6008 Notification Drive Tray.01.Drive.03 Stable storage drive unusable due to I/O errors
B Sat Jan 01 16:49:29 PST 2011 54530 0/0/0 100D Error Drive Tray.01.Drive.03 Timeout on drive side of controller
B Sat Jan 01 16:49:40 PST 2011 54531 0/0/0 100D Error Drive Tray.01.Drive.03 Timeout on drive side of controller
B Sat Jan 01 16:49:51 PST 2011 54532 0/0/0 100D Error Drive Tray.01.Drive.03 Timeout on drive side of controller
B Sat Jan 01 16:50:00 PST 2011 54533 201020b/0/0 1012 Error Drive Tray.01.Drive.03 Destination driver event - IO timeout
B Sat Jan 01 16:50:00 PST 2011 54534 0/0/0 201E Notification Controller Tray.85.Controller.B VDD repair started
B Sat Jan 01 16:50:00 PST 2011 54535 0/0/0 201E Notification Controller Tray.85.Controller.B VDD repair started
B Sat Jan 01 16:50:00 PST 2011 54536 0/0/0 201E Notification Controller Tray.85.Controller.B VDD repair started
B Sat Jan 01 16:50:00 PST 2011 54537 0/0/0 2014 Notification Controller Tray.85.Controller.B VDD logged an error
B Sat Jan 01 16:50:00 PST 2011 54538 0/0/0 201F Notification Controller Tray.85.Controller.B VDD repair completed
B Sat Jan 01 16:50:00 PST 2011 54539 0/0/0 201F Notification Controller Tray.85.Controller.B VDD repair completed
B Sat Jan 01 16:50:00 PST 2011 54540 0/0/0 201F Notification Controller Tray.85.Controller.B VDD repair completed
B Sat Jan 01 16:50:01 PST 2011 54541 0/0/0 2226 Notification Drive Tray.01.Drive.03 Drive spun down
B Sat Jan 01 16:50:01 PST 2011 54542 0/0/0 226C Failure Drive Tray.01.Drive.03 Drive failure detected
B Sat Jan 01 16:50:01 PST 2011 54543 0/0/0 2215 Notification Drive Tray.01.Drive.03 Drive marked failed
B Sat Jan 01 16:50:01 PST 2011 54544 0/0/0 2217 Notification Drive Tray.01.Drive.03 Piece failed
B Sat Jan 01 16:50:01 PST 2011 54545 0/0/0 2216 Notification Drive Tray.01.Drive.03 Piece taken out of service
B Sat Jan 01 16:50:01 PST 2011 54546 0/0/0 2217 Notification Drive Tray.01.Drive.03 Piece failed
B Sat Jan 01 16:50:02 PST 2011 54547 0/0/0 100D Error Drive Tray.01.Drive.03 Timeout on drive side of controller
B Sat Jan 01 16:51:02 PST 2011 54548 0/0/0 400F Notification Controller Tray.85.Controller.A Controller reset by its alternate

Reboot Reason: REBOOTALT_DBM_HEALTH_CHECK_EVENT

Note: A drive being failed by the system does not usually result in a lockdown or offline controller state. After a power cycle or controller reset, the drives often transition to a state of INCOMPATIBLE.
Note: The possible symptoms that can occur when this issue is encountered will vary depending on the hardware configuration as well as the logical layout of the vdisks and volumes.
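The sequence in the space-delimited excerpt above (a Mechanical Positioning Error on a drive, later followed by a controller being reset by its alternate) can be picked out of a longer log with a short script. This is a rough sketch under stated assumptions: the field order is inferred only from the excerpt, and the names `EVENT_RE` and `summarize_events` are hypothetical. It treats event code 400F as the controller reset event only because that is how it appears in the excerpt.

```python
import re

# Assumed layout, inferred from the excerpt in this document:
# ctrl  weekday month day time tz year  seq  sense  code  priority  component  location  text
EVENT_RE = re.compile(
    r"^(?P<ctrl>[AB]) "
    r"(?P<ts>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \w+ \d{4}) "
    r"(?P<seq>\d+) (?P<sense>\S+) (?P<code>[0-9A-F]{4}) "
    r"(?P<prio>\w+) (?P<comp>\w+) (?P<loc>\S+) (?P<desc>.+)$"
)


def summarize_events(lines):
    """Return (drives reporting the error, controllers reset afterwards)."""
    suspect_drives = []
    reset_controllers = []
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not follow the assumed layout
        if "Mechanical Positioning Error" in m.group("desc"):
            if m.group("loc") not in suspect_drives:
                suspect_drives.append(m.group("loc"))
        elif m.group("code") == "400F" and suspect_drives:
            # Only count resets logged after a positioning error was seen.
            reset_controllers.append(m.group("loc"))
    return suspect_drives, reset_controllers


if __name__ == "__main__":
    sample = [
        "B Sat Jan 01 16:30:54 PST 2011 54527 4/15/1 100A Error Drive "
        "Tray.01.Drive.03 Drive returned CHECK CONDITION - Mechanical Positioning Error",
        "B Sat Jan 01 16:50:01 PST 2011 54543 0/0/0 2215 Notification Drive "
        "Tray.01.Drive.03 Drive marked failed",
        "B Sat Jan 01 16:51:02 PST 2011 54548 0/0/0 400F Notification Controller "
        "Tray.85.Controller.A Controller reset by its alternate",
    ]
    print(summarize_events(sample))
```

A reset appearing after a positioning error does not prove the two are related. The output is only a pointer to the log region worth reading by hand.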
Workaround
To work around the described issue, avoid manually failing drives. This will prevent the lockdown conditions requiring service intervention. In order to service drive replacement under these conditions, use the steps below to avoid the accessibility and availability issues referenced in the Symptoms section.

In general,

Note: Please refer to Doc ID 1296274.1 for information on how to download Common Array Manager (CAM) software and patches.

History
Document created March 2
The firmware versions that were originally listed in the Sun Alert at the time of release were incorrect. The CR was updated with the correct firmware versions after the Sun Alert had been released. The Sun Alert was updated because of a rework of the original bug to remove side effects. Had the side effect been known at the time, it would have been fixed in the same original CR. This is not a new bug but rather a completion of the original.

If a lockdown condition does occur, evaluate and respond as follows:
References
<BUG:6978258> - ST2:WORKFLOW AVAILABILITY FOR MZ3ST210 ENVIRONMENT
<BUG:7012554> - EM R2 : SDC78041SVQE: HOST UNREACHABLE
<NOTE:1296274.1> - How to Download Common Array Manager (CAM) Software and Patches

Attachments
This solution has no attachment