Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Troubleshooting Sure Solution

1452325.1 : Determining when Disks should be replaced on Oracle Exadata Database Machine
This document explains which I/O errors require disk replacement, which do not, and which should be investigated further. I/O errors can be reported in different places for different reasons, and not every I/O error is due to a physical hard disk problem that requires replacement.

In this Document:
  Purpose
  Troubleshooting Steps
  References
Applies to:

Enterprise Manager for Exadata
Exadata Database Machine X2-2 Hardware
Exadata Database Machine V2
Oracle Exadata Hardware - Version 11.2.0.1 to 11.2.0.3 [Release 11.2]
Exadata Database Machine X2-8
Information in this document applies to any platform.

Purpose

This document explains which I/O errors require disk replacement, which do not, and which should be investigated further. I/O errors can be reported in different places for different reasons, and not every I/O error is due to a physical hard disk problem that requires replacement.

Troubleshooting Steps

About Disk Error Handling:

The inability to read some sectors is not always an indication that a drive is about to fail. Even if the physical disk is damaged at one location, such that a certain sector is unreadable, the disk may be able to use spare space to replace the bad area, so that the sector can be overwritten. Physical hard disks are complex mechanical devices with spinning media, so media errors and other mechanical problems are a fact of life; this is why redundancy was designed into Exadata, in order to protect data against such errors.

It is important to stay up to date on disk vendors' firmware, which resolves known issues with internal drive mechanical control, media usage and re-allocation algorithms; these are problems that can lead to premature failure if not attended to in a timely manner. The most recent Exadata patch image releases contain the latest disk firmware for each supported drive, as well as continuous improvements in ASM and cellsrv management and handling of disk-related I/O errors and failures. Refer to Note 888828.1 for the latest patch releases.

Physical hard disks used on Exadata support SMART (Self-Monitoring, Analysis, and Reporting Technology) and will report their own SMART status to the RAID HBA in the event of a problem. SMART events are based on vendor-defined thresholds for monitoring various internal disk mechanisms and will differ for different types of errors. By definition SMART has only 2 external states: predicted failure, or not failed (OK). The SMART status does not necessarily indicate the drive's past or present reliability.

Exadata Storage Cells report disks in 2 places: on the cluster by ASM, and on the individual cell server by cellsrv, which accesses disks through LSI's RAID HBA MegaCli utility. When a disk has failed, ASM will forcibly rebalance the data off the failed disk to restore redundancy if the disk is not replaced within the timeout period. Performance may be reduced during a rebalance operation. The rebalance operation status should be checked to ensure that the rebalance has completed successfully (see the sketch below), because if the rebalance fails, redundancy will remain reduced until this is rectified or the disk is replaced. On-site spare disks are provided so that failed disks can be replaced rapidly by the customer, if they choose, in order to ensure maximum performance and full redundancy are maintained before the timeout expires and forces a rebalance.

In a normal redundancy configuration for the disk groups, 1 disk failure can be survived before the ASM rebalance re-establishes data redundancy for the whole cluster; if a 2nd failure occurs before the ASM rebalance has completed, then the DB may lose data and crash. In a high redundancy configuration for the disk groups, 2 disk failures can be survived before the ASM rebalance re-establishes redundancy for the whole cluster; if a 3rd failure occurs before then, the DB may lose data and crash.
While the statistical chance of a 2nd disk failure is very low, the consequences are severe in normal redundancy mode. The redundancy configuration is a trade-off between higher availability for mission-critical and business-critical systems and higher-capacity disk groups available for data storage, and should be chosen according to individual customer need.

/opt/oracle.SupportTools/sundiag.sh is a utility used to collect data for Exadata service requests, and in particular contains data specific to diagnosing disk failures. The version in the Exadata software image may not be the latest; for more details and the latest version, refer to Note 761868.1. Each of the examples below is taken from outputs collected by sundiag.
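Unlike the sundiag excerpts in the cases below, the rebalance check mentioned above can also be done live. This is a minimal sketch, assuming a standard Grid Infrastructure environment; run the query on a database node as the Grid Infrastructure owner:

$ sqlplus / as sysasm
SQL> select group_number, operation, state, power, est_minutes from v$asm_operation;

No rows returned means no rebalance is currently running; a row with STATE of RUN indicates a rebalance still in progress, with an estimate of the minutes remaining. The grid disks' status as seen by ASM can also be confirmed on each storage cell with:

CellCLI> list griddisk attributes name, status, asmmodestatus, asmdeactivationoutcome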
If there is ever a situation where 2 or more disks report critical failure within seconds of each other, in particular from more than 1 server at the same time, then a sundiag output should be collected from each server and a SR opened for further analysis.
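As a first pass in such a multi-disk or multi-server situation, the alert histories of all storage cells can be compared from a database node with dcli before the sundiags are gathered. This is a sketch only; the cell_group file (a plain list of the storage cell hostnames) is a common but site-specific convention:

# dcli -g cell_group -l root "cellcli -e list alerthistory"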
Errors for which Disk Replacement is Recommended:

Case 1. Cell's alerthistory reports the drive has changed its SMART status to "Predictive Failure":

20_1 2012-03-18T02:22:43+00:00 critical "Hard disk status changed to predictive failure. Status : PREDICTIVE FAILURE Manufacturer : SEAGATE Model Number : ST32000SSSUN2.0T Size : 2.0TB Serial Number : L1A2B3 Firmware : 0514 Slot Number : 11 Cell Disk : CD_11_exd1cel01 Grid Disk : DATA_EXD1_CD_11_exd1cel01, RECO_EXD1_CD_11_exd1cel01, DBFS_DG_CD_11_exd1cel01"

This indicates the drive has determined via SMART that it is predicting its own failure. A SR should be opened for a replacement as soon as is convenient, and a sundiag attached for data analysis purposes. If the system is connected to Oracle via Automatic Service Request, then a SR will automatically be opened for this event.
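Outside of a sundiag, the cell's current view of the drive can be confirmed directly on the affected storage cell; a brief sketch:

CellCLI> list alerthistory
CellCLI> list physicaldisk attributes name, status, slotNumber

A drive matching this case reports a status of "predictive failure" rather than "normal".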
Case 2. Cell's alerthistory reports the drive's lun has experienced a critical error from which it cannot recover:
21 2012-03-24T10:45:41+08:00 warning "Logical drive status changed. Lun : 0_11 Status : critical Physical Hard disk : 20:11 Slot Number : 11 Serial Number : L1C4D5 Cell Disk : CD_11_edx1cel01 Grid Disks : RECO_EDX1_CD_11_edx1cel01, DBFS_DG_CD_11_edx1cel01, DATA_EDX1_CD_11_edx1cel01."

This indicates the drive has experienced a critical error during a transaction, causing the RAID HBA to mark the volume as critical. This often occurs concurrently with Predictive Failure in Storage Cells, where each volume is a single-disk RAID0, but may occur by itself in the event of a problem writing. A SR should be opened for a replacement as soon as is convenient, and a sundiag attached for data analysis purposes. If the system is connected to Oracle via Automatic Service Request, then a SR will automatically be opened for this event.

Internal Note: Review sundiag outputs for all disk failures, and look for symptoms of Note 1360343.1 Issues 50 and 65, for which disks are requested to be CPAS'd.

Issue 50 describes a scenario where the disk drive appears to be hot-removed and hot-inserted by itself on the RAID0 volumes on the storage cell, with the two events within a few seconds of each other, where it would not be physically possible to remove and replace a drive that quickly. It may also occur at a time when the customer was not at the data center. This will be evident in the cell alerts and LSI HBA firmware logs, which show a disk remove event followed by a disk insert event. This may offline the volume, in addition to causing SCSI I/O error messages from the kernel, and ASM write errors may be evident.

Issue 65 describes a scenario where the disk records a critical failure and, when the MegaCli event logs are parsed, there are many command timeouts ending in "Error 02" at the point the disk is marked failed.
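The internal note above depends on the RAID HBA event log. Sundiag gathers this automatically; as a sketch, it can also be extracted manually on the affected server (the output file name here is only illustrative):

# /opt/MegaRAID/MegaCli/MegaCli64 -AdpEventLog -GetEvents -f /tmp/adpeventlog.txt -aALL
# grep -iE "removed|inserted|Error 02" /tmp/adpeventlog.txt

The log can then be reviewed for the remove/insert sequence described for Issue 50, or the command timeouts ending in "Error 02" described for Issue 65.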
Case 3. DB nodes where the MegaCli status is shown as "Firmware state: (Unconfigured Bad)", preceded by logged errors indicating the drive was Failed or Predictive Failed:
=> cat exa1db01_megacli64-PdList_short_2012_03_30_01_23.out

The above command output files are gathered by a sundiag.
Case 4. DB nodes where the "Predictive Failure Count" is >0, even if the drive status shows as "Online".

# cat exa1db01_megacli64-PdList_long_2012_03_30_01_23.out

The above command output files are gathered by a sundiag. The following command enables copyback to the hotspare on SMART (predictive failure) events:

# /opt/MegaRAID/MegaCli/MegaCli64 -AdpSetProp -SMARTCpyBkEnbl -1 -a0

If replacing the drive before the next failure, hot-plug remove it and wait for the controller to start copyback to the hotspare due to the missing disk.
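For Cases 3 and 4, the same fields can also be checked live on the DB node rather than from a sundiag; a brief sketch (the grep simply narrows the standard PdList output):

# /opt/MegaRAID/MegaCli/MegaCli64 -PdList -aALL | grep -E "Slot Number|Firmware state|Predictive Failure Count|Media Error Count|Other Error Count"

A drive reporting a Firmware state of Unconfigured(bad) matches Case 3; a drive that is still "Online" but shows a "Predictive Failure Count" greater than 0 matches Case 4.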
Case 5. Storage Cells where the drive cell status is "Warning" and the MegaCli status is "Firmware State: (Unconfigured Bad)". The Cell's alerthistory may report the drive with a "not present" alert.

=> cat exa1cel01_alerthistory_2012_08_23_18_24.out

This case may occur when the drive fails during a boot cycle, before the Cell's management services are running, so the Cell does not see it go offline, only that it is no longer present in the OS configuration. This will be evident in the MegaCli logs, although it may not be obvious to an operator without analysis. A failed disk can be verified by collecting a sundiag output, and a SR should be opened for analysis.

Errors for which Disk Replacement is NOT Recommended:

Case 1. The Media Error counters reported by MegaCli in PdList or LdPdInfo outputs in a sundiag. On Storage Servers, these are also reported by Cellsrv in the physical disk view (see the sketch after Case 2 below):
# cat exa1db01_megacli64-PdList_long_2012_03_30_01_23.out
The above command output files are gathered by a sundiag.

Case 2. The Other Error counters reported by MegaCli in PdList or LdPdInfo outputs in a sundiag. On Storage Servers, these are also reported by Cellsrv in the physical disk view:
# cat exa1cel01_megacli64-PdList_long_2012_03_30_01_23.out

The above command output files are gathered by a sundiag. In the example shown, all the disks have had data path errors. The Slot 2 disk had some corrected read errors as a side-effect of the data path errors; these are not critical, hence its status is normal, and therefore this disk does not match the criteria outlined above that require replacement. One of those data path errors triggered Slot 8 to change to critical, although it has not shown any media errors. Replacing the Slot 8 disk did not resolve this problem; data analysis of the full history of the errors and their types identified the problem component to be the SAS Expander.
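The Media Error and Other Error counters from Cases 1 and 2 can be read on a storage cell either from MegaCli or from cellsrv's physical disk view; a brief sketch:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdList -aALL | grep -E "Slot Number|Media Error Count|Other Error Count|Firmware state"
CellCLI> list physicaldisk attributes name, status, errMediaCount, errOtherCount

Non-zero counters by themselves do not call for replacement; replacement is indicated only when the disk also matches one of the "Recommended" cases above.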
Case 3. ASM logs on the DB node show I/O error messages in *.trc files similar to:
ORA-27603: Cell storage I/O error, I/O failed on disk o/192.168.10.09/DATA_CD_01_exa1cel01 at offset 212417384 for data length 1048576

That may also be accompanied by ASM recovery messages such as this:

WARNING: failed to read mirror side 1 of virtual extent 1251 logical extent 0 of file 73 in group [1.1721532102] from disk DATA_EXA1_CD_01_EXA1CEL01 allocation unit 52 reason error; if possible, will try another mirror side

This is a single I/O error on a read, which ASM has recovered and corrected. There are other similar ASM messages for different types of read errors.

IO Error on dev=/dev/sdb cdisk=CD_01_exa1cel01 [op=RD offset=132200642 (in sectors) sz=1048576 bytes] (errno: Input/output error [5])

This may also generate a Storage Server entry in /var/log/messages such as this:
Mar 30 15:37:08 td01cel06 kernel: sd 0:2:1:0: SCSI error: return code = 0x00070002

This will probably match an entry in the RAID HBA logs gathered by sundiag.
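To confirm whether these recoverable read errors are present before collecting a sundiag, the messages can be searched for directly. A sketch, assuming the usual Grid Infrastructure diagnostic destination (the trace path shown is illustrative and may differ per system):

# grep -l "ORA-27603" /u01/app/grid/diag/asm/+asm/+ASM1/trace/*.trc
# grep "SCSI error" /var/log/messages

The first command, run on the DB node, lists the ASM trace files containing cell I/O errors; the second, run on the storage cell, shows the matching kernel-level SCSI errors.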
Case 4. Oracle Enterprise Manager users of the Exadata plug-ins may see alerts marked "Critical" for all I/O errors, for example in alert e-mails with a header such as:

From: EnterpriseManager Exadata-OracleSupport @ oracle.com>

Since read errors are correctable and not truly critical, this may be a false report. A sundiag output should be collected and a SR opened for further analysis to determine whether the fault is critical and requires replacement.
Case 5. A disk with Firmware status "Unconfigured(good)".

Conclusion:

Any other disk or I/O errors for which a disk may be suspect, such as device not present, device missing, or timeouts, should have a sundiag output collected and a SR opened for further analysis to determine if the fault is critical. For any doubts or concerns about the disk or I/O errors listed above, a sundiag output should be collected and a SR opened for further analysis to determine whether action is necessary.
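Since nearly every case above ends with collecting a sundiag and opening a SR, a minimal sketch of gathering one (using the install path given earlier; the script reports where it writes its archive):

# /opt/oracle.SupportTools/sundiag.sh

Attach the reported archive to the SR; refer to Note 761868.1 for the latest version of the script.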
References

NOTE:1281395.1 - Steps to manually create cell/grid disks on Exadata V2 if auto-create fails during disk replacement
NOTE:761868.1 - Oracle Exadata Diagnostic Information required for Disk Failures
NOTE:1360343.1 - INTERNAL Exadata Database Machine Hardware Current Product Issues
NOTE:888828.1 - Database Machine and Exadata Storage Server 11g Release 2 (11.2) Supported Versions
NOTE:1312266.1 - Exadata: After disk replacement, celldisk and Griddisk is not created automatically
NOTE:1386147.1 - How to Replace a Hard Drive in an Exadata Storage Server (Hard Failure)
NOTE:1390836.1 - How to Replace a Hard Drive in an Exadata Storage Server (Predictive Failure)
NOTE:1479736.1 - How to replace an Exadata Compute (Database) node hard disk drive (Predictive or Hard Failure)

Attachments

This solution has no attachment