Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution
1005520.1: How to verify I/O errors for failing device on Sun SPARC Systems [Video]
Previously Published As: 207650
Applies to:
Sun Fire V490 Server
Sun Fire T1000 Server
Sun Fire V880 Server
Sun Fire V890 Server
Sun Fire T2000 Server
All Platforms

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community, Oracle Entrylevel Servers.

Goal

Description

This document helps the user identify a failing disk device based on errors reported in the 'format' output, the 'iostat' output, and /var/adm/messages. This information does not apply to systems in which the disks are configured in a hardware RAID volume, because 'format' does not show disks that are part of a RAID volume.
Sunsolve users must download the attachment to view the video.

Solution

Steps to Follow

Confirming disk failure for failing drives

Most of the I/O errors for failing drives on Sun Fire[TM] servers are caused by the disk itself and not by the disk backplane or cables. Several checks help confirm a disk failure from I/O errors.

First, check whether 'format' reports a device problem. A typical example is 'format' showing 'drive type unknown' for a specific drive.

Server platforms such as the 280R, V480/V490, and V880/V890 use FC-AL disk drives. Each FC-AL disk has a World Wide Name (WWN), which affects how the device appears in Solaris[TM] (in the format output):

AVAILABLE DISK SELECTIONS:

After analyzing the format output, it is strongly recommended to also examine /var/adm/messages for matching disk drive errors:

Dec 22 12:34:39 wspaba01 scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c6371bc0,0 (ssd0):

Errors like these generally indicate that the listed drive needs to be replaced. To confirm the failing drive, map the WWN w21000011c6371bc0,0 in the above messages to the drive 'c1t2d0' shown in the output of the format command (in this case they match).

Here is another example of format errors:

AVAILABLE DISK SELECTIONS:

The following errors are in /var/adm/messages:

Nov 20 12:28:51 sg5000-maildb-0 scsi: WARNING: /pci@1f,700000/scsi@2/sd@1,0 (sd2):

In the above example, the device path from messages matches the disk c1t1d0 reported in the format output.

When troubleshooting I/O errors for failing devices, also carefully examine the output of 'iostat -E' (or 'iostat -En') for any error events that affect the disk drives.
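As a rough illustration of this check, the Python sketch below scans text captured from 'iostat -En' for the error counters and sorts each device into one of three buckets. The function names (parse_iostat_en, triage), the bucket labels, and the SAMPLE text are hypothetical. The sample values are made up for illustration and are not taken from a real system, and the Vendor/Product line of each record is omitted. The "monitor" rule follows the guidance in this document that equal "Hard Errors" and "No Device" counts point to resets or power-on events; requiring every other counter to be zero is an added assumption.

```python
import re

# Counter names as printed by Solaris `iostat -En`
# (they appear on the 1st, 4th and 5th lines of each device record).
COUNTERS = (
    "Soft Errors", "Hard Errors", "Transport Errors",
    "Media Error", "Device Not Ready", "No Device", "Recoverable",
    "Illegal Request", "Predictive Failure Analysis",
)
COUNTER_RE = re.compile(r"(%s):\s*(\d+)" % "|".join(map(re.escape, COUNTERS)))
# A device record starts with the device name followed by "Soft Errors:".
HEADER_RE = re.compile(r"^(\S+)\s+Soft Errors:")


def parse_iostat_en(text):
    """Return {device: {counter: value}} parsed from `iostat -En` text."""
    devices = {}
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = devices.setdefault(header.group(1), {})
        if current is not None:
            for name, value in COUNTER_RE.findall(line):
                current[name] = int(value)
    return devices


def triage(counters):
    """Classify one device's counters (hypothetical labels)."""
    nonzero = {name for name, value in counters.items() if value}
    if not nonzero:
        return "clean"
    hard = counters.get("Hard Errors", 0)
    # Assumption: equal Hard Errors / No Device counts with nothing else
    # non-zero are treated as reset or power-on events, not a failing disk.
    if nonzero == {"Hard Errors", "No Device"} and hard == counters["No Device"]:
        return "monitor"
    return "investigate"


# Illustrative input only: values are invented, Vendor/Product lines omitted.
SAMPLE = """\
c1t1d0           Soft Errors: 0 Hard Errors: 2 Transport Errors: 0
Size: 73.40GB <73400057856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 2 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c1t2d0           Soft Errors: 0 Hard Errors: 51 Transport Errors: 14
Size: 73.40GB <73400057856 bytes>
Media Error: 37 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
"""

for device, counters in parse_iostat_en(SAMPLE).items():
    print(device, triage(counters))
```

A result of "investigate" is only a prompt to compare the device against the 'format' output and /var/adm/messages as described in this document; it is not a replacement decision on its own.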
Look for non-zero counts (usually in the 1st, 4th, and 5th lines):

# iostat -En

If more than one disk has non-zero counts (as in the above example), this could be a problem on one disk and a side effect of that problem on the other. In this case, the error counts on the failing drive c1t2d0 are significantly higher than those on the other disk, c1t1d0.

A disk problem reported in the 'format' output (or in messages) typically translates to a high error count in iostat, for example:

2. c1t2d0

# iostat -E
..........

However, a non-zero count in the 'iostat -E' output does not always mean an error event on a device. Some specific conditions of the target device can cause non-zero values in the 'iostat' output. The following is an example of such a condition, where the device is working normally:

# iostat -E

In this case, the "Hard Errors" and "No Device" counts are the same. This implies that the device has gone through resets or power-on events. The device does not need immediate replacement. It is recommended to monitor the values over a period of time and to investigate if other related errors appear. Refer to Document 1017741.1, "Solaris Operating System: High Hard Error value in iostat -E output", for more details.

NOTE: A helpful utility, "diskinfo.sparc", is part of Sun Explorer. It always gives up-to-date disk model and serial number information, even after a disk hot swap. For example:

# /opt/SUNWexplo/bin/diskinfo.sparc

Internal Comments

This document contains normalized content and is managed by the Domain Lead(s) of the respective domains. To notify content owners of a knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this document via the "Document Feedback" alias(es) listed below:
[email protected]

Note: Some of the error examples in this document list Vendor, Sense Key, ASC and ASCQ information.
These values will vary with the type of drive error and are explained further in Doc ID 1005787.1, "Kernel tips: understanding SCSI and its errors".

normalized, I/O errors, failed drive, format, iostat, Problem Solved = Disk Error Verification

Previously Published As 91406

Change History
Date: 2009-11-18
User name: Dencho Kojucharov
Action: Updated
Comments: Currency check, audited by Dencho Kojucharov, Entry-Level SPARC Content Lead

Attachments
This solution has no attachment
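As a supplement to the WWN matching step under "Confirming disk failure", the Python sketch below pulls the WWN out of an FC-AL device path from /var/adm/messages and looks for the same WWN among the device paths in 'format' output. The function names and the FORMAT_SAMPLE text are hypothetical. The layout of FORMAT_SAMPLE (each disk entry followed by its device path on the next line) is an assumption about how 'format' lists disks, and the c1t1d0 entry uses a made-up WWN; only the c1t2d0 name, the 'drive type unknown' text, and the ssd@w21000011c6371bc0,0 path come from this document.

```python
import re

# Target component of an FC-AL device path, e.g. "ssd@w21000011c6371bc0,0".
WWN_RE = re.compile(r"/ssd@w([0-9a-fA-F]{16}),([0-9a-fA-F]+)")
# Start of a disk entry in `format` output, e.g. "       2. c1t2d0 <...>".
ENTRY_RE = re.compile(r"^\s*\d+\.\s+(c\d+t[0-9A-Fa-f]+d\d+)")


def wwn_from_path(text):
    """Return (wwn, lun) from the first ssd@w... component, or None."""
    match = WWN_RE.search(text)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)


def disk_for_wwn(format_text, wwn):
    """Return the cXtYdZ name whose device path carries this WWN, or None."""
    current = None
    for line in format_text.splitlines():
        entry = ENTRY_RE.match(line)
        if entry:
            current = entry.group(1)
            continue
        found = wwn_from_path(line)
        if current and found and found[0] == wwn.lower():
            return current
    return None


# Message line quoted in this document (its continuation lines are not shown).
MESSAGE = (
    "Dec 22 12:34:39 wspaba01 scsi: [ID 107833 kern.warning] WARNING: "
    "/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c6371bc0,0 (ssd0):"
)

# Hypothetical `format` listing; the first entry's WWN is invented.
FORMAT_SAMPLE = """\
AVAILABLE DISK SELECTIONS:
       1. c1t1d0 <hypothetical healthy disk>
          /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100000000000001,0
       2. c1t2d0 <drive type unknown>
          /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c6371bc0,0
"""

wwn, lun = wwn_from_path(MESSAGE)
print(wwn, "->", disk_for_wwn(FORMAT_SAMPLE, wwn))
```

Paths without a WWN, such as the /pci@1f,700000/scsi@2/sd@1,0 path in the second example, yield None from wwn_from_path and would need to be compared as whole device paths instead.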