Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

1005520.1: How to verify I/O device errors on V210/V240/V215/V245/V440/V445, T1000/T2000, T5120/T5140/T5220/T5240/T5440, V480/V490/V880/V890 servers
Previously Published As: 207650

Applies to:
Sun Fire V210 Server - Version Not Applicable and later
Sun Fire V215 Server - Version Not Applicable and later
Sun Fire V240 Server - Version Not Applicable and later
Sun Fire V245 Server - Version Not Applicable and later
Sun Fire V440 Server - Version Not Applicable and later
All Platforms

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community, Oracle Entrylevel Servers.

Goal

Description

This information does not apply to systems in which the disks are configured in a hardware RAID volume, because 'format' does not show disks that are part of a RAID volume.
SunSolve users must download the attachment to view the video.

Fix

Steps to Follow

Most I/O errors reported for failing drives on Sun Fire[TM] servers are caused by a problem with the disk itself, not by the disk backplane or cables. Several things can be checked to confirm a disk failure from I/O errors.

First, verify whether 'format' reports a problem with the device. A typical example is 'format' showing 'drive type unknown' for a specific drive.

Server platforms such as the 280R, V480/V490, and V880/V890 use FC-AL disk drives. Each FC-AL disk has a World Wide Number (WWN). This affects how the devices appear in Solaris[TM] (in the format output):

    AVAILABLE DISK SELECTIONS:

After analyzing the format output, it is strongly recommended to also examine /var/adm/messages for matching disk drive errors:

    Dec 22 12:34:39 wspaba01 scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c6371bc0,0 (ssd0):

Errors like these generally indicate that the listed drive needs to be replaced. To confirm which drive is failing, map the WWN w21000011c6371bc0,0 in the messages above to the 'c1t2d0' drive shown in the output of the format command (in this case they match).

Here is another example of format errors, this time on server platforms that use SCSI drives (such as the V215/V245, V440/V445, and T1000/T2000):

    AVAILABLE DISK SELECTIONS:

The following errors appear in /var/adm/messages:

    Nov 20 12:28:51 sg5000-maildb-0 scsi: WARNING: /pci@1f,700000/scsi@2/sd@1,0 (sd2):

In this example, the device path in the messages matches the disk c1t1d0 reported in the format output, so that disk needs to be replaced.

When troubleshooting I/O errors on failing devices, also carefully examine the output of the 'iostat -E' (or 'iostat -En') command for any error events that affect the disk drives.
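The manual matching step above can be scripted. You take the device path from a /var/adm/messages WARNING line and find the disk whose physical path in the format output is the same. The path may end in an FC-AL WWN such as ssd@w21000011c6371bc0,0 or in a SCSI target such as sd@1,0.

The following is a minimal POSIX sh sketch, not a Sun/Oracle tool. The function names msg_to_path, path_to_disk and unknown_disks are hypothetical. The sketch assumes two things:

- the WARNING lines look like the examples above;
- the format disk list was saved non-interactively with `format < /dev/null`.

```shell
#!/bin/sh
# Sketch only: match /var/adm/messages warnings to disks listed by format.
# Function names are hypothetical. Assumes the disk list was saved with:
#   format < /dev/null > /tmp/format.out
# On Solaris the old /usr/bin/awk may not accept -v; if so, run with AWK=nawk.

AWK=${AWK:-awk}

# msg_to_path LINE
# Print the physical device path from a messages line of the form
#   "... WARNING: /physical/path (instance):"
msg_to_path() {
    printf '%s\n' "$1" | sed -n 's|.*WARNING: \(/[^ ]*\) (.*|\1|p'
}

# path_to_disk PATH FORMAT_FILE
# Print the cXtYdZ name of the format entry whose physical path equals PATH.
path_to_disk() {
    $AWK -v want="$1" '
        # Entries look like "  2. c1t2d0 <...>"; remember the disk name.
        $1 ~ /^[0-9]+\.$/ { disk = $2; next }
        # The following line holds the physical path of that disk.
        disk != "" && $1 == want { print disk; exit }
    ' "$2"
}

# unknown_disks FORMAT_FILE
# Print the cXtYdZ names that format reports as "drive type unknown".
unknown_disks() {
    $AWK '$1 ~ /^[0-9]+\.$/ && /drive type unknown/ { print $2 }' "$1"
}
```

On a live system, save the disk list with `format < /dev/null > /tmp/format.out` first. Then pass a WARNING line from /var/adm/messages to msg_to_path and give the resulting path to path_to_disk. An empty result means that no format entry has that physical path.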
Look for non-zero counts (usually in the 1st, 4th, and 5th lines of each device entry):

    # iostat -En

If more than one disk has non-zero counts (as in the above example), this could be a problem on one disk and a side effect of that problem on the other. In this case, the error counts on the failing drive c1t2d0 are significantly higher than those on the other disk, c1t1d0.

A disk problem reported in the 'format' output (or in messages) typically translates to a high error count in iostat, for example:

    2. c1t2d0
       /pci@1f,700000/scsi@2/sd@2,0

    # iostat -E
    ..........

However, a non-zero count in 'iostat -E' output does not always mean an error event on a device. Some specific conditions of the target device can cause non-zero values in the 'iostat' output. The following is an example of such a condition, where the device is working normally:

    # iostat -E

In this case, "Hard Errors" and "No Device" have the same value. This implies that the device has gone through resets or power-on events, and it does not need immediate replacement. It is recommended to monitor the value over a period of time and to investigate if other related errors appear. Refer to Document 1017741.1, "Solaris Operating System: High Hard Error value in iostat -E output", for more details.

NOTE: A helpful utility, "diskinfo.sparc", is part of Sun Explorer. It always reports up-to-date disk model and serial number information, even after a disk hot swap. For example:

    # /opt/SUNWexplo/bin/diskinfo.sparc
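The iostat checks described above can be sketched as a small awk filter. The filter does three things:

- finds non-zero counters in the 1st, 4th and 5th lines of each device entry;
- totals them so that disks can be compared;
- flags the case where Hard Errors equals No Device.

This is an illustrative sketch only. The function name iostat_summary and its output columns are hypothetical. The "Hard Errors equals No Device" hint only marks a device for monitoring, as this article recommends; it is not a diagnosis. The parser assumes the usual five-line 'iostat -En' entry per device.

```shell
#!/bin/sh
# Sketch only: summarise the error counters in 'iostat -En' output.
# Function name and output layout are hypothetical. Assumes the usual
# per-device entry: line 1 holds Soft/Hard/Transport Errors, and lines 4-5
# hold Media Error, Device Not Ready, No Device, Recoverable,
# Illegal Request and Predictive Failure Analysis.
# On Solaris the old /usr/bin/awk may lack functions; if so, run with AWK=nawk.

AWK=${AWK:-awk}

# iostat_summary < iostat-En-output
# Prints "device total hard_errors no_device hint" for every device with a
# non-zero counter. hint is "hard_equals_nodev" when Hard Errors equals
# No Device (the reset/power-on pattern), otherwise "-".
iostat_summary() {
    $AWK '
        # Return the number that follows "label: " on the current line, or 0.
        function val(label,   s) {
            if (!match($0, label ": [0-9]+")) return 0
            s = substr($0, RSTART, RLENGTH)
            sub(/.*: /, "", s)
            return s + 0
        }
        # Print the summary line for the device collected so far.
        function flush(   hint) {
            if (dev == "" || total == 0) return
            hint = (hard > 0 && hard == nodev) ? "hard_equals_nodev" : "-"
            print dev, total, hard, nodev, hint
        }
        # The first line of an entry starts a new device.
        /Soft Errors:/ {
            flush()
            dev = $1
            hard = val("Hard Errors")
            nodev = 0
            total = val("Soft Errors") + hard + val("Transport Errors")
            next
        }
        # Remaining lines: add whichever counters appear on them.
        dev != "" {
            nd = val("No Device")
            nodev += nd
            total += val("Media Error") + val("Device Not Ready") + nd
            total += val("Recoverable") + val("Illegal Request")
            total += val("Predictive Failure Analysis")
        }
        END { flush() }
    '
}
```

On a live system this might be run as `iostat -En | iostat_summary | sort -k2,2nr`. This lists the disk with the highest total first, which is where a failing drive such as c1t2d0 in this article's example would be expected to appear.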
References
<NOTE:1017741.1> - Solaris Operating System: High Hard Error value in iostat -E output
<NOTE:778.1> - Multimedia Content Reference

Attachments
This solution has no attachment