Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Troubleshooting Sure

Solution 1366035.1: Sun Storage 7000 Unified Storage System: Troubleshooting Disk Drive Failures
Applies to:
Sun Storage 7310 Unified Storage System - Version Not Applicable and later
Sun Storage 7110 Unified Storage System - Version Not Applicable and later
Sun Storage 7410 Unified Storage System - Version Not Applicable and later
Sun ZFS Storage 7320 - Version Not Applicable and later
Sun ZFS Storage 7420 - Version Not Applicable and later
Information in this document applies to any platform.

Purpose

The purpose of this document is to troubleshoot disk drive failures on a Sun Storage 7000 ZFS Appliance.

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - 7000 Series ZFS Appliances.
Troubleshooting Steps

1. Verify the Problems list from the Appliance

To aid serviceability, the appliance detects persistent hardware failures (faults) and software failures (defects, often included under faults) and reports them as active problems on this screen. If the phone home service is enabled, active problems are automatically reported to Oracle, where a support case may be opened depending on the service contract and the nature of the fault.

From support bundle
The list below gives the descriptive text and the error code found on the system, as they may appear in the customer report. Compare the list shown by the appliance against it:
2. Are the problems all ZFS-8000-GH?

If so, go to Step 3.

3. Run a scrub to attempt to correct the errors.

The ZFS subsystem can generate erroneous checksum errors on the system as part of a disk replacement or normal day-to-day action. This does not require a disk replacement unless the
Contact Oracle if there are unrecoverable errors generated during this process.

4. Verify Part is in the Correct Enclosure

AK-8000-F0 indicates that the disk is not compatible with the enclosure, e.g. a SAS-1 drive in a SAS-2 enclosure. Unfortunately, the error is often spurious and can disappear after reseating a drive. This fault will almost exclusively be seen on 7410 and 7420 systems. Please reseat the drive in question and mark the problem as cleared. If the problem returns, please review your configuration. There may be a SAS-1 or SAS-2 system in the wrong enclosure.

5. Verify your system release for whether CR 6999699 has been fixed on your system AND whether you have replaced a disk drive recently

<SUNBUG: 6999699> This issue is due to the creation of multiple entries for drives in the storage pools. This has been resolved in the 2010Q3.4.0 release
[email protected],1-1.14 2010-9-23 18:28:47 previous
If the steps above have not pointed you towards a resolution, please contact Oracle for further help.
Steps 6 and beyond are for Internal Oracle Support, as they describe in detail how to review system logs from a support bundle to identify the reason for the fault on the customer system.
6. Check debug.sys to see if there were several command timeouts for which the disk was offlined.
FRU : "SCSI Device 13" (hc://:product-id=SUN-Storage-J4400:server-id=:chassis-id=1027QAK01F:serial=9QJ410Y8:part=SEAGATE-ST31000NSSUN1.0T:revision=SU0F/ses-enclosure=6/bay=13/disk=0)
grep "Disconnected command timeout for target" logs/debug.sys | grep "9QJ2WZNT"
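The grep above can be wrapped so that it returns a count rather than the raw lines, which makes it easier to judge whether there were "several" timeouts. This is only a sketch: the function name count_disk_timeouts is hypothetical, and it assumes that each timeout line in debug.sys contains the drive serial number as plain text, as the grep in this step implies.

```shell
# Count "Disconnected command timeout" lines in debug.sys that mention one drive.
# Assumption: the serial number appears verbatim on each timeout line.
count_disk_timeouts() {
    log_file=$1   # path to debug.sys inside the unpacked support bundle
    serial=$2     # drive serial number, taken from the FRU string
    # -F matches fixed strings; -c prints the number of matching lines
    grep -F -- "Disconnected command timeout for target" "$log_file" |
        grep -F -c -- "$serial"
}
```

For example, count_disk_timeouts logs/debug.sys "<serial number>" prints a single number for the drive named in the fault.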
7. Verify whether the drive is in the head or expansion tray
FRU : "SCSI Device 13" (hc://:product-id=SUN-Storage-J4400:server-id=:chassis-id=1027QAK01F:serial=9QJ410Y8:part=SEAGATE-ST31000NSSUN1.0T:revision=SU0F/ses-enclosure=6/bay=13/disk=0)
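The location details are all embedded in the FRU string, so it can help to pull out individual fields rather than read the whole line. The sketch below assumes the name=value layout seen in the example FRU, with fields separated by ':' or '/'. The helper name fru_field is hypothetical and not part of any appliance tooling.

```shell
# Print the value of one name=value field from an hc:// FRU string.
# Assumption: fields are separated by ':' or '/', as in the example FRU above.
fru_field() {
    fru=$1    # the full FRU line, quotes and parentheses included
    name=$2   # field name, e.g. product-id, ses-enclosure, bay, serial
    printf '%s\n' "$fru" |
        tr ':/' '\n\n' |                           # one field per line
        sed -n -e 's/)$//' -e "s/^${name}=//p" |   # drop trailing ')' and keep the value
        head -n 1
}
```

With the example FRU, fru_field "$fru" product-id prints SUN-Storage-J4400, and the ses-enclosure and bay fields print 6 and 13. A product-id naming a J4400 presumably points at an expansion tray rather than the head, but confirm this against the system configuration.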
8. Verify the status of the SIMs/SIM paths
FRU : "SCSI Device 13" (hc://:product-id=SUN-Storage-J4400:server-id=:chassis-id=1027QAK01F:serial=9QJ410Y8:part=SEAGATE-ST31000NSSUN1.0T:revision=SU0F/ses-enclosure=6/bay=13/disk=0)
hc://:product-id=SUN-Storage-J4410:product-sn=1051FMJ01V:server-id=:chassis-id=1027QAK01F:serial=2029QTF-1043QC133F:part=3753633:revision=3524/ses-enclosure=7/controller=1
If the fault is not cleared, collaborate with L2.
9. Check whether debug.sys shows other drives impacted by timeouts
FRU : "SCSI Device 13" (hc://:product-id=SUN-Storage-J4400:server-id=:chassis-id=1027QAK01F:serial=9QJ410Y8:part=SEAGATE-ST31000NSSUN1.0T:revision=SU0F/ses-enclosure=6/bay=13/disk=0)
grep "Disconnected command timeout for target" logs/debug.sys | grep "<serial number>"
grep "Disconnected command timeout for target" logs/debug.sys | grep -v "<serial number>"
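The two greps above can be combined into a single summary that compares the timeout count for the suspect drive against the count for everything else. This is a rough sketch with a hypothetical function name (summarize_timeouts). It makes the same assumption as the greps, namely that a timeout line for a drive contains that drive's serial number.

```shell
# Compare timeout counts for the suspect drive against all other timeout lines.
# Assumption: a timeout line for a drive contains that drive's serial number.
summarize_timeouts() {
    log_file=$1   # path to debug.sys inside the unpacked support bundle
    serial=$2     # serial number of the drive reported as faulted
    pattern="Disconnected command timeout for target"
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the "0" output
    suspect=$(grep -F -- "$pattern" "$log_file" | grep -F -c -- "$serial" || true)
    others=$(grep -F -- "$pattern" "$log_file" | grep -F -v -c -- "$serial" || true)
    echo "suspect=$suspect others=$others"
}
```

A non-zero "others" count means timeouts were logged for targets other than the faulted drive, which is worth noting before attributing the fault to that single disk.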
10. Check for any other failed system components in hw.aksh

Perform a quick review of component status in hw.aksh for any other faulty part status.

Attachments
This solution has no attachment