Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Sun Alert Sure Solution

1437153.1: ZFS Storage Appliances With Certain HBAs May Experience Disk Faults and Should be Updated to 2011.1.2.1 Software
Applies to:
- Sun Microsystems > Storage - Disk > Unified Storage
- Sun Microsystems > Storage Software
- Sun Storage 7210 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]
- Sun Storage 7310 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]
- Sun Storage 7410 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]

Information in this document applies to any platform.

___________________________________

SUNBUG:7132238
Date of Workaround release: 16-Mar-2012
Date of Resolved Release: 03-Apr-2012

___________________________________

Description

After updating 7210, 7310 or 7410 Storage Appliances to the 2011.1.1.0 or 2011.1.1.1 Storage Appliance Software releases, systems with SAS-1 HBAs and J4400 or J4500 disk shelves may experience multiple false disk failures after an initial real disk fault. The issue is triggered by manual and automatic (phone home) support bundle creation related to diagnosing the initial disk fault. The issue can cause storage pool redundancy characteristics to be degraded and the Storage Appliance Software BUI and CLI to be unresponsive.

Likelihood of Occurrence

This issue can occur on the following:

Sun ZFS 7000 Storage Appliance platforms:
- with SAS-1 HBAs (includes revisions B3 and C0)
- with Sun Storage J4400 or J4500 SAS disk shelves
- with ZFS Storage Appliance Software 2011.1.1.0 or 2011.1.1.1

Notes:

1. Sun ZFS platforms 7110, 7120, 7320, and 7420 are not affected by this issue.

2. To determine the current Storage Appliance Software revision, run the following command:

    7000:> maintenance system updates list

   or do the following from the Browser User Interface (BUI) to access "info" about the release name:

   a) Navigate to: Maintenance -> System

   A pop-up will show the release, for example: "2010.Q3.4.2"

3. The issue will only occur when the SAS-1 HBA is attached to a J4400 or J4500 disk shelf, so only the disk shelf model needs to be checked. The following command can be run prior to a software update from the software CLI to determine if the system has a J4400 or J4500 Disk Shelf. For example:

    7000:> maintenance hardware select chassis-001 show
    name = 0845QAK004
    faulted = false
    manufacturer = Sun Microsystems, Inc.
    model = J4400
    serial = 0845QAK004
    revision = 3R53
    type = storage
    rpm = 7200
    path = 1
    locate = false

Possible Symptoms

Storage pool redundancy characteristics can be degraded due to one or more disk faults. Normally, several false disk faults will happen after an initial real disk failure occurs. The "Configuration::Storage" screen can be used to determine if a pool is degraded, while the "Maintenance::Hardware" screen can be used to view any faulted disk drives. In addition, the Storage Appliance Software BUI and CLI will normally become unresponsive when this issue occurs.

Workaround or Resolution

This issue is addressed in the following release:
For customers that ARE experiencing the issue, the following procedure should be used to update the systems to the AK 2011.1.2.1 release. These steps should be done during a maintenance window without any client activity. This issue is triggered by automatic (phone home) and manual support bundles, so support bundles should not be created until the update is complete.

1. Power off the storage appliance controller from the SP console (both heads in a cluster configuration). For example:

    -> stop /SYS -f

2. Physically power off all disk shelves. Wait 30 seconds. Power on all disk shelves.

3. Power on the storage appliance controller from the SP console (just one head in a cluster configuration). For example:

    -> start /SYS

4. Turn off the phone home service, and cancel any active support bundles. For example:

    7000:> configuration services scrk disable
    7000:> maintenance system bundles select 23eb4cc8-edd2-6a26-f2a4-b1cdf54a68e cancel

5. Update to the AK 2011.1.2.1 release. If the update health checks find any single path or other issues, repeat the procedure starting at Step 1. If update health checks cannot be resolved, contact Oracle Support. For example:

    7000:> maintenance system updates select [email protected],1-1.15 upgrade

6. After the update is complete, go to "Maintenance::Problems" and mark any disk or HBA issues repaired. Normally only one issue was real and it will be re-detected automatically if it occurs again. For example:

    7000:> maintenance problems select problem-000 markrepaired

7. In a clustered configuration, perform steps 3 through 6 on the other controller head.

After the update is complete, the phone home service may be re-enabled, support bundles may be taken as needed, and the storage appliance may be used as normal. If you are not able to update the software on your own, contact Oracle Support for assistance.
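Before scheduling the update, it can help to confirm that a system actually matches the affected configuration listed under Likelihood of Occurrence. The following is a minimal Python sketch of that check, not an Oracle tool: it assumes the output of `maintenance hardware select chassis-001 show` has been captured to text by some other means, and the names `parse_show_output`, `is_affected`, and `SAMPLE_SHOW` are hypothetical helpers introduced only for illustration. The platform, release, and shelf values come from this document.

```python
import re

# Values taken from this Sun Alert.
AFFECTED_PLATFORMS = {"7210", "7310", "7410"}
AFFECTED_RELEASES = {"2011.1.1.0", "2011.1.1.1"}
AFFECTED_SHELVES = {"J4400", "J4500"}

# Sample text copied from the chassis "show" example in this document.
SAMPLE_SHOW = """\
name = 0845QAK004
faulted = false
manufacturer = Sun Microsystems, Inc.
model = J4400
serial = 0845QAK004
revision = 3R53
type = storage
"""


def parse_show_output(text):
    """Collect 'key = value' pairs from captured CLI 'show' output."""
    props = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+)\s*=\s*(.*?)\s*$", line)
        if match:
            props[match.group(1)] = match.group(2)
    return props


def is_affected(platform, release, shelf_models):
    """Return True if platform, release, and at least one shelf all match.

    Per the notes in this document, only the disk shelf model needs to be
    checked for the SAS-1 HBA condition.
    """
    return (
        platform in AFFECTED_PLATFORMS
        and release in AFFECTED_RELEASES
        and any(model in AFFECTED_SHELVES for model in shelf_models)
    )


if __name__ == "__main__":
    shelf = parse_show_output(SAMPLE_SHOW)
    print(is_affected("7410", "2011.1.1.1", [shelf["model"]]))
```

In practice the `show` output would be collected once per chassis entry and the resulting model strings passed together as `shelf_models`.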
For a listing of ZFS Storage Appliance Software Releases and version information, see: https://wikis.oracle.com/display/FishWorks/Software+Updates

[Figure: Example screen capture of ZFS Storage Appliance (ZFSSA) Software GUI]

Modification History

16-Mar-2012: Date of Workaround release
03-Apr-2012: Update Description, Occurrence, Symptoms, and Workaround/Resolution - issue is Resolved

The zpool status command can be used to view the status of the pool and determine if several disks are faulted.

    7410# zpool status
      pool: pool-1
     state: DEGRADED
    status: One or more devices is currently being resilvered. The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: resilver in progress since Thu Jan 26 04:58:37 2012
            1023G scanned out of 5.59T at 650M/s, 2h3m to go
            173G resilvered, 17.87% done
    config:

        NAME                                          STATE     READ WRITE CKSUM
        pool-1                                        DEGRADED     0     0     0
          mirror-0                                    ONLINE       0     0     0
            c4t5000C50015BD3146d0                     ONLINE       0     0     0
            c4t5000C50015A1F4FAd0                     ONLINE       0     0     0
            c4t5000C50015A29579d0                     ONLINE       0     0     0  (resilvering)
          mirror-1                                    ONLINE       0     0     0
            c4t5000C50015B054FCd0                     ONLINE       0     0     0
            c4t5000C50015B4D829d0                     ONLINE       0     0     0
            c4t5000C50015A2A714d0                     ONLINE       0     0     0  (resilvering)
          mirror-2                                    ONLINE       0     0     0
            c4t5000C50015C85D98d0                     ONLINE       0     0     0
            c4t5000C50015AEA493d0                     ONLINE       0     0     0  (resilvering)
            c4t5000CCA396DFA143d0                     ONLINE       0     0     0
          mirror-3                                    DEGRADED     0     0     0
            c4t5000C50015BACD5Bd0                     ONLINE       0     0     0
            c4t5000C500268C6398d0                     ONLINE       0     0     0  (resilvering)
            replacing-2                               DEGRADED     0     0     0
              c4t5000C50015A34E59d0                   FAULTED      0     0     0  too many errors
              c4t5000C50015BB3195d0                   ONLINE       0     0     0  (resilvering)
          mirror-4                                    ONLINE       0     0     0
            c4t5000C50015C06936d0                     ONLINE       0     0     3  (resilvering)
            c4t5000C50015BA853Cd0                     ONLINE       0     0     0
            c4t5000C50015BAD1B4d0                     ONLINE       0     0     0
          mirror-5                                    DEGRADED     0     0     0
            c4t5000C50015B07195d0                     FAULTED      0     0     0  too many errors
            c4t5000C50015BA8814d0                     ONLINE       0     0     0
            c4t5000C50015A34F57d0                     FAULTED      0     0     0  too many errors
          mirror-6                                    ONLINE       0     0     0
            c4t5000C5001951DFB4d0                     ONLINE       0     0     2  (resilvering)
            c4t5000C50015B06692d0                     ONLINE       0     0     0
            c4t5000C50015B08592d0                     ONLINE       0     0     2  (resilvering)
          mirror-7                                    DEGRADED     0     0     0
            c4t5000C50015C6D612d0                     ONLINE       0     0     0
            c4t5000C50015C5EA09d0                     ONLINE       0     0     0
            spare-2                                   DEGRADED     0     0     0
              c4t5000C50015A329FCd0                   FAULTED      0     0     0  too many errors
              c4t5000C50019512713d0                   ONLINE       0     0     0  (resilvering)
          mirror-8                                    ONLINE       0     0     0
            c4t5000C50019511BC8d0                     ONLINE       0     0     0
            replacing-1                               ONLINE       0     0     0
              c4t5000C50015BA98DBd0                   ONLINE       0     0     0
              c4t5000C50015BB1B66d0                   ONLINE       0     0     0  (resilvering)
            c4t5000C50015CE8A47d0                     ONLINE       0     0     0
          mirror-9                                    ONLINE       0     0     0
            c4t5000C50015BB8730d0                     ONLINE       0     0     0
            c4t5000C50015BA838Ad0                     ONLINE       0     0     0
            c4t5000C50015A654C7d0                     ONLINE       0     0     0
          mirror-10                                   ONLINE       0     0     0
            c4t5000C50019511DADd0                     ONLINE       0     0     0
            c4t5000C50015CF74DCd0                     ONLINE       0     0     0
            c4t5000C50015CE0BA8d0                     ONLINE       0     0     0
          mirror-11                                   ONLINE       0     0     0
            c4t5000C50015BAA4C3d0                     ONLINE       0     0     0
            c4t5000C5001957A58Bd0                     ONLINE       0     0     0
            c4t5000C50015AD5D11d0                     ONLINE       0     0     0
          mirror-12                                   DEGRADED     0     0     0
            c4t5000C50019513E61d0                     ONLINE       0     0     0
            spare-1                                   UNAVAIL      0     0     0  insufficient replicas
              c4t5000C5001950F3D3d0                   FAULTED      0     0     0  too many errors
              c4t5000C50015BB3195d0                   FAULTED      0     0     0  corrupted data
            c4t5000C50015BAC62Bd0                     ONLINE       0     0     0
          mirror-13                                   DEGRADED     0     0     0
            c4t5000C50019513CBFd0                     ONLINE       0     0     0
            c4t5000C50015ADB62Dd0                     FAULTED      0     0     0  too many errors
          mirror-14                                   DEGRADED     0     0     0
            c4t5000C50019511B25d0                     ONLINE       0     0     0
            c4t5000C5002693BCA2d0                     FAULTED      0     0     0  too many errors
            c4t5000C50015BACED5d0                     FAULTED      0     0     0  too many errors
          mirror-15                                   ONLINE       0     0     0
            c4t5000C50019517FB8d0                     ONLINE       0     0     0
            c4t5000C50015BAA88Bd0                     ONLINE       0     0     0  (resilvering)
            c4t5000C500195117D1d0                     ONLINE       0     0     0
          mirror-16                                   ONLINE       0     0     0
            c4t5000C500195143DEd0                     ONLINE       0     0     0
            c4t5000C5001A732C28d0                     ONLINE       0     0     0
            c4t5000C50015BAE31Bd0                     ONLINE       0     0     0  (resilvering)
          mirror-17                                   DEGRADED     0     0     0
            c4t5000C500195187A0d0                     ONLINE       0     0     0
            spare-1                                   UNAVAIL      0     0     0  insufficient replicas
              c4t5000C50015BAC4CEd0                   FAULTED      0     0     0  too many errors
              c4t5000C50015BB1B66d0                   FAULTED      0     0     0  corrupted data
            c4t5000C50019512471d0                     ONLINE       0     0     0
        logs
          c4tATASTECZEUSIOPS018GBYTESSTM0000C3AEAd0   ONLINE       0     0     0
          c4tATASTECZEUSIOPS018GBYTESSTM0000D0CE9d0   ONLINE       0     0     0
        cache
          c0t0d0                                      ONLINE       0     0     0
        spares
          c2t5000C50015BB1B66d0                       FAULTED   corrupted data
          c2t5000C50019512713d0                       INUSE     currently in use
          c2t5000C50015BB3195d0                       FAULTED   corrupted data
          c4t5000C50015BACD4Ed0                       AVAIL

    errors: No known data errors

See CR 7132238 and 7146187 for more information.

Please send technical questions to: [email protected] and copy the Responsible Engineer/Contributor listed

Internal Contributor/Submitter: [email protected]
Internal Eng Responsible Engineer: [email protected]
Internal Services Knowledge Engineer: [email protected]
Internal Eng Business Unit Group: ZFS Storage Appliance

References

SUNBUG:7132238

Attachments

This solution has no attachment
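A zpool status listing of this size is hard to scan by eye for faulted disks. As a rough aid, the Python sketch below tallies device lines by state from captured `zpool status` text. It is an illustration under stated assumptions, not part of the appliance software: the device-name pattern is inferred from the sample output in this document, and `DEVICE_LINE`, `count_device_states`, `faulted_devices`, and `SAMPLE_STATUS` are hypothetical names.

```python
import re
from collections import Counter

# Assumption: device names look like the ones in this document's sample,
# e.g. c4t5000C50015B07195d0 or c0t0d0. Group vdev lines such as
# "mirror-5" or "spare-1" are intentionally not matched.
DEVICE_LINE = re.compile(
    r"^\s*(c\d+t\w+d\d+)\s+(ONLINE|FAULTED|DEGRADED|UNAVAIL|INUSE|AVAIL)\b"
)

# Excerpt copied from the zpool status output in this document.
SAMPLE_STATUS = """\
  mirror-5                 DEGRADED     0     0     0
    c4t5000C50015B07195d0  FAULTED      0     0     0  too many errors
    c4t5000C50015BA8814d0  ONLINE       0     0     0
    c4t5000C50015A34F57d0  FAULTED      0     0     0  too many errors
"""


def count_device_states(status_text):
    """Count device lines per state.

    This counts lines, not unique disks: a hot spare can appear both
    inside a vdev and again in the spares section.
    """
    counts = Counter()
    for line in status_text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def faulted_devices(status_text):
    """Return device names whose line reports FAULTED, in listing order."""
    names = []
    for line in status_text.splitlines():
        match = DEVICE_LINE.match(line)
        if match and match.group(2) == "FAULTED":
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    print(count_device_states(SAMPLE_STATUS))
    print(faulted_devices(SAMPLE_STATUS))
```

Several FAULTED devices appearing shortly after a single real disk failure would be consistent with the symptoms described in this document, but the count alone does not distinguish false faults from real ones.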