Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-77-1431648.1
Update Date: 2012-03-07
Keywords:

Solution Type  Sun Alert Sure

Solution  1431648.1 :   Failed Disk in a ZFS Storage Pool May Erroneously Consume Multiple Spare Disks Which Fail to Detach after Resilvering  


Related Items
  • Sun Storage 7410 Unified Storage System
  • Sun ZFS Storage 7320
  • Sun Storage 7210 Unified Storage System
  • Sun Storage 7310 Unified Storage System
  • Sun ZFS Storage 7420
  • Sun Software - Generic
  • Sun Storage 7110 Unified Storage System
  • Sun ZFS Storage 7120
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun Alert
  • .Old GCS Categories>Sun Microsystems>Sun Alert>Release Phase>Resolved




In this Document
  Description
  Likelihood of Occurrence
  Possible Symptoms
  Workaround or Resolution
  Modification History
  References


Applies to:

Sun ZFS Storage 7320 - Version: Not Applicable and later [Release: N/A and later]
Sun ZFS Storage 7420 - Version: Not Applicable and later [Release: N/A and later]
Sun Software - Generic - Version: Not Applicable and later [Release: N/A and later]
Sun Microsystems > Storage - Disk > Unified Storage
Sun Microsystems > Storage Software
Information in this document applies to any platform.
_________________________________



Date of Resolved Release: 07-Mar-2012
_________________________________

Description


Systems with a failed disk in a ZFS storage pool may erroneously consume multiple spare disks due to an interaction between the fault management infrastructure and the resilvering operation. These spares then fail to be detached after the resilvering process has completed, and will not be available for subsequent failures. The resulting ZFS storage pool structure then needs to be manually cleaned up to make the spare disks available again.

Likelihood of Occurrence


This issue can occur in the following releases:
  • Sun ZFS Storage Appliance Software 2010.Q3.1.0 to 2010.Q3.3.1
Notes:

1. This issue applies to all Sun ZFS Storage Appliance platforms:
  • Sun ZFS Storage Appliance 7110, 7120, 7210, 7310, 7410, 7320, 7420
2. To determine the software release on Appliance systems, do the following from the Browser User Interface (BUI) to access "info" about the release name:
a) Navigate to: Maintenance ->  System
b) Click on the "i" next to the "Current System Software" entry in the table of available releases.
A pop-up will show the release. For example: "2010.Q3.3.1"
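
Once the release string is known, the comparison against the affected range can be done mechanically. The following is a minimal sketch, assuming release strings always have the form shown in the pop-up (for example "2010.Q3.3.1"). The names parse_release and is_affected are illustrative only and are not part of the appliance software.

```python
import re

# Bounds taken from this alert: affected 2010.Q3.1.0 through 2010.Q3.3.1.
FIRST_AFFECTED = (2010, 3, 1, 0)
LAST_AFFECTED = (2010, 3, 3, 1)


def parse_release(release):
    """Turn a release string such as '2010.Q3.3.1' into a comparable tuple."""
    match = re.fullmatch(r"(\d{4})\.Q(\d)\.(\d+)\.(\d+)", release.strip())
    if match is None:
        raise ValueError("unrecognized release string: %r" % release)
    return tuple(int(part) for part in match.groups())


def is_affected(release):
    """Return True when the release falls inside the affected range."""
    return FIRST_AFFECTED <= parse_release(release) <= LAST_AFFECTED


print(is_affected("2010.Q3.3.1"))  # release shown in the BUI example
print(is_affected("2010.Q3.4.0"))  # release named in this alert as containing the fix
```

Comparing tuples of integers avoids the pitfalls of comparing the release strings directly, where "2010.Q3.10.0" would sort before "2010.Q3.3.1".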

Possible Symptoms


The following symptoms and contributing conditions may be observed with this issue:
- Extremely long resilver times for the replaced disk
- Incorrect maintenance procedures, such as physically removing drives without first running the prescribed administrative commands
- Failing components that cause multiple errors and faults
The following is an example "zpool status" output which shows the problem. In this example, the disk referenced as "c4t5000C5001A654F3Cd0" has failed and been physically replaced. The spare disk referenced as "c4t5000C5001A48520Dd0" was first used to replace the failed disk; subsequently, the spare disk referenced as "c4t5000C5001A402290d0" was also erroneously added to the ZFS storage pool configuration.

Note: Many forms of the invalid pool structure are possible.

In the following example, note that resilvering of the pool was still in progress at the time that the second spare was created:
pool: Pool_1
state: DEGRADED
scan: resilvered 42.7G in 2h18m with 0 errors on Fri Nov 12 12:03:57 2010
config:
NAME                                           STATE     READ WRITE CKSUM
   Pool_1                                    DEGRADED     0     0     0
raidz2-0                                     DEGRADED     0     0     0
  c4t5000C5001A6FBE6Ed0                      ONLINE       0     0     0
  c4t5000C5001A63F66Ad0                      ONLINE       0     0     0
  c4t5000C5001A69DECFd0                      ONLINE       0     0     0
  c4t5000C5001A69FE19d0                      ONLINE       0     0     0
  c4t5000C5001A71FB32d0                      ONLINE       0     0     0
  c4t5000C5001A533E93d0                      ONLINE       0     0     0
  c4t5000C50026518A89d0                      ONLINE       0     0     0
  c4t5000C5001A534BB6d0                      ONLINE       0     0     0
  c4t5000C5001A651A9Fd0                      ONLINE       0     0     0
  spare-9                                    DEGRADED     0     0     0
    replacing-0                              DEGRADED     0     0     0
      spare-0                                DEGRADED     0     0     0
        c4t5000C5001A654F3Cd0                UNAVAIL      0     0     0  cannot open
        c4t5000C5001A402290d0                ONLINE       0     0     0
      c4t5000C5001A0FE201d0                  ONLINE       0     0     0
    c4t5000C5001A48520Dd0                    ONLINE       0     0     0
raidz2-1                                     ONLINE       0     0     0
  c4t5000C5001A655CEAd0                      ONLINE       0     0     0
  c4t5000C5001A730BA4d0                      ONLINE       0     0     0
  c4t5000C5001A5347F1d0                      ONLINE       0     0     0
  c4t5000C5001A5352F4d0                      ONLINE       0     0     0
  c4t5000C5001A5354B6d0                      ONLINE       0     0     0
  c4t5000C5001A5365E6d0                      ONLINE       0     0     0
  c4t5000C50026520201d0                      ONLINE       0     0     0
  c4t5000C5001A6980EDd0                      ONLINE       0     0     0
  c4t5000C5001A7311D5d0                      ONLINE       0     0     0
  c4t5000C5001A7386D7d0                      ONLINE       0     0     0
logs
mirror-2                                     ONLINE       0     0     0
  c4tATASTECZEUSIOPS018GBYTESSTM0000C00D8d0  ONLINE       0     0     0
  c4tATASTECZEUSIOPS018GBYTESSTM0000E5994d0  ONLINE       0     0     0
cache
c0t0d0                                       ONLINE       0     0     0
c0t1d0                                       ONLINE       0     0     0

spares
c4t5000C5001A48520Dd0                        INUSE     currently in use
c4t5000C5001A402290d0                        INUSE     currently in use
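
A saved copy of such output can be screened for this condition. The following is a rough sketch, not a supported tool: it assumes the usual "NAME STATE ..." column layout of "zpool status", treats UNAVAIL, FAULTED and REMOVED as failed states, and uses the heuristic (an assumption, not a rule stated in this alert) that more in-use spares than failed devices is worth investigating. The sample text is abridged from the example above.

```python
FAILED_STATES = {"UNAVAIL", "FAULTED", "REMOVED"}  # assumed set of failed states

# Abridged from the example in this alert.
SAMPLE = """\
  pool: Pool_1
 state: DEGRADED
config:
        NAME                            STATE     READ WRITE CKSUM
        Pool_1                          DEGRADED     0     0     0
          raidz2-0                      DEGRADED     0     0     0
            spare-9                     DEGRADED     0     0     0
              replacing-0               DEGRADED     0     0     0
                spare-0                 DEGRADED     0     0     0
                  c4t5000C5001A654F3Cd0 UNAVAIL      0     0     0  cannot open
                  c4t5000C5001A402290d0 ONLINE       0     0     0
                c4t5000C5001A0FE201d0   ONLINE       0     0     0
              c4t5000C5001A48520Dd0     ONLINE       0     0     0
        spares
          c4t5000C5001A48520Dd0         INUSE     currently in use
          c4t5000C5001A402290d0         INUSE     currently in use
"""


def summarize(status_text):
    """Return (failed_devices, in_use_spares) found in zpool status text."""
    failed, in_use = [], []
    in_spares = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank lines do not end a section
        if len(fields) == 1:
            # One-word section headers: "logs", "cache", "spares", "config:".
            in_spares = fields[0] == "spares"
            continue
        if fields[0] == "pool:":
            in_spares = False  # a new pool starts; leave any spares section
            continue
        name, state = fields[0], fields[1]
        if in_spares and state == "INUSE":
            in_use.append(name)
        elif not in_spares and state in FAILED_STATES:
            failed.append(name)
    return failed, in_use


failed, in_use = summarize(SAMPLE)
print("failed devices:", failed)
print("in-use spares: ", in_use)
if len(in_use) > len(failed):
    print("more spares in use than failed devices; check the pool layout")
```

For the abridged sample this reports one failed device and two in-use spares, which matches the situation described in this alert. Because many forms of the invalid pool structure are possible, a clean result from this check does not prove that a pool is unaffected.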

Workaround or Resolution


There is no workaround for this issue.

This issue is addressed in the following release:

For all Sun Storage 7000 Series Unified Storage Systems
  • Sun ZFS Storage Appliance Software 2010.Q3.4.0 and later

For a listing of ZFS Storage Appliance Software Releases and version information, please see:
https://wikis.oracle.com/display/FishWorks/Software+Updates


Modification History

07-Mar-2012: Date of Resolved Release


See also CR 6981518 (integrated into ak-2010Q3.2.1):
- The fix for 6981518 addressed the greatest part of the failure window.
- The fix for 6999699 fixes other code paths which could cause the same result.
On some systems (e.g., ZFSSA), further manual steps may be required to clean up the pool.

In the following example, "c0t5000C500104FF387d0" would be replaced with the disk ID as reported by zpool status, and the subsequent values returned by mdb will differ from system to system.

An example of how to find the GUID of the drive:

  $>mdb -k
  > ::spa -v ! grep c0t5000C500104FF387d0
    ffffff84dc1992c0 CANT_OPEN OPEN_FAILED        /dev/dsk/c0t5000C500104FF387d0s0
  > ffffff84dc1992c0::print vdev_t vdev_guid | =E
               782195572428017554
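
When several pools or devices are involved, the lookup above can be scripted around captured mdb output. The following is a small sketch that only manipulates text; it assumes the "::spa -v" line has the column layout shown in the example (address first, device path last), and the helper names vdev_address and guid_expression are hypothetical.

```python
def vdev_address(spa_line, disk_id):
    """Return the vdev_t address from one '::spa -v' line naming disk_id."""
    fields = spa_line.split()
    if len(fields) < 2 or disk_id not in fields[-1]:
        raise ValueError("line does not describe %s" % disk_id)
    int(fields[0], 16)  # raises ValueError if the first column is not hex
    return fields[0]


def guid_expression(address):
    """Build the mdb expression used above to print the GUID in decimal."""
    return "%s::print vdev_t vdev_guid | =E" % address


# Line copied from the example in this alert.
line = ("ffffff84dc1992c0 CANT_OPEN OPEN_FAILED        "
        "/dev/dsk/c0t5000C500104FF387d0s0")
print(guid_expression(vdev_address(line, "c0t5000C500104FF387d0")))
```

The printed expression is the one to enter at the mdb prompt; the sketch does not run mdb itself.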

The corresponding zpool command, using the pool name and the GUID returned by mdb, is:
zpool detach pool01a 782195572428017554
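
Because the GUID is a long decimal number that is easy to mistype, it may help to assemble the command from the mdb output rather than by hand. The following is a minimal sketch, assuming the GUID is the unsigned 64-bit decimal value printed by "=E"; the helper name detach_command is hypothetical, and the sketch only prints the command and does not execute it.

```python
import shlex


def detach_command(pool, guid):
    """Return the argument list for 'zpool detach <pool> <guid>'."""
    guid = int(guid)  # accepts the decimal string printed by mdb, padding included
    if not 0 < guid < 2 ** 64:
        raise ValueError("vdev GUID must be a non-zero 64-bit value")
    if not pool or pool.startswith("-"):
        raise ValueError("invalid pool name: %r" % pool)
    return ["zpool", "detach", pool, str(guid)]


# Values taken from the example in this alert.
cmd = detach_command("pool01a", "               782195572428017554")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Review the printed command against the current "zpool status" output before running it, since the pool name and GUID differ from system to system.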
So far this has not been reported on Solaris.

Solaris 11: The resilver and disk replacement code was first putback into build snv_143, and this issue was fixed in build snv_152.


Internal Eng Business Unit Group: Systems RPE
Internal Pending Patches: N/A
Internal Resolution Patches: N/A
Internal Escalation IDs:
3-2595143342, 3-2741145531, 3-3042257831, 3-3203745781, 3-3216724052,
3-3225248231, 3-3259602063, 3-3287004951, 3-3314126521, 3-3332566033,
3-3332573361, 3-3337778356, 3-3348689479, 3-3354554281, 3-3379144221,
3-3396092301, 3-3409014410, 3-3409089446, 3-3420341331, 3-3424489243,
3-3424569914, 3-3425870723, 3-3428019116, 3-3433841401

References

<SUNBUG:6999699>

Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.