Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-77-1437153.1
Update Date:2012-04-03
Keywords:

Solution Type  Sun Alert Sure

Solution  1437153.1 :   ZFS Storage Appliances With Certain HBAs May Experience Disk Faults and Should be Updated to 2011.1.2.1 Software  


Related Items
  • Sun Storage 7410 Unified Storage System
  • Sun Storage 7210 Unified Storage System
  • Sun Storage 7310 Unified Storage System
  • Sun Software - Generic
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun Alert
  • .Old GCS Categories>Sun Microsystems>Sun Alert>Release Phase>Resolved




In this Document
  Description
  Likelihood of Occurrence
  Possible Symptoms
  Workaround or Resolution
  Modification History
  References


Applies to:

Sun Microsystems > Storage - Disk > Unified Storage
Sun Microsystems > Storage Software
Sun Storage 7210 Unified Storage System - Version: Not Applicable and later    [Release: N/A and later]
Sun Storage 7310 Unified Storage System - Version: Not Applicable and later    [Release: N/A and later]
Sun Storage 7410 Unified Storage System - Version: Not Applicable and later    [Release: N/A and later]
Information in this document applies to any platform.
___________________________________

SUNBUG:7132238

Date of Workaround release: 16-Mar-2012

Date of Resolved Release: 03-Apr-2012
___________________________________

Description


After updating 7210, 7310 or 7410 Storage Appliances to the 2011.1.1.0 or 2011.1.1.1 Storage Appliance Software releases, systems with SAS-1 HBAs and J4400 or J4500 disk shelves may experience multiple false disk failures after an initial real disk fault. The issue is triggered by manual or automatic (phone home) support bundle creation related to diagnosing the initial disk fault. It can degrade storage pool redundancy and cause the Storage Appliance Software BUI and CLI to become unresponsive.

Likelihood of Occurrence


This issue can occur on the following:

Sun ZFS 7000 Storage Appliance platforms:
  • Sun ZFS 7210 Storage Appliance
  • Sun ZFS 7310 Storage Appliance
  • Sun ZFS 7410 Storage Appliance
for the above platforms:

      - with SAS-1 HBAs (includes revisions B3 and C0)
      - with Sun Storage J4400 or J4500 SAS disk shelves
      - with ZFS Storage Appliance Software 2011.1.1.0 or 2011.1.1.1
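The criteria above can be expressed as a simple check. The following is a minimal sketch only: the function name `is_exposed` and its arguments are hypothetical, and the SAS-1 HBA condition is not modelled because, as the notes in this document state, checking the disk shelf model is sufficient.

```python
# Values taken from the list above.
AFFECTED_PLATFORMS = {"7210", "7310", "7410"}
AFFECTED_SHELVES = {"J4400", "J4500"}
AFFECTED_RELEASES = {"2011.1.1.0", "2011.1.1.1"}


def is_exposed(platform, shelf_models, release):
    """Return True when all three conditions from the list above hold.

    platform     -- appliance model number as a string, e.g. "7410"
    shelf_models -- iterable of attached disk shelf models
    release      -- Storage Appliance Software release, e.g. "2011.1.1.0"
    """
    return (
        platform in AFFECTED_PLATFORMS
        and any(model in AFFECTED_SHELVES for model in shelf_models)
        and release in AFFECTED_RELEASES
    )


if __name__ == "__main__":
    print(is_exposed("7410", ["J4400"], "2011.1.1.0"))
    print(is_exposed("7420", ["J4400"], "2011.1.1.0"))
```

A system for which this returns False on any one condition is outside the scope of this alert.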

Notes:

1. Sun ZFS platforms 7110, 7120, 7320, and 7420 are not affected by this issue.

2. To determine the current Storage Appliance Software revision, run the following command:
7000:> maintenance system updates list
UPDATE                                       DATE                               STATUS
[email protected],1-1.8   2011-12-21 22:32:50     current
or:

Do the following from the Browser User Interface (BUI) to access "info" about the release name:
a) Navigate to: Maintenance ->  System
b) Click on the "i" next to the "Current System Software" entry in the table of available releases.
A pop-up will show the release, for example: "2010.Q3.4.2"

3. The issue will only occur when the SAS-1 HBA is attached to a J4400 or J4500 disk shelf, so only the disk shelf model needs to be checked. The following command can be run prior to a software update from the software CLI to determine if the system has a J4400 or J4500 Disk Shelf. For example:
7000:> maintenance hardware select chassis-001 show
Properties:
                          name = 0845QAK004
                       faulted = false
                  manufacturer = Sun Microsystems, Inc.
                         model = J4400
                        serial = 0845QAK004
                      revision = 3R53
                          type = storage
                           rpm = 7200
                          path = 1
                        locate = false
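If the output of the command above is captured to text, the shelf model can be extracted programmatically. This is a hedged sketch that assumes the "key = value" layout shown in the example; the helper names `parse_properties` and `has_affected_shelf` are hypothetical and are not part of the appliance software.

```python
def parse_properties(text):
    """Parse 'key = value' lines from captured CLI output into a dict."""
    props = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip the prompt line and the "Properties:" header
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def has_affected_shelf(text):
    """Return True if the chassis model is a J4400 or J4500."""
    return parse_properties(text).get("model") in {"J4400", "J4500"}


if __name__ == "__main__":
    sample = """Properties:
                         name = 0845QAK004
                        model = J4400
                         type = storage
"""
    print(has_affected_shelf(sample))
```

The check would need to be repeated for each chassis entry that represents a disk shelf.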

Possible Symptoms


Storage pool redundancy can be degraded by one or more disk faults. Typically, several false disk faults follow an initial real disk failure. The "Configuration::Storage" screen can be used to determine whether a pool is degraded, while the "Maintenance::Hardware" screen can be used to view any faulted disk drives. In addition, the Storage Appliance Software BUI and CLI normally become unresponsive when this issue occurs.

Workaround or Resolution


This issue is addressed in the following release:
  • ZFS Storage Appliance Software 2011.1.2.1 or later
If the systems are already running Storage Appliance Software release 2011.1.1.0 or 2011.1.1.1 but have NOT experienced any symptoms, it is recommended that they be updated to the 2011.1.2.1 release immediately using standard update procedures. This issue is triggered by automatic (phone home) and manual support bundles, so support bundles should not be generated and the phone home service should be disabled until the update is complete.
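To check whether a given release string is "2011.1.2.1 or later", the dotted components can be compared numerically. This is a minimal sketch under two assumptions not stated in this document: that releases of this form order by their numeric components, and that only all-numeric release names are passed in (an older-style name such as "2010.Q3.4.2" would raise ValueError). The function names are hypothetical.

```python
def release_tuple(release):
    """Convert a dotted numeric release such as '2011.1.2.1' to a tuple of ints."""
    return tuple(int(part) for part in release.split("."))


# First release that addresses the issue, per this document.
FIXED_RELEASE = release_tuple("2011.1.2.1")


def contains_fix(release):
    """Return True if the release is 2011.1.2.1 or later (numeric comparison)."""
    return release_tuple(release) >= FIXED_RELEASE


if __name__ == "__main__":
    for name in ("2011.1.1.0", "2011.1.1.1", "2011.1.2.1"):
        print(name, contains_fix(name))
```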
 
For customers that ARE experiencing the issue, the following procedure should be used to update the systems to the AK 2011.1.2.1 release. These steps should be performed during a maintenance window without any client activity. This issue is triggered by automatic (phone home) and manual support bundles, so support bundles should not be generated until the update is complete.
 
1. Power off the storage appliance controller from the SP console (both heads in a cluster configuration).
 
For example:
-> stop /SYS -f
Are you sure you want to immediately stop /SYS (y/n)? y

2. Physically power off all disk shelves. Wait 30 seconds. Power on all disk shelves.
 
3. Power on the storage appliance controller from the SP console (just one head in a cluster configuration).
 
For example:
-> start /SYS

4. Turn off the phone home service, and cancel any active support bundles.
 
For example:
7000:> configuration services scrk disable
7000:> maintenance system bundles select 23eb4cc8-edd2-6a26-f2a4-b1cdf54a68e cancel

5. Update to the AK 2011.1.2.1 release. If the update health checks find any single-path or other issues, repeat the procedure starting at Step 1. If the update health check issues cannot be resolved, contact Oracle Support.
 
For example:
7000:> maintenance system updates select [email protected],1-1.15 upgrade

6. After the update is complete, go to "Maintenance::Problems" and mark any disk or HBA issues repaired. Normally only one of the issues was real, and it will be re-detected automatically if it occurs again.
 
For example:
7000:> maintenance problems select problem-000 markrepaired

7. In a clustered configuration, perform Steps 3 through 6 on the other controller head.
 
After the update is complete, the phone home service may be re-enabled, support bundles may be generated as needed, and the storage appliance may be used as normal.
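The ordering of the procedure above differs slightly between standalone and clustered systems: Steps 1 and 2 are performed once with all heads powered off, and Steps 3 through 6 are then performed on one head at a time. The following sketch merely prints that ordering as a checklist. The function name `recovery_plan` and the head names are hypothetical, and the sketch does not model the instruction in Step 5 to restart from Step 1 if the update health checks fail.

```python
# Steps performed on each controller head in turn (Steps 3 through 6 above).
PER_HEAD_STEPS = [
    "3. Power on the controller from the SP console",
    "4. Disable phone home and cancel any active support bundles",
    "5. Update to the AK 2011.1.2.1 release",
    "6. Mark disk or HBA problems repaired",
]


def recovery_plan(heads):
    """Return the ordered checklist for the given controller head names."""
    if not heads:
        raise ValueError("at least one controller head is required")
    # Step 1: every head is powered off before the shelves are cycled.
    plan = [f"1. Power off controller from the SP console ({head})" for head in heads]
    # Step 2: done once, while all heads are down.
    plan.append("2. Power off all disk shelves, wait 30 seconds, power them on")
    # Steps 3-6: completed on one head before starting the next (Step 7).
    for head in heads:
        plan.extend(f"{step} ({head})" for step in PER_HEAD_STEPS)
    return plan


if __name__ == "__main__":
    for action in recovery_plan(["head-a", "head-b"]):
        print(action)
```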
 
If you are not able to update the software on your own, contact Oracle Support for assistance.

For a listing of ZFS Storage Appliance Software Releases and version information, see:
https://wikis.oracle.com/display/FishWorks/Software+Updates

[Figure: example screen capture of the ZFS Storage Appliance (ZFSSA) Software GUI]

Modification History

16-Mar-2012: Date of Workaround release
03-Apr-2012: Update Description, Occurrence, Symptoms, and Workaround/Resolution - issue is Resolved



The zpool status command can be used to view the status of the pool and determine
if several disks are faulted.

7410# zpool status

  pool: pool-1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Thu Jan 26 04:58:37 2012
    1023G scanned out of 5.59T at 650M/s, 2h3m to go
    173G resilvered, 17.87% done
config:

        NAME                                         STATE     READ WRITE CKSUM
        pool-1                                       DEGRADED     0     0     0
          mirror-0                                   ONLINE       0     0     0
            c4t5000C50015BD3146d0                    ONLINE       0     0     0
            c4t5000C50015A1F4FAd0                    ONLINE       0     0     0
            c4t5000C50015A29579d0                    ONLINE       0     0     0  (resilvering)
          mirror-1                                   ONLINE       0     0     0
            c4t5000C50015B054FCd0                    ONLINE       0     0     0
            c4t5000C50015B4D829d0                    ONLINE       0     0     0
            c4t5000C50015A2A714d0                    ONLINE       0     0     0  (resilvering)
          mirror-2                                   ONLINE       0     0     0
            c4t5000C50015C85D98d0                    ONLINE       0     0     0
            c4t5000C50015AEA493d0                    ONLINE       0     0     0  (resilvering)
            c4t5000CCA396DFA143d0                    ONLINE       0     0     0
          mirror-3                                   DEGRADED     0     0     0
            c4t5000C50015BACD5Bd0                    ONLINE       0     0     0
            c4t5000C500268C6398d0                    ONLINE       0     0     0  (resilvering)
            replacing-2                              DEGRADED     0     0     0
              c4t5000C50015A34E59d0                  FAULTED      0     0     0  too many errors
              c4t5000C50015BB3195d0                  ONLINE       0     0     0  (resilvering)
          mirror-4                                   ONLINE       0     0     0
            c4t5000C50015C06936d0                    ONLINE       0     0     3  (resilvering)
            c4t5000C50015BA853Cd0                    ONLINE       0     0     0
            c4t5000C50015BAD1B4d0                    ONLINE       0     0     0
          mirror-5                                   DEGRADED     0     0     0
            c4t5000C50015B07195d0                    FAULTED      0     0     0  too many errors
            c4t5000C50015BA8814d0                    ONLINE       0     0     0
            c4t5000C50015A34F57d0                    FAULTED      0     0     0  too many errors
          mirror-6                                   ONLINE       0     0     0
            c4t5000C5001951DFB4d0                    ONLINE       0     0     2  (resilvering)
            c4t5000C50015B06692d0                    ONLINE       0     0     0
            c4t5000C50015B08592d0                    ONLINE       0     0     2  (resilvering)
          mirror-7                                   DEGRADED     0     0     0
            c4t5000C50015C6D612d0                    ONLINE       0     0     0
            c4t5000C50015C5EA09d0                    ONLINE       0     0     0
            spare-2                                  DEGRADED     0     0     0
              c4t5000C50015A329FCd0                  FAULTED      0     0     0  too many errors
              c4t5000C50019512713d0                  ONLINE       0     0     0  (resilvering)
          mirror-8                                   ONLINE       0     0     0
            c4t5000C50019511BC8d0                    ONLINE       0     0     0
            replacing-1                              ONLINE       0     0     0
              c4t5000C50015BA98DBd0                  ONLINE       0     0     0
              c4t5000C50015BB1B66d0                  ONLINE       0     0     0  (resilvering)
            c4t5000C50015CE8A47d0                    ONLINE       0     0     0
          mirror-9                                   ONLINE       0     0     0
            c4t5000C50015BB8730d0                    ONLINE       0     0     0
            c4t5000C50015BA838Ad0                    ONLINE       0     0     0
            c4t5000C50015A654C7d0                    ONLINE       0     0     0
          mirror-10                                  ONLINE       0     0     0
            c4t5000C50019511DADd0                    ONLINE       0     0     0
            c4t5000C50015CF74DCd0                    ONLINE       0     0     0
            c4t5000C50015CE0BA8d0                    ONLINE       0     0     0
          mirror-11                                  ONLINE       0     0     0
            c4t5000C50015BAA4C3d0                    ONLINE       0     0     0
            c4t5000C5001957A58Bd0                    ONLINE       0     0     0
            c4t5000C50015AD5D11d0                    ONLINE       0     0     0
          mirror-12                                  DEGRADED     0     0     0
            c4t5000C50019513E61d0                    ONLINE       0     0     0
            spare-1                                  UNAVAIL      0     0     0  insufficient replicas
              c4t5000C5001950F3D3d0                  FAULTED      0     0     0  too many errors
              c4t5000C50015BB3195d0                  FAULTED      0     0     0  corrupted data
            c4t5000C50015BAC62Bd0                    ONLINE       0     0     0
          mirror-13                                  DEGRADED     0     0     0
            c4t5000C50019513CBFd0                    ONLINE       0     0     0
            c4t5000C50015ADB62Dd0                    FAULTED      0     0     0  too many errors
          mirror-14                                  DEGRADED     0     0     0
            c4t5000C50019511B25d0                    ONLINE       0     0     0
            c4t5000C5002693BCA2d0                    FAULTED      0     0     0  too many errors
            c4t5000C50015BACED5d0                    FAULTED      0     0     0  too many errors
          mirror-15                                  ONLINE       0     0     0
            c4t5000C50019517FB8d0                    ONLINE       0     0     0
            c4t5000C50015BAA88Bd0                    ONLINE       0     0     0  (resilvering)
            c4t5000C500195117D1d0                    ONLINE       0     0     0
          mirror-16                                  ONLINE       0     0     0
            c4t5000C500195143DEd0                    ONLINE       0     0     0
            c4t5000C5001A732C28d0                    ONLINE       0     0     0
            c4t5000C50015BAE31Bd0                    ONLINE       0     0     0  (resilvering)
          mirror-17                                  DEGRADED     0     0     0
            c4t5000C500195187A0d0                    ONLINE       0     0     0
            spare-1                                  UNAVAIL      0     0     0  insufficient replicas
              c4t5000C50015BAC4CEd0                  FAULTED      0     0     0  too many errors
              c4t5000C50015BB1B66d0                  FAULTED      0     0     0  corrupted data
            c4t5000C50019512471d0                    ONLINE       0     0     0
        logs
          c4tATASTECZEUSIOPS018GBYTESSTM0000C3AEAd0  ONLINE       0     0     0
          c4tATASTECZEUSIOPS018GBYTESSTM0000D0CE9d0  ONLINE       0     0     0
        cache
          c0t0d0                                     ONLINE       0     0     0
        spares
          c2t5000C50015BB1B66d0                      FAULTED   corrupted data
          c2t5000C50019512713d0                      INUSE     currently in use
          c2t5000C50015BB3195d0                      FAULTED   corrupted data
          c4t5000C50015BACD4Ed0                      AVAIL

errors: No known data errors
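With output as long as the example above, it can help to summarise which top-level vdevs contain faulted disks. The following is a rough sketch that assumes the column layout shown in the example; the function name `faulted_by_vdev` is hypothetical, and devices listed under the logs, cache and spares sections are deliberately ignored.

```python
import re

# Top-level data vdev names such as "mirror-5" or "raidz2-0".
VDEV_NAME = re.compile(r"(mirror|raidz\d?)-\d+")


def faulted_by_vdev(status_text):
    """Map each top-level vdev to the FAULTED leaf devices beneath it."""
    faulted = {}
    vdev = None
    in_pool = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]
        if name == "NAME":
            in_pool = True  # device table starts after the header row
            continue
        if name in ("logs", "cache", "spares", "errors:"):
            in_pool = False  # end of the data vdev section
        if not in_pool or len(fields) < 2:
            continue
        if VDEV_NAME.fullmatch(name):
            vdev = name
        elif fields[1] == "FAULTED" and vdev is not None:
            faulted.setdefault(vdev, []).append(name)
    return faulted


if __name__ == "__main__":
    import sys
    # Usage sketch: pipe captured "zpool status" output into this script.
    for vdev, disks in faulted_by_vdev(sys.stdin.read()).items():
        print(vdev, len(disks), ", ".join(disks))
```

A vdev that reports more than one faulted disk shortly after a single real failure is consistent with the symptoms described in this document, although the output alone does not prove that the faults are false.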


See CRs 7132238 and 7146187 for more information.

Please send technical questions to:
[email protected]
and copy the Responsible Engineer/Contributor listed

Internal Contributor/Submitter: [email protected]
Internal Eng Responsible Engineer: [email protected]
Internal Services Knowledge Engineer: [email protected]
Internal Eng Business Unit Group: ZFS Storage Appliance

References

SUNBUG:7132238

Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.