Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-72-1377069.1
Update Date:2011-12-01
Keywords:

Solution Type  Problem Resolution Sure

Solution  1377069.1 :   Sun Storage 7000 Unified Storage System: Shadow Migration Copy Performance Is Slow  


Related Items
  • Sun Storage 7410 Unified Storage System
  • Sun Storage 7310 Unified Storage System
  • Sun ZFS Storage 7120
  • Sun ZFS Storage 7320
  • Sun Storage 7110 Unified Storage System
  • Sun ZFS Storage 7420
  • Sun Storage 7210 Unified Storage System
Related Categories
  • PLA-Support>Sun Systems>DISK>NAS>SN-DK: 7xxx NAS
  • .Old GCS Categories>Sun Microsystems>Storage - Disk>Unified Storage




In this Document
  Symptoms
  Cause
  Solution
  References


Created from <SR 3-4904244611>

Applies to:

Sun ZFS Storage 7320 - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun ZFS Storage 7120 - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Storage 7410 Unified Storage System - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Storage 7310 Unified Storage System - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Storage 7210 Unified Storage System - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
7000 Appliance OS (Fishworks)

Symptoms

Shadow Migration copy jobs on Sun Storage 7000 Unified Storage System arrays have been observed in the BUI to run for many days, and in extreme cases for weeks. The copy for the affected share is still progressing in the background, but the operation is taking longer than expected.

Shadow Migration supports NFS filesystems only at this time; use NFSv4 for best results.


Cause

As long as Shadow Migration is making progress, even slowly, there is little that can be done to speed it up.
If a share to be migrated contains a large number (thousands or millions) of small files, or many subdirectories, Shadow Migration will take a long time to complete and is probably not the right tool. Consider other options such as rsync. Shadow Migration was not built for speed or performance; it was built for completeness and to run seamlessly in the background.

Monitoring the progress of a Shadow Migration is difficult given the context in which the operation runs. A single filesystem can shadow all or part of a filesystem, or multiple filesystems with nested mountpoints. As such, there is no way to request statistics about the source and have any confidence in them being 100% accurate. In addition, even when migrating a single filesystem, the methods used to calculate the available size are not consistent across systems. For example, the remote filesystem may use compression, or it may or may not include metadata overhead. For these reasons, it is impossible to display an accurate progress bar for any particular migration.

The appliance provides the following information that is guaranteed to be accurate:

  • Local size of the local filesystem so far
  • Logical size of the data copied so far
  • Time spent migrating data so far

These values are available in the BUI and CLI through both the standard filesystem properties and the properties of the Shadow Migration node (or UI panel). If you know the size of the remote filesystem, you can use these values to estimate progress. The size of the data copied consists only of plain file contents that needed to be migrated from the source. Directories, metadata, and extended attributes are not included in this calculation. Although the size of the data migrated so far includes only remotely migrated data, resuming a background migration may traverse parts of the filesystem that have already been migrated. This can cause the migration to run fairly quickly while processing these initial directories, and then slow down once it reaches portions of the filesystem that have not yet been migrated.
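
When the size of the remote filesystem is known, the progress estimate described above is simple arithmetic. The following is a minimal sketch in Python, not appliance code: the function name, its parameters, and the sample figures are assumptions for illustration. It divides the logical size of the data copied so far by the known source size, and extrapolates the remaining time from the time spent so far, assuming a roughly constant copy rate (which, as noted above, may not hold after a resume).

```python
def estimate_progress(copied_bytes, source_bytes, elapsed_seconds):
    """Return (fraction_done, estimated_seconds_remaining).

    copied_bytes:    logical size of the data copied so far (from the appliance)
    source_bytes:    size of the remote filesystem (must be known independently)
    elapsed_seconds: time spent migrating data so far (from the appliance)
    """
    if source_bytes <= 0:
        raise ValueError("source_bytes must be positive")
    fraction = min(copied_bytes / source_bytes, 1.0)
    if copied_bytes <= 0 or elapsed_seconds <= 0:
        return fraction, None  # no copy rate can be derived yet
    rate = copied_bytes / elapsed_seconds  # average bytes per second so far
    remaining = max(source_bytes - copied_bytes, 0) / rate
    return fraction, remaining


# Hypothetical figures: 250 GiB copied out of 1 TiB after 2 days.
GIB = 1024 ** 3
fraction, remaining = estimate_progress(250 * GIB, 1024 * GIB, 2 * 86400)
print(f"{fraction:.1%} done, about {remaining / 86400:.1f} days remaining")
```

Because directories, metadata, and extended attributes are excluded from the copied size, a figure computed this way is best treated as a rough guide.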

While there is no accurate measurement of progress, the appliance does attempt to estimate the remaining data based on the assumption of a relatively uniform directory tree. This estimate can range from fairly accurate to completely worthless depending on the data set, and is for information purposes only. For example, one could have a relatively shallow filesystem tree but have large amounts of data in a single directory that is visited last. In this scenario, the migration will appear almost complete, and then rapidly drop to a very small percentage as this new tree is discovered. Conversely, if that large directory is processed first, the estimate may assume that all other directories have a similarly large amount of data, and when it finds them mostly empty the estimate quickly rises from a small percentage to nearly complete. The best way to measure progress is to set up a test migration, let it run to completion, and use that value to estimate progress for filesystems of similar layout and size.
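
The two scenarios above can be illustrated with a toy model. This is a sketch only and is not the appliance's actual estimator: it assumes that a uniform-tree assumption amounts to treating every directory as the same size, so the estimated progress is simply the fraction of directories visited. The function name and the directory sizes are hypothetical.

```python
def compare_estimates(dir_sizes):
    """Return (uniform_estimate, true_fraction) pairs after each directory.

    uniform_estimate: fraction of directories visited, which is what an
                      "every directory is the same size" assumption predicts
    true_fraction:    bytes copied so far divided by the real total
    """
    total_bytes = sum(dir_sizes)
    total_dirs = len(dir_sizes)
    copied = 0
    results = []
    for visited, size in enumerate(dir_sizes, start=1):
        copied += size
        results.append((visited / total_dirs, copied / total_bytes))
    return results


# Hypothetical tree: nine 1 GB directories and one 1000 GB directory.
large_last = compare_estimates([1] * 9 + [1000])
large_first = compare_estimates([1000] + [1] * 9)

# After nine of ten directories, the uniform assumption says 90%,
# but under 1% of the bytes have actually been copied.
print(large_last[8])
# After one of ten directories, the uniform assumption says 10%,
# but over 99% of the bytes have actually been copied.
print(large_first[0])
```

The same data produces opposite errors depending only on traversal order, which is why a completed test migration of a similar filesystem is a better yardstick.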

Solution

As long as the Shadow Migration job is making progress, even slowly, there is little that can be done. Shadow Migration was not built for speed; it was built for completeness and to be seamless.

Increasing Shadow Migration Performance:

  1. Reduce the number of Shadow Migration filesystems being transferred at one time.
  2. Be aware that a large number of small files within a share to be migrated increases transfer latency and time to completion.
  3. UTF8 file rejection can prevent the Shadow Migration job from completing. Enable UTF8 file rejection.
  4. One major bug in this area has now been fixed, and the fix will be released in the next major update, 2011.1.
    CR: 6975601 Changing shadow migration threads or canceling a migration can lead to a kernel deadlock and may require a restart of the akd appliance process.
  5. One option is to increase the number of threads available for Shadow Migration:

EXAMPLE procedure:


CLI>:configuration services shadow> show
Properties:
<status> = online
threads = 8

CLI>:configuration services shadow> set threads=16
threads = 16 (uncommitted)
CLI>:configuration services shadow> commit
CLI>:configuration services shadow> show
Properties:
<status> = online
threads = 16


The advice here is to increase the thread value in stages, gauging the impact on other services and array functionality before increasing it again.

However, note the major bug described in point 4 above. Increasing the number of threads gives Shadow Migration more resources, but it also takes away resources that may be needed for more critical work. It can also lead to deadlocks and hangs if the appliance is not running firmware 2011.1.0.
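
As a planning aid for the staged approach, the intermediate thread values can be listed ahead of time. The following Python sketch is illustrative only: the function name and the default step of 2 are assumptions, not a documented recommendation. It prints the `set threads=` and `commit` commands from the example procedure above for each stage.

```python
def staged_thread_values(current, target, step=2):
    """Return the thread counts to try in order, ending at target."""
    if step <= 0:
        raise ValueError("step must be positive")
    if target < current:
        raise ValueError("target must not be lower than current")
    values = list(range(current + step, target, step))
    if target > current:
        values.append(target)  # always finish exactly on the target
    return values


for threads in staged_thread_values(8, 16):
    # Commands as used in 'configuration services shadow' above; observe
    # the impact on other services before moving to the next stage.
    print(f"set threads={threads}")
    print("commit")
```

A smaller step gives more opportunities to spot an impact on client I/O before committing further resources to the migration.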

A review of Support Bundle data supplied by customers who reported this situation confirmed that there were no problems, errors, alerts, failures, or FM events that would account for slow Shadow Migration progress. The arrays are functioning correctly; Shadow Migration is simply progressing very slowly.

If a Shadow Migration job has been started and is taking a long time, be patient and let it complete. Depending on factors such as incoming load, other requests, and the amount and kind of data to copy, it could take up to several weeks. Shadow Migration is a background function and will always be given lower priority in the kernel than serving new I/O for client requests.


The following section contains internal information; do not share with customers.

Useful shell commands:
# df -h | grep shadow
# df -h | grep shadow | wc -l
# iostat -xcnz

Possible influence of open bugs:
Bug: 6985747 Improving shadow migration pending list processing.
Bug: 6963751 shadow migration from netapp -> 7310 drops off to trickle
Bug: 6967206 migrating fs having large number of smaller files cause appliance to hang
Bug: 6988343 Need a summary for all shadow migration volume

References

<NOTE:1213714.1> - Sun ZFS Storage Appliance: Performance clues and considerations
<NOTE:1213705.1> - Sun Storage 7000 Unified Storage System: Performance issues - Framing the problem
<BUG:6985747> - IMPROVING SHADOW MIGRATION PENDING LIST PROCESSING.
<BUG:6963751> - SHADOW MIGRATION FROM NETAPP -> 7310 DROPS OFF TO TRICKLE
<BUG:6967206> - MIGRATING FS HAVING LARGE NUMBER OF SMALLER FILES CAUSE APPLIANCE TO HANG
<BUG:6988343> - NEED A SUMMARY FOR ALL SHADOW MIGRATION VOLUME

Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.