Sun Storage 7000 Unified Storage System: Performance issue when pool is almost Full

Asset ID:	1-72-1392492.1
Update Date:	2012-05-29
Keywords:

Solution Type:	Problem Resolution Sure

Solution 1392492.1 : Sun Storage 7000 Unified Storage System: Performance issue when pool is almost Full

Applies to:

Sun Storage 7110 Unified Storage System - Version Not Applicable to Not Applicable [Release N/A]
Sun ZFS Storage 7120 - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 7410 Unified Storage System - Version Not Applicable to Not Applicable [Release N/A]
Sun ZFS Storage 7320 - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 7210 Unified Storage System - Version Not Applicable to Not Applicable [Release N/A]
7000 Appliance OS (Fishworks)
NAS head revision : [not dependent]
BIOS revision : [not dependent]
ILOM revision : [not dependent]
JBODs Model : [not dependent]
CLUSTER related : [not dependent]

Symptoms

Performance problems occur when storage pool usage exceeds 80%.
Storage pools at more than 80% capacity may experience degraded I/O performance, especially for write operations. The degradation can become severe once the pool exceeds 96% full, and manageability can be impaired as the free space in the pool approaches zero. When usage crosses the 80% threshold, the appliance starts using a better-fit algorithm for writes, which is a little slower. Beyond 96% full, it uses the best-fit algorithm, and writes become much slower. All storage systems slow down as they fill up.

- Slow BUI/CLI operation and management hangs
- Very lengthy boot times
- Timeout events on connected NFS, CIFS or iSCSI clients due to increased latency
- Slow resilvering of storage pools following drive failures or replacements
- CLI and/or BUI timeout events when deleting data files
- Primary database backups via NDMP failing because a NAS filesystem mount point shows as 100% used
- Inability to cancel an in-progress scrub on a pool
- Very lengthy or indefinite delays while restarting services such as NFS and SMB
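
The 80% and 96% thresholds described above can be turned into a simple check. The following is a minimal sketch only: the function name and the band labels ("normal", "degraded", "severe") are illustrative choices made here, not appliance terminology.

```python
def allocation_regime(percent_used: float) -> str:
    """Map pool utilization to the bands described in this note.

    The band names are hypothetical labels chosen for this sketch.
    """
    if not 0 <= percent_used <= 100:
        raise ValueError("percent_used must be between 0 and 100")
    if percent_used > 96:
        return "severe"    # best-fit allocation, writes are a lot slower
    if percent_used > 80:
        return "degraded"  # slower allocation path for writes
    return "normal"


# Print the band for a few sample utilization values.
for pct in (50, 85, 97):
    print(pct, allocation_regime(pct))
```

A monitoring script could call such a function with the CAP value reported for each pool and raise an alert before the pool reaches the "severe" band.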

Observe

- Pool usage

cli> status show
Storage:
   pool-0:
      Used         8.58T bytes
      Avail        496B bytes
      State        online
      Compression: 1x
      Dedup:       1.03x

- Share usage

Here, default is the project and fredfs01 is a filesystem inside that project.

cli> shares select default select fredfs01 show
...
   space_available = 495G
       space_total = 49K
        root_group = other

From the OS shell, check:

# zpool list
NAME     SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH
pool-0  9.06T  8.58T  495G  94%  1.03x  ONLINE

# zfs list -o space
NAME                             AVAIL  USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
mypool01                         8.92T  612K         0     31K              0       582K
mypool01/local                   8.92T  111K         0     31K              0        80K
mypool01/local/default           8.92T   80K         0     31K              0        49K
mypool01/local/default/fredfs01  8.92T   49K       18K     31K              0          0
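
When several pools need to be checked, the CAP column of the `zpool list` output can be extracted programmatically. The sketch below assumes the output has been captured as text and has the column layout shown above; it looks columns up by header name because the layout may differ between software versions. The function name `pool_capacities` is hypothetical.

```python
# Sample text copied from the zpool list output shown in this note.
SAMPLE = """\
NAME     SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH
pool-0  9.06T  8.58T  495G  94%  1.03x  ONLINE
"""


def pool_capacities(zpool_list_output: str) -> dict[str, int]:
    """Return {pool name: CAP percent} parsed from `zpool list` text."""
    lines = [ln.split() for ln in zpool_list_output.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    name_col = header.index("NAME")  # locate columns by header name
    cap_col = header.index("CAP")
    return {row[name_col]: int(row[cap_col].rstrip("%")) for row in rows}


caps = pool_capacities(SAMPLE)
for name, cap in caps.items():
    flag = "above 80%" if cap > 80 else "ok"
    print(f"{name}: {cap}% ({flag})")
```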


Cause

Storage pools whose usage is approaching full capacity.

When a pool is close to full, ZFS changes the algorithm it uses to look for space, because it has to work with whatever small free blocks remain in order to find room for new writes.

When the pool is fragmented, the search for suitable blocks for write operations also consumes more I/O and CPU.
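
To illustrate why the search becomes more expensive, the following toy model compares a first-fit search with a best-fit search over a list of free segment sizes. It is a deliberately simplified sketch and not the ZFS implementation: the real allocator works on per-metaslab space maps and is considerably more involved. The function names and the sample segment sizes are invented for illustration.

```python
def first_fit(free_segments, size):
    """Return (index, segments_examined) for the first segment that fits."""
    for examined, seg in enumerate(free_segments, start=1):
        if seg >= size:
            return examined - 1, examined  # stop at the first match
    return None, len(free_segments)


def best_fit(free_segments, size):
    """Return (index, segments_examined) for the smallest segment that fits."""
    best = None
    for i, seg in enumerate(free_segments):
        if seg >= size and (best is None or seg < free_segments[best]):
            best = i
    # Every segment has to be inspected to be sure the match is the tightest.
    return best, len(free_segments)


free = [64, 8, 128, 16, 32]  # hypothetical free segment sizes
print(first_fit(free, 16))   # (0, 1)
print(best_fit(free, 16))    # (3, 5)
```

In this model, first fit stops at the first usable segment, while best fit inspects every candidate to find the tightest match. That extra work is the kind of cost that grows as free space becomes scarce and fragmented.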

Using mdb, look for routines such as "metaslab_alloc", "metaslab_ff_alloc", "metaslab_activate" and "metaslab_passivate" among the highest consumers of CPU cycles. For example, threads waiting for a space map to load can be listed with:

> ::stacks -c space_map_load_wait
THREAD           STATE    SOBJ                COUNT
ffffff007cd61c60 SLEEP    CV                      8
                 swtch+0x147
                 cv_wait+0x61
                 space_map_load_wait+0x2e
                 metaslab_activate+0x60
                 metaslab_group_alloc+0x246
                 metaslab_alloc_dva+0x2a6
                 metaslab_alloc+0x9c
                 zio_dva_allocate+0x57
                 zio_execute+0x89
                 taskq_thread+0x1

Solution

Administrators who encounter this type of performance problem on appliances whose storage pools are almost full are advised to reduce pool utilization immediately, either by removing old, unwanted data (including snapshots) or by increasing the size of the pools by adding further disk trays.
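
As a rough planning aid, the amount of data that must be removed to bring a pool back under the 80% threshold can be estimated from the SIZE and ALLOC values reported by `zpool list`. The sketch below uses the sample values from this note and assumes the size suffixes are binary units; the helper names are hypothetical.

```python
# Binary unit multipliers (assumed to match the suffixes in zpool list output).
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40, "P": 2**50}


def to_bytes(text: str) -> float:
    """Convert a size such as '8.58T' to bytes (binary units assumed)."""
    text = text.strip().upper()
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def bytes_to_free(size: str, alloc: str, target_percent: float = 80.0) -> float:
    """Bytes that must be freed (0.0 if none) to reach target_percent usage."""
    allowed = to_bytes(size) * target_percent / 100.0
    return max(0.0, to_bytes(alloc) - allowed)


need = bytes_to_free("9.06T", "8.58T")
print(f"Free about {need / UNITS['T']:.2f}T to reach 80%")
```

For the sample pool (9.06T total, 8.58T allocated), 80% of the pool is about 7.25T, so roughly 1.33T would have to be freed. The figure is only an estimate, because space held by snapshots is not released until the snapshots themselves are destroyed.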

Later appliance software versions improve the way ZFS looks for space when a pool becomes highly used. Check the appliance software version currently running and upgrade to the latest release.

For the latest appliance software revisions, see https://wikis.oracle.com/display/FishWorks/Software+Updates

In particular, later appliance versions improve how ZFS handles internal space map loading and the zio_wait algorithms, which improves performance when pool space usage is close to 100%.

Back to <Document 1331769.1> Sun Storage 7000 Unified Storage System: How to Troubleshoot Performance Issues.

References

<NOTE:1331769.1> - Sun Storage 7000 Unified Storage System: How to Troubleshoot Performance Issues
Appliance software wiki page: https://wikis.oracle.com/display/FishWorks/Software+Updates
<BUG:6525233> - VDEV FULLNESS CAN DEGRADE PERFORMANCE, SHOULD CAUSE ZPOOL TO BECOME DEGRADED
<BUG:6975500> - CAN'T STOP ZFS SCRUB WHEN POOL IS FULL
