Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-75-1395374.1
Update Date:2012-06-21
Keywords:

Solution Type  Troubleshooting Sure

Solution  1395374.1 :   Sun Storage 7000 Unified Storage System: Troubleshooting NDMP issues on the ZFS Storage Appliance  


Related Items
  • Sun Storage 7310 Unified Storage System
  • Sun Storage 7410 Unified Storage System
  • Sun ZFS Storage 7120
  • Sun Storage 7110 Unified Storage System
  • Sun ZFS Storage 7320
  • Sun ZFS Storage 7420
  • Sun Storage 7210 Unified Storage System
Related Categories
  • PLA-Support>Sun Systems>DISK>NAS>SN-DK: 7xxx NAS
  • .Old GCS Categories>Sun Microsystems>Storage - Disk>Unified Storage




In this Document
Purpose
Troubleshooting Steps
 Specific NDMP issues:
 o NDMP service not running or offline
 o ZFS NDMP snapshot left over
 o Zombie NDMP process
 o NDMP restore in progress at mountpoint
 o Unable to do tape cloning
 o NDMP slow tape cloning performance
 o Other NDMP related documents with symptom and solution:
References


Applies to:

Sun ZFS Storage 7320 - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 7410 Unified Storage System - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 7210 Unified Storage System - Version Not Applicable to Not Applicable [Release N/A]
Sun ZFS Storage 7120 - Version Not Applicable to Not Applicable [Release N/A]
Sun ZFS Storage 7420 - Version Not Applicable to Not Applicable [Release N/A]
7000 Appliance OS (Fishworks)

Purpose

General troubleshooting for NDMP related issues on the ZFS Storage Appliance.

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - 7000 Series ZFS Appliances

Troubleshooting Steps

Beyond repairing faulted hardware and fixing misconfiguration issues, only a limited amount of troubleshooting can be done for NDMP issues. For NDMP service problems, restarting the NDMP service may help as a workaround. Check the ZFS Storage Appliance software version and keep it up to date to ensure the latest NDMP code is running.
When necessary, engage Oracle Support via a Service Request and provide a support bundle.

To check the NDMP service log:
BUI -- Configuration > Services > NDMP > Logs

To restart the NDMP service:
BUI -- Configuration > Services > Restart NDMP services


Further details are available in the ZFS storage appliance online HELP wiki:
https://<appliance-ip-address>:215/wiki/index.php/Configuration:Services:NDMP

Specific NDMP issues:

o NDMP service not running or offline

Symptom: NDMP service not running or offline
Check: Look in BUI "Configuration > Services > NDMP > Logs" for a reason. This may be caused by the problem described in Release Note RN032, which is still an issue as of appliance software 2011.1.3.0. If this is the case, the NDMP log shows errors like the following:

[ Feb 10 15:29:45 Stopping because service restarting. ]
[ Feb 10 15:29:45 Executing stop method (:kill). ]
[ Feb 10 15:30:45 Method or service exit timed out.  Killing contract 163. ]

Solution: Restart the NDMP service by disabling it, then re-enabling it:

S7000> configuration services ndmp disable
S7000> configuration services ndmp enable

Engage Oracle Support if the reason is unknown.
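When an NDMP log has been saved to a file for review, the stop-method timeout described above can be picked out with a short script. The following is only a sketch in Python: the function name find_stop_timeouts is hypothetical, and the line format is assumed from the three-line excerpt shown above, so other log lines may need a different pattern.

```python
import re

# Pattern for the "exit timed out" line. The format is assumed from the
# log excerpt in this document and may not cover every variant.
TIMEOUT_RE = re.compile(
    r"^\[ (?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) "
    r"Method or service exit timed out\.\s+"
    r"Killing contract (?P<contract>\d+)\. \]$"
)


def find_stop_timeouts(log_text):
    """Return (timestamp, contract_id) for each stop-method timeout line."""
    hits = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.match(line.strip())
        if match:
            hits.append((match.group("stamp"), int(match.group("contract"))))
    return hits


# Sample taken from the log excerpt in this document.
sample = """\
[ Feb 10 15:29:45 Stopping because service restarting. ]
[ Feb 10 15:29:45 Executing stop method (:kill). ]
[ Feb 10 15:30:45 Method or service exit timed out.  Killing contract 163. ]
"""

print(find_stop_timeouts(sample))
```

An empty result does not rule out other causes; the log should still be read in full before engaging Oracle Support.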

o ZFS NDMP snapshot left over

Symptom: NDMP snapshots are left behind and consume available space.
This is a known problem with appliance software versions prior to 2011.1.1.0.
See <Bug 6995588> ndmpd leaking snapshots when zfs_release fails
For instructions on removing the NDMP snapshots manually, see the Workaround section of the CR.
Solution: Update the ZFS Storage Appliance firmware to 2011.1.1.0 or later to prevent the problem from occurring in the first place. It will be necessary to engage Oracle Support via a Service Request to clear the leftover NDMP snapshots.
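Several of the issues in this document are fixed in 2011.1.1.0, so it can help to compare a reported software version against that release. The sketch below is illustrative only: parse_release and has_ndmp_fixes are hypothetical names, and the handling of older quarter-style strings such as 2010.Q3.4 (dropping the "Q") is an assumption, not a documented rule.

```python
def parse_release(version):
    """Turn a release string such as '2011.1.3.0' into a comparable tuple.

    Assumption: quarter-style components such as 'Q3' are compared by
    their number only.
    """
    return tuple(int(part.lstrip("Qq")) for part in version.split("."))


# Release named in this document as containing the snapshot and zombie fixes.
FIXED_IN = parse_release("2011.1.1.0")


def has_ndmp_fixes(version):
    """Return True if the version is 2011.1.1.0 or later."""
    return parse_release(version) >= FIXED_IN


for release in ("2010.Q3.4", "2011.1.0.0", "2011.1.1.0", "2011.1.3.0"):
    print(release, has_ndmp_fixes(release))
```

The comparison only covers the marketing-style version string; build-style identifiers use a different numbering and are not handled here.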

o Zombie NDMP process

Symptom: Zombie NDMP process.
This is another known problem with appliance versions prior to 2011.1.1.0.
See <Bug 7019787> Ndmpd zfs restore creates zombie thread
To help diagnose this problem, capture an ndmpd core from the OS shell:

# gcore -o /var/ak/dropbox/ndmpd.gcore `pgrep ndmpd`

Then send in a support bundle.
Check the status of the NDMP processes with prstat and pstack:

# prstat -Lp `pgrep ndmpd`
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
26306 root 81M 70M cpu7 10 0 486:20:51 7.0% ndmpd/95
26306 root 81M 70M zombie 0 - 0:00:00 0.0% ndmpd/113
26306 root 81M 70M zombie 0 - 0:00:00 0.0% ndmpd/101
26306 root 81M 70M zombie 0 - 0:00:00 0.0% ndmpd/118
26306 root 81M 70M zombie 0 - 0:00:00 0.0% ndmpd/163

# pstack `pgrep ndmpd`

----------------- lwp# 17 --------------------------------
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 28 --------------------------------
feed04b7 nanosleep (fc18c600, fc18c608)
feebc369 sleep (1, 812798c, 80cf2f4, 8096246) + 31
0809627e ndmp_stop_reader_thread (fc18c910, fc18c68c, fc18c660, 0) + 7e
08093715 ndmpd_tar_backup_abort_v3 (8b43608, 81279ac, 812798c, 1000001) + 31
08073561 ndmpd_data_abort_v3 (8b19488) + 91
0806e253 ndmp_process_messages (8b19488, 0, fc18c728, fee7baba) + 107
----------------- lwp# 32 --------------------------------
** zombie (exited, not detached, not yet joined) **


Solution: Update the ZFS Storage Appliance firmware to 2011.1.1.0 or later to prevent further occurrences. In the interim, restart the NDMP service.
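If the prstat output has been captured to a file, the number of zombie LWPs can be counted with a few lines of Python. This is a sketch under assumptions: count_lwp_states is a hypothetical helper, and it expects the ten-column `prstat -L` layout shown above with STATE as the fifth column.

```python
from collections import Counter


def count_lwp_states(prstat_output):
    """Count LWPs per STATE value in captured `prstat -L` output."""
    states = Counter()
    for line in prstat_output.splitlines():
        fields = line.split()
        # Data rows have 10 columns, start with a numeric PID and end
        # with PROCESS/LWPID; the header and any summary line are skipped.
        if len(fields) == 10 and fields[0].isdigit() and "/" in fields[-1]:
            states[fields[4]] += 1
    return states


# Sample taken from the prstat output in this document.
sample = """\
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID
26306 root 81M 70M cpu7 10 0 486:20:51 7.0% ndmpd/95
26306 root 81M 70M zombie 0 - 0:00:00 0.0% ndmpd/113
26306 root 81M 70M zombie 0 - 0:00:00 0.0% ndmpd/101
26306 root 81M 70M zombie 0 - 0:00:00 0.0% ndmpd/118
26306 root 81M 70M zombie 0 - 0:00:00 0.0% ndmpd/163
"""

print(count_lwp_states(sample)["zombie"])  # prints 4
```

A non-zero zombie count on an affected release is consistent with Bug 7019787, but the gcore and pstack output are still what Oracle Support will ask for.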

o NDMP restore in progress at mountpoint

Symptom: BUI is reporting a mount error:

"The following errors were encountered while trying to share this filesystem.
As a result, the data may not be available over all protocols.
NDMP restore in progress at mountpoint"

Solution: As a workaround, reboot the appliance to bring back the shares, or alternatively engage Oracle Support via a Service Request.


This is likely <Bug 7101959> AKTXT_NAS_NDMP_BACKUP_INUSE NDMP restore in progress at mountpoint

The workaround for this bug is to mount the filesystem and re-share it from the OS shell.
Remote access is required to bring back the shares.

# zfs mount pool-0/local/ora-proj/ora-share
# share /export/ora-proj/ora-share

Alternatively, restart akd or reboot the head.
In a cluster, ensure that both cluster heads are in one of the states CLUSTERED, OWNER or STRIPPED
before restarting akd.

o Unable to do tape cloning

Symptom: The NDMP full backup succeeds but tape cloning fails.
Solution: Check the DNS server configuration and correct it so that it includes both forward and reverse IP address resolution for the ZFS Storage Appliance.

aksh> nslookup <ZFS-SA hostname>
aksh> nslookup <ZFS-SA ip address>

For further details see <Document 1326200.1> NDMP tape cloning failed to establish connection.
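The same forward and reverse check can be run from the backup server or any other host that uses the same DNS servers. The following Python sketch is an illustration only: check_forward_reverse is a hypothetical helper, it relies on the standard socket module, and it treats the check as passed only when the reverse lookup returns the queried name (case-insensitive), which is an assumption about what counts as consistent.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return (ok, detail) for a forward lookup followed by a reverse lookup.

    The resolver functions are parameters so the check can be exercised
    without a live DNS server.
    """
    try:
        address = forward(hostname)
    except OSError as exc:
        return False, f"forward lookup of {hostname} failed: {exc}"
    try:
        primary, aliases, _addresses = reverse(address)
    except OSError as exc:
        return False, f"reverse lookup of {address} failed: {exc}"

    # Compare names case-insensitively and ignore a trailing dot.
    wanted = hostname.lower().rstrip(".")
    found = {name.lower().rstrip(".") for name in [primary, *aliases]}
    if wanted in found:
        return True, f"{hostname} <-> {address}"
    return False, f"{address} resolves back to {primary}, not {hostname}"
```

Calling check_forward_reverse with the appliance hostname returns a (True, ...) tuple when both directions agree; any False result points at the DNS record to correct before retrying the tape clone.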

o NDMP slow tape cloning performance

Symptom: Tape cloning is slow when compared to normal backup speed.
Solution: Review the Appliance Software Update Release Notes for NDMP performance fixes. Updated NDMP code is planned for a 2011 appliance version minor release. In the interim, engage Oracle Support via a Service Request.

For further details see <Document 1363330.1> Sun Storage 7000 Unified Storage System: Low throughput during ndmp tape cloning.

Related CRs:
<Bug 7048384> Small mover record size
<Bug 7032986> A timeout delay in the mover
<Bug 7033953> Single threaded mover reads/writes

o Other NDMP related documents with symptom and solution:

- Document 1464359.1 Sun Storage 7000 Unified Storage System - How to ensure that the NDMP Service uses the correct IP address for the NDMP datapath

- Document 1368914.1 Sun ZFS Storage Appliance: NDMP Backup Server encounters Tape IO Error during LTO-5 tape drive initial access, places drive in "Append Only Mode"

- Document 1214656.1 Sun Storage 7000 Unified Storage System: NDMP SCSI Tape performance degradation

- Document 1403222.1 Sun Storage 7000 Unified Storage System: How to backup clones with NDMP to get them restored later without the need of parent snapshot


Back to NDMP troubleshooting resolution path <Document 1387930.1>

References

<NOTE:1326200.1> - Sun Storage 7000 Unified Storage System: NDMP tape cloning failed to establish connection.
<NOTE:1363330.1> - Sun Storage 7000 Unified Storage System: Low throughput during ndmp tape cloning
<NOTE:1387930.1> - Sun Storage 7000 Unified Storage System: How to Troubleshoot NDMP Issues
<BUG:6995588> - NDMPD LEAKING SNAPSHOTS WHEN ZFS_RELEASE FAILS
<BUG:7019787> - NDMPD ZFS RESTORE CREATES ZOMBIE THREAD
<BUG:7032986> - NDMP THREE-WAY RESTORE TAKES TOO LONG
<BUG:7033953> - NDMP 3-WAY BACKUP NEEDS PERFORMANCE IMPROVEMENT
<BUG:7048384> - NDMPD MOVER RECORD SIZE OF 512K IS LIMITING THE USE OF LARGER TAPE BLOCK
<BUG:7101959> - 2010.Q3.4 AKTXT_NAS_NDMP_BACKUP_INUSE NDMP RESTORE IN PROGRESS AT MOUNTPOINT
2011.1.3.0 Release Notes: https://wikis.oracle.com/display/FishWorks/ak-2011.04.24.3.0+Release+Notes

Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.