Asset ID: 1451919.1
Update Date: 2012-05-03
Solution Type: Problem Resolution Sure Solution
Doc ID: 1451919.1
Sun Storage 7000 Unified Storage System: Replication source node gives error "ak_notification_wait failed: remote side doesn't recognize notification cookie"
Related Items |
- Sun Storage 7310 Unified Storage System
- Sun Storage 7410 Unified Storage System
- Sun ZFS Storage 7120
- Sun ZFS Storage 7420
- Sun Storage 7110 Unified Storage System
- Sun ZFS Storage 7320
- Sun Storage 7210 Unified Storage System
Related Categories |
- PLA-Support>Sun Systems>DISK>NAS>SN-DK: 7xxx NAS
Created from <SR 3-5332907001>
Applies to:
Sun Storage 7210 Unified Storage System - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 7310 Unified Storage System - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 7410 Unified Storage System - Version Not Applicable to Not Applicable [Release N/A]
Sun ZFS Storage 7120 - Version Not Applicable to Not Applicable [Release N/A]
Sun ZFS Storage 7320 - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.
When in a cluster configuration, replication can be sensitive to changes in resource assignments.
Symptoms
Replication is configured from a source appliance to a target appliance.
The source reports the error "ak_notification_wait failed: remote side doesn't recognize notification cookie".
On the target there is no error and replication seems to work fine.
Changes
The issue started after unconfiguring and reconfiguring the cluster.
Cause
In this customer case, there had been a cluster "unconfigure". As a result, resources owned by "head 1" were transferred to "head 2" when "head 1" was unconfigured out of the cluster (as part of a motherboard replacement).
Replication uses the notification subsystem to:
- know when a replication update is finished -- the target calls back to the source to say "replication is done", because replication is actually driven by the target.
- recover if the target reboots -- in that case the callback never happens, so the source must be able to check with the target.
This means that when a new replication update starts, the source contacts the target and passes a "notification cookie" across.
When the target has finished, it calls back to the source and uses the cookie to identify which replication has finished.
If the target never calls back, the source eventually times out and calls the target to ask "do you still have the cookie?"
When calling back to the IP address of the replication interface on the source, the target performs an additional validation: it checks that the source is who it expects. However, because ownership of the interface has changed, this check fails and the target throws the cookie away.
At this point replication has finished successfully, but the target has not been able to notify the source. The source then goes through the timeout cycle and calls the target, only to find the cookie is no longer there.
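The cookie lifecycle described above can be illustrated with a small simulation (all class and variable names are hypothetical; this is not the appliance's actual code):

```python
class Target:
    """Toy model of the replication target's notification-cookie handling."""

    def __init__(self, expected_source_ip):
        # The head the target believes owns the replication interface.
        self.expected_source_ip = expected_source_ip
        self.cookies = set()

    def start_update(self, cookie):
        # The source hands the target a notification cookie with each update.
        self.cookies.add(cookie)

    def notify_source(self, cookie, source_ip):
        # On completion the target calls back to the source; it first checks
        # that the replication interface still belongs to the expected head.
        if source_ip != self.expected_source_ip:
            self.cookies.discard(cookie)  # validation fails: cookie thrown away
            return False
        self.cookies.discard(cookie)
        return True

    def has_cookie(self, cookie):
        # The source's timeout path: "do you still have the cookie?"
        return cookie in self.cookies


target = Target(expected_source_ip="head1")
target.start_update(cookie="repl-42")

# The cluster unconfigure moved the interface to the other head, so the
# callback validation fails and the cookie is discarded:
ok = target.notify_source("repl-42", source_ip="head2")
print(ok)                            # False: callback validation failed
print(target.has_cookie("repl-42"))  # False: the source's later query finds
                                     # "remote side doesn't recognize notification cookie"
```

The replication data itself transfers successfully; only the completion notification is lost, which matches the symptom of an error on the source while the target looks healthy.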
Solution
On the replication source side (cluster configuration), transfer the ownership back:
- In the BUI, go to the cluster screen: Configuration -> CLUSTER.
- Select the resources whose ownership needs to change. For example (adjust to your own configuration):
net/aggr1
net/aggr2
zfs/pool-0
- Once you have reassigned these resources to the other cluster node, hit APPLY. A dialog box asks whether you want to fail back; hit "APPLY" again. This transfers the ownership of these resources to the other cluster node. From then on, replication notification should work correctly.
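The same operation can also be driven from the appliance CLI. The transcript below is a hedged sketch only (prompts abbreviated; verify the exact commands against your software release before use):

```
zfssa:> configuration cluster
zfssa:configuration cluster> show      (lists cluster state, resources and their current owners)
zfssa:configuration cluster> failback  (returns resources to their assigned owner on the peer)
```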
References
@ <BUG:7121594> - UNABLE TO COLLECT EXTRA FILES
Attachments
This solution has no attachment