Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition
Solution Type: Troubleshooting Sure Solution

1403503.1 : Sun Storage 7000 Unified Storage System: A cluster node fails to rejoin the cluster
This document is provided to assist in troubleshooting cluster join issues where one node of a cluster, following a reboot, fails to rejoin the cluster.
Applies to:

Sun ZFS Storage 7420 - Version: Not Applicable
Sun Storage 7310 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]
Sun Storage 7410 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]
Sun ZFS Storage 7320 - Version: Not Applicable and later [Release: N/A and later]
7000 Appliance OS (Fishworks)

NAS head revision : [not dependent]
BIOS revision : [not dependent]
ILOM revision : [not dependent]
JBODs Model : [not dependent]
CLUSTER related : [yes]

Purpose

This document is provided to assist in troubleshooting cluster join issues where one node of a cluster, following a reboot, fails to rejoin the cluster.

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - 7000 Series ZFS Appliances.
Last Review Date

February 1, 2012

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.
Troubleshooting Details

When one cluster node fails to join the cluster, the problem is often due to heavy load on the working cluster node, which slows communication between the nodes, or sometimes due to cluster-wide locking issues on the working node.

In the former case, simply leaving the system to retry the rejoin operation may be sufficient, and the join may eventually complete successfully. In the latter case it is unlikely that the second node will manage to rejoin the cluster, and this document provides a workaround for that particular issue.

Note: If you wish to know why the node failed to rejoin the cluster, please contact Oracle Support so they can collect additional diagnostic information and determine the underlying cause of the failure.

If you wish to try to resolve the issue yourself, please follow these steps.

Step 1. Power down the node that is failing to join the cluster

The node must be powered off to ensure the cluster interconnect is offline. Simply shutting down the node is not sufficient in this case.

Step 2. Restart the management service (called akd) on the working node

Connect to the working node and issue the following CLI command:

Step 3. Wait for the system to restart the management interfaces and resume normal operation

It may take several minutes for the management services to fully initialize. Once you have regained access to the Admin BUI or CLI, check that the system is working correctly.

Step 4. Power on the second cluster node

At this point the system should be working correctly, with all resources available from the single working node. You can now power on the remaining node, and this time it should rejoin the cluster successfully.

Step 5. Check the system is working as a cluster

From the Admin BUI you can check the status on the Configuration -> Cluster page.

Note: If the cluster node still fails to join the cluster, further investigation will be required. Please contact Oracle Support so they can collect additional diagnostic information and determine the underlying cause of the failure.
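The waiting period in Step 3 can be automated from an administrator workstation. The following is a minimal sketch, not an Oracle-supplied tool: it polls a TCP port on the working node until a connection succeeds or a time limit is reached. The function names, the host name, and the timing values are hypothetical, and the port is an assumption (the Admin BUI is commonly served on port 215; adjust it for your environment). A successful connection only shows that the management interface is listening again, so the health checks described in Step 3 and Step 5 are still required.

```python
import socket
import time


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat as "not ready yet".
        return False


def wait_for_management(host, port, max_wait=900.0, interval=15.0,
                        probe=port_is_open, sleep=time.sleep,
                        clock=time.monotonic):
    """Poll probe(host, port) until it succeeds or max_wait seconds elapse.

    Returns True as soon as a probe succeeds, False once the deadline passes.
    probe, sleep, and clock are injectable so the loop can be tested offline.
    """
    deadline = clock() + max_wait
    while True:
        if probe(host, port):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_for_management("working-node.example.com", 215)` (a hypothetical host name) would check every 15 seconds for up to 15 minutes and return True once the port accepts connections.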