Asset ID: | 1-75-1402545.1 |
Update Date: | 2012-02-21 |
Keywords: | |
Solution Type
Troubleshooting Sure
Solution 1402545.1: Sun Storage 7000 Unified Storage System: How to Troubleshoot Cluster Problems
Related Items |
- Sun Storage 7410 Unified Storage System
- Sun ZFS Storage 7320
- Sun Storage 7310 Unified Storage System
- Sun ZFS Storage 7420
|
Related Categories |
- PLA-Support>Sun Systems>DISK>NAS>SN-DK: 7xxx NAS
- .Old GCS Categories>Sun Microsystems>Storage - Disk>Unified Storage
|
This document is provided to assist in troubleshooting cluster issues on the ZFS Storage Appliance.
In this Document
Purpose
Last Review Date
Instructions for the Reader
Troubleshooting Details
Identifying the problem
Setting-up the Cluster
Applies to:
Sun ZFS Storage 7420 - Version: Not Applicable and later [Release: N/A and later]
Sun Storage 7310 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]
Sun Storage 7410 Unified Storage System - Version: Not Applicable and later [Release: N/A and later]
Sun ZFS Storage 7320 - Version: Not Applicable and later [Release: N/A and later]
7000 Appliance OS (Fishworks)
NAS head revision : [not dependent]
BIOS revision : [not dependent]
ILOM revision : [not dependent]
JBODs Model : [not dependent]
CLUSTER related : [yes]
Purpose
This document is provided to assist in troubleshooting cluster issues. It helps to frame the problem, identifies some known issues and provides guidelines for obtaining a stable clustered system. It is written as a resolution path, with each step linking to other, more specific documents.
Last Review Date
January 31, 2012
Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.
Troubleshooting Details
Note: For any cluster issues that are not addressed by this document, please contact Oracle Support for assistance in diagnosing the issue, and be prepared for remote access to be required. Ref: Oracle Shared Shell Document 1194226.1
Identifying the problem
The following sections address cluster issues according to when they are observed during the clustering life-cycle (i.e. creating the initial cluster, operating a cluster, removing nodes from a cluster).
If you are experiencing a cluster problem:
- during the initial configuration of the cluster, see the section Setting-up the Cluster
- during normal cluster operations, see Problems during normal cluster operation
- while removing clustering, see Removing a node from a cluster
Setting-up the Cluster
For the initial configuration steps see: 'Sun Storage 7000 Unified Storage System: How to set up NAS clustering' <Document 1329307.1>
INTERNAL: FOR TSC USE
For a system that is clustered but where the cluster is to be reset or rebuilt, see:
'Sun Storage 7000 Unified Storage System: How to factory reset a cluster node without downtime' <Document 1174473.1>
Problems during normal cluster operation
This section describes some common cluster issues that may be observed during normal operations.
1. A cluster node fails to join the cluster <Document 1403503.1>
2. A node reboots following a take-over or fail-back operation
This indicates a resource problem detected by the cluster node that is attempting to acquire its resources from the main node. For example, if network interfaces are not operational on the second node, that node would be unable to provide a data service following the cluster operation. In this case the node automatically reboots itself, which forces the cluster resources to remain on the working node. Following such an automatic reboot, be sure to check the network cables connecting the node to the network switches and the SAS cables connecting the node to the shelves.
3. The Admin BUI does not respond when the Configuration:Clustering page is selected
This can be caused by load on the management service (the akd service). If the system is busy performing a lengthy operation, it may not respond to some menu selections until the operation has completed. Some deletion operations may take several minutes, and large snapshot deletions may take several hours. This is not necessarily a cluster issue; it may be a management interface issue.
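The behaviour described in item 2 above can be pictured with a small sketch. This is an illustrative toy model only, not the appliance's actual implementation: the function names and the resource names are hypothetical, and the logic simply restates the rule that a node which cannot bring up a resource reboots and leaves the resources on the working node.

```python
from typing import Dict, List


def unavailable_resources(resource_ok: Dict[str, bool]) -> List[str]:
    """Return the names of resources the acquiring node cannot bring up."""
    return sorted(name for name, ok in resource_ok.items() if not ok)


def acquire_outcome(resource_ok: Dict[str, bool]) -> str:
    """Toy decision: acquire the resources, or reboot and leave them on the peer."""
    missing = unavailable_resources(resource_ok)
    if missing:
        # The node cannot provide service, so resources remain on the working node.
        return "reboot; resources stay on working node; check: " + ", ".join(missing)
    return "resources acquired"


if __name__ == "__main__":
    # Hypothetical resource names, for illustration only.
    status = {"net-port-0": True, "net-port-1": False, "sas-shelf-path": True}
    print(acquire_outcome(status))
```

The list of unavailable resources corresponds to the items to inspect after an automatic reboot: network cabling for a failed interface, SAS cabling for a failed shelf path.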
Note: For any other cluster issues, please contact Oracle Support, who will work with you to resolve the issue.
Removing a node from a cluster
To remove a node from a cluster or to unconfigure clustering:
1. Power off the node to be removed from the cluster.
2. From the remaining node, in the Admin BUI navigate to the Configuration -> Cluster page. Press the <Unconfig> button to remove the cluster configuration.
3. Detach the cluster interconnect cables and detach the
powered-off storage controller from the cluster's external storage
enclosures (shelves).
At this point both of the ZFS SA nodes will operate independently.
INTERNAL: FOR TSC USE
If the Admin BUI is inoperative then it is possible to unconfigure clustering from the CLI using the raw command:
> raw cluster.unconfigure();
see also:
'Sun Storage 7000 Unified Storage System: How to factory reset a cluster node without downtime' <Document 1174473.1>
Configuration Guidelines
There are additional items to consider when configuring nodes to form a clustered system, for example how to distribute the data pools and network interfaces between the nodes so that the load is balanced across both.
Oracle recommends that one network interface on each node be dedicated for use as a management interface. In this case the interface is marked as a private resource of that node.
For more information on clustering, see the online Help pages available from the Admin BUI. Navigate to the Configuration -> Cluster page and press Help in the top right-hand corner of the page; this opens the help pages in the cluster context. Alternatively, press Help in the top right-hand corner of any page to display the main help page, and then navigate to Configuration and Cluster.
Other considerations
Some cluster-wide resources need special attention when transitioning from one node to the other. For example, SCSI & FC LUN resources need support from the clients themselves: the clients will need to support ALUA for their FC LUNs.
Some client systems require additional configuration if they are themselves members of a cluster. For example, for notes on configuring Solaris Cluster see: 'Sun Storage 7000 Unified Storage System: Configuring the ZFS Storage Appliance to work in Oracle Solaris Cluster' <Document 1380870.1>
Terms & Definitions
Cluster : With the ZFS Storage Appliance the term cluster
is used to denote a system comprising two identical ZFS SA nodes
accessing shared storage and with access to a common network
infrastructure. In the event of a node failure the resources and
services of the failed node will be taken by the remaining working node
and the services will continue to be provided to clients and users by
that node.
- Cluster types
- active-active : a cluster in which the resources are shared between the two nodes and each provides services to clients.
- active-passive : a cluster in which one node performs most of the work
while the second node remains idle until there is a failure of the
active node at which point the passive node resumes operation as the now
active node.
- Cluster States
- AKCS_CLUSTERED : Both nodes are running in normal condition sharing resources.
- AKCS_OWNER : One node in the cluster owns all of the shared cluster resources
- AKCS_STRIPPED : One node
has joined the cluster but does not own any cluster resources (the node
is waiting for the administrator to perform a fail-back operation)
- Cluster operations
- Take over : following a node failure the remaining node takes over the resources from the failed node.
- Fail back : once a failed node has been repaired and has rejoined the cluster, the node waits for the Administrator to fail back the node's resources from the main node (which owns all of the cluster resources). On completion of the fail-back operation both nodes will be operating in a fully clustered mode (active-active).
- Shutdown : see: 'Sun Storage 7000 Unified Storage System: How To Shutdown ZFSSA Cluster' <Document 1379117.1>
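The cluster states and operations defined above can be read as a small state machine. The following is a minimal sketch of that reading in Python. It is a toy model for illustration, not appliance code: the state names come from this document, while the function names, the node names and the use of None for a node that is down are assumptions.

```python
from enum import Enum
from typing import Dict, Optional


class ClusterState(Enum):
    AKCS_CLUSTERED = "both nodes running normally, sharing resources"
    AKCS_OWNER = "this node owns all shared cluster resources"
    AKCS_STRIPPED = "joined the cluster, owns no resources, awaiting fail-back"


# Maps node name -> state; None means the node is down (an assumption of this model).
Pair = Dict[str, Optional[ClusterState]]


def take_over(pair: Pair, failed: str) -> Pair:
    """Take over: the remaining node acquires all resources of the failed node."""
    if failed not in pair:
        raise ValueError("unknown node: " + failed)
    survivor = next(name for name in pair if name != failed)
    return {survivor: ClusterState.AKCS_OWNER, failed: None}


def rejoin(pair: Pair, node: str) -> Pair:
    """A repaired node joins the cluster and waits, owning no resources."""
    if pair.get(node) is not None:
        raise ValueError("node is not down: " + node)
    updated = dict(pair)
    updated[node] = ClusterState.AKCS_STRIPPED
    return updated


def fail_back(pair: Pair) -> Pair:
    """Fail back: the administrator returns resources; both nodes become clustered."""
    if set(pair.values()) != {ClusterState.AKCS_OWNER, ClusterState.AKCS_STRIPPED}:
        raise ValueError("fail-back needs one OWNER node and one STRIPPED node")
    return {name: ClusterState.AKCS_CLUSTERED for name in pair}


if __name__ == "__main__":
    pair: Pair = {"nodeA": ClusterState.AKCS_CLUSTERED,
                  "nodeB": ClusterState.AKCS_CLUSTERED}
    pair = take_over(pair, failed="nodeB")
    pair = rejoin(pair, "nodeB")
    pair = fail_back(pair)
    for name, state in pair.items():
        print(name, state.name if state else "down")
```

In this model a fail-back is only accepted when one node is in AKCS_OWNER and the other in AKCS_STRIPPED, which mirrors the definition above: the stripped node waits until the administrator performs the fail-back.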
References
Collecting Diag data : 'Sun Storage 7000 Unified Storage System: How to collect support bundle using the BUI or CLI' <Document 1019887.1>
Online Help is available in the Admin BUI under the section: Configuration:Cluster
Sun ZFS Storage 7000 System Administration Guide http://download.oracle.com/docs/cd/E22471_01/html/820-4167 - see the section on Clustering.
Attachments
This solution has no attachment
|