VTL - How to Failback Failed Server

Asset ID:	1-71-1013440.1
Update Date:	2012-03-26
Keywords:

Solution Type Technical Instruction Sure

Solution 1013440.1 : VTL - How to Failback Failed Server

Applies to:

Sun StorageTek VTL Plus Storage Appliance - Version: 1.0 - Build 1323 to 2.0 - Build 1656 [Release: 1.0 to 2.0]
Sun StorageTek VTL Storage Appliance - Version: 4.0 - Build 1221 to 4.0 - Build 1221 [Release: 4.0 to 4.0]
All Platforms
***Checked for relevance on 12-07-2011*** (dd-mm-yyyy)

Goal

What is the failback procedure?
How to manually failback to the failed server.

Solution

Check VTL server status via the Console GUI:

A proper failed state shows the failed server in RED and the takeover server in BLACK.
If either server shows YELLOW, it is likely that failover is suspended. In that case, place a call to Oracle support.

Check for the common causes of failovers, correct any issues found, and/or contact Sun Support for assistance (also refer to Troubleshooting server failover issues):
- Check for any network issues (this is one of the most common reasons for failovers).
- Check for storage connectivity issues (use the VTL Console to check that access to the disk arrays is good).
- Check the health of the disk arrays (use SANtricity Recovery Guru to check for errors).

Check the status of the failed VTL server.

Note: The “heartbeat” monitor IP must be used to log in to the failed server, because the “virtual” IP has moved over to the other server, which is now servicing the failed server's resources. If you don’t know the heartbeat IP, look in the GUI under the Failover Info tab, which lists the IP information for both servers.
1. Verify that you are logged on to the failed node:
  # uname -a
2. Verify that all processes are running, issue:
  # vtl status
3. Check “FailOverStatus” status, issue:
  # sms -v
  
  Look for "FailOverStatus" in output. If status is "2 (Ready)", then the failed server is ready to be failed back to.
4. Check IPs on both servers, issue:
  # ifconfig -a
  
  Verify that the “virtual” IP (bge1:1 or e1000g0:1) has successfully moved to the surviving server.
5. Has failed server been rebooted?
  a) Depending on the cause of the failover, the failed server may already have been rebooted automatically, but verify that the reboot was clean (review the messages log). If unsure, reboot again to verify a clean reboot (vtl stop, then sync; sync; reboot or init 6).
  
  b) If failback repeatedly results in another failover (i.e., failover, failback, then failover again), AND/OR client resources did not failover to the surviving server as expected, reboot the failed server before attempting failback (even if it reports 2(Ready)). This ensures that all resources are released to allow for a clean failback.
If the above steps have been verified, proceed with failback through the GUI.
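
The checks in steps 3 and 4 above can be scripted. The following is a minimal sketch, not an Oracle-supplied tool: it only parses text that `sms -v` and `ifconfig -a` already print. The helper names `failover_status` and `has_logical_if` are hypothetical, and the `ifconfig` pattern assumes that each interface stanza starts in column one with the interface name followed by a colon and a space.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the VTL software.

# Print the FailOverStatus value (for example "2(READY)") found in
# `sms -v` output supplied on stdin. Blanks are removed and letters are
# upper-cased so that "2 (Ready)" and "2(READY)" compare the same way.
failover_status() {
    sed -n 's/^[[:space:]]*FailOverStatus:[[:space:]]*//p' |
        head -n 1 | tr -d ' \t' | tr '[:lower:]' '[:upper:]'
}

# Return 0 if the logical interface named in $1 (for example bge1:1)
# starts a stanza in `ifconfig -a` output supplied on stdin.
# Assumption: stanzas begin with "<name>: " in column one.
has_logical_if() {
    grep "^$1: " >/dev/null
}

# Possible use on the failed server (logged in through the heartbeat IP):
#   [ "$(sms -v | failover_status)" = "2(READY)" ] && echo "ready for failback"
# Possible use on the surviving server:
#   ifconfig -a | has_logical_if bge1:1 && echo "virtual IP is present"
```

Both helpers read from stdin, so they can also be run against saved command output when reviewing a case after the fact.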

NOTE: If client interruption is not acceptable, wait until a maintenance window is available to failback (at times the clients do not failback successfully and have to be rebooted to reconnect with their virtual devices).

If failback through the Console GUI does not work, failback can be done from the command line by issuing the following commands:

NOTE: This step should only be done with direction from Oracle VTL support.
1. From the active server (using its Heartbeat IP), stop the failover module:
  # vtl stop fm
2. From the failed server, verify that the failed server has taken back control, issue:
  # sms -v
  
  The command may have to be issued many times until it returns "1 (UP)".
3. Once failed server is verified, from active server start the failover module:
  # vtl start fm
4. Check failover status from the GUI. It should be “Normal”.
If failback does not complete, or if RCA (root cause analysis) is required:
- Collect Xrays (from both nodes) and open a case with Oracle VTL support.
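
Step 2 of the command-line procedure involves re-running `sms -v` until the status changes. The following is a minimal sketch of that polling loop. The function name `wait_for_up`, the `SMS_CMD` override, the retry count, and the sleep interval are all assumptions chosen for illustration; only `sms -v` and the `1(UP)` status come from this document. As with the manual steps, it should only be run under direction from Oracle VTL support.

```shell
#!/bin/sh
# Sketch only: wait_for_up and SMS_CMD are hypothetical names.
# SMS_CMD lets the status command be replaced for dry runs; it defaults
# to the `sms -v` command used in this document.
SMS_CMD=${SMS_CMD:-"sms -v"}

# wait_for_up TRIES DELAY
# Run the status command up to TRIES times, sleeping DELAY seconds
# between attempts. Return 0 as soon as FailOverStatus reads 1(UP),
# or 1 if it never does.
wait_for_up() {
    tries=$1
    delay=$2
    fo_status=
    while [ "$tries" -gt 0 ]; do
        fo_status=$($SMS_CMD 2>/dev/null |
            sed -n 's/^[[:space:]]*FailOverStatus:[[:space:]]*//p' |
            head -n 1 | tr -d ' \t' | tr '[:lower:]' '[:upper:]')
        if [ "$fo_status" = "1(UP)" ]; then
            echo "FailOverStatus is 1(UP)"
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    echo "FailOverStatus did not reach 1(UP); last value: ${fo_status:-unknown}" >&2
    return 1
}
```

For example, `wait_for_up 30 10` on the failed server would check up to 30 times, 10 seconds apart, and stop at the first `1(UP)`. A bounded loop is used so that a failback that never completes ends in a clear error rather than an endless wait.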

================================================================
================================================================

Example of sms -v output:

After a failover in which the secondary server took over for the primary server, log in to the primary (using its Heartbeat IP) and check FailOverStatus. “2(READY)” indicates that the problems are resolved and the server is ready for failback.

[root@failedvtlnode]# sms -v

Last Update by SM: Sun Apr 20 16:31:50 2008
Last Access by RPC: Sun Apr 20 16:31:50 2008

FailOverStatus: 2(READY)

Status of IPStor Server (Transport) : OK
Status of IPStor Server (Application) : OK
Status of IPStor Authentication Module : OK
Status of IPStor Logger Module : OK
Status of IPStor Communication Module : OK
Status of IPStor Self-Monitor Module : OK
Status of IPStor NAS Modules: OK(0)
Status of IPStor Fsnupd Module: OK
Status of IPStor ISCSI Module: OK
Status of IPStor BMR Module: OK( 0)
Status of FC Link Down : OK
Status of Network Connection: OK
Status of force up: 0
Broadcast Arp : NO
Number of reported failed devices : 0
NAS health check : NO
XML Files Modified : NO
IPStor Failover Debug Level : 0
IPStor Self-Monitor Debug Level : 0

Do We Need To Reboot Machine(SM): NO

Do We Need To Reboot Machine(FM): NO

Nas Started: NO

During normal operation:

[root@activevtlnode]# sms -v

Last Update by SM: Sun Apr 20 16:31:50 2008
Last Access by RPC: Sun Apr 20 16:31:50 2008

FailOverStatus: 1(UP)
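
When reviewing output like the examples above, the lines of interest are FailOverStatus and any "Status of" line whose value is not OK. The following is a minimal sketch that filters for the latter. The helper name `sms_problems` is hypothetical, and the filter assumes the "Status of <module>: OK" layout shown in the example rather than any documented format. Purely numeric values, such as the one on the "Status of force up" line, are assumed to be flags or counters and are skipped.

```shell
#!/bin/sh
# Sketch only: sms_problems is a hypothetical helper name.
# Reads `sms -v` output on stdin, prints every "Status of ..." line whose
# value does not start with OK, and returns 1 if any such line was found.
sms_problems() {
    awk -F: '
        BEGIN { bad = 0 }
        /^Status of / {
            val = $NF
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            # Assumption: purely numeric values are flags, not health states.
            if (val ~ /^[0-9]+$/) next
            if (val !~ /^OK/) { print; bad = 1 }
        }
        END { exit bad }
    '
}

# Possible use on either node:
#   sms -v | sms_problems || echo "one or more modules are not OK"
```

Run against the first example output above, the filter would print nothing, because every module line reports OK.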

Attachments

This solution has no attachment