Sun Microsystems, Inc.  Sun System Handbook - ISO 4.1 October 2012 Internal/Partner Edition

Asset ID: 1-71-1013440.1
Update Date:2012-03-26
Keywords:

Solution Type  Technical Instruction Sure

Solution  1013440.1 :   VTL - How to Failback Failed Server  


Related Items
  • Sun StorageTek VTL Plus Storage Appliance
  • Sun StorageTek VTL Storage Appliance
Related Categories
  • PLA-Support>Sun Systems>TAPE>Virtual Tape>SN-TP: VTL
  • .Old GCS Categories>Sun Microsystems>Storage - Tape>Tape Virtualization

PreviouslyPublishedAs
218808


Applies to:

Sun StorageTek VTL Plus Storage Appliance - Version: 1.0 - Build 1323 to 2.0 - Build 1656   [Release: 1.0 to 2.0]
Sun StorageTek VTL Storage Appliance - Version: 4.0 - Build 1221 to 4.0 - Build 1221   [Release: 4.0 to 4.0]
All Platforms
***Checked for relevance on 12-07-2011*** (dd-mm-yyyy)

Goal

  • What is the failback procedure?
  • How to manually fail back to a failed server

Solution


  1. Check VTL server status via the Console GUI:
    1. A proper failed state shows the failed server in RED and the takeover server in BLACK.
    2. If either server shows YELLOW, failover is likely suspended. In that case, place a call to Oracle Support.

  2. Check for the common causes of failovers, correct any issues found, and/or contact Sun Support for assistance (also refer to Troubleshooting server failover issues):
    • Check for any network issues (this is one of the most common reasons for failovers)
    • Check for storage connectivity issues (use the VTL Console to verify that access to the disk arrays is good)
    • Check the health of the disk arrays (use SANtricity Recovery Guru to check for errors)

  3. Check the status of the failed VTL server.

    Note: The “heartbeat” monitor IP must be used to log into the failed server, because the “virtual” IP has moved over to the other server, which is now servicing the failed server's resources. If you don’t know the heartbeat IP, look in the GUI under the Failover Info tab, which lists the IP info for both servers.

    1. Verify that you are logged on to the failed node:
      # uname -a

    2. Verify that all processes are running, issue:
      # vtl status

    3. Check the “FailOverStatus” value, issue:
      # sms -v

      Look for "FailOverStatus" in the output. If the status is "2(READY)", then the failed server is ready to be failed back to.

    4. Check IPs on both servers, issue:
      # ifconfig -a

      Verify that the “virtual” IP (bge1:1 or e1000g0:1) has successfully moved to the surviving server.

    5. Has the failed server been rebooted?
      a) Depending on the cause of the failover, the failed server may already have been rebooted automatically, but verify that the reboot was clean (review the messages log). If unsure, reboot again to verify a clean reboot (vtl stop, then sync;sync;reboot or init 6).

      b) If failback repeatedly results in another failover (i.e., failover, failback, then failover again), AND/OR client resources did not fail over to the surviving server as expected, reboot the failed server before attempting failback (even if it says 2(READY)). This ensures all resources are released to allow for a clean failback.

  4. If the above steps are verified, proceed with failback through the GUI.

    NOTE: If client interruption is not acceptable, wait until a maintenance window is available to fail back (at times the clients do not fail back successfully and have to be rebooted to reconnect with virtual devices).
      1. From the Console GUI, right click on the active server name, select Failover>Stop Takeover…
        • A popup may appear with the message:

        WARNING: The primary server is not in a healthy state for failback. If you still want to fail back to the primary server, please type the word YES to proceed. Otherwise, click cancel to exit.

        Type YES in box and click OK.

        Note: If the GUI reports that it is discovering servers, close the Console and reconnect. This sometimes happens when the virtual IP is switched back to the primary server.

      2. Failback can take a while, depending on the number of resources to fail back. It can take up to 20 minutes to complete.
        • If you can log back into the failed server via the Console GUI, the server name is BLACK (no longer RED), and the failover status is “Normal” (select the failover folder in the right panel to view status), then failback is complete.
        • Also, a message in the Console GUI Event Log will say “Primary Server Restored”.
        • Check the VTL processes (vtl status), failover status (sms -v) and IPs (ifconfig -a) again.

  5. If failback through the Console GUI does not work, failback can be done from the command line by issuing the following commands:

    NOTE: This step should only be done with direction from Oracle VTL support.

    1. From the active server (using its heartbeat IP), stop the failover module:
      # vtl stop fm

    2. From the failed server, verify that it has taken back control, issue:
      # sms -v (may have to issue many times until it returns "1 (UP)")

    3. Once the failed server is verified, start the failover module from the active server:
      # vtl start fm

    4. Check failover status from the GUI. It should be “Normal”.

  6. If failback does not complete OR if RCA is required.
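
Step 5.2 notes that sms -v may have to be issued many times before it returns "1 (UP)". The retry could be sketched roughly as below, assuming bash or another POSIX shell on the server. The function name wait_for_failover_status, the attempt count and the delay are hypothetical and are not part of the VTL software; only the sms -v command and the FailOverStatus line come from this procedure. As with the rest of step 5, this should only be used with direction from Oracle VTL support.

```shell
# Sketch only: re-run a status command until its FailOverStatus code matches.
# wait_for_failover_status is a hypothetical helper, not a VTL command.
# Arguments: wanted code, max attempts, delay in seconds, then the command to poll.
wait_for_failover_status() {
    want=$1; attempts=$2; delay=$3; shift 3
    i=1
    while [ "$i" -le "$attempts" ]; do
        # "$@" is the command being polled, normally: sms -v
        # Keep only the leading number of the FailOverStatus value, e.g. 1 from "1(UP)".
        code=$("$@" 2>/dev/null | sed -n 's/^FailOverStatus:[ ]*\([0-9][0-9]*\).*/\1/p')
        [ "$code" = "$want" ] && return 0
        # Pause between attempts, but not after the last one.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1    # the wanted status never appeared
}
```

For example, running "wait_for_failover_status 1 30 10 sms -v" on the failed server would poll up to 30 times, 10 seconds apart, and return success as soon as the code is 1. The values 30 and 10 are illustrative only.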

================================================================
================================================================

Example of sms -v output:

 
After a failover in which the secondary server took over for the primary server, log into the primary (using its heartbeat IP) and check FailOverStatus. “2(READY)” indicates that the problems are resolved and the server is ready for failback.

[root@failedvtlnode]# sms -v

Last Update by SM: Sun Apr 20 16:31:50 2008
Last Access by RPC: Sun Apr 20 16:31:50 2008

FailOverStatus: 2(READY)

Status of IPStor Server (Transport) : OK
Status of IPStor Server (Application) : OK
Status of IPStor Authentication Module : OK
Status of IPStor Logger Module : OK
Status of IPStor Communication Module : OK
Status of IPStor Self-Monitor Module : OK
Status of IPStor NAS Modules: OK(0)
Status of IPStor Fsnupd Module: OK
Status of IPStor ISCSI Module: OK
Status of IPStor BMR Module: OK( 0)
Status of FC Link Down : OK
Status of Network Connection: OK
Status of force up: 0
Broadcast Arp : NO
Number of reported failed devices : 0
NAS health check : NO
XML Files Modified : NO
IPStor Failover Debug Level : 0
IPStor Self-Monitor Debug Level : 0

Do We Need To Reboot Machine(SM): NO

Do We Need To Reboot Machine(FM): NO

Nas Started: NO
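
The FailOverStatus line in the output above can also be picked out with standard tools instead of being read by eye. The following is a minimal sketch, assuming bash or another POSIX shell; the function names failover_status and ready_for_failback are hypothetical helpers, not VTL commands, and both read sms -v output from standard input.

```shell
# Sketch only: both helpers read `sms -v` output on stdin.
# failover_status and ready_for_failback are hypothetical names.

failover_status() {
    # Print the text after "FailOverStatus:" with spaces removed, e.g. "2(READY)".
    sed -n 's/^FailOverStatus:[ ]*//p' | tr -d ' '
}

ready_for_failback() {
    # Succeed only when the status code is 2, the value this procedure calls ready.
    case "$(failover_status)" in
        "2("*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

On the failed server this could be used as "sms -v | ready_for_failback && echo ready". A successful result reflects only the FailOverStatus line; the other checks in step 3 still apply.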


 

During normal operating status:

[root@activevtlnode]# sms -v

Last Update by SM: Sun Apr 20 16:31:50 2008
Last Access by RPC: Sun Apr 20 16:31:50 2008

FailOverStatus: 1(UP)

Status of IPStor Server (Transport) : OK
Status of IPStor Server (Application) : OK
Status of IPStor Authentication Module : OK
Status of IPStor Logger Module : OK
Status of IPStor Communication Module : OK
Status of IPStor Self-Monitor Module : OK
Status of IPStor NAS Modules: OK(0)
Status of IPStor Fsnupd Module: OK
Status of IPStor ISCSI Module: OK
Status of IPStor BMR Module: OK( 0)
Status of FC Link Down : OK
Status of Network Connection: OK
Status of force up: 0
Broadcast Arp : NO
Number of reported failed devices : 0
NAS health check : NO
XML Files Modified : NO
IPStor Failover Debug Level : 0
IPStor Self-Monitor Debug Level : 0
Do We Need To Reboot Machine(SM): NO
Do We Need To Reboot Machine(FM): NO
Nas Started: NO
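
In both samples every "Status of ..." module line reads OK and both reboot flags read NO. A quick way to spot lines that differ from that pattern is sketched below, assuming bash or another POSIX shell. The function names non_ok_modules and reboot_flags_set are hypothetical, and the assumption that any value other than OK or NO deserves attention is an inference from the samples, not documented behavior.

```shell
# Sketch only: both helpers read `sms -v` output on stdin.
# non_ok_modules and reboot_flags_set are hypothetical names.

non_ok_modules() {
    # Print "Status of ..." lines whose value does not start with OK.
    # "Status of force up" is skipped because the samples show it carrying
    # a number rather than OK.
    grep '^Status of ' | grep -v '^Status of force up' | grep -v ':[ ]*OK'
}

reboot_flags_set() {
    # Print "Do We Need To Reboot Machine" lines whose value is not NO.
    grep '^Do We Need To Reboot Machine' | grep -v ':[ ]*NO'
}
```

Each helper prints nothing and returns a non-zero exit status when no such lines are found, so "sms -v | non_ok_modules" producing no output matches the samples above.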



Attachments
This solution has no attachment
  Copyright © 2012 Sun Microsystems, Inc.  All rights reserved.