Asset ID: 1-71-1013440.1
Update Date: 2009-12-02
Solution Type: Technical Instruction Sure Solution
1013440.1: VTL - How to Failback Failed Server
Related Items
- Sun StorageTek VTL Storage Appliance
- Sun StorageTek VTL Plus Storage Appliance
Related Categories
- GCS>Sun Microsystems>Storage - Tape>Tape Virtualization
Previously Published As
218808
Description
What is the failback procedure?
How to manually fail back to a failed server.
Steps to Follow
How to Failback Failed Server:
- First, check for the common causes of failover, correct any issues found, and contact Sun Support for assistance if needed:
  - Check for network issues (one of the most common reasons for failovers).
  - Check for storage connectivity issues (use the VTL Console to confirm that access to the disk arrays is good).
  - Check the health of the disk arrays (use SANtricity Recovery Guru to check for errors).
- In a proper failed state, the GUI shows the failed server in RED and the takeover server in BLACK.
- If either server shows YELLOW, failover is likely suspended. In that case, place a call to Sun Support.
Check the status of the failed VTL server.
Note: The “heartbeat” monitor IP must be used to log in to the failed server, because the “virtual” IP has moved to the other server, which is now servicing the failed server's resources. (If you don’t know the heartbeat IP, look in the GUI under the Failover Info tab, which lists the IP information for both servers.)
- Verify that all processes are running:
# vtl status
- Check the “FailOverStatus” value:
# sms -v
Look for "FailOverStatus" in the output. If the status is "2(READY)", the failed server is ready to be failed back to. Use the GUI to stop takeover and return to normal operation.
- Check the IPs on both servers:
# ifconfig -a
Verify that the “virtual” IP (bge1:1) has successfully moved to the surviving server.
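The FailOverStatus check above can be scripted. The following is a minimal sketch, not a supported tool: the function names failover_code and ready_for_failback are hypothetical, and the parsing assumes the status appears on one line in the form "FailOverStatus: 2(READY)", as in the sample output in this article. The format may differ between releases.

```shell
#!/bin/sh
# Print the numeric FailOverStatus code found in `sms -v` output read
# from stdin. Assumes a line such as "FailOverStatus: 2(READY)";
# prints nothing if no such line is present.
failover_code() {
    sed -n 's/^[[:space:]]*FailOverStatus:[[:space:]]*\([0-9][0-9]*\).*/\1/p' |
        head -n 1
}

# Hypothetical helper: succeed only when the code read from stdin is 2,
# which this article describes as ready for failback.
ready_for_failback() {
    [ "$(failover_code)" = "2" ]
}
```

On the failed server, this could be used as: sms -v | ready_for_failback && echo "ready for failback". The functions only read text, so they can also be run against a saved copy of the output.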
Has the failed server been rebooted cleanly?
Depending on the cause of the failover, the failed server may already have been rebooted, but verify that the reboot was clean (review the messages log). If unsure, reboot again to confirm a clean reboot (vtl stop, then sync;sync;reboot or init 6).
NOTE: If client interruption is not acceptable, wait for a maintenance window before failing back. Clients sometimes do not fail back successfully and must be rebooted to reconnect to the virtual devices.
If the above checks pass, proceed with failback through the GUI.
From the Console GUI, right-click the active server name and select Failover>Stop Takeover…
- A popup may appear with the message:
WARNING: The primary server is not in a healthy state for failback. If you still want to fail back to the primary server, please type the word YES to proceed. Otherwise, click cancel to exit.
Type YES in the box and click OK.
Note: If the GUI reports that it is discovering servers, close the Console and reconnect. This sometimes happens when the virtual IP is switched back to the primary server.
Failback can take a while, depending on the number of resources to fail back: up to 20 minutes.
- If you can log back in to the failed server via the Console GUI, the server name is BLACK (no longer RED), and the failover status is “Normal” (select the failover folder in the right panel to view the status), then failback is complete.
- Also, a message in the Console GUI Event Log will say “Primary Server Restored”.
- Verify the VTL processes (vtl status), failover status (sms -v), and IPs (ifconfig -a) again.
If failback through the Console GUI does not work, it can be done from the command line by issuing the following commands:
- From the active server (using its heartbeat IP), stop the failover module:
# vtl stop fm
- From the failed server, verify that it has taken back control:
# sms -v
(This may have to be issued many times until it returns "1(UP)".)
- Once the failed server is verified, start the failover module from the active server:
# vtl start fm
- Check the failover status from the GUI. It should be “Normal”.
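Because sms -v may have to be issued many times during a command-line failback, the repetition can be wrapped in a small polling loop. This is only a sketch under stated assumptions: wait_for_status is a hypothetical helper, the status line is assumed to look like "FailOverStatus: 1(UP)" as in the sample output in this article, and the attempt count and delay are arbitrary values to be chosen by the operator.

```shell
#!/bin/sh
# Poll a status command until it reports the wanted FailOverStatus code.
# usage: wait_for_status CODE ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the code matches, 1 if all attempts are used up.
wait_for_status() {
    want=$1
    attempts=$2
    delay=$3
    shift 3
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Run the supplied command and extract the numeric status code.
        code=$("$@" 2>/dev/null |
            sed -n 's/^[[:space:]]*FailOverStatus:[[:space:]]*\([0-9][0-9]*\).*/\1/p' |
            head -n 1)
        if [ "$code" = "$want" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

On the failed server, a call such as wait_for_status 1 60 20 sms -v would check every 20 seconds for up to 60 attempts. The command is passed as an argument so the loop can be tried against a stand-in command before it is used on a live node.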
If failback does not complete, or if a root cause analysis (RCA) is required:
- Collect Xrays from both nodes and open a case with Sun Support.
================================================================
Example of sms -v output:
After a failover in which the secondary server has taken over for the primary server, log in to the primary (using its heartbeat IP) and check FailOverStatus. “2(READY)” indicates that the problems are resolved and the server is ready for failback.
[root@failedvtlnode]# sms -v
Last Update by SM: Sun Apr 20 16:31:50 2008
Last Access by RPC: Sun Apr 20 16:31:50 2008
FailOverStatus: 2(READY)
Status of IPStor Server (Transport) : OK
Status of IPStor Server (Application) : OK
Status of IPStor Authentication Module : OK
Status of IPStor Logger Module : OK
Status of IPStor Communication Module : OK
Status of IPStor Self-Monitor Module : OK
Status of IPStor NAS Modules: OK(0)
Status of IPStor Fsnupd Module: OK
Status of IPStor ISCSI Module: OK
Status of IPStor BMR Module: OK( 0)
Status of FC Link Down : OK
Status of Network Connection: OK
Status of force up: 0
Broadcast Arp : NO
Number of reported failed devices : 0
NAS health check : NO
XML Files Modified : NO
IPStor Failover Debug Level : 0
IPStor Self-Monitor Debug Level : 0
Do We Need To Reboot Machine(SM): NO
Do We Need To Reboot Machine(FM): NO
Nas Started: NO
During normal operating status:
[root@activevtlnode]# sms -v
Last Update by SM: Sun Apr 20 16:31:50 2008
Last Access by RPC: Sun Apr 20 16:31:50 2008
FailOverStatus: 1(UP)
Status of IPStor Server (Transport) : OK
Status of IPStor Server (Application) : OK
Status of IPStor Authentication Module : OK
Status of IPStor Logger Module : OK
Status of IPStor Communication Module : OK
Status of IPStor Self-Monitor Module : OK
Status of IPStor NAS Modules: OK(0)
Status of IPStor Fsnupd Module: OK
Status of IPStor ISCSI Module: OK
Status of IPStor BMR Module: OK( 0)
Status of FC Link Down : OK
Status of Network Connection: OK
Status of force up: 0
Broadcast Arp : NO
Number of reported failed devices : 0
NAS health check : NO
XML Files Modified : NO
IPStor Failover Debug Level : 0
IPStor Self-Monitor Debug Level : 0
Do We Need To Reboot Machine(SM): NO
Do We Need To Reboot Machine(FM): NO
Nas Started: NO
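Both sample outputs show every "Status of ..." entry as OK. As a rough aid when reading a longer capture, the sketch below prints any such entry whose value does not begin with OK. The function name not_ok_modules is hypothetical, and the article does not document what a failing value looks like, so anything other than OK is simply flagged for a closer look. The "Status of force up" line is skipped because its value is numeric in the samples.

```shell
#!/bin/sh
# Read `sms -v` output from stdin and print every "Status of ..." line
# whose value does not start with OK. The value is taken to be the text
# after the last colon on the line (an assumption based on the samples).
not_ok_modules() {
    awk -F': *' '
        /^Status of/ && !/^Status of force up/ {
            value = $NF
            if (value !~ /^OK/) print
        }'
}
```

For example, sms -v | not_ok_modules prints nothing when all listed modules report OK, so empty output is the expected result on a healthy node.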
Product
Sun StorageTek Virtual Tape Library Storage Appliance
Sun StorageTek Virtual Tape Library Plus Storage Appliance 1.0
Sun StorageTek Virtual Tape Library Plus Storage Appliance 2.0
VTL, Failover, Failback
Previously Published As
STKKB68135
Change History
Updated for currency...
Attachments
This solution has no attachment