Virtualizing and recovering failover cluster nodes and services is a complex task.
x360Recover does not directly support virtualization of a failover cluster node as a failover cluster with shared disk resources. Virtualized protected systems are presented as normal servers by default, with their shared disk resources present as local disks only.
It is possible, however, to manually recreate a failover cluster running as a set of virtual machines on the appliance using a combination of iSCSI export and Start VM, as detailed below.
Here we will briefly discuss the various failure scenarios and provide general guidelines and options for performing a recovery in each case.
Note: The recovery cases below assume that you are using Microsoft iSCSI Initiator for connectivity of your shared disk volumes.
If you are using hardware iSCSI or Fibre Channel adapters, or other solutions to provide access to your SAN disks, your recovery options under the various scenarios may be more complex.
Contact Axcient support if you require assistance performing any recovery.
Configuration of hardware-based disk sharing and SAN technologies is beyond the scope of this article. It is assumed that partners are well versed in managing their own hardware deployments and platforms.
Single node failure
If a single server node within your failover cluster is compromised, the remaining nodes within the cluster should suffice to provide cluster services during the outage. (This is the purpose of failover clustering, after all!) Once you have repaired or replaced the faulted hardware, recover the server node as normal.
- For physical servers, perform a bare metal restore using the x360Recover Recovery Toolkit, which is available at Software downloads.
- For virtual servers, use the Export feature on the x360Recover appliance from the Protected Systems Details page. Select the latest snapshot of the protected system and click Export to generate a set of virtual disks. Copy the exported disk images to your hypervisor and create a new virtual machine using the exported operating system disk image. Alternatively, you may use the bare metal restore option.
Once the server OS volume has been repaired, boot the system and let it rejoin the failover cluster normally.
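Whether the remaining nodes can keep services running depends on how quorum is configured in your cluster. As a rough illustration only, the Python sketch below assumes a simple node-majority quorum with one vote per node and no disk or file share witness. Your cluster may be configured differently, and features such as dynamic quorum can change the numbers. The function names are hypothetical and are not part of x360Recover or Windows.

```python
def votes_needed(total_nodes: int) -> int:
    """Smallest number of votes that is a strict majority of total_nodes."""
    if total_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return total_nodes // 2 + 1


def has_quorum(total_nodes: int, healthy_nodes: int) -> bool:
    """True if the healthy nodes alone hold a majority.

    Assumption: one vote per node and no witness.
    """
    if not 0 <= healthy_nodes <= total_nodes:
        raise ValueError("healthy_nodes must be between 0 and total_nodes")
    return healthy_nodes >= votes_needed(total_nodes)


def nodes_to_recover(total_nodes: int, healthy_nodes: int) -> int:
    """How many additional nodes must come back before quorum is regained."""
    if not 0 <= healthy_nodes <= total_nodes:
        raise ValueError("healthy_nodes must be between 0 and total_nodes")
    return max(0, votes_needed(total_nodes) - healthy_nodes)


if __name__ == "__main__":
    # Four-node cluster: one failure keeps quorum, two failures lose it.
    print(has_quorum(4, 3), has_quorum(4, 2), nodes_to_recover(4, 1))
```

Under these assumptions, a four-node cluster that loses one node still holds three of four votes, while losing two nodes leaves it without a majority. The second case is the multiple node failure scenario described next.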
Multiple node failure
If multiple nodes have become compromised and your remaining failover cluster nodes are insufficient to continue services, but your shared disk infrastructure is still operational, you may virtualize one or more protected systems on the appliance to recover your failover cluster nodes.
- Use Start VM to boot the most recent snapshot of each node you intend to virtualize.
- Since the backup image contains copies of your network attached disks as local disks, you will need to remove or hide these disks within the virtual machine so that they do not conflict with the existing network storage volumes still present on your SAN.
- Use Disk Manager to disable and set offline any of the cluster disk volumes that are part of the backup and that appear as local disks.
- Assuming your production server nodes use Microsoft iSCSI Initiator to connect to the live shared volumes on your SAN, the running virtual servers should be able to reconnect once their original network IP addresses have been restored.
- Failover cluster operations should resume normally once enough nodes become available to host services.
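If you prefer to script the step of taking the backed-up copies of the shared disks offline rather than using Disk Manager, the following Python sketch shows one possible approach. It only builds and prints PowerShell `Set-Disk` commands for you to review. It does not run them. The disk numbers are hypothetical and must be identified by you inside each virtual machine beforehand. The function name `offline_commands` and the `os_disk` guard are illustrative assumptions, not part of any Axcient tool.

```python
def offline_commands(shared_copy_disks, os_disk=0):
    """Build one Set-Disk command per local copy of a shared disk.

    shared_copy_disks: disk numbers (as reported inside the VM) of the
        backed-up shared disks that now appear as local disks.
    os_disk: disk number of the operating system disk (assumed 0 here),
        which must never be taken offline.
    """
    commands = []
    for number in sorted(set(shared_copy_disks)):
        if number == os_disk:
            raise ValueError("refusing to take the operating system disk offline")
        if number < 0:
            raise ValueError("disk numbers cannot be negative")
        commands.append(f"Set-Disk -Number {number} -IsOffline $true")
    return commands


if __name__ == "__main__":
    # Hypothetical example: disks 1 and 2 are local copies of shared volumes.
    for line in offline_commands([2, 1, 2]):
        print(line)
```

Review the printed commands against the disk list in the virtual machine before running any of them in an elevated PowerShell session.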
Shared storage becomes lost or corrupted
If one or more shared disks or Cluster Shared Volumes become compromised or corrupted but the underlying SAN server is still operational, you can recover disk images from x360Recover using the x360Recover Recovery Toolkit and the Disk Copy utility.
Follow these steps:
Step 1. Boot a Windows system on the same LAN as both the SAN server and the x360Recover BDR using the x360Recover Recovery Toolkit, which can be downloaded at Software downloads.
Step 2. If you intend to use the failover cluster node for recovery, first migrate all cluster services that are still operational off the affected server node onto other nodes within the cluster. Otherwise, you may use any system on the same network as the SAN and backup server.
Step 3. Once the Recovery Toolkit environment has booted, connect it to both (A) the x360Recover backup image you wish to recover and (B) the SAN storage volume you are recovering to.
a) Use iSCSI Start to export the protected system snapshot containing the volume(s) you are recovering
b) Use the iSCSI Manager utility on the Recovery menu to attach to the BDR iSCSI volumes
c) Make note of which device paths have been assigned to the SOURCE disks
d) Use the iSCSI Manager utility on the Recovery menu to attach to the SAN iSCSI volumes
e) Make note of which device paths have been assigned to the TARGET disks
f) Use the Disk Copy utility to image SOURCE to TARGET disks
Step 4. Once the disk copy operation has completed, shut down the Recovery Toolkit environment and bring your restored shared volumes back online within the failover cluster.
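Because Disk Copy overwrites the TARGET disk, it can help to write down the SOURCE and TARGET device paths noted in Step 3 and sanity-check them before copying. The Python sketch below is one hypothetical way to do that on paper or on a separate workstation. The `CopyPair` type, the device paths, and the sizes are all illustrative assumptions. The real paths and sizes are whatever the Recovery Toolkit reports in your environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyPair:
    source: str        # device path noted for the BDR (SOURCE) disk
    target: str        # device path noted for the SAN (TARGET) disk
    source_bytes: int  # size of the SOURCE disk
    target_bytes: int  # size of the TARGET disk


def check_plan(pairs):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    sources = {p.source for p in pairs}
    seen_targets = set()
    for p in pairs:
        if p.target in sources:
            problems.append(f"{p.target} is listed as both a SOURCE and a TARGET")
        if p.target in seen_targets:
            problems.append(f"{p.target} is used as a TARGET more than once")
        seen_targets.add(p.target)
        if p.target_bytes < p.source_bytes:
            problems.append(f"{p.target} is smaller than {p.source}")
    return problems


if __name__ == "__main__":
    # Placeholder paths and sizes for illustration only.
    plan = [CopyPair("/dev/sdb", "/dev/sdd", 500, 500),
            CopyPair("/dev/sdc", "/dev/sde", 800, 600)]
    for problem in check_plan(plan):
        print(problem)
```

An empty result does not prove the plan is right. It only rules out the most common mix-ups, such as swapping SOURCE and TARGET or choosing a TARGET LUN that is too small.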
Storage Server (SAN) failure
If your entire SAN server has failed or been destroyed but your failover cluster nodes are still operational, the x360Recover BDR device can act as a replacement storage server while you make repairs to your original SAN hardware.
Step 1. Use iSCSI Start to export the snapshot(s) containing the most recent backup image of your affected shared disk volumes.
- This may require starting iSCSI on multiple protected systems to make the most recent copy of each shared disk accessible.
- This may also require you to selectively connect individual iSCSI target LUNs when multiple snapshots are exported by the BDR.
Step 2. On each node within the failover cluster, reconfigure your shared disks to use the x360Recover BDR instead of the SAN for storage communications.
- Remove the unavailable disks from cluster nodes and roles
- Remove iSCSI connections to the disk LUNs stored on the failed SAN
- Add iSCSI connections to the disk LUNs exported by the x360Recover BDR
- Selectively connect the correct disk LUNs (if necessary)
- Add the disk LUNs provided by x360Recover to the cluster and reconfigure services
Step 3. Once you have repaired or replaced your failed storage server hardware, recreate empty disk LUNs on your new storage server matching the original volumes, and then perform a disk-by-disk recovery as described in the section "Shared storage becomes lost or corrupted" above.
Step 4. Once the disk images have been recovered on the SAN, reverse the process above to transition the cluster nodes back to the SAN disk volumes and shut down iSCSI exports on the x360Recover BDR.
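When Step 1 involves exports from several protected systems, each of which holds its own copy of a shared disk, you need to decide which exported LUN to connect for each disk. The Python sketch below illustrates that selection: for every shared disk, keep the copy from the newest snapshot. It is a planning aid only. The `ExportedDisk` fields, the way a shared disk is identified (a label you assign), and the target names are all hypothetical and are not read from the BDR.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ExportedDisk:
    protected_system: str    # protected system whose snapshot was exported
    shared_disk: str         # label you use to identify the shared disk
    snapshot_time: datetime  # time of the exported snapshot
    target_name: str         # iSCSI target name noted for this copy


def newest_copies(exports):
    """Map each shared disk label to its most recently backed-up copy."""
    newest = {}
    for export in exports:
        current = newest.get(export.shared_disk)
        if current is None or export.snapshot_time > current.snapshot_time:
            newest[export.shared_disk] = export
    return newest


if __name__ == "__main__":
    # Placeholder data for illustration only.
    exports = [
        ExportedDisk("NODE1", "CSV1", datetime(2024, 1, 1, 8, 0), "target-a"),
        ExportedDisk("NODE2", "CSV1", datetime(2024, 1, 1, 9, 0), "target-b"),
        ExportedDisk("NODE1", "CSV2", datetime(2024, 1, 1, 9, 30), "target-c"),
    ]
    for disk, chosen in sorted(newest_copies(exports).items()):
        print(disk, chosen.protected_system, chosen.target_name)
```

The resulting list tells you which exported LUNs to connect on the cluster nodes in Step 2 and which to leave disconnected.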
Site-wide disaster recovery
If your entire local infrastructure is destroyed and you need to recover both your server nodes and your network storage simultaneously (either from the local BDR or in the cloud), you can use a combination of the above two sections to manually recreate your failover cluster environment entirely on the x360Recover BDR.
Step 1. Recover enough failover cluster nodes to support the required services by virtualizing them. For each node, select a recent snapshot that does not contain the latest backup of any shared disk resources. An older snapshot is acceptable here.
Step 2. Remove local copies of shared disks using Disk Manager to set them offline.
Step 3. Use iSCSI export on the most recent snapshot of each protected system containing shared disk resources that need to be recovered.
Step 4. Reconfigure iSCSI Initiator on each virtual machine to remove failed SAN disks and add disks from the iSCSI shares hosted by the BDR.
Step 5. Reconfigure your failover cluster configuration to replace failed disks with instances hosted on the BDR and enable services and roles.
Step 6. Once the failed hardware has been replaced, perform recovery of the node servers and SAN disks as detailed in the sections above.
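The iSCSI Initiator changes in Step 4 can be made in the iSCSI Initiator control panel or from PowerShell. As a hedged sketch, the Python snippet below builds a list of PowerShell commands for one node using cmdlets from the Windows iSCSI module (`Disconnect-IscsiTarget`, `Remove-IscsiTargetPortal`, `New-IscsiTargetPortal`, `Connect-IscsiTarget`). It only prints the commands for review. The portal addresses and target names are placeholders, the function name is hypothetical, and the exact parameters should be verified against your Windows Server version. It does not cover the cluster-side changes in Step 5.

```python
def retarget_commands(old_portal, old_targets, new_portal, new_targets):
    """Build PowerShell commands that move one node from the SAN to the BDR.

    old_portal / new_portal: IP address or host name of the failed SAN
        and of the BDR (placeholders supplied by you).
    old_targets / new_targets: iSCSI target names (IQNs) to disconnect
        from the SAN and to connect on the BDR.
    """
    commands = []
    for iqn in old_targets:
        commands.append(
            f'Disconnect-IscsiTarget -NodeAddress "{iqn}" -Confirm:$false')
    commands.append(
        f'Remove-IscsiTargetPortal -TargetPortalAddress "{old_portal}" -Confirm:$false')
    commands.append(
        f'New-IscsiTargetPortal -TargetPortalAddress "{new_portal}"')
    for iqn in new_targets:
        commands.append(
            f'Connect-IscsiTarget -NodeAddress "{iqn}" -IsPersistent $true')
    return commands


if __name__ == "__main__":
    # All addresses and target names below are made-up placeholders.
    for line in retarget_commands(
            "192.0.2.10", ["iqn.2000-01.example:san-csv1"],
            "192.0.2.20", ["iqn.2000-01.example:bdr-csv1"]):
        print(line)
```

If the SAN is completely unreachable, the disconnect commands may report errors for sessions that no longer exist. Treat the output as a starting checklist rather than a script to run unattended.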
SUPPORT | 720-204-4500 | 800-352-0248