Best Practices for Disaster Recovery as a Service
Updated over 5 months ago

This article describes the best practices to follow when using Druva Disaster Recovery.


Druva AWS proxy

The Druva AWS proxy, also referred to as the DR proxy, is an EC2 instance that runs in the customer’s AWS account. The Druva AWS proxy runs the Disaster Recovery service and orchestrates DR restore, DR failover, and DR failback. The DR proxy is deployed using an AWS CloudFormation template, and deployment takes less than 10 minutes.

  • Druva recommends that you deploy at least two DR proxies in separate availability zones for high availability.


📝 Note
Each DR proxy can run three DR restore jobs concurrently.


  • The recommended EC2 instance size for the Druva AWS proxy is c5.2xlarge.

| Instance type | vCPU | Memory (GiB) | Instance storage (GiB) | Network bandwidth (Gbps) | EBS bandwidth (Mbps) |
|---|---|---|---|---|---|
| c5.2xlarge | 8 | 16 | EBS-only | Up to 10 | Up to 4,750 |

  • The DR proxy must have access to the following services:

    • S3,

    • EC2-API, and

    • SQS

The Druva CloudFormation template creates VPC endpoints that provide connectivity to these services over the AWS private network.
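To plan how many DR proxies to deploy, you can combine two guidelines from this section: deploy at least two proxies in separate Availability Zones, and expect each proxy to run up to three DR restore jobs concurrently. The following Python sketch is a planning aid only, not a Druva tool. It assumes restore jobs are spread evenly across proxies, and the function names and Availability Zone names are hypothetical.

```python
import math

# Guidelines from this article: each DR proxy runs up to three DR restore jobs
# concurrently, and at least two proxies in separate AZs are recommended for HA.
RESTORES_PER_PROXY = 3
MIN_PROXIES_FOR_HA = 2


def proxies_needed(concurrent_restores: int) -> int:
    """Number of DR proxies for a target number of concurrent DR restore jobs."""
    if concurrent_restores < 0:
        raise ValueError("concurrent_restores must be non-negative")
    by_capacity = math.ceil(concurrent_restores / RESTORES_PER_PROXY)
    return max(MIN_PROXIES_FOR_HA, by_capacity)


def spread_across_azs(proxy_count: int, azs: list) -> dict:
    """Assign proxies to Availability Zones round-robin (hypothetical helper)."""
    if len(azs) < 2:
        raise ValueError("use at least two Availability Zones for high availability")
    placement = {az: 0 for az in azs}
    for i in range(proxy_count):
        placement[azs[i % len(azs)]] += 1
    return placement


if __name__ == "__main__":
    count = proxies_needed(10)  # ceil(10 / 3) = 4 proxies
    print(count, spread_across_azs(count, ["us-east-1a", "us-east-1b"]))
```

For example, 10 concurrent restores need four proxies, placed two per Availability Zone, while anywhere from one to six concurrent restores still calls for two proxies for high availability.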


VPC

When you define network mappings in a DR plan, map the vCenter source network to a VPC and subnet in the target AWS account.

  • If you create a new Amazon VPC, you don’t need to attach an internet gateway (IGW) to it, because the Druva AWS proxy uses AWS PrivateLink for all communication.

  • Ensure that DNS hostnames and DNS resolution are enabled within the VPC.


📝 Note
The Druva AWS proxy can be deployed in an existing customer VPC if DNS resolution is enabled and provided by the Amazon-provided DNS server. If DNS resolution is disabled or provided by a custom (third-party) DNS server, we recommend deploying a new VPC dedicated to the Druva AWS proxy. This new VPC must have:

  • DNS resolution enabled.

  • The DHCP options set configured to use the Amazon-provided DNS server (AmazonProvidedDNS), not a custom DNS server.
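Before you deploy the Druva AWS proxy in an existing VPC, you can evaluate these DNS requirements together. The following Python sketch assumes you have already read the VPC's enableDnsSupport and enableDnsHostnames attributes (for example, with aws ec2 describe-vpc-attribute) and the domain-name-servers value of its DHCP options set (for example, with aws ec2 describe-dhcp-options). The VpcDnsConfig class and dns_problems function are hypothetical, and the sketch treats any DNS server list other than exactly AmazonProvidedDNS as a custom DNS server.

```python
from dataclasses import dataclass, field
from typing import List

# Value used in a DHCP options set for the Amazon-provided DNS server.
AMAZON_DNS = "AmazonProvidedDNS"


@dataclass
class VpcDnsConfig:
    enable_dns_support: bool      # VPC attribute "DNS resolution"
    enable_dns_hostnames: bool    # VPC attribute "DNS hostnames"
    dhcp_domain_name_servers: List[str] = field(default_factory=lambda: [AMAZON_DNS])


def dns_problems(cfg: VpcDnsConfig) -> List[str]:
    """Reasons the VPC does not meet the DNS requirements (empty list if none)."""
    problems = []
    if not cfg.enable_dns_support:
        problems.append("DNS resolution is disabled")
    if not cfg.enable_dns_hostnames:
        problems.append("DNS hostnames are disabled")
    # Assumption: anything other than exactly ["AmazonProvidedDNS"] counts as custom DNS.
    if cfg.dhcp_domain_name_servers != [AMAZON_DNS]:
        problems.append("DHCP options set uses a custom DNS server")
    return problems


if __name__ == "__main__":
    print(dns_problems(VpcDnsConfig(True, True)))  # []
    print(dns_problems(VpcDnsConfig(True, False, ["10.0.0.53"])))
```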


DR Failover Checks - Guest OS

DR Failover Checks - Guest OS run while the VM backup is in progress and verify that the VM meets all the DR failover and failback requirements. All DR Failover Checks - Guest OS must succeed for a DR failover or failback to succeed.

When the DR Failover Checks - Guest OS do not execute

The DR Failover Checks - Guest OS may not execute at all for one or more of the following reasons:

  • The VMware Backup proxy is unable to communicate with the ESX host on port 443. Enable communication between the backup proxy and the ESX host on port 443.

  • The VMware Backup proxy is on a version older than 4.8.11. Upgrade the VMware backup proxy to the latest version, and ensure that the first VM backup after the proxy upgrade is successful.

  • The VM cannot connect to the Druva download portal at https://downloads.druva.com/phoenix/ to download the DR Failover Checks - Guest OS executables while the VM backup is in progress. Ensure that the VM can reach the download portal and that antivirus software on the VM does not block the executables.

Exclude the DR Failover Checks - Guest OS executables from any antivirus software running on the VM. The following table lists the executables to exclude for each VM operating system.

| Operating system | Prerequisite check executable |
|---|---|
| Windows | PhoenixPreflight_<version number>.exe |
| Linux | PhoenixPreflight_<version number> |
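Two of the causes above can be checked quickly: whether the VMware backup proxy is on version 4.8.11 or later, and whether the ESX host accepts TCP connections on port 443. The following Python sketch is illustrative only. It assumes plain dotted numeric version strings (such as 4.8.11), it should be run from the backup proxy host for the port check to be meaningful, and the function names are hypothetical.

```python
import socket
from typing import Tuple

MIN_BACKUP_PROXY_VERSION = (4, 8, 11)


def parse_version(text: str) -> Tuple[int, ...]:
    """Convert a dotted numeric version such as '4.8.11' into (4, 8, 11)."""
    return tuple(int(part) for part in text.strip().split("."))


def proxy_version_ok(version: str) -> bool:
    """True if the VMware backup proxy is on version 4.8.11 or later."""
    return parse_version(version) >= MIN_BACKUP_PROXY_VERSION


def tcp_port_open(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or name did not resolve
        return False


if __name__ == "__main__":
    print(proxy_version_ok("4.8.10"))  # False: upgrade the backup proxy
    print(proxy_version_ok("4.10.0"))  # True: versions compare numerically, not as text
```

For example, tcp_port_open("esx01.example.local"), where the host name is hypothetical, returns False if port 443 is blocked between the backup proxy and the ESX host.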

Resolving DR Failover Checks - Guest OS errors

If the DR Failover Checks - Guest OS fail or pass with warnings, resolve the errors or warnings before re-running the backup job.

  1. Credentials: Ensure that credentials are assigned to the VMs whose disaster recovery you want to perform. If credentials are not assigned or are invalid, Druva does not perform the prerequisite checks. You can assign credentials to the VMs from either the VMware page or the Disaster Recovery page.
    The user account must have the following privileges:

Windows virtual machines

  • The account must have local administrative privileges.

  • UAC must be disabled on the virtual machine. See disabling UAC on Windows server for more information.

Linux virtual machines

  • A non-root user must have sudo rights with the NOPASSWD: ALL tag enabled in the sudoers file. Edit the sudoers file (preferably with visudo, which validates the syntax before saving) and ensure that the following entry for the non-root user appears at the end:

username ALL=(ALL) NOPASSWD: ALL

where username is the name of the non-root user that can run all commands without being prompted for a password.

Verifying permissions

Log in to the Linux machine with the user account that you want to test.

Run the sudo -l command. If the user has sudo privileges and the NOPASSWD: ALL tag is enabled in the sudoers file, the command lists the user's sudo rules, including (ALL) NOPASSWD: ALL, without prompting for a password.



If the user does not have sudo privileges or the NOPASSWD: ALL tag is not enabled in the sudoers file, the command prompts for a password.

  • The /home/{username} directory must exist, and the non-root user must have read, write, and execute (rwx) permissions on it.

While a VM backup is in progress, the prerequisite checks use the working directory /home/{username}/Druva/Phoenix/Preflight for non-root users and the directory /home/{PreflightBinaryName}/Druva/Phoenix/Preflight for root users. Once the prerequisite checks are complete, Druva deletes the directories that it created under /home/{username} for non-root users or /home/{PreflightBinaryName} for root users.
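If you verify many Linux VMs, you can script the sudo -l check and the working directory rule above. The following Python sketch parses captured sudo -l output and builds the working directory path. It assumes sudo -l prints rules in the usual form, such as (ALL) NOPASSWD: ALL or (ALL : ALL) NOPASSWD: ALL, and the function names are hypothetical. It does not run sudo itself; capture the output first, for example with sudo -n -l, which exits with an error instead of prompting when a password is required.

```python
import re
from pathlib import PurePosixPath

# A sudo -l rule line granting every command without a password, such as
# "    (ALL) NOPASSWD: ALL" or "    (ALL : ALL) NOPASSWD: ALL".
NOPASSWD_ALL = re.compile(r"^\s*\(ALL(\s*:\s*ALL)?\)\s*NOPASSWD:\s*ALL\s*$")


def has_nopasswd_all(sudo_l_output: str) -> bool:
    """True if the captured sudo -l output contains an (ALL) NOPASSWD: ALL rule."""
    return any(NOPASSWD_ALL.match(line) for line in sudo_l_output.splitlines())


def preflight_workdir(username: str, preflight_binary_name: str) -> PurePosixPath:
    """Working directory used by the prerequisite checks, as described in this article."""
    owner = preflight_binary_name if username == "root" else username
    return PurePosixPath("/home") / owner / "Druva" / "Phoenix" / "Preflight"


if __name__ == "__main__":
    sample = "User alice may run the following commands on db01:\n    (ALL) NOPASSWD: ALL\n"
    print(has_nopasswd_all(sample))                        # True
    print(preflight_workdir("alice", "PhoenixPreflight"))  # /home/alice/Druva/Phoenix/Preflight
```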

2. Virtual Machines

  1. The VM must be running for the prerequisite check to work.

  2. The VM must have VMware Tools installed on it.

  3. The VM must have at least 1 GB of free space on the boot partition.

  4. Ensure that all Druva processes are whitelisted in any antivirus software running on the virtual machine.

The following 14 Windows files must be whitelisted:

C:\Windows\System32\systeminfo.exe

C:\Druva\Vmtools\Ec2Install\Ec2Install.exe

C:\Druva\Vmtools\Citrix_xensetup.exe

C:\Druva\Vmtools\dotnetfx45.exe

C:\Druva\Vmtools\AWSPVDriverSetup8.2.1.msi

C:\Druva\Vmtools\dotNetFx40_Full_x86_x64.exe

C:\Druva\Vmtools\Ec2Install\AmazonSSMAgentSetup.exe

C:\Druva\Vmtools\XenGuestAgent.exe

C:\Druva\Vmtools\wic_x86_enu.exe

C:\Druva\Vmtools\wic_x64_enu.exe

C:\Druva\Vmtools\WiXEC2ConfigSetup_64.msi

C:\Druva\Model\cli.exe

C:\Druva\Model\run_model.bat

C:\Druva\Service\rmservice.exe

The following Linux files must be whitelisted. The /opt/druva files are installed by Druva as part of the DR failover operation.

/opt/druva/rm_startup.sh

/opt/druva/cli

/opt/druva/run_model.sh

/opt/druva/upload_logs.sh

/etc/rc.local

/etc/init.d/after.local
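On Linux VMs, you can check two of these requirements with a short script: at least 1 GB of free space on the boot partition, and whether every Linux path listed above appears in your antivirus exclusions. The following Python sketch is illustrative. It treats 1 GB as 1 GiB, assumes you have exported your antivirus product's exclusion list as a set of paths (how you do that depends on the product), and uses hypothetical function names. Pass the mount point of the boot partition, for example /boot if it is a separate partition, otherwise /.

```python
import shutil
from typing import Iterable, List, Set

ONE_GIB = 1024 ** 3  # assumption: "1 GB" interpreted as 1 GiB

# Linux paths that this article says must be whitelisted.
LINUX_WHITELIST = (
    "/opt/druva/rm_startup.sh",
    "/opt/druva/cli",
    "/opt/druva/run_model.sh",
    "/opt/druva/upload_logs.sh",
    "/etc/rc.local",
    "/etc/init.d/after.local",
)


def has_free_space(mount_point: str, minimum_bytes: int = ONE_GIB) -> bool:
    """True if the filesystem containing mount_point has at least minimum_bytes free."""
    return shutil.disk_usage(mount_point).free >= minimum_bytes


def missing_exclusions(av_exclusions: Set[str], required: Iterable[str] = LINUX_WHITELIST) -> List[str]:
    """Required paths that are not yet excluded in the antivirus configuration."""
    return [path for path in required if path not in av_exclusions]


if __name__ == "__main__":
    print(has_free_space("/"))
    print(missing_exclusions({"/opt/druva/cli", "/etc/rc.local"}))
```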

Add virtual machines to DR plan

A DR plan includes a group of virtual machines, the DR restore frequency, and all the disaster recovery settings that let you perform a single-click failover.

  1. When a VM is added to a DR plan, Druva automatically assigns a few default failover settings. The default settings are:

    1. instance_type = t2.medium

    2. public_ip = None

    3. private_ip = Auto Assign
      These settings are used to launch the VM from the DR copy during a failover. For optimal failover times, update them to match the source VM configuration.

  2. When configuring failover settings for VMs in the DR plan, ensure that the instance type is not smaller than the source virtual machine you are failing over. You can also use the auto-suggest instance type feature to let Druva choose an appropriate instance type.

    📝 Note
    We've discontinued support for t2.micro and t2.small EC2 instance types for DR failovers. These instance types are not available for manual instance type selection or instance auto-assignment.

  3. Ensure that the Recovery Point Actual (RPA) does not exceed the backup frequency duration. RPA is the time elapsed since the last successful VM recovery point that is available for failover. For more information, see Managing Recovery Point Actual.
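The instance type and RPA guidelines above can be expressed as two small checks. The following Python sketch is not Druva's auto-suggest feature: it picks the smallest instance type, ordered by vCPU and then memory, from a short hypothetical catalog that excludes t2.micro and t2.small, and it compares RPA with the backup frequency. The catalog uses AWS's published vCPU and memory figures for these types, but it ignores cost, CPU architecture, and other sizing factors, and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float


# Hypothetical catalog; vCPU and memory values are AWS's published specs.
CATALOG = (
    InstanceType("t2.micro", 1, 1),
    InstanceType("t2.small", 1, 2),
    InstanceType("t2.medium", 2, 4),
    InstanceType("t2.large", 2, 8),
    InstanceType("t2.xlarge", 4, 16),
    InstanceType("c5.2xlarge", 8, 16),
    InstanceType("m5.2xlarge", 8, 32),
)
UNSUPPORTED_FOR_DR = {"t2.micro", "t2.small"}  # discontinued for DR failovers


def smallest_fit(vm_vcpus: int, vm_memory_gib: float) -> Optional[InstanceType]:
    """Smallest supported type that is not smaller than the source VM, or None."""
    candidates = [
        it for it in CATALOG
        if it.name not in UNSUPPORTED_FOR_DR
        and it.vcpus >= vm_vcpus
        and it.memory_gib >= vm_memory_gib
    ]
    return min(candidates, key=lambda it: (it.vcpus, it.memory_gib), default=None)


def rpa_ok(last_recovery_point: datetime, backup_frequency: timedelta, now: datetime) -> bool:
    """True if the Recovery Point Actual does not exceed the backup frequency."""
    return now - last_recovery_point <= backup_frequency


if __name__ == "__main__":
    print(smallest_fit(1, 2).name)  # t2.medium, because t2.small is excluded
    now = datetime(2024, 1, 1, 12, 0)
    print(rpa_ok(datetime(2024, 1, 1, 6, 0), timedelta(hours=4), now))  # False: RPA is 6 hours
```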

DR restore

DR restore (also referred to as DR copy) is the process where the Druva AWS proxy reads the VM backup data from Druva Cloud, replicates it to an EBS volume in the customer's AWS account, and creates an EBS snapshot of the EBS volume. The frequency with which the data is replicated is defined in the DR plan.

  • For large virtual machines, ensure that the backup retention period is longer than the time required to create the first full DR copy, that is, to transfer the VM backup data from Druva Cloud to the customer's AWS account. The first DR restore is a full copy and can take significantly longer; subsequent incremental DR restores are faster.
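To compare the retention period with the first full DR copy, you can estimate the copy duration from the VM backup size and an assumed sustained transfer rate between Druva Cloud and your AWS account. The following Python sketch shows the arithmetic. The throughput value and the safety factor are assumptions that you should replace with figures observed in your own environment, and the function names are hypothetical.

```python
def first_copy_hours(backup_size_gib: float, throughput_mib_s: float) -> float:
    """Hours needed to transfer the full VM backup at a sustained throughput."""
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = backup_size_gib * 1024 / throughput_mib_s  # GiB -> MiB, then divide by MiB/s
    return seconds / 3600


def retention_covers_first_copy(
    retention_days: float,
    backup_size_gib: float,
    throughput_mib_s: float,
    safety_factor: float = 2.0,  # hypothetical margin for throttling and retries
) -> bool:
    """True if the retention period is longer than the padded first-copy estimate."""
    return retention_days * 24 > safety_factor * first_copy_hours(backup_size_gib, throughput_mib_s)


if __name__ == "__main__":
    # 4 TiB backup at an assumed 50 MiB/s: 4096 * 1024 / 50 = 83,886 s, about 23.3 hours
    print(round(first_copy_hours(4096, 50), 1))
    print(retention_covers_first_copy(1, 4096, 50))  # False: 24 h < 2 x 23.3 h
```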

DR Failover

Failover is the process where the DR proxy creates an EC2 instance in the customer’s AWS account, creates an EBS volume from the EBS snapshot, attaches the volume to the EC2 instance, and finally starts the instance after redirecting network traffic to the IP addresses of the EC2 instances. On average, a Linux VM failover takes 15 to 30 minutes, and a Windows VM failover takes 45 to 75 minutes. These times assume that the EC2 instance type launched from the EBS snapshot is sized to match the source virtual machine.

Ensure that the DR failover checks succeed before initiating a production or test failover. The DR failover checks preemptively flag issues that can cause failover jobs to fail, so fixing the identified issues proactively helps ensure that your actual failovers succeed. For more information, see DR failover checks - environment and DR failover checks - Guest OS.

Test Failover

Druva recommends using the Test Failover option to test VM failovers periodically. You specify the production and test failover settings when creating the DR plan. The Test Failover Settings include the instance type, IAM role, volume type, and instance tags. You can also reuse the production failover settings for test failovers.

On the Disaster Recovery page, select the DR Plan. On the Overview Page, click Failover > Test Failover. For more information, see Manage disaster recovery failover.
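Conceptually, test failover settings are either a copy of the production settings or a copy with a few fields changed. The following Python sketch models that idea with a frozen dataclass. The FailoverSettings class, its field names, the IAM role name, and the tag values are hypothetical and do not reflect Druva's API; they only mirror the settings named above.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class FailoverSettings:
    instance_type: str
    iam_role: Optional[str]
    volume_type: str
    instance_tags: Dict[str, str] = field(default_factory=dict)


def make_test_settings(production: FailoverSettings,
                       extra_tags: Optional[Dict[str, str]] = None,
                       **overrides) -> FailoverSettings:
    """Reuse production settings, override selected fields, and add test-only tags.

    Pass tag changes through extra_tags rather than as an instance_tags override.
    """
    tags = {**production.instance_tags, **(extra_tags or {})}
    return replace(production, instance_tags=tags, **overrides)


if __name__ == "__main__":
    prod = FailoverSettings("m5.xlarge", "dr-failover-role", "gp3", {"App": "erp"})
    test = make_test_settings(prod, extra_tags={"Purpose": "dr-test"}, instance_type="t2.large")
    print(test)
```

Calling make_test_settings(prod) with no arguments corresponds to reusing the production settings unchanged.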

Failback

When you initiate a DR failback, the VMware backup proxy creates a target VM in the on-premises infrastructure. This target VM connects to the failed-over EC2 instance and copies the data to itself. Druva then boots the VM.

  • Ensure that the target virtual machine in your on-premises environment, to which you will fail back, has connectivity to the EC2 instance.

  • Ensure that the target virtual machine in your on-premises environment used for failback is reachable from the VMware backup proxy.

  • Ensure that the following ports are open on the target virtual machine:

    • Linux: Port 22 for SSH

    • Windows: Ports 445 (Used for preflight checks and control messaging) and 50000 (Used for actual data transfer in failback operation).


📝 Note
You must manually enable the SMB port for communication. See DR8263.


  • Ensure that the administrative shares of the source EC2 instance are reachable before attempting a failback. For more information, see error DR8263 and its resolution.
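You can confirm that the required ports on the target virtual machine accept connections before starting a failback. The following Python sketch tries a TCP connection to each port listed above. Run it from a machine that should be able to reach the target VM, such as a host on the same network path as the EC2 instance or the VMware backup proxy. The function names and the example host name are hypothetical, and a successful TCP connection only shows that a port is reachable, not that SMB or the failback data transfer is working.

```python
import socket
from typing import Iterable, List, Tuple

# Ports this article lists for the target virtual machine during failback.
FAILBACK_PORTS = {
    "linux": (22,),           # SSH
    "windows": (445, 50000),  # SMB (preflight checks, control messaging) and failback data transfer
}


def required_ports(os_family: str) -> Tuple[int, ...]:
    """Failback ports for 'linux' or 'windows' (case-insensitive)."""
    return FAILBACK_PORTS[os_family.lower()]


def closed_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> List[int]:
    """Ports on host that did not accept a TCP connection within the timeout."""
    closed = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            closed.append(port)
    return closed


if __name__ == "__main__":
    print(required_ports("Windows"))  # (445, 50000)
    # Example (hypothetical host): closed_ports("failback-target.example.local", required_ports("windows"))
```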

Before initiating a production DR failback job, we recommend running the DR Failback Checks to ensure that your AWS environment and the destination VMware environment do not have any issues that can cause the production DR failback jobs to fail. For more information, see DR Failback Checks.

Billable AWS services

The following AWS services are deployed in your AWS account during the Druva AWS proxy deployment and are billable.

  1. The Amazon EC2 instance type (c5.2xlarge - recommended) used for the Druva AWS proxy.

  2. The following AWS VPC endpoints that are configured as part of proxy deployment:

    1. Druva Backup Service Endpoint

    2. Druva Node Service Endpoint

    3. S3 Endpoint

    4. SQS Endpoint

    5. EC2 Endpoint

    6. CloudFormation Endpoint

    7. EBS Endpoint

    8. Lambda Endpoint

    9. Logs Endpoint

You pay AWS directly for these services. For pricing details, see Amazon EC2 pricing and AWS PrivateLink pricing.
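To confirm that the AWS-owned VPC endpoints listed above exist in the proxy's VPC, you can compare the ServiceName values returned by aws ec2 describe-vpc-endpoints with the names AWS uses, which follow the pattern com.amazonaws.<region>.<service>. The following Python sketch does that comparison. It skips the Druva Backup Service and Druva Node Service endpoints because their service names are not documented in this article, and the function name is hypothetical.

```python
from typing import Iterable, List

# AWS-owned endpoint services from the list above (Druva endpoints excluded).
AWS_ENDPOINT_SERVICES = ("s3", "sqs", "ec2", "cloudformation", "ebs", "lambda", "logs")


def missing_endpoints(deployed_service_names: Iterable[str], region: str) -> List[str]:
    """Expected endpoint service names that are not present in the VPC, sorted."""
    expected = {f"com.amazonaws.{region}.{svc}" for svc in AWS_ENDPOINT_SERVICES}
    return sorted(expected - set(deployed_service_names))


if __name__ == "__main__":
    deployed = [
        "com.amazonaws.us-east-1.s3",
        "com.amazonaws.us-east-1.sqs",
        "com.amazonaws.us-east-1.ec2",
    ]
    print(missing_endpoints(deployed, "us-east-1"))
```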
