Getting VMware Disaster Recovery Right

In a disaster, IT can (theoretically) immediately restore virtualized applications by bringing VMs back online in a secondary site. I say “theoretically” because although this is technically true, there is a lot more to VMware disaster recovery than simply booting up some copied VMs.

IT needs to understand business recovery needs and assign RTO and RPO accordingly, invest in VMware disaster recovery tools, and test DR procedures. It’s a lot of prep work but it’s worth it: you really can bring your VMs online in a matter of minutes if you’ve done your homework and made smart investments.

Know Your Recovery Goals for Virtualized Data

The first step to successful disaster recovery applies to both virtual and physical servers. Identify mission- and business-critical applications and assign each one a recovery point objective (RPO), the maximum amount of recent data the business can afford to lose, and a recovery time objective (RTO), the maximum time the application can stay offline.
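As a rough illustration, recovery objectives can be written down per application and checked against how often each application is actually replicated. The sketch below is not tied to any VMware tool; the application names, the numbers, and the `RecoveryTarget` structure are all hypothetical. It assumes periodic replication, where the data at risk is at least everything written since the last completed replication, so an interval longer than the RPO cannot meet the objective.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTarget:
    """Recovery objectives for one application (all values hypothetical)."""
    app: str
    rpo_minutes: int                   # maximum tolerable data loss
    rto_minutes: int                   # maximum tolerable downtime
    replication_interval_minutes: int  # how often the VM is replicated


def rpo_violations(targets):
    """Return the apps whose replication interval is longer than their RPO."""
    return [
        t.app
        for t in targets
        if t.replication_interval_minutes > t.rpo_minutes
    ]


# Example inventory; names and numbers are made up for illustration.
TARGETS = [
    RecoveryTarget("order-db", rpo_minutes=15, rto_minutes=30,
                   replication_interval_minutes=5),
    RecoveryTarget("intranet-wiki", rpo_minutes=240, rto_minutes=480,
                   replication_interval_minutes=1440),
]

if __name__ == "__main__":
    print(rpo_violations(TARGETS))
```

An interval shorter than the RPO is necessary but not sufficient: transfer time and failed replication cycles also add to the real exposure, so leave some margin.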

Once you have assigned RTO and RPO by application, plan recovery procedures to reach the objectives. Understand application dependencies and apply them to VM replication schedules, ensuring that all dependent services are present and accounted for in the failover environment. Then define the order in which VMs should come back online following a disaster, and where: in the cloud, in a secondary DR site, or in the primary site once the data center has been recovered.
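One way to turn application dependencies into a recovery order is a topological sort, which guarantees that every VM starts only after the VMs it depends on. The following is a minimal sketch using Python's standard `graphlib` module (Python 3.9 or later). The VM names and the dependency map are hypothetical, and this is not how any particular DR product computes its recovery order.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each VM maps to the set of VMs it depends on.
DEPENDENCIES = {
    "web-01": {"app-01"},
    "app-01": {"db-01", "dns-01"},
    "db-01": {"dns-01"},
    "dns-01": set(),
}


def boot_waves(dependencies):
    """Group VMs into waves; each wave depends only on earlier waves.

    VMs in the same wave have no dependencies on each other and could be
    powered on in parallel.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(boot_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

A circular dependency makes `prepare()` raise an error, which is worth discovering during planning, not in the middle of a failover.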

Invest in VMware Disaster Recovery Tools

IT needs specialized tools to accomplish VMware disaster recovery, and both VMware and third-party vendors are happy to provide them.

If you have a mission-critical VMware installation and specialized VMware admins, consider VMware’s native offerings. VMware Site Recovery Manager (SRM) helps VMware admins plan for VMware disaster recovery with policy-based DR management, automated recovery operations and recovery order, and non-disruptive DR testing. Additional offerings include vSphere Replication for VM-level replication and vSphere High Availability (HA), which restarts VMs on available servers following server or OS failures. vSphere Storage APIs – Data Protection (VADP) gives backup and replication products access to Changed Block Tracking, so they can identify the blocks that changed since the last copy and send only those blocks to secondary sites.

These tools assume that you have a secondary DR site. This is common enough in enterprises, which often own multiple data centers and can treat each data center as a secondary DR site for the others. Smaller companies also use secondary data centers. The most common model is to lease racks in a provider’s data center, although some companies lease and equip their own space.

One of the advantages of replicating VMs to secondary sites is that there is no need to duplicate the primary site’s hardware and software, which saves money and time. You still need to inventory your data center infrastructure to see what you do and don’t need in the secondary site for your VMware environment. Your secondary site could have commodity servers instead of the top-of-the-line servers in your primary data center. But the site still needs sufficient processing power, memory, electrical power, and storage for successful application failover. Know what you need: don’t under- or over-spend by subscribing to racks you don’t need, or by over-building your secondary data center.
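A simple way to sanity-check secondary-site sizing is to add up what the protected VMs need and compare the total with what the site provides. The sketch below uses made-up figures and a hypothetical `Resources` structure. It deliberately ignores CPU and memory overcommit, hypervisor overhead, and growth, so treat its result as a starting point and not as a sizing recommendation.

```python
from dataclasses import dataclass

RESOURCE_FIELDS = ("vcpus", "memory_gb", "storage_gb")


@dataclass(frozen=True)
class Resources:
    """Compute and storage figures for a VM or a site (hypothetical)."""
    vcpus: int
    memory_gb: int
    storage_gb: int


def total_demand(vms):
    """Sum the resources of all VMs that must run at the secondary site."""
    return Resources(
        vcpus=sum(vm.vcpus for vm in vms),
        memory_gb=sum(vm.memory_gb for vm in vms),
        storage_gb=sum(vm.storage_gb for vm in vms),
    )


def shortfall(demand, capacity):
    """Return {resource: missing amount} for each resource that falls short."""
    gaps = {}
    for name in RESOURCE_FIELDS:
        missing = getattr(demand, name) - getattr(capacity, name)
        if missing > 0:
            gaps[name] = missing
    return gaps


# Made-up example: three protected VMs and one leased secondary site.
PROTECTED_VMS = [
    Resources(vcpus=8, memory_gb=32, storage_gb=500),
    Resources(vcpus=4, memory_gb=16, storage_gb=200),
    Resources(vcpus=16, memory_gb=64, storage_gb=2000),
]
SECONDARY_SITE = Resources(vcpus=32, memory_gb=96, storage_gb=4000)

if __name__ == "__main__":
    print(shortfall(total_demand(PROTECTED_VMS), SECONDARY_SITE))
```

An empty result means the site covers the listed VMs on paper; a large surplus in every resource is a hint that you may be paying for racks you don’t need.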

Does It Have to be VMware?

You’re not shoehorned into buying VMware’s DR tools or leasing secondary data centers. Choices abound and a good thing too, since VMware disaster recovery tools and secondary sites add to the expense and complexity of a VMware installation. If you already have or plan to use a secondary site such as a regional data center, then use it with VMware’s DR offerings or with third-party VMware replication products.

In situations where secondary sites are overkill, consider skipping the secondary site and going straight to the cloud. With DR as a Service (DRaaS), you can back up and recover your VMs and fail over critical applications in a provider’s cloud. VMware offers its own version of cloud-based DR via vCloud Air, or you can go with another vendor’s DRaaS offering.

In a typical third-party DRaaS hosting scenario, IT uses a virtual appliance to replicate the on-premises VMware environment so that instances of it can run in the host’s cloud. IT deploys the virtual appliance on-premises with the VMware vSphere Web Client, defines replication groups that cover either the full environment or selected VMs, and then runs the appliance to replicate the targeted VMs to the cloud.

The service should be capable of scaling from a single VM to an entire VMware network. When and if a disaster occurs, protected servers spin up in the host’s cloud. Authorized users can use the applications over the WAN, preferably using VPN for secure access.

Whether you use VMware’s own DR portfolio or opt for a third-party offering, or mix and match them, look for the following in VMware disaster recovery:

  1. Ease of deployment. Choose products that install easily and scale over time.
  2. Flexible and lower-cost alternatives. You don’t have to go all-VMware all the time. Choose VMware products or third-party offerings that best serve your environment.
  3. Simplified management. Even if you have specialized VMware admins on staff, don’t add unnecessary management complexity. Keep it simple.
  4. Planning and testing. Strategically build your VMware disaster recovery plans and don’t skimp on necessary investments. And don’t take your plan’s word for it – test, test, and test again. Consider investing in VMware disaster recovery tools that automate DR testing.