Disaster Recovery Horror Stories


When we hear about data loss with no backup to fall back on, most of us picture a careless consumer. Surely that couldn’t happen to a business, could it?

It could, and it does.

Oh, the Horror

These cautionary tales are true, and names have been changed to… you know the drill.

The Problem: Backing Up Only Files

This business faithfully backed up its files and nothing else: system configuration, applications, operating systems, patches, and more were all left to chance. In addition, its critical applications shared a single high-performance server with no redundancy.

One day the disk system failed so spectacularly that not even disk recovery was possible. The file backups were intact, but the company’s VAR had to build a new array, install the OS, install and patch every application, and only then restore the data. Downtime lasted two weeks despite the backup, and the VAR’s bill was well over $20,000.

The moral of the story: Restore is about the application, not just the data. Put bare metal recovery procedures in place, store VM images in a remote data center or the cloud, or both.

When People Are the Problem

A university owned a tape library. Every Monday, IT swapped out two dozen tapes and stored the backup set in an off-site vault. One employee was responsible for the process, and he and a second employee were supposed to validate the backups monthly and quarterly.

One day the email servers failed, and the rest of the IT staff came in to fix the problem. They set up and booted new servers, retrieved the latest tape set from the vault, and began the restore, only to find that the newest messages on tape were 10 months old. The employee in charge had dutifully swapped the tapes every week, but he and his partner had never validated the backups. The backup server had simply stopped working 10 months earlier, and no one knew. Fortunately, the IT team managed to recover current email data from the failed servers’ hard drives and saved the day. The erring employees were not so fortunate.

The moral of the story: Validate, validate, validate – and validate the people doing the validation.

The Problem: RAID Instead of Backup

A photography studio stored terabytes of digital content. They protected it with RAID 1 mirroring and believed that they did not need to back up their files. Then a drive failed. On its own, a single failed drive would not have doomed the mirror, but the owner mistakenly pulled the good drive instead of the failed one. The data was lost, and with no backup the studio had to hire a data recovery service to the tune of $25,000.

The moral of the story: RAID is an excellent redundancy measure, but it does not replace backup.

Where Laptops Go to Die

Laptop owners are notorious for not backing up the data on their hard drives, and bringing a laptop into the office is no guarantee of safety. In one case, a fire broke out on one floor of an office building. Firefighters quickly doused the flames, but the sprinkler system had already done its work: every exposed laptop was destroyed by fire or water. A few users had backed up their laptops regularly; most had not.

The moral of the story: Invest in network software that automatically backs up connected devices, and configure automated cloud backup for the times they are off the network.

Ultimately, three things are vital to ensuring disaster recovery:

  1. Back up without fail. You might back up to tape, disk, or the cloud, or replicate to a metro site or a site across the country. However you do it, do it, and include mobile devices like laptops.
  2. Validate the backup. Never assume that a backup is working. Choose backup software that polices itself with native validation. On top of that, validate the validation: test to make sure that the backup has occurred and that it is accurate.
  3. Test your disaster recovery plans. At least quarterly, run a DR test to make certain that you can restore not only data but also servers and applications, and measure how long the restore takes. If you can realistically restore within your service level requirements, great. If not, fine-tune your data protection infrastructure until you can.
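Part of item 2 can be automated. As a minimal sketch in Python, assuming a hypothetical layout where each backup run writes its files to one directory together with a manifest.json of SHA-256 checksums taken at backup time, the check below flags a backup that is older than an allowed age (the silent failure that went unnoticed for 10 months at the university) and any file that is missing or no longer matches its checksum. The names validate_backup, sha256_of, and MAX_AGE_SECONDS and the one-week age limit are illustrative assumptions, not features of any particular backup product.

```python
from __future__ import annotations

import hashlib
import json
import time
from pathlib import Path

# Hypothetical policy: a healthy backup job produces a new set at least weekly.
MAX_AGE_SECONDS = 7 * 24 * 3600


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backup files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_backup(backup_dir: Path, max_age: float = MAX_AGE_SECONDS,
                    now: float | None = None) -> list[str]:
    """Return a list of problems found; an empty list means every check passed.

    Assumed layout (hypothetical): backup_dir holds the backed-up files plus a
    manifest.json mapping each relative file name to the SHA-256 hex digest
    recorded when the backup ran. The manifest's modification time is treated
    as the time of the most recent backup.
    """
    manifest_path = backup_dir / "manifest.json"
    if not manifest_path.exists():
        return ["manifest.json missing: backup may never have run"]

    problems = []
    # Freshness: catch a backup job that has silently stopped running.
    now = time.time() if now is None else now
    age = now - manifest_path.stat().st_mtime
    if age > max_age:
        problems.append(f"latest backup is {age / 86400:.1f} days old")

    # Integrity: every listed file must exist and match its recorded checksum.
    manifest = json.loads(manifest_path.read_text())
    for name, expected in manifest.items():
        file_path = backup_dir / name
        if not file_path.exists():
            problems.append(f"missing file: {name}")
        elif sha256_of(file_path) != expected:
            problems.append(f"checksum mismatch: {name}")
    return problems
```

In practice you would run a check like this on a schedule and alert someone whenever the returned list is non-empty. Note its limits: a matching checksum shows only that the backup copy is intact, not that it restores into a working server and application, so it complements the quarterly DR test in item 3 rather than replacing it.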

Remember this adage: No matter how much you spend on disaster recovery, it will never cost as much as a catastrophic loss. Words to live by.