When talking about data protection, there is often some confusion around the definitions and roles of backup vs. archiving. So I’d like to give a few short examples to illustrate how the two are not only different, but complementary.
There are 2 ways to think about the concept of Archiving.
- Archiving is a tool to improve server performance
- Archiving is a data protection tool
Let’s start by looking at the first example, since this is where most people get confused.
Archiving for Server Performance
Let’s suppose that I showed you 2 vehicles that each cost $100,000. One is a bulldozer, and the other is a sports car. Which would you say is of higher quality? Of course, that’s a trick question. The quality of the vehicle depends entirely on its ability to meet its intended purpose.
Your primary server is a bit like the high-end sports car.
It’s designed to get work done very quickly. But in order to have the best performance, you need to remove any excess weight. This might mean taking extra weight out of the trunk (such as luggage or the spare tire), making sure you ride alone without any extra passengers, and (if this is a short trip) only filling the gas tank part way. Every unnecessary pound you shave will improve the car’s performance.
In much the same way, your server accumulates a lot of files over time. And most of these files are either rarely used, duplicates, or completely obsolete. Storing this kind of data on your primary server will slow it down and decrease its search speed and performance.
In order to maintain optimal performance of your primary servers, you need to regularly take older or less active data and move it off to another archival storage system.
Archival storage is like a bulldozer. It’s big, slow and clunky, but it can handle very heavy loads without a problem.
If someone needs an old log file from the archival storage, the search process might require sifting through multiple terabytes of data. Having a separate archival system in place allows this laborious search to be performed in complete isolation… without slowing down or otherwise affecting the performance of the critical primary live server.
This is similar to how your laptop uses slow disk drives for storage, and fast RAM storage for live working data… but on a much larger scale.
Much in the same way that you should go through your closet once a year and throw out any clothes that you no longer wear… you should go through your computer every six months and archive data that is no longer being accessed.
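As a rough illustration of that six-month sweep, here is a minimal Python sketch. It is only one way to do it, not a recommended tool: the function names, the 182-day cutoff, and the directory layout are all assumptions made for the example. It treats a file as "stale" when neither its last-access nor its last-modified timestamp falls inside the window, then moves it to a separate archive location while keeping its relative path.

```python
import shutil
import time
from pathlib import Path

SIX_MONTHS_DAYS = 182  # assumption: "six months" approximated as 182 days


def find_stale_files(root, max_age_days=SIX_MONTHS_DAYS, now=None):
    """Return files under root not read or modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        # Access times are not always updated (e.g. noatime mounts),
        # so take the later of access time and modification time.
        last_used = max(st.st_atime, st.st_mtime)
        if last_used < cutoff:
            stale.append(path)
    return sorted(stale)


def archive_files(files, root, archive_root):
    """Move files into archive_root, preserving their path relative to root."""
    moved = []
    for src in files:
        dest = Path(archive_root) / Path(src).relative_to(root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

A typical call would be `archive_files(find_stale_files(root), root, archive_root)`, with `archive_root` located outside `root` so that archived files are not picked up again on the next sweep.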
Archiving For Data Protection
The biggest difference between backup and archiving is that:
- Backups give you multiple versions of files so that you can recover from a previous time frame
- Archives only store a single version of the file since archival files will probably never change
The 2 biggest problems associated with enterprise backups are recovery times and backup windows. It should be pretty simple to see that the easiest way to improve both of these processes would be to back up less data.
That’s why you should only be backing up the most critical data from your primary server on a daily/continuous basis. Data that doesn’t change on a regular basis (videos, log files, emails, pictures, scanned documents, system images, etc.) should NOT be protected as part of your regular backup routine.
Since this data never changes, it should be protected using archival storage and moved off of the primary backup cycle. By splitting up your data protection in this manner, you can often archive up to 80% of your data… leading to significantly faster backup/recovery times.
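To make the split concrete, here is a small Python sketch. Everything specific in it is an assumption for illustration: the file extensions are stand-ins for the static data types mentioned above, and the inventory sizes are invented so the arithmetic is easy to follow. It sorts files into a backup set and an archive set, then reports what share of the total data volume ends up in the archive.

```python
from pathlib import PurePath

# Assumption: extensions standing in for static data types
# (videos, logs, emails, pictures, scanned documents, system images).
ARCHIVE_EXTENSIONS = {".mp4", ".log", ".eml", ".jpg", ".png", ".pdf", ".iso"}


def split_protection(files):
    """Split (name, size) pairs into a backup set and an archive set."""
    backup, archive = [], []
    for name, size in files:
        is_static = PurePath(name).suffix.lower() in ARCHIVE_EXTENSIONS
        (archive if is_static else backup).append((name, size))
    return backup, archive


def archived_share(backup, archive):
    """Fraction of the total data volume that lands in the archive set."""
    archived = sum(size for _, size in archive)
    total = archived + sum(size for _, size in backup)
    return archived / total if total else 0.0


# Hypothetical inventory; sizes are made-up units for the example.
inventory = [
    ("orders.db", 150),
    ("budget.xlsx", 50),
    ("training.mp4", 500),
    ("app.log", 200),
    ("scan_0001.pdf", 100),
]

backup, archive = split_protection(inventory)
print(f"Archived share: {archived_share(backup, archive):.0%}")  # Archived share: 80%
```

In this made-up inventory only 200 of 1,000 units stay in the daily backup cycle, which is where the shorter backup window and faster recovery come from. A real classification would more likely be driven by how often files actually change than by extension alone.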
Another advantage of archival storage is that it can be optimized for rapid search and legal compliance. But I won’t go into this topic since it can get very complicated.
This should hopefully give you a good idea about the difference between backups and archives. Because corporate data is growing very quickly, I predict that archiving will become much more important within the next few years.