Can You Active Archive to the Cloud?


The cloud is a very attractive option for cost-effectively storing growing backup volumes and static archives. Nothing is perfect, of course: limited bandwidth, costs that grow along with the data, and eDiscovery searches all complicate cloud-based backup and static archiving. But these problems can largely be addressed with tools like WAN acceleration and with backup and archiving vendors' support for larger datasets. Storing active archives in the cloud, however, is a very different story.

Let’s first understand the distinctions between backup, static archives, and active archives.

  • Backup. Backup and restore depend on two things: the ability to quickly find the backup you need, and recovery speed that meets RPO and RTO service levels. You need sufficient bandwidth to back up and restore in a reasonable amount of time. Restores also affect expenses: although storing on the cloud is cheap, restoring from the cloud… not so much. Still, the financial case for storing backups in the cloud is a good one. As long as you do not restore large volumes too frequently, the cloud’s scalability and simple management still come out on the positive side of the balance sheet.
  • Archives. Although many companies use backup as a de facto archive, true archives store files that do not require the backup application to recover. Given the right permissions and an active connection, applications and users can access archived files as needed. The archive solves the problem of overloading production storage while keeping older files easily available. Archives are most suitable for required retention periods, which can reach decades in highly regulated industries and government.
    • Static archives. Static archives are usually kept for compliance purposes. For this type of archive, the cloud serves admirably. Static archives are not subject to the frequent access that sharply raises cloud costs. This makes the cloud, with its scalability and long retention periods, a good choice. And should IT need to access archival data, it is easily available without having to order archive tapes from offsite locations.
    • Active archives. Active archives are a different story altogether. These types of archives often constitute big data that retains business as well as legal and compliance value, and should remain reasonably present to applications and users. While the value of backup data plunges within a few months, archival data can retain its business value upwards of 10 years. Since active archives are often large and growing, storing these archives on production storage is costly and can impact application performance. IT usually stores them on nearline disk or tape for storage savings and accessibility.
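The bandwidth point in the backup bullet above can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names, the 80% link-efficiency default, and the sample sizes and link speeds are assumptions chosen for the example, not figures from any vendor.

```python
def transfer_hours(size_gb: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to move size_gb over a link of bandwidth_mbps.

    size_gb        -- data volume in decimal gigabytes (1 GB = 1000 MB)
    bandwidth_mbps -- nominal link speed in megabits per second
    efficiency     -- assumed fraction of nominal speed actually achieved
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    size_megabits = size_gb * 1000 * 8          # GB -> MB -> megabits
    seconds = size_megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


def meets_rto(size_gb: float, bandwidth_mbps: float, rto_hours: float,
              efficiency: float = 0.8) -> bool:
    """True if the estimated restore time fits inside the RTO window."""
    return transfer_hours(size_gb, bandwidth_mbps, efficiency) <= rto_hours


if __name__ == "__main__":
    # Hypothetical case: restore 1 TB with a 24-hour RTO.
    for link in (100, 1000):
        hours = transfer_hours(1000, link)
        print(f"{link} Mbps: {hours:.1f} h, meets 24 h RTO: {meets_rto(1000, link, 24)}")
```

Under these assumptions, a 1TB restore over a 100 Mbps link takes longer than a day, while the same restore over a 1 Gbps link finishes in a few hours. That is why bandwidth has to be sized against the RTO rather than against the backup window alone.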

How about the Cloud for Active Archives?

Nearline storage comes with its own set of capital and ongoing storage costs, and can be an expensive proposition for large data volumes. Purchase price, IT management time, rack space, and energy add up fast. Is the cloud a workable alternative for active archives like it is for backup and static archives?

The cloud is certainly attractive with its massive scalability and smooth management functions. However, there are two big issues around storing active archives in the cloud: performance and costs.


We’re not talking Amazon Glacier and cold storage here. Active archives must meet a nearline performance baseline in order to be useful to big data analysis. Although this level of performance is possible to achieve, it is far more expensive to do so. Even Google Nearline, which offers fast retrieval from cold storage, is not set up to cost-effectively provide frequent data access for analysis.


What are the likely costs to store active archives in the cloud? Transactional costs for big data volumes can be intense. For example, assume an active archive of just 10TB on S3. The monthly cost for storing the data is very reasonable, about $300. This is perfectly acceptable for cold storage. But when you add in transactional costs for a multi-terabyte active archive, you get a very different picture. Read requests (GET) at 30K IOPS can come in as high as $3,100 every 30 days, while write requests (PUT) at 10K IOPS are worse, averaging nearly $13,000 every 30 days.
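The arithmetic behind estimates like these can be sketched in a few lines. This is a minimal illustration, not a pricing tool. The storage rate of $0.03 per GB-month is simply what the $300 figure for 10TB implies (taking 10TB as 10,000GB). The per-request price and the duty_cycle parameter are hypothetical placeholders, because the request prices and access pattern behind the dollar amounts above are not stated. Substitute your provider's current price sheet before drawing conclusions.

```python
SECONDS_PER_30_DAYS = 30 * 24 * 60 * 60  # 2,592,000 seconds


def monthly_storage_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Flat capacity charge for 30 days of storage."""
    return size_gb * price_per_gb_month


def monthly_request_cost(requests_per_second: float,
                         price_per_1000_requests: float,
                         duty_cycle: float = 1.0) -> float:
    """Request charge over 30 days.

    duty_cycle is the assumed fraction of the month during which the
    archive actually sustains requests_per_second (1.0 = around the clock).
    """
    total_requests = requests_per_second * SECONDS_PER_30_DAYS * duty_cycle
    return total_requests / 1000 * price_per_1000_requests


if __name__ == "__main__":
    # Storage rate implied by the article's example: $300 for 10,000 GB.
    storage = monthly_storage_cost(10_000, 0.03)

    # HYPOTHETICAL request price and load, for illustration only.
    requests = monthly_request_cost(1_000, 0.01, duty_cycle=0.1)

    print(f"storage:  ${storage:,.2f} per 30 days")
    print(f"requests: ${requests:,.2f} per 30 days")
```

The point of the sketch is the shape of the two formulas. Storage cost depends only on capacity, while request cost scales linearly with both the request rate and the share of the month the archive is busy. For a static archive the second term is close to zero. For an active archive it can dwarf the first.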

The cloud is an excellent storage mechanism if you do not have to access it much. If your archive is static, you can skip the transactional costs – but if your users actively access big data archives, you will pay high transactional costs with no end in sight. For now, stay with nearline disk storage arrays or on-premises tape libraries for active archives. No doubt the cloud will step up to the active archive plate someday – but not today.