What’s The Difference Between Inline and Postprocess Deduplication?

It’s a common scenario that we’ve discussed before. IT budgets are frozen or shrinking, and data growth is rising at a faster rate than storage prices are dropping. Simply throwing new hardware at this problem won’t make it go away. Instead, we need to be smarter about how we use and manage our data.

In order to deal with an exponential problem, we need an exponential solution.

This is partly why everyone is talking about the convenience and practicality of deduplication. Deduplication technology allows your backup system to eliminate all duplicate or redundant content from your backups, and instead replace this information with a pointer to the original file.

If a file is found to be 100% original, it is written directly to the backup device. But if the file is a duplicate, a placeholder called a “pointer” is written to a listing called a “hash table” instead. In the event of a restore, the pointers in the hash table are followed to copy the duplicate data back into place and “re-inflate” the full backup.
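To make that concrete, here’s a minimal sketch of hash-based deduplication in Python. The names (`DedupStore`, `backup`, `restore`) are illustrative, not any real product’s API: unique content is written to simulated “disk” storage, duplicates only get a pointer entry, and a restore re-inflates everything from those pointers.

```python
import hashlib

class DedupStore:
    """Toy dedup store: unique content on 'disk', pointers for everything."""

    def __init__(self):
        self.storage = {}     # content hash -> file bytes (the backup device)
        self.hash_table = []  # pointer listing: (file name, content hash)

    def backup(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.storage:
            # 100% original content: write it to the backup device.
            self.storage[digest] = data
        # Original or duplicate, record a pointer in the hash table.
        self.hash_table.append((name, digest))

    def restore(self):
        # Follow the pointers to "re-inflate" the full backup.
        return {name: self.storage[digest] for name, digest in self.hash_table}


store = DedupStore()
store.backup("report.doc", b"quarterly numbers")
store.backup("report_copy.doc", b"quarterly numbers")  # duplicate content
print(len(store.storage))    # 1 -- only one physical copy is stored
print(len(store.restore()))  # 2 -- but both files come back on restore
```

The key idea is that the hash table grows with the number of files, while the actual storage only grows with the amount of *unique* content.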

In some cases, this is enough to decrease your storage needs by 80% or more.

One of the great things about data deduplication as a storage strategy is that it scales well: the more data you store, the better it works.

When it comes to data deduplication, there are generally two methods that you can choose from:

  • Inline
  • Postprocess

With inline deduplication, all of the work happens in memory, before anything touches the disk. The data is analyzed and deduplicated as it first comes in, and then it’s either written to disk (if it’s an original file) or a pointer is added to the hash table (if it’s a duplicate).

In the case of postprocess deduplication, the files are first written to disk in their entirety. Once the files are written, the storage is scanned for duplicates, which are replaced with pointers so the space can be reclaimed.
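A postprocess scan might look something like this sketch, where a plain dict stands in for the disk and file names like `a.txt` are made up for the example. Everything lands on disk first; a later pass hashes each file, deletes the duplicates, and leaves pointers behind.

```python
import hashlib

# Simulated disk: every file was written in full at backup time.
disk = {
    "a.txt": b"same bytes",
    "b.txt": b"same bytes",    # duplicate of a.txt
    "c.txt": b"unique bytes",
}

def postprocess_dedup(disk):
    """Scan already-written files, replacing duplicates with pointers."""
    seen = {}      # content hash -> first file holding that content
    pointers = {}  # duplicate file -> the original it now points at
    for name, data in list(disk.items()):
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            pointers[name] = seen[digest]  # swap the copy for a pointer
            del disk[name]                 # reclaim the duplicate's space
        else:
            seen[digest] = name
    return pointers

pointers = postprocess_dedup(disk)
print(pointers)      # {'b.txt': 'a.txt'}
print(sorted(disk))  # ['a.txt', 'c.txt']
```

Notice the trade-off this implies: the duplicate data briefly occupies full space on disk until the scan runs, which is exactly what inline deduplication avoids.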

In other words, inline happens BEFORE the files are written and postprocess happens AFTER. Each approach has its benefits and its drawbacks. But we’ll have to save that debate for another post.

For now, at least you have a good idea of how both deduplication methodologies work.