What is Forward-Referencing Deduplication?

Forward-referencing deduplication is still a fairly new technology, but it’s been getting a lot of attention lately. The main benefit of this methodology is faster restores of recent backups, because the newest backup needs little or no reassembly (often called rehydration) before it can be recovered.

Traditional deduplication methods are optimized for restoration of the oldest full backup. If you perform full backups on a monthly basis, a 30-day-old backup would restore much more efficiently than a backup from yesterday evening. This is because the deduplication process is based on the original backup, and all subsequent backup deduplications were “seeded” from this original.
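To make that concrete, here is a minimal Python sketch of a traditional store. It is a toy model rather than any vendor's implementation: the class name `BackwardDedupStore`, the one-byte "chunks", and the three sample backups are all assumptions made for illustration. Each unique chunk stays with the backup that first wrote it, and the restore counts how many chunks have to be fetched from some other backup.

```python
import hashlib


def chunk_id(chunk: bytes) -> str:
    """Identify a chunk by its SHA-256 digest."""
    return hashlib.sha256(chunk).hexdigest()


class BackwardDedupStore:
    """Toy store (hypothetical): a chunk lives with the backup that first wrote it."""

    def __init__(self):
        self.chunks = {}   # chunk id -> (index of owning backup, chunk bytes)
        self.recipes = []  # one ordered list of chunk ids per backup

    def backup(self, chunks):
        index = len(self.recipes)
        recipe = []
        for chunk in chunks:
            cid = chunk_id(chunk)
            # Only the first backup to see a chunk stores it; later ones reference it.
            self.chunks.setdefault(cid, (index, chunk))
            recipe.append(cid)
        self.recipes.append(recipe)
        return index

    def restore(self, index):
        """Return (chunks, number of chunks fetched from another backup)."""
        data, remote = [], 0
        for cid in self.recipes[index]:
            owner, chunk = self.chunks[cid]
            if owner != index:
                remote += 1
            data.append(chunk)
        return data, remote


# Made-up sample data: three full backups that change slightly each time.
day1 = [b"A", b"B", b"C", b"D"]
day2 = [b"A", b"B", b"C", b"E"]
day3 = [b"A", b"B", b"F", b"E"]

store = BackwardDedupStore()
for day in (day1, day2, day3):
    store.backup(day)

oldest, oldest_remote = store.restore(0)
newest, newest_remote = store.restore(2)
print("oldest backup, chunks fetched from other backups:", oldest_remote)
print("newest backup, chunks fetched from other backups:", newest_remote)
```

In this toy data set, restoring the oldest backup touches no other backup's chunks, while restoring the newest one pulls three of its four chunks from earlier backups.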

This becomes a problem because most backup recoveries don’t go more than one or two versions into the past. The backups you restore most often are therefore the ones that need the most reassembly, which creates a lot of extra work for the system and leads to slower (and more costly) recoveries.

Forward-referencing deduplication takes the opposite approach. Each time a full backup completes, that most recent version becomes the “seed”, and all of the older versions are deduplicated against it.
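The idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a description of how any product works: the class name `ForwardDedupStore`, the one-byte "chunks", and the sample backups are hypothetical, and the sketch does not deduplicate older versions against one another as a real system presumably would. The newest backup is kept whole, and every older version keeps only the chunks that the newest one lacks.

```python
import hashlib


def chunk_id(chunk: bytes) -> str:
    """Identify a chunk by its SHA-256 digest."""
    return hashlib.sha256(chunk).hexdigest()


class ForwardDedupStore:
    """Toy store (hypothetical): the newest backup is the seed and is kept whole."""

    def __init__(self):
        self.newest = None  # chunk list of the most recent backup, stored in full
        self.older = []     # (recipe, local chunks) per older version, oldest first

    def count(self):
        return len(self.older) + (1 if self.newest is not None else 0)

    def backup(self, chunks):
        chunks = list(chunks)
        # Rebuild every existing version before the seed changes.
        previous = [self.restore(i)[0] for i in range(self.count())]
        seed_ids = {chunk_id(c) for c in chunks}
        self.older = []
        for version in previous:
            recipe = [chunk_id(c) for c in version]
            # Keep only the chunks that the new seed does not contain.
            local = {chunk_id(c): c for c in version if chunk_id(c) not in seed_ids}
            self.older.append((recipe, local))
        self.newest = chunks

    def restore(self, index):
        """Return (chunks, number of references followed into the seed)."""
        if index == self.count() - 1:
            return list(self.newest), 0  # newest version: nothing to reassemble
        recipe, local = self.older[index]
        seed = {chunk_id(c): c for c in self.newest}
        data, refs = [], 0
        for cid in recipe:
            if cid in local:
                data.append(local[cid])
            else:
                data.append(seed[cid])
                refs += 1
        return data, refs


# Made-up sample data: three full backups that change slightly each time.
day1 = [b"A", b"B", b"C", b"D"]
day2 = [b"A", b"B", b"C", b"E"]
day3 = [b"A", b"B", b"F", b"E"]

store = ForwardDedupStore()
for day in (day1, day2, day3):
    store.backup(day)

newest, newest_refs = store.restore(2)
oldest, oldest_refs = store.restore(0)
print("newest backup, references followed:", newest_refs)
print("oldest backup, references followed:", oldest_refs)
```

Here the newest backup restores without following any references, while the oldest has to pull two of its four chunks from the seed. Note that `backup()` rebuilds every older version each time a new full backup arrives, so the cost moves from restore time to backup time.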

Deduplicating in this way requires a lot more work, because the older versions have to be reprocessed against each new full backup. But when it comes time to restore in an emergency, all of your most recent data is already stored in full and ready to be transferred. Since downtime is usually much more expensive than backup window time, this is often a good trade.

If your company has been evaluating data deduplication as a means of controlling rapid data growth, but you’ve been hesitating because of potentially long restoration times… then forward-referencing deduplication might be the right solution for you.