In computing, data deduplication is a technique for eliminating duplicate copies of repeating data. A related and somewhat synonymous term is single-instance (data) storage. This technique is used to improve storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent.

In the deduplication process, unique chunks of data, or byte patterns, are identified and stored during an analysis pass. As the analysis continues, other chunks are compared to the stored copy, and whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Because the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency depends on the chunk size), the amount of data that must be stored or transferred can be greatly reduced.

Deduplication is different from data compression algorithms such as LZ77 and LZ78. Whereas compression algorithms identify redundant data inside individual files and encode that redundant data more efficiently, deduplication inspects large volumes of data, identifies large identical regions (such as entire files or large portions of files), and replaces them with a shared copy.

For example, a typical email system might contain 100 instances of the same 1 MB (megabyte) file attachment. Each time the email platform is backed up, all 100 instances of the attachment are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; the subsequent instances are referenced back to the saved copy, for a deduplication ratio of roughly 100 to 1.

Deduplication is often paired with data compression for additional storage savings: deduplication is first used to eliminate large chunks of repetitive data, and compression is then used to efficiently encode each of the stored chunks.
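To make the chunk-and-reference process concrete, here is a minimal Python sketch of a single-instance store. It assumes fixed-size 64 KiB chunks, uses a SHA-256 digest as each chunk's identity, and compresses each unique chunk with zlib to mirror the deduplicate-then-compress pairing. The `DedupStore` class and its methods are hypothetical names for illustration, not the API of any real backup product.

```python
import hashlib
import os
import zlib


class DedupStore:
    """Hypothetical toy single-instance store (fixed-size chunks keyed by SHA-256)."""

    def __init__(self, chunk_size: int = 64 * 1024) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}     # digest -> compressed copy of a unique chunk
        self.files: dict[str, list[str]] = {}  # file name -> ordered list of chunk references
        self.logical_bytes = 0                 # bytes callers asked us to store
        self.unique_bytes = 0                  # bytes of unique chunks, before compression

    def put(self, name: str, data: bytes) -> None:
        refs: list[str] = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # First time this byte pattern is seen: keep one (compressed) copy.
                self.chunks[digest] = zlib.compress(chunk)
                self.unique_bytes += len(chunk)
            # Duplicate or not, the file only records a small reference to the chunk.
            refs.append(digest)
        self.files[name] = refs
        self.logical_bytes += len(data)

    def get(self, name: str) -> bytes:
        """Rebuild a file by following its references back to the stored chunks."""
        return b"".join(zlib.decompress(self.chunks[d]) for d in self.files[name])

    def dedup_ratio(self) -> float:
        """Logical bytes written divided by unique bytes kept (1.0 if nothing stored yet)."""
        if self.unique_bytes == 0:
            return 1.0
        return self.logical_bytes / self.unique_bytes

    def stored_bytes(self) -> int:
        """Bytes actually kept after deduplication and per-chunk compression."""
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    attachment = os.urandom(1_000_000)  # stand-in for one 1 MB email attachment
    store = DedupStore()
    for i in range(100):
        store.put(f"mailbox-{i}/attachment.bin", attachment)

    print(f"logical bytes: {store.logical_bytes}")
    print(f"unique bytes:  {store.unique_bytes}")
    print(f"dedup ratio:   {store.dedup_ratio():.0f}:1")
    assert store.get("mailbox-42/attachment.bin") == attachment
```

Run as a script, it stores 100 copies of the attachment from the email example and should report a ratio of 100:1, because every copy maps to the same set of chunk digests. Since `os.urandom` produces incompressible data, `stored_bytes()` will come out slightly larger than `unique_bytes` here; real attachments may compress better. Fixed-size chunking is the simplest scheme, but inserting a single byte near the start of a file shifts every later chunk boundary, which is why many deduplicating systems use content-defined (variable-size) chunking instead.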

  1. deltadata

    Active Business Backup and win 10 music files & deduplication

    I think I want to use Active Backup for Business as my Windows 10 PC backup tool. 1/3 of my files are music, many have the same file name and song name, but are different versions and in some cases, the only way to know is in the file details(tags), they usually have different file dates, but...
  2. S

    Hyper Backup to USB - no deduplication?

    I'm doing a Hyper backup to USB drive (dest: ext4, compressed, encrypted, no backup rotation, 8TB drive) for offsite, about 5 TB data. When I do the initial backup, it takes a long time (expected). When I do the second backup on that same job, it again takes a long time, and then runs out of...
  3. paradeiser

    Deduplication on HyperBackup target?

    Hi, I use C2 for daily off-site backups - which works great (including versioning and deduplication). As a second layer I have weekly Backups on an external USB HD as single-version (for quick access to the full amount of data if something breaks). There is a lot of redundant data, as I also...