What is Data Deduplication?
Data deduplication technology identifies and eliminates redundant data segments so that what is written to disk consumes significantly less storage capacity. Deduplication first became popular in backup and archiving processes on secondary and tertiary storage tiers, because backup and archive data contains a lot of redundancy and deduplicating it there does not affect the performance of production applications.
Deduplication is a resource-intensive process. It analyzes all incoming data for duplicates of what already exists in the system. The system compares each block of data in the data set against an index of previously stored blocks. When no match exists, the system writes the block to disk. When a match exists, the system stores in the index only a pointer to the existing copy of that block on disk, which takes up significantly less space than the block itself. Performing this activity in real time is the challenge. The smaller the block size, the more blocks there are to compare, and the greater the number of I/O operations per second (IOPS) required.
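The compare-then-write-or-point loop described above can be sketched in a few lines of Python. This is a minimal illustration only, not any vendor's implementation: the DedupStore class, its fixed-size blocks, the SHA-256 fingerprints and the in-memory dictionary standing in for the disk are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy block store: one physical copy per unique block, pointers for repeats."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.index = {}    # fingerprint -> block bytes (stands in for the disk)
        self.lookups = 0   # number of index comparisons performed

    def write(self, data):
        """Split data into fixed-size blocks and return one pointer per block."""
        pointers = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            self.lookups += 1
            if fingerprint not in self.index:
                # No match: this block is new, so it is written.
                self.index[fingerprint] = block
            # Match or not, the data set only records a pointer.
            pointers.append(fingerprint)
        return pointers

    def read(self, pointers):
        """Rebuild the original data by following the pointers."""
        return b"".join(self.index[p] for p in pointers)

    def physical_bytes(self):
        return sum(len(block) for block in self.index.values())


# 16 KiB of logical data containing only two distinct 4 KiB patterns.
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096

coarse = DedupStore(block_size=4096)
coarse_pointers = coarse.write(data)

fine = DedupStore(block_size=1024)
fine_pointers = fine.write(data)

print(len(data), coarse.physical_bytes())  # logical vs. physical bytes
print(coarse.lookups, fine.lookups)        # smaller blocks mean more comparisons
```

Running the same data through the 1 KiB store performs four times as many index lookups as the 4 KiB store, which is the block-size trade-off the paragraph describes.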
Deduplication can occur inline or post-process. Inline deduplication identifies and eliminates duplicates as data is written to disk, whereas post-process deduplication removes duplicates at some point after the data has been written. This is the source of the concern about inline deduplication on primary or production storage: it requires more processing power in the write path, which could affect storage performance and, in turn, the performance of production applications.
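The difference between the two modes can be shown with a small Python sketch. It is a simplified model under stated assumptions: inline_write and post_process_write are hypothetical helper names, blocks are fingerprinted with SHA-256, and a dictionary and a list stand in for the disk and the staging area.

```python
import hashlib


def fingerprint(block):
    return hashlib.sha256(block).hexdigest()


def inline_write(blocks):
    """Deduplicate in the write path: duplicates never reach the disk."""
    disk = {}
    for block in blocks:
        # The hashing and lookup cost is paid on every incoming write.
        disk.setdefault(fingerprint(block), block)
    peak_blocks = len(disk)
    return disk, peak_blocks


def post_process_write(blocks):
    """Write everything first, then deduplicate in a later pass."""
    staging = list(blocks)       # raw writes land with duplicates included
    peak_blocks = len(staging)   # capacity needed before the cleanup pass
    disk = {}
    for block in staging:
        disk.setdefault(fingerprint(block), block)
    return disk, peak_blocks


blocks = [b"os-image", b"os-image", b"app-data", b"os-image"]
inline_disk, inline_peak = inline_write(blocks)
post_disk, post_peak = post_process_write(blocks)

print(inline_peak, post_peak)
```

Both paths end with the same two unique blocks on disk. The inline path never holds more than those two, but it does its fingerprinting work while the application is waiting on the write; the post-process path defers that work at the cost of temporarily storing all four blocks.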
Deduplication at the “source” of data delivers end-to-end efficiency. Data efficiency starts at inception and remains throughout the data lifecycle. Once data is in its deduplicated, compressed and optimized state, it stays that way. This introduces capacity, bandwidth and performance savings as SimpliVity more efficiently stores and moves data within and across data centers and the public cloud.
Data Deduplication on Production Storage
Deduplicating data on the primary or production tier offers several benefits. First, it reduces the footprint of data on storage, which reduces the amount of capacity required. Second, and perhaps more important in a post-virtualization data center, deduplication reduces the IOPS required, which improves performance.
Today, deduplication can also occur on production storage. Technology advancements allow deduplication to occur as close to the inception of data as possible. SimpliVity handles the compute-intensive tasks of deduplicating, compressing and optimizing data in real time (inline), while dramatically lowering costs when compared with competitive solutions.
SimpliVity includes a PCIe module, the OmniCube Accelerator, to process all writes and manage compute-intensive processes. OmniCube writes data to the datastore via the OmniCube Accelerator, offloading processing that would otherwise impact performance.
SimpliVity also increases the opportunity to locate more duplicates by comparing incoming data with the largest data repository. Federating OmniCube building blocks as a single resource pool enables high availability and resource sharing. It also creates a single data pool for deduplication comparison.
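Why a single, larger comparison pool finds more duplicates can be illustrated with a short Python sketch. It is a conceptual model only and does not represent how OmniCube federation is implemented: the function names, the SHA-256 fingerprints and the sample block contents are all assumptions made for the example.

```python
import hashlib


def unique_fingerprints(blocks):
    return {hashlib.sha256(block).hexdigest() for block in blocks}


def stored_per_node(nodes):
    """Each node deduplicates only against its own data."""
    return sum(len(unique_fingerprints(blocks)) for blocks in nodes)


def stored_federated(nodes):
    """All nodes deduplicate against one shared pool."""
    pooled = [block for blocks in nodes for block in blocks]
    return len(unique_fingerprints(pooled))


# Three hypothetical nodes that all hold the same OS block.
nodes = [
    [b"os", b"db"],
    [b"os", b"web"],
    [b"os", b"db"],
]

isolated_count = stored_per_node(nodes)
federated_count = stored_federated(nodes)
print(isolated_count, federated_count)
```

With isolated indexes, the shared OS block is stored once per node; with a pooled index it is stored once in total, so the pooled count is never higher than the isolated one.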
Enhanced Data Deduplication
Beyond deduplication, SimpliVity provides two further processes to reduce capacity and IOPS requirements: compression and optimization. Deduplication eliminates redundant data by comparing a data set against the whole available data repository. Compression finds and eliminates redundancy inside a data set, relative only to other data within that same data set. Compression complements deduplication and provides an added measure of efficiency.
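How the two techniques stack can be sketched in Python: deduplication removes blocks already present anywhere in the repository, then compression shrinks each remaining unique block using only its own contents. This is an illustrative sketch under stated assumptions, not SimpliVity's pipeline: the store_datasets function, the 4 KiB block size, SHA-256 fingerprints and zlib as the compressor are choices made for the example.

```python
import hashlib
import zlib


def store_datasets(datasets, block_size=4096):
    """Deduplicate across all data sets, then compress each unique block."""
    repository = {}  # fingerprint -> compressed block
    for data in datasets:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in repository:
                # Compression only sees this one block's contents.
                repository[fp] = zlib.compress(block)
    return repository


# Two hypothetical data sets that share their first 4 KiB block.
vm1 = b"abcd" * 1024 + b"wxyz" * 1024
vm2 = b"abcd" * 1024 + b"1234" * 1024

repository = store_datasets([vm1, vm2])

logical_bytes = len(vm1) + len(vm2)
dedup_only_bytes = 4096 * len(repository)
stored_bytes = sum(len(c) for c in repository.values())
print(logical_bytes, dedup_only_bytes, stored_bytes)
```

Deduplication drops the shared block, and compression then reduces each of the three surviving blocks because their contents are internally repetitive. Real data compresses far less neatly than these synthetic patterns.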
SimpliVity further optimizes processing by making real-time decisions about storing data to improve storage efficiency, performance and bandwidth usage. OmniCube is operating system- and virtualization-aware, and able to identify and eliminate the processing of unnecessary data.