In a previous post, we referenced the importance of delivering data efficiency as a core part of the solution and not as a bolt-on. The first data efficiency technology we will review is compression. There are two types of compression: inline and post-processing. This post will review both. I have attempted to create formulas that clarify the differences and make it simpler to compare compression options, as well as other data efficiency technologies. Let’s set the formulas aside for a moment; we will come back to them later in a discussion of how they impact hyperconverged infrastructure.
Compression technologies are designed to reduce the size of any given data element. Certain types of data, such as video, audio, or compressed data, cannot be further compressed. Other data, such as text files or even databases, compress extremely well. The question commonly asked is “how much benefit will I get from compression?” Unfortunately, there is no practical way to know how well the data will compress without actually trying to compress it, which consumes resources.
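To make this concrete, here is a minimal sketch (not from the original post) using Python’s standard `zlib` library. Repetitive text shrinks dramatically, while random bytes, which behave like already-compressed media, do not shrink at all; the only way to know the ratio is to actually run the compressor:

```python
import os
import zlib

# Highly repetitive text compresses extremely well.
text = b"the quick brown fox jumps over the lazy dog " * 1000
compressed_text = zlib.compress(text)
print(f"text:   {len(text)} -> {len(compressed_text)} bytes "
      f"({len(text) / len(compressed_text):.1f}x)")

# Random bytes (a stand-in for video, audio, or already-compressed data)
# barely shrink; zlib's framing can even make the output slightly larger.
random_data = os.urandom(len(text))
compressed_random = zlib.compress(random_data)
print(f"random: {len(random_data)} -> {len(compressed_random)} bytes "
      f"({len(random_data) / len(compressed_random):.2f}x)")
```

Note that answering the "how much will I save?" question required spending the CPU cycles to compress both buffers, which is exactly the trade-off discussed below.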
Compression is a resource-intensive operation that uses a significant amount of CPU. It also takes time to run, so it inherently adds latency. This is one of the trade-offs that must be considered when evaluating the benefits of compression. This is further complicated in a virtualized environment where the exact resource allocation (CPU, in this case) is far more complex (effectively impossible) to predict. This results in a response time for compression that is unpredictable. This is not a good thing for mission-critical systems providing storage services.
It is worth noting that decompression is a much lighter operation and uses a relatively small amount of CPU. We have all likely experienced this on our personal systems. Compressing a file with gzip or WinZip takes much longer than uncompressing the same content.
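The asymmetry is easy to measure. The sketch below (my own illustration, not from the original post) times `zlib` at its highest compression level against the matching decompression of the same buffer; on typical hardware the compression pass is several times slower:

```python
import time
import zlib

# ~12 MB of repetitive text as a stand-in for compressible application data.
data = b"log entry: user=alice action=write status=ok latency_ms=12\n" * 200_000

start = time.perf_counter()
compressed = zlib.compress(data, level=9)  # maximum-effort compression
compress_s = time.perf_counter() - start

start = time.perf_counter()
restored = zlib.decompress(compressed)     # decompression is a far lighter pass
decompress_s = time.perf_counter() - start

assert restored == data
print(f"compress:   {compress_s:.4f}s")
print(f"decompress: {decompress_s:.4f}s")
```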
Unfortunately, there is no free lunch in data efficiency. There is always a trade-off. Ideally, a plentiful resource is invested to save a more limited resource. In the case of inline compression, CPU resources are exchanged for a reduction in the size of the data written to disk and a reduction in the HDD capacity required to store the data.
Let’s come back to the equations I mentioned earlier. The goal of these equations is to make it clear what the trade-offs are when considering the different technical solutions. All of the solutions have trade-offs and this allows us to compare them objectively and quantitatively.
Inline compression is the process of compressing data prior to it being written to HDD.
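The original post rendered the inline-compression equation as an image that has not survived here. A plausible plain-text reconstruction, based on the description that follows, is:

```
Data + inline compression (CPU cycles)
    →  ↓ HDD write IOPS  +  ↓ HDD capacity  +  ↑ latency (↓ application IOPS)
```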
In the equation above, inline compression technology and CPU resources are applied to the data, and the result is the right-hand side of the equation. CPU cycles are consumed, and the payoff is a decrease in the consumption of both HDD IOPS and HDD capacity. Unfortunately, consuming those CPU cycles can increase latency, and that latency drives a reduction in the total number of IOPS available to the application.
Post-processing compression has a different set of trade-offs. It writes the data to HDD first, then reads it back at a later time to compress it. This results in a very different equation, one that does not reduce HDD IOPS, among the most precious resources in any system providing storage services. In fact, just the opposite occurs: post-processing compression commonly requires more than double the IOPS. After the initial full-size write, additional HDD IOPS are consumed to read the data back, CPU cycles are consumed to compress it, and still more HDD IOPS are consumed to write the compressed data back to disk. The system still benefits from the capacity savings eventually, but the trade-offs have changed.
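In the same hedged equation form (the original image is not available here), the post-processing trade-off can be sketched as:

```
Data  →  full-size HDD write (no IOPS savings up front)

(later)
HDD read IOPS + CPU cycles + HDD write IOPS  →  ↓ HDD capacity (eventually)
```

Every byte is written once at full size, read back, and written again compressed, which is why the total IOPS cost more than doubles.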
How does compression impact hyperconverged infrastructure? Any hyperconverged infrastructure system leverages its CPU resources for both business applications as well as data center infrastructure applications (storage, backup, replication, etc.). This means that any CPU cycle that is devoted to running the infrastructure components is not available to run the business applications. This limits the number of business applications that can run on any given host and increases the cost per VM. That is obviously not a good trade-off. Let’s assume for a moment that it is valuable to reduce the amount of HDD IOPS and WAN bandwidth consumed. How can we get these benefits without consuming all of these Intel CPU cycles and decreasing system performance?
This leaves us with a lose-lose set of options. Either sacrifice system performance or consume unnecessary HDD IOPS and capacity.
How does SimpliVity deliver the benefits without this CPU consumption and IO latency challenge?
The SimpliVity OmniStack Accelerator is the special sauce that literally changes the equation (it is rare that I get to use the word “literally” in a technically accurate fashion). The OmniStack Accelerator offloads all of the work traditionally performed on the Intel CPU. This allows SimpliVity to deliver all the benefits of inline compression while leaving more CPU cycles available to run business applications. It is also completely predictable.
What does the SimpliVity equation look like?
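The original post answered this with an equation image. Based on the surrounding text, a plausible reconstruction is:

```
Data + OmniStack Accelerator (offloaded compression)
    →  ↓ HDD IOPS  +  ↓ HDD capacity  +  no Intel CPU cycles consumed  +  predictable latency
```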
SimpliVity delivers all the benefits of inline compression without any of the downsides by leveraging the OmniStack Accelerator. SimpliVity uniquely offers the benefits of data efficiency while increasing system performance. Perhaps the OmniStack Accelerator delivers that free lunch after all.
Stay tuned for a review of the complexities of deduplication and how SimpliVity addresses them in the next post.