One of the most important contributors to performance in a storage system is the cache subsystem. Even all-flash arrays utilize RAM-based cache to accelerate performance. Cache reduces latency and increases throughput by placing the most frequently used blocks of data on a faster tier of storage (typically SSD or RAM).
Cache is More Efficient When Deduplicated
In the SimpliVity OmniStack Data Virtualization Platform, all data is deduplicated at inception, once and forever, across all stages of the data lifecycle. The data management layer of the Data Virtualization Platform tracks, in metadata, every individual reference to each unique block that has been written to the hard disk drives (HDDs).
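To make the idea concrete, here is a minimal sketch of that kind of metadata layer. The class and field names are illustrative assumptions, not OmniStack internals: unique blocks are identified by a content fingerprint, and the metadata records every logical reference to each fingerprint.

```python
import hashlib

class DedupMetadata:
    """Toy model of a metadata layer that tracks every logical
    reference to each unique block stored on disk.
    Illustrative only; not the OmniStack implementation."""

    def __init__(self):
        # block fingerprint -> set of (owner, offset) references
        self.refs = {}

    def write(self, owner, offset, block: bytes):
        # Fingerprint the content; identical blocks map to one entry.
        fp = hashlib.sha256(block).hexdigest()
        self.refs.setdefault(fp, set()).add((owner, offset))
        return fp

# Two VMs writing identical content yield one unique block, two references.
meta = DedupMetadata()
fp1 = meta.write("vm-a", 0, b"shared OS block")
fp2 = meta.write("vm-b", 0, b"shared OS block")
print(fp1 == fp2, len(meta.refs[fp1]))  # True 2
```

Because references are tracked per unique block, the platform always knows how many logical copies a single physical block represents.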
This deduplication extends to data stored in cache. When a block is placed into cache, it is read from the hard disk drive (HDD), incurring a read IO, and a copy is written to the SSDs. Although the block may have been retrieved for a single request from a single VM, it could be requested again by the same VM (perhaps it’s a block used by multiple files) or by a different VM on the same host (perhaps it’s part of a core Windows file). When the second request comes looking for the same block, it’s already in cache. Now two VMs benefit from a single cached block. Extend this example to ten VMs, and a single block in cache can be worth ten or more blocks in a non-deduplicated environment.
Continuing this example, the nine additional reads that were served directly from cache generated no IO to the back-end disks. That’s nine fewer HDD IOPS consumed, now available for other read or write operations that haven’t been cached yet. Imagine the benefits a VDI environment could realize during a boot storm. All the VMs are based on the same template, and therefore they all read the same set of files during initial boot. Normally, 100 VMs booting at the same time would require a significant number of HDDs, but with SimpliVity’s OmniStack Data Virtualization Platform, the first VM to boot reads each block off the HDD, which promotes that block into cache. The next 99 VMs can then access that same block from cache. That’s a 100:1 IOPS reduction on the IOPS-bound disks!
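The boot-storm arithmetic can be sketched with a toy content-addressed cache. This is a simplified model under assumed names, not the OmniStack code path: blocks are keyed by fingerprint, so any number of VMs requesting the same content share one cached copy and trigger only one back-end read.

```python
import hashlib

class DedupCache:
    """Toy content-addressed cache: blocks are keyed by fingerprint,
    so all requests for identical content share one cached copy.
    Illustrative sketch only."""

    def __init__(self, backend: dict):
        self.backend = backend   # fingerprint -> block bytes (the "HDD" tier)
        self.cache = {}          # fingerprint -> block bytes (the "SSD" tier)
        self.backend_reads = 0

    def read(self, fingerprint):
        if fingerprint in self.cache:    # cache hit: no HDD IO
            return self.cache[fingerprint]
        self.backend_reads += 1          # cache miss: one HDD read
        block = self.backend[fingerprint]
        self.cache[fingerprint] = block  # promote the block into cache
        return block

# Boot-storm sketch: 100 clones all read the same boot block.
boot_block = b"shared boot image block"
fp = hashlib.sha256(boot_block).hexdigest()
cache = DedupCache(backend={fp: boot_block})
for vm in range(100):
    cache.read(fp)
print(cache.backend_reads)  # 1 -> one HDD read serves all 100 VMs (100:1)
```

The first read misses and promotes the block; the remaining 99 reads are pure cache hits, which is the 100:1 reduction described above.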
Since SimpliVity’s Data Virtualization Platform utilizes RAM in the OmniStack Virtual Controller (OVC) as a tier of cache, all the same benefits just discussed apply to the utilization and speed of memory-based cache. A block used by 100 VMs is brought into the memory cache at the IO cost of a single VM, which can deliver a significant performance increase in a VDI environment.
Another major advantage of tracking all data in metadata is more intelligent cache warming. Instead of simply grabbing the next block on disk and betting that previous writes were sequential (see the IO Blender effect), SimpliVity hyperconverged infrastructure nodes use the metadata to predict which blocks to cache, leading to a much more successful predictive cache approach.
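One way to picture metadata-driven prefetching is a simple successor-frequency predictor. To be clear, this is a generic Markov-style sketch under assumed names, not SimpliVity’s actual algorithm: it records which block fingerprint tends to follow which, then suggests the most likely next block to prefetch.

```python
from collections import defaultdict

class PrefetchHints:
    """Generic successor-frequency predictor: learn which block
    fingerprint tends to be requested after which, and suggest the
    likeliest successor for prefetching. Illustrative sketch only;
    not SimpliVity's actual predictive-caching algorithm."""

    def __init__(self):
        # fp -> {next_fp -> count of observed transitions}
        self.next_counts = defaultdict(lambda: defaultdict(int))
        self.last = None

    def record(self, fp):
        # Count the transition from the previously seen fingerprint.
        if self.last is not None:
            self.next_counts[self.last][fp] += 1
        self.last = fp

    def predict(self, fp):
        followers = self.next_counts.get(fp)
        if not followers:
            return None
        return max(followers, key=followers.get)

hints = PrefetchHints()
for fp in ["a", "b", "c", "a", "b", "c"]:
    hints.record(fp)
print(hints.predict("a"))  # b -> "b" has always followed "a"
```

Because the reference metadata already describes the data layout, this kind of prediction can be driven by known relationships between blocks rather than by guessing at on-disk sequential order.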
This is just one of many advantages SimpliVity’s Data Virtualization Platform provides by designing for data efficiency from the ground up. This data efficiency, when applied to cache, not only improves the space utilization of cache, effectively providing a larger cache, but also prevents read operations from reaching the HDDs. It helps our customers improve their application performance and realize, on average, 40:1 data efficiency.
Learn more about SimpliVity’s Data Virtualization Platform.