When you’re making a big IT investment like data center infrastructure, it’s good to approach vendor claims with a healthy degree of skepticism. In the hyperconverged category, one area that deserves a little extra scrutiny is data efficiency. Lots of vendors will claim they offer deduplication, but not all deduplication technology is created equal. If done right, global inline deduplication, compression, and optimization can deliver big benefits – improving performance, saving capacity, and lowering costs. But oftentimes, vendors treat deduplication like a checkbox. So here are four key points to help you determine if a vendor’s data efficiency is real or just window dressing:
- Scope of deduplication – What is the scope or extent of the data pool that is being deduplicated? Is it limited to the disk group or cluster level? And what about remote backup and replication? Many hyperconverged infrastructure vendors limit deduplication to the disk group, cluster, or data center level; it does not extend over the WAN. As a result, they deliver no benefit when it comes to moving large data sets between remote sites, and they do nothing to reduce the high cost of WAN bandwidth or the expense of special-purpose WAN optimization appliances.
- Primary and backup data – Does deduplication extend across primary storage and production data to include backup data as well? This is one to look out for with vendors that sell both hyperconverged infrastructure and backup solutions. Oftentimes, they will throw in backup software at a discount, but the deduplication they offer is isolated to the primary storage tier. This results in the additional overhead of repeated deduplication, rehydration, and dehydration of data as it moves in and out of different modules in the infrastructure. Plus, backup performance will not take advantage of the data efficiencies of deduplication.
- Inline or post-process – Is the deduplication that they offer inline at ingest or post-process? In other words, does it eliminate I/O or just reduce the amount of data on disk? To mitigate the performance impacts of deduplication, some vendors offer only post-process deduplication. This is helpful in terms of storage capacity, but offers no benefits when it comes to removing I/O as a performance bottleneck.
- Performance impact – Speaking of performance, how does the vendor address the impact of deduplication functions (which are very CPU intensive) on running applications? SimpliVity addresses this with a custom-built PCIe card, the OmniStack Accelerator Card, which offloads all deduplication functions and frees up production CPUs to focus on the applications instead. Without some acceleration, inline deduplication will often slow down performance when turned on. A red flag is when a vendor recommends turning off deduplication in most cases, or only makes it available in an all-flash model. This means storage efficiencies come with performance degradation. Why should you have to choose?
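To make the inline versus post-process distinction concrete, here is a minimal sketch in Python. It is a generic illustration of the two approaches on a toy content-addressed block store, not SimpliVity's or any other vendor's implementation; the function names, the tiny block size, and the use of SHA-256 fingerprints are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen only to keep the example readable


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the incoming data into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block: bytes) -> str:
    """Identify a block by a hash of its content."""
    return hashlib.sha256(block).hexdigest()


def inline_write(store: dict[str, bytes], data: bytes) -> int:
    """Deduplicate at ingest; return the number of block writes issued."""
    writes = 0
    for block in split_blocks(data):
        key = fingerprint(block)
        if key not in store:  # duplicate blocks never reach the disk
            store[key] = block
            writes += 1
    return writes


def post_process_write(store: dict[str, bytes], data: bytes) -> int:
    """Write everything first, deduplicate later; return block writes issued."""
    staged = split_blocks(data)
    writes = len(staged)  # every block is written at ingest, duplicates included
    for block in staged:  # a later pass reclaims the capacity
        store.setdefault(fingerprint(block), block)
    return writes


data = b"AAAABBBBAAAAAAAA"  # four blocks, only two of them unique
inline_store: dict[str, bytes] = {}
post_store: dict[str, bytes] = {}
print(inline_write(inline_store, data), post_process_write(post_store, data))
```

Both stores end up holding the same two unique blocks, so the capacity outcome is identical. The difference is in the write count at ingest: the inline path issues two block writes while the post-process path issues four, which is the I/O the bullet above is referring to.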
In the end, hyperconverged infrastructure vendors often attach so many caveats and limitations to their deduplication capability that it loses its value. To them, deduplication really is just a checkbox feature. But why is data efficiency so important?
Improve Performance While Saving on Storage – All SimpliVity customers are guaranteed an average reduction of 90%, or 10:1, in the number of data center devices. The average efficiency ratio is 40:1 when you include primary and backup data, with more than a third of customers averaging 100:1 or more (TechValidate). At 100:1, 100 TB is reduced to just one TB stored on disk. And by eliminating redundant I/O, more than half of customers see an increase in application performance of 50% or more (TechValidate).
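Ratios and percentages are easy to mix up when comparing vendor claims, so it helps to convert between them. The short Python sketch below shows the arithmetic; the function names are made up for this example, and the inputs are simply the ratios quoted above.

```python
def savings_percent(ratio: float) -> float:
    """Convert an N:1 efficiency ratio into a percentage reduction."""
    return 100 * (ratio - 1) / ratio


def stored_tb(logical_tb: float, ratio: float) -> float:
    """Physical capacity consumed by a given amount of logical data."""
    return logical_tb / ratio


for ratio in (10, 40, 100):
    print(f"{ratio}:1 -> {savings_percent(ratio):.1f}% saved, "
          f"100 TB stored as {stored_tb(100, ratio):.1f} TB")
```

Note how quickly the percentage flattens out: going from 10:1 to 100:1 moves the savings from 90% to 99%, yet the capacity actually consumed drops tenfold. When a vendor quotes a percentage, ask for the ratio as well.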
Restore Data in Seconds – The power of SimpliVity data protection is based on its inherent data efficiency. By globally deduplicating all data inline, across all storage tiers, including backup, we can do some incredible things, like performing a full backup or restore of a local one TB VM in 60 seconds or less – guaranteed. And we can support 10-minute RPOs for full logical backups (no snapshots or incremental chains) with near-zero performance overhead. This is only made possible through our approach to data efficiency.
Move Applications and Data Anywhere in the World – Because SimpliVity’s deduplication extends across remote sites, our customers get the benefit of being able to move incredibly large data sets with limited bandwidth. One of our customers, a global law firm, uses SimpliVity to replicate data from newly acquired data centers to their main data center. When they acquire a new firm, they send over the SimpliVity systems, migrate the workloads locally, and then replicate that data over the WAN to SimpliVity systems in their main site. Another customer has deployed SimpliVity systems on ships at sea, under very limited bandwidth. They used to fly helicopters to each boat to collect tape backups. Now they replicate in real time using SimpliVity.
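Why deduplication that spans sites saves bandwidth can be sketched in a few lines of Python. This is a generic illustration of deduplication-aware replication under simplifying assumptions, not a description of SimpliVity's actual protocol: the function names are hypothetical, the remote site is modeled as a dictionary keyed by SHA-256 fingerprint, and the cost of exchanging the fingerprints themselves is ignored.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    """Identify a block by a hash of its content."""
    return hashlib.sha256(block).hexdigest()


def replicate(source_blocks: list[bytes], remote_store: dict[str, bytes]) -> int:
    """Send only the blocks the remote site lacks; return payload bytes sent."""
    sent = 0
    for block in source_blocks:
        key = fingerprint(block)
        if key not in remote_store:  # remote already has it: nothing to send
            remote_store[key] = block
            sent += len(block)
    return sent


# The remote site already holds one block that the source also contains.
remote = {fingerprint(b"base"): b"base"}
source = [b"base", b"new1", b"base", b"new1"]
print(replicate(source, remote), "of", sum(len(b) for b in source), "bytes sent")
```

In this toy case only one of the four source blocks crosses the link, because the other three are either already present at the destination or repeats of a block sent moments earlier. When deduplication stops at the cluster boundary, every block has to be sent.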
Save Big – And the financial savings can really add up. SimpliVity has been shown to reduce TCO by 73% compared to traditional IT (Forrester Consulting), and up to 49% compared to public cloud provider AWS (Evaluator Group). A Global 50 customer recently consolidated their global data centers from six data centers down to three, and from 34 racks in a single data center down to three racks of SimpliVity systems. This is what the power of data efficiency can deliver when you solve the data problem at its core, rather than papering over it with bold claims.
When you’re making a major investment, skepticism is your friend. So don’t take my word for it, or any other vendor’s for that matter. Ask the hard questions. If a vendor makes claims about their deduplication benefits, ask them how they guarantee data efficiency. Talk to current customers to see whether the benefits were delivered in a real-world setting, especially if that customer’s environment is similar to your own. Don’t just accept deduplication as a simple monolithic feature. Ask about the scope of deduplication and whether it extends to remote sites. Does it include both primary and backup data? Is it inline or post-process? If inline, what is the performance impact? Do they recommend turning deduplication off? If so, why and for what workloads? All of this will matter once you’ve actually deployed a new solution. That’s when you’ll find out whether the vendor’s claims were real or just a checkbox.