One of the core tenets of our OmniStack Data Virtualization Platform is “deduplicating, compressing and optimizing all data inline, at inception, once and forever.” We’ve covered most of this already, but what do we mean by “once and forever”?
The data contained in each VM goes through several potential stages in its lifetime. Obviously there’s the operational phase, while the VM is in “production” and serving its purpose to the business. Backups will (hopefully) be taken on a regular basis and stored both locally and remotely, and possibly in the cloud. Recovery from backup will need to occur (hopefully very rarely). Occasionally, a full clone of the VM will be taken (perhaps more often in a Development or Test environment). In a multi-site configuration, there may also be the need to move the VM between data centers, whether as part of a data center migration, for disaster avoidance, or for failback. Let’s investigate each of these stages and see how the Data Virtualization Platform can benefit your environment.
When creating a VM clone, you are creating a full copy of the VM at that point in time. In a traditional storage system, each block would be read and rewritten. Most storage arrays have implemented VMware’s vStorage APIs for Array Integration (VAAI) to offload operations like cloning to the array. With VAAI, the overhead of network transport and host processing is eliminated, but the read and write IO still has to happen. This makes the process more efficient for the host and the network, but doesn’t make the storage itself any more efficient.
In SimpliVity’s Data Virtualization Platform, a clone of the VM can be made by performing a copy of the metadata instead of a full copy of the data. With no read or write operations to HDD, the clones are created much faster with dramatically less IO hitting the disks.
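To picture why a metadata copy is enough, here is a minimal sketch of a content-addressed, deduplicated block store in which a VM is just an ordered list of block digests. The class and function names are illustrative assumptions, not SimpliVity’s implementation; the point is that cloning only duplicates the digest list and bumps reference counts, with no block-level reads or writes.

```python
import hashlib

# Hypothetical, simplified model of a content-addressed, deduplicated block store.
class BlockStore:
    def __init__(self):
        self.blocks = {}     # digest -> block contents, stored exactly once
        self.refcounts = {}  # digest -> number of metadata references

    def write(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:              # deduplicate inline, at inception
            self.blocks[digest] = data
        self.refcounts[digest] = self.refcounts.get(digest, 0) + 1
        return digest

class VirtualMachine:
    def __init__(self, name: str, store: BlockStore):
        self.name = name
        self.store = store
        self.block_map = []  # ordered digests: the VM's metadata

    def write_data(self, data: bytes):
        self.block_map.append(self.store.write(data))

def clone_vm(src: VirtualMachine, clone_name: str) -> VirtualMachine:
    """Clone by copying metadata only; no data block is read or rewritten."""
    clone = VirtualMachine(clone_name, src.store)
    clone.block_map = list(src.block_map)
    for digest in clone.block_map:                 # just bump reference counts
        src.store.refcounts[digest] += 1
    return clone
```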
A full VM backup is also just a full copy of the VM at a point in time. Traditionally, this requires reading either all the blocks of the VM or just the “new” or “modified” blocks. Each read operation incurs IO on the array, data rehydration if the array is deduplicated, data transport across the storage network, processing by the backup engine, possibly transport again to a backup appliance, and possibly deduplication and IO on the final media. The impact of running backups has been well known for years, which is why we have “backup windows” during non-critical hours.
Backups that occur on the storage array itself are generally unaware of the individual VMs, and instead deal with the storage at the LUN or share level. It is inherently difficult to separate out the individual VMs and apply specific policies to them. In order to reduce time and IO, these “backups” are often a special implementation of an array snapshot, creating a backup that is physically bound to the original LUN/share. If you lose the original LUN/share, you also lose the backup.
Just as the Data Virtualization Platform does with the clone operation, a SimpliVity backup is a metadata copy of an individual VM, instead of the resource-intensive processing of all the data. This creates a fully independent logical copy of the VM. Since neither the original VM nor the backup owns the underlying blocks of data, and all of the metadata and data is stored on two separate nodes, the backup has no dependency on the original VM or on a snapshot. With no data to move, a SimpliVity backup also completes far more rapidly and with dramatically less IO, so our customers can get rid of those pesky backup windows!
When creating a remote backup in a traditional backup architecture, the originating data center doesn’t know the full set of blocks stored in the destination data center. Traditional backup operations will either send all blocks or just the “new” and “modified” blocks across the WAN.
In a SimpliVity federation of hyperconverged infrastructure building blocks, the originating data center works with the destination data center to evaluate which blocks are needed to recreate the backup. Only the unique blocks that have never been written in the destination data center are moved between the two data centers. It’s like sending only the instruction booklet and a few bricks of a Lego set instead of the entire set. This happens even on the first remote backup, which can reuse any blocks already written by active VMs or by other VMs’ backups in the destination data center. Moving data in this way results in much more efficient utilization of WAN bandwidth, eliminating the need for WAN accelerators when migrating data between data centers.
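To make that exchange concrete, here is a minimal sketch that reuses the hypothetical BlockStore and VirtualMachine classes from the clone example above. The two-round negotiation and the function names are illustrative assumptions, not SimpliVity’s actual protocol: the source offers digests, the destination reports which it has never stored, and only those blocks cross the WAN.

```python
def blocks_missing_at_destination(offered_digests, dest_store):
    # Destination side: compare offered digests against blocks it already holds
    # (from active VMs or other VMs' backups) and report only what it lacks.
    return [d for d in set(offered_digests) if d not in dest_store.blocks]

def remote_backup(vm, dest_store):
    # Round 1: send only the digest list, a tiny fraction of the VM's size.
    missing = blocks_missing_at_destination(vm.block_map, dest_store)

    # Round 2: move only the unique blocks across the WAN.
    for digest in missing:
        dest_store.blocks[digest] = vm.store.blocks[digest]

    # Record the backup at the destination as metadata referencing those blocks.
    for digest in vm.block_map:
        dest_store.refcounts[digest] = dest_store.refcounts.get(digest, 0) + 1

    wan_bytes = sum(len(dest_store.blocks[d]) for d in missing)
    return list(vm.block_map), wan_bytes  # the backup's metadata and the WAN cost
```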
By running a version of the OmniStack Virtual Controller, the same software that powers SimpliVity’s hyperconverged infrastructure appliances, in an Amazon AWS instance, customers can use the Remote Backup process to store backups in the public cloud. This provides off-premises storage that further enhances the recoverability of data, without the data ever having to be rehydrated.
When restoring from a backup, most backup systems need to read all the blocks out of a networked backup storage device, usually a backup-to-disk appliance or a tape library, and write all of those blocks back to the array. This results in IO on both devices and latency between the two. To further compound the problem, the backup storage device typically isn’t built to handle heavy IO, resulting in slow restore times.
The underlying process of restoring a SimpliVity backup should be pretty obvious by now. A copy of the metadata is created and the resulting logical VM files are presented to the hosts and registered with vSphere. The result is the ability to restore the VM in a matter of seconds, even when restoring across a WAN.
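Under the same hypothetical model as the sketches above, a restore can be expressed as one more metadata copy followed by presenting the resulting VM files to the hosts. The register_with_vsphere helper below is a named placeholder for that final registration step, not a real API call.

```python
def register_with_vsphere(vm):
    # Placeholder for presenting the restored VM's files to the hosts and
    # adding the VM to vCenter inventory; shown as a stub for illustration.
    print(f"{vm.name} registered with vSphere")

def restore_backup(backup_digests, store, restored_name):
    # Restoring is another metadata copy: no data blocks are read or written.
    vm = VirtualMachine(restored_name, store)
    vm.block_map = list(backup_digests)
    for digest in vm.block_map:
        store.refcounts[digest] += 1
    register_with_vsphere(vm)
    return vm
```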
Move Between Data Centers
Most technologies that migrate data across the WAN rely on moving every block across the WAN or staging the data on a physical device, then moving the device to the remote data center via “sneakernet.”
Within a SimpliVity federation, moving a VM between data centers works very similarly to a Remote Backup. The main difference is that at the end of the Move process, the VM is removed from the originating data center and added to the inventory of the destination data center. Because only the unique blocks not already present in the destination data center need to cross the WAN, organizations can migrate a VM over very long distances and power it back on in a matter of minutes.
Data Efficiency in the Real World
Using the advantages of deduplicating data inline once and forever across all stages of the data lifecycle, one SimpliVity partner was able to complete a Remote Backup of a 450GB VM across 4200 miles in 76 seconds and restore that backup in another 56 seconds.
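As a rough sanity check on what that implies (our arithmetic, not the partner’s): if every byte of that 450GB VM had actually crossed the WAN, a 76-second backup would require a sustained link of roughly 47 Gb/s, far beyond typical inter-site connectivity. Hitting that time on an ordinary WAN link is only possible because the vast majority of blocks already existed at the destination.

```python
# Back-of-the-envelope check: implied WAN throughput if all 450 GB had moved.
vm_size_bytes = 450 * 10**9   # 450 GB
backup_seconds = 76

implied_gbps = vm_size_bytes * 8 / backup_seconds / 10**9
print(f"Implied sustained throughput: {implied_gbps:.1f} Gb/s")  # roughly 47 Gb/s
```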
For most customers, this massive reduction in data movement across the WAN allows them to send backups to their remote site more frequently. The deduplication of backups also allows them to retain more backups. Together, this means less data lost in the event of a disaster.
Before turning to SimpliVity, some customers had inter-site connectivity that prevented them from achieving their disaster recovery goals (traditional replication could not complete successfully across the WAN). Utilizing SimpliVity’s hyperconverged infrastructure, our customers can now have a true disaster recovery configuration that meets their business needs.
By deduplicating, compressing and optimizing the data as soon as it is committed to the storage layer (Once), and by maintaining this optimized state throughout the life of the data (Forever), across local backups, remote backups, WAN migrations and cloud storage, the VM never needs to be fully reconstituted. The result is much faster VM operations, reduced TCO and improved storage performance, thanks to far fewer operations hitting the disk subsystem.
Watch our recent data protection webinar to learn about native backup and protection in SimpliVity’s hyperconverged infrastructure.