What is the data problem? There’s a common misconception that the data problem is the result of having too much data to manage. While data growth is a problem, it’s not the biggest problem IT organizations are facing today with virtualized workloads.
Over the last 20 years, IT organizations have focused on the problem of relentless data growth, trending at a 40-50% increase per year according to IDC and expected to grow roughly 50x by 2020 (an estimated 40 zettabytes!). With all of the innovation in the storage industry over the last decade and a half, one might not expect storage itself to be a contributor to the data problem, and yet it is. Why? Because having adequate capacity to keep pace with data growth is no longer the concern that keeps IT professionals up at night; it is ensuring adequate performance (IOPS) to fuel application requirements, and doing so in the most cost-efficient manner.
Fifteen years ago, each LUN in a shared storage array mapped to a single server, and that server supported a single application. The business expected the IT organization to back up that application and its data to tape once per day, and the recovery time objective (RTO) was likely measured in days. Today, a single physical server runs many virtual machines, each with its own applications, and their combined I/O streams arrive at shared storage as highly random I/O, commonly referred to as the IO blender effect. The 24-hour recovery point objective (RPO) and days-long RTO have been replaced with an RPO measured in minutes and an RTO measured in hours, if not minutes. This has created the need for more frequent backups and much faster recoveries.
Fifteen years ago, hard drives (HDDs) had capacity in the neighborhood of 36GB and delivered roughly 150 IOPS. Today, hard drives are equipped with 6+TB of capacity, but they still deliver roughly 150 IOPS. Storage infrastructure is no longer sized simply based on capacity. Today, performance requirements are an important consideration and often drive the sizing discussion for storage.
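To see why performance now drives the sizing discussion, consider a rough back-of-the-envelope calculation. The sketch below is a minimal illustration: the workload (50TB, 30,000 IOPS) is a hypothetical example, and the per-drive figures are the rounded numbers cited above.

```python
import math

# Back-of-the-envelope sizing sketch. The workload numbers are hypothetical;
# the per-drive figures (6 TB, ~150 IOPS) are the rough values cited above.

def drives_needed(capacity_tb, iops_required, drive_capacity_tb=6.0, drive_iops=150):
    """How many HDDs a workload needs if sized by capacity alone vs. by IOPS alone."""
    by_capacity = math.ceil(capacity_tb / drive_capacity_tb)
    by_performance = math.ceil(iops_required / drive_iops)
    return by_capacity, by_performance

cap, perf = drives_needed(capacity_tb=50, iops_required=30_000)
print(f"Drives to satisfy capacity:    {cap}")   # 9 drives cover 50 TB
print(f"Drives to satisfy performance: {perf}")  # 200 drives needed for 30,000 IOPS
```

Even a modest virtualized workload can force an HDD-only array to be sized by spindle count rather than by terabytes.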
Data efficiency technologies, including compression and deduplication, were historically designed to address a capacity problem, both for storage and for network bandwidth. Companies like Data Domain and Diligent Technologies built products with the objective of packing as much data onto a set of HDDs as possible. HDD capacity was expensive, and these technologies traded CPU resources for HDD capacity, which is one reason the efficiency introduced by deduplication was largely confined to the backup storage tier.
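As a rough illustration of that CPU-for-capacity trade (a minimal sketch, not how any particular vendor implements it), block-level deduplication fingerprints each block and stores identical blocks only once:

```python
import hashlib

BLOCK_SIZE = 4096   # fixed-size blocks for simplicity; real systems often use variable-size chunking
store = {}          # fingerprint -> block data; unique blocks are stored only once

def write(data: bytes) -> list:
    """Split data into blocks, store each unique block once, and return block references."""
    refs = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()  # CPU spent hashing is traded for capacity saved
        store.setdefault(fp, block)             # already-seen blocks consume no extra space
        refs.append(fp)
    return refs

# Two backups that share most of their content consume little additional capacity.
refs = write(b"A" * 8192 + b"B" * 4096) + write(b"A" * 8192 + b"C" * 4096)
print(f"blocks referenced: {len(refs)}, unique blocks stored: {len(store)}")
# -> blocks referenced: 6, unique blocks stored: 3
```

Hashing every block burns CPU cycles, but backup streams are so repetitive that the capacity savings made the trade an easy one for backup storage.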
In today’s modern data center, HDD capacity is no longer the primary concern; the bigger issue is IOPS. IOPS requirements have increased roughly 10x in the post-virtualization world, while HDD IOPS have remained stagnant and simply can’t keep pace. IT organizations are turning to flash/SSD instead; however, flash is pricey, and it is cost-effective for only a portion of the data lifecycle.
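A quick comparison shows why: HDDs still win on dollars per gigabyte while flash wins on dollars per IOPS, so flash tends to be reserved for the hot, performance-sensitive slice of the data lifecycle. The capacities, IOPS, and prices below are illustrative assumptions, not quotes.

```python
# Illustrative $/GB vs. $/IOPS comparison; all figures are assumed for the example.
media = {
    "6TB HDD": {"capacity_gb": 6000, "iops": 150,    "price_usd": 250},
    "1TB SSD": {"capacity_gb": 1000, "iops": 50_000, "price_usd": 500},
}

for name, m in media.items():
    print(f"{name}: ${m['price_usd'] / m['capacity_gb']:.3f}/GB, "
          f"${m['price_usd'] / m['iops']:.4f}/IOPS")
```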
The next blog in the series will look at compression.