Until recently, I was an infrastructure architect at NewPage Corporation (now Verso Corp), where I had first-hand experience running Microsoft SQL Server on SimpliVity hyperconverged infrastructure.
NewPage runs more than 30 SQL Server databases, including a 3.4TB Microsoft SQL Server data warehouse. Most of these are business-critical, so performance was always a top consideration when selecting IT infrastructure equipment.
One of the key challenges for our organization was the backup and disaster recovery of Microsoft SQL Server. For years our database administrators (DBAs) had relied on SQL Server's built-in tools, such as SQL Server Management Studio, for mining logs and backing up data. Even though NewPage had initially purchased a leading appliance-based tapeless backup solution (I'll refer to this vendor as "Complexity"), the DBAs preferred their own tools because of several drawbacks with the solution.
NewPage found Complexity's interface to be more difficult than expected, especially because of the complicated procedure for performing backups. Often, when running virtual machine (VM) snapshot-based backups, we would nearly bring down the applications due to the time required to commit snapshots. The performance impact was unmanageable. The DBAs made it clear that they were not interested in changing their methods if doing so would create more potential for failure and slow the restore process. After all, the DBAs are the true guardians of corporate data, and their native tools had worked fine for years.
Overall, Complexity’s SQL backups at NewPage involved the following limitations:
- It could perform only one backup at a time
- It could not trigger backups based on percent log full
- Backups were much slower than native SQL Server backups
- Scheduled backups sometimes ran late when jobs on other servers overlapped
- SLAs were missed because of "daily maintenance windows" for garbage collection
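The "percent log full" trigger in that second limitation is worth spelling out: DBAs commonly want a log backup to fire whenever the transaction log crosses a fullness threshold, rather than on a fixed schedule. A minimal sketch of that logic, with hypothetical helper names standing in for a real log-space query (e.g. SQL Server's `sys.dm_db_log_space_usage`) and a real backup job, might look like:

```python
# Hypothetical sketch: fire a log backup when the transaction log
# crosses a fullness threshold. query_log_used_percent and
# run_log_backup are placeholders, not any vendor's actual API.

def should_backup_log(log_used_percent: float, threshold: float = 70.0) -> bool:
    """Return True when the log is full enough to warrant a backup."""
    return log_used_percent >= threshold

def poll_once(query_log_used_percent, run_log_backup, threshold: float = 70.0) -> bool:
    """One polling pass: check log fullness, back up if needed.

    Returns True if a backup was triggered on this pass.
    """
    used = query_log_used_percent()
    if should_backup_log(used, threshold):
        run_log_backup()
        return True
    return False
```

In practice a scheduler would call `poll_once` every few minutes; the point is simply that the trigger condition is trivial for native tooling but was unavailable in the appliance.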
In addition, Complexity's solution was mediocre for disaster recovery. Restores were slow and cumbersome; logs had to be rerun and errors corrected. It cost nearly as much as purchasing SAN storage, yet it was dedicated solely to backup and provided no additional benefits.
After the first backups with Complexity's solution, we needed to replicate the data to the off-site grid. The initial seed took six days, leaving us running tape jobs in the meantime to ensure that off-site requirements were met. After replication was in sync, we had to test disaster recovery by restoring all of that data to primary storage, which could be a very long process depending on the size of the machine. Additionally, the cost to scale out was much higher than we had anticipated. Complexity recommended adding yet another solution on top of our existing purchase, but we were not going to throw more money at the backup environment beyond what we had already spent.
Turning to Hyperconvergence
When we started looking at an infrastructure refresh, we focused on hyperconverged infrastructure, and specifically SimpliVity. We had seen their demo at the Gartner Data Center Conference in 2013 and were interested in both the VM mobility and the potential for cost reduction.
SimpliVity is unique among solutions in the market: it includes native data protection. By "native" I mean built in, at no extra charge. That includes implicit disaster recovery if you deploy it at multiple sites.
After we reviewed SimpliVity's hyperconverged infrastructure and native data protection, what struck me was that my DBA would not object to its data protection features. SimpliVity would actually enhance what the DBAs already do.
With SimpliVity, we would simply add a "backup disk" to our servers as the target for native log backups, just as we had been doing. Our DBA would continue to back up, and thereby truncate, the SQL Server logs to that disk, with SimpliVity hyperconverged infrastructure as the destination for the log files. Then we could simply replicate between the instances at my local and remote sites.
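To make that workflow concrete, here is a sketch of the kind of native T-SQL log backup a DBA might schedule against the "backup disk"; the database name, path, and helper function are made up for illustration, not taken from NewPage's actual environment:

```python
# Hypothetical sketch of the DBA workflow described above: a native
# SQL Server log backup written to a dedicated "backup disk" that the
# hyperconverged platform then replicates off-site. All names are
# illustrative.

def log_backup_statement(database: str, backup_dir: str, timestamp: str) -> str:
    """Compose the T-SQL a DBA might schedule.

    BACKUP LOG also truncates the inactive portion of the
    transaction log once the backup completes.
    """
    filename = f"{database}_log_{timestamp}.trn"
    return (
        f"BACKUP LOG [{database}] "
        f"TO DISK = N'{backup_dir}\\{filename}' "
        f"WITH COMPRESSION, CHECKSUM;"
    )

print(log_backup_statement("Warehouse", r"E:\SQLLogBackups", "20140101_0200"))
```

Because the backup target sits on the same deduplicated file system that replicates between sites, the log files land off-site without any separate backup appliance in the path.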
Disaster recovery is native too—that means you don’t need another box for log shipping. More importantly, you don’t need extra SQL Server licenses for the disaster recovery site. The replication goes directly to the primary storage at the other end – and long restores are a thing of the past. Databases can be back in action in literally 10 seconds, regardless of size.
NewPage's before-and-after scenarios for backup and disaster recovery were dramatically different. For backup, our production workloads now reside side by side with backup workloads, and we no longer worry about a single point of failure because there is redundancy across systems in the federation. This also helps from an efficiency standpoint, since SimpliVity's global deduplication eliminates duplicates across both production and backup workloads. Backup is zero-impact as well: it happens in the SimpliVity file system, so no read/write I/O is generated, which was a major drawback of Complexity's solution. A full recovery of a SQL Server used to take 18 hours; with SimpliVity, it takes only nine seconds, plus the time to roll logs forward. That's roughly 0.014% of the time it used to take, a time savings of more than 99.98%.
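As a back-of-the-envelope check of that restore-time comparison (assuming the 18-hour baseline and the nine-second SimpliVity restore quoted above, and excluding log roll-forward):

```python
# Back-of-the-envelope check of the restore-time comparison:
# an 18-hour full recovery versus a ~9-second restore.

old_restore_s = 18 * 3600      # 64,800 seconds
new_restore_s = 9

fraction = new_restore_s / old_restore_s
print(f"new restore is {fraction:.4%} of the old time")
print(f"time savings: {1 - fraction:.4%}")
```

This works out to the new restore taking about 0.014% of the old time, i.e. a savings of roughly 99.986%.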
We were thrilled with the results: 150 VMs backed up nightly, start to finish off-site, all in an hour, and we ran it at one-third the cost of our legacy infrastructure! Hyperconvergence was a game-changer for NewPage; no more sleepless nights for our management around recoverability. Additionally, setting up disaster recovery testing or spinning up new Dev/GA environments couldn't be easier.