To my DBA brethren, I understand your pain. Before I took a job with a vendor in 2006, I was a DBA for close to 20 years.
Database administrators (DBAs) are the guardians of corporate data. Applications have service level agreements (SLAs), so DBAs are mindful of maximizing security and data availability while minimizing the performance impact on the servers hosting the application. They’re also persnickety about the database environment: very careful and particular about how their data is protected and how to minimize data corruption and downtime. It’s a lot of responsibility.
One part of this tall order is data protection. DBAs don’t trust their data to anyone, and typically rely on the built-in data protection processes in SQL Server. Similarly, data backup and recovery are essential; DBAs typically perform hourly transaction log backups. However, there’s never a good time to run a full backup of a transactional SQL Server database. Some DBAs use AlwaysOn Availability Groups in SQL Server, which provide a redundant copy of the SQL Server instance, but the SQL Server licensing cost makes this very expensive.
So what happens when you lose a SQL Server host? When I worked for an alcohol company in the early 2000s, I was performing daily full backups and hourly transaction log backups. Full backups took seven hours and transaction log backups took 15 minutes. One day, the Dell server that ran our SQL Server application died; no root cause was ever identified. The Dell server was attached to EMC storage, which lost a disk at the same time.
Over the next four days, I had to take all these steps to get the SQL Server systems back online:
- Order a brand-new server, which took two days to arrive on site.
- Attach the server to the EMC storage.
- Troubleshoot and resolve issues connecting the Dell server’s QLogic HBAs to the EMC storage.
- Install Windows, setting up all the correct ports and Active Directory security.
- Install SQL Server in cluster mode…. ARRRRRGH!
- Re-create all the SQL Server logins and all the correct security connections.
- Recover the backups from the previous night:
  - Seven hours to restore the database, which succeeded on the second attempt.
  - Apply all transaction logs to bring the database up to the current state.
- Test the configuration/security settings and validate that the restored data was accurate as of the last valid backup.
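
The restore order in the steps above (the most recent full backup first, then every later transaction log backup in sequence) can be sketched in a few lines of Python. This is only an illustration: the `Backup` type, the `plan_restore` function, and all dates are hypothetical, and the sketch orders backups by completion time, whereas SQL Server itself tracks the log chain by log sequence numbers.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    kind: str            # "full" or "log" (hypothetical labels)
    finished: datetime   # when the backup completed


def plan_restore(backups: List[Backup], failure: datetime) -> List[Backup]:
    """Return the latest full backup before the failure, followed by
    every log backup completed after it, in chronological order."""
    usable = sorted(
        (b for b in backups if b.finished <= failure),
        key=lambda b: b.finished,
    )
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available before the failure")
    base = fulls[-1]
    logs = [b for b in usable
            if b.kind == "log" and b.finished > base.finished]
    return [base] + logs


# Hypothetical schedule: one full backup per day, then hourly log backups.
backups = [
    Backup("full", datetime(2015, 6, 1, 7, 0)),
    Backup("log", datetime(2015, 6, 1, 8, 0)),
    Backup("full", datetime(2015, 6, 2, 7, 0)),
    Backup("log", datetime(2015, 6, 2, 8, 0)),
    Backup("log", datetime(2015, 6, 2, 9, 0)),
]
failure = datetime(2015, 6, 2, 10, 30)

chain = plan_restore(backups, failure)
for step in chain:
    print(step.kind, step.finished)
```

Everything older than the chosen full backup is ignored, which is why a failed restore of that one seven-hour full backup is so painful: the whole chain depends on it.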
The SQL Server hosted the company’s JD Edwards ERP system and manufacturing systems. Because we could not post transactions, take orders, process payments, or start the manufacturing line, four days of downtime cost the company millions of dollars and untold amounts of customer goodwill.
Fast-forward to 2015: companies can run Microsoft SQL Server applications in virtual machines on hyperconverged infrastructure. SimpliVity hyperconverged infrastructure has native data protection that backs up the application, that is, the application together with all of its configuration settings for security, users, and so on.
SimpliVity could have dramatically reduced my downtime. First, I would be running a two-node SimpliVity configuration locally, with at least one more node in an offsite location. I would still do nightly full backups and hourly transaction log backups with SQL Server’s built-in tools, but I would augment those with a nightly SimpliVity native backup and replication to the offsite location, providing a much more secure and available environment.
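
To make the layering concrete, here is a rough Python sketch of which recovery points such a schedule would offer at the moment of a failure. The function name `last_run_before`, the 01:00 start time, and the failure time are all assumptions made up for illustration, and the sketch treats each scheduled job as if it completed instantly.

```python
from datetime import datetime, timedelta


def last_run_before(failure: datetime, first_run: datetime,
                    interval: timedelta) -> datetime:
    """Return the most recent scheduled run at or before the failure."""
    if failure < first_run:
        raise ValueError("failure precedes the first scheduled run")
    completed_intervals = (failure - first_run) // interval
    return first_run + completed_intervals * interval


# Hypothetical schedule: nightly backups at 01:00, log backups every hour.
start = datetime(2015, 6, 1, 1, 0)
failure = datetime(2015, 6, 3, 14, 40)

nightly = last_run_before(failure, start, timedelta(days=1))
hourly = last_run_before(failure, start, timedelta(hours=1))

print("restore nightly backup from:", nightly)
print("roll transaction logs forward to:", hourly)
print("transactions at risk:", failure - hourly)
```

In this made-up case the nightly backup brings the environment back as of 01:00, and the hourly log backups carry the data forward to 14:00, leaving 40 minutes of transactions exposed.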
If a server unexpectedly died, I would have been able to:
- Create a new VM in minutes, since the environment is virtualized.
- Restore the SimpliVity backup from the previous night.
- Restore the transaction logs.
- Validate the restored data against the last valid backup.
Within just a few minutes, my applications would be up and running again. In the worst-case scenario, where I need to replace the OmniCube hyperconverged infrastructure node, I’d need to order new hardware and then take just a few more minutes to attach vCenter and deploy the new node to the Federation.
Why so fast?
In the worst-case scenario, I wouldn’t have saved much time restoring the nightly backup, but I would not have had to build a new Windows and SQL Server environment. That saves the time spent on configuration and security issues, and it avoids compatibility problems between newer technology and older firmware. SimpliVity shortens recovery time, and because the restored environment keeps its existing configuration, it shortens security testing as well.
In the best-case scenario, I can immediately recover my SQL Server instance on another SimpliVity node, so I would only need to recover the data. Downtime then drops to just a few minutes.
Overall, DBAs with SimpliVity hyperconverged infrastructure are able to sleep a little better at night and enjoy their weekends, knowing that any potential data recovery is only a few minutes away.