Question:
I’ve seen several online sources mentioning that you should always have a restore script automated for the possible need to restore a large number of databases at once and quickly get back up and running to deliver on your SLA.
However, considering I’m running synchronous mirroring in two availability zones, I’m struggling to find a scenario in which I would need to do mass restores on a large number of databases where scripting a restore automation solution would be necessary.
Can anyone point me to a realistic scenario in which you’d need the capability to do a large number of restores in a scripted manner?
Answer:
So you’ve got two zones. Let’s say zone one floods or burns… it’s completely gone. All the hardware is destroyed, and the old site isn’t even usable. It could be weeks putting it back together. Thankfully, you’re still online thanks to zone two.
Of course, you might try something like short-term leasing servers to get redundancy back while you put together your real new production facility, but really the time spent getting those online is a distraction from your main task, which should be getting a new primary facility up to snuff as quickly as possible. If you do attempt the temporary facility, having the restore scripts will greatly aid getting the temporary facility online quickly, allowing you to get back to your main task.
If you don’t opt for the temporary facility, how capable is your zone two site really? I hear all the time about failover equipment that’s really just retired equipment from the primary site. This often means equipment that is older, which raises concerns about both performance and reliability. How long do you really want to leave that facility running as your only data center? Are you taking good backups while this site runs as primary?
In short, wouldn’t you like the setup process for your new site to be as quick, reliable, and well-understood as possible? The longer it takes to get the main facility back up and running, the more likely it is for something to also go wrong at your second facility.
Of course, this is just one scenario. It’s the big scary total destruction scenario that seems too unlikely to ever actually happen to you, and maybe that’s even true. But this kind of thing plays out on a smaller scale all of the time. The ability to do quick, reliable, scripted restores for your entire data center for the big emergencies implies the ability to also do quick, reliable, scripted restores for the small emergencies. And that’s a good thing.
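
To make "scripted restores" a bit more concrete, here is a minimal sketch of what such a script could look like. It rests on several assumptions that are not in the question: that the platform is SQL Server, that each database has a single full backup file named after the database (for example `Sales.bak`), and that the files restore to their original locations so no `MOVE` clauses are needed. The function names are made up for illustration. It only generates the T-SQL text; it does not connect to a server or handle differential or log backups.

```python
from pathlib import PureWindowsPath


def quote_ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier, doubling any ']' inside it."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text: str) -> str:
    """Quote a T-SQL Unicode string literal, doubling any single quotes."""
    return "N'" + text.replace("'", "''") + "'"


def build_restore_script(backup_files):
    """Return one RESTORE DATABASE statement per backup file.

    Assumption: each file is a full backup named <database>.bak.
    WITH REPLACE overwrites an existing database of the same name,
    so review the generated script before running it.
    """
    statements = []
    for path in sorted(backup_files):
        # The file name without its extension is taken as the database name.
        db_name = PureWindowsPath(path).stem
        statements.append(
            f"RESTORE DATABASE {quote_ident(db_name)} "
            f"FROM DISK = {quote_literal(path)} "
            "WITH REPLACE, RECOVERY;"
        )
    return "\n".join(statements)


if __name__ == "__main__":
    # Hypothetical backup paths, for illustration only.
    print(build_restore_script([r"D:\backups\Sales.bak", r"D:\backups\HR.bak"]))
```

The same generator works whether you feed it two files or two hundred, which is the point: the small emergency and the big one use the same well-rehearsed path.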