So I think a lot of us take backups for granted. It’s one of those things you set up once and then tend not to worry about too much. As long as it’s working, why worry?
Except… if you don’t look at it, how do you know how well it’s working? I’m talking from the viewpoint of a senior engineer or manager here, of course; hopefully if you’re a junior engineer who has been put in charge of backups you’re making sure that the current system works well and telling people about any concerns you might have. If not, go do that now. I’ll still be here, I promise.
So a couple of things have been stirring in my mind lately. The first is the incredible story of NotPetya and the impact it had on businesses all over the world. One especially poignant story is that of Maersk, and it’s told very well in a remarkable article on the Wired website.
Now, I’m possibly tempting fate here but I can say that my employer has not had serious threats from this kind of malware attack. I’m going to put some of that down to good work by my team, some of that down to not being a very interesting direct target and some of that down to good old-fashioned grace-of-God (or luck if you prefer to call it that).
But we shouldn’t be taking any of those things for granted. Not me, not you. So with thoughts of what malware might do to an on-site backup repository in mind, I was already thinking of alternatives.
One of the issues my employer has, as an educational establishment, is funding to undertake major projects. We’ve had a quantum tape library on-site for some time and used it with Arcserve backup and this actually worked very well. I’ve got no complaints about either product. One issue we did have due to funding was getting the money available to tackle both hardware and software components of our backup system at the same time, as the two items tended to have different life-cycles.
One of the worst reasons I can think of to do something is “because that’s always what we’ve done” so after a bit of creative work on budgets and schedules, I was able to line up all the components of our backup system to be evaluated and replaced or renewed together.
And I found something interesting.
The cost of backups
One of my frequent “go to” phrases goes along the lines of “if you think doing it right is expensive, you should see the real cost of being too cheap” and this obviously applies to backups. You need a robust and reliable system that you can count on to work without too much day-to-day intervention. You need to carry out test restores from your system so that you can trust it. Most important of all, you need a backup environment your team understands and believes in, so that they can transfer that belief to others in the business.
Now that we’ve established that whatever backup you choose needs to be one that works and that there’s more to cost than the price, we can talk about the price.
The figures below reflect the college’s investigation into our needs some time ago. They should be taken as simple examples and comparisons and are not indicative of the prices that a supplier or vendor might be using now.
On-Premises backup to tape
Tape backups have served the college well over time but I think they are becoming less useful and relevant to where we want to be as a college in 2018.
- Tapes deteriorate over time and both software and standards change. Simply having data on a tape from 6 years ago is no guarantee we can retrieve the data.
- Tapes are currently stored on-site at some expense and require manual intervention to be changed. New tapes need to be purchased and old ones need to be retired. To achieve true business resilience, we need to store data off-site and this will cost at least £1,440 a year, plus our time managing that service.
- We don’t refer to the tape backups very often. We’ve been retaining data for several years but have only used tape to restore data a couple of times in the past few years, when it’s proven to be a lengthy experience.
If we keep going with tape we will need to replace the current tape library with a new one, as it has reached the end of its useful life: it was purchased in 2012, is based around LTO5 and is out of manufacturer’s support.
The figures below are for a similar replacement: 2 drives in a library that has a capacity of 50 slots (which we currently have) but is only licensed for 25 slots. We will need to upgrade the slot licence at some point in the life-cycle, but we could start with the 25-slot licence.
Quantum i3 tape library. This would cost £13,068. We would stick with Quantum because we’re familiar with it, we know it is compatible with our hardware (e.g. it should just plug into the server the current library is using) and we know it is compatible with the Arcserve software.
We will also need new tapes, fitting the new LTO 8 standard. To fill this library and have a complete set of replacements we will need three of the 20-tape library packs. The cost of these is £3,211 each.
We will also need to continue the Arcserve licence, which will cost £11,996 a year.
Ongoing costs would be the Arcserve licence and a box of tapes a year, so:
If we consider the tape library to have a 6-year life span, we can divide the cost of the initial purchase of tapes and tape library over the lifespan of the device to get £3,783.50 and add that to the yearly costs to say that it would cost us £20,230.50 a year to continue tape backups for the next 6 years.
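The amortisation arithmetic can be sketched quickly, using the example prices quoted above (these are the article’s illustrative figures, not current vendor pricing):

```python
# Amortise the one-off tape hardware purchase over its 6-year life span.
library = 13068.00        # Quantum i3 tape library
tape_packs = 3 * 3211.00  # three 20-tape LTO 8 library packs
lifespan_years = 6

amortised = (library + tape_packs) / lifespan_years
print(f"Amortised hardware cost: £{amortised:,.2f} per year")
```

The amortised figure then gets added to the recurring Arcserve licence and tape purchases each year.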
Backup to the Cloud
After some thought we decided to look at Azure for our backup storage provider, so the rest of this article talks about the cost of using Azure for backups. You might have better reasons to use AWS or someone else than we did, and the prices again should be taken as an indicator based on when the research was first done, not a guarantee of what you might pay yourself at the time of reading.
One thing that occurred to us with backup to the cloud is that we could use Microsoft System Center Data Protection Manager to manage our backups. This was a no-cost option for us due to the licence terms we already had with Microsoft, and this obviously makes a huge difference to the bottom line numbers.
“Cloud backup” is something of a marketing term. We’re going to continue to use it here, but it’s worth remembering that what we’re talking about is more formally referred to as “using Microsoft Azure Backup to store backups and replicas of our systems and data off-site in EU and UK data centres”.
Under this proposal, we would deploy Microsoft’s backup software on-premises and have it create a backup of all our systems and data to an on-premises storage server. We would then choose to have some of these backups extended into the cloud so that they’re safely stored off-site.
Systems that will only be backed up on-site will be “non-essential”, which typically means services that could be relatively easily rebuilt in the event of a major incident. Systems and data that will be backed up to the cloud will include systems that would be extremely difficult to reconstruct and/or data that would be impossible to re-create.
To give an example, the student CRM “Front End” server, which only contains information about how to render CRM data, would be backed up locally, as it would be trivial to reconstruct this after a major disaster. The data held and used by the CRM system, which is stored in one of our main database servers, would be backed up to the cloud as part of the backup of the database server.
How does Azure work?
Backup pricing in Azure is costed fairly simply. You pay to use backup services to the tune of £7.50 a month for a 500Gb “instance”. This means that a basic application server that uses 200Gb of space will use 1 instance of £7.50 a month. A major file share server that uses 1200Gb of space will use 3 instances, £22.50 a month.
On top of this, we also pay for storage we use, depending on redundancy and tiers.
Redundancy levels refer to how Microsoft protect data stored in Azure. LRS, ZRS or GRS refers to how the data is replicated around the Azure infrastructure, and is explained in detail here.
The choices for backup vaults seem to centre around LRS (Locally Replicated) and GRS (Globally Replicated). LRS means that the data is replicated only within a single Microsoft datacentre, while GRS means that the data is additionally replicated to a second, geographically distant Microsoft region, and costs more to use. We chose “LRS” for our backup vault as it seemed to us that the backup copy in Azure would be the third copy of the data we would have in normal operating conditions (e.g. live working server, local backup, cloud backup) and that was good enough for us.
You should make your own assessment of your requirements and base your decision on that. For example, if we had an extensive cloud deployment of VMs with irreplaceable data on them it would make sense to consider globally replicating the backup vault for that data with GRS, just in case…
Tiers relate to speed and ease of access, and are referred to as “Hot”, “Cool” and “Archive” tiers. Archive tier is designed to be written to often and read from rarely and is costed favourably to that model. As we’re anticipating keeping backups on-site and only using Azure storage for recovery from a major disaster, we’re proposing that all our backups are made to Archive storage. If we find we’re reading backups from cloud for one or two systems on a regular basis then those systems should be migrated to backup to the ‘cool’ tier.
We can combine the pricing per tier with the pricing per instance to get example pricing for the 200Gb application server we mentioned before. (Note that you pay for instances in 500Gb steps but pay for the precise storage you use per Gb.)
So to backup the application server would cost £7.50 plus archive storage (0.0017 * 200) £0.34 = £7.84 per month.
To back up our example major file share server would use 3 instances (£22.50) plus archive storage (0.0017 * 1200) £2.04 = £24.54 per month.
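The per-server arithmetic above can be sketched as a small function (a minimal example using the instance and archive-tier prices quoted in this article; real Azure pricing will differ and should be checked against the current price list):

```python
import math

INSTANCE_PRICE = 7.50    # £ per month, per 500Gb "instance" (article's example price)
ARCHIVE_PER_GB = 0.0017  # £ per Gb per month, archive tier (article's example price)

def monthly_backup_cost(size_gb):
    """Instances are billed in 500Gb steps; storage is billed per Gb actually used."""
    instances = math.ceil(size_gb / 500)
    return instances * INSTANCE_PRICE + size_gb * ARCHIVE_PER_GB

print(monthly_backup_cost(200))   # basic application server, 1 instance
print(monthly_backup_cost(1200))  # major file share server, 3 instances
```

The same function works for any server size; only the two prices change when Microsoft revises its rates.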
With all this in mind, we looked at the data we would be moving off-site and felt that it should cost about £2900 to implement the project as a one-off cost in the first year, with storage costing £7000 p.a. assuming our storage space needs stay roughly neutral (1). This represents a notable saving in the first year and we’ve gone down that road quite happily.
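For a rough side-by-side view of the first year, using this article’s figures (the tape total from the on-premises section above, and the cloud estimate just mentioned):

```python
# First-year comparison using the article's example figures.
tape_first_year = 20230.50        # estimated yearly cost of continuing with tape
cloud_first_year = 2900 + 7000    # one-off implementation plus yearly Azure storage

saving = tape_first_year - cloud_first_year
print(f"Estimated first-year saving: £{saving:,.2f}")
```

In later years the one-off implementation cost drops away, so the gap widens further.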
DPM could be a little less moody to work with, admittedly, but we can re-assess whether we wish to use other backup software to write to our Azure data vault in the future.
In terms of business continuity, by moving backups off-site we have some isolation not just from the obvious (and probably least likely) disasters that people think about when talking about this kind of thing but also from malware that hijacks local servers and encrypts or deletes data. You’d have similar isolation from malware with tape of course, but you’d still need to get the data off-site.
(1) Future storage requirements are an interesting one. Normally you anticipate growth when modelling storage and backup plans for the future, but we’ve already migrated all our mailboxes to Office 365 and have just finished migrating our SharePoint sites into Office 365 as well, coupled with a strategy of pushing OneDrive for Business hard at our staff and students. Weighing that against our need for ever greater report generation and processing of CRM data, I can see our storage requirements staying neutral or even decreasing in terms of capacity, while the requirement for faster storage to improve database performance increases.