What makes a good backup?

Let us begin with a parable:

Trusty Ed’s Log Aggregator is a log-aggregation-as-a-service business. One day Ed’s aged landlady decides to wash the old Pentium II he runs it on, since it’s so dusty. The machine shorts out. The hard disks are destroyed. All the logs he was storing for people are gone. They cancel their accounts and demand damages from Ed. The business’s cashflow is destroyed, as is its reputation. It can’t get new customers, it has no revenue from existing ones, and it owes reparations. This business is dead.

Pretty much everyone hearing this immediately thinks, why didn’t the idiot have backups?

Ah, backups. Everyone knows you need them, but if anyone taught you how to do them right, you’re lucky. Backups aren’t sexy. There’s no deep mathematics to be had in them, so professors never mention them. They can’t be made right by buying a program from a vendor and forgetting about the problem, so everyone has learned to treat any vendor selling “backup solutions” as a snake oil salesman. Fortunately, it’s not that complicated.

Let’s take a simple case. My parents run a medical publishing business from their home. In the 1990s, they had a folder on one computer on their network that held all the files they were working on. Every week my father would burn the contents of that folder to a CD, and in later years to a DVD. This he would run down to the local bank and dump in the safe deposit box. Every couple of months he would take one of the discs back and try to open some random files from it to make sure it was still good.

This was a superb backup strategy for them. Let’s look at why. First, they had a copy of the files somewhere else if the hardware in the computer they were on failed. Too many people don’t have any backups at all, including organizations you would think would know better, like, say, the city of Baltimore. But it had some particular properties. It was…

Timely. They wouldn’t lose more than a week’s worth of work, which they could replace by doing the work again. Further, most of that week would be in the hands of collaborators elsewhere and could be easily restored.
Offsite. If the entire house had burned down, the data was safe in the safe deposit box. For the same reason, a hacker breaking in could wipe the files, but could not reach the backups.
Checked. My father made sure a random sample of files off a disk worked on a regular basis.
Affordable. The whole process took about ten minutes of playing solitaire while the DVD burned, and a ten minute stop at the bank while going to town for groceries.

It lacked one property that today should be considered essential:

Automated. It had to be done by hand.

At the time there wasn’t a good option. They were on a 33.6 kbps dialup connection to our local ISP. Services like Dropbox or Amazon S3 were years in the future. An automated offsite backup that handled these conditions would have cost more than it was worth. If they had needed backups every day or every hour, that equation would have shifted, but they didn’t.

Before you design any backup solution, you must decide how much data you can lose without it being an insurmountable problem. For a web service, and with today’s tools, the least often you should back up is nightly. But if you backed up only nightly, how much would users scream? How much business would you lose? If it’s the comments on your WordPress blog, it’s unlikely to be a problem. Post a note apologizing and ask people to repost their comments. Done. On the other end of the spectrum, some scenarios have to be backed up continuously, with every update written to an incremental backup. But each step towards continuous backup costs more time, effort, and resources. Remember, backups have to be affordable.
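Once you have picked a number for how much you can afford to lose, it is worth writing it down somewhere a machine can check it. Here is a minimal sketch in Python, assuming you can find out when your newest backup finished. The function name and the parameter names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def backup_is_timely(last_backup_at, max_loss, now=None):
    """Return True if the newest backup is no older than max_loss.

    last_backup_at: timezone-aware datetime when the newest backup finished.
    max_loss: timedelta, the most work you have decided you can afford to lose.
    now: injectable clock for testing; defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_backup_at <= max_loss
```

With a nightly schedule you might call this with `max_loss=timedelta(days=1)` plus a little slack for the time the backup itself takes, and raise an alarm whenever it returns False.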

Once you know how often you’re going to back up, you need an offsite repository for your backups. Offsite used to mean exactly that: physically somewhere else. And it still does, but there is an added dimension. It also needs to be in a different security zone, so an attacker trashing your system can’t follow the backup script and trash your backups, too.

A few years ago that would have meant a network connection to a distant site where a robot loaded DVDs or tapes into a drive, and then dumped them into a bin that only a human could reach when the write was complete. You can get 80% of the way there today without the robots, thankfully, by having a separate, locked down cloud account. It should be disconnected from the rest of your organization. Only a handful of trusted admins should have access to it. And it should have a storage bucket, such as in S3 on AWS, that allows your backup service to write to it, but not to overwrite or delete. An attacker following the backups has to compromise this account to destroy them, which can be made a very difficult proposition.
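To make the write-but-not-delete idea concrete, here is a rough sketch of an IAM policy for the backup service, built as a Python dictionary so it can be printed as JSON. The bucket name is a placeholder, and this is one plausible shape for such a policy, not a vetted configuration. One assumption matters a lot: `s3:PutObject` on its own still lets a writer replace an existing key, so this sketch assumes the bucket has versioning (or Object Lock) turned on, so that an overwrite adds a new version and the old one survives.

```python
import json


def write_only_policy(bucket_name):
    """Build an IAM policy document that allows uploads and denies deletes.

    Assumes versioning is enabled on the bucket, so a PutObject to an
    existing key creates a new version and leaves the old one in place.
    """
    objects = f"arn:aws:s3:::{bucket_name}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBackupUploads",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": objects,
            },
            {
                # An explicit Deny wins over any Allow granted elsewhere.
                "Sid": "DenyDeletes",
                "Effect": "Deny",
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": objects,
            },
        ],
    }


if __name__ == "__main__":
    # "example-backup-bucket" is a hypothetical name.
    print(json.dumps(write_only_policy("example-backup-bucket"), indent=2))
```

The credentials that carry this policy live on the machine being backed up, so they are the ones an attacker is most likely to steal. The point is that stealing them should buy the attacker nothing but the ability to add more backups.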

Checking is the one that so many people forget. The solution is to automate it, like everything else, so it can’t be forgotten, and then trigger an alarm when checks don’t pass. It doesn’t have to be a line by line audit. Imagine you were poking around by hand for half an hour. What would you look for? If your backup is a database dump, will it restore into a database? Does it have the right tables or only two tables called bob and bob_backup2? Are the text fields something readable or are they gibberish? Do you have any negative integers where you expect only positive ones? Is there a sane number of rows in the primary tables, or do you have millions in production and three in your backup?
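Those questions translate fairly directly into code. Here is a minimal sketch, assuming the backup has already been restored into a scratch database. It uses SQLite from the Python standard library to stay self-contained, and the function name, its parameters, and the example table names are all invented for illustration. A real check would run the same sort of queries against whatever database you actually use.

```python
import sqlite3


def check_backup(conn, expected_tables, min_rows, positive_columns):
    """Sanity-check a restored backup; return a list of problems found.

    conn: sqlite3 connection to the restored scratch database.
    expected_tables: set of table names that must exist.
    min_rows: dict mapping table name to the smallest sane row count.
    positive_columns: list of (table, column) pairs that must never be negative.
    Table and column names come from your own config, not from user input.
    """
    problems = []
    found = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}

    # Do we have the right tables?
    for table in sorted(set(expected_tables) - found):
        problems.append(f"missing table: {table}")

    # Is there a sane number of rows?
    for table, minimum in min_rows.items():
        if table not in found:
            continue
        count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        if count < minimum:
            problems.append(f"{table}: {count} rows, expected at least {minimum}")

    # Any negative integers where we expect only positive ones?
    for table, column in positive_columns:
        if table not in found:
            continue
        bad = conn.execute(
            f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" < 0').fetchone()[0]
        if bad:
            problems.append(f"{table}.{column}: {bad} negative values")

    return problems
```

An empty list means the sanity checks passed. Anything else should set off the alarm.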

So, to recap, design your backups to be:

Timely. Figure out how much data you can afford to lose, and back up that often, so it remains
Affordable. You can’t bankrupt the business with this. Remember, your WordPress blog probably doesn’t need a dedicated fiber link to a tape robot under Cheyenne Mountain. It does have to be
Offsite. Both physically and in a separate security zone. And no matter how safe it might be there, it needs to be
Checked. This doesn’t have to be a top to bottom audit, just a sanity check. And it should all be
Automated. You shouldn’t be backing up less often than nightly. And you shouldn’t be trusting a human to do a task that regularly.
