> Relying on consistency checks will not save you if someone does the git eq...

andrewvc · on March 24, 2013

Ummm, still not a backup. What if there's an error in the code? What if an admin runs a script directly on the server and resets all repos by mistake? What if a bad release of git comes out and corrupts the master? What if a hacker comes in and wipes them all out?

Only a backup is a backup. Backup, like security, is somewhat onion like.

mpyne · on March 24, 2013

I never claimed it was a backup, only that it wasn't completely susceptible to user error as was surmised in the comment I replied to.

However, regarding the backup question, an rsync backup would have been just as damaging to the anongit mirrors as git --mirror was. The whole point to git for KDE is that that "distributed" part of the VCS would help handle backups. And it has, that part has not really been in question.

The 'luck' came in where there happened to be an anongit mirror that was fully synced up so that we didn't have to crowdsource repo restoration, which saved a lot of time and anguish.

Had we at KDE ensured that the git repository being synced to an anongit mirror was fully consistent we wouldn't even be speaking about this: the git.kde.org repos would be shut down until the box could be restored and we'd have any of 5 easy backups to choose from to restore the repositories (the rest of the files on the box would be restored from the normal backups used).

I want to stress that this is the larger point here: It's possible in some ways to corrupt a git repository and have its subcommands not notice. You must use the provided git-fsck (directly or indirectly) before backing up a git repository, especially if you don't use git for the backup, or use git-clone --mirror.

The error wasn't that we weren't doing backups, the error was that we were making corrupted backups. tar | /dev/tape will do this to you just as badly if you get the right FS corruption.

COW snapshotting filesystems can help (if they have no bugs) but the KDE sysadmins were working under the errant assumption that git would make the integrity check in situations where that wasn't true, not that backups are simply not required.

derleth · on March 24, 2013

What if your backup systems get compromised? What if your backup systems get destroyed by inclement weather? Not a backup then, obviously.

Only a backup is a backup, until it's not a backup anymore.

MaulingMonkey · on March 24, 2013

This is why, if your data is sufficiently important, you'll want to:

1) Test your backups, to detect when your backups are no longer backups.

2) Make geographically diverse backups, so a single tidal wave can't wipe out your data. For bonus points, have enough geographically diverse backups that the world is probably ending if they're all being wiped out -- at which point you have bigger problems to take care of.

3) Make backups with a diverse set of mechanisms, so the failure or compromise of one (or N-1) can't fail and compromise all backup copies. Making backups on write-only media and hiding them means current failure or compromise can't fail and compromise previous backups, and may help back your data up against theft, landlords, angry neighbors, spurned girlfriends, or even the occasional corrupt government official.

Mirroring (be it software or RAID) is not a backup system: It is far too dumb, far too happy to overwrite your old good data with new bad data. You want a history, where old good data is not replaced.

Git is not a backup system: It is a version control system. While it may have some of the properties of a backup system as goals, that is not it's primary use case. As a result we see articles like this where we've seen how it can fail in achieving the goals of a backup system as a practical matter in this very article, even when intentionally attempting to use it as a poor man's backup system in the form of mirrors.

Such problems are not unique to git, of course. On a personal note, I've managed to wipe data with both git and perforce in moments of weakness. If you want to treat me kindly about it, you could say I used both to the point where the statistics were against me not shooting myself in the foot. And, fortunately so far, the use of proper, separate backup mechanisms have always allowed me to restore the majority of my data and left me relatively unscathed.

mpyne · on March 24, 2013

We're already doing #1, 2, and 3 on the list you provided, just so you know (although we missed out on some areas for #1 in retrospect).

chris_wot · on March 24, 2013

That's kind of ridiculous. The point that most people are making is that if someone does something incredibly stupid, or there is corruption in the system that follows down the line (like what happened here), it doesn't matter whether you have a repository.

A backup clearly would have helped here.

mpyne · on March 24, 2013

A backup of a corrupt repository would have been just as corrupt though.

This is the big thing I can't figure out what people are not understanding. git does consistency checking for you already, tar|rsync|etc. don't, so it makes sense to take advantage of that.

What we had was an instance of some of the underlying data becoming corrupt on the filesystem (with indications of that starting on Feb 22!). The big mistake was considering the source repositories as consistent and canonical at the remote anongit end, but the data would have been just as corrupt if we had scp'ed the repos from git.kde.org to the anongit mirrors around the world, since we would have bypassed git's internal checking in that way.

Is it safe to rsync a running mysql database at random times, or are you supposed to use mysql-provided tools to perform a backup?

tempestn · on March 25, 2013

OK, but what stops them from daily performing a mirror clone, checking it for consistency, then backing that up? As mentioned in the linked update, 30 complete backups would consume only 900GB, so you could keep weeks of daily backups, plus weekly and/or monthlies going back much further, for a terabyte of space. That way, in the worst case, you could go back to a backup before the corruption began. Obviously you would want to have plenty of safeguards in place so that that never happened, but just in case, it's good to have an honest to goodness backup too.