
How many times does it have to be said? Mirroring is NOT a backup strategy!

The number of times I've seen sysadmins base their organization's entire backup strategy on this faulty premise is absurd. Mostly it's because they have decided that RAID 1 or RAID 5 should be a decent "backup" strategy, but then there are those who believe mirroring systems is how to do backups.

They never, ever, take into consideration what happens when something corrupts/is deleted/is compromised. Without a way of going back in time (i.e. an actual backup) they are forever stuffed.

Sysadmins: MIRRORING IS NOT A BACKUP SOLUTION. STOP DOING THIS!!!



Did you actually read the article? The mirroring in question is NOT block-level like you'd see with DRBD or RAID.

It's the --mirror option to much of git's plumbing, and it's not the same thing.
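For anyone unfamiliar with it, a minimal demo (repo names are hypothetical) of why `--mirror` is replication rather than backup: a ref deleted upstream simply disappears from the mirror on the next sync, taking the "backup" with it.

```shell
set -e
# Set up a tiny upstream with two branches.
git init -q --bare origin.git
git clone -q origin.git work
git -C work -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
git -C work branch topic
git -C work push -q origin HEAD topic

# First "backup" sync: a mirror clone copies all refs verbatim.
git clone -q --mirror origin.git mirror.git

# Upstream loses a branch (accident, compromise, corruption)...
git -C origin.git branch -D topic

# ...and the next sync faithfully replicates the loss.
git -C mirror.git fetch -q --prune origin
git -C mirror.git branch --list    # 'topic' is no longer listed
```

The mirror is only ever as good as the last state of the upstream, which is exactly the failure mode being discussed.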


I think you should re-read chris_wot's post.

> Mostly it's because they have decided that RAID 1 or RAID 5 should be a decent "backup" strategy, but then there are those who believe mirroring systems is how to do backups.

I think he is implying that this is a case of the second. That is to say, I think he is saying that --mirror is not a backup strategy.


That's correct. Yes, I read the article :-)


You didn't. Mirroring in this case refers to using git --mirror.

You're assuming it works like a traditional file system or block level mirror, but it doesn't. Corruption would in most cases have been caught. The weak (and accidental) link was relying on the server to give us a proper accounting of the current valid repositories.


> You didn't. Mirroring in this case refers to using git --mirror.

We have established that he knows they used git --mirror, and I am pretty certain that you could not possibly know that he did not read the article.


still mirroring, still no old snapshots saved.


Snapshots are stupid in the case of a content addressable, immutable data store.

You're better off asserting that your objects haven't changed (Which they weren't, and I agree that they should have been) and were valid in the first place (See above).

With snapshots, you'd inevitably want to dedupe them, which would be basically the same thing (since the store is append-only), but with the dedupe infrastructure as another failure point.


If you're using copy-on-write snapshots, then the total size of your snapshots should be small, since most of the data in said immutable content store never changes. But the benefit is that a bit error between one mirroring operation and the next doesn't overwrite your unchanged, good data on the slaves.

The problem I think needs more attention here is ext4 silently corrupting data. ZFS has it exactly right with the built-in checksumming on write and read - it can't stop a disk going bad, but it can tell you exactly what's affected and _when_ - corruption would've been caught the moment the mirroring operation tried to read back bad data (and would've faulted the process, rather than happily returning bad data).


The point of a backup is to have redundancy; if backups are too integrated with their target, the whole thing becomes one complicated system (which itself then needs a backup). This holds as a general rule.


> Snapshots are stupid in the case of a content addressable, immutable data store.

Well, the KDE incident proves otherwise.


Well, not necessarily. The issue is that filesystem corruption led to undetected Git repository corruption, which is what made it possible to push corrupted repos to the mirrors.

It would have been just as easy to push those corrupted repos to all of the backup tapes in the rotating snapshot set. A snapshotting filesystem could be a good backup (and seems to be what one of the sysadmins is pushing for).

But even more important is to fail fast and identify git repo corruption as soon as it can be detected so that further damage can be avoided.
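One way to fail fast (a sketch, not necessarily KDE's actual tooling; the repo name is hypothetical) is to run `git fsck` before trusting a repo as a mirror source. fsck walks every object and re-verifies its SHA-1 against its content, so on-disk corruption makes it exit non-zero instead of silently serving bad data:

```shell
set -e
git init -q fsck-demo
git -C fsck-demo -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

# Sanity check: a healthy repo passes.
git -C fsck-demo fsck --full

# Simulate on-disk corruption by overwriting one loose object...
obj=$(find fsck-demo/.git/objects -type f | head -n 1)
printf 'garbage' > "$obj"

# ...and fsck now fails loudly instead of letting the damage propagate.
git -C fsck-demo fsck --full || echo "corruption detected: stop mirroring"
```

Wiring a check like this into the sync job is what turns "content addressable" into an actual integrity guarantee.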


Git is not an immutable data store. The refs are very mutable and change-sets get garbage collected.


The KDE sysadmins are well aware of that, at least. Mutable operations that would leave dangling blobs cause a backup copy of the appropriate ref to be generated before the force-push/branch-deletion/etc. are run so that there's nothing for git to garbage collect.
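A minimal sketch of that safety net (repo and ref names are hypothetical, not KDE's actual scripts): copy the old ref into a `refs/backups/` namespace before the destructive operation, so gc never sees the commits as dangling.

```shell
set -e
git init -q refs-demo
# Small helper so every command runs in the demo repo with an identity set.
g() { git -C refs-demo -c user.name=demo -c user.email=demo@example.com "$@"; }

g commit -q --allow-empty -m "base"
g checkout -q -b topic
g commit -q --allow-empty -m "work that a deletion would orphan"

# Back up the old ref under refs/backups/ before deleting the branch.
g update-ref refs/backups/topic refs/heads/topic
g checkout -q -
g branch -D topic

# Even after reflogs expire and an aggressive prune, the commit survives,
# because refs/backups/topic still reaches it.
g reflog expire --expire=now --all
g gc -q --prune=now
g cat-file -t refs/backups/topic    # prints: commit
```

Without the `update-ref` line, the expire-then-prune sequence would have destroyed the commit for good.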


Yeah, if you're incapable of accepting that a complicated scenario is complicated.

The next two paragraphs identified two things that they weren't doing that they should have been. Otherwise they'd just have lots of snapshots of bad data.


For everything with a sha1 hash, I see where you're coming from. And most of the data in a repo is covered by them. But things like tags, branches and reflogs don't themselves have hashes, they are just metadata referencing content in the append-only store. It sounds like they were backing up their reflogs, which is great, so they could recover if a user, say, accidentally deleted all the branches off the central server.
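To illustrate the point (repo name hypothetical): a branch is not itself hashed content, just a pointer file holding the hash of a commit that is. Assuming git's default "files" ref backend:

```shell
set -e
git init -q ptr-demo
git -C ptr-demo -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

git -C ptr-demo rev-parse HEAD     # the commit's hash
cat ptr-demo/.git/refs/heads/*     # the branch file: the same hash
```

The object store can verify itself end to end, but nothing in that 40-hex pointer file is covered by a hash, which is why refs need their own backup story.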


You really ought to reread my comment. I mentioned RAID because it's the most common form of this mistake.

Let me repeat what I said, italicizing the important parts:

Mostly it's because they have decided that RAID 1 or RAID 5 should be a decent "backup" strategy, *but then there are those who believe mirroring systems is how to do backups.*



