The interesting thing about erasure codes is that you need to checksum your shards independently from the EC itself. If you supply corrupted or wrong shards, you get corrupted data back.
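To make that concrete, here's a rough Python sketch with a toy single-parity code (RAID-5-style) standing in for a real erasure code; a real setup would use Reed-Solomon via something like zfec or ISA-L, and the padding/length bookkeeping is hand-waved here. The point is the per-shard SHA-256: the parity can rebuild one shard, but only the independent checksums tell you which shard is the bad one.

    import hashlib

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(data: bytes, k: int):
        """Split data into k equal shards plus one XOR parity shard, and
        record a SHA-256 for each shard independently of the code itself."""
        data += b"\x00" * (-len(data) % k)            # pad to a multiple of k
        size = len(data) // k
        shards = [data[i * size:(i + 1) * size] for i in range(k)]
        parity = shards[0]
        for s in shards[1:]:
            parity = xor_bytes(parity, s)
        shards.append(parity)
        sums = [hashlib.sha256(s).hexdigest() for s in shards]
        return shards, sums

    def decode(shards, sums, k: int) -> bytes:
        """The checksums identify the corrupted shard; the parity alone can
        rebuild one missing shard, but it can't tell you which one is bad."""
        bad = [i for i, (s, h) in enumerate(zip(shards, sums))
               if hashlib.sha256(s).hexdigest() != h]
        if len(bad) > 1:
            raise IOError("more corrupted shards than this code can repair")
        if bad:
            rebuilt = None
            for i, s in enumerate(shards):
                if i != bad[0]:
                    rebuilt = s if rebuilt is None else xor_bytes(rebuilt, s)
            shards[bad[0]] = rebuilt
        return b"".join(shards[:k])                   # padding left in for brevity

If you skipped the checksum check, decode() would happily XOR the corrupted bytes right back into the output, which is exactly the failure mode above.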
I think for backup (as in small-scale, "fits on one disk") error-correcting codes are not a really good approach, because IME hard disks with one error you notice usually have made many more errors - or will do so shortly. In that case no ECC will help you. If, on the other hand, you're looking at an isolated error, then only very little data is affected (on average).
For example, a bit error in a chunk in a tool like borg/restic will only break that chunk; a piece of a file or perhaps part of a directory listing.
So for these kinds of scenarios "just use multiple backup drives with fully independent backups" is better and simpler.
For small scale, use Dropbox or Google Drive, or whatever, because for small scale the most important part of backup is actually reliably having it done. If you rely on a manual process, you're doomed. :)
For large-scale in-house things: Ceph regularly scrubs the data (compares checksums), and DreamHost has DreamObjects.
Thanks for mentioning borg/restic, I had never heard of them. (rsnapshot [rsync] works well, but it's not so shiny.) Deduplication sounds nice. (rsnapshot uses hardlinks.)
That made me look for something btrfs-based, and I found https://github.com/digint/btrbk (sends btrfs snapshots to a remote somewhere, can also be encrypted); it could be useful for small setups.
I think rsync/rsnapshot aren't really appropriate for backups:
(1) They need full support for all FS oddities (xattrs, resource forks, ACLs, etc.) wherever you move the data.
(2) They don't checksum the data at all.
The newer tools don't have either problem to the same extent: for (1), they pack/unpack these attributes in their own format, which doesn't need anything special from the target, so if you move your data twice in a circle you won't lose anything (though their support for strange things might not be as polished as e.g. rsync's or GNU coreutils'). And for deduplication they have to do (2) with cryptographic hashes anyway.
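Roughly, that dedup looks like this: split the stream into chunks and key each chunk by its SHA-256, so identical chunks are stored once and every chunk is verifiable on read. (Fixed-size chunks and an in-memory dict here purely for brevity; the real tools use content-defined chunking, and the chunk size below is a made-up number.)

    import hashlib
    import io

    CHUNK_SIZE = 4 * 1024 * 1024      # made-up; real tools pick this differently

    def backup_stream(stream, store: dict) -> list:
        """Store chunks keyed by their hash; return the list of hashes
        (the 'recipe' needed to reassemble the original stream)."""
        recipe = []
        while True:
            chunk = stream.read(CHUNK_SIZE)
            if not chunk:
                break
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)    # dedup: known chunks aren't stored again
            recipe.append(digest)
        return recipe

    def restore_stream(recipe, store: dict) -> bytes:
        out = bytearray()
        for digest in recipe:
            chunk = store[digest]
            if hashlib.sha256(chunk).hexdigest() != digest:   # catches bitrot on read
                raise IOError("chunk %s is corrupted" % digest)
            out += chunk
        return bytes(out)

    store = {}
    recipe = backup_stream(io.BytesIO(b"some data" * 1000), store)
    assert restore_stream(recipe, store) == b"some data" * 1000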
However (as an ex-dev of one of these), they all have one problem or limitation or another that won't go away. (Borg has its cache and weak encryption; restic, iirc, has difficult-to-avoid performance problems with large trees; etc.)
Something that nowadays might also need to be discussed is whether and how vulnerable your online backup is to BREACH-like attacks. E.g. .tar.gz is pretty bad there.
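Here's a toy zlib demo of the size oracle behind that, just to show the mechanism (it's not an attack on any particular tool): if attacker-influenced data ends up in the same compressed stream as a secret, a matching guess compresses better, and encrypting afterwards doesn't hide the length difference.

    import zlib

    SECRET = b"token=hunter2"          # made-up secret for the demo

    def compressed_len(guess: bytes) -> int:
        # the secret and the guess land in the same deflate stream, as in a .tar.gz
        return len(zlib.compress(SECRET + b"\n" + guess))

    print(compressed_len(b"token=hunter2"))   # matching guess: back-reference, shorter
    print(compressed_len(b"token=zzzzzzz"))   # wrong guess: longer output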
Hm, rsync does MD5 checking automatically. Which doesn't do much against bitrot [0], but it should help with the full circle thing. (And maybe it'll be SHA256+ in newer versions? Though there's not even a ticket in their bugzilla about this. And maybe MD5 is truly enough against random in-transit corruption.)
Yeah, crypto is something that doesn't play well with dedupe, especially if you don't trust the target backup server.
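A quick sketch of why (using the third-party "cryptography" package): with a normal AEAD and random nonces, two identical chunks encrypt to completely different blobs, so an untrusted server has nothing left to dedup on; tools like borg/restic therefore dedup on the client, before encryption.

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)
    aesgcm = AESGCM(key)
    chunk = b"the same chunk of data" * 100

    ct1 = aesgcm.encrypt(os.urandom(12), chunk, None)   # fresh random nonce each time
    ct2 = aesgcm.encrypt(os.urandom(12), chunk, None)
    print(ct1 == ct2)   # False: the server just sees two unrelated blobs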
Uh, BREACH was a beast (he-he). I'm still a bit uneasy after thinking about how long these bugs were lurking in OpenSSL. Thankfully the splendid work of Intel engineers quickly diverted the nexus of our bad feels away from such high-level matters :|
[0] That's something that btrfs/ZFS/Ceph should/could fix. (And btrfs supports incremental mode for send+receive.)