The interesting thing about erasure codes is that you need to checksum your shards independently from the EC itself. If you supply corrupted or wrong shards, you get corrupted data back.
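To make that concrete, here's a rough Python sketch with a toy single-parity code (RAID-5-style) standing in for a real erasure code; a real setup would use Reed-Solomon via something like zfec or ISA-L, and the padding/length bookkeeping is hand-waved here. The point is the per-shard SHA-256: the parity can rebuild one shard, but only the independent checksums tell you which shard is the bad one.

    import hashlib

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(data: bytes, k: int):
        """Split data into k equal shards plus one XOR parity shard, and
        record a SHA-256 for each shard independently of the code itself."""
        data += b"\x00" * (-len(data) % k)            # pad to a multiple of k
        size = len(data) // k
        shards = [data[i * size:(i + 1) * size] for i in range(k)]
        parity = shards[0]
        for s in shards[1:]:
            parity = xor_bytes(parity, s)
        shards.append(parity)
        sums = [hashlib.sha256(s).hexdigest() for s in shards]
        return shards, sums

    def decode(shards, sums, k: int) -> bytes:
        """The checksums identify the corrupted shard; the parity alone can
        rebuild one missing shard, but it can't tell you which one is bad."""
        bad = [i for i, (s, h) in enumerate(zip(shards, sums))
               if hashlib.sha256(s).hexdigest() != h]
        if len(bad) > 1:
            raise IOError("more corrupted shards than this code can repair")
        if bad:
            rebuilt = None
            for i, s in enumerate(shards):
                if i != bad[0]:
                    rebuilt = s if rebuilt is None else xor_bytes(rebuilt, s)
            shards[bad[0]] = rebuilt
        return b"".join(shards[:k])                   # padding left in for brevity

If you skipped the checksum check, decode() would happily XOR the corrupted bytes right back into the output, which is exactly the failure mode above.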
I think for backup (as in small-scale, "fits on one disk") error-correcting codes are not a really good approach, because IME hard disks with one error you notice usually have made many more errors - or will do so shortly. In that case no ECC will help you. If, on the other hand, you're looking at an isolated error, then only very little data is affected (on average).
For example, a bit error in a chunk in a tool like borg/restic will only break that chunk; a piece of a file or perhaps part of a directory listing.
So for these kinds of scenarios "just use multiple backup drives with fully independent backups" is better and simpler.
For small scale, use Dropbox or Google Drive, or whatever, because for small scale the most important part of backup is actually reliably having it done. If you rely on a manual process, you're doomed. :)
For large-scale in-house things: Ceph regularly scrubs the data (compares checksums), and DreamHost has DreamObjects.
Thanks for mentioning borg/restic, I had never heard of them. (rsnapshot [rsync] works well, but it's not so shiny.) Deduplication sounds nice. (rsnapshot uses hardlinks.)
That made me look for something btrfs-based, and I found https://github.com/digint/btrbk (sends btrfs snapshots to a remote somewhere, can also be encrypted); it could be useful for small setups.
I think rsync/rsnapshot aren't really appropriate for backups:
(1) They need full support for all FS oddities (xattrs, resource forks, ACLs, etc.) wherever you move the data.
(2) They don't checksum the data at all.
The newer tools don't have either problem to the same extent: for (1), they pack/unpack these attributes in their own format, which doesn't need anything special from the target, so if you move your data twice in a circle you won't lose anything (though their support for strange things might not be as polished as e.g. rsync's or GNU coreutils'). And for deduplication they have to do (2) with cryptographic hashes anyway.
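Roughly, that dedup looks like this: split the stream into chunks and key each chunk by its SHA-256, so identical chunks are stored once and every chunk is verifiable on read. (Fixed-size chunks and an in-memory dict here purely for brevity; the real tools use content-defined chunking, and the chunk size below is a made-up number.)

    import hashlib
    import io

    CHUNK_SIZE = 4 * 1024 * 1024      # made-up; real tools pick this differently

    def backup_stream(stream, store: dict) -> list:
        """Store chunks keyed by their hash; return the list of hashes
        (the 'recipe' needed to reassemble the original stream)."""
        recipe = []
        while True:
            chunk = stream.read(CHUNK_SIZE)
            if not chunk:
                break
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)    # dedup: known chunks aren't stored again
            recipe.append(digest)
        return recipe

    def restore_stream(recipe, store: dict) -> bytes:
        out = bytearray()
        for digest in recipe:
            chunk = store[digest]
            if hashlib.sha256(chunk).hexdigest() != digest:   # catches bitrot on read
                raise IOError("chunk %s is corrupted" % digest)
            out += chunk
        return bytes(out)

    store = {}
    recipe = backup_stream(io.BytesIO(b"some data" * 1000), store)
    assert restore_stream(recipe, store) == b"some data" * 1000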
However (as an ex-dev of one of these), they all have one problem or limitation or another that won't go away. (Borg has its cache and weak encryption; restic, iirc, has difficult-to-avoid performance problems with large trees; etc.)
Something that nowadays might also need to be discussed is whether and how vulnerable your online backup is to BREACH-like attacks. E.g. .tar.gz is pretty bad there.
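Here's a toy zlib demo of the size oracle behind that, just to show the mechanism (it's not an attack on any particular tool): if attacker-influenced data ends up in the same compressed stream as a secret, a matching guess compresses better, and encrypting afterwards doesn't hide the length difference.

    import zlib

    SECRET = b"token=hunter2"          # made-up secret for the demo

    def compressed_len(guess: bytes) -> int:
        # the secret and the guess land in the same deflate stream, as in a .tar.gz
        return len(zlib.compress(SECRET + b"\n" + guess))

    print(compressed_len(b"token=hunter2"))   # matching guess: back-reference, shorter
    print(compressed_len(b"token=zzzzzzz"))   # wrong guess: longer output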
Hm, rsync does MD5 checking automatically. Which doesn't do much against bitrot [0], but it should help with the full circle thing. (And maybe it'll be SHA256+ in newer versions? Though there's not even a ticket in their bugzilla about this. And maybe MD5 is truly enough against random in-transit corruption.)
Yeah, crypto is something that doesn't play well with dedupe, especially if you don't trust the target backup server.
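A quick sketch of why (using the third-party "cryptography" package): with a normal AEAD and random nonces, two identical chunks encrypt to completely different blobs, so an untrusted server has nothing left to dedup on; tools like borg/restic therefore dedup on the client, before encryption.

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)
    aesgcm = AESGCM(key)
    chunk = b"the same chunk of data" * 100

    ct1 = aesgcm.encrypt(os.urandom(12), chunk, None)   # fresh random nonce each time
    ct2 = aesgcm.encrypt(os.urandom(12), chunk, None)
    print(ct1 == ct2)   # False: the server just sees two unrelated blobs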
Uh, BREACH was a beast (he-he). I'm still a bit uneasy after thinking about how long these bugs were lurking in OpenSSL. Thankfully the splendid work of Intel engineers quickly diverted the nexus of our bad feels away from such high-level matters :|
[0] That's something that btrfs/ZFS/Ceph should/could fix. (And btrfs supports incremental mode for send+receive.)