Hacker News

> "not all file formats are suitable for long term preservation, even if they have an open specification. Some lossy and compressed file formats pose a higher risk of total loss if even a single bit is lost."

Wouldn't this issue apply more to OpenDocument, which is compressed (into a ZIP archive), than to SQLite, which (at least by default) is not?

But then, I question the appropriateness of the advice. If you're serious about archiving, you should be using error-correcting codes in some form, so that the archived data remains recoverable bit-for-bit even with a large number of bit errors in the underlying medium. To be honest, I'm not that familiar with long-term archiving practices, but a RAID-style setup gives you redundancy against loss of entire drives, and with a checksumming filesystem on top (ZFS or Btrfs, say) it can also detect and repair individual bit errors. Alternatively, you could use dedicated ECC tools like par2.
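The parity idea behind RAID-5 fits in a few lines — a toy sketch, not real archiving code: XOR equal-sized data blocks into a parity block, and any one lost block can be rebuilt from the survivors.

```python
# Toy RAID-5-style parity: the parity block is the XOR of the data blocks.
blocks = [b"first block ", b"second block", b"third block "]  # equal-length blocks
parity = bytes(x ^ y ^ z for x, y, z in zip(*blocks))

# Pretend block 1 is lost; rebuild it from the surviving blocks plus parity.
rebuilt = bytes(x ^ z ^ p for x, z, p in zip(blocks[0], blocks[2], parity))
print(rebuilt == blocks[1])  # True: one missing block is fully recoverable
```

This only handles the loss of one known block; correcting arbitrary bit flips at unknown positions is what proper ECC schemes (Reed-Solomon, as used by par2) add on top.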

True, most data that gets preserved will probably be preserved by chance, by people who are not serious about archiving, and may not take sufficient steps to prevent errors. But they're also not going to choose a format for optimal archiving either, so you're kind of stuck with the fact that many modern file formats have built-in compression and/or checksums, and thus don't hold up well when corrupted. We could keep the issue in mind when designing new formats, but is resilience to corruption really worth the additional storage cost of leaving data uncompressed? Or perhaps we could design formats to have built-in ECC instead of just checksums, but that would also waste space...
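The fragility of compressed data is easy to demonstrate — a rough sketch using Python's zlib (the exact failure mode depends on which bit is hit): flipping a single bit in raw text damages exactly one byte, while the same flip in the compressed stream usually breaks decompression of everything after it.

```python
import zlib

text = b"An uncompressed archive degrades gracefully under bit rot. " * 50
packed = zlib.compress(text)

def flip_bit(data: bytes, i: int) -> bytes:
    # Flip the lowest bit of byte i, simulating a single-bit storage error.
    out = bytearray(data)
    out[i] ^= 0x01
    return bytes(out)

# Raw text: one flipped bit corrupts exactly one byte; the rest survives.
corrupt_text = flip_bit(text, len(text) // 2)
survivors = sum(a == b for a, b in zip(corrupt_text, text)) / len(text)
print(f"raw bytes intact: {survivors:.4%}")

# Compressed: the same one-bit error typically breaks the whole stream
# (a decode error mid-stream, or a checksum mismatch at the end).
try:
    zlib.decompress(flip_bit(packed, len(packed) // 2))
    print("decompressed despite corruption (unlikely)")
except zlib.error:
    print("decompression failed: the rest of the file is unrecoverable")
```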
