How does the deduplication work, in terms of user interface and performance?
For comparison:
ZFS has online-only dedup, so it can save space as data is written, but it can't combine identical pieces of already-written data. It also scatters file extents to the winds, so it needs lots of RAM and fast disks or it slows to a crawl.
BTRFS has mostly-offline dedup, so it doesn't save space as data is written, but you can combine identical pieces of data later. You can also make CoW copies of files instantly. It has minimal performance impact.
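As a toy illustration of the post-processing approach, here is a sketch (my own, not from either filesystem's code) of its scan phase: data is already on disk, and duplicate blocks are found after the fact by hashing. Real tools like duperemove then ask the kernel to share the matching extents rather than rewriting anything.

```python
import hashlib

BLOCK_SIZE = 4096  # dedup granularity; real tools often use the filesystem block size

def duplicate_blocks(paths):
    """Group identical fixed-size blocks by content hash.

    Mimics the scan phase of a post-processing deduplicator: the data
    is already written, and matches are discovered after the fact.
    """
    seen = {}        # digest -> first (path, offset) holding this block
    duplicates = []  # ((path, offset), (earlier path, offset)) pairs
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while block := f.read(BLOCK_SIZE):
                digest = hashlib.sha256(block).digest()
                if digest in seen:
                    duplicates.append(((path, offset), seen[digest]))
                else:
                    seen[digest] = (path, offset)
                offset += len(block)
    return duplicates
```

An inline deduplicator does the same hashing, but in the write path, which is why it pays the RAM and latency cost up front while a post-processing one defers it.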
> I believe the correct terms are inline vs. post-processing deduplication.
Thanks.
> There are pros and cons to each approach, and as usual with storage, a lot depends on implementation and workload.
I would probably argue that the biggest factors in the performance differences between ZFS and BTRFS deduplication are nearly independent of when the deduplication happens, and boil down to implementation decisions specific to those two filesystems.
If you happen to know, would the right comparison elsewhere in computing be something like reference-counting garbage collection (à la Python, ignoring its cycle detector) versus a tracing/sweeping garbage collector?
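The analogy is loose, but the two styles can be sketched in Python itself (this is my own illustration, not anything from ZFS or BTRFS): reference counting does its bookkeeping inline, at the moment a reference changes, while a tracing collector does nothing at mutation time and instead makes a whole-heap pass later.

```python
import gc
import sys

def refcount_demo():
    """Reference counting: the count is maintained inline, the instant
    a reference is taken or dropped (compare: inline dedup accounting
    done in the write path)."""
    a = object()
    before = sys.getrefcount(a)
    b = a                       # taking a reference bumps the count now
    during = sys.getrefcount(a)
    del b                       # dropping it decrements now; memory is
    after = sys.getrefcount(a)  # freeable the instant the count hits zero
    return before, during, after

class Node:
    def __init__(self):
        self.ref = None

def sweep_demo():
    """Tracing collection: mutations cost nothing extra; a later pass
    walks the heap and finds garbage after the fact (compare:
    post-processing dedup scanning already-written data)."""
    x, y = Node(), Node()
    x.ref, y.ref = y, x   # a cycle: refcounts alone never reach zero
    del x, y
    return gc.collect()   # the sweep discovers and frees the cycle
```

`refcount_demo` shows the count moving in lockstep with the references, and `sweep_demo` shows work that only happens when the collector is invoked, which is roughly the inline vs. post-processing trade-off.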
"Offline" means something completely different when speaking about filesystems: it means the filesystem needs to be unmounted before an operation can proceed.
For example, some filesystems can be grown online but can only be shrunk offline.