How does the deduplication work, in terms of user interface and performance?
For comparison:
ZFS has online-only dedup, so it can save space as data is written, but it can't combine identical pieces of already-written data. It also scatters file extents to the winds, so it needs lots of RAM and fast disks or it slows to a crawl.
BTRFS has mostly-offline dedup, so it doesn't save space as data is written, but you can combine identical pieces of data later. You can also make CoW copies of files instantly. It has minimal performance impact.
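As a toy illustration of the post-processing approach, here is a sketch (my own, not from either filesystem's code) of its scan phase: data is already on disk, and duplicate blocks are found after the fact by hashing. Real tools like duperemove then ask the kernel to share the matching extents rather than rewriting anything.

```python
import hashlib

BLOCK_SIZE = 4096  # dedup granularity; real tools often use the filesystem block size

def duplicate_blocks(paths):
    """Group identical fixed-size blocks by content hash.

    Mimics the scan phase of a post-processing deduplicator: the data
    is already written, and matches are discovered after the fact.
    """
    seen = {}        # digest -> first (path, offset) holding this block
    duplicates = []  # ((path, offset), (earlier path, offset)) pairs
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while block := f.read(BLOCK_SIZE):
                digest = hashlib.sha256(block).digest()
                if digest in seen:
                    duplicates.append(((path, offset), seen[digest]))
                else:
                    seen[digest] = (path, offset)
                offset += len(block)
    return duplicates
```

An inline deduplicator does the same hashing, but in the write path, which is why it pays the RAM and latency cost up front while a post-processing one defers it.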
> I believe the correct terms are inline vs. post-processing deduplication.
Thanks.
> There are pros and cons to each approach, and as usual with storage, a lot depends on implementation and workload.
I would probably argue that the biggest factors in the performance differences between ZFS and BTRFS deduplication are nearly independent of when the deduplication happens, and boil down to implementation decisions specific to those two filesystems.
If you happen to know, would the right comparison elsewhere in computing be something like reference-counting garbage collection (à la Python, ignoring its cycle detector) versus a tracing/sweeping garbage collector?
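The analogy is loose, but the two styles can be sketched in Python itself (this is my own illustration, not anything from ZFS or BTRFS): reference counting does its bookkeeping inline, at the moment a reference changes, while a tracing collector does nothing at mutation time and instead makes a whole-heap pass later.

```python
import gc
import sys

def refcount_demo():
    """Reference counting: the count is maintained inline, the instant
    a reference is taken or dropped (compare: inline dedup accounting
    done in the write path)."""
    a = object()
    before = sys.getrefcount(a)
    b = a                       # taking a reference bumps the count now
    during = sys.getrefcount(a)
    del b                       # dropping it decrements now; memory is
    after = sys.getrefcount(a)  # freeable the instant the count hits zero
    return before, during, after

class Node:
    def __init__(self):
        self.ref = None

def sweep_demo():
    """Tracing collection: mutations cost nothing extra; a later pass
    walks the heap and finds garbage after the fact (compare:
    post-processing dedup scanning already-written data)."""
    x, y = Node(), Node()
    x.ref, y.ref = y, x   # a cycle: refcounts alone never reach zero
    del x, y
    return gc.collect()   # the sweep discovers and frees the cycle
```

`refcount_demo` shows the count moving in lockstep with the references, and `sweep_demo` shows work that only happens when the collector is invoked, which is roughly the inline vs. post-processing trade-off.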
"Offline" means something completely different when speaking about filesystems: it means the filesystem needs to be unmounted before an operation can proceed.
For example, some filesystems can be grown online but can only be shrunk offline.