Currently one can change the compression setting on a dataset and this will compress new blocks using the new algorithm. This works perfectly fine for many people during normal use.
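For example (pool/dataset name and algorithm are just placeholders):
zfs set compression=zstd tank/data   # only blocks written after this point use zstd; existing blocks keep their old compression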
However, there are three scenarios where we would want an easy way to recompress a complete dataset:
While it's perfectly possible to send data to a new dataset and thus trigger a recompression, this has a few downsides:
A preferred way to handle this would be a feature which recompresses the current data on the drive, "in the background", just like a scrub or resilver. This also has the added benefit of letting us force a recompression if we deprecate/replace/remove an algorithm.
This feature would enable us to go beyond the requested deprecation in #9761.
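For reference, the send/receive workaround mentioned above looks roughly like this (names and zstd are placeholders; a plain send is used on purpose, since zfs send -c would keep the blocks in their old compressed form):
zfs snapshot tank/data@recompress
zfs send tank/data@recompress | zfs recv -o compression=zstd tank/data.new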
How do you plan to implement this? What is to happen to snapshots?
Recv already does this.
@scineram Snapshots would be a problem indeed.
I don't have a "plan" to implement this, otherwise I wouldn't file an issue ;)
How do you suggest we handle future removal of compression algorithms and zero-downtime changes of on-disk compression otherwise? I don't think recv covers this use case, or does it?
If so: Where is the documentation about using recv in this way?
it would have very low downtime, of course...
this requires block pointer rewrite
@richardelling Precisely, I didn't say it was going to be easy ;)
this requires block pointer rewrite
I personally would be fine if this feature initially behaved like/leveraged an auto-resumed local send/receive and some clone/upgrade-like switcheroo (and obeyed the same constraints, if unavoidable even temporarily using twice the required storage of the dataset being 'transformed') in the background with the user interface of a scrub (i.e. trigger it through a zfs subcommand, appears in zfs?/zpool status, gets resumed after reboots, can be paused, stopped, etc.).
The applications for this go beyond just applying a different compression algorithm:
One could hack something like this together using zfs send/recv, it'd probably involve a clone receive and some upgrade shenanigans, but it would definitely not be the same as having a canonical zfs subcommand with the above-mentioned UX; especially since it would somewhat cleanly resolve some "please unshoot my foot" situations that inexperienced and/or sleep deprived users might get themselves into, for example choosing the wrong compression algorithm/level a year before realizing it, without the need to figure out and possibly script (recursive) zfs send and receive.
Also, zfs is probably in a better position to do a much cleaner in-place swap of the two versions of the dataset when the 'rewrite' is done, probably like a snapshot rollback, and will most likely not forget to delete the old version afterwards, unlike my hacky scripts, which break all the time. 😉
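Roughly, the swap those hacky scripts attempt once the send/receive sketched earlier has finished looks like this (placeholder names, assuming inherited mountpoints; the window between unmount and mount is the downtime):
zfs unmount tank/data                                                # stop writes for the final catch-up
zfs snapshot tank/data@final
zfs send -i @recompress tank/data@final | zfs recv -F tank/data.new  # catch up on changes since the first send
zfs rename tank/data tank/data.old
zfs rename tank/data.new tank/data
zfs mount tank/data
zfs destroy -r tank/data.old                                         # only once the new copy is verified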
Future future work ideas:
-o encryption=on from off might be a useful thing to support, now that we allow unencrypted children -> a future³ PR might add a way to migrate between crypto ciphers.
@InsanePrawn
One could hack something like this together using zfs send/recv, it'd probably involve a clone receive and some upgrade shenanigans, but it would definitely not be the same as having a canonical zfs subcommand with the above-mentioned UX
Yes, that's mostly the point... I think more advanced users can do things that get pretty close (and pretty hacky), but making it "as easy as possible" for the median user was the goal of my feature request...
@InsanePrawn, given enough space, yes, a transparent ZFS send/receive would be a way to go. All new writes go to the new dataset, and any read not yet available in the new dataset would fall back to the old dataset. Once the entire dataset is received, the old dataset is destroyed.
Theoretically, we could almost do it without enough space for the whole dataset: once a file is entirely copied to the new dataset, it could be deleted from the source dataset.
If something like this were implemented, resuming after a zpool export would also have to be part of the work. Otherwise, the pool would remain in a partially migrated state.
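For what it's worth, the existing resumable receive machinery already covers that part for the send/receive approach, roughly (placeholder names again):
zfs send tank/data@recompress | zfs recv -s -o compression=zstd tank/data.new
zfs send -t "$(zfs get -H -o value receive_resume_token tank/data.new)" | zfs recv -s tank/data.new   # resume after an interruption or export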
This does have the advantage of re-striping the data. Simple example: you have one vdev, and when it gets fullish, you add a second vdev. The existing data (if not changed) remains only on the first vdev, and even newly written data may favor the second vdev, as it has the most free space. Something like suggested above can help balance data, even if we don't need to change checksum, compression or encryption algorithms.
Back to reality, snapshots & possibly even bookmarks would be a problem. Even clones of snapshots that reference the old dataset would still reference the old data & metadata (be it compression, checksum or encryption changes).
I think a simple file/dir "reseat" interface would be the most practical, i.e. an operation that did this transparently:
cp A TMP
rm A
mv TMP A
Perhaps not the easiest to implement. Lustre has a similar feature called "migrate", which is more about re-striping data.
Snapshots etc should just keep referencing the old data.
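Something in that spirit can already be approximated from user space, with all the caveats above (snapshots keep referencing the old blocks, each file briefly needs twice its space, hard links and open file handles are lost, and a cp that does reflink/block cloning would not actually rewrite the data); a minimal sketch, with rewrite_file being a made-up helper name:
rewrite_file() {
    f="$1"; tmp="$f.rewrite.$$"
    cp -p -- "$f" "$tmp" && mv -- "$tmp" "$f"   # copy so the blocks are rewritten under the current dataset settings, then atomically replace the original
}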