This issue is created to track the proposal from the OpenZFS Developer Summit 2018 that dedup send could be deprecated and removed from the filesystem. For full context, watch the presentation at https://youtu.be/WvBURTKUy1o?t=5177 . A shortened version of the argument is:
When considered as a group, these issues with the feature make it a solid candidate for removal from ZFS. This thread is intended to be a hub for discussion and comments around this proposal, as well as to track its progress.
This is a significant milestone, not only because we may remove this particular feature from ZFS, but because we may remove a feature at all. No user-facing feature has been removed from ZFS in this manner before, and the process followed here could define how we remove features in the future. The ability to reduce code complexity and volume is an important part of any open source project, so we need a plan for this at some point.
The proposed next step is to add a warning as soon as possible, both to the ZFS man page and as a message printed whenever dedup send is used. The actual work of removing functionality would be delayed for a significant time to allow end users and vendors to accommodate the change. This timeline is just a proposal, however. We want input from all interested parties on the right way to proceed with this, or even whether to proceed at all. We especially want to hear from users and vendors who take advantage of this feature, and what their use cases are.
I've never used dedup send/recv; for our setup, raw send of compressed data is far more relevant.
I wouldn't mind if it went away, and I bet I'm not alone. I'd be interested to hear what other people use it for: specifically, how big the datasets are and how much time/bandwidth it saves in those cases.
I'd love to see this go away. I think maintaining receivability of dedup streams by "re-duping" them in userland would be a great way to get 99% of the benefit while not actually abandoning this aspect of the send stream format.
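To make the "re-duping in userland" idea concrete, here is a toy sketch of what such a converter might do: walk the stream, remember the payload of each ordinary write, and replace every by-reference record with a full write of the data it points to. The `Write` and `WriteByRef` types and the `expand_refs` helper are hypothetical simplifications invented for this illustration; they are not the actual send stream record layout, and this model keeps every payload in memory, which a real tool would probably avoid.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple, Union


@dataclass(frozen=True)
class Write:
    """Hypothetical plain write record: payload travels with the record."""
    obj: int
    offset: int
    data: bytes


@dataclass(frozen=True)
class WriteByRef:
    """Hypothetical dedup record: points at data sent earlier in the stream."""
    obj: int
    offset: int
    ref_obj: int
    ref_offset: int


Record = Union[Write, WriteByRef]


def expand_refs(records: Iterable[Record]) -> Iterator[Write]:
    """Yield an equivalent stream containing only plain Write records."""
    seen: Dict[Tuple[int, int], bytes] = {}
    for rec in records:
        if isinstance(rec, Write):
            seen[(rec.obj, rec.offset)] = rec.data
            yield rec
        else:
            # Raises KeyError if the reference targets data not yet seen.
            data = seen[(rec.ref_obj, rec.ref_offset)]
            seen[(rec.obj, rec.offset)] = data
            yield Write(rec.obj, rec.offset, data)
```

With something along these lines in front of the receiver, the receive path would only ever see ordinary writes, which is the sense in which the stream format stays receivable without in-kernel dedup support.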
I made an argument in 2016 to deprecate the current use of -r and -R on zfs rollback (replacing it with a real recursive flag, once the old flags had been non-functional for long enough that unintended use wouldn't happen).
The main argument against this was:
There really is no "reasonable length of time" for phase 2 or 3 of this plan. The "zfs" and "zpool" commands are a committed, documented interface, which people are allowed (and encouraged!) to depend on.
It's unfortunate when design decisions get made that turn out to be suboptimal in hindsight, but these are lessons to be observed when making future design decisions rather than opportunities to make breaking changes.
Following that argument: should dedup send be removed as a feature, the -D switch must be kept, even if it is turned into a no-op.
Nevertheless:
I would prefer (instead of having to drag every wart along for eternity) that ZFS have a clear deprecation process that makes it possible to remove _any_ design decision that turned out bad in hindsight - certainly over a reasonable timespan (and with warnings to users of the feature in question) so as not to cause problems in production.
ZFS can't be the last word in filesystems (google search) - or in anything - as long as it's unable to rid itself of diagnosed problems.
I think there's a substantive difference between this proposal (which at a high level would "just" reduce the dedup ratio achieved by zfs send -D), and the previous proposal which would eventually make zfs rollback -r/-R destroy different data than it used to. I'd like to redirect discussion of zfs rollback to a different channel so that we can concentrate on dedup send/recv in this issue. I've just followed up on the rollback mailing list thread.
I agree that zfs send -D should continue to be a valid argument; at some point it would just cause a warning message to be printed, and no actual deduping would happen.
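As an illustration of the "accepted but ignored" behaviour described here, the following is a minimal sketch of a flag filter that keeps -D valid, emits a warning, and drops it before the rest of the command runs. The function name, the table of deprecated flags, and the warning text are all assumptions made up for this example; the real zfs command is written in C and parses its options differently, and this sketch only handles a standalone `-D`, not combined short options such as `-RD`.

```python
from typing import Dict, List, Tuple

# Hypothetical table of flags that are still accepted but no longer do anything.
DEPRECATED_NOOP_FLAGS: Dict[str, str] = {
    "-D": "dedup send is deprecated; sending a non-deduplicated stream instead",
}


def filter_deprecated(args: List[str]) -> Tuple[List[str], List[str]]:
    """Split args into (arguments to keep, warnings to print).

    Deprecated no-op flags are removed from the argument list and
    produce one warning each; everything else passes through unchanged.
    """
    kept: List[str] = []
    warnings: List[str] = []
    for arg in args:
        if arg in DEPRECATED_NOOP_FLAGS:
            warnings.append(f"warning: {arg}: {DEPRECATED_NOOP_FLAGS[arg]}")
        else:
            kept.append(arg)
    return kept, warnings
```

The point of the design is that existing scripts calling `zfs send -D` keep working and still produce a receivable stream; only the dedup ratio changes, and the warning tells the user why.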