zfs send should probably not abort when it receives a signal unless the signal is one of the default-fatal signals.
For example, start a send and pipe it however you want. Press CTRL+Z to pause. Type "fg" to resume. The send immediately aborts.
    # zfs send poolname/fsname@today | bzip2 -9 > /external/poolname-today.bz2
    ^Z
    [1]+ Stopped zfs send poolname/fsname@today | bzip2 -9 > /external/poolname-today.bz2
    # fg
    zfs send poolname/fsname@today | bzip2 -9 > /external/poolname-today.bz2
    warning: cannot send 'poolname/fsname@today': signal received
    #
In this situation the send should probably not abort, or maybe the signal should be ignored entirely. It makes sense for other signals such as SIGTERM to cause an abort though.
Indeed, the signal handlers seem overly broad here.
@behlendorf Should we add some (void) sigignore(SIGxxx); in zfs_do_send()? Which ones should we ignore in that case?
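For reference, the suggestion above could be sketched with sigaction() instead, since sigignore() is marked obsolescent in POSIX.1-2008. `ignore_signals()` is a hypothetical helper, and the job-control set below is an assumption: which signals (if any) to ignore is exactly the open question here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: set each listed signal's disposition to SIG_IGN
 * via sigaction(). Returns 0 on success, -1 (errno set) on failure. */
static int ignore_signals(const int *sigs, size_t n)
{
    assert(sigs != NULL || n == 0);
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < n; i++) {
        if (sigaction(sigs[i], &sa, NULL) == -1)
            return -1;
    }
    return 0;
}

/* Assumed candidate set: the job-control signals. */
static const int send_ignored_signals[] = { SIGTSTP, SIGTTIN, SIGTTOU };
static const size_t n_send_ignored =
    sizeof(send_ignored_signals) / sizeof(send_ignored_signals[0]);
```

Two caveats apply to this approach: with SIGTSTP ignored, Ctrl-Z would no longer pause zfs send at all, and SIGSTOP can never be caught or ignored, so it would not cover the kill -STOP case.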
@FransUrbo Good question. Someone would probably need to spend some time investigating that. Either we could just ignore the relevant signals or add handlers for them if we want/need to do something more clever.
I just got hit by this halfway into what looked like it was going to be a 60-hour transfer. Seriously, this needs to be fixed or at least documented. Ctrl-Z followed by fg/bg should do what it does for every other sane process.
This happened to me when I attached strace to the zfs send process. When I pressed Ctrl-C to stop strace, zfs exited with the message described in the first post.
I just ran into this today trying to use kill -STOP / kill -CONT on ZFS send.
I don't know if there's an architectural reason why this wouldn't work (e.g. zfs receive having to time out if no data arrives for X seconds), but if not, it would be nice to be able to use STOP/CONT signals to control I/O load.
There is a dirty workaround: send the signal only to another process in the pipeline. If you use mbuffer, pv or similar, SIGSTOP only that process.
This typically works, unless you pipe zfs send ... directly into zfs receive ..., since then there is no suitable intermediate process to stop. So always use a buffer app. :)
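The workaround can be scripted. This sketch assumes you already know the PID of the intermediate process (pv, mbuffer or similar); `throttle()` is a hypothetical helper, not part of any ZFS tool. zfs send itself is never signalled: once the pipe buffer fills, its writes into the pipe simply block until the stopped reader resumes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: stop an intermediate pipeline process for
 * roughly stop_seconds, then resume it. Returns 0 on success, -1 if
 * either kill() fails. */
static int throttle(pid_t pid, unsigned stop_seconds)
{
    assert(pid > 0);
    if (kill(pid, SIGSTOP) == -1)
        return -1;
    /* sleep() may return early if this process catches a signal. */
    (void)sleep(stop_seconds);
    return kill(pid, SIGCONT);
}
```

For example, `throttle(pv_pid, 30)` (with `pv_pid` a placeholder for the real PID) would pause the transfer for about 30 seconds and then let it continue.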
This should be easy to fix. Where in the code would the change need to go? Can someone point me in the general direction?
I didn't think to try stopping pv (I was piping send | pv | receive) since I assumed send would just keep sending it data until we ran out of memory or something. That's a great work-around for now if it works (I'll try it next time I'm moving data around).
@ioquatix it would be great if you could dig into this. My first guess would be that zfs_ioc_send() is returning EINTR, possibly due to the write() call getting interrupted in dump_record(). It looks like we might need to modify the logic right here for Linux.
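If the EINTR theory holds, the usual user-space idiom for this failure mode is to retry interrupted writes and resume after short writes. This is only an analogue: dump_record() runs in the kernel, so it is not a patch for it, and `write_all()` is a hypothetical name. Retrying is only appropriate when the signal is not meant to abort the operation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, resuming after short
 * writes and retrying when write() is interrupted by a signal before
 * any data was written (EINTR). Any other error is returned. */
static ssize_t write_all(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)
                continue; /* interrupted, nothing written: retry */
            return -1;    /* real error: give up */
        }
        p += n;           /* short write: advance and keep going */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```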
Actually, this may be as straightforward as fully implementing issig() in the SPL. Right now it simply checks whether any signal is pending and ignores the why flag that is passed in.
zfs_ioc_send()
dmu_send_obj()
dmu_send_impl()
do_dump()
For the record, I can confirm that kill -STOP/CONT on a process piped between send and receive (i.e. pv in my case, mbuffer, etc) works fine as a work around for now.
zfs-0.7.12
Sigh. I just got bitten by this tonight with a Ctrl-Z followed by fg to resume the process. Is there any movement on a solution, given that this issue is over 6 years old now? zfs send/receive should handle STOP/CONT signals sanely.
FWIW - I would consider this a bug, because it's certainly not a feature and was surprising.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
This should be resolved in OpenZFS by #10843.
Better late than never! Well done and thank you :)