ZFS: 'zfs send' aborts on any signal

Created on 7 Jul 2012 · 15 Comments · Source: openzfs/zfs

zfs send should probably not abort when it receives a signal unless the signal is one of the default-fatal signals.

For example, start a send and pipe it however you want. Press CTRL+Z to pause. Type "fg" to resume. The send immediately aborts.

# zfs send poolname/fsname@today | bzip2 -9 > /external/poolname-today.bz2
^Z
[1]+  Stopped     zfs send poolname/fsname@today | bzip2 -9 > /external/poolname-today.bz2
# fg
zfs send poolname/fsname@today | bzip2 -9 > /external/poolname-today.bz2
warning: cannot send 'poolname/fsname@today': signal received
#

In this situation the send should probably not abort, or maybe the signal should be ignored entirely. It makes sense for other signals such as SIGTERM to cause an abort though.
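
As a rough illustration of the distinction drawn above, the following C sketch classifies signals by their POSIX default action. The helper name signal_is_default_fatal() is hypothetical and is not part of ZFS; it only shows which signals a "do not abort" policy might exempt, assuming the usual Linux/BSD defaults.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not from the ZFS source): returns true when the
 * default action for signo is to terminate the process, and false when
 * the default action is to stop, continue, or ignore.
 * Platform-specific signals not listed here are treated as fatal.
 */
bool
signal_is_default_fatal(int signo)
{
	switch (signo) {
	/* Default action: stop the process. */
	case SIGSTOP:
	case SIGTSTP:
	case SIGTTIN:
	case SIGTTOU:
	/* Default action: continue a stopped process. */
	case SIGCONT:
	/* Default action: ignore. */
	case SIGCHLD:
	case SIGURG:
#ifdef SIGWINCH
	case SIGWINCH:
#endif
		return (false);
	default:
		return (true);
	}
}
```

Under this classification SIGTERM and SIGINT would still abort a send, while the SIGTSTP/SIGCONT pair produced by Ctrl-Z and fg would not.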

Inactive Stale Defect

DeHackEd

👍1

Most helpful comment

This should be resolved in OpenZFS by #10843.

behlendorf on 11 Nov 2020

🎉3

All 15 comments

Indeed, the signal handlers seem overly broad here.

behlendorf on 10 Jul 2012

@behlendorf Should we add some (void) sigignore(SIGxxx); calls in zfs_do_send()? If so, which signals should we ignore?

FransUrbo on 8 Jun 2014

@FransUrbo Good question. Someone would probably need to spend some time investigating that. Either we could just ignore the relevant signals or add handlers for them if we want/need to do something more clever.

behlendorf on 10 Jun 2014
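
A minimal userland sketch of the "just ignore the relevant signals" option discussed above, assuming one chose the terminal-generated stop signals. The helper name is hypothetical, it uses sigaction() rather than the older sigignore(), and it is not taken from zfs_do_send(). Two caveats apply: SIGSTOP can never be ignored or caught, and ignoring SIGTSTP means Ctrl-Z no longer suspends the process at all, which may not be what users expect.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not ZFS code): set the disposition of the
 * terminal stop signals to SIG_IGN. Returns 0 on success, -1 if any
 * sigaction() call fails.
 */
int
ignore_terminal_stop_signals(void)
{
	const int signals[] = { SIGTSTP, SIGTTIN, SIGTTOU };
	struct sigaction sa;

	memset(&sa, 0, sizeof (sa));
	sa.sa_handler = SIG_IGN;
	sigemptyset(&sa.sa_mask);

	for (size_t i = 0; i < sizeof (signals) / sizeof (signals[0]); i++) {
		if (sigaction(signals[i], &sa, NULL) != 0)
			return (-1);
	}
	return (0);
}
```

The alternative mentioned above, installing real handlers, would replace SIG_IGN with a handler function in the same sigaction() call.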

I just got hit by this halfway into what looked like it was going to be a 60-hour transfer. Seriously, this needs to be fixed or at least documented. Ctrl-Z followed by fg/bg should do what it does for every other sane process...

ioquatix on 21 Jun 2014

This happened to me when I attached strace to the zfs send process. When I pressed Ctrl-C to stop strace, zfs exited with the message described in the first post.

mailinglists35 on 8 Mar 2016

I just ran into this today trying to use kill -STOP / kill -CONT on ZFS send.

I don't know if there's an architectural reason why this wouldn't work (e.g. zfs receive timing out if it gets no data for X seconds), but if not, it would be nice to be able to use STOP/CONT signals to control I/O load.

jonathanvaughn on 20 Sep 2016

There is a dirty workaround: send the signal only to another process in the pipeline. If you use mbuffer, pv or similar, send SIGSTOP to that process only.

This typically works unless you run zfs send ... | zfs receive ... directly, since then there is no intermediate process to stop. So always use a buffer app. :)

DeHackEd on 20 Sep 2016

👍2

This should be easy to fix. Where in the code do we need to fix it? Can someone point me in the general direction?

ioquatix on 20 Sep 2016

I didn't think to try stopping pv (I was piping send | pv | receive) since I assumed send would just keep sending it data until we ran out of memory or something. That's a great workaround for now if it works (I'll try it next time I'm moving data around).

jonathanvaughn on 20 Sep 2016

@ioquatix it would be great if you could dig into this. My first guess would be that zfs_ioc_send() is returning EINTR, possibly due to the write() call getting interrupted in dump_record(). It looks like we might need to modify the logic right here for Linux.

Actually this may be as straightforward as fully implementing issig() in the spl. Right now it simply returns whether any signal is pending and ignores the why flag that is passed in.

zfs_ioc_send()
  dmu_send_obj()
    dmu_send_impl()
      do_dump()

behlendorf on 21 Sep 2016
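
The EINTR hypothesis above has a familiar userspace analogue: a write() that is interrupted by a caught signal before transferring any data fails with errno set to EINTR, and callers that want to survive the interruption retry instead of aborting. The sketch below shows that retry pattern with a hypothetical write_all() helper. It is not the kernel code path listed above and says nothing about how issig() should be implemented; it only illustrates the general pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace helper (not ZFS code): write len bytes from buf
 * to fd, retrying when write() is interrupted by a signal and continuing
 * after short writes. Returns 0 on success, -1 on any other error.
 */
int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, nothing written: retry */
			return (-1);		/* genuine error: give up */
		}
		p += n;
		len -= (size_t)n;
	}
	return (0);
}
```

A caller would use it as write_all(fd, record, record_len) and treat only a -1 return as a reason to abort.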

For the record, I can confirm that kill -STOP/CONT on a process piped between send and receive (pv in my case, but also mbuffer, etc.) works fine as a workaround for now.

jonathanvaughn on 21 Sep 2016

zfs-0.7.12

Sigh - I just got bitten by this tonight with a Ctrl-Z and then an fg to restart the process. Is there any movement on a solution, given that this issue is over 6 years old now? zfs send/receive should have sane handling for STOP/CONT signals.

FWIW - I would consider this a bug, because it's certainly not a feature and was surprising.

TerraTech on 3 Feb 2019

👍1

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.