Hello,
First, a big thanks for your work and the great functionality you bring to Linux.
Type | Version/Name
--- | ---
Distribution Name | CentOS
Distribution Version | 7
Linux Kernel | 3.10.0-327.10.1.el7.x86_64
Architecture | x86_64
ZFS Version | 0.6.5.6-1
SPL Version | 0.6.5.6-1
When receiving a stream, the zfs process gets stuck at 100% CPU for weeks (I waited more than 24 days), without any errors in dmesg/messages/tty. I tried receiving with 0.6.5.8; the same happens. I tried on FreeBSD 9.3; the same happens. I can use zfs on other pools/filesystems, but there is no way to kill the zfs process, which is stuck in the R state.
Tried with and without SELinux: no difference.
Tried exporting the source dataset again: same result.
I exported a jail created by FreeNAS and imported it with zfs receive -udv.
Main question: can any special file attribute in the unmounted dataset being received lead to this situation?
Second question: is there any way to receive the other datasets in the stream? I tried to manually create a local dataset with the same name, but any failure stops the receive and the subsequent datasets are ignored.
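Not from this thread, but one possible workaround sketch: instead of a single recursive stream, send each child snapshot individually so that one failing dataset does not abort the rest. The names backup/jails, @clean and tank/restore are placeholders, and this assumes every child actually has the named snapshot:

```sh
# Send each child dataset separately instead of one -R stream, so a
# single failing dataset does not abort the remaining ones.
for ds in $(zfs list -H -o name -r backup/jails); do
    zfs send "${ds}@clean" | zfs receive -udv tank/restore \
        || echo "receive failed for ${ds}, continuing" >&2
done
```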
The strace result:
...
write(1, "received 0B stream in 1 seconds "..., 41) = 41
ioctl(3, 0x5a12, 0x7fff3b5a0870) = 0
read(0, "\0\0\0\0\0\0\0\0\254\313\272\365\2\0\0\0\21\0\0\0\0\0\0\0\356\330lX\0\0\0\0"..., 312) = 312
read(0, "", 0) = 0
ioctl(3, 0x5a12, 0x7fff3b5a0870) = 0
ioctl(3, 0x5a12, 0x7fff3b5a0870) = -1 ENOENT (No such file or directory)
ioctl(3, 0x5a12, 0x7fff3b5a0420) = 0
ioctl(3, 0x5a12, 0x7fff3b5a0860) = 0
write(1, "receiving full stream of backup/"..., 123) = 123
ioctl(3, 0x5a1b
Stops there....
> When receiving a stream, the zfs process gets stuck at 100% CPU for weeks (waited more than 24 days), without any errors in dmesg/messages/tty.
@nfinformatique which version of FreeNAS is/was used to generate the stream? IIRC a similar problem was reported on the illumos-discuss mailing list (title: "ZFS recv hangs"). Can you pipe the stream through zstreamdump and grep for "FREEOBJECTS"?
FreeNAS: FreeBSD HOSTNAME 10.3-STABLE FreeBSD 10.3-STABLE
cat stream | zstreamdump | grep FREEOBJECTS
Total DRR_FREEOBJECTS records = 4
The full stream dump information (it is a full send, not an incremental):
BEGIN record
hdrtype = 1
features = 30007
magic = 2f5bacbac
creation_time = 586cd8ee
type = 2
flags = 0x4
toguid = d82dd66022fb32f3
fromguid = 0
toname = backup/jails/.XXX@clean
END checksum = 89615b28a8fa539/1c0f0949fe8388e6/5aec4c0ad9c876bf/4daa1504ad6ff08b
SUMMARY:
Total DRR_BEGIN records = 1
Total DRR_END records = 1
Total DRR_OBJECT records = 258577
Total DRR_FREEOBJECTS records = 4
Total DRR_WRITE records = 215675
Total DRR_WRITE_BYREF records = 8830
Total DRR_WRITE_EMBEDDED records = 39090
Total DRR_FREE records = 262463
Total DRR_SPILL records = 0
Total records = 784641
Total write size = 1982718464 (0x762de200)
Total stream length = 2274396968 (0x87908b28)
@nfinformatique It would seem that the freebsd10-stable branch did indeed get OpenZFS 6393 (_zfs receive a full send as a clone_):
https://github.com/freenas/os/commits/freebsd10-stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c
Try again with zstreamdump -v and look for something like:
FREEOBJECTS firstobj = $some_number numobjs = $huge_number_here
Indeed, there is such a huge number of free objects.
cat stream | zstreamdump -v | grep FREEOBJECTS
FREEOBJECTS firstobj = 0 numobjs = 1
FREEOBJECTS firstobj = 258578 numobjs = 14
FREEOBJECTS firstobj = 258592 numobjs = 36028797018705376
FREEOBJECTS firstobj = 0 numobjs = 0
Total DRR_FREEOBJECTS records = 4
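For scale: firstobj plus numobjs in the offending record is exactly 2^55, i.e. the record asks the receiver to free every possible object number up to the maximum, which is why the receive loop effectively never finishes. A quick shell arithmetic check using the numbers above:

```sh
# firstobj + numobjs of the huge record equals 2^55 exactly.
echo $(( 258592 + 36028797018705376 ))   # 36028797018963968
echo $(( 1 << 55 ))                      # 36028797018963968
```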
Thank you very much for your help. I will arrange to get an up-to-date local version so I can import properly.
I installed a temporary Ubuntu, compiled the latest module (0.7.0-rc3), did a send/receive, and it worked!
Thanks for saving my week!
@nfinformatique feel free to reopen the issue if you need to. Closed.
@nfinformatique keep in mind that 0.7.0 is not yet recommended for production use.
Alternatively, you'll want to send the data without FREEOBJECTS records, but you'll need OpenZFS 6536 (_zfs send: want a way to disable setting of DRR_FLAG_FREERECORDS_) for that, which I could only find in freebsd11: https://github.com/freenas/os/commit/d29ea6698cc7869406874fd2bafa12b223243fa1.
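For reference, a sketch of how one might try that on a FreeBSD 11 based system. The sysctl name vfs.zfs.send_set_freerecords_bit is an assumption about how the illumos tunable zfs_send_set_freerecords_bit would be exposed there, so verify it against your system first:

```sh
# Assumed sysctl name: check with `sysctl -a | grep freerecords` first.
sysctl vfs.zfs.send_set_freerecords_bit=0
# Then regenerate the stream (dataset name taken from this thread).
zfs send backup/jails/.XXX@clean > stream
```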
Seems like updating to the FreeNAS 9.10 nightlies (which are based on FreeBSD 11 and should have 6536) and setting the tunable to 0 does not help.
At least, I still get a stream with a huge FREEOBJECTS record, which hangs zfs recv under ZoL 0.6.5.9.
@Fabian-Gruenbichler I'm reading the original issue again; maybe the second part of the request was not implemented? The title was also changed from "_zfs send: want a way to disable sending of free records_" to "_zfs send: want a way to disable setting of DRR_FLAG_FREERECORDS_".
> It seems that it might also be desirable to have support for generating streams without free records for other reasons, such as in cases where we definitely know that the receiving end is not receiving the stream as a clone. An example of this might be when using some kind of automatic replication process. I propose that we add a new option to zfs send (-F) that turns off sending of free records.
If that's the case, you have to use a pre-OpenZFS 6393 version to send the stream: sorry about that.
I just hit this issue as well.
What is the fix for this?
I upgraded the sending side to 0.7.0 and re-ran zfs send/receive, and now I have two zfs receive processes stuck on my storage. I can't upgrade the receiving side.
Is there any way to at least kill those processes? Will they ever finish?
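Not an answer from the thread, but a diagnostic sketch for a stuck receive on Linux, assuming /proc/<pid>/stack is readable on your kernel. A receive spinning in the FREEOBJECTS loop should show frames such as receive_freeobjects or dmu_object_next in its kernel stack (function names as in dmu_send.c; verify against your source tree):

```sh
# Dump the kernel stacks of all zfs processes to see where they spin.
for pid in $(pgrep -x zfs); do
    echo "== pid ${pid} =="
    sudo cat "/proc/${pid}/stack"
done
```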
FYI, a pool I created with 0.7.0 to get around this issue ran into problems when run with 0.6.5.9 and 0.6.5.11 afterwards: it was not freeing space (not even upgrading to 0.7.1 fixed it).
The obvious fix: do a zfs send/receive.
But a local zfs send/receive of the offending pool results in zfs recv getting stuck like before.
Not sure if this was "expected"; I did not expect it.
Applying this patch on top of 0.6.5.11 fixed it (or at least it got further and is still running):
https://github.com/zfsonlinux/zfs/pull/6602/commits/397f816f43455cd63851036b663da5d83413ea9d
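A sketch of applying that commit to a source build, assuming a 0.6.5.11 source tree in the current directory; GitHub serves any commit as a patch when you append .patch to its /commit/<sha> URL:

```sh
# Fetch the referenced commit as a patch and apply it to the tree.
cd zfs-0.6.5.11
curl -L https://github.com/zfsonlinux/zfs/commit/397f816f43455cd63851036b663da5d83413ea9d.patch \
    | patch -p1
# Rebuild and reload the module as usual for your distribution.
```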
Argh, I just ran into this same bug when doing an incremental snapshot send/receive on a Mint 18.1 server (running an older 0.6 version of ZFS), reading from a zpool created on a Mint 19 box (using a newer 0.7 version of ZFS). The zfs receive process hung at 100% CPU usage and could not be killed even with signal 9. The only way out was to reboot the server.
MORAL: on hosts with older versions of ZFS, do not try to read zpools created by newer versions of ZFS.
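On ZoL hosts a minimal pre-flight check is possible, since the loaded module version is exposed under /sys/module/zfs/version; the hostname dest below is a placeholder for the receiving side:

```sh
# Compare ZFS module versions on sender and receiver before sending.
cat /sys/module/zfs/version
ssh dest cat /sys/module/zfs/version
```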