Type | Version/Name
--- | ---
Distribution Name | Arch Linux
Distribution Version | Rolling release / latest
Linux Kernel | 5.3.8
Architecture | x86_64
ZFS Version | 0.8.2
SPL Version | 0.8.2
EDIT: Running zfs mount -a multiple times when the mountpoint is not empty causes a double free / segfault.
EDIT: See the comment further down. https://github.com/zfsonlinux/zfs/issues/9560#issuecomment-551219341
I'm seeing two different kinds of coredumps.
Type 1:
PID: 1415 (zfs)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Thu 2019-11-07 12:50:21 CET (1h 47min ago)
Command Line: zfs mount -a
Executable: /usr/bin/zfs
Control Group: /user.slice/user-1000.slice/session-1.scope
Unit: session-1.scope
Slice: user-1000.slice
Session: 1
Owner UID: 1000 (xxxxxxxxx)
Boot ID: aba23826a7c54c36856aadbbbf311b65
Machine ID: b7c6e36412f14ccdbb6e8ae854ae3ee2
Hostname: xxxxxxxxxx
Storage: /var/lib/systemd/coredump/core.zfs.0.aba23826a7c54c36856aadbbbf311b65.1415.1573127421000000000000.lz4 (inaccessible)
Message: Process 1415 (zfs) of user 0 dumped core.
Stack trace of thread 1417:
#0 0x00007f0c495de560 __memchr_avx2 (libc.so.6)
#1 0x00007f0c494f74fc _IO_getline_info (libc.so.6)
#2 0x00007f0c495011c5 fgets_unlocked (libc.so.6)
#3 0x00007f0c49578e0e __getmntent_r (libc.so.6)
#4 0x00007f0c496f1f45 _sol_getmntent (libuutil.so.1)
#5 0x00007f0c496f1fd4 getmntany (libuutil.so.1)
#6 0x00007f0c49689930 libzfs_mnttab_find (libzfs.so.2)
#7 0x00007f0c49695c7f is_mounted (libzfs.so.2)
#8 0x0000561d088c6913 n/a (zfs)
#9 0x0000561d088c7efc n/a (zfs)
#10 0x00007f0c49695acc n/a (libzfs.so.2)
#11 0x00007f0c496ca092 n/a (libzfs.so.2)
#12 0x00007f0c496514cf start_thread (libpthread.so.0)
#13 0x00007f0c495802d3 __clone (libc.so.6)
Stack trace of thread 1416:
#0 0x00007f0c49503466 _IO_file_underflow@@GLIBC_2.2.5 (libc.so.6)
#1 0x00007f0c49504676 _IO_default_uflow (libc.so.6)
#2 0x00007f0c494f753c _IO_getline_info (libc.so.6)
#3 0x00007f0c495011c5 fgets_unlocked (libc.so.6)
#4 0x00007f0c49578e0e __getmntent_r (libc.so.6)
#5 0x00007f0c496f1f45 _sol_getmntent (libuutil.so.1)
#6 0x00007f0c496f1fd4 getmntany (libuutil.so.1)
#7 0x00007f0c49689930 libzfs_mnttab_find (libzfs.so.2)
#8 0x00007f0c49695c7f is_mounted (libzfs.so.2)
#9 0x0000561d088c6913 n/a (zfs)
#10 0x0000561d088c7efc n/a (zfs)
#11 0x00007f0c49695acc n/a (libzfs.so.2)
#12 0x00007f0c496ca092 n/a (libzfs.so.2)
#13 0x00007f0c496514cf start_thread (libpthread.so.0)
#14 0x00007f0c495802d3 __clone (libc.so.6)
Stack trace of thread 1415:
#0 0x00007f0c49657c45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f0c496ca98e tpool_wait (libzfs.so.2)
#2 0x00007f0c496972e0 zfs_foreach_mountpoint (libzfs.so.2)
#3 0x0000561d088cb322 n/a (zfs)
#4 0x0000561d088be3b7 n/a (zfs)
#5 0x00007f0c494a8153 __libc_start_main (libc.so.6)
#6 0x0000561d088be4de n/a (zfs)
Stack trace of thread 1419:
#0 0x00007f0c49657c45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f0c496c9e24 n/a (libzfs.so.2)
#2 0x00007f0c496514cf start_thread (libpthread.so.0)
#3 0x00007f0c495802d3 __clone (libc.so.6)
Stack trace of thread 1418:
#0 0x00007f0c49657c45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f0c496c9e24 n/a (libzfs.so.2)
#2 0x00007f0c496514cf start_thread (libpthread.so.0)
#3 0x00007f0c495802d3 __clone (libc.so.6)
Type 2:
PID: 8301 (zfs)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Thu 2019-11-07 12:57:30 CET (1h 41min ago)
Command Line: zfs mount -a
Executable: /usr/bin/zfs
Control Group: /user.slice/user-1000.slice/session-3.scope
Unit: session-3.scope
Slice: user-1000.slice
Session: 3
Owner UID: 1000 (xxxxxxxx)
Boot ID: aba23826a7c54c36856aadbbbf311b65
Machine ID: b7c6e36412f14ccdbb6e8ae854ae3ee2
Hostname: xxxxxxxx
Storage: /var/lib/systemd/coredump/core.zfs.0.aba23826a7c54c36856aadbbbf311b65.8301.1573127850000000000000.lz4 (inaccessible)
Message: Process 8301 (zfs) of user 0 dumped core.
Stack trace of thread 8305:
#0 0x00007f4ffa0705a0 __malloc_arena_thread_freeres (libc.so.6)
#1 0x00007f4ffa1b54f5 start_thread (libpthread.so.0)
#2 0x00007f4ffa0e42d3 __clone (libc.so.6)
Stack trace of thread 8301:
#0 0x00007f4ffa2677e3 n/a (libnvpair.so.1)
#1 0x00007f4ffa26770a nvlist_free (libnvpair.so.1)
#2 0x00007f4ffa26770a nvlist_free (libnvpair.so.1)
#3 0x00007f4ffa1ecebd zfs_close (libzfs.so.2)
#4 0x00005652dd8f6345 n/a (zfs)
#5 0x00005652dd8e93b7 n/a (zfs)
#6 0x00007f4ffa00c153 __libc_start_main (libc.so.6)
#7 0x00005652dd8e94de n/a (zfs)
Run zfs --version to check whether you have matching versions of the kernel module and the userland utility.
$ zfs --version
zfs-0.8.2-1
zfs-kmod-0.8.2-1
Something is strange: zfs mount -a doesn't mount all datasets.
$ zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
zroot               30.5G   192G    96K  /zroot
zroot/ROOT          28.9G   192G    96K  none
zroot/ROOT/default  28.9G   192G  28.9G  /
zroot/cache         1.58G   192G  1.58G  /zroot/cache
zroot/home            96K   192G    96K  /home
~ $ zfs mount -a
cannot mount '/home': directory is not empty
cannot mount '/zroot': directory is not empty
double free or corruption (fasttop)
[1] 2304 abort (core dumped) zfs mount -a
~ [134]$ zfs mount -a
cannot mount '/zroot':
cannot mount '/zroot': mount failed
~ [1]$
Also, I got a different core dump this time.
PID: 2304 (zfs)
UID: 1000 (xxxxxxx)
GID: 1000 (xxxxxxx)
Signal: 6 (ABRT)
Timestamp: Thu 2019-11-07 15:53:26 CET (5min ago)
Command Line: zfs mount -a
Executable: /usr/bin/zfs
Control Group: /user.slice/user-1000.slice/session-1.scope
Unit: session-1.scope
Slice: user-1000.slice
Session: 1
Owner UID: 1000 (xxxxxxx)
Boot ID: 5d11adef8b744cb9b3bf67dd33002281
Machine ID: b7c6e36412f14ccdbb6e8ae854ae3ee2
Hostname: xxxxxxx
Storage: /var/lib/systemd/coredump/core.zfs.1000.5d11adef8b744cb9b3bf67dd33002281.2304.1573138406000000000000.lz4
Message: Process 2304 (zfs) of user 1000 dumped core.
Stack trace of thread 2305:
#0 0x00007f1e887a2f25 raise (libc.so.6)
#1 0x00007f1e8878c897 abort (libc.so.6)
#2 0x00007f1e887e6258 __libc_message (libc.so.6)
#3 0x00007f1e887ed77a malloc_printerr (libc.so.6)
#4 0x00007f1e887ef295 _int_free (libc.so.6)
#5 0x00007f1e8896f820 libzfs_mnttab_fini (libzfs.so.2)
#6 0x00007f1e8896f9e1 libzfs_mnttab_find (libzfs.so.2)
#7 0x00007f1e8897bc7f is_mounted (libzfs.so.2)
#8 0x000055d8087ff913 n/a (zfs)
#9 0x000055d808800efc n/a (zfs)
#10 0x00007f1e8897bacc n/a (libzfs.so.2)
#11 0x00007f1e889b0092 n/a (libzfs.so.2)
#12 0x00007f1e889374cf start_thread (libpthread.so.0)
#13 0x00007f1e888662d3 __clone (libc.so.6)
Stack trace of thread 2306:
#0 0x00007f1e8885742c __read (libc.so.6)
#1 0x00007f1e887e9442 _IO_file_underflow@@GLIBC_2.2.5 (libc.so.6)
#2 0x00007f1e887ea676 _IO_default_uflow (libc.so.6)
#3 0x00007f1e887dd53c _IO_getline_info (libc.so.6)
#4 0x00007f1e887e71c5 fgets_unlocked (libc.so.6)
#5 0x00007f1e8885ee0e __getmntent_r (libc.so.6)
#6 0x00007f1e889d7f45 _sol_getmntent (libuutil.so.1)
#7 0x00007f1e889d7fd4 getmntany (libuutil.so.1)
#8 0x00007f1e8896f930 libzfs_mnttab_find (libzfs.so.2)
#9 0x00007f1e8897bc7f is_mounted (libzfs.so.2)
#10 0x000055d8087ff913 n/a (zfs)
#11 0x000055d808800efc n/a (zfs)
#12 0x00007f1e8897bacc n/a (libzfs.so.2)
#13 0x00007f1e889b0092 n/a (libzfs.so.2)
#14 0x00007f1e889374cf start_thread (libpthread.so.0)
#15 0x00007f1e888662d3 __clone (libc.so.6)
Stack trace of thread 2308:
#0 0x00007f1e8893dc45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f1e889afe24 n/a (libzfs.so.2)
#2 0x00007f1e889374cf start_thread (libpthread.so.0)
#3 0x00007f1e888662d3 __clone (libc.so.6)
Stack trace of thread 2307:
#0 0x00007f1e8893dc45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f1e889afe24 n/a (libzfs.so.2)
#2 0x00007f1e889374cf start_thread (libpthread.so.0)
#3 0x00007f1e888662d3 __clone (libc.so.6)
Stack trace of thread 2304:
#0 0x00007f1e8893dc45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f1e889b098e tpool_wait (libzfs.so.2)
#2 0x00007f1e8897d2e0 zfs_foreach_mountpoint (libzfs.so.2)
#3 0x000055d808804322 n/a (zfs)
#4 0x000055d8087f73b7 n/a (zfs)
#5 0x00007f1e8878e153 __libc_start_main (libc.so.6)
#6 0x000055d8087f74de n/a (zfs)
I'm now able to reproduce the issue on a different machine (the same system information from the first post still applies). This example is probably not minimal (I made the same mistake twice), but it matches the procedure where the bug first appeared.
Boiled down: the bug seems to occur when a dataset cannot be mounted because its mountpoint is not empty.
mkdir -p /tmp/zfs
truncate -s 1G /tmp/zfs/disk1
truncate -s 1G /tmp/zfs/disk2
zpool create -o ashift=12 tank mirror /tmp/zfs/disk1 /tmp/zfs/disk2
zpool export tank
zpool import -R /mnt -d /tmp/zfs tank
zfs create -o canmount=off tank/ROOT
zfs create -o canmount=on tank/ROOT/default
zfs set mountpoint=/ tank/ROOT/default
mkdir /mnt/home
dd if=/dev/urandom bs=32M count=4 of=/mnt/home/out.dat
zfs create tank/home
zfs set mountpoint=/home tank/home
mkdir /mnt/tank/cache
dd if=/dev/urandom bs=32M count=1 of=/mnt/tank/cache/out.dat
zfs set mountpoint=none tank/ROOT
zfs create tank/cache
zpool export tank
(I did a reboot here, not sure if this is necessary to reproduce)
zpool import -R /mnt -d /tmp/zfs tank
zfs mount -a
zfs mount -a
zfs mount -a
Output
/mnt # zfs mount -a
cannot mount '/mnt/home': directory is not empty
cannot mount '/mnt/tank/cache': directory is not empty
/mnt [1]# zfs mount -a
cannot mount '/mnt/home': directory is not empty
cannot mount '/mnt/tank/cache': directory is not empty
free(): double free detected in tcache 2
[1] 3701 abort (core dumped) zfs mount -a
/mnt [134]# zfs mount -a
cannot mount '/mnt/home': directory is not empty
cannot mount '/mnt/tank/cache': directory is not empty
[1] 4142 segmentation fault (core dumped) zfs mount -a
/mnt [139]# zfs mount -a
Coredump 1:
PID: 3701 (zfs)
UID: 0 (root)
GID: 0 (root)
Signal: 6 (ABRT)
Timestamp: Thu 2019-11-07 19:51:37 CET (11min ago)
Command Line: zfs mount -a
Executable: /usr/bin/zfs
Control Group: /user.slice/user-1000.slice/session-1.scope
Unit: session-1.scope
Slice: user-1000.slice
Session: 1
Owner UID: 1000 (xxxxxxxx)
Boot ID: d65d5dfe747745d69d0d9e1890a8feca
Machine ID: 8f0c7880d3ae4a0dbf4a428281f020ad
Hostname: xxxxxxxxx
Storage: /var/lib/systemd/coredump/core.zfs.0.d65d5dfe747745d69d0d9e1890a8feca.3701.1573152697000000000000.lz4
Message: Process 3701 (zfs) of user 0 dumped core.
Stack trace of thread 3706:
#0 0x00007f7cfa4c8f25 raise (libc.so.6)
#1 0x00007f7cfa4b2897 abort (libc.so.6)
#2 0x00007f7cfa50c258 __libc_message (libc.so.6)
#3 0x00007f7cfa51377a malloc_printerr (libc.so.6)
#4 0x00007f7cfa51559d _int_free (libc.so.6)
#5 0x00007f7cfa695815 libzfs_mnttab_fini (libzfs.so.2)
#6 0x00007f7cfa6959e1 libzfs_mnttab_find (libzfs.so.2)
#7 0x00007f7cfa6a1c7f is_mounted (libzfs.so.2)
#8 0x000055afe1285913 n/a (zfs)
#9 0x000055afe1286efc n/a (zfs)
#10 0x00007f7cfa6a1acc n/a (libzfs.so.2)
#11 0x00007f7cfa6d6092 n/a (libzfs.so.2)
#12 0x00007f7cfa65d4cf start_thread (libpthread.so.0)
#13 0x00007f7cfa58c2d3 __clone (libc.so.6)
Stack trace of thread 3709:
#0 0x00007f7cfa57d42c __read (libc.so.6)
#1 0x00007f7cfa50f442 _IO_file_underflow@@GLIBC_2.2.5 (libc.so.6)
#2 0x00007f7cfa510676 _IO_default_uflow (libc.so.6)
#3 0x00007f7cfa50353c _IO_getline_info (libc.so.6)
#4 0x00007f7cfa50d1c5 fgets_unlocked (libc.so.6)
#5 0x00007f7cfa584e0e __getmntent_r (libc.so.6)
#6 0x00007f7cfa6fdf45 _sol_getmntent (libuutil.so.1)
#7 0x00007f7cfa6fdfd4 getmntany (libuutil.so.1)
#8 0x00007f7cfa695930 libzfs_mnttab_find (libzfs.so.2)
#9 0x00007f7cfa6a1c7f is_mounted (libzfs.so.2)
#10 0x000055afe1285913 n/a (zfs)
#11 0x000055afe1286efc n/a (zfs)
#12 0x00007f7cfa6a1acc n/a (libzfs.so.2)
#13 0x00007f7cfa6d6092 n/a (libzfs.so.2)
#14 0x00007f7cfa65d4cf start_thread (libpthread.so.0)
#15 0x00007f7cfa58c2d3 __clone (libc.so.6)
Stack trace of thread 3710:
#0 0x00007f7cfa663c45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f7cfa6d5e24 n/a (libzfs.so.2)
#2 0x00007f7cfa65d4cf start_thread (libpthread.so.0)
#3 0x00007f7cfa58c2d3 __clone (libc.so.6)
Stack trace of thread 3701:
#0 0x00007f7cfa663c45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f7cfa6d698e tpool_wait (libzfs.so.2)
#2 0x00007f7cfa6a32e0 zfs_foreach_mountpoint (libzfs.so.2)
#3 0x000055afe128a322 n/a (zfs)
#4 0x000055afe127d3b7 n/a (zfs)
#5 0x00007f7cfa4b4153 __libc_start_main (libc.so.6)
#6 0x000055afe127d4de n/a (zfs)
Stack trace of thread 3708:
#0 0x00007f7cfa663c45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f7cfa6d5e24 n/a (libzfs.so.2)
#2 0x00007f7cfa65d4cf start_thread (libpthread.so.0)
#3 0x00007f7cfa58c2d3 __clone (libc.so.6)
Stack trace of thread 3713:
#0 0x00007f7cfa663c45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f7cfa6d5e24 n/a (libzfs.so.2)
#2 0x00007f7cfa65d4cf start_thread (libpthread.so.0)
#3 0x00007f7cfa58c2d3 __clone (libc.so.6)
Stack trace of thread 3705:
#0 0x00007f7cfa663c45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f7cfa6d5e24 n/a (libzfs.so.2)
#2 0x00007f7cfa65d4cf start_thread (libpthread.so.0)
#3 0x00007f7cfa58c2d3 __clone (libc.so.6)
Stack trace of thread 3703:
#0 0x00007f7cfa663c45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f7cfa6d5e24 n/a (libzfs.so.2)
#2 0x00007f7cfa65d4cf start_thread (libpthread.so.0)
#3 0x00007f7cfa58c2d3 __clone (libc.so.6)
Stack trace of thread 3702:
#0 0x00007f7cfa663c45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f7cfa6d5e24 n/a (libzfs.so.2)
#2 0x00007f7cfa65d4cf start_thread (libpthread.so.0)
#3 0x00007f7cfa58c2d3 __clone (libc.so.6)
Stack trace of thread 3704:
#0 0x00007f7cfa663c45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f7cfa6d5e24 n/a (libzfs.so.2)
#2 0x00007f7cfa65d4cf start_thread (libpthread.so.0)
#3 0x00007f7cfa58c2d3 __clone (libc.so.6)
Stack trace of thread 3712:
#0 0x00007f7cfa663c45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f7cfa6d5e24 n/a (libzfs.so.2)
#2 0x00007f7cfa65d4cf start_thread (libpthread.so.0)
#3 0x00007f7cfa58c2d3 __clone (libc.so.6)
Stack trace of thread 3711:
#0 0x00007f7cfa663c45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f7cfa6d5e24 n/a (libzfs.so.2)
#2 0x00007f7cfa65d4cf start_thread (libpthread.so.0)
#3 0x00007f7cfa58c2d3 __clone (libc.so.6)
Stack trace of thread 3707:
#0 0x00007f7cfa663c45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f7cfa6d5e24 n/a (libzfs.so.2)
#2 0x00007f7cfa65d4cf start_thread (libpthread.so.0)
#3 0x00007f7cfa58c2d3 __clone (libc.so.6)
Coredump 2:
PID: 4142 (zfs)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Thu 2019-11-07 19:51:42 CET (12min ago)
Command Line: zfs mount -a
Executable: /usr/bin/zfs
Control Group: /user.slice/user-1000.slice/session-1.scope
Unit: session-1.scope
Slice: user-1000.slice
Session: 1
Owner UID: 1000 (xxxxxxxxxxx)
Boot ID: d65d5dfe747745d69d0d9e1890a8feca
Machine ID: 8f0c7880d3ae4a0dbf4a428281f020ad
Hostname: xxxxxxxx
Storage: /var/lib/systemd/coredump/core.zfs.0.d65d5dfe747745d69d0d9e1890a8feca.4142.1573152702000000000000.lz4
Message: Process 4142 (zfs) of user 0 dumped core.
Stack trace of thread 4150:
#0 0x00007f6fd8847466 _IO_file_underflow@@GLIBC_2.2.5 (libc.so.6)
#1 0x00007f6fd8848676 _IO_default_uflow (libc.so.6)
#2 0x00007f6fd883b53c _IO_getline_info (libc.so.6)
#3 0x00007f6fd88451c5 fgets_unlocked (libc.so.6)
#4 0x00007f6fd88bce0e __getmntent_r (libc.so.6)
#5 0x00007f6fd8a35f45 _sol_getmntent (libuutil.so.1)
#6 0x00007f6fd8a35fd4 getmntany (libuutil.so.1)
#7 0x00007f6fd89cd930 libzfs_mnttab_find (libzfs.so.2)
#8 0x00007f6fd89d9c7f is_mounted (libzfs.so.2)
#9 0x000055b0d0bf2913 n/a (zfs)
#10 0x000055b0d0bf3efc n/a (zfs)
#11 0x00007f6fd89d9acc n/a (libzfs.so.2)
#12 0x00007f6fd8a0e092 n/a (libzfs.so.2)
#13 0x00007f6fd89954cf start_thread (libpthread.so.0)
#14 0x00007f6fd88c42d3 __clone (libc.so.6)
Stack trace of thread 4143:
#0 0x00007f6fd899bc45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6fd8a0de24 n/a (libzfs.so.2)
#2 0x00007f6fd89954cf start_thread (libpthread.so.0)
#3 0x00007f6fd88c42d3 __clone (libc.so.6)
Stack trace of thread 4148:
#0 0x00007f6fd899bc45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6fd8a0de24 n/a (libzfs.so.2)
#2 0x00007f6fd89954cf start_thread (libpthread.so.0)
#3 0x00007f6fd88c42d3 __clone (libc.so.6)
Stack trace of thread 4145:
#0 0x00007f6fd899bc45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6fd8a0de24 n/a (libzfs.so.2)
#2 0x00007f6fd89954cf start_thread (libpthread.so.0)
#3 0x00007f6fd88c42d3 __clone (libc.so.6)
Stack trace of thread 4152:
#0 0x00007f6fd88b542c __read (libc.so.6)
#1 0x00007f6fd8847442 _IO_file_underflow@@GLIBC_2.2.5 (libc.so.6)
#2 0x00007f6fd8848676 _IO_default_uflow (libc.so.6)
#3 0x00007f6fd883b53c _IO_getline_info (libc.so.6)
#4 0x00007f6fd88451c5 fgets_unlocked (libc.so.6)
#5 0x00007f6fd88bce0e __getmntent_r (libc.so.6)
#6 0x00007f6fd8a35f45 _sol_getmntent (libuutil.so.1)
#7 0x00007f6fd8a35fd4 getmntany (libuutil.so.1)
#8 0x00007f6fd89cd930 libzfs_mnttab_find (libzfs.so.2)
#9 0x00007f6fd89d9c7f is_mounted (libzfs.so.2)
#10 0x000055b0d0bf2913 n/a (zfs)
#11 0x000055b0d0bf3efc n/a (zfs)
#12 0x00007f6fd89d9acc n/a (libzfs.so.2)
#13 0x00007f6fd8a0e092 n/a (libzfs.so.2)
#14 0x00007f6fd89954cf start_thread (libpthread.so.0)
#15 0x00007f6fd88c42d3 __clone (libc.so.6)
Stack trace of thread 4151:
#0 0x00007f6fd899bc45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6fd8a0de24 n/a (libzfs.so.2)
#2 0x00007f6fd89954cf start_thread (libpthread.so.0)
#3 0x00007f6fd88c42d3 __clone (libc.so.6)
Stack trace of thread 4154:
#0 0x00007f6fd899bc45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6fd8a0de24 n/a (libzfs.so.2)
#2 0x00007f6fd89954cf start_thread (libpthread.so.0)
#3 0x00007f6fd88c42d3 __clone (libc.so.6)
Stack trace of thread 4153:
#0 0x00007f6fd899bc45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6fd8a0de24 n/a (libzfs.so.2)
#2 0x00007f6fd89954cf start_thread (libpthread.so.0)
#3 0x00007f6fd88c42d3 __clone (libc.so.6)
Stack trace of thread 4147:
#0 0x00007f6fd899bc45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6fd8a0de24 n/a (libzfs.so.2)
#2 0x00007f6fd89954cf start_thread (libpthread.so.0)
#3 0x00007f6fd88c42d3 __clone (libc.so.6)
Stack trace of thread 4146:
#0 0x00007f6fd899bc45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6fd8a0de24 n/a (libzfs.so.2)
#2 0x00007f6fd89954cf start_thread (libpthread.so.0)
#3 0x00007f6fd88c42d3 __clone (libc.so.6)
Stack trace of thread 4149:
#0 0x00007f6fd899bc45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6fd8a0de24 n/a (libzfs.so.2)
#2 0x00007f6fd89954cf start_thread (libpthread.so.0)
#3 0x00007f6fd88c42d3 __clone (libc.so.6)
Stack trace of thread 4142:
#0 0x00007f6fd899bc45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6fd8a0e98e tpool_wait (libzfs.so.2)
#2 0x00007f6fd89db2e0 zfs_foreach_mountpoint (libzfs.so.2)
#3 0x000055b0d0bf7322 n/a (zfs)
#4 0x000055b0d0bea3b7 n/a (zfs)
#5 0x00007f6fd87ec153 __libc_start_main (libc.so.6)
#6 0x000055b0d0bea4de n/a (zfs)
Stack trace of thread 4144:
#0 0x00007f6fd899bc45 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0)
#1 0x00007f6fd8a0de24 n/a (libzfs.so.2)
#2 0x00007f6fd89954cf start_thread (libpthread.so.0)
#3 0x00007f6fd88c42d3 __clone (libc.so.6)
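The abort in Coredump 1 comes from glibc's double-free detection (malloc_printerr via _int_free) inside libzfs_mnttab_fini, which is reached through libzfs_mnttab_find, a function several worker threads are executing concurrently in the same dump. That pattern is consistent with an unsynchronized double free: two threads both observe the shared cache as allocated and both free it. A minimal, self-contained sketch of that failure mode (deliberately simplified, not libzfs code):

```c
/*
 * Sketch of the double-free pattern suggested by Coredump 1: two
 * threads run an unsynchronized fini routine over the same shared
 * pointer. If both observe cache != NULL before either clears it,
 * free() is called twice on one allocation, producing exactly the
 * "free(): double free detected in tcache 2" abort shown above.
 * (Being a race, it may take several runs to trigger.)
 */
#include <pthread.h>
#include <stdlib.h>

static char *cache;             /* stands in for the shared mnttab cache */

static void
cache_fini(void)                /* analogous role to libzfs_mnttab_fini() */
{
    if (cache != NULL) {        /* unsynchronized check... */
        free(cache);            /* ...so both threads can reach this */
        cache = NULL;
    }
}

static void *
worker(void *arg)
{
    (void) arg;
    cache_fini();               /* each mount worker tears the cache down */
    return (NULL);
}

int
main(void)
{
    pthread_t t1, t2;

    cache = malloc(64);
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return (0);
}
```

Whether the race actually fires depends on scheduling, which would also explain why the repro needs zfs mount -a run several times before it crashes.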
I can confirm I'm also seeing this issue on "CentOS Linux release 7.7.1908 (Core)"
Kernel version: 3.10.0-1062.4.1.el7.x86_64
ZFS release: 0.8.2
Once I clear the files from the directory, I can issue a "zfs mount _mountpoint_" and it works fine.
As an aside, on Solaris this is a boot-halting event. I'd like to argue for that being available as an option here as well.
Same issue on Debian 9 with ZFS 0.8.2.
I'll see whether I can reproduce this on Fedora (haven't tried yet).
Hit this yesterday, same scenario - zfs mount -a
zfs 0.8.0-rc1
Unfortunately it was in single-user mode with no syslog running.
Then this isn't a regression from recent code. I don't recall anyone reporting this before.
I come to a different conclusion based on git bisect. Tested on 4.19.89-1-lts in a VirtualBox VM.
git bisect reset
git bisect start
git bisect good zfs-0.8.0
git bisect bad master
git bisect run zsh -c 'make clean; make distclean; ./autogen.sh; ./configure; make -j4; sudo ./mytest.zsh'
Where mytest.zsh is this one: https://gist.github.com/steven-omaha/0c722b678e30030390c803c6ec3853b8
Output of git bisect log: https://gist.github.com/steven-omaha/497a6c83420beed56704fca51b79aa1e
The culprit seems to be a9cd8bfde7, which kind of makes sense.
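For context: in every backtrace above, the main thread waits in zfs_foreach_mountpoint() -> tpool_wait() while worker threads run is_mounted() -> libzfs_mnttab_find() in parallel, all against a single libzfs handle. A hedged sketch of that fan-out shape, using hypothetical names rather than the real libzfs API:

```c
/*
 * Illustration of the dispatch pattern visible in the backtraces:
 * the main thread blocks in a tpool_wait()-style join while worker
 * threads probe mount state in parallel. The key property for this
 * bug is that every worker shares the same handle, so per-handle
 * state (the cached mnttab stream and its cache) is touched from
 * many threads at once. All names here are hypothetical.
 */
#include <pthread.h>
#include <stdio.h>

typedef struct handle {
    int placeholder;            /* stands in for libzfs_handle_t state */
} handle_t;

typedef struct task {
    pthread_t   tid;
    handle_t    *hdl;           /* the ONE handle shared by all workers */
    const char  *dataset;
} task_t;

static int
mount_dataset(handle_t *hdl, const char *dataset)
{
    /* the real path runs is_mounted() -> libzfs_mnttab_find() here */
    (void) hdl;
    printf("mounting %s\n", dataset);
    return (0);
}

static void *
mount_worker(void *arg)
{
    task_t *t = arg;

    (void) mount_dataset(t->hdl, t->dataset);
    return (NULL);
}

int
main(void)
{
    const char *datasets[] = { "tank/ROOT/default", "tank/home",
        "tank/cache" };
    task_t tasks[3];
    handle_t hdl = { 0 };

    /* fan out: one worker per dataset, all sharing &hdl */
    for (int i = 0; i < 3; i++) {
        tasks[i].hdl = &hdl;
        tasks[i].dataset = datasets[i];
        pthread_create(&tasks[i].tid, NULL, mount_worker, &tasks[i]);
    }
    /* join, as zfs_foreach_mountpoint() does via tpool_wait() */
    for (int i = 0; i < 3; i++)
        pthread_join(tasks[i].tid, NULL);
    return (0);
}
```

Nothing in this shape is wrong by itself; it only becomes a problem if the shared handle state the workers touch is not thread-safe, which is what the crashes above suggest.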
Paging @don-brady
I am hitting this exact same bug on a Gentoo system:
linux 4.19.97
zfs 0.8.3 (builtin module + tools)
Is there a patch available, or is reverting https://github.com/zfsonlinux/zfs/commit/a9cd8bfde73a78a0ba02e25b712fe28d11019191 a viable solution?
Seeing this on Debian, current buster-backports dkms:
zfs-0.8.3-1~bpo10+1
zfs-kmod-0.8.3-1~bpo10+1
4.19.0-8-amd64
[ 938.580748] ZFS: Loaded module v0.8.3-1~bpo10+1, ZFS pool version 5000, ZFS filesystem version 5
[ 967.821373] zd0: p1 p2 p3
[ 967.857106] zd16: p1
[ 967.913742] zd32: p1
[ 967.981277] zd48: p1 p2 < p5 >
[ 968.044034] zd64: p1
[ 968.097512] zd80: p1
[ 976.060629] zfs[3534]: segfault at 0 ip 00007faa7f0d6694 sp 00007faa75ff3420 error 4 in libc-2.28.so[7faa7f07c000+148000]
[ 976.060638] zfs[3618]: segfault at 0 ip 00007faa7f1b201c sp 00007faa757f2478 error 4
[ 976.060665] in libc-2.28.so[7faa7f07c000+148000]
[ 976.060683] Code: 29 f2 41 ff 55 70 48 85 c0 7e 3b 48 8b 93 90 00 00 00 48 01 43 10 48 83 fa ff 74 0a 48 01 d0 48 89 83 90 00 00 00 48 8b 43 08 <0f> b6 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 2e 0f 1f
[ 976.060691] Code: 29 c8 c5 f8 77 c3 0f 1f 84 00 00 00 00 00 48 85 d2 0f 84 5a 02 00 00 89 f9 c5 f9 6e c6 c4 e2 7d 78 c0 83 e1 3f 83 f9 20 77 44 <c5> fd 74 0f c5 fd d7 c1 85 c0 0f 85 c4 01 00 00 48 83 ea 20 0f 86
I see the segfault with 'zfs' on the command line until I clean up the existing mountpoint (we had two pools with datasets referencing the same mountpoints).
In @steven-omaha's stack traces there are two threads inside fgets_unlocked; is that OK, given that it's not thread-safe?
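Sharing one FILE * across threads is the problem regardless of whether any individual read is locked: the read position is shared state, and in these traces one thread is tearing handle state down (libzfs_mnttab_fini -> _int_free) while others are still scanning the stream (fgets_unlocked), which would account for both the segfault and the double free. A sketch of one way to make the lookup safe, with a private stream per scan and a mutex for shared cache state (illustrative assumptions, not the actual upstream patch):

```c
/*
 * Sketch of a thread-safe mnttab lookup: a private FILE * per scan
 * (so no thread ever reads a stream another thread may close or
 * rewind) plus a mutex standing in for whatever serialization the
 * shared cache (the state libzfs_mnttab_fini frees) needs.
 * Illustrative only; the real fix in libzfs may differ.
 */
#define _GNU_SOURCE             /* for getmntent_r() on glibc */
#include <mntent.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Look up fsname in the mount table. out's string fields point into
 * the caller-supplied buf, so no shared buffers are involved.
 * Returns 1 if found, 0 otherwise.
 */
static int
mnttab_find_safe(const char *fsname, struct mntent *out, char *buf,
    int buflen)
{
    struct mntent *m;
    FILE *fp;
    int found = 0;

    pthread_mutex_lock(&cache_lock);    /* guards any shared cache */

    /* private stream: opened, scanned and closed by this call only */
    fp = setmntent("/proc/self/mounts", "r");
    if (fp != NULL) {
        while ((m = getmntent_r(fp, out, buf, buflen)) != NULL) {
            if (strcmp(m->mnt_fsname, fsname) == 0) {
                found = 1;
                break;
            }
        }
        (void) endmntent(fp);
    }

    pthread_mutex_unlock(&cache_lock);
    return (found);
}
```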
I'm probably experiencing the same problem; details are here: https://forum.proxmox.com/threads/zfs-mount-on-start-problem-segfault-at-0-error-4-in-libc-2-28-so-subvolumes-not-mounted.68519/
Is there anything I can provide to help solve this problem? Thank you.
@Dacesilian Reverting https://github.com/openzfs/zfs/commit/a9cd8bfde73a78a0ba02e25b712fe28d11019191 has been working fine for me as a band-aid until a formal fix lands.
I've reverted commit a9cd8bf to resolve the segfault. The added After= systemd dependency should prevent this issue on boot.