ZFS: zpool events: internal error: Bad file descriptor, zed goes to 100% CPU

Created on 2 Jun 2016 · 28 comments · Source: openzfs/zfs

I am at zfs-0.6.5-300-g8fbbc6b on a 3.16.7-35 kernel (openSUSE 13.2). On two different machines that were upgraded to that version, the following errors appear:

On one machine, zed goes into an internal loop trying to do an ioctl on a file descriptor from a previous open of /dev/zfs; the other machine's zed is fine.

On _both_ machines, "zpool events" yields:
TIME CLASS
internal error: Bad file descriptor

which leads me to suspect that the actual problem is in some shared infrastructure code rather than zed.
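
For illustration only (this is not the actual zed source), here is a C sketch of roughly the shape of loop that would produce this symptom: an event-poll loop that retries immediately when the events ioctl fails with EBADF, instead of backing off or re-opening /dev/zfs. The request number 0x5a81 is taken from the straces later in the thread; the command buffer is a placeholder.

/* Illustrative sketch, not the real zed code: retrying immediately on
 * EBADF turns the event-poll loop into a busy loop pinning one core. */
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define EVENTS_NEXT_REQUEST 0x5a81      /* placeholder; see ZFS_IOC_EVENTS_NEXT below */

int
main(void)
{
        char cmd[16384];                /* stand-in for the real zfs_cmd_t */
        int fd = open("/dev/zfs", O_RDWR);

        if (fd < 0)
                return (1);

        for (;;) {
                if (ioctl(fd, EVENTS_NEXT_REQUEST, cmd) < 0 && errno == EBADF)
                        continue;       /* no sleep, no re-open: 100% CPU */
                /* ... otherwise dispatch or handle the event ... */
        }
}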

ZED

All 28 comments

@bknl thanks for the bug report. Do you happen to have a test case we can use to reproduce this?

This probably isn't relevant, but we had this happen if /dev/zfs was not available (the kmods were unloaded): zed then spins in a tight loop trying to reopen libzfs. But it doesn't look like the kmods were unloaded here :)

Mmh. I pulled forward to zfs-0.6.5-301-g1eeb456 and spl-0.6.5-62-g16fc1ec. Rebooted, same behaviour.
ZFS works normally (It's my production machine at work), snapshots can be created, the full shebang.
But still fishy:

% zpool status micron
pool: micron
state: ONLINE
scan: scrub repaired 0 in 0h7m with 0 errors on Tue Feb 23 10:23:57 2016
config:

    NAME                                        STATE     READ WRITE CKSUM
    micron                                      ONLINE       0     0     0
      ata-MTFDDAK512MAM-1K1_131509387BA2        ONLINE       0     0     0
      ata-MTFDDAK512MAM-1K1_131509387B02-part2  ONLINE       0     0     0

% zpool events micron
TIME CLASS
internal error: Bad file descriptor
Abort

It definitely has something to do with /dev/zfs. Looking at the attached strace log, it opens /dev/zfs three times near the end of the trace, returning fd 3, and a few calls later, when the ioctl tries to use fd 3, it goes sour.

The test case to reproduce it for me is "zpool events"....
bad-zpool-events.txt

Mmh.

Pulled one machine forward to zfs-0.6.5-304-gf74b821, rebuilt, rebooted. Now "zpool events" works as expected:

zpool events

TIME CLASS
Jun 8 2016 09:05:22.448859670 resource.fs.zfs.statechange
Jun 8 2016 09:05:22.448859670 resource.fs.zfs.statechange
Jun 8 2016 09:05:22.586859681 ereport.fs.zfs.config.sync
Jun 8 2016 09:05:22.645859686 ereport.fs.zfs.config.sync
Jun 8 2016 09:17:55.863234433 ereport.fs.zfs.scrub.start
Jun 8 2016 09:30:40.500236439 ereport.fs.zfs.scrub.finish

I cannot currently pull the other one forward and on that one the issue is persistent.

Pulled the other machine forward, rebuilt, rebooted. Issue is gone. As I cannot reproduce it anymore, I'm currently inclined to close the issue.

Hi,

The very same is now happening to me on today's trunk (d02ca379795a37b8541da5a1b37528f7130de1e6, needed because of kernel 4.7.2-201.fc24.x86_64 on Fedora 24). zed is at 100% CPU usage.

"zpool events" fails during ioctl ("Bad file descriptor"):

[root@node zfs-0.6.5.7]# zpool events
TIME                           CLASS
internal error: Bad file descriptor
Aborted (core dumped)

Relevant part of the strace for this command:

open("/dev/zfs", O_RDWR)                = 3
open("/etc/mtab", O_RDONLY)             = 4
open("/etc/dfs/sharetab", O_RDONLY)     = -1 ENOENT (No such file or directory)
open("/dev/zfs", O_RDWR)                = 5
open("/dev/zfs", O_RDWR)                = 6
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=2997, ...}) = 0
read(7, "# Locale name alias data base.\n#"..., 4096) = 2997
read(7, "", 4096)                       = 0
close(7)                                = 0
open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US.utf8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.UTF-8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en.utf8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
write(1, "TIME                           C"..., 37TIME                           CLASS
) = 37
ioctl(3, _IOC(0, 0x5a, 0x81, 0x00), 0x7ffcb996bf00) = -1 EBADF (Bad file descriptor)

kic

Thanks for the report, I'll reopen this issue so we can address it. Letting @don-brady know so he's aware of this too.

Thanks a lot, Brian!

Maybe this is also interesting:

strace -fp 746
(which is the zed process running at 100% CPU usage) results in

ioctl(5, _IOC(0, 0x5a, 0x81, 0x00), 0x7ffec6becdc0) = -1 EBADF (Bad file descriptor)
ioctl(5, _IOC(0, 0x5a, 0x81, 0x00), 0x7ffec6becdc0) = -1 EBADF (Bad file descriptor)
ioctl(5, _IOC(0, 0x5a, 0x81, 0x00), 0x7ffec6becdc0) = -1 EBADF (Bad file descriptor)
ioctl(5, _IOC(0, 0x5a, 0x81, 0x00), 0x7ffec6becdc0) = -1 EBADF (Bad file descriptor)
ioctl(5, _IOC(0, 0x5a, 0x81, 0x00), 0x7ffec6becdc0) = -1 EBADF (Bad file descriptor)
ioctl(5, _IOC(0, 0x5a, 0x81, 0x00), 0x7ffec6becdc0) = -1 EBADF (Bad file descriptor)
^Cioctl(5, _IOC(0, 0x5a, 0x81, 0x00)strace: Process 746 detached
 <detached ...>

lsof -p 746
gives the following:

[root@node ~]# lsof -p 746
COMMAND PID USER   FD   TYPE             DEVICE SIZE/OFF    NODE NAME
zed     746 root  cwd    DIR                8,3     4096       2 /
zed     746 root  rtd    DIR                8,3     4096       2 /
zed     746 root  txt    REG                8,3    53160 1856329 /usr/sbin/zed
zed     746 root  mem    REG                8,3  2089496 1838697 /usr/lib64/libc-2.23.so
zed     746 root  mem    REG                8,3   142312 1838725 /usr/lib64/libpthread-2.23.so
zed     746 root  mem    REG                8,3    89520 1839077 /usr/lib64/libz.so.1.2.8
zed     746 root  mem    REG                8,3    19536 1839160 /usr/lib64/libuuid.so.1.3.0
zed     746 root  mem    REG                8,3    43664 1838729 /usr/lib64/librt-2.23.so
zed     746 root  mem    REG                8,3    73872 1850640 /usr/lib64/libuutil.so.1.0.1
zed     746 root  mem    REG                8,3    19736 1838703 /usr/lib64/libdl-2.23.so
zed     746 root  mem    REG                8,3  1168552 1838705 /usr/lib64/libm-2.23.so
zed     746 root  mem    REG                8,3    15032 1850657 /usr/lib64/libzfs_core.so.1.0.0
zed     746 root  mem    REG                8,3   275296 1850655 /usr/lib64/libzfs.so.2.0.0
zed     746 root  mem    REG                8,3  1278120 1850653 /usr/lib64/libzpool.so.2.0.0
zed     746 root  mem    REG                8,3    81368 1850647 /usr/lib64/libnvpair.so.1.0.1
zed     746 root  mem    REG                8,3   172080 1838690 /usr/lib64/ld-2.23.so
zed     746 root    0r   CHR                1,3      0t0       6 /dev/null
zed     746 root    1u  unix 0xffff880079ee6000      0t0   14718 type=STREAM
zed     746 root    2u  unix 0xffff880079ee6000      0t0   14718 type=STREAM
zed     746 root    3uW  REG               0,20        4   14769 /run/zed.pid
zed     746 root    4uW  REG               0,20        0   14770 /run/zed.state
zed     746 root    5u   CHR              10,58      0t0   10748 /dev/zfs
zed     746 root    6r   REG                0,4        0   14771 /proc/746/mounts
zed     746 root    7u   CHR              10,58      0t0   10748 /dev/zfs
zed     746 root    8u   CHR              10,58      0t0   10748 /dev/zfs

I never saw this with ZFS in the xenial kernel, or xenial's zfs-dkms with a mainline kernel. But after building an Ubuntu linux-generic kernel with zfs af4db70f368d0e9c9ad90f0837a77e511a77b0a7 and spl 0d267566, I saw exactly what kicmuc sees. Maybe this is just an incompatibility between the older libzfs2linux/zfsutils-linux (0.6.5.6-0ubuntu13) I have installed and the newer zfs module in the kernel? If that were the case, it would still be better if zfs-zed didn't go into a loop and consume 100% CPU.
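
Regardless of the root cause, here is a purely hypothetical mitigation sketch (not an existing zed patch) for the daemon side: back off on EBADF from the events ioctl and give up after repeated failures rather than retrying in a tight loop. The request number and command buffer are placeholders, as in the sketch near the top of the thread, and MAX_CONSECUTIVE_EBADF is an arbitrary threshold.

/* Hypothetical mitigation sketch, not an existing patch. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define EVENTS_NEXT_REQUEST     0x5a81  /* placeholder for ZFS_IOC_EVENTS_NEXT */
#define MAX_CONSECUTIVE_EBADF   10      /* arbitrary threshold for the sketch */

static int
poll_events(int zfs_fd, void *cmd)
{
        int ebadf_count = 0;

        for (;;) {
                if (ioctl(zfs_fd, EVENTS_NEXT_REQUEST, cmd) < 0) {
                        if (errno == EBADF &&
                            ++ebadf_count < MAX_CONSECUTIVE_EBADF) {
                                sleep(1);       /* back off instead of spinning */
                                continue;
                        }
                        fprintf(stderr, "events ioctl failed: %m; giving up\n");
                        return (-1);
                }
                ebadf_count = 0;
                /* ... dispatch the event ... */
        }
}

int
main(void)
{
        char cmd[16384];                /* stand-in for the real zfs_cmd_t */
        int fd = open("/dev/zfs", O_RDWR);

        return (fd < 0 || poll_events(fd, cmd) < 0 ? 1 : 0);
}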

I realise I'm probably not contributing much here but I thought I'd report my situation too. I just noticed zed was using a full core on my machine.

My pools are happy. zpool events reports a Bad file descriptor too.

$ sudo zpool events
TIME                           CLASS
internal error: Bad file descriptor

I performed updates but have not rebooted yet since I am running a server in a VM that I need to wait for a good time to interrupt.

Running Ubuntu 14.04 LTS with a ZFS root and a second data pool.
Software versions reported by apt search zfs

libzfs-dev/trusty 0.7.0-rc1-3~trusty~gbp7bd9cb amd64
libzfs2/trusty,now 0.7.0-rc1-3~trusty~gbp7bd9cb amd64 [installed,automatic]
libzfs2-dbg/trusty 0.7.0-rc1-3~trusty~gbp7bd9cb amd64
libzpool2/trusty,now 0.7.0-rc1-3~trusty~gbp7bd9cb amd64 [installed,automatic]
ubuntu-zfs/trusty,now 8~trusty amd64 [installed]
zfs-auto-snapshot/trusty 1.1.0-0ubuntu1~trusty all
zfs-dkms/trusty,now 0.7.0-rc1-3~trusty~gbp7bd9cb amd64 [installed]
zfs-fuse/trusty 0.7.0-10.1 amd64
zfs-initramfs/trusty,now 0.7.0-rc1-3~trusty~gbp7bd9cb amd64 [installed]
zfs-zed/trusty,now 0.7.0-rc1-3~trusty~gbp7bd9cb amd64 [installed]
zfs-zed-dbg/trusty 0.7.0-rc1-3~trusty~gbp7bd9cb amd64
zfsutils/trusty,now 0.7.0-rc1-3~trusty~gbp7bd9cb amd64 [installed]
zfsutils-dbg/trusty 0.7.0-rc1-3~trusty~gbp7bd9cb amd64

UPDATE: I decided it was a good time to reboot and zed is still using a full core.

I am having this problem. zed keeps using 100% of a CPU core even after reboot.
I can't execute zpool upgrade -a, either.

$ sudo zpool upgrade -a
Password:
This system supports ZFS pool feature flags.

cannot set property for 'os': invalid argument for this pool operation

Is there a workaround?

Possibly related to #6440. By any chance, do you have a large number of delay events logged in the zpool events output?

$ sudo zpool events
Password:
TIME                           CLASS
internal error: Bad file descriptor
Aborted

@crocket which distro/kernel/zfs version are you using?

Gentoo Linux

$ uname -r
4.9.34-gentoo

sys-fs/zfs-0.7.0
sys-fs/zfs-kmod-0.7.0
sys-kernel/spl-0.7.0

@crocket thanks. Can you also run the following commands when zed is at 100%?

ZPID=$(pgrep zed)
sudo cat /proc/$ZPID/stack 
sudo strace -p $ZPID

I'd also like to see the strace output from this when you get the "Bad file descriptor" error:

sudo strace zpool events

I'm curious if it matches https://github.com/zfsonlinux/zfs/issues/4720#issuecomment-245410201

Please post the output in an attachment if it's really long.

$ sudo cat /proc/$ZPID/stack
[<ffffffffffffffff>] 0xffffffffffffffff
$ sudo strace -p $ZPID
ioctl(6, _IOC(0, 0x5a, 0x81, 0), 0x7ffdeb81e7c0) = -1 EBADF (Bad file descriptor)
ioctl(6, _IOC(0, 0x5a, 0x81, 0), 0x7ffdeb81e7c0) = -1 EBADF (Bad file descriptor)
ioctl(6, _IOC(0, 0x5a, 0x81, 0), 0x7ffdeb81e7c0) = -1 EBADF (Bad file descriptor)
ioctl(6, _IOC(0, 0x5a, 0x81, 0), 0x7ffdeb81e7c0) = -1 EBADF (Bad file descriptor)
ioctl(6, _IOC(0, 0x5a, 0x81, 0), 0x7ffdeb81e7c0) = -1 EBADF (Bad file descriptor)
....

sudo strace -p $ZPID keeps spitting out

ioctl(6, _IOC(0, 0x5a, 0x81, 0), 0x7ffdeb81e7c0) = -1 EBADF (Bad file descriptor)

strace-zpool-events.txt is the output of sudo strace zpool events.

Looks like it's the same issue as https://github.com/zfsonlinux/zfs/issues/4720#issuecomment-245410201 in that zed or zpool events is trying to do an ioctl(ZFS_IOC_EVENTS_NEXT) which is returning EBADF. From the strace, it doesn't look like a double close() issue. I can see a possible codepath that could give us EBADF from zfs (a quick sanity check on the ioctl number follows the code path below):

zpool_do_events()
    zpool_do_events_next()
        zpool_events_next()
            ioctl(ZFS_IOC_EVENTS_NEXT)
                zfs_zevent_fd_hold()
                    zfsdev_getminor() - can return EBADF

https://github.com/zfsonlinux/zfs/blob/master/module/zfs/zfs_ioctl.c#L6307-L6335
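
A side note on the ioctl number itself: strace prints the request as _IOC(0, 0x5a, 0x81, 0), i.e. raw value 0x5a81, and 0x5a is ASCII 'Z'. Assuming the zfs_ioc enum layout of that era (ZFS_IOC_FIRST at 'Z' << 8, with the Linux-only event ioctls starting at offset 0x80), ZFS_IOC_EVENTS_NEXT lands exactly there, which is consistent with the identification above. A tiny standalone check:

/* Sanity check of the ioctl number decoding; the enum layout itself is
 * an assumption about the zfs.h of that release. */
#include <assert.h>

int
main(void)
{
        assert((('Z' << 8) + 0x81) == 0x5a81);
        return (0);
}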

If you don't mind continuing to troubleshoot, can you add in this patch, reload the module, and check the dmesg output after running zpool events?:

diff --git a/module/zfs/zfs_ioctl.c b/module/zfs/zfs_ioctl.c
index 21fefe57f..167baf3dc 100644
--- a/module/zfs/zfs_ioctl.c
+++ b/module/zfs/zfs_ioctl.c
@@ -6312,12 +6312,14 @@ zfsdev_getminor(struct file *filp, minor_t *minorp)
        ASSERT(!MUTEX_HELD(&zfsdev_state_lock));

        fpd = filp->private_data;
+       printk("%s: fpd = %p\n", __func__, fpd);
        if (fpd == NULL)
                return (EBADF);

        mutex_enter(&zfsdev_state_lock);

        for (zs = zfsdev_state_list; zs != NULL; zs = zs->zs_next) {
+               printk("%s: zs = %p, zs->zs_minor = %d\n", __func__, zs, zs ? zs->zs_minor : 0);

                if (zs->zs_minor == -1)
                        continue;

If it's working correctly, it should show something like this after zpool events:

[32870.715781] zfsdev_getminor: fpd = ffff9f4d7a83c000
[32870.715785] zfsdev_getminor: zs = ffff9f4d4c627940, zs->zs_minor = 8
[32870.715786] zfsdev_getminor: zs = ffff9f4d52749dc0, zs->zs_minor = 9
[32870.715788] zfsdev_getminor: zs = ffff9f4d7a83c000, zs->zs_minor = 10

I applied the patch to sys-fs/zfs-kmod (which includes the kernel modules) and rebooted.

zpool events printed nothing to dmesg.

Thanks, I'm going to setup a VM to see if I can reproduce this.

I'm able to reproduce this on an Ubuntu 14.04 system by building/installing 0.6.5.11 deb packages (possibly overwriting a previous zfs install in the process). I'm still looking into it.

@crocket do you see any warnings in dmesg when you first modprobe zfs? I'm seeing this from SPL:

[  267.194264] VERIFY3(typemask & 0x01 == 0x01) failed (0 == 1)
[  267.194271] PANIC at spl-vmem.c:43:vmem_size()
[  267.194273] Showing stack for process 2231
[  267.194278] CPU: 0 PID: 2231 Comm: arc_reclaim Tainted: P           OE   4.2.0-27-generic #32~14.04.1-Ubuntu
[  267.194281] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  267.194285]  00000000 00000000 f6b9fd6c c170aa8c f89df6e7 f6b9fd7c f89d7a61 f89e1755
[  267.194292]  000008b7 f6b9feac f89d7afd f89e1778 f89df6e7 0000002b f89dd8e5 f89df6f4
[  267.194298]  8b00080a 49524556 28335946 65707974 6b73616d 30202620 20313078 30203d3d
[  267.194305] Call Trace:
[  267.194323]  [<c170aa8c>] dump_stack+0x41/0x52
[  267.194336]  [<f89d7a61>] spl_dumpstack+0x41/0x50 [spl]
[  267.194344]  [<f89d7afd>] spl_panic+0x8d/0xc0 [spl]
[  267.194356]  [<c109447b>] ? update_curr+0x12b/0x1d0
[  267.194361]  [<c1094981>] ? dequeue_entity+0x171/0x7c0
[  267.194368]  [<c1055519>] ? pvclock_clocksource_read+0xb9/0x150
[  267.194371]  [<c1055519>] ? pvclock_clocksource_read+0xb9/0x150
[  267.194379]  [<f89d3cbc>] vmem_size+0xcc/0x110 [spl]
[  267.194438]  [<f8e59006>] arc_available_memory+0xc6/0x1b0 [zfs]
[  267.194489]  [<f8e64ab0>] ? arc_shrink+0x250/0x250 [zfs]
[  267.194535]  [<f8e64b1a>] arc_reclaim_thread+0x6a/0x480 [zfs]
[  267.194565]  [<c1197945>] ? kfree+0xe5/0x120
[  267.194572]  [<f89d0845>] ? spl_kmem_free+0x45/0x50 [spl]
[  267.194577]  [<f89d0845>] ? spl_kmem_free+0x45/0x50 [spl]
[  267.194620]  [<f8e64ab0>] ? arc_shrink+0x250/0x250 [zfs]
[  267.194627]  [<f89d3eb3>] thread_generic_wrapper+0x63/0xa0 [spl]
[  267.194633]  [<f89d3e50>] ? __thread_exit+0x20/0x20 [spl]
[  267.194638]  [<c1081391>] kthread+0xa1/0xc0
[  267.194645]  [<c1711041>] ret_from_kernel_thread+0x21/0x30
[  267.194649]  [<c10812f0>] ? kthread_create_on_node+0x140/0x140
[  267.194797] ZFS: Loaded module v0.6.5.11-1 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5

I don't see PANIC in dmesg.

dmesg.txt

@crocket ok, maybe you're not hitting that error, or you're building spl without --with-debug (which is fine).

Overall, I think the problem may be caused by complications from upgrading from one version of the zfs packages to another, i.e. slightly mismatched versions of zpool and zfs.ko. I don't have any conclusive proof, but I was able to fix my broken Ubuntu VM by doing the following:

  1. Remove all traces of previous zfs installations.
# remove the debs
sudo dpkg --purge kmod-zfs-`uname -r` kmod-zfs-devel kmod-zfs-devel-`uname -r` libzfs2 libzpool2 libzfs2-devel zfs zfs-dracut zfs-initramfs zfs-test kmod-spl-`uname -r` kmod-spl-devel kmod-spl-devel-`uname -r` spl

# sanity check that all zfs stuff is removed.  If not, remove it
sudo dpkg -l | egrep -i 'zfs|spl'
which zfs
ls -l /sbin | grep zfs
find /lib/modules/ | egrep -i 'zfs|spl'
  2. Reboot to start with a clean slate.

  3. If you're running 0.6.5.11 or older, make sure you run:

git cherry-pick 17f6f4a

... in your spl directory. This will add in the "Fix vmem_size()" patch, which prevents the spl assertion:

[  267.194264] VERIFY3(typemask & 0x01 == 0x01) failed (0 == 1)
[  267.194271] PANIC at spl-vmem.c:43:vmem_size()

You may not see this assertion if you're not building spl with --with-debug, but that doesn't mean it won't bite you.

  4. Build/install spl & zfs as normal (https://github.com/zfsonlinux/zfs/wiki/Custom-Packages).

We may want to add some automated tests that upgrade from various older versions of zfs to the latest version.

After reinstalling zfs, zfs-kmod, and spl with the debug USE flag, zed didn't consume CPU time.
I then reinstalled zfs, zfs-kmod, and spl without the debug USE flag, and zed still didn't consume CPU time.

Now, zpool events works fine.

$ sudo zpool events
Password:
TIME                           CLASS
dropped 458 events
Aug  8 2017 17:04:56.028895050 sysevent.fs.zfs.history_event
Aug  8 2017 17:04:56.053895048 sysevent.fs.zfs.config_sync
Aug  8 2017 17:04:56.053895048 sysevent.fs.zfs.pool_import
Aug  8 2017 17:04:56.054895048 sysevent.fs.zfs.history_event
Aug  8 2017 17:04:56.079895046 sysevent.fs.zfs.config_sync
Aug  8 2017 17:04:59.554999959 sysevent.fs.zfs.history_event
Aug  8 2017 17:04:59.815999940 sysevent.fs.zfs.config_sync
Aug  8 2017 17:04:59.815999940 sysevent.fs.zfs.pool_import
Aug  8 2017 17:04:59.817999940 sysevent.fs.zfs.history_event
Aug  8 2017 17:05:00.046999923 sysevent.fs.zfs.config_sync
Aug  8 2017 17:05:06.768999431 sysevent.fs.zfs.vdev_autoexpand
Aug  8 2017 17:05:06.768999431 sysevent.fs.zfs.vdev_autoexpand
Aug  8 2017 17:10:01.595217224 sysevent.fs.zfs.history_event
Aug  8 2017 17:10:01.774307320 sysevent.fs.zfs.history_event

Thus, I can say this issue is solved.

However, I encountered a new issue.

Now it takes ~36 seconds to import the "os" ZFS pool during boot. The "os" pool contains the root partition for Gentoo. I just reported this new issue on GitHub.

@crocket good to hear zpool events is working for you again. If anyone else hits this issue, please do a clean uninstall and reinstall zfs & spl, and see if that fixes it.

Closing issue.

I had the same issue. @ivan was right: I had upgraded zfs and spl without upgrading the other tools, so there was some mismatch with the libraries and such. I just brought all zfs-related packages in sync, and the problem is gone.

Just a me-too: I had this problem after a partial upgrade.
IMHO a version check would be a great idea; a version mismatch might create more severe problems than just one CPU maxed out.
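
For what it's worth, a minimal sketch of such a check, assuming the userland tools embed the version they were built against (BUILT_AGAINST is a stand-in for something like ZFS_META_VERSION) and that the loaded module exports its version through /sys/module/zfs/version (it does when built with MODULE_VERSION, which ZoL's module is, as far as I know):

/* Hypothetical pre-flight version check, not code that exists in the tree. */
#include <stdio.h>
#include <string.h>

#define BUILT_AGAINST "0.7.0-1"         /* stand-in for the userland build version */

int
main(void)
{
        char modver[64] = "";
        FILE *f = fopen("/sys/module/zfs/version", "r");

        if (f == NULL) {
                fprintf(stderr, "zfs module not loaded?\n");
                return (1);
        }
        if (fgets(modver, sizeof (modver), f) != NULL)
                modver[strcspn(modver, "\n")] = '\0';
        fclose(f);

        if (strcmp(modver, BUILT_AGAINST) != 0) {
                fprintf(stderr, "warning: userland %s vs. kernel module %s\n",
                    BUILT_AGAINST, modver);
                return (2);
        }
        return (0);
}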

Had the same issue after upgrading Debian stretch to buster, before rebooting.
