ZFS: Unable to set "noop" scheduler

Created on 15 Aug 2017 · 23 comments · Source: openzfs/zfs

Type | Version/Name
--- | ---
Distribution Name | Gentoo
Linux Kernel | 4.12.5-gentoo
Architecture | amd64
ZFS Version | 0.7.1-r0-gentoo
SPL Version | 0.7.1-r0-gentoo

[ 48.738987] ZFS: Unable to set "noop" scheduler for /dev/sda1 (sda): 256

# cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none

Oops, just a minor issue with the new kernel; they changed the I/O schedulers quite a lot.
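
A quick way to inspect and change the scheduler by hand for comparison (just a sketch; `sda` is only an example device and the commands need root):

```sh
# List the schedulers the kernel offers for this device; the bracketed one is active.
cat /sys/block/sda/queue/scheduler

# blk-mq does not offer "noop"; "none" appears to be the closest equivalent, so switch to it at runtime.
echo none > /sys/block/sda/queue/scheduler

# Confirm the change took effect.
cat /sys/block/sda/queue/scheduler
```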

Documentation

All 23 comments

@AndCycle what is the issue here? And why is this a ZFS issue?

AFAIK "noop" is still a valid scheduler, maybe you just forgot to enable it in your Kconfig?

No, it's not. You can see the name has changed to "none", while ZFS has "noop" defined:

https://github.com/zfsonlinux/zfs/blob/36ba27e9e07b35340ba388e6624e65995595ed92/include/linux/blkdev_compat.h#L590

so it would require detecting the kernel version at build time to choose the right name.

I don't think "noop" and "none" are the same thing just with a different name, i'm going to build 4.12.5 now but from the looks of it "noop" is still there.

http://elixir.free-electrons.com/linux/v4.12.5/source/block/noop-iosched.c#L104

static struct elevator_type elevator_noop = {
    .ops.sq = {
        .elevator_merge_req_fn      = noop_merged_requests,
        .elevator_dispatch_fn       = noop_dispatch,
        .elevator_add_req_fn        = noop_add_request,
        .elevator_former_req_fn     = noop_former_request,
        .elevator_latter_req_fn     = noop_latter_request,
        .elevator_init_fn       = noop_init_queue,
        .elevator_exit_fn       = noop_exit_queue,
    },
    .elevator_name = "noop",
    .elevator_owner = THIS_MODULE,
};

EDIT: spelling

Is this an NVMe drive? The mq- prefix suggests this is the multi-queue variant which is relatively new in the kernel and only makes sense for multi-queue block devices.

Archlinux:

4.12.6-1-ARCH
cat /sys/block/sda/queue/scheduler
[noop] deadline cfq

Noop still there...

@DeHackEd nope, it's a SATA device attached via an LSI SAS card.
So am I the only one who's hit this problem here?

They are SATA drives attached via LSI SAS / SP1200 onboard / USB.
I guess I need to dig into why all of my devices changed to multi-queue.

I just reused my kernel 4.9 config on 4.12.

Prompt: SCSI: use blk-mq I/O path by default
  Location:
    -> Device Drivers
      -> SCSI device support

OK, this is what bit me.
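
For what it's worth, that prompt is CONFIG_SCSI_MQ_DEFAULT. A rough sketch of how to check which I/O path the SCSI layer is using and how to revert without rebuilding (the sysfs path and boot parameter are from kernels of this era, so verify them on your own system):

```sh
# "Y" means the SCSI layer is routing I/O through blk-mq.
cat /sys/module/scsi_mod/parameters/use_blk_mq

# To fall back to the legacy single-queue path without rebuilding the kernel,
# boot with the following on the kernel command line:
#   scsi_mod.use_blk_mq=n
```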

I'd vote for re-opening this issue. The "noop" scheduler is called "none" in mq I/O scheduling. It would be good for ZFS to support both, as the single-queue schedulers will eventually be deprecated.

The problem is the scsi_mq driver is a boot-time option, not a runtime option. Thus when the scsi_mq driver is the default for block devices, it cannot be changed at runtime. You can safely ignore the error message indicating this.

@richardelling , I partially agree with you. Ignoring these messages will not break your rig immediately.

In order to maintain 100% consistency, zfs makes an assumption that the write requests are not re-ordered by the io stack - i.e. writing a reference for the data not yet written is a sure way to corruption if something bad happens between the two.
Any scheduler except noop/none would re-order the writes.

You can have either SCSI_MQ or SCSI_SQ active, not both (you cannot split it per device).
I would prefer to have bfq (mq) manage the devices holding other Linux filesystems, and make sure ZFS-managed devices get noop/none.

@rugubara the statement:

In order to maintain 100% consistency, zfs makes an assumption that the write requests are not re-ordered by the io stack - i.e. writing a reference for the data not yet written is a sure way to corruption if something bad happens between the two.

is incorrect. In fact there are very few devices today that perform ordered I/O (CD-ROMs?), and this has been true for decades. ZFS uses the appropriate barriers (e.g. SCSI SYNCHRONIZE_CACHE) to ensure on-disk consistency.

We use scsi_mq extensively and it behaves properly. I see no problem for folks who want to use scsi_mq or any other multiqueue device drivers with ZFS. IMHO, the attempt to change the scheduler is the bug; it should not be necessary and only exists for legacy reasons.

I believe this issue should be reopened. (I would do it myself but it looks like I don't have the permission to do so.)

When Debian packaged the 4.17 kernel which recently hit unstable/sid, they enabled MQ by default in the kernel options:

http://ftp-master.metadata.debian.org/changelogs//main/l/linux/linux_4.17.8-1_changelog

linux (4.17~rc7-1~exp1) experimental; urgency=medium
  [ Ben Hutchings ]
  * SCSI: Enable SCSI_MQ_DEFAULT. This can be reverted using the kernel
    parameter: scsi_mod.use_blk_mq=n
 -- Ben Hutchings <[email protected]>  Tue, 29 May 2018 09:54:12 +0100

Consequently, as soon as I updated my Debian Unstable install to that kernel, all my SATA drives suddenly switched to MQ, and I got spammed with `ZFS: Unable to set "noop" scheduler for /dev/sdXX (sdX): 256` log messages.

$ cat /sys/block/sdX/queue/scheduler
[mq-deadline] none

Since this change is on the Debian release train, it would be wise to fix this in one way or another. Otherwise, pretty much everyone is going to see that message eventually.
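
A quick way to see what every disk is using at once, and to flip them all by hand if you want to experiment (my own sketch; the runtime change does not persist across reboots):

```sh
# Print the scheduler line for every sd* device; the bracketed entry is the active one.
grep -H . /sys/block/sd*/queue/scheduler

# Optionally switch them all to "none" at runtime (as root).
for f in /sys/block/sd*/queue/scheduler; do echo none > "$f"; done
```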

@dechamps thanks for the heads up, I'm reopening this issue. Redirecting the warning to the internal log is straightforward and has already been done in master. Can anyone share their performance results for mq-deadline vs none?

@behlendorf

Can anyone share their performance results for mq-deadline vs none?

I don't know about performance, but it definitely looks like switching to the mq scheduler is making ZFS prone to deadlocks on my end: #7772

Just a reminder: scsi-mq was made the default (=y) in the stable 4.19 kernel.

My initial thought while writing #8004 was to accommodate this for initramfs-tools partitions, but I did not see a consensus on whether "none" or "mq" was correct.

@behlendorf I am currently running ZFS from master on 4.20.5. Today I was a bit surprised to notice that mq-deadline is set for all my disks (SSD and spinning). Since I did not notice any problems or even a performance issue, I doubt there is any difference between mq-deadline and noop/none when using ZFS. I don't have any measurements; this is just my feeling.
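
If anyone wants to turn that feeling into numbers, here is a rough sketch of a comparison run. The dataset path /tank/fio-test is made up for illustration, and the fio job is only an example workload:

```sh
# Run the same fio job once per scheduler on the disk(s) backing the pool,
# then compare the reported bandwidth and latency.
echo mq-deadline > /sys/block/sda/queue/scheduler
fio --name=seqwrite --directory=/tank/fio-test --size=4G --bs=1M \
    --rw=write --ioengine=psync --numjobs=4 --group_reporting

echo none > /sys/block/sda/queue/scheduler
fio --name=seqwrite --directory=/tank/fio-test --size=4G --bs=1M \
    --rw=write --ioengine=psync --numjobs=4 --group_reporting
```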

Hi,

I also had that issue with kernel 4.17. We discussed it in the openmediavault forum:

https://forum.openmediavault.org/index.php/Thread/23727-Upgrade-Kernel-4-17-ZFS-Unable-to-set-noop-scheduler-for-dev-sdd1-sdd-256/

For me it was solved by a kernel update to 4.18.

At the moment I am at latest 4.19 Debian kernel.

Regards Hoppel

@kpande
Did your advice regarding mq-deadline change since this comment? https://github.com/zfsonlinux/zfs/issues/8506#issuecomment-473182030

iSCSI operations are completed in order, so there is limited benefit to filling the queue very deep. Out of the box, ZFS will happily try to do so, though. Also, the initiator timeout settings very likely need to be adjusted to something more reasonable. And then it can get complicated...

The current initramfs contrib code zfs.in only checks for the "noop" scheduler and does not attempt to set "none" (this is the additional scheduler setting from https://github.com/zfsonlinux/zfs/pull/6807).

It's not clear to me from the above discussion that "none" is preferred over mq-deadline, but if it is, this code should be modified to set "none" if it's an available scheduler.
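
To make the intent concrete, something along these lines would do it. This is only a sketch of the fallback logic, not the actual code from #9042:

```sh
# Prefer "noop" on single-queue kernels and fall back to "none" on blk-mq kernels.
for sched in /sys/block/*/queue/scheduler; do
    if grep -qw noop "$sched"; then
        echo noop > "$sched"
    elif grep -qw none "$sched"; then
        echo none > "$sched"
    fi
done
```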

https://github.com/zfsonlinux/zfs/pull/9042 is a suitable patch to implement this logic.

The initramfs supports "none" now.

That is the initramfs code, but what about dracut?

Good question. I hadn't considered that. I checked now and the dracut code does not set "noop" anyway.

