ZFS: Unable to set "noop" scheduler

Created on 15 Aug 2017 · 23 comments · Source: openzfs/zfs

Type | Version/Name
--- | ---
Distribution Name | Gentoo
Linux Kernel | 4.12.5-gentoo
Architecture | amd64
ZFS Version | 0.7.1-r0-gentoo
SPL Version | 0.7.1-r0-gentoo

[ 48.738987] ZFS: Unable to set "noop" scheduler for /dev/sda1 (sda): 256

# cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none

Oops, just a minor issue with the new kernel; they changed the I/O schedulers quite a lot.
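
A quick way to inspect and change the scheduler by hand for comparison (just a sketch; `sda` is only an example device and the commands need root):

```sh
# List the schedulers the kernel offers for this device; the bracketed one is active.
cat /sys/block/sda/queue/scheduler

# blk-mq does not offer "noop"; "none" appears to be the closest equivalent, so switch to it at runtime.
echo none > /sys/block/sda/queue/scheduler

# Confirm the change took effect.
cat /sys/block/sda/queue/scheduler
```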

Documentation

All 23 comments

@AndCycle what is the issue here? And why is this a ZFS issue?

AFAIK "noop" is still a valid scheduler, maybe you just forgot to enable it in your Kconfig?

No, it's not. You can see the name has changed to "none", while ZFS has "noop" defined:

https://github.com/zfsonlinux/zfs/blob/36ba27e9e07b35340ba388e6624e65995595ed92/include/linux/blkdev_compat.h#L590

so it would require detecting the kernel version at build time to choose the right name.

I don't think "noop" and "none" are the same thing just with a different name, i'm going to build 4.12.5 now but from the looks of it "noop" is still there.

http://elixir.free-electrons.com/linux/v4.12.5/source/block/noop-iosched.c#L104

static struct elevator_type elevator_noop = {
    .ops.sq = {
        .elevator_merge_req_fn      = noop_merged_requests,
        .elevator_dispatch_fn       = noop_dispatch,
        .elevator_add_req_fn        = noop_add_request,
        .elevator_former_req_fn     = noop_former_request,
        .elevator_latter_req_fn     = noop_latter_request,
        .elevator_init_fn       = noop_init_queue,
        .elevator_exit_fn       = noop_exit_queue,
    },
    .elevator_name = "noop",
    .elevator_owner = THIS_MODULE,
};

EDIT: spelling

Is this an NVMe drive? The mq- prefix suggests this is the multi-queue variant which is relatively new in the kernel and only makes sense for multi-queue block devices.

Archlinux:

4.12.6-1-ARCH
cat /sys/block/sda/queue/scheduler
[noop] deadline cfq

Noop still there...

@DeHackEd nope, it's a SATA device attached via an LSI SAS card.
So am I the only one who's hit this problem here?

They are SATA drives attached via LSI SAS / SP1200 onboard / USB.
I guess I need to dig into why all of my devices changed to multi-queue.

I just reused my kernel 4.9 config on 4.12.

Prompt: SCSI: use blk-mq I/O path by default
  Location:
    -> Device Drivers
      -> SCSI device support

OK, this is what bit me.
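
For what it's worth, that prompt is CONFIG_SCSI_MQ_DEFAULT. A rough sketch of how to check which I/O path the SCSI layer is using and how to revert without rebuilding (the sysfs path and boot parameter are from kernels of this era, so verify them on your own system):

```sh
# "Y" means the SCSI layer is routing I/O through blk-mq.
cat /sys/module/scsi_mod/parameters/use_blk_mq

# To fall back to the legacy single-queue path without rebuilding the kernel,
# boot with the following on the kernel command line:
#   scsi_mod.use_blk_mq=n
```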

I'd vote for re-opening this issue. The "noop" scheduler is called "none" in mq I/O scheduling. It would be good for ZFS to support both, as the single-queue schedulers will eventually be deprecated.

The problem is the scsi_mq driver is a boot-time option, not a runtime option. Thus when the scsi_mq driver is the default for block devices, it cannot be changed at runtime. You can safely ignore the error message indicating this.

@richardelling , I partially agree with you. Ignoring these messages will not break your rig immediately.

In order to maintain 100% consistency, zfs makes an assumption that the write requests are not re-ordered by the io stack - i.e. writing a reference for the data not yet written is a sure way to corruption if something bad happens between the two.
Any scheduler except noop/none would re-order the writes.

You can have either SCSI_MQ or SCSI_SQ active, not both (you cannot split it per device).
I would prefer to have bfq (mq) manage the devices holding other Linux filesystems, and make sure ZFS-managed devices get noop/none.

@rugubara the statement:

In order to maintain 100% consistency, zfs makes an assumption that the write requests are not re-ordered by the io stack - i.e. writing a reference for the data not yet written is a sure way to corruption if something bad happens between the two.

is incorrect. In fact there are very few devices today that perform ordered I/O (CD-ROMs?), and this has been true for decades. ZFS uses the appropriate barriers (e.g. SCSI SYNCHRONIZE_CACHE) to ensure on-disk consistency.

We use scsi_mq extensively and it behaves properly. I see no problem for folks who want to use scsi_mq or any other multiqueue device drivers with ZFS. IMHO, the attempt to change the scheduler is the bug; it should not be necessary and only exists for legacy reasons.

I believe this issue should be reopened. (I would do it myself but it looks like I don't have the permission to do so.)

When Debian packaged the 4.17 kernel which recently hit unstable/sid, they enabled MQ by default in the kernel options:

http://ftp-master.metadata.debian.org/changelogs//main/l/linux/linux_4.17.8-1_changelog

linux (4.17~rc7-1~exp1) experimental; urgency=medium
  [ Ben Hutchings ]
  * SCSI: Enable SCSI_MQ_DEFAULT. This can be reverted using the kernel
    parameter: scsi_mod.use_blk_mq=n
 -- Ben Hutchings <[email protected]>  Tue, 29 May 2018 09:54:12 +0100

Consequently, as soon as I updated my Debian Unstable install to that kernel, all my SATA drives suddenly switched to MQ, and I got spammed with `ZFS: Unable to set "noop" scheduler for /dev/sdXX (sdX): 256` log messages.

$ cat /sys/block/sdX/queue/scheduler
[mq-deadline] none

Since this change is on the Debian release train, it would be wise to fix this in one way or another. Otherwise, pretty much everyone is going to see that message eventually.
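
A quick way to see what every disk is using at once, and to flip them all by hand if you want to experiment (my own sketch; the runtime change does not persist across reboots):

```sh
# Print the scheduler line for every sd* device; the bracketed entry is the active one.
grep -H . /sys/block/sd*/queue/scheduler

# Optionally switch them all to "none" at runtime (as root).
for f in /sys/block/sd*/queue/scheduler; do echo none > "$f"; done
```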

@dechamps thanks for the heads up, I'm reopening this issue. Redirecting the warning to the internal log is straightforward and has already been done in master. Can anyone share their performance results for mq-deadline vs none?

@behlendorf

Can anyone share their performance results for mq-deadline vs none?

I don't know about performance, but it definitely looks like switching to the mq scheduler is making ZFS prone to deadlocks on my end: #7772

Just a reminder: scsi-mq was made the default (=y) in the stable 4.19 kernel.

My initial thought while writing #8004 was to accommodate this for initramfs-tools partitions, but I did not see a consensus on whether "none" or "mq" was correct.

@behlendorf I am currently running ZFS from master on 4.20.5. Today I was a bit surprised to notice that mq-deadline is set for all my disks (SSD and spinning). Since I did not notice any problems or even a performance issue, I doubt there is any difference between mq-deadline and noop/none when using ZFS. I don't have any measurements; this is just my feeling.
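
If anyone wants to turn that feeling into numbers, here is a rough sketch of a comparison run. The dataset path /tank/fio-test is made up for illustration, and the fio job is only an example workload:

```sh
# Run the same fio job once per scheduler on the disk(s) backing the pool,
# then compare the reported bandwidth and latency.
echo mq-deadline > /sys/block/sda/queue/scheduler
fio --name=seqwrite --directory=/tank/fio-test --size=4G --bs=1M \
    --rw=write --ioengine=psync --numjobs=4 --group_reporting

echo none > /sys/block/sda/queue/scheduler
fio --name=seqwrite --directory=/tank/fio-test --size=4G --bs=1M \
    --rw=write --ioengine=psync --numjobs=4 --group_reporting
```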

Hi,

I also had that issue with kernel 4.17. We discussed it in the openmediavault forum:

https://forum.openmediavault.org/index.php/Thread/23727-Upgrade-Kernel-4-17-ZFS-Unable-to-set-noop-scheduler-for-dev-sdd1-sdd-256/

For me it was solved by a kernel update to 4.18.

At the moment I am at latest 4.19 Debian kernel.

Regards Hoppel

@kpande
Did your advice regarding mq-deadline change since this comment? https://github.com/zfsonlinux/zfs/issues/8506#issuecomment-473182030

iSCSI operations are completed in order, so there is limited benefit to filling the queue very deep. Out of the box, ZFS will happily try to do so, though. Also, the initiator timeout settings very likely need to be adjusted to something more reasonable. And then it can get complicated...

The current initramfs contrib code zfs.in only checks for the "noop" scheduler and does not attempt to set "none" (this is the additional scheduler setting from https://github.com/zfsonlinux/zfs/pull/6807).

It's not clear to me from the above discussion that "none" is preferred over mq-deadline, but if it is, this code should be modified to set "none" if it's an available scheduler.
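
To make the intent concrete, something along these lines would do it. This is only a sketch of the fallback logic, not the actual code from #9042:

```sh
# Prefer "noop" on single-queue kernels and fall back to "none" on blk-mq kernels.
for sched in /sys/block/*/queue/scheduler; do
    if grep -qw noop "$sched"; then
        echo noop > "$sched"
    elif grep -qw none "$sched"; then
        echo none > "$sched"
    fi
done
```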

https://github.com/zfsonlinux/zfs/pull/9042 is a suitable patch to implement this logic.

The initramfs supports "none" now.

That is the initramfs code, but what about dracut?

Good question. I hadn't considered that. I checked now and the dracut code does not set "noop" anyway.

