Type | Version/Name
--- | ---
Distribution Name | Gentoo
Linux Kernel | 4.12.5-gentoo
Architecture | amd64
ZFS Version | 0.7.1-r0-gentoo
SPL Version | 0.7.1-r0-gentoo
[ 48.738987] ZFS: Unable to set "noop" scheduler for /dev/sda1 (sda): 256
# cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none
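For anyone reproducing this, a rough sketch of what the failure looks like from user space (device name `sda` taken from the report above; on a blk-mq device the legacy scheduler names are simply not offered):

```sh
# The legacy name is rejected on a blk-mq device...
echo noop > /sys/block/sda/queue/scheduler   # write error: Invalid argument
# ...because the multi-queue equivalent of "noop" is called "none":
echo none > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler           # e.g. mq-deadline kyber bfq [none]
```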
Oops, just a minor issue with the new kernel; they changed the scheduler infrastructure quite a lot.
@AndCycle what is the issue here? And why is this a ZFS issue?
AFAIK "noop" is still a valid scheduler, maybe you just forgot to enable it in your Kconfig?
No, it's not; you can see the name changed to "none". ZFS has "noop" hard-coded, so it would need to detect the kernel version at build time to choose the right name.
I don't think "noop" and "none" are the same thing just with a different name. I'm going to build 4.12.5 now, but from the looks of it "noop" is still there:
http://elixir.free-electrons.com/linux/v4.12.5/source/block/noop-iosched.c#L104
static struct elevator_type elevator_noop = {
.ops.sq = {
.elevator_merge_req_fn = noop_merged_requests,
.elevator_dispatch_fn = noop_dispatch,
.elevator_add_req_fn = noop_add_request,
.elevator_former_req_fn = noop_former_request,
.elevator_latter_req_fn = noop_latter_request,
.elevator_init_fn = noop_init_queue,
.elevator_exit_fn = noop_exit_queue,
},
.elevator_name = "noop",
.elevator_owner = THIS_MODULE,
};
EDIT: spelling
Is this an NVMe drive? The mq- prefix suggests this is the multi-queue variant which is relatively new in the kernel and only makes sense for multi-queue block devices.
Archlinux:
4.12.6-1-ARCH
cat /sys/block/sda/queue/scheduler
[noop] deadline cfq
Noop still there...
@DeHackEd nope, it's a SATA device attached via an LSI SAS card.
So am I the only one who has this problem here?
They are SATA drives attached via LSI SAS / SP1200 onboard / USB.
Guess I need to dig into why all of my devices changed to multi-queue;
I just reused my kernel 4.9 config on 4.12.
Prompt: SCSI: use blk-mq I/O path by default
Location:
-> Device Drivers
-> SCSI device support
OK, I have been bitten by this.
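For reference, the prompt above corresponds to CONFIG_SCSI_MQ_DEFAULT. A quick way to check whether your running kernel has it enabled (assumes IKCONFIG is built in; otherwise grep the config file under /boot):

```sh
# Does the running kernel default SCSI devices to the blk-mq path?
zgrep CONFIG_SCSI_MQ_DEFAULT /proc/config.gz
# or:
grep CONFIG_SCSI_MQ_DEFAULT "/boot/config-$(uname -r)"
```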
I'd vote for re-opening this issue. The noop scheduler is called "none" in mq- I/O scheduling. It would be good for ZFS to support both, as the single-queue schedulers will eventually be deprecated.
The problem is the scsi_mq driver is a boot-time option, not a runtime option. Thus when the scsi_mq driver is the default for block devices, it cannot be changed at runtime. You can safely ignore the error message indicating this.
@richardelling, I partially agree with you. Ignoring these messages will not break your rig immediately.
In order to maintain 100% consistency, zfs makes an assumption that the write requests are not re-ordered by the io stack - i.e. writing a reference for the data not yet written is a sure way to corruption if something bad happens between the two.
Any scheduler except noop/none would re-order the writes.
You can have either SCSI_MQ or legacy single-queue SCSI active, not both (there is no splitting per device).
I would prefer to have BFQ (mq) manage the devices used by other Linux filesystems, and make sure ZFS-managed devices get noop/none.
@rugubara the statement:
In order to maintain 100% consistency, zfs makes an assumption that the write requests are not re-ordered by the io stack - i.e. writing a reference for the data not yet written is a sure way to corruption if something bad happens between the two.
is incorrect. In fact there are very few devices today that perform ordered I/O (CD-ROMs?). This has been true for decades. ZFS uses the appropriate barriers (e.g. SCSI SYNCHRONIZE_CACHE) to ensure consistency on-disk.
We use scsi_mq extensively and it behaves properly. I see no problem for folks who want to use scsi_mq or any other multi-queue device drivers with ZFS. IMHO, the attempt to change the scheduler is the bug; it should not be necessary and only exists for legacy reasons.
I believe this issue should be reopened. (I would do it myself but it looks like I don't have the permission to do so.)
When Debian packaged the 4.17 kernel which recently hit unstable/sid, they enabled MQ by default in the kernel options:
http://ftp-master.metadata.debian.org/changelogs//main/l/linux/linux_4.17.8-1_changelog
linux (4.17~rc7-1~exp1) experimental; urgency=medium
[ Ben Hutchings ]
* SCSI: Enable SCSI_MQ_DEFAULT. This can be reverted using the kernel
parameter: scsi_mod.use_blk_mq=n
-- Ben Hutchings <[email protected]> Tue, 29 May 2018 09:54:12 +0100
Consequently, as soon as I updated my Debian Unstable install to that kernel, all my SATA drives suddenly switched to MQ, and I got spammed with ZFS: Unable to set "noop" scheduler for /dev/sdXX (sdX): 256 log messages.
$ cat /sys/block/sdX/queue/scheduler
[mq-deadline] none
Since this change is on the Debian release train, it would be wise to fix this in one way or another. Otherwise, pretty much everyone is going to see that message eventually.
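Until ZFS handles this itself, one possible workaround on Debian is to pass the parameter mentioned in the changelog above via GRUB (a sketch; adjust to your bootloader, or edit /etc/default/grub by hand):

```sh
# Append scsi_mod.use_blk_mq=n to the default kernel command line and regenerate grub.cfg
sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&scsi_mod.use_blk_mq=n /' /etc/default/grub
sudo update-grub
# Takes effect after a reboot; the drives should offer the single-queue schedulers again.
```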
@dechamps thanks for the heads up, I'm reopening this issue. Redirecting the warning to the internal log is straightforward and has already been done in master. Can anyone share their performance results for mq-deadline vs none?
@behlendorf
Can anyone share their performance results for mq-deadline vs none?
I don't know about performance, but it definitely looks like switching to the mq scheduler is making ZFS prone to deadlocks on my end: #7772
Just a reminder: scsi-mq was made default (y) in the stable 4.19 kernel.
My initial thought while writing #8004 was to accommodate this for initramfs-tools partitions, but I did not see consensus on whether "none" or "mq" was correct.
@behlendorf I am currently running ZFS from master on 4.20.5. Today I was a bit surprised to see that mq-deadline is set for all disks (SSD and spinning). Since I did not notice any problems or even a performance difference, I doubt there is any difference between mq-deadline and noop/none when using ZFS. I do not have any measurements; this is just my impression.
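For anyone who wants to turn that impression into numbers, a minimal benchmarking sketch (the backing device `sdX` and the dataset mountpoint `/tank/bench` are placeholders; run once per scheduler setting and compare):

```sh
# Select the scheduler on the pool's backing disk, then benchmark through the dataset
echo mq-deadline > /sys/block/sdX/queue/scheduler    # second run: echo none
fio --name=seqwrite --directory=/tank/bench --rw=write --bs=1M \
    --size=4G --ioengine=psync --end_fsync=1
```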
Hi,
I also had this issue with kernel 4.17; we discussed it in the openmediavault forum.
For me it was solved by a kernel update to 4.18.
At the moment I am on the latest 4.19 Debian kernel.
Regards, Hoppel
@kpande
Did your advice regarding mq-deadline change since this comment? https://github.com/zfsonlinux/zfs/issues/8506#issuecomment-473182030
iSCSI operations are completed in order, so there is limited benefit to filling the queue very far. Out of the box, ZFS will happily try to do so, though. Also, the initiator timeout settings very likely need to be adjusted to something more reasonable. And then it can get complicated...
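For what it's worth, the "don't fill the queue very far" part can be expressed with existing OpenZFS vdev queue tunables; the parameter names below are real module options, but the values are purely illustrative and should be tuned per setup:

```sh
# Cap how many concurrent I/Os ZFS issues per vdev, useful for ordered transports like iSCSI
echo 4  > /sys/module/zfs/parameters/zfs_vdev_async_write_max_active
echo 10 > /sys/module/zfs/parameters/zfs_vdev_max_active
```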
The current initramfs contrib code zfs.in only checks for the "noop" scheduler and does not attempt to set "none" (this is the additional scheduler setting from https://github.com/zfsonlinux/zfs/pull/6807).
It's not clear to me from the above discussion that "none" is preferred over mq-deadline, but if it is, this code should be modified to set "none" if it's an available scheduler.
https://github.com/zfsonlinux/zfs/pull/9042 is a suitable patch to implement this logic.
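For clarity, a minimal sketch of the kind of logic being described here (not the literal contents of #9042; the device path is only an example):

```sh
# Prefer "none" on blk-mq kernels, fall back to "noop" on legacy single-queue kernels
sched_file="/sys/block/sda/queue/scheduler"
if grep -qw none "$sched_file"; then
    echo none > "$sched_file"
elif grep -qw noop "$sched_file"; then
    echo noop > "$sched_file"
fi
```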
The initramfs supports "none" now.
That is initramfs-tools code, but what about dracut?
Good question. I hadn't considered that. I checked now, and the dracut code does not set "noop" anyway.