Type | Version/Name
--- | ---
Distribution Name | CentOS
Distribution Version | 8.1
Linux Kernel | 4.18.0-147.5.1.el8_1.x86_64
Architecture | x86_64
ZFS Version | 0.8.3-1
SPL Version | 0.8.3-1
The new scrub code heavily impacts application IO performance when used with HDD-based pools. Application IOPs are reduced by up to a factor of 10.
On a test pool of 4x SAS 15k 300 GB disks, which can provide ~250 IOPs for 4K single-thread sync random reads (as measured by fio), starting a scrub degrades application 4K random reads to 20-60 IOPs (i.e. 4-10x lower random read speed).
The older ZFS 0.7.x release had a zfs_scrub_delay parameter which could be used to limit how much scrub "conflicts" with other read/write operations, but this parameter is gone in the new 0.8.x release. The rationale is that management of the different IO classes should be done exclusively via ZIO scheduler tuning, adjusting the relative weights via the *_max_active tunables, but I can't see any meaningful difference even when setting zfs_vdev_scrub_max_active=1 and zfs_vdev_sync_read_max_active=1000.
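For reference, these tunables can be adjusted at runtime through the ZFS module parameter interface; a minimal sketch (the values are simply the ones mentioned above, not recommendations):

```
# adjust the relative weight of the scrub vs. sync-read IO classes at runtime
echo 1    > /sys/module/zfs/parameters/zfs_vdev_scrub_max_active
echo 1000 > /sys/module/zfs/parameters/zfs_vdev_sync_read_max_active
```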
I think the problem is due to the new scrub code batching reads into very large blocks, leading to a long queue depth on the scrub queue and, finally, on the vdev queue. Indeed, the new scrub code is very fast (reading at 400-500 MB/s on that test array), but this leads to poor random IOPs delivered to the (test) application.
While a faster scrub is great, we need a method to limit its impact on production pools (even if this means a longer scrub time).
To reproduce:
1. Run `fio --name=test --filename=/tank/test.img --rw=randread --size=32G` and look at the current IOPs
2. Start `zpool scrub tank`
3. Run `fio` again and compare the IOPs

NOTE: using a 128k random read (matching the dataset recordsize) will not change the IOPs numbers (except that the raw throughput value is higher).
After some more investigation, I found that the very low scrub performance was not directly related to the new zfs scan mode, but due to the interaction of:
1. the mq-scheduler IO sched (rather than none);
2. zfs_scrub_delay and zfs_scan_idle (now removed).

If the first point was my fault (well, I set it to noop, but that is not valid anymore on CentOS 8; none must be used instead), the second one (the lack of scrub throttling) is a real concern: it generally means that, even when setting zfs_vdev_scrub_max_active=1, single-threaded / low queue depth applications running on HDD pools will face a ~50% reduction in random IO speed.
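For reference, a minimal sketch of checking and switching the scheduler on a blk-mq kernel (sda stands in for each pool member disk):

```
# the active scheduler is shown in brackets; blk-mq kernels offer "none" rather than "noop"
cat /sys/block/sda/queue/scheduler
echo none > /sys/block/sda/queue/scheduler
```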
Let's see the output of zpool iostat -q 1 during concurrent fio and zpool scrub runs:
capacity operations bandwidth syncq_read syncq_write asyncq_read asyncq_write scrubq_read trimq_write
pool alloc free read write read write pend activ pend activ pend activ pend activ pend activ pend activ
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
tank 108G 972G 214 0 144M 0 0 1 0 0 0 0 0 0 24 1 0 0
tank 108G 972G 222 0 137M 0 0 1 0 0 0 0 0 0 24 1 0 0
tank 108G 972G 206 59 130M 855K 0 1 0 0 0 0 0 0 24 1 0 0
tank 108G 972G 206 0 140M 0 0 1 0 0 0 0 0 0 24 1 0 0
tank 108G 972G 206 0 140M 0 0 1 0 0 0 0 0 0 24 1 0 0
tank 108G 972G 203 0 137M 0 0 1 0 0 0 0 0 0 24 1 0 0
tank 108G 972G 195 0 123M 0 0 1 0 0 0 0 0 0 24 1 0 0
tank 108G 972G 213 51 133M 855K 0 1 0 0 0 0 0 0 23 1 0 0
tank 108G 972G 217 0 148M 0 0 1 0 0 0 0 0 0 24 1 0 0
scrubq_read always has 1 request active/issued, with no throttling. On rotational media this means the seek rate effectively doubles, halving application performance for random reads.
While I really like the new scrub/resilver performance, I think we need an "escape hatch" to throttle scrubbing when application IO should be affected as little as possible.
An update: I considered restoring some form of delay, taking it from 0.7.x branch. However, dsl_scan.c and the scrub approach as a whole are sufficiently different that I am not sure this would be reasonable, much less accepted.
I found that limiting zfs_scan_vdev_limit (in addition to zfs_vdev_scrub_max_active) can reduce scrub impact on low queue depth random reads. Moreover, and more importantly, multi-threaded random reads (i.e. higher queue depth reads) are much less impacted by scrub overhead (as zfs_vdev_scrub_max_active vs zfs_vdev_sync_read_max_active is, by default, 2 vs 10).
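For anyone wanting to make this kind of tuning persistent across reboots, a sketch using the standard modprobe.d mechanism (the values are examples only, not recommendations):

```
# /etc/modprobe.d/zfs.conf (hypothetical example)
options zfs zfs_vdev_scrub_max_active=1 zfs_scan_vdev_limit=131072
```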
Finally, a scrub can be stopped/paused during work hours.
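For example, a cron sketch that pauses a running scrub during work hours and resumes it in the evening (assuming a pool named tank; note that zpool scrub -p fails if no scrub is in progress, while a plain zpool scrub resumes a paused one):

```
# /etc/cron.d/zfs-scrub-pause (hypothetical)
0 8  * * 1-5  root  /sbin/zpool scrub -p tank
0 20 * * 1-5  root  /sbin/zpool scrub tank
```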
@behlendorf feel free to close the ticket. I am not closing it now only because I don't know if you (or other maintainers) want to track the problem described above. Thanks.
@behlendorf @ahrens (I do not remember who contributed the sequential scrub code, please feel free to add the right person)
I would like to add another datapoint. Short summary: scrub so heavily impacts performance that VMs sometimes see 0 (zero) read IOPs. This is a small pool with 4x 2TB HDD + 2x L2ARC SSD + 1x NVMe SLOG and a running scrub:
[root@localhost parameters]# zpool iostat -q 1 -v
capacity operations bandwidth syncq_read syncq_write asyncq_read asyncq_write scrubq_read trimq_write
pool alloc free read write read write pend activ pend activ pend activ pend activ pend activ pend activ
-------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
tank 1.52T 2.11T 1.33K 16 515M 311K 0 1 0 0 16 3 0 0 242 8 0 0
mirror 776G 1.05T 681 0 253M 0 0 0 0 0 0 0 0 0 66 4 0 0
pci-0000:02:00.1-ata-1.0 - - 381 0 133M 0 0 0 0 0 0 0 0 0 0 2 0 0
pci-0000:02:00.1-ata-2.0 - - 300 0 120M 0 0 0 0 0 0 0 0 0 66 2 0 0
mirror 777G 1.05T 679 0 262M 0 0 1 0 0 16 3 0 0 176 4 0 0
pci-0000:02:00.1-ata-5.0 - - 342 0 131M 0 0 1 0 0 0 0 0 0 69 2 0 0
pci-0000:02:00.1-ata-6.0 - - 337 0 131M 0 0 0 0 0 16 3 0 0 107 2 0 0
logs - - - - - - - - - - - - - - - - - -
nvme0n1 94.5M 26.9G 0 16 0 311K 0 0 0 0 0 0 0 0 0 0 0 0
cache - - - - - - - - - - - - - - - - - -
pci-0000:02:00.1-ata-3.0-part6 11.1G 245G 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
pci-0000:02:00.1-ata-4.0-part6 11.5G 245G 0 51 0 5.16M 0 0 0 0 0 0 0 0 0 0 0 0
-------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
[root@localhost parameters]# iostat -x -k 1
avg-cpu: %user %nice %system %iowait %steal %idle
1.76 0.00 3.39 41.50 0.00 53.35
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1 0.00 0.00 0.00 20.00 0.00 816.00 81.60 0.01 0.30 0.00 0.30 0.10 0.20
sda 0.00 0.00 376.00 0.00 72480.00 0.00 385.53 6.19 6.36 6.36 0.00 2.66 100.00
sdb 0.00 0.00 390.00 0.00 70576.00 0.00 361.93 4.95 16.39 16.39 0.00 2.56 100.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 56.00 0.00 5684.00 203.00 0.02 0.43 0.00 0.43 0.21 1.20
sdf 0.00 0.00 440.00 0.00 123236.00 0.00 560.16 6.76 18.85 18.85 0.00 2.27 100.00
sdg 0.00 0.00 437.00 0.00 125008.00 0.00 572.12 6.72 6.24 6.24 0.00 2.29 100.00
md127 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md126 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Please note how the HDDs are overwhelmed by pending ZFS scrub requests: while the scrub itself is very fast, it completely saturates the HDDs, with very bad resulting performance for the running VMs. Setting zfs_scan_vdev_limit to 128K and lowering zfs_vdev_scrub_max_active only slightly lessens the problem, while fiddling with zfs_no_scrub_prefetch and zfs_scrub_min_time_ms seems to have no effect at all.
Any idea on what can be done to further decrease scrub load?
Well, I made an interesting discovery: setting sd[abfg]/device/queue_depth=1 (effectively disabling NCQ) solved the VM stalling problem. I can confirm with fio --rw=randread that no 0 (or very low) IOPs are recorded anymore.
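For reference, the workaround amounts to something like the following (a sketch; the device names are simply my pool members):

```
# limit the per-device command queue to 1, effectively disabling NCQ
for d in sda sdb sdf sdg; do
    echo 1 > /sys/block/$d/device/queue_depth
done
```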
I got curious and tested a disk (WD Gold 2 TB) in isolation. I can replicate the issue by concurrently running the following two fio commands:
fio --name=test --filename=/dev/sda --direct=1 --rw=read #sequential read
fio --name=test --filename=/dev/sda --direct=1 --rw=randread #random read
While the first fio consumed almost all IOPs, the second one was mostly stalled. In short, it seems that the new scrub code, which is much more sequential than the old behavior, causes some disks (WD Gold in this case) to stall random read requests. I suppose this is due to over-aggressive read-ahead enabled by "seeing" multiple concurrent requests (setting sda/device/queue_depth=2, i.e. using a minimal amount of NCQ, gives the same stalling outcome), but the exact cause is probably not so important. The old scrub code, with its more random IO pattern, did not expose the problem.
As a side note, an older WD Green did not show any issue.
I am leaving this issue open for some days only because I don't know if someone wants to comment and/or share other relevant experiences. Anyway, feel free to close it.
Thanks.
That's really interesting. It definitely sounds like an issue with the WD Gold drives, and it's not something I would have expected from an enterprise branded drive. You might want to check if there's a firmware update available. Thanks for posting a solution for anyone else who may encounter this.
I was able to reproduce this on a different model of Western Digital hard drives: WD Red 10 TB (WD100EFAX). I am using 6 of these drives in a zpool made of 3 mirrors. See #10535.
My experience closely matches @shodanshok's: following the steps to reproduce in his original post, with default settings (scheduler=none, queue_depth=32), I get roughly 150-180 IOPs in fio, falling to a measly trickle of about 10 IOPs when a scrub is ongoing. But if I set queue_depth=1, then I get about 60-100 IOPs - a huge improvement. So thank you, @shodanshok, for getting to the bottom of this issue! Your workaround seems to work quite well. In fact, I get the impression that I'm getting better performance with queue_depth=1 during normal operation even when a scrub is not running (about 200-250 IOPs).
Now, if only Western Digital could fix their firmware… stalling all random reads when sequential reads are inflight sounds pretty bad. One can easily imagine such behaviour causing problems with production services becoming unresponsive just because some random user decided to scan the contents of a file.
Is it time for the ZFS wiki or related documentation to make a "known bad" list of drives/firmwares that have been definitively identified as interacting badly with ZFS?
@gdevenyi Rather than a list (which will become outdated pretty fast), I suggest inserting a note in the hardware/performance page stating that if excessive performance degradation is observed during scrub, disabling NCQ is a possible workaround (maybe even linking to this issue).
I was actually planning to add a "Queue depth" section on the Performance tuning OpenZFS wiki page to describe this problem, but that page doesn't seem to have open edit access.
…and for reference, I used the following udev rule to automatically apply the workaround to all my affected disks:
DRIVER=="sd", ATTR{model}=="WDC WD100EFAX-68", ATTR{queue_depth}="1"
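To apply the rule to a running system without rebooting, something like the following should work (the exact udevadm invocation may vary by distribution):

```
udevadm control --reload
udevadm trigger --subsystem-match=block
```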
This has long been a behaviour seen by HDDs, with some firmware better than others. You might find queue_depth=2 works better, but higher queue depths are worse. For some background, see http://blog.richardelling.com/2012/03/iops-and-latency-are-not-related-hdd.html
-- richard
@richardelling Unfortunately, for the specific case of WD Gold disks (and I suppose @dechamps' WD Red too), using any queue depth over 1 causes the read starvation issue described above.
With WD Gold disks, disabling the disk scheduler (using noop) resolves this issue; I don't need to set queue_depth=1. However, if I use any other disk scheduler than noop, zpool scrub will cause IO starvation.
To clarify, in my case, /sys/class/block/sd*/queue/scheduler was [none] from the very beginning, so clearly that didn't help with my WD Red WD100EFAX. Only setting queue_depth to 1 fixed the issue.
@misterbigstuff In my case, using noop or none made no difference to the IOPS recorded during a scrub. Limiting queue_depth to 1 was the only solution, matching @dechamps' experience.
Notably, I'm using the SATA revision of these devices, which has different firmware from the SAS counterpart.
@misterbigstuff Interesting: I also have multiple SATA WD Gold drives, but they show the described issue unless I set queue_depth=1, irrespective of the IO scheduler (which is consistent with the fio tests which, by using direct=1 and issuing a single IO at a time, should be unaffected by the scheduler). Maybe some newer firmware fixed it? For reference, here are my disk details:
Model Family: Western Digital Gold
Device Model: WDC WD2005FBYZ-01YCBB2
Serial Number: WD-XXX
LU WWN Device Id: 5 0014ee 0af18d58f
Firmware Version: RR07
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Can you share your disk model/firmware version? Did you try the reproducer involving concurrently running these two fio commands? Can you post the results of the tests below?
fio --name=test --filename=/dev/sda --direct=1 --rw=read #sequential read
fio --name=test --filename=/dev/sda --direct=1 --rw=randread #random read
@misterbigstuff
i'm using the SATA revision of these devices
I am also using SATA, so that shouldn't make a difference.
Here are the details of one of my drives:
Model Family: Western Digital Red
Device Model: WDC WD100EFAX-68LHPN0
Serial Number: JEJEGWKM
LU WWN Device Id: 5 000cca 267e24fb9
Firmware Version: 83.H0A83
User Capacity: 10,000,831,348,736 bytes [10.0 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue Jul 14 10:12:02 2020 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
OS details: Debian Unstable/Sid, Linux 5.7.0-1, ZFS/SPL 0.8.4-1.
@misterbigstuff One thing that might be different in your case is that you might be running an older Linux Kernel - you keep mentioning the noop scheduler, but in modern kernels with mq, that scheduler is called none:
$ cat /sys/block/sda/queue/scheduler
[none] mq-deadline
I am also having this issue: zpool scrub seems to run without any throttle at all, thus impacting IO latency and overall system load.
With zfs 0.7.* I throttled scrubs with these parameters:
zfs_scrub_delay=60
zfs_top_maxinflight=6
zfs_scan_idle=150
This slowed down scrubs without impacting the application (much).
With zfs 0.8 these parameters do not exist anymore. I've been reading the zfs module parameters man page and began playing around with these parameters, but I am unable to slow down the scrub at all:
zfs_vdev_scrub_max_active
zfs_scan_strict_mem_lim
zfs_scan_mem_lim_soft_fact
zfs_scan_mem_lim_fact
zfs_scan_vdev_limit
zfs_scrub_min_time_ms
zfs_no_scrub_prefetch
I also made sure that the system parameters for queue depth and IO scheduler are set as seen above:
$ for i in /sys/block/sd* ; do [[ $(cat $i/queue/rotational) == 1 ]] && cat $i/device/queue_depth ; done | sort | uniq
1
$ for i in /sys/block/sd* ; do [[ $(cat $i/queue/rotational) == 1 ]] && cat $i/queue/scheduler ; done | sort | uniq
[none] mq-deadline
System configuration:
dell md3060e enclosure
sas hba
12 raidz1 pools of 5 nl-sas hdds (4TB) (manufacturer toshiba, seagate, hgst)
os: debian buster
zfs version: 0.8.4-1~bpo10+1
kernel version: 4.19.0-9-amd64
Graphs from the Prometheus node exporter (I did stop the scrub after some time):
https://user-images.githubusercontent.com/29410350/89903397-e2f72f00-dbe7-11ea-9e79-312406462f24.png
https://user-images.githubusercontent.com/29410350/89903469-f86c5900-dbe7-11ea-8dd0-db06605b6759.png
I could use some help on how to proceed with this. Which other parameters might help in decreasing the scrub speed? What else can I try?
At Delphix, we have investigated reducing the impact of scrub by having it run at a reduced i/o rate. Several years back, one of our interns prototyped this. It would be wonderful if we took this discussion as motivation to complete that work with a goal of having scrub on by default in more deployments of ZFS! If anyone is interested in working on that, I can dig up the design documents and any code.
@wildente from the graphs you posted, it seems the pools had almost no load excluding the scrub itself. Did you scrub all your pools at the same time? Can you set zfs_vdev_scrub_max_active=1 and run the following fio command on both an idle and a scrubbing pool?
fio --name=test --filename=/yourpool/test.img --rw=randread --size=32G
@ahrens excluding bad interactions with hardware queues, setting zfs_vdev_scrub_max_active=1 should let scrub "only" eat 50% of the available IOPs. Do you think a simple rule such as "if any other queue has one or more active/pending IOs, skip scrubbing for some msec" could be useful (similar to how 0.7.x throttled scrubs)? Thanks.
NB, zpool wait time is the time I/Os are not issued to physical devices. So if you have a scrub ongoing and zfs_vdev_scrub_max_active is small (default=2), then it is expected to see high wait time at the zpool level. To make this info useful, you'll need to look at the wait time per queue. See zpool iostat -l (though I'm not convinced zpool iostat -l is as advertised, but that is another discussion).
-- richard
I'm wondering (I might be totally off), but some pools we've recently created were created with a bad ashift (=9, when the drives in fact had 4k sectors). These were SSDs in both cases, but accessing the drives with 512B sectors absolutely destroyed any hint of performance the devices might have had. Recreating the pool with -o ashift=12 fixed it.
Could you, just to be sure, check the ashift? 1.3k IOPS from a pool with NVMe drives sounds like exactly the situation I'm describing :)
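For the check itself, something like the following should show the ashift actually in use (a sketch; tank is a placeholder pool name):

```
# per-vdev ashift from the cached pool configuration
zdb -C tank | grep ashift
# recent OpenZFS also exposes ashift as a pool property
zpool get ashift tank
```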
@shodanshok
setting zfs_vdev_scrub_max_active=1 should let scrub "only" eat 50% of the available IOPs
Assuming that all i/os are equal, yes. But the per-byte costs can cause scrub i/os to eat more than 50% of the available performance. I think that scrub i/os can aggregate up to 1MB (and are likely to, now that we have "sorted scrub"), whereas typical i/os might be smaller.
Do you think a simple rule such as "if any other queue has one or more active/pending IOs, skip scrubbing for some msec" could be useful (similar to how 0.7.x throttled scrubs)? Thanks.
I think it could be, if we do it right. For example, we might want finer granularity than whole milliseconds. And we'd want to consider both "metadata scanning" and "issuing scrub i/os" phases. Although maybe we could ignore (not limit) the metadata scanning for this purpose? A deliberate "slow scrub" feature might work by automatically adjusting this kind of knob.
@shodanshok
setting zfs_vdev_scrub_max_active=1 should let scrub "only" eat 50% of the available IOPs
Assuming that all i/os are equal, yes. But the per-byte costs can cause scrub i/os to eat more than 50% of the available performance. I think that scrub i/os can aggregate up to 1MB (and are likely to, now that we have "sorted scrub"), whereas typical i/os might be smaller.
True.
I think it could be, if we do it right. For example, we might want finer granularity than whole milliseconds. And we'd want to consider both "metadata scanning" and "issuing scrub i/os" phases. Although maybe we could ignore (not limit) the metadata scanning for this purpose? A deliberate "slow scrub" feature might work by automatically adjusting this kind of knob.
I suppose the metadata scan does not need special treatment. On the other hand, the data scrub phase, being sequential in nature, can really consume vast amounts of bandwidth (and IOPs).
Thank you for the overwhelming number of messages. I'll try to answer them all.
@ahrens: yes, some more information would be useful. I was using the parameters from my post to slow down the scrub so that it would finish within one week (zfs 0.7) instead of ~15 hours (zfs 0.8). I also agree that the weight of each IO request is relevant for this case, since we are now scrubbing sequentially in large blocks.
@shodanshok: maybe I should have posted graphs of the read/write ops instead of the read data rate. These kinds of servers mainly handle small random reads and write-appends (similar to the mbox format). I will set zfs_vdev_scrub_max_active=1 and run your fio test command tomorrow morning. And yes, all pools did scrub at the same time.
@richardelling: thanks. I always thought that this was the actual wait IO from the underlying disks.
@snajpa: in my case those pools were created about a year ago, on HDDs, and they were created with ashift=12.
@shodanshok I've set zfs_vdev_scrub_max_active=1 and ran the fio command on one of the 12 zpools:
# zpool iostat zpool1-store1 -l 1
[snip]
zpool1-store1 11,5T 6,60T 313 0 6,94M 0 14ms - 14ms - 2us - - - 2us -
zpool1-store1 11,5T 6,60T 312 0 7,13M 0 14ms - 14ms - 2us - - - 2us -
zpool1-store1 11,5T 6,60T 301 0 6,94M 0 15ms - 15ms - 2us - - - 2us -
zpool1-store1 11,5T 6,60T 335 0 7,87M 0 14ms - 14ms - 2us - - - 2us -
zpool1-store1 11,5T 6,60T 324 0 7,24M 0 14ms - 14ms - 2us - - - 2us -
zpool1-store1 11,5T 6,60T 300 0 6,62M 0 16ms - 16ms - 2us - - - 2us -
zpool1-store1 11,5T 6,60T 330 0 7,44M 0 15ms - 15ms - 2us - - - 2us -
zpool1-store1 11,5T 6,60T 307 0 6,88M 0 16ms - 16ms - 2us - - - 2us -
zpool1-store1 11,5T 6,60T 314 0 6,92M 0 15ms - 15ms - 2us - - - 2us -
zpool1-store1 11,5T 6,60T 325 0 7,50M 0 14ms - 14ms - 2us - - - 2us -
zpool1-store1 11,5T 6,60T 315 0 6,85M 0 15ms - 15ms - 2us - - - 2us -
zpool1-store1 11,5T 6,60T 310 0 6,85M 0 14ms - 14ms - 2us - - - 2us -
zpool1-store1 11,5T 6,60T 142 275 4,29M 2,13M 20ms 143ms 20ms 26ms 2us - - 117ms 3us -
zpool1-store1 11,5T 6,60T 1,29K 21 91,1M 87,7K 5ms 68ms 3ms 38ms 2us 2us - 402ms 1ms -
zpool1-store1 11,5T 6,60T 1,65K 0 115M 0 4ms - 3ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,46K 0 102M 0 5ms - 3ms - 2us - - - 1ms -
capacity operations bandwidth total_wait disk_wait syncq_wait asyncq_wait scrub trim
pool alloc free read write read write read write read write read write read write wait wait
------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
zpool1-store1 11,5T 6,60T 1,35K 0 97,5M 0 5ms - 4ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,47K 0 110M 0 5ms - 3ms - 3us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,59K 0 113M 0 4ms - 3ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,50K 0 108M 0 4ms - 3ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,43K 0 104M 0 5ms - 3ms - 3us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,51K 0 105M 0 4ms - 3ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,20K 0 80,7M 0 5ms - 4ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,25K 0 85,1M 0 5ms - 3ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,13K 0 77,7M 0 6ms - 4ms - 3us - - - 2ms -
zpool1-store1 11,5T 6,60T 1,10K 0 77,5M 0 6ms - 4ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,30K 0 90,1M 0 5ms - 3ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,08K 0 75,0M 0 6ms - 4ms - 2us - - - 2ms -
[snip]
zpool1-store1 11,5T 6,60T 1,14K 0 79,4M 0 6ms - 4ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,13K 0 79,9M 0 5ms - 3ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 957 0 66,6M 0 6ms - 4ms - 2us - - - 2ms -
zpool1-store1 11,5T 6,60T 1,20K 0 89,7M 0 5ms - 4ms - 3us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,03K 0 70,3M 0 6ms - 4ms - 2us - - - 2ms -
zpool1-store1 11,5T 6,60T 1,19K 0 80,0M 0 6ms - 4ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,25K 0 87,8M 0 5ms - 4ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,71K 0 121M 0 4ms - 3ms - 3us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,29K 13 91,5M 55,8K 5ms 66ms 3ms 57ms 2us - - 11ms 1ms -
zpool1-store1 11,5T 6,60T 155 327 4,64M 2,09M 19ms 167ms 19ms 25ms 2us 2us - 150ms 3us -
zpool1-store1 11,5T 6,60T 306 0 7,10M 0 15ms - 15ms - 2us - - - 2us -
zpool1-store1 11,5T 6,60T 317 0 7,31M 0 14ms - 14ms - 2us - - - 2us -
zpool1-store1 11,5T 6,60T 304 31 6,89M 136K 15ms 45ms 15ms 28ms 2us - - 15ms 2us -
zpool1-store1 11,5T 6,60T 479 252 30,4M 2,01M 9ms 159ms 7ms 27ms 3us 2us - 141ms 2ms -
zpool1-store1 11,5T 6,60T 1,50K 0 107M 0 4ms - 3ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,28K 0 90,7M 0 5ms - 4ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,39K 0 99,0M 0 5ms - 3ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,20K 0 79,9M 0 5ms - 4ms - 2us - - - 1ms -
zpool1-store1 11,5T 6,60T 1,11K 0 79,7M 0 6ms - 4ms - 2us - - - 2ms -
Before the start of the scrub, we have ~300-330 read ops; after the start of the scrub, it jumps to 1k-1.7k read ops. I am guessing the write operations in between are checkpoints.
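If it helps to confirm that guess, the per-pool txg kstat exposed by ZFS on Linux should show when transaction groups sync and how much they write (a sketch; the path assumes the /proc/spl kstat interface and that zfs_txg_history is non-zero):

```
# most recent transaction groups for the pool, including dirty/written bytes per txg
tail -n 20 /proc/spl/kstat/zfs/zpool1-store1/txgs
```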
# zpool status zpool1-store1 | grep pool: -A4
pool: zpool1-store1
state: ONLINE
scan: scrub in progress since Thu Aug 13 09:09:05 2020
580G scanned at 642M/s, 59,5G issued at 65,9M/s, 11,5T total
0B repaired, 0,50% done, 2 days 02:41:31 to go
# fio --name=test --filename=/zpool1-store1/test.img --rw=randread --size=32G
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.12
Starting 1 process
^Cbs: 1 (f=1): [r(1)][1.2%][r=256KiB/s][r=64 IOPS][eta 01d:06h:10m:44s]
fio: terminating on signal 2
test: (groupid=0, jobs=1): err= 0: pid=5417: Thu Aug 13 09:28:25 2020
read: IOPS=76, BW=305KiB/s (312kB/s)(401MiB/1345644msec)
clat (usec): min=2, max=514086, avg=13104.19, stdev=13722.49
lat (usec): min=2, max=514086, avg=13104.61, stdev=13722.55
clat percentiles (usec):
| 1.00th=[ 11], 5.00th=[ 26], 10.00th=[ 38], 20.00th=[ 41],
| 30.00th=[ 52], 40.00th=[ 8586], 50.00th=[ 13698], 60.00th=[ 17171],
| 70.00th=[ 20055], 80.00th=[ 24249], 90.00th=[ 28967], 95.00th=[ 32637],
| 99.00th=[ 39584], 99.50th=[ 44827], 99.90th=[ 68682], 99.95th=[ 99091],
| 99.99th=[421528]
bw ( KiB/s): min= 8, max=38312, per=100.00%, avg=305.07, stdev=930.11, samples=2691
iops : min= 2, max= 9578, avg=76.21, stdev=232.53, samples=2691
lat (usec) : 4=0.07%, 10=0.88%, 20=1.72%, 50=22.95%, 100=11.64%
lat (usec) : 250=0.06%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.03%, 10=5.91%, 20=26.39%, 50=30.03%
lat (msec) : 100=0.25%, 250=0.02%, 500=0.03%, 750=0.01%
cpu : usr=0.05%, sys=0.62%, ctx=64785, majf=0, minf=10
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=102656,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=305KiB/s (312kB/s), 305KiB/s-305KiB/s (312kB/s-312kB/s), io=401MiB (420MB), run=1345644-1345644msec
Can I provide anything else to help with this issue?
@wildente so during the scrub, fio shows 76 IOPs. What about re-running fio without a background scrub? How many IOPs do you have?
Your latency numbers seem ok. Can you show, both with and without a scrub running, the output of "zpool iostat -q" (to get queue stats)?
@shodanshok yes, I will do that on Monday morning
@shodanshok sorry for the long delay. I can reproduce the IOPS from above, but I think that is expected of a raidz with 5 drives.
# zpool iostat zpool1-store1 -q -l 1
capacity operations bandwidth total_wait disk_wait syncq_wait asyncq_wait scrub trim syncq_read syncq_write asyncq_read asyncq_write scrubq_read trimq_write
pool alloc free read write read write read write read write read write read write wait wait pend activ pend activ pend activ pend activ pend activ pend activ
zpool1-store1 11,4T 6,69T 61 5 2,96M 50,9K 1s 110ms 7ms 21ms 11ms 2us 7ms 97ms 1s - 0 0 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 0 0
[snip]
zpool1-store1 11,4T 6,69T 308 0 9,63M 0 10ms - 10ms - 2us - - - - - 0 3 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 274 0 8,57M 0 11ms - 11ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 287 0 8,97M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 318 0 9,94M 0 10ms - 10ms - 2us - - - - - 0 1 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 288 0 9,00M 0 11ms - 11ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 301 0 9,41M 0 10ms - 10ms - 2us - - - - - 0 2 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 294 0 9,19M 0 10ms - 10ms - 2us - - - - - 0 3 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 322 0 10,1M 0 9ms - 9ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 303 0 9,47M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 295 0 9,22M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 315 0 9,85M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 303 0 9,47M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 307 0 9,60M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 303 0 9,47M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 333 0 10,4M 0 9ms - 9ms - 2us - - - - - 0 1 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 304 0 9,50M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 307 0 9,60M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 273 0 8,54M 0 11ms - 11ms - 2us - - - - - 0 2 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 304 0 9,52M 0 10ms - 10ms - 2us - - - - - 0 1 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 321 0 10,0M 0 9ms - 9ms - 2us - - - - - 0 3 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 282 0 8,82M 0 11ms - 11ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 308 0 9,63M 0 9ms - 9ms - 2us - - - - - 0 3 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 300 0 9,38M 0 10ms - 10ms - 2us - - - - - 0 2 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 277 0 8,66M 0 11ms - 11ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 306 0 9,56M 0 10ms - 10ms - 2us - - - - - 0 1 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 322 0 10,1M 0 9ms - 9ms - 2us - - - - - 0 2 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 294 0 9,19M 0 10ms - 10ms - 2us - - - - - 0 3 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 279 0 8,72M 0 11ms - 11ms - 2us - - - - - 0 3 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 316 0 9,88M 0 9ms - 9ms - 2us - - - - - 0 2 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 302 0 9,44M 0 10ms - 10ms - 2us - - - - - 0 3 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 302 0 9,44M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 305 0 9,53M 0 9ms - 9ms - 2us - - - - - 0 2 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 289 0 9,04M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 312 0 9,75M 0 10ms - 10ms - 2us - - - - - 0 3 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 311 0 9,72M 0 10ms - 10ms - 2us - - - - - 0 3 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 302 0 9,44M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 285 0 8,91M 0 10ms - 10ms - 2us - - - - - 0 2 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 286 0 8,94M 0 10ms - 10ms - 2us - - - - - 0 3 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 286 0 8,94M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 303 0 9,47M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 303 0 9,47M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 289 0 9,04M 0 11ms - 11ms - 2us - - - - - 0 2 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 281 0 8,79M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 303 0 9,47M 0 10ms - 10ms - 2us - - - - - 0 4 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 305 0 9,53M 0 10ms - 10ms - 2us - - - - - 0 2 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 93 0 2,93M 0 10ms - 10ms - 2us - - - - - 0 0 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 0 0
zpool1-store1 11,4T 6,69T 0 0 0 0 - - - - - - - - - - 0 0 0 0 0 0 0 0 0 0 0 0
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.12
Starting 1 process
^Cbs: 1 (f=1): [r(1)][0.1%][r=300KiB/s][r=75 IOPS][eta 01d:07h:21m:08s]
fio: terminating on signal 2
test: (groupid=0, jobs=1): err= 0: pid=2313: Mon Aug 24 13:31:35 2020
read: IOPS=74, BW=297KiB/s (305kB/s)(44.8MiB/154330msec)
clat (usec): min=8, max=68477, avg=13441.23, stdev=5411.49
lat (usec): min=8, max=68478, avg=13441.77, stdev=5411.49
clat percentiles (usec):
| 1.00th=[ 19], 5.00th=[ 6783], 10.00th=[ 8029], 20.00th=[ 9241],
| 30.00th=[10159], 40.00th=[11600], 50.00th=[13042], 60.00th=[15139],
| 70.00th=[16319], 80.00th=[17171], 90.00th=[17957], 95.00th=[20317],
| 99.00th=[35390], 99.50th=[36963], 99.90th=[39584], 99.95th=[43779],
| 99.99th=[47449]
bw ( KiB/s): min= 112, max= 368, per=100.00%, avg=297.31, stdev=42.17, samples=308
iops : min= 28, max= 92, avg=74.26, stdev=10.54, samples=308
lat (usec) : 10=0.03%, 20=1.94%, 50=0.01%, 100=0.02%
lat (msec) : 2=0.01%, 4=0.14%, 10=26.45%, 20=66.32%, 50=5.08%
lat (msec) : 100=0.01%
cpu : usr=0.08%, sys=0.85%, ctx=11528, majf=0, minf=9
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=11477,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=297KiB/s (305kB/s), 297KiB/s-297KiB/s (305kB/s-305kB/s), io=44.8MiB (47.0MB), run=154330-154330msec
the layout consists of 12 zpools, each configured as raidz1 with 5 drives:
# zpool status zpool1-store1 -P | grep config -A10
config:
NAME STATE READ WRITE CKSUM
zpool1-store1 ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
/dev/MD3060e/D01-S01-E01p1 ONLINE 0 0 0
/dev/MD3060e/D02-S01-E13p1 ONLINE 0 0 0
/dev/MD3060e/D03-S01-E25p1 ONLINE 0 0 0
/dev/MD3060e/D04-S01-E37p1 ONLINE 0 0 0
/dev/MD3060e/D05-S01-E49p1 ONLINE 0 0 0