Type | Version/Name
--- | ---
Distribution Name | Gentoo Linux
Distribution Version | amd64 (stable) 17.1/no-multilib
Linux Kernel | 4.19.72-gentoo
Architecture | x86_64
ZFS Version | 0.8.2 (same behavior observed on 0.8.1)
SPL Version | N/A
Performance is not satisfactory: I cannot saturate the HDDs' full bandwidth, and reads reach less than 50% of write performance.
I have 8 ST4000VN008 drives. 6 of them perform according to specification when reading from the beginning of the disk (~180MiB/s); 2 of them under-perform at ~160MiB/s (should I RMA them?). Write performance is lower, at ~160MiB/s (the slow disks are slower in all tests).
The benchmark gives the same results when run on all disks in parallel at the same time.
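The exact per-disk benchmark command isn't shown above; a hypothetical sketch of the kind of raw sequential read test these numbers suggest (device path taken from the pool layout below; block size and count are placeholders):
```
# Hypothetical baseline: raw sequential read from the start of one disk,
# bypassing the page cache. Adjust the by-id path per disk under test.
dd if=/dev/disk/by-id/ata-ST4000VN008-ZDR166_ZGY5C3W7 of=/dev/null bs=1M count=8192 iflag=direct
```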
These disks are configured as RAIDZ-2 with the following command:
zpool create -n -m /mnt/storage -o ashift=12 -o autoexpand=on -o autotrim=on \
-O acltype=posixacl -O atime=off -O compression=lz4 -O dedup=off -O dnodesize=auto \
-O encryption=aes-256-gcm -O keyformat=raw -O keylocation=file:///root/storage.key \
-O logbias=latency -O xattr=sa -O casesensitivity=sensitive storage raidz2 \
ata-ST4000VN008-ZDR166_ZGY5C3W7 ata-ST4000VN008-ZDR166_ZGY5E06J \
ata-ST4000VN008-ZDR166_ZDH7EMPY ata-ST4000VN008-ZDR166_ZDH7F08Z \
ata-ST4000VN008-ZDR166_ZDH7ESM1 ata-ST4000VN008-ZDR166_ZDH7FA7S \
ata-ST4000VN008-ZDR166_ZDH7F9P5 ata-ST4000VN008-ZDR166_ZDH7F9BN \
log nvme-INTEL_SSDPED1D280GA_PHMB7443018J280CGN
(but with compression disabled for benchmarks).
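A minimal sketch of how compression might be disabled on the dataset for the benchmark runs (assuming the pool/dataset name storage from the command above):
```
# Turn off compression on the benchmark dataset, then verify the setting.
zfs set compression=off storage
zfs get compression storage
```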
When the filesystem created above is benchmarked using
dd if=/dev/zero of=zero bs=10M
then
zpool iostat -vl 10
displays the following data:
capacity operations bandwidth total_wait disk_wait syncq_wait asyncq_wait scrub trim
pool alloc free read write read write read write read write read write read write wait wait
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
storage 24.0G 29.1T 1 10.3K 4.80K 951M 15ms 5ms 15ms 4ms 3us 2us - 614us - -
raidz2 24.0G 29.1T 1 10.3K 4.80K 951M 15ms 5ms 15ms 4ms 3us 2us - 614us - -
ata-ST4000VN008-2DR166_ZGY5C3W7 - - 0 1.32K 818 119M 37ms 4ms 37ms 4ms 3us 4us - 507us - -
ata-ST4000VN008-2DR166_ZGY5E06J - - 0 1.31K 409 119M 50ms 4ms 50ms 4ms 3us 1us - 548us - -
ata-ST4000VN008-2DR166_ZDH7EMPY - - 0 1.27K 409 119M 196us 5ms 196us 4ms 3us 2us - 665us - -
ata-ST4000VN008-2DR166_ZDH7F08Z - - 0 1.34K 1.20K 119M 4ms 4ms 4ms 4ms 3us 1us - 487us - -
ata-ST4000VN008-2DR166_ZDH7ESM1 - - 0 1.31K 409 119M 50ms 4ms 50ms 4ms 3us 1us - 545us - -
ata-ST4000VN008-2DR166_ZDH7FA7S - - 0 1.27K 818 119M 196us 5ms 196us 4ms 3us 1us - 635us - -
ata-ST4000VN008-2DR166_ZDH7F9P5 - - 0 1.24K 818 119M 393us 5ms 393us 5ms 3us 1us - 730us - -
ata-ST4000VN008-2DR166_ZDH7F9BN - - 0 1.22K 0 119M - 6ms - 5ms - 1us - 818us - -
logs - - - - - - - - - - - - - - - -
nvme-INTEL_SSDPED1D280GA_PHMB7443018J280CGN 0 260G 0 0 0 0 - - - - - - - - - -
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
Note: Sequential write peaks at 120MiB/s per HDD.
When reading
dd if=zero of=/dev/null bs=10M
then
zpool iostat -vl 10
displays the following data:
capacity operations bandwidth total_wait disk_wait syncq_wait asyncq_wait scrub trim
pool alloc free read write read write read write read write read write read write wait wait
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
storage 33.9G 29.1T 7.28K 0 362M 0 1ms - 1ms - 9ms - 368us - - -
raidz2 33.9G 29.1T 7.28K 0 362M 0 1ms - 1ms - 9ms - 368us - - -
ata-ST4000VN008-2DR166_ZGY5C3W7 - - 943 0 44.0M 0 1ms - 1ms - 6ms - 310us - - -
ata-ST4000VN008-2DR166_ZGY5E06J - - 964 0 44.8M 0 1ms - 1ms - 4ms - 316us - - -
ata-ST4000VN008-2DR166_ZDH7EMPY - - 946 0 45.0M 0 1ms - 1ms - 6ms - 332us - - -
ata-ST4000VN008-2DR166_ZDH7F08Z - - 911 0 46.2M 0 2ms - 1ms - 17ms - 423us - - -
ata-ST4000VN008-2DR166_ZDH7ESM1 - - 928 0 46.6M 0 2ms - 1ms - 9ms - 453us - - -
ata-ST4000VN008-2DR166_ZDH7FA7S - - 887 0 44.3M 0 2ms - 1ms - 8ms - 433us - - -
ata-ST4000VN008-2DR166_ZDH7F9P5 - - 893 0 45.3M 0 1ms - 1ms - 9ms - 346us - - -
ata-ST4000VN008-2DR166_ZDH7F9BN - - 984 0 45.8M 0 1ms - 1ms - 13ms - 339us - - -
logs - - - - - - - - - - - - - - - -
nvme-INTEL_SSDPED1D280GA_PHMB7443018J280CGN 0 260G 0 0 0 0 - - - - - - - - - -
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
Note: Sequential read peaks at ~45MiB/s per HDD.
Further experimentation revealed the following information:
- zfs_vdev_raidz_impl is avx512bw (determined to be the fastest). Changing it to scalar doesn't impact performance.
- Setting dnodesize to legacy doesn't impact performance.
- Setting recordsize to 1M slightly increases performance (~130MiB/s write, 50-55MiB/s read).

(A sketch of how these settings can be applied follows the reproduction notes below.)

My build is roughly:
Copy-paste the above commands. Note that they depend on the availability of HDDs with specific serial numbers, so you will need to adjust them.
None/Not aware of any.
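For reference, a minimal sketch of how the adjustments listed under "further experimentation" above can be applied, assuming the pool/dataset is named storage as in the create command (dnodesize and recordsize only affect data written afterwards):
```
# Switch the RAIDZ parity implementation at runtime (e.g. back to scalar).
echo scalar > /sys/module/zfs/parameters/zfs_vdev_raidz_impl

# Dataset property changes; these only apply to data written afterwards.
zfs set dnodesize=legacy storage
zfs set recordsize=1M storage
```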
You should look at IOPS too; please show iostat -x 1 on your disks during the tests, for example. If %util is nearly 100%, you've got all the IOPS your disks can give. ZFS is a CoW filesystem, so on (nearly) every uncached read it also has to read the block's metadata, and those reads will usually be random even if you are reading logically sequential data. So the larger the recordsize, the better your sequential read/write is.
And one more thing: sometimes one thread can't give you full pool performance. You may want to tune parameters for it, for example the prefetch read distance: https://github.com/zfsonlinux/zfs/wiki/ZFS-on-Linux-Module-Parameters#zfetch_max_distance . It depends on your load.
So this looks like it is not a bug.
The numbers vary greatly with 1-second intervals (the workload seems to be bursty). With the interval set to 10 seconds they are:
For write:
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 59.59 2.08 0.00 38.33
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme1n1 1.20 0.00 92.40 0.00 21.90 0.00 94.81 0.00 0.83 0.00 0.00 77.00 0.00 0.75 0.09
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.80 1326.40 0.80 120472.80 0.00 3.30 0.00 0.25 56.75 4.15 5.51 1.00 90.83 0.47 62.77
sde 0.80 1398.30 0.80 120432.00 0.00 2.10 0.00 0.15 67.25 3.55 4.96 1.00 86.13 0.42 58.29
sdf 0.70 1312.40 0.40 120686.40 0.00 2.20 0.00 0.17 76.71 4.32 5.64 0.57 91.96 0.49 63.87
sdg 0.90 1353.80 1.20 120470.80 0.00 2.70 0.00 0.20 68.11 4.11 5.57 1.33 88.99 0.47 63.77
sdh 0.60 1268.90 0.00 120447.20 0.00 2.60 0.00 0.20 104.33 5.01 6.36 0.00 94.92 0.55 70.17
sdi 0.90 1307.10 1.20 120485.20 0.00 2.60 0.00 0.20 53.78 4.31 5.64 1.33 92.18 0.49 64.03
sdb 0.90 1393.80 1.20 120481.20 0.00 2.90 0.00 0.21 58.33 3.64 5.08 1.33 86.44 0.42 59.01
sdc 0.70 1360.80 0.40 120433.20 0.00 2.10 0.00 0.15 55.71 3.89 5.31 0.57 88.50 0.45 61.39
dm-0 23.10 0.00 92.40 0.00 0.00 0.00 0.00 0.00 0.93 0.00 0.02 4.00 0.00 0.04 0.09
dm-1 23.10 0.00 92.40 0.00 0.00 0.00 0.00 0.00 0.93 0.00 0.02 4.00 0.00 0.04 0.09
For read:
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 27.94 3.30 0.00 68.76
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
nvme1n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 835.40 0.00 42555.60 0.00 0.00 0.00 0.00 0.00 1.33 0.00 1.11 50.94 0.00 0.58 48.41
sde 896.60 0.00 42510.80 0.00 0.00 0.00 0.00 0.00 1.22 0.00 1.10 47.41 0.00 0.52 46.53
sdf 816.60 0.00 43789.60 0.00 0.00 0.00 0.00 0.00 1.62 0.00 1.31 53.62 0.00 0.69 55.98
sdg 877.50 0.00 41629.60 0.00 0.00 0.00 0.00 0.00 1.28 0.00 1.12 47.44 0.00 0.54 47.49
sdh 873.50 0.00 41980.80 0.00 0.10 0.00 0.01 0.00 1.32 0.00 1.15 48.06 0.00 0.56 48.86
sdi 850.10 0.00 43307.60 0.00 0.10 0.00 0.01 0.00 1.42 0.00 1.18 50.94 0.00 0.59 49.90
sdb 896.30 0.00 41473.20 0.00 0.10 0.00 0.01 0.00 1.09 0.00 0.98 46.27 0.00 0.48 43.36
sdc 898.10 0.00 41845.60 0.00 0.20 0.00 0.02 0.00 1.25 0.00 1.12 46.59 0.00 0.52 46.63
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Tuning zfetch_max_distance (default value is 8MiB) brings %util up to 90-95%.
Hi,
I'm experiencing the same behavior in my raidz1 pool. Large-file read speed is only about 140MB/s, which is equal to the performance of one disk. The system is under no load, with more than 10GB of RAM available.
System: Proxmox 6.0
Kernel: 5.0
CPU: Xeon E3-1285
RAM: 32GB 1600MHZ ECC
Disk: 3* WD RED 3TB 5400RPM
HBA: SAS 9305-24i
ZFS: 0.8.1
pool: Workspace
state: ONLINE
scan: scrub repaired 0B in 0 days 09:59:32 with 0 errors on Sun Sep 8 04:14:16 2019
config:
NAME STATE READ WRITE CKSUM
Workspace ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
wwn-0x50014ee65a70ec05 ONLINE 0 0 0
wwn-0x50014ee2b6deadfc ONLINE 0 0 0
wwn-0x50014ee264864e38 ONLINE 0 0 0
errors: No known data errors
During a large file read, as you can see, the disks are not fully loaded:
```
avg-cpu: %user %nice %system %iowait %steal %idle
1.13 0.00 3.65 43.32 0.00 51.89
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sde 89.00 0.00 45568.00 0.00 0.00 0.00 0.00 0.00 34.58 0.00 2.89 512.00 0.00 5.44 48.40
sdf 91.00 0.00 46592.00 0.00 0.00 0.00 0.00 0.00 29.37 0.00 2.48 512.00 0.00 5.27 48.00
sdg 96.00 0.00 49152.00 0.00 0.00 0.00 0.00 0.00 40.56 0.00 3.70 512.00 0.00 5.33 51.20
```
And CPU usage is very low:
```
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2493 mengsk 20 0 4259668 28548 9996 S 1.1 0.1 0:19.07 /usr/sbin/smbd --foreground --no-process-group
6352 root 20 0 4903636 47884 4968 S 1.0 0.1 708:21.85 /usr/bin/kvm -id 101 -name vsrvl -chardev socket,id=qmp,path=/var/run/qemu-server/101.qmp,+
1936 root 39 19 0 0 0 S 0.1 0.0 25:09.83 [kipmi0]
3072 root 0 -20 0 0 0 S 0.1 0.0 0:32.87 [z_rd_int]
3074 root 0 -20 0 0 0 S 0.1 0.0 0:32.83 [z_rd_int]
3075 root 0 -20 0 0 0 S 0.1 0.0 0:33.00 [z_rd_int]
3076 root 0 -20 0 0 0 S 0.1 0.0 0:32.86 [z_rd_int]
5825 www-data 20 0 357536 112404 9472 S 0.1 0.3 0:03.35 pveproxy worker
6467 root 20 0 0 0 0 S 0.1 0.0 23:59.06 [vhost-6352]
10 root 20 0 0 0 0 I 0.0 0.0 1:14.74 [rcu_sched]
557 root 1 -19 0 0 0 S 0.0 0.0 0:54.80 [z_wr_iss]
558 root 1 -19 0 0 0 S 0.0 0.0 0:54.79 [z_wr_iss]
563 root 0 -20 0 0 0 S 0.0 0.0 0:28.06 [z_wr_int]
3071 root 0 -20 0 0 0 S 0.0 0.0 0:32.80 [z_rd_int]
3073 root 0 -20 0 0 0 S 0.0 0.0 0:32.89 [z_rd_int]
3077 root 0 -20 0 0 0 S 0.0 0.0 0:32.76 [z_rd_int]
3078 root 0 -20 0 0 0 S 0.0 0.0 0:32.83 [z_rd_int]
6325 root 20 0 325804 68868 6456 S 0.0 0.2 0:14.49 pve-ha-crm
6464 root 20 0 0 0 0 S 0.0 0.0 35:39.83 [vhost-6352]
6466 root 20 0 0 0 0 S 0.0 0.0 27:37.46 [vhost-6352]
13163 root 20 0 11720 3492 2500 R 0.0 0.0 0:00.13 top
1 root 20 0 170592 8028 4724 S 0.0 0.0 0:15.98 /sbin/init
2 root 20 0 0 0 0 S 0.0 0.0 1:36.34 [kthreadd]
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 [rcu_gp]
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 [rcu_par_gp]
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 [kworker/0:0H-kblockd]
8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 [mm_percpu_wq]
9 root 20 0 0 0 0 S 0.0 0.0 0:04.41 [ksoftirqd/0]
```
I'll tune zfetch_max_distance to see if it helps.
Returning after some more tuning and benchmarking.
In general I found that increasing:
- zfetch_array_rd_sz
- zfetch_max_distance
- zfs_pd_bytes_max

to values of ~1G increases single-HDD throughput to 120-140MiB/s while decreasing IOPS at the same time (applied roughly as in the sketch below). Going beyond that range seems to be difficult.
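A minimal sketch of how those three module parameters can be raised at runtime (the 1 GiB value here is just the ballpark described above; sysfs changes are not persistent across module reloads):
```
# Runtime tuning via sysfs; values revert when the zfs module is reloaded.
echo 1073741824 > /sys/module/zfs/parameters/zfetch_array_rd_sz
echo 1073741824 > /sys/module/zfs/parameters/zfetch_max_distance
echo 1073741824 > /sys/module/zfs/parameters/zfs_pd_bytes_max
```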
Before taking a look at zio_taskq_batch_pct (which is less convenient to adjust) I ran some tests of a sync workload:
dd if=/dev/zero of=zero bs=10M count=5000 oflag=sync
The result is roughly 90MiB/s, which is also the throughput of the Optane 900P SSD (the SLOG device). iostat -mx 5 reveals:
avg-cpu: %user %nice %system %iowait %steal %idle
0.01 0.00 6.65 0.01 0.00 93.33
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 666.80 0.00 83.35 0.00 0.00 0.00 0.00 0.00 0.07 1.00 0.00 128.00 1.50 99.98
nvme1n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdi 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Note the low IOPS combined with 100% utilization. zpool iostat -vyr 5 shows:
storage sync_read sync_write async_read async_write scrub trim
req_size ind agg ind agg ind agg ind agg ind agg ind agg
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
512 0 0 0 0 0 0 0 0 0 0 0 0
1K 0 0 0 0 0 0 0 0 0 0 0 0
2K 0 0 0 0 0 0 0 0 0 0 0 0
4K 0 0 6 0 0 0 46 0 0 0 0 0
8K 0 0 0 0 0 0 3 20 0 0 0 0
16K 0 0 0 0 0 0 497 23 0 0 0 0
32K 0 0 0 0 0 0 0 116 0 0 0 0
64K 0 0 0 0 0 0 0 218 0 0 0 0
128K 0 0 695 0 0 0 0 149 0 0 0 0
256K 0 0 0 0 0 0 0 66 0 0 0 0
512K 0 0 0 0 0 0 0 113 0 0 0 0
1M 0 0 0 0 0 0 0 0 0 0 0 0
2M 0 0 0 0 0 0 0 0 0 0 0 0
4M 0 0 0 0 0 0 0 0 0 0 0 0
8M 0 0 0 0 0 0 0 0 0 0 0 0
16M 0 0 0 0 0 0 0 0 0 0 0 0
---------------------------------------------------------------------------------------------------------------------------------
raidz2 sync_read sync_write async_read async_write scrub trim
req_size ind agg ind agg ind agg ind agg ind agg ind agg
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
512 0 0 0 0 0 0 0 0 0 0 0 0
1K 0 0 0 0 0 0 0 0 0 0 0 0
2K 0 0 0 0 0 0 0 0 0 0 0 0
4K 0 0 6 0 0 0 46 0 0 0 0 0
8K 0 0 0 0 0 0 3 20 0 0 0 0
16K 0 0 0 0 0 0 497 23 0 0 0 0
32K 0 0 0 0 0 0 0 116 0 0 0 0
64K 0 0 0 0 0 0 0 218 0 0 0 0
128K 0 0 0 0 0 0 0 149 0 0 0 0
256K 0 0 0 0 0 0 0 66 0 0 0 0
512K 0 0 0 0 0 0 0 113 0 0 0 0
1M 0 0 0 0 0 0 0 0 0 0 0 0
2M 0 0 0 0 0 0 0 0 0 0 0 0
4M 0 0 0 0 0 0 0 0 0 0 0 0
8M 0 0 0 0 0 0 0 0 0 0 0 0
16M 0 0 0 0 0 0 0 0 0 0 0 0
---------------------------------------------------------------------------------------------------------------------------------
ata-ST4000VN008-2DR166_ZGY5C3W7 sync_read sync_write async_read async_write scrub trim
req_size ind agg ind agg ind agg ind agg ind agg ind agg
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
512 0 0 0 0 0 0 0 0 0 0 0 0
1K 0 0 0 0 0 0 0 0 0 0 0 0
2K 0 0 0 0 0 0 0 0 0 0 0 0
4K 0 0 0 0 0 0 6 0 0 0 0 0
8K 0 0 0 0 0 0 0 2 0 0 0 0
16K 0 0 0 0 0 0 64 2 0 0 0 0
32K 0 0 0 0 0 0 0 16 0 0 0 0
64K 0 0 0 0 0 0 0 24 0 0 0 0
128K 0 0 0 0 0 0 0 20 0 0 0 0
256K 0 0 0 0 0 0 0 9 0 0 0 0
512K 0 0 0 0 0 0 0 12 0 0 0 0
1M 0 0 0 0 0 0 0 0 0 0 0 0
2M 0 0 0 0 0 0 0 0 0 0 0 0
4M 0 0 0 0 0 0 0 0 0 0 0 0
8M 0 0 0 0 0 0 0 0 0 0 0 0
16M 0 0 0 0 0 0 0 0 0 0 0 0
---------------------------------------------------------------------------------------------------------------------------------
ata-ST4000VN008-2DR166_ZGY5E06J sync_read sync_write async_read async_write scrub trim
req_size ind agg ind agg ind agg ind agg ind agg ind agg
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
512 0 0 0 0 0 0 0 0 0 0 0 0
1K 0 0 0 0 0 0 0 0 0 0 0 0
2K 0 0 0 0 0 0 0 0 0 0 0 0
4K 0 0 0 0 0 0 5 0 0 0 0 0
8K 0 0 0 0 0 0 0 2 0 0 0 0
16K 0 0 0 0 0 0 60 4 0 0 0 0
32K 0 0 0 0 0 0 0 13 0 0 0 0
64K 0 0 0 0 0 0 0 25 0 0 0 0
128K 0 0 0 0 0 0 0 18 0 0 0 0
256K 0 0 0 0 0 0 0 8 0 0 0 0
512K 0 0 0 0 0 0 0 13 0 0 0 0
1M 0 0 0 0 0 0 0 0 0 0 0 0
2M 0 0 0 0 0 0 0 0 0 0 0 0
4M 0 0 0 0 0 0 0 0 0 0 0 0
8M 0 0 0 0 0 0 0 0 0 0 0 0
16M 0 0 0 0 0 0 0 0 0 0 0 0
---------------------------------------------------------------------------------------------------------------------------------
ata-ST4000VN008-2DR166_ZDH7EMPY sync_read sync_write async_read async_write scrub trim
req_size ind agg ind agg ind agg ind agg ind agg ind agg
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
512 0 0 0 0 0 0 0 0 0 0 0 0
1K 0 0 0 0 0 0 0 0 0 0 0 0
2K 0 0 0 0 0 0 0 0 0 0 0 0
4K 0 0 0 0 0 0 5 0 0 0 0 0
8K 0 0 0 0 0 0 0 2 0 0 0 0
16K 0 0 0 0 0 0 68 1 0 0 0 0
32K 0 0 0 0 0 0 0 17 0 0 0 0
64K 0 0 0 0 0 0 0 26 0 0 0 0
128K 0 0 0 0 0 0 0 20 0 0 0 0
256K 0 0 0 0 0 0 0 6 0 0 0 0
512K 0 0 0 0 0 0 0 14 0 0 0 0
1M 0 0 0 0 0 0 0 0 0 0 0 0
2M 0 0 0 0 0 0 0 0 0 0 0 0
4M 0 0 0 0 0 0 0 0 0 0 0 0
8M 0 0 0 0 0 0 0 0 0 0 0 0
16M 0 0 0 0 0 0 0 0 0 0 0 0
---------------------------------------------------------------------------------------------------------------------------------
ata-ST4000VN008-2DR166_ZDH7F08Z sync_read sync_write async_read async_write scrub trim
req_size ind agg ind agg ind agg ind agg ind agg ind agg
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
512 0 0 0 0 0 0 0 0 0 0 0 0
1K 0 0 0 0 0 0 0 0 0 0 0 0
2K 0 0 0 0 0 0 0 0 0 0 0 0
4K 0 0 0 0 0 0 5 0 0 0 0 0
8K 0 0 0 0 0 0 0 2 0 0 0 0
16K 0 0 0 0 0 0 68 2 0 0 0 0
32K 0 0 0 0 0 0 0 13 0 0 0 0
64K 0 0 0 0 0 0 0 30 0 0 0 0
128K 0 0 0 0 0 0 0 19 0 0 0 0
256K 0 0 0 0 0 0 0 7 0 0 0 0
512K 0 0 0 0 0 0 0 13 0 0 0 0
1M 0 0 0 0 0 0 0 0 0 0 0 0
2M 0 0 0 0 0 0 0 0 0 0 0 0
4M 0 0 0 0 0 0 0 0 0 0 0 0
8M 0 0 0 0 0 0 0 0 0 0 0 0
16M 0 0 0 0 0 0 0 0 0 0 0 0
---------------------------------------------------------------------------------------------------------------------------------
ata-ST4000VN008-2DR166_ZDH7ESM1 sync_read sync_write async_read async_write scrub trim
req_size ind agg ind agg ind agg ind agg ind agg ind agg
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
512 0 0 0 0 0 0 0 0 0 0 0 0
1K 0 0 0 0 0 0 0 0 0 0 0 0
2K 0 0 0 0 0 0 0 0 0 0 0 0
4K 0 0 0 0 0 0 5 0 0 0 0 0
8K 0 0 0 0 0 0 0 3 0 0 0 0
16K 0 0 0 0 0 0 66 2 0 0 0 0
32K 0 0 0 0 0 0 0 15 0 0 0 0
64K 0 0 0 0 0 0 0 33 0 0 0 0
128K 0 0 0 0 0 0 0 13 0 0 0 0
256K 0 0 0 0 0 0 0 9 0 0 0 0
512K 0 0 0 0 0 0 0 13 0 0 0 0
1M 0 0 0 0 0 0 0 0 0 0 0 0
2M 0 0 0 0 0 0 0 0 0 0 0 0
4M 0 0 0 0 0 0 0 0 0 0 0 0
8M 0 0 0 0 0 0 0 0 0 0 0 0
16M 0 0 0 0 0 0 0 0 0 0 0 0
---------------------------------------------------------------------------------------------------------------------------------
ata-ST4000VN008-2DR166_ZDH7FA7S sync_read sync_write async_read async_write scrub trim
req_size ind agg ind agg ind agg ind agg ind agg ind agg
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
512 0 0 0 0 0 0 0 0 0 0 0 0
1K 0 0 0 0 0 0 0 0 0 0 0 0
2K 0 0 0 0 0 0 0 0 0 0 0 0
4K 0 0 0 0 0 0 8 0 0 0 0 0
8K 0 0 0 0 0 0 0 1 0 0 0 0
16K 0 0 0 0 0 0 68 1 0 0 0 0
32K 0 0 0 0 0 0 0 11 0 0 0 0
64K 0 0 0 0 0 0 0 25 0 0 0 0
128K 0 0 0 0 0 0 0 19 0 0 0 0
256K 0 0 0 0 0 0 0 9 0 0 0 0
512K 0 0 0 0 0 0 0 14 0 0 0 0
1M 0 0 0 0 0 0 0 0 0 0 0 0
2M 0 0 0 0 0 0 0 0 0 0 0 0
4M 0 0 0 0 0 0 0 0 0 0 0 0
8M 0 0 0 0 0 0 0 0 0 0 0 0
16M 0 0 0 0 0 0 0 0 0 0 0 0
---------------------------------------------------------------------------------------------------------------------------------
ata-ST4000VN008-2DR166_ZDH7F9P5 sync_read sync_write async_read async_write scrub trim
req_size ind agg ind agg ind agg ind agg ind agg ind agg
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
512 0 0 0 0 0 0 0 0 0 0 0 0
1K 0 0 0 0 0 0 0 0 0 0 0 0
2K 0 0 0 0 0 0 0 0 0 0 0 0
4K 0 0 0 0 0 0 4 0 0 0 0 0
8K 0 0 0 0 0 0 0 2 0 0 0 0
16K 0 0 0 0 0 0 47 3 0 0 0 0
32K 0 0 0 0 0 0 0 14 0 0 0 0
64K 0 0 0 0 0 0 0 26 0 0 0 0
128K 0 0 0 0 0 0 0 20 0 0 0 0
256K 0 0 0 0 0 0 0 7 0 0 0 0
512K 0 0 0 0 0 0 0 14 0 0 0 0
1M 0 0 0 0 0 0 0 0 0 0 0 0
2M 0 0 0 0 0 0 0 0 0 0 0 0
4M 0 0 0 0 0 0 0 0 0 0 0 0
8M 0 0 0 0 0 0 0 0 0 0 0 0
16M 0 0 0 0 0 0 0 0 0 0 0 0
---------------------------------------------------------------------------------------------------------------------------------
ata-ST4000VN008-2DR166_ZDH7F9BN sync_read sync_write async_read async_write scrub trim
req_size ind agg ind agg ind agg ind agg ind agg ind agg
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
512 0 0 0 0 0 0 0 0 0 0 0 0
1K 0 0 0 0 0 0 0 0 0 0 0 0
2K 0 0 0 0 0 0 0 0 0 0 0 0
4K 0 0 0 0 0 0 3 0 0 0 0 0
8K 0 0 0 0 0 0 0 2 0 0 0 0
16K 0 0 0 0 0 0 52 3 0 0 0 0
32K 0 0 0 0 0 0 0 13 0 0 0 0
64K 0 0 0 0 0 0 0 25 0 0 0 0
128K 0 0 0 0 0 0 0 16 0 0 0 0
256K 0 0 0 0 0 0 0 6 0 0 0 0
512K 0 0 0 0 0 0 0 15 0 0 0 0
1M 0 0 0 0 0 0 0 0 0 0 0 0
2M 0 0 0 0 0 0 0 0 0 0 0 0
4M 0 0 0 0 0 0 0 0 0 0 0 0
8M 0 0 0 0 0 0 0 0 0 0 0 0
16M 0 0 0 0 0 0 0 0 0 0 0 0
---------------------------------------------------------------------------------------------------------------------------------
nvme-INTEL_SSDPED1D280GA_PHMB7443018J280CGN sync_read sync_write async_read async_write scrub trim
req_size ind agg ind agg ind agg ind agg ind agg ind agg
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
512 0 0 0 0 0 0 0 0 0 0 0 0
1K 0 0 0 0 0 0 0 0 0 0 0 0
2K 0 0 0 0 0 0 0 0 0 0 0 0
4K 0 0 0 0 0 0 0 0 0 0 0 0
8K 0 0 0 0 0 0 0 0 0 0 0 0
16K 0 0 0 0 0 0 0 0 0 0 0 0
32K 0 0 0 0 0 0 0 0 0 0 0 0
64K 0 0 0 0 0 0 0 0 0 0 0 0
128K 0 0 696 0 0 0 0 0 0 0 0 0
256K 0 0 0 0 0 0 0 0 0 0 0 0
512K 0 0 0 0 0 0 0 0 0 0 0 0
1M 0 0 0 0 0 0 0 0 0 0 0 0
2M 0 0 0 0 0 0 0 0 0 0 0 0
4M 0 0 0 0 0 0 0 0 0 0 0 0
8M 0 0 0 0 0 0 0 0 0 0 0 0
16M 0 0 0 0 0 0 0 0 0 0 0 0
---------------------------------------------------------------------------------------------------------------------------------
So we write to the SLOG in 128K blocks. Let's benchmark this device directly with
dd if=/dev/zero of=/dev/disk/by-id/nvme-INTEL_SSDPED1D280GA_PHMB7443018J280CGN bs=128K oflag=sync
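One caveat worth noting (an addition, not from the original post): dd'ing over the raw device destroys the ZFS labels on the SLOG, so presumably the log device has to be removed from the pool before such a test and re-added afterwards, along the lines of:
```
# Detach the log device before overwriting it with dd, then re-attach it.
zpool remove storage nvme-INTEL_SSDPED1D280GA_PHMB7443018J280CGN
# ... run the raw dd benchmark against the device here ...
zpool add storage log nvme-INTEL_SSDPED1D280GA_PHMB7443018J280CGN
```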
I get a throughput of 1.1GB/s and the following output from iostat -mx 5:
avg-cpu: %user %nice %system %iowait %steal %idle
0.03 0.00 2.62 3.36 0.00 93.99
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 8605.40 0.00 1075.67 0.00 266769.60 0.00 96.88 0.00 0.08 1.00 0.00 128.00 0.12 100.00
nvme1n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdi 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
As we can see, this SSD is able to handle many more 128K IOPS, resulting in ~1075MB/s write throughput.
What causes ZFS to saturate the SSD (100% util) with just ~700 IOPS, when dd can issue more than 10x as many at the same queue length and utilization?
I guess the answer is in the wrqm column. But the SSD stalling at 666.8 writes per second still seems wrong. I'm wondering if something is wrong with my system.
After discovering these nice histograms I ran another iteration of the write benchmark:
dd if=/dev/zero of=zero bs=10M count=5000
To keep it short:
storage sync_read sync_write async_read async_write scrub trim
req_size ind agg ind agg ind agg ind agg ind agg ind agg
--------------------------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
512 0 0 0 0 0 0 0 0 0 0 0 0
1K 0 0 0 0 0 0 0 0 0 0 0 0
2K 0 0 0 0 0 0 0 0 0 0 0 0
4K 0 0 12 0 0 0 156 0 0 0 0 0
8K 0 0 0 0 0 0 41 6 0 0 0 0
16K 0 0 0 0 0 0 6.65K 155 0 0 0 0
32K 0 0 0 0 0 0 0 668 0 0 0 0
64K 0 0 0 0 0 0 0 813 0 0 0 0
128K 0 0 0 0 0 0 0 637 0 0 0 0
256K 0 0 0 0 0 0 0 522 0 0 0 0
512K 0 0 0 0 0 0 0 492 0 0 0 0
1M 0 0 0 0 0 0 0 0 0 0 0 0
2M 0 0 0 0 0 0 0 0 0 0 0 0
4M 0 0 0 0 0 0 0 0 0 0 0 0
8M 0 0 0 0 0 0 0 0 0 0 0 0
16M 0 0 0 0 0 0 0 0 0 0 0 0
---------------------------------------------------------------------------------------------------------------------------------
I interpret it as follows (please correct me if I'm wrong): the ind column is a "pre-aggregation" value and agg is "post-aggregation" (thus only the agg column corresponds to the I/Os actually issued). It strikes me that the above dd command with a block size of 10M results in I/Os of size 16K; given the recordsize they should be somewhat bigger. I think this can be related to the inability to reach higher async write throughput during the tests.
With that I kind of have a solution to the problem "reads are slower than writes". I still don't have a solution to the problem "subpar performance of RAIDZ"; I think it can be related to I/O size. The SLOG issue is also worrying: I actually get much better bandwidth (>300MiB/s) when the SLOG device is not present.
I used to suspect a scheduling issue or some form of lock contention (indicated by a noticeable performance difference when hyper-threading was switched off), but I think the phenomenon observed on the SLOG goes beyond that.
Reports of raidz read performance being lower than write performance are fairly widespread on the internet. While some benchmarks have mistakes like a 512B block size, there is data suggesting the cause might be software configuration (e.g. https://www.reddit.com/r/zfs/comments/8pm7i0/8x_seagate_12tb_in_raidz2_poor_readwrite/).
I'm facing the exact same issue. Random writes with 1M BS can sustain around 1GB/s on 8x 8TB Ultrastar 7200 HDDs. Sequential write gets me up to around 1200MB/s.
Random read with 1M BS hovers 80-100MB/s. Sequential read gets me up to around 200MB/s, still far shy of the random/sequential write speeds.
I have noticed this as well, 0.8.2 on 4.15.14
> Random read with 1M BS hovers 80-100MB/s. Sequential read gets me up to around 200MB/s, still far shy of the random/sequential write speeds.

What is wrong with that? That is around what a spinning disk's IOPS can deliver.
> > Random read with 1M BS hovers 80-100MB/s. Sequential read gets me up to around 200MB/s, still far shy of the random/sequential write speeds.
>
> What is wrong with that? That is around what a spinning disk's IOPS can deliver.

Not exactly. The IOPS on each disk are well under 100, far shy of the ~200-250 they can sustain each (and do individually). Even iostat reports each drive as only 30-40% utilized. The limitation seems to be somewhere in ZFS, not the hardware.
In all other benchmarks against ZFS (and in random reads against the disks directly) the disks max out at a sustained ~250 IOPS.
I was thinking about the 200 MB/s sequential. Wouldn't that be about the max a disk, and thus raidz, can do?
Doesn't raidz1 have a theoretical max sequential throughput of (n-1) × the STR of a single disk?
I have seen greater than 1GB/s sequential on my raidz3 arrays.
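As a rough worked example of that rule of thumb (extending it from raidz1 to raidz2, which is an assumption): with the ~180MB/s single-disk sequential reads measured at the top of this thread, an 8-disk raidz2 would have a theoretical sequential ceiling of about (8 − 2) × 180 MB/s ≈ 1080 MB/s, so the ~360MB/s pool read reported earlier sits far below it.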
We had very similar results.
We had to drop ZFS from consideration after seeing a similar issue.
HP Apollo machines with P408i-p controllers. 256GB RAM, 40 cores.
CentOS 7, ZFS 0.7 to 0.8/master from December 2019.
Raidz, 4 x 6 x 4TB disks.
With XFS + HW controller we get 2.5-3GB/s sequential speeds (fio, dd), single-threaded.
With ZFS we got at most 300-400MB/s; after heavy tuning there were peaks around 700MB/s with 8+ threads. Write speeds were around 1.5GB/s, acceptable in our scenario (a parallel DB).
We would be happy to sacrifice some speed for the features, but this was a deal-breaker for us.
I have faced exactly the same behaviour with my 2 x 6 RaidZ2 pool (ZFS version 0.8.3-pve1 on Kernel 5.3.18-2-pve).
I described the issue here: https://forums.servethehome.com/index.php?threads/disappointing-zfs-read-performance-on-2-x-6-raidz2-and-quest-for-bottleneck-s.27716
Thank you very much @Maciej-Poleski: setting zfetch_max_distance to the maximum value of 2147483648 also got me the read speed I expected from my system.
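For anyone who wants to keep that setting across reboots, a sketch of the usual persistent approach (the file path is the conventional modprobe.d location, not something from this thread; the value is the one reported above):
```
# /etc/modprobe.d/zfs.conf -- applied the next time the zfs module is loaded
options zfs zfetch_max_distance=2147483648
```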
I would like to chime in. I've been running benchmarks on my system after running into performance issues too.
System is HPe ML10 Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz, 32 GB DDR4 ECC.
6 x 4TB spinning disks, Hitachi Deskstar and Ultrastar 7200rpm.
Before running the benchmarks, I installed a completely fresh system with Arch Linux, kernel 5.6.15 and zfs 0.8.4. I changed nothing in the config; I simply installed with basic, minimal settings and just the packages I needed to run the benchmarks and monitor performance.
I ran a full memtest86+ cycle to make sure the RAM is OK (it's ECC, but still).
To establish a baseline, I benchmarked each disk individually on 2048-aligned ext4 partitions, with fio in a loop with the following parameters. The test file gets deleted and caches are dropped between each run:
I can post the results if you like, but believe me when I say these numbers are consistent across the board and completely within every reasonable expectation. They also match online test results.
I created a zfs pool as follows:
I created a zfs dataset for each of the following recordsizes: 4K, 8K, 64K, 128K, 1M. I then ran the fio loop with each blocksize on each dataset. This amounts to 20 tests on each dataset, a total of 100 tests across the board, 15 minutes per run, 5 hours per dataset and 25 hours to complete.
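The exact fio command line didn't survive in this thread, so purely as an illustration, a hypothetical loop matching the description above (four modes, five block sizes, 15 minutes per run, test file removed and caches dropped between runs; the file path, size and iodepth are made-up placeholders) might look like:
```
#!/bin/sh
# Hypothetical reconstruction -- the original fio parameters were not preserved.
for rw in read write randread randwrite; do
    for bs in 4k 8k 64k 128k 1m; do
        fio --name=bench --filename=/pool/rs128k/testfile --rw=$rw --bs=$bs \
            --size=10G --ioengine=libaio --iodepth=1 --runtime=900 \
            --time_based --group_reporting
        rm -f /pool/rs128k/testfile
        echo 3 > /proc/sys/vm/drop_caches
    done
done
```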
The random read numbers on the pool:
Mode: RANDREAD | Unit | RS4K | RS8K | RS64K | RS128K | RS1M
-- | -- | -- | -- | -- | -- | --
4K Rand | IOPS | 591 | 432 | 240 | 222 | 232
8K Rand | IOPS | 345 | 416 | 278 | 246 | 187
64K Rand | IOPS | 140 | 331 | 272 | 243 | 230
128K Rand | IOPS | 241 | 155 | 235 | 245 | 165
1M Rand | IOPS | 88 | 80 | 109 | 116 | 168
Averages | IOPS | 281 | 283 | 227 | 214 | 196
4K Rand | MiB/s | 2.3 | 1.7 | 0.9 | 0.9 | 0.9
8K Rand | MiB/s | 2.7 | 3.3 | 2.2 | 1.9 | 1.5
64K Rand | MiB/s | 8.8 | 20.7 | 17.1 | 15.2 | 14.4
128K Rand | MiB/s | 30.2 | 19.5 | 29.4 | 30.7 | 20.7
1M Rand | MiB/s | 89.0 | 80.4 | 110.0 | 117.0 | 169.0
Averages | MiB/s | 26.6 | 25.1 | 31.9 | 33.1 | 41.3
Compared to the single-disk speeds, only 4K was faster (about twice as fast) on my pool. From 8K and up it's pretty much single-disk speed, give or take here and there.
Random writes are a different story. Look at this:
Mode: RANDWRITE | Unit | RS4K | RS8K | RS64K | RS128K | RS1M
-- | -- | -- | -- | -- | -- | --
4K Rand | IOPS | 8905 | 5068 | 4829 | 5873 | 2200
8K Rand | IOPS | 6555 | 15500 | 2744 | 2819 | 2064
64K Rand | IOPS | 921 | 1813 | 3705 | 579 | 407
128K Rand | IOPS | 389 | 851 | 1794 | 2674 | 297
1M Rand | IOPS | 44 | 54 | 257 | 336 | 505
Averages | IOPS | 3363 | 4657 | 2666 | 2456 | 1095
4K Rand | MiB/s | 34.8 | 19.8 | 18.9 | 22.9 | 8.6
8K Rand | MiB/s | 51.2 | 121.0 | 21.4 | 22.0 | 16.1
64K Rand | MiB/s | 57.5 | 113.0 | 232.0 | 26.2 | 25.5
128K Rand | MiB/s | 48.7 | 106.0 | 224.0 | 334.0 | 37.2
1M Rand | MiB/s | 44.3 | 54.6 | 258.0 | 336.0 | 505.0
Averages | MiB/s | 47.3 | 82.9 | 150.9 | 148.2 | 118.5
I don't know what to make of this. The write IOPS are through the roof (unrealistic; each disk is capable of maybe 250-300 max). The 4K and 8K MiB/s figures are also unrealistically high, but the rest seems decent and consistent with triple to quadruple single-disk speeds.
Again, I don't know what to make of it, but I would really like to find out whether I can get those random read speeds "up to speed", so to speak. I'm currently running the same tests on a striped pool of 3 NVMe SSDs (which so far look abnormally slow, while their single-disk speeds are reasonable). After that is done I can experiment with tuning performance parameters (if I know which ones).