Type | Version/Name
--- | ---
Distribution Name | Arch Linux
Distribution Version | N/A (rolling)
Linux Kernel | 4.11.2
Architecture | x86_64
ZFS Version | 0.7.0.rc4.r20.gf871ab6ea
SPL Version | 0.7.0.rc4.r1.g8f87971
I noticed that my hard drives are constantly making noise despite nothing actively issuing I/Os. I ran iotop first and found that z_null_int is very busy doing something:
Total DISK READ : 952.40 B/s | Total DISK WRITE : 552.00 K/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 22.26 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
654 be/0 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [z_null_int]
281 be/4 root 0.00 B/s 44.64 K/s 0.00 % 0.00 % systemd-journald
1656 be/3 root 0.00 B/s 3.72 K/s 0.00 % 0.00 % ulogd
iostat agrees that the disks are being kept very busy:
avg-cpu: %user %nice %system %iowait %steal %idle
0.21 0.00 1.30 11.98 0.00 86.51
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
nvme0n1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 13.33 312.67 0.00 4.64 29.15 0.83 2.51 57.50 0.16 2.48 81.00
sdd 0.00 0.00 13.33 273.67 0.00 3.25 23.20 0.72 2.49 49.25 0.21 2.42 69.33
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 13.67 274.33 0.00 3.25 23.12 0.73 2.57 50.00 0.21 2.45 70.67
sde 0.00 0.00 13.33 316.00 0.00 4.64 28.85 0.81 2.42 54.75 0.21 2.36 77.67
zram0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Is there some way to discover what z_null_int is doing? And how to stop it?
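One rough way to get a first look at what that thread is blocked on is to dump its kernel stack and the SPL taskq state; this sketch assumes the kernel was built with CONFIG_STACKTRACE and that /proc/spl/taskq is available:
# dump the kernel stack of every z_null_int thread, plus the taskq state
for pid in $(pgrep z_null_int); do echo "== $pid =="; cat /proc/$pid/stack; done
cat /proc/spl/taskq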
The pool isn't doing any scrubbing or resilvering:
pool: Silmaril
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
Silmaril ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-HGST_HDN724040ALE640_PK2334PEK3LW3T ONLINE 0 0 0
ata-HGST_HDN724040ALE640_PK2334PEK3WEZT ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
ata-HGST_HDN724040ALE640_PK2338P4HYVXVC ONLINE 0 0 0
ata-HGST_HDN724040ALE640_PK2338P4HYW71C ONLINE 0 0 0
logs
ata-Samsung_SSD_850_PRO_256GB_S251NXAH210160H ONLINE 0 0 0
cache
ata-Samsung_SSD_850_PRO_512GB_S250NXAGB42356V ONLINE 0 0 0
errors: No known data errors
Unknown. I don't know what caused it.
Nothing relevant.
So I blew away all my filesystem snapshots except for the most recent one. I have a cron that makes them once every 2 weeks, and the oldest was in early February and the newest was 2017-05-15. Killing the snapshots also killed the constant disk I/O but I'm not entirely sure why.
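A rough sketch of that kind of cleanup, using the pool name from the report above; zfs destroy is destructive, so the dry-run flags (-nv) are left in and should only be dropped once the listed snapshots look right:
# list snapshots oldest-first, keep the newest, dry-run destroy the rest
zfs list -H -r -t snapshot -o name -s creation Silmaril | head -n -1 | xargs -r -n1 zfs destroy -nv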
I'm having the same issue on Ubuntu 17.04 with Linux 4.11.6 (Ubuntu mainline), SPL 0.7.0-rc4_4_gac48361c, and ZFS 0.7.0-rc4_65_gd9ad3fea3b, without any snapshots. Together with z_null_int, ksoftirqd is at 99.9% in iotop more than 95% of the time.
The system behaves without any major limitations (#5867 is still an issue, so I can only tell so much).
I'm experiencing the same with
For me it happens only during a scrub on an up-to-date Arch system. The scrub is super slow, 5-10 MB/s, and z_null_int is pegged at 99.99% usage in iotop. The pool is a mirrored pair with dedup enabled.
4.12.3-1-ARCH
zfs-linux 0.7.1.4.12.8.2-3 (archzfs-linux)
spl-linux 0.7.1.4.12.8.2-3 (archzfs-linux)

I haven't had this issue happen in recent weeks. Each time it has happened, though, blowing away all my snapshots has caused it to immediately stop churning the disks. I never noticed any I/O performance degradation even when it was abusing the poor disks. It seemed like it was only doing a ton of work when the pool was idle? I wonder if the issue @dpendolino is describing is somehow orthogonal to this one?
$ pacman -Q | grep -e '^linux-hsw' -e 'zfs' -e 'spl'
linux-hsw 4.12.10-1
linux-hsw-headers 4.12.10-1
spl-dkms 0.7.0.r12.g9df9692-1
spl-utils 0.7.0.r12.g9df9692-1
zfs-auto-snapshot 1:1.2.1.1.r2.049e966-1
zfs-dkms 0.7.0.r46.g2209e4098-1
zfs-utils 0.7.0.r46.g2209e4098-1
zfs-utils-debug 0.7.0.r46.g2209e4098-1
I don't use dedup because the cost was super high the last time I tried it. And my pool structure is still the same as in my original report (two mirrors, striped).
I also changed how I'm handling snapshotting. The zfs-auto-snapshot package in the list above is handling it. I used to have a cron that ran twice monthly (on the 1st and 15th of the month) to create a snapshot. Now I have the much more frequent snapshots managed by zfs-auto-snapshot and haven't run into the issue since.
@tycho I am using the git version of zfs-auto-snapshot from the AUR. My scrub from yesterday is still chugging at ~10 MB/s =/ I currently have about 48 snapshots of my pool, perhaps I'll try deleting them all as soon as possible.
$ pacman -Q | grep -e '^linux' -e 'zfs' -e 'spl'
linux 4.12.3-1
linux-api-headers 4.10.1-1
linux-firmware 20170622.7d2c913-1
linux-headers 4.12.3-1
linux-lts 4.9.47-1
linux-lts-headers 4.9.47-1
spl-linux 0.7.0_4.12.3_1-2
spl-utils-common 0.7.1-2
splix 2.0.0-13
zfs-auto-snapshot-git 1:1.2.1.1.r2.049e966-1
zfs-linux 0.7.0_4.12.3_1-2
zfs-utils-common 0.7.1-2
Same issue
CentOS Linux release 7.4.1708 (Core)
kernel-ml-4.13.9-1.el7.elrepo.x86_64
kmod-spl-devel-0.7.3-1.el7_4.x86_64
kmod-zfs-devel-0.7.3-1.el7_4.x86_64
libzfs2-0.7.3-1.el7_4.x86_64
spl-0.7.3-1.el7_4.x86_64
spl-dkms-0.7.3-1.el7_4.noarch
zfs-0.7.3-1.el7_4.x86_64
zfs-dkms-0.7.3-1.el7_4.noarch
zfs-release-1-5.el7_4.noarch
Same issue over here; dedup and lz4 compression are enabled.
NAME PROPERTY VALUE SOURCE
pool type filesystem -
pool creation Mon Oct 23 20:26 2017 -
pool used 197G -
pool available 210G -
pool referenced 96K -
pool compressratio 1.17x -
pool mounted no -
pool quota none default
pool reservation none default
pool recordsize 1M local
pool mountpoint none local
pool sharenfs off default
pool checksum on default
pool compression lz4 local
pool atime off local
pool devices on default
pool exec on default
pool setuid on default
pool readonly off default
pool zoned off default
pool snapdir hidden default
pool aclinherit restricted default
pool createtxg 1 -
pool canmount on default
pool xattr on default
pool copies 2 local
pool version 5 -
pool utf8only off -
pool normalization none -
pool casesensitivity sensitive -
pool vscan off default
pool nbmand off default
pool sharesmb off default
pool refquota none default
pool refreservation none default
pool guid 6470079991103544842 -
pool primarycache metadata local
pool secondarycache all default
pool usedbysnapshots 0B -
pool usedbydataset 96K -
pool usedbychildren 197G -
pool usedbyrefreservation 0B -
pool logbias latency default
pool dedup on local
pool mlslabel none default
pool sync disabled local
pool dnodesize legacy default
pool refcompressratio 1.00x -
pool written 96K -
pool logicalused 150G -
pool logicalreferenced 40K -
pool volmode default default
pool filesystem_limit none default
pool snapshot_limit none default
pool filesystem_count none default
pool snapshot_count none default
pool snapdev hidden default
pool acltype off default
pool context none default
pool fscontext none default
pool defcontext none default
pool rootcontext none default
pool relatime off local
pool redundant_metadata all default
pool overlay off default
pool/data type filesystem -
pool/data creation Mon Oct 23 20:26 2017 -
pool/data used 196G -
pool/data available 210G -
pool/data referenced 196G -
pool/data compressratio 1.17x -
pool/data mounted yes -
pool/data quota none default
pool/data reservation none default
pool/data recordsize 1M local
pool/data mountpoint /data local
pool/data sharenfs off default
pool/data checksum on default
pool/data compression lz4 inherited from pool
pool/data atime off inherited from pool
pool/data devices on default
pool/data exec on default
pool/data setuid on default
pool/data readonly off default
pool/data zoned off default
pool/data snapdir hidden default
pool/data aclinherit restricted default
pool/data createtxg 11 -
pool/data canmount on default
pool/data xattr on default
pool/data copies 2 local
pool/data version 5 -
pool/data utf8only off -
pool/data normalization none -
pool/data casesensitivity sensitive -
pool/data vscan off default
pool/data nbmand off default
pool/data sharesmb off default
pool/data refquota none default
pool/data refreservation none default
pool/data guid 12619517110471224372 -
pool/data primarycache metadata local
pool/data secondarycache all default
pool/data usedbysnapshots 0B -
pool/data usedbydataset 196G -
pool/data usedbychildren 0B -
pool/data usedbyrefreservation 0B -
pool/data logbias latency default
pool/data dedup on inherited from pool
pool/data mlslabel none default
pool/data sync disabled local
pool/data dnodesize legacy default
pool/data refcompressratio 1.17x -
pool/data written 196G -
pool/data logicalused 150G -
pool/data logicalreferenced 150G -
pool/data volmode default default
pool/data filesystem_limit none default
pool/data snapshot_limit none default
pool/data filesystem_count none default
pool/data snapshot_count none default
pool/data snapdev hidden default
pool/data acltype off default
pool/data context none default
pool/data fscontext none default
pool/data defcontext none default
pool/data rootcontext none default
pool/data relatime off inherited from pool
pool/data redundant_metadata all default
pool/data overlay off default
ii libzfs2linux 0.7.3-0york1~16.04 amd64 OpenZFS filesystem library for Linux
ii libzpool2linux 0.7.3-0york1~16.04 amd64 OpenZFS pool library for Linux
ii spl 0.7.3-0york1~16.04 amd64 Solaris Porting Layer user-space utilities for Linux
ii spl-dkms 0.7.3-0york1~16.04 all Solaris Porting Layer kernel modules for Linux
ii zfs-dkms 0.7.3-0york1~16.04 all OpenZFS filesystem kernel modules for Linux
ii zfs-zed 0.7.3-0york1~16.04 amd64 OpenZFS Event Daemon
ii zfsutils-linux 0.7.3-0york1~16.04 amd64 command-line tools to manage OpenZFS filesystems
/sys/module/zfs/parameters/dbuf_cache_max_bytes: 104857600
/sys/module/zfs/parameters/l2arc_write_boost: 8388608
/sys/module/zfs/parameters/l2arc_write_max: 8388608
/sys/module/zfs/parameters/metaslab_aliquot: 524288
/sys/module/zfs/parameters/spa_config_path: /etc/zfs/zpool.cache
/sys/module/zfs/parameters/spa_load_verify_maxinflight: 10000
/sys/module/zfs/parameters/zfetch_array_rd_sz: 1048576
/sys/module/zfs/parameters/zfetch_max_distance: 67108864
/sys/module/zfs/parameters/zfs_arc_average_blocksize: 8192
/sys/module/zfs/parameters/zfs_arc_max: 80530636800
/sys/module/zfs/parameters/zfs_arc_meta_adjust_restarts: 4096
/sys/module/zfs/parameters/zfs_arc_meta_limit: 21474836480
/sys/module/zfs/parameters/zfs_arc_meta_prune: 10000
/sys/module/zfs/parameters/zfs_dbgmsg_maxsize: 4194304
/sys/module/zfs/parameters/zfs_deadman_checktime_ms: 5000
/sys/module/zfs/parameters/zfs_deadman_synctime_ms: 1000000
/sys/module/zfs/parameters/zfs_delay_scale: 500000
/sys/module/zfs/parameters/zfs_delete_blocks: 20480
/sys/module/zfs/parameters/zfs_dirty_data_max: 15206460620
/sys/module/zfs/parameters/zfs_dirty_data_max_max: 3134265344
/sys/module/zfs/parameters/zfs_dirty_data_sync: 67108864
/sys/module/zfs/parameters/zfs_free_max_blocks: 100000
/sys/module/zfs/parameters/zfs_free_min_time_ms: 1000
/sys/module/zfs/parameters/zfs_immediate_write_sz: 32768
/sys/module/zfs/parameters/zfs_max_recordsize: 1048576
/sys/module/zfs/parameters/zfs_multihost_interval: 1000
/sys/module/zfs/parameters/zfs_pd_bytes_max: 52428800
/sys/module/zfs/parameters/zfs_read_chunk_size: 1048576
/sys/module/zfs/parameters/zfs_resilver_min_time_ms: 3000
/sys/module/zfs/parameters/zfs_scan_min_time_ms: 1000
/sys/module/zfs/parameters/zfs_vdev_aggregation_limit: 262144
/sys/module/zfs/parameters/zfs_vdev_cache_max: 16384
/sys/module/zfs/parameters/zfs_vdev_max_active: 1000
/sys/module/zfs/parameters/zfs_vdev_mirror_rotating_seek_offset: 1048576
/sys/module/zfs/parameters/zfs_vdev_queue_depth_pct: 1000
/sys/module/zfs/parameters/zfs_vdev_raidz_impl: [fastest] original scalar sse2 ssse3
/sys/module/zfs/parameters/zfs_vdev_read_gap_limit: 32768
/sys/module/zfs/parameters/zfs_vdev_scheduler: deadline
/sys/module/zfs/parameters/zfs_vdev_write_gap_limit: 4096
/sys/module/zfs/parameters/zfs_zevent_len_max: 2048
/sys/module/zfs/parameters/zil_slog_bulk: 786432
/sys/module/zfs/parameters/zio_delay_max: 30000
/sys/module/zfs/parameters/zvol_max_discard_blocks: 16384
/sys/module/zfs/parameters/zvol_prefetch_bytes: 131072
I've got this as well; however, I've got a mirrored pool on Proxmox.
Even after removing the mirror I get the same issue.
Killing the PID just results in another one starting; I do get slightly quicker speeds for a while after the kill.
Just rebooted and the box improved, with speed back to normal. However, I did also change swappiness to 1.
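For reference, that setting can be applied at runtime and persisted roughly like this (the sysctl.d file name is just an example):
sysctl vm.swappiness=1
echo 'vm.swappiness = 1' > /etc/sysctl.d/99-swappiness.conf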
I have the same issue. It looks like it gets much worse as soon as dedup is on or snapshots exist.
I've moved my drive to ext4 as it was slowly drowning my system :(
I get the same issue with high z_null_int IO on Proxmox 5.1 with kernel 4.13.x and ZFS 0.7.3. I use 2 ZFS RAID1 (mirror) pools, zfs_arc_max is limited to 1 GB, there are no snapshots, and deduplication is off.
Is this high z_null_int IO normal for ZFS 0.7.3? I never caught it in iotop when this was on ZFS 0.6.x.
Attached files are the ZFS details and screenshot.
I have the same 99.99% iowait on z_null_int with
I removed all snapshots; it doesn't change anything.
Same issue on Proxmox 5.1, kernel 4.13.4 and ZFS 0.7.3. No snapshots or dedup, 2 pools (RAIDZ1 (with L2ARC and ZIL on SSD) and single drive).
To help us narrow down what's causing this, could someone please provide the following debugging output when the issue is occurring.
# Enable TXG history (the last 32 TXGs)
echo 32 >/sys/module/zfs/parameters/zfs_txg_history
# Wait 30 seconds then dump the TXG history and TX stats
cat /proc/spl/kstat/zfs/<pool>/txgs
cat /proc/spl/kstat/zfs/dmu_tx
# Enable internal debug log
echo 1 >/sys/module/zfs/parameters/zfs_dbgmsg_enable
# Wait 30 seconds then dump the internal log.
cat /proc/spl/kstat/zfs/dbgmsg
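For convenience, a minimal one-shot capture script along those lines; the pool name is a placeholder and the 30-second wait is only a suggestion:
#!/bin/sh
# one-shot capture of the debugging info requested above
POOL=tank   # placeholder: replace with the affected pool name
echo 32 > /sys/module/zfs/parameters/zfs_txg_history
echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
sleep 30
{
  echo "== txgs =="
  cat /proc/spl/kstat/zfs/$POOL/txgs
  echo "== dmu_tx =="
  cat /proc/spl/kstat/zfs/dmu_tx
  echo "== dbgmsg =="
  cat /proc/spl/kstat/zfs/dbgmsg
} > zfs-debug-$(date +%Y%m%d-%H%M%S).log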
@behlendorf here is mine
@behlendorf
updated:
https://www.andcycle.idv.tw/~andcycle/tmp/tmp/20171130/zfs-debug-20171130-2.log.tar.gz
I should have dumped the log in a single shot.
I can probably spin up a Debian (Proxmox) box if you need more logs/info.
@AndCycle thank you for the logs. Unfortunately they don't clearly show what's causing the IO. Could you try grabbing a couple of stack traces from the z_null_int threads when they're active? They're used for a couple of different operations, so this should help narrow it down.
cat /proc/<z_null_int pid>/stack
@cooljimy84 thanks for the offer, but I think what would be most helpful would be a reproducer against the stock 0.7.3 release or master. I've been unable to reproduce this behavior locally.
On Thu, Nov 30, 2017 at 11:10 PM, AndCycle wrote:
@behlendorf oops, for some reason my system doesn't have the stack path; did I mess up my kernel config?
/proc/PID/stack only shows up if you have CONFIG_STACKTRACE in your kernel (shows up under "Kernel hacking" in menuconfig).
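A quick way to check for that option, assuming the kernel config is exposed in one of the usual places (/boot/config-* or, with CONFIG_IKCONFIG_PROC, /proc/config.gz):
grep CONFIG_STACKTRACE= /boot/config-$(uname -r) 2>/dev/null || zgrep CONFIG_STACKTRACE= /proc/config.gz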
@tycho thanks :)
@behlendorf here it is
Total DISK READ : 8.45 M/s | Total DISK WRITE : 303.16 K/s
Actual DISK READ: 4.07 M/s | Actual DISK WRITE: 2.28 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
10762 be/0 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [z_null_int]
8017 be/0 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [z_null_int]
9592 be/0 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [z_null_int]
9136 be/0 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [z_null_int]
4645 be/0 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [z_null_int]
24 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [ksoftirqd/3]
8431 be/4 qemu 108.21 K/s 20.19 K/s 0.00 % 0.00 % qemu-system-x86_64 -name guest=Win10_JP_P2P,debug-threads=on -S -object secret,id=master~-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 -msg timestamp=on [worker]
8479 be/4 qemu 96.53 K/s 11.36 K/s 0.00 % 0.00 % qemu-system-x86_64 -name guest=Win10_JP_P2P,debug-threads=on -S -object secret,id=master~-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 -msg timestamp=on [worker]
8518 be/4 qemu 240.07 K/s 0.00 B/s 0.00 % 0.00 % qemu-system-x86_64 -name guest=Win10_JP_P2P,debug-threads=on -S -object secret,id=master~-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 -msg timestamp=on [worker]
25066 be/4 qemu 106.00 K/s 13.88 K/s 0.00 % 0.00 % qemu-system-x86_64 -name guest=Win10_JP_P2P,debug-threads=on -S -object secret,id=master~-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 -msg timestamp=on [worker]
25068 be/4 qemu 96.53 K/s 0.00 B/s 0.00 % 0.00 % qemu-system-x86_64 -name guest=Win10_JP_P2P,debug-threads=on -S -object secret,id=master~-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 -msg timestamp=on [worker]
25070 be/4 qemu 153.95 K/s 0.00 B/s 0.00 % 0.00 % qemu-system-x86_64 -name guest=Win10_JP_P2P,debug-threads=on -S -object secret,id=master~-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 -msg timestamp=on [worker]
10230 be/4 qemu 107.89 K/s 0.00 B/s 0.00 % 0.00 % qemu-system-x86_64 -name guest=Win10_JP_P2P,debug-threads=on -S -object secret,id=master~-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.7,addr=0x0 -msg timestamp=on [worker]
18424 be/4 dnslog 0.00 B/s 323.04 B/s 0.00 % 0.00 % multilog t ./main
bacztwo ~ # ps ax|grep z_null_int |grep -v pts | awk '{print $1}'| while read i; do cat /proc/$i/stack; done
[<ffffffffa031608b>] taskq_thread+0x42b/0x550 [spl]
[<ffffffff810bd6ce>] kthread+0xfe/0x130
[<ffffffff819c0e02>] ret_from_fork+0x22/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffffa031608b>] taskq_thread+0x42b/0x550 [spl]
[<ffffffff810bd6ce>] kthread+0xfe/0x130
[<ffffffff819c0e02>] ret_from_fork+0x22/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffffa031608b>] taskq_thread+0x42b/0x550 [spl]
[<ffffffff810bd6ce>] kthread+0xfe/0x130
[<ffffffff819c0e02>] ret_from_fork+0x22/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffffa031608b>] taskq_thread+0x42b/0x550 [spl]
[<ffffffff810bd6ce>] kthread+0xfe/0x130
[<ffffffff819c0e02>] ret_from_fork+0x22/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffffa031608b>] taskq_thread+0x42b/0x550 [spl]
[<ffffffff810bd6ce>] kthread+0xfe/0x130
[<ffffffff819c0e02>] ret_from_fork+0x22/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffffa031608b>] taskq_thread+0x42b/0x550 [spl]
[<ffffffff810bd6ce>] kthread+0xfe/0x130
[<ffffffff819c0e02>] ret_from_fork+0x22/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
The stack traces tend to be non-useful for running tasks. If everyone reporting high CPU in the zio null threads has an l2arc device, I suspect that may be the cause since l2arc writes are dispatched to those threads. A good experiment would be to remove the l2arc device and see if the excessive CPU stops.
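For anyone trying that experiment: a cache device can be dropped and re-added without touching pool data. The device name below is the one from the original report, so substitute your own:
zpool remove Silmaril ata-Samsung_SSD_850_PRO_512GB_S250NXAGB42356V
# observe for a while, then re-add the cache device
zpool add Silmaril cache ata-Samsung_SSD_850_PRO_512GB_S250NXAGB42356V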
@dweeezil actually it's not high CPU usage; it just hangs there in iotop.
And I don't have an L2ARC, so I think that can be ruled out.
@AndCycle Indeed. That said, l2arc is one of the uses for those taskqs. I'd also suggest that anyone experiencing this issue might gain some information from cat /proc/spl/taskq and/or from running perf top -g and looking for activity which would cause null zios to be created; possibly related to MMP. A good place to start might be to try to capture some stack traces when zio_null() is called, possibly even adding it as a tracing event with perf probe -m zfs -a zio_null.
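A possible sequence along those lines, assuming the kernel has kprobes enabled and perf matches the running kernel:
perf probe -m zfs -a zio_null
perf record -e probe:zio_null -ag -- sleep 30
perf report --stdio > zio_null_callchains.txt
perf probe -d zio_null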
I got the same results across different nodes. It seems every pool spawns its own 99.99% IO z_null_int thread in iotop. Could it be that the 99.99% IO indicates ZFS is performing read/write mirror operations inside the pool vdevs, and iotop is just not able to report the read and write sizes properly?
Somehow I could not use perf in Proxmox 5.1 as it comes with kernel 4.13 and Debian Stretch's linux-perf package is for kernel 4.9.
##
# Proxmox 5.1 with kernel 4.13 and ZFS 0.7.3 with only 1 rpool pool mirror ZFS RAID1.
##
root@proxmox1:~# ps ax | grep z_null_int | grep -v pts | awk '{print $1}' | while read i; do cat /proc/$i/stack; done
[<ffffffffc03049f0>] taskq_thread+0x3e0/0x460 [spl]
[<ffffffffb64a58e9>] kthread+0x109/0x140
[<ffffffffb6d32875>] ret_from_fork+0x25/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
root@proxmox1:~# iotop
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
371 be/0 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [z_null_int]
##
# Proxmox 5.1 with kernel 4.13 and ZFS 0.7.3 with 2 rpool and dpool pools of mirror ZFS RAID1.
##
root@node1:~# ps ax | grep z_null_int | grep -v pts | awk '{print $1}' | while read i; do cat /proc/$i/stack; done
[<ffffffffc033c9f0>] taskq_thread+0x3e0/0x460 [spl]
[<ffffffff8fca58e9>] kthread+0x109/0x140
[<ffffffff90532875>] ret_from_fork+0x25/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffffc033c9f0>] taskq_thread+0x3e0/0x460 [spl]
[<ffffffff8fca58e9>] kthread+0x109/0x140
[<ffffffff90532875>] ret_from_fork+0x25/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
root@node1:~# iotop
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
491 be/0 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [z_null_int]
2803 be/0 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [z_null_int]
Sorry, I don't have the knowledge to do much debugging, as I write Python and HTML most of the time.
I just compiled the kernel with kprobes so perf can work:
# perf probe -m zfs -a zio_null
# perf record -e probe:zio_null -aR sleep 30
https://www.andcycle.idv.tw/~andcycle/tmp/tmp/20171203/perf.data.gz
I don't know if this helps; is there anything else I can do to help with?
--
By the way, 0.7.3 is prone to causing 100% kworker CPU usage and locking up the system if I create some heavy file lock contention with emerge -e --jobs 4 @world, which just builds/installs packages in parallel.
I'll probably have to file another issue.
I can easily reproduce this by running a 100% write load and constraining the ARC size, because the "IO" column in iotop is showing time spent _waiting_. In this case, most of the null zios are created by dmu_buf_hold_array_by_dnode() and also in dbuf_read() (the DB_UNCACHED case).
Does anyone reporting this problem have constrained ARC or low ARC hit rates?
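For anyone unsure, a quick way to compute the overall ARC hit rate from the kstats (arc_summary, used elsewhere in this thread, reports the same numbers in a friendlier form):
# overall ARC hit rate since module load
awk '/^hits / {h=$3} /^misses / {m=$3} END {printf "ARC hit rate: %.1f%% (%d hits, %d misses)\n", 100*h/(h+m), h, m}' /proc/spl/kstat/zfs/arcstats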
@AndCycle As to using perf, my recommendation would be to run perf record -ag -e probe:zio_null for a while when the issue is observed, then interrupt it. Then you can either run "perf report" and hit the "E" key to see the call chains, or you can run "perf report > somefile", which will create a text file with all of the call chains expanded.
At least in the case of poor cache rates, the null zio threads will definitely show high IO wait times.
Ah that is why. Nice finding @dweeezil !
On a 64 GB Proxmox box, we limited the ARC to 1 GB so there would be 3 GB left for the host OS and 60 GB for guest VMs.
After the upgrade to ZFS 0.7.x, the ARC hit rates decreased to around 30%. It used to be like 70-80% on previous ZFS 0.6.x.
I second this hit ratio issue: before 0.7 the hit rate was about 80%; since switching to the 0.7 branch it has dropped to almost zero.
Possibly related to #6852.
In case another good reproduction case is needed: checkouts between different branches of https://github.com/dmlloyd/openjdk.git, e.g. jdk/jdk, jdk6/jdk6, jdk8u/jdk8u, cause high load and take up to 10 minutes with almost no I/O.
@krichter722 In a (KVM) VM with 16 threads and 4GB RAM (ZoL master as of today), it doesn't look _too_ awful (this session is immediately following a pool import, so nothing was cached to begin with):
ubuntu@ubuntu:/tank/fs/openjdk$ git status
On branch jdk10/master
Your branch is up to date with 'origin/jdk10/master'.
nothing to commit, working tree clean
ubuntu@ubuntu:/tank/fs/openjdk$ time git checkout origin/jdk9/jdk9
Note: checking out 'origin/jdk9/jdk9'.
You are in 'detached HEAD' state. You can look around, ...
HEAD is now at 7d7fbd09fcf... Merge from 'jdk'
real 0m51.422s
user 0m8.355s
sys 0m15.248s
ubuntu@ubuntu:/tank/fs/openjdk$ time git checkout origin/jdk6/jdk6
Previous HEAD position was 7d7fbd09fcf... Merge from 'jdk'
HEAD is now at e27d58ae1d4... Merge from 'jdk'
real 0m49.802s
user 0m5.185s
sys 0m7.811s
ubuntu@ubuntu:/tank/fs/openjdk$ time git checkout origin/jdk7u/jdk7u
Previous HEAD position was e27d58ae1d4... Merge from 'jdk'
HEAD is now at 524a884c565... Merge from 'jdk'
real 0m22.568s
user 0m4.313s
sys 0m4.003s
This pool is 4 WD400EFRX drives connected to the VM with virtio-scsi, created with "zpool create tank mirror sdb sdc mirror sdd sde" and no property changes at all:
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sdd ONLINE 0 0 0
sde ONLINE 0 0 0
When the ARC hit rate was low, the zio null thread was certainly showing up in iotop but, overall, these times don't look too terrible; certainly nowhere near 10 minutes.
@krichter722 as @behlendorf suggested, any type of ARC collapse would certainly cause the behavior you're describing. In my test above, however, the VM had only 4GB of RAM so the ARC was capped at 2GB but even after all those checkouts, it never went above 1.4GB. It would be interesting to know how your ARC is performing when you're seeing this behavior.
My system is Proxmox 5.1 with ZFS 0.7.2 and 128 GB of RAM. At the moment it is running only one VM with 16 GB of RAM and one zvol with writeback and discard.
I have the same problem, z_null_int at 99% in iotop, and my arc_summary reports a very high miss ratio.
Is there some workaround for the ARC miss ratio?
arc_summary
ZFS Subsystem Report Thu Dec 07 19:17:00 2017
ARC Summary: (HEALTHY)
Memory Throttle Count: 0
ARC Misc:
Deleted: 72
Mutex Misses: 0
Evict Skips: 0
ARC Size: 1.88% 1.18 GiB
Target Size: (Adaptive) 100.00% 62.91 GiB
Min Size (Hard Limit): 6.25% 3.93 GiB
Max Size (High Water): 16:1 62.91 GiB
ARC Size Breakdown:
Recently Used Cache Size: 50.00% 31.46 GiB
Frequently Used Cache Size: 50.00% 31.46 GiB
ARC Hash Breakdown:
Elements Max: 148.78k
Elements Current: 100.00% 148.77k
Collisions: 890
Chain Max: 2
Chains: 469
ARC Total accesses: 199.57k
Cache Hit Ratio: 8.18% 16.32k
Cache Miss Ratio: 91.82% 183.25k
Actual Hit Ratio: 6.18% 12.34k
Data Demand Efficiency: 6.56% 155.00k
Data Prefetch Efficiency: 6.61% 11.75k
CACHE HITS BY CACHE LIST:
Anonymously Used: 24.38% 3.98k
Most Recently Used: 62.60% 10.22k
Most Frequently Used: 13.02% 2.12k
Most Recently Used Ghost: 0.00% 0
Most Frequently Used Ghost: 0.00% 0
CACHE HITS BY DATA TYPE:
Demand Data: 62.29% 10.16k
Prefetch Data: 4.76% 777
Demand Metadata: 13.27% 2.17k
Prefetch Metadata: 19.69% 3.21k
CACHE MISSES BY DATA TYPE:
Demand Data: 79.04% 144.84k
Prefetch Data: 5.99% 10.97k
Demand Metadata: 14.57% 26.69k
Prefetch Metadata: 0.41% 747
DMU Prefetch Efficiency: 278.42k
Hit Ratio: 2.31% 6.42k
Miss Ratio: 97.69% 272.00k
ZFS Tunable:
zvol_volmode 1
l2arc_headroom 2
dbuf_cache_max_shift 5
zfs_free_leak_on_eio 0
zfs_free_max_blocks 100000
zfs_read_chunk_size 1048576
metaslab_preload_enabled 1
zfs_dedup_prefetch 0
zfs_txg_history 0
zfs_scrub_delay 4
zfs_vdev_async_read_max_active 3
zfs_read_history 0
zfs_arc_sys_free 0
l2arc_write_max 8388608
zil_slog_bulk 786432
zfs_dbuf_state_index 0
zfs_sync_taskq_batch_pct 75
metaslab_debug_unload 0
zvol_inhibit_dev 0
zfs_abd_scatter_enabled 1
zfs_arc_pc_percent 0
zfetch_max_streams 8
zfs_recover 0
metaslab_fragmentation_factor_enabled 1
zfs_deadman_checktime_ms 5000
zfs_sync_pass_rewrite 2
zfs_object_mutex_size 64
zfs_arc_min_prefetch_lifespan 0
zfs_arc_meta_prune 10000
zfs_read_history_hits 0
zfetch_max_distance 8388608
l2arc_norw 0
zfs_dirty_data_max_percent 10
zfs_per_txg_dirty_frees_percent 30
zfs_arc_meta_min 0
metaslabs_per_vdev 200
zfs_arc_meta_adjust_restarts 4096
spa_load_verify_maxinflight 10000
spa_load_verify_metadata 1
zfs_multihost_history 0
zfs_send_corrupt_data 0
zfs_delay_min_dirty_percent 60
zfs_vdev_sync_read_max_active 10
zfs_dbgmsg_enable 0
zfs_metaslab_segment_weight_enabled 1
zio_requeue_io_start_cut_in_line 1
l2arc_headroom_boost 200
zfs_zevent_cols 80
zfs_dmu_offset_next_sync 0
spa_config_path /etc/zfs/zpool.cache
zfs_vdev_cache_size 0
dbuf_cache_hiwater_pct 10
zfs_multihost_interval 1000
zfs_multihost_fail_intervals 5
zio_dva_throttle_enabled 1
zfs_vdev_sync_write_min_active 10
zfs_vdev_scrub_max_active 2
ignore_hole_birth 1
zvol_major 230
zil_replay_disable 0
zfs_dirty_data_max_max_percent 25
zfs_expire_snapshot 300
zfs_sync_pass_deferred_free 2
spa_asize_inflation 24
dmu_object_alloc_chunk_shift 7
zfs_vdev_mirror_rotating_seek_offset 1048576
l2arc_feed_secs 1
zfs_autoimport_disable 1
zfs_arc_p_aggressive_disable 1
zfs_zevent_len_max 384
zfs_arc_meta_limit_percent 75
l2arc_noprefetch 1
zfs_vdev_raidz_impl [fastest] original scalar sse2 ssse3 avx2
zfs_arc_meta_limit 0
zfs_flags 0
zfs_dirty_data_max_max 4294967296
zfs_arc_average_blocksize 8192
zfs_vdev_cache_bshift 16
zfs_vdev_async_read_min_active 1
zfs_arc_dnode_reduce_percent 10
zfs_free_bpobj_enabled 1
zfs_arc_grow_retry 0
zfs_vdev_mirror_rotating_inc 0
l2arc_feed_again 1
zfs_vdev_mirror_non_rotating_inc 0
zfs_arc_lotsfree_percent 10
zfs_zevent_console 0
zvol_prefetch_bytes 131072
zfs_free_min_time_ms 1000
zfs_arc_dnode_limit_percent 10
zio_taskq_batch_pct 75
dbuf_cache_max_bytes 104857600
spa_load_verify_data 1
zfs_delete_blocks 20480
zfs_vdev_mirror_non_rotating_seek_inc 1
zfs_multihost_import_intervals 10
zfs_dirty_data_max 4294967296
zfs_vdev_async_write_max_active 10
zfs_dbgmsg_maxsize 4194304
zfs_nocacheflush 0
zfetch_array_rd_sz 1048576
zfs_arc_meta_strategy 1
zfs_dirty_data_sync 67108864
zvol_max_discard_blocks 16384
zvol_threads 32
zfs_vdev_async_write_active_max_dirty_percent 60
zfs_arc_p_dampener_disable 1
zfs_txg_timeout 5
metaslab_aliquot 524288
zfs_mdcomp_disable 0
zfs_vdev_sync_read_min_active 10
zfs_arc_dnode_limit 0
dbuf_cache_lowater_pct 10
zfs_abd_scatter_max_order 10
metaslab_debug_load 0
zfs_vdev_aggregation_limit 131072
metaslab_lba_weighting_enabled 1
zfs_vdev_scheduler noop
zfs_vdev_scrub_min_active 1
zfs_no_scrub_io 0
zfs_vdev_cache_max 16384
zfs_scan_idle 50
zfs_arc_shrink_shift 0
spa_slop_shift 5
zfs_vdev_mirror_rotating_seek_inc 5
zfs_deadman_synctime_ms 1000000
send_holes_without_birth_time 1
metaslab_bias_enabled 1
zvol_request_sync 0
zfs_admin_snapshot 1
zfs_no_scrub_prefetch 0
zfs_metaslab_fragmentation_threshold 70
zfs_max_recordsize 1048576
zfs_arc_min 0
zfs_nopwrite_enabled 1
zfs_arc_p_min_shift 0
zfs_multilist_num_sublists 0
zfs_vdev_queue_depth_pct 1000
zfs_mg_fragmentation_threshold 85
l2arc_write_boost 8388608
zfs_prefetch_disable 0
l2arc_feed_min_ms 200
zio_delay_max 30000
zfs_vdev_write_gap_limit 4096
zfs_pd_bytes_max 52428800
zfs_scan_min_time_ms 1000
zfs_resilver_min_time_ms 3000
zfs_delay_scale 500000
zfs_vdev_async_write_active_min_dirty_percent 30
zfs_vdev_sync_write_max_active 10
zfs_mg_noalloc_threshold 0
zfs_deadman_enabled 1
zfs_resilver_delay 2
zfs_metaslab_switch_threshold 2
zfs_arc_max 0
zfs_top_maxinflight 32
zfetch_min_sec_reap 2
zfs_immediate_write_sz 32768
zfs_vdev_async_write_min_active 2
zfs_sync_pass_dont_compress 5
zfs_vdev_read_gap_limit 32768
zfs_compressed_arc_enabled 1
zfs_vdev_max_active 1000
I don't think that's it... my system has a relatively high cache hit ratio, but exhibits the same behavior. I do have an L2ARC (not shown below).
ZFS Subsystem Report Thu Dec 07 10:42:43 2017
ARC Summary: (HEALTHY)
Memory Throttle Count: 0
ARC Misc:
Deleted: 25.56m
Mutex Misses: 668
Evict Skips: 668
ARC Size: 86.38% 6.76 GiB
Target Size: (Adaptive) 86.91% 6.80 GiB
Min Size (Hard Limit): 6.25% 500.67 MiB
Max Size (High Water): 16:1 7.82 GiB
ARC Size Breakdown:
Recently Used Cache Size: 7.07% 492.37 MiB
Frequently Used Cache Size: 92.93% 6.32 GiB
ARC Hash Breakdown:
Elements Max: 1.77m
Elements Current: 26.98% 477.21k
Collisions: 15.63m
Chain Max: 9
Chains: 46.04k
ARC Total accesses: 344.38m
Cache Hit Ratio: 91.73% 315.89m
Cache Miss Ratio: 8.27% 28.49m
Actual Hit Ratio: 91.42% 314.83m
Data Demand Efficiency: 99.79% 305.43m
Data Prefetch Efficiency: 16.80% 30.14m
CACHE HITS BY CACHE LIST:
Anonymously Used: 0.18% 577.06k
Most Recently Used: 8.47% 26.76m
Most Frequently Used: 91.19% 288.07m
Most Recently Used Ghost: 0.06% 175.70k
Most Frequently Used Ghost: 0.10% 313.54k
CACHE HITS BY DATA TYPE:
Demand Data: 96.49% 304.80m
Prefetch Data: 1.60% 5.06m
Demand Metadata: 1.57% 4.96m
Prefetch Metadata: 0.34% 1.07m
CACHE MISSES BY DATA TYPE:
Demand Data: 2.24% 637.76k
Prefetch Data: 88.01% 25.08m
Demand Metadata: 8.18% 2.33m
Prefetch Metadata: 1.57% 448.21k
Stack trace from the z_null_int process:
[<ffffffffc047c703>] taskq_thread+0x3a3/0x470 [spl]
[<ffffffffbc09c718>] kthread+0x118/0x130
[<ffffffffbc663f45>] ret_from_fork+0x25/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
(running that several times over a 10 minute period produces the same output each time)
This bug is causing ridiculous slowdowns on my system. It gets to the point where even just "ls" can hang for 30 seconds.
This started happening to me when I decided to turn on dedup for some datasets. I'm copying the data out of the datasets into new datasets with dedup turned off, and then destroying the dedup'd datasets, to see if that will fix the problem. However, because of this bug it is taking forever.
I do not have a L2ARC.
ZFS Subsystem Report Sun Dec 17 00:58:15 2017
ARC Summary: (HEALTHY)
Memory Throttle Count: 0
ARC Misc:
Deleted: 10.56M
Mutex Misses: 307
Evict Skips: 1.10k
ARC Size: 80.27% 12.58 GiB
Target Size: (Adaptive) 100.00% 15.67 GiB
Min Size (Hard Limit): 6.25% 1002.77 MiB
Max Size (High Water): 16:1 15.67 GiB
ARC Size Breakdown:
Recently Used Cache Size: 79.30% 7.28 GiB
Frequently Used Cache Size: 20.70% 1.90 GiB
ARC Hash Breakdown:
Elements Max: 1.25M
Elements Current: 45.38% 569.08k
Collisions: 3.04M
Chain Max: 6
Chains: 35.72k
ARC Total accesses: 39.49M
Cache Hit Ratio: 62.11% 24.53M
Cache Miss Ratio: 37.89% 14.97M
Actual Hit Ratio: 44.93% 17.74M
Data Demand Efficiency: 90.15% 7.79M
Data Prefetch Efficiency: 48.35% 13.76M
CACHE HITS BY CACHE LIST:
Anonymously Used: 6.48% 1.59M
Most Recently Used: 43.51% 10.67M
Most Frequently Used: 28.83% 7.07M
Most Recently Used Ghost: 16.75% 4.11M
Most Frequently Used Ghost: 4.43% 1.09M
CACHE HITS BY DATA TYPE:
Demand Data: 28.65% 7.03M
Prefetch Data: 27.12% 6.65M
Demand Metadata: 43.32% 10.62M
Prefetch Metadata: 0.92% 225.01k
CACHE MISSES BY DATA TYPE:
Demand Data: 5.13% 767.77k
Prefetch Data: 47.48% 7.11M
Demand Metadata: 45.94% 6.88M
Prefetch Metadata: 1.45% 216.52k
DMU Prefetch Efficiency: 46.11M
Hit Ratio: 11.82% 5.45M
Miss Ratio: 88.18% 40.66M
ZFS Tunables:
dbuf_cache_hiwater_pct 10
dbuf_cache_lowater_pct 10
dbuf_cache_max_bytes 104857600
dbuf_cache_max_shift 5
dmu_object_alloc_chunk_shift 7
ignore_hole_birth 1
l2arc_feed_again 1
l2arc_feed_min_ms 200
l2arc_feed_secs 1
l2arc_headroom 2
l2arc_headroom_boost 200
l2arc_noprefetch 1
l2arc_norw 0
l2arc_write_boost 8388608
l2arc_write_max 8388608
metaslab_aliquot 524288
metaslab_bias_enabled 1
metaslab_debug_load 0
metaslab_debug_unload 0
metaslab_fragmentation_factor_enabled 1
metaslab_lba_weighting_enabled 1
metaslab_preload_enabled 1
metaslabs_per_vdev 200
send_holes_without_birth_time 1
spa_asize_inflation 24
spa_config_path /etc/zfs/zpool.cache
spa_load_verify_data 1
spa_load_verify_maxinflight 10000
spa_load_verify_metadata 1
spa_slop_shift 5
zfetch_array_rd_sz 1048576
zfetch_max_distance 8388608
zfetch_max_streams 8
zfetch_min_sec_reap 2
zfs_abd_scatter_enabled 1
zfs_abd_scatter_max_order 10
zfs_admin_snapshot 1
zfs_arc_average_blocksize 8192
zfs_arc_dnode_limit 0
zfs_arc_dnode_limit_percent 10
zfs_arc_dnode_reduce_percent 10
zfs_arc_grow_retry 0
zfs_arc_lotsfree_percent 10
zfs_arc_max 0
zfs_arc_meta_adjust_restarts 4096
zfs_arc_meta_limit 0
zfs_arc_meta_limit_percent 75
zfs_arc_meta_min 0
zfs_arc_meta_prune 10000
zfs_arc_meta_strategy 1
zfs_arc_min 0
zfs_arc_min_prefetch_ms 0
zfs_arc_min_prescient_prefetch_ms 0
zfs_arc_p_aggressive_disable 1
zfs_arc_p_dampener_disable 1
zfs_arc_p_min_shift 0
zfs_arc_pc_percent 0
zfs_arc_shrink_shift 0
zfs_arc_sys_free 0
zfs_autoimport_disable 1
zfs_compressed_arc_enabled 1
zfs_dbgmsg_enable 0
zfs_dbgmsg_maxsize 4194304
zfs_dbuf_state_index 0
zfs_deadman_checktime_ms 5000
zfs_deadman_enabled 1
zfs_deadman_synctime_ms 1000000
zfs_dedup_prefetch 0
zfs_delay_min_dirty_percent 60
zfs_delay_scale 500000
zfs_delete_blocks 20480
zfs_dirty_data_max 3364727193
zfs_dirty_data_max_max 4294967296
zfs_dirty_data_max_max_percent 25
zfs_dirty_data_max_percent 10
zfs_dirty_data_sync 67108864
zfs_dmu_offset_next_sync 0
zfs_expire_snapshot 300
zfs_flags 0
zfs_free_bpobj_enabled 1
zfs_free_leak_on_eio 0
zfs_free_max_blocks 100000
zfs_free_min_time_ms 1000
zfs_immediate_write_sz 32768
zfs_key_max_salt_uses 400000000
zfs_max_recordsize 1048576
zfs_mdcomp_disable 0
zfs_metaslab_fragmentation_threshold 70
zfs_metaslab_segment_weight_enabled 1
zfs_metaslab_switch_threshold 2
zfs_mg_fragmentation_threshold 85
zfs_mg_noalloc_threshold 0
zfs_multihost_fail_intervals 5
zfs_multihost_history 0
zfs_multihost_import_intervals 10
zfs_multihost_interval 1000
zfs_multilist_num_sublists 0
zfs_no_scrub_io 0
zfs_no_scrub_prefetch 0
zfs_nocacheflush 0
zfs_nopwrite_enabled 1
zfs_object_mutex_size 64
zfs_pd_bytes_max 52428800
zfs_per_txg_dirty_frees_percent 30
zfs_prefetch_disable 0
zfs_read_chunk_size 1048576
zfs_read_history 0
zfs_read_history_hits 0
zfs_recover 0
zfs_resilver_min_time_ms 3000
zfs_scan_checkpoint_intval 7200
zfs_scan_fill_weight 3
zfs_scan_issue_strategy 0
zfs_scan_legacy 0
zfs_scan_mem_lim_fact 20
zfs_scan_mem_lim_soft_fact 20
zfs_scan_strict_mem_lim 0
zfs_scan_vdev_limit 4194304
zfs_scrub_min_time_ms 1000
zfs_send_corrupt_data 0
zfs_sync_pass_deferred_free 2
zfs_sync_pass_dont_compress 5
zfs_sync_pass_rewrite 2
zfs_sync_taskq_batch_pct 75
zfs_txg_history 100
zfs_txg_timeout 5
zfs_vdev_aggregation_limit 1048576
zfs_vdev_async_read_max_active 3
zfs_vdev_async_read_min_active 1
zfs_vdev_async_write_active_max_dirty_percent 60
zfs_vdev_async_write_active_min_dirty_percent 30
zfs_vdev_async_write_max_active 10
zfs_vdev_async_write_min_active 2
zfs_vdev_cache_bshift 16
zfs_vdev_cache_max 16384
zfs_vdev_cache_size 0
zfs_vdev_max_active 1000
zfs_vdev_mirror_non_rotating_inc 0
zfs_vdev_mirror_non_rotating_seek_inc 1
zfs_vdev_mirror_rotating_inc 0
zfs_vdev_mirror_rotating_seek_inc 5
zfs_vdev_mirror_rotating_seek_offset 1048576
zfs_vdev_queue_depth_pct 1000
zfs_vdev_raidz_impl [fastest] original scalar sse2 ssse3
zfs_vdev_read_gap_limit 32768
zfs_vdev_scheduler noop
zfs_vdev_scrub_max_active 2
zfs_vdev_scrub_min_active 1
zfs_vdev_sync_read_max_active 10
zfs_vdev_sync_read_min_active 10
zfs_vdev_sync_write_max_active 10
zfs_vdev_sync_write_min_active 10
zfs_vdev_write_gap_limit 4096
zfs_zevent_cols 80
zfs_zevent_console 0
zfs_zevent_len_max 128
zfs_zil_clean_taskq_maxalloc 1048576
zfs_zil_clean_taskq_minalloc 1024
zfs_zil_clean_taskq_nthr_pct 100
zil_replay_disable 0
zil_slog_bulk 786432
zio_delay_max 30000
zio_dva_throttle_enabled 1
zio_requeue_io_start_cut_in_line 1
zio_taskq_batch_pct 75
zvol_inhibit_dev 0
zvol_major 230
zvol_max_discard_blocks 16384
zvol_prefetch_bytes 131072
zvol_request_sync 0
zvol_threads 32
zvol_volmode 1
I have destroyed all the datasets with dedup turned on. zpool status -D returns no DDT entries.
Performance seems to be back to normal, despite the z_null_int process eating up a lot of the I/O % time.
@behlendorf do you need any more information related to this issue? I'm having the same problem on my machines, so I can post something which may help to fix it.
I've opened PR #6989 with a proposed fix. The patch addresses the reported ARC hit rate regression which could explain the additional IO reported here. Any additional testing of the patch would be welcome once it passes all of the automated testing.
I will definitely test this when the patch comes out, though I suspect others will beat me to the punch, as I have only ever installed ZFS through the AUR on Arch Linux, and there's always a lag between a new ZFS release and it hitting the AUR.
I am still seeing high disk I/O with this patch applied; http://apolloner.eu/~apollo13/out.perf shows the events of a
perf record -ag -e probe:zio_null -F 99 -- sleep 60
run
I have upgraded to 0.7.6 yesterday.
iotop is still showing 99.99% IO on z_null_int.
zfs-0.7.6-1.fc27.x86_64
spl-0.7.6-1.fc27.x86_64
kernel 4.14.11-300.fc27.x86_64
@apollo13 according to the logs you posted the pool looks to be busy handling IO for real applications. In order to get to the bottom of this we're going to want to gather logs, using the perf command above, from an idle system running 0.7.6 which is exhibiting the problem.
perf record -ag -e probe:zio_null -F 99 -- sleep 60
I'm reopening this issue so we can get to the bottom of it.
@behlendorf http://apolloner.eu/~apollo13/out2.perf is the idle log. The time z_null_int spends at 99% I/O is much reduced (it only spikes every 2-5 seconds). That said, the system doesn't feel slow even with VMs running where z_null_int is constantly at 99%. It basically stays at 99% as soon as I start a VM, which still makes me wonder whether it could be somewhat of a ZFS problem as opposed to a real I/O problem.
@apollo13 thanks for the speedy reply. Am I correct in assuming ZFS is being used as the root pool on this system? From what I can see in the logs the following processes are all actively performing IO:
perf_4.13 - Presumably writing the out2.perf log
zpool iostat - Reading from /etc/hostid. This looks like a minor contributing factor.
zfs - is being run occasionally and causing reads.
cfs_loop, pveproxy worker, sshd, ls, pvesr and pickup - are pretty active in the filesystem.
txg_sync - Normal behavior every few seconds.

Does this match up with the top processes reported by iotop? From what I can see z_null_int is mostly in IO wait while these blocks are read from the ARC or disk as needed. Which is pretty normal behavior, and the improved hit rate would explain why less time is being spent there now.
One way to get a much better sense of exactly what is being read from disk is to enable the read history. This keeps a rolling log of the last N blocks which needed to be read from disk. It won't include anything which was serviced from the ARC cache.
echo 1000 >/sys/module/zfs/parameters/zfs_read_history
cat /proc/spl/kstat/zfs/<pool>/reads
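If it helps, a simple way to keep sampling that rolling log while reproducing the workload; the pool name is a placeholder, and writing 0 back to zfs_read_history turns the logging off again:
while sleep 10; do date; cat /proc/spl/kstat/zfs/tank/reads; done >> zfs-reads.log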
@behlendorf Thank you for the detailed response; I'll answer as well as I can.
cfs_loop and kvm (when VMs are running) seem to be the main offenders. That said, the following tests were done without doing anything else, so ls etc. shouldn't show up at all!
For the idle case: http://apolloner.eu/~apollo13/proxmox_zfs/idle/ -- From the looks of it, z_null_int spikes to 99.99% every 5 seconds
For the load case (with VMs running): http://apolloner.eu/~apollo13/proxmox_zfs/load/ -- basically constantly at 99% I/O with (actual) disk writes between 0 and 15 MB/s -- the disk write from the KVM processes together seems to be less than 1 MB/s (probably even less than 100 KB/s). All in all the system still feels stable.
I've been experiencing exactly the same (or a very similar) problem as @apollo13 for the past 2 months. I'm also using Proxmox with patched ZFS 0.7.4 (today I asked the Proxmox guys to update to the full 0.7.6 release).
The problem occurs every time I try to do a backup (to another server or to a dedicated testing drive) or copy larger (40 GB+) files.
The problem does not occur when high IO happens inside a VM, only on the host.
I have two pools (one for the system and one for data, both mirrored SSDs).
Every time I move files from the data pool, extremely high IO appears on the system pool and the system sometimes completely freezes without any chance of recovery, and I always see huge loads in the neighborhood of 170. After testing I discovered that the problem doesn't occur if I do backups to the pool itself; that's how I determined that it's not a HW problem.
Just please note that this problem renders this server basically unable to do the majority of its backups.
My HW:
CPU is: Intel Xeon E5-2620 v4
RAM: 64 GB
DATA DISK: 2x 1TB SSD in RAID 10 on ZFS
BOOT DISK: 2x 32GB satadom in RAID 10
If I have any free time tomorrow I will replicate the problem and capture any required data.
For now, here are just the parameters of the pools and some tests that I have done previously.
My tunables (I tried lowering the ARC to see if it has any effect, but from what I can see it doesn't affect the slowdowns at all):
options zfs zfs_arc_max=5120000000
options zfs zfs_arc_min=1024000000
options zfs zfs_prefetch_disable=1
options zfs zfs_txg_timeout=5
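As a side note, these particular options can also be changed at runtime through /sys, which makes it easier to experiment with ARC limits without rebuilding the initramfs or rebooting (values are the ones from the file above; the ARC may take a while to shrink after lowering zfs_arc_max):
echo 5120000000 > /sys/module/zfs/parameters/zfs_arc_max
echo 1024000000 > /sys/module/zfs/parameters/zfs_arc_min
echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable
echo 5 > /sys/module/zfs/parameters/zfs_txg_timeout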
DATA pool:
zfs get all DP1
NAME PROPERTY VALUE SOURCE
DP1 type filesystem -
DP1 creation Fri Dec 16 20:21 2016 -
DP1 used 437G -
DP1 available 423G -
DP1 referenced 194G -
DP1 compressratio 1.12x -
DP1 mounted yes -
DP1 quota none default
DP1 reservation none default
DP1 recordsize 128K default
DP1 mountpoint /DP1 default
DP1 sharenfs off default
DP1 checksum on default
DP1 compression lz4 local
DP1 atime off local
DP1 devices on default
DP1 exec on default
DP1 setuid on default
DP1 readonly off default
DP1 zoned off default
DP1 snapdir hidden default
DP1 aclinherit restricted default
DP1 createtxg 1 -
DP1 canmount on default
DP1 xattr sa local
DP1 copies 1 default
DP1 version 5 -
DP1 utf8only off -
DP1 normalization none -
DP1 casesensitivity sensitive -
DP1 vscan off default
DP1 nbmand off default
DP1 sharesmb off default
DP1 refquota none default
DP1 refreservation none default
DP1 guid 6791509815837183392 -
DP1 primarycache all default
DP1 secondarycache all default
DP1 usedbysnapshots 0B -
DP1 usedbydataset 194G -
DP1 usedbychildren 243G -
DP1 usedbyrefreservation 0B -
DP1 logbias latency default
DP1 dedup off default
DP1 mlslabel none default
DP1 sync standard default
DP1 dnodesize legacy default
DP1 refcompressratio 1.00x -
DP1 written 194G -
DP1 logicalused 487G -
DP1 logicalreferenced 193G -
DP1 volmode default default
DP1 filesystem_limit none default
DP1 snapshot_limit none default
DP1 filesystem_count none default
DP1 snapshot_count none default
DP1 snapdev hidden default
DP1 acltype off default
DP1 context none default
DP1 fscontext none default
DP1 defcontext none default
DP1 rootcontext none default
DP1 relatime off default
DP1 redundant_metadata all default
DP1 overlay off default
System Pool:
zfs get all rpool
NAME PROPERTY VALUE SOURCE
rpool type filesystem -
rpool creation Sun Dec 4 17:00 2016 -
rpool used 9.20G -
rpool available 19.6G -
rpool referenced 96K -
rpool compressratio 1.45x -
rpool mounted yes -
rpool quota none default
rpool reservation none default
rpool recordsize 128K default
rpool mountpoint /rpool default
rpool sharenfs off default
rpool checksum on default
rpool compression lz4 local
rpool atime off local
rpool devices on default
rpool exec on default
rpool setuid on default
rpool readonly off default
rpool zoned off default
rpool snapdir hidden default
rpool aclinherit restricted default
rpool createtxg 1 -
rpool canmount on default
rpool xattr sa local
rpool copies 1 default
rpool version 5 -
rpool utf8only off -
rpool normalization none -
rpool casesensitivity sensitive -
rpool vscan off default
rpool nbmand off default
rpool sharesmb off default
rpool refquota none default
rpool refreservation none default
rpool guid 2481723427000224415 -
rpool primarycache all default
rpool secondarycache all default
rpool usedbysnapshots 0B -
rpool usedbydataset 96K -
rpool usedbychildren 9.20G -
rpool usedbyrefreservation 0B -
rpool logbias latency default
rpool dedup off default
rpool mlslabel none default
rpool sync standard local
rpool dnodesize legacy default
rpool refcompressratio 1.00x -
rpool written 96K -
rpool logicalused 10.5G -
rpool logicalreferenced 40K -
rpool volmode default default
rpool filesystem_limit none default
rpool snapshot_limit none default
rpool filesystem_count none default
rpool snapshot_count none default
rpool snapdev hidden default
rpool acltype off default
rpool context none default
rpool fscontext none default
rpool defcontext none default
rpool rootcontext none default
rpool relatime off default
rpool redundant_metadata all default
rpool overlay off default
sdd & sde are main data SSD (where data are copied from)
sdf & sdg are system SSD
sda & sdb are test HDD (to where rsync should copy data)
Command: rsync -r --progress /DP1/backups/ /DT3btrfs/backups/ - I tested with a different filesystem as the target (in this case btrfs) just because I didn't have any other ideas, but it doesn't matter what the target drive is.
As you can see, the system SSDs (including swap) are hammered with data (100% utilization) during data transfers on other drives, which I do not understand...
iostat -x -d 2
Linux 4.13.13-2-pve (klaas) 01/03/2018 _x86_64_ (16 CPU)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
loop0 0.00 0.00 0.04 2.00 0.17 869.78 850.88 0.01 4.13 5.71 4.10 1.46 0.30
sda 0.00 0.00 0.00 0.00 0.00 0.00 40.94 0.00 2.98 2.98 0.00 1.96 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 40.94 0.00 3.58 3.58 0.00 2.96 0.00
sdc 0.00 2.00 0.02 0.71 0.26 108.82 297.28 0.02 22.87 7.26 23.33 9.11 0.67
sdd 0.00 0.01 12.79 39.53 794.05 645.26 55.02 0.02 0.29 0.71 0.15 0.19 0.99
sde 0.00 0.00 12.80 39.00 794.16 645.26 55.58 0.02 0.30 0.72 0.17 0.20 1.04
sdf 0.00 0.00 0.88 10.16 10.27 139.85 27.22 0.13 11.66 4.42 12.28 5.96 6.57
sdg 0.00 0.00 0.89 10.39 10.32 139.85 26.63 0.14 12.53 4.38 13.24 6.41 7.23
zd0 0.00 0.00 0.04 0.24 0.16 0.94 8.00 0.02 87.75 5.03 101.71 35.04 0.97
zd16 0.00 0.00 0.33 0.46 2.82 8.71 28.95 0.00 1.17 0.28 1.80 0.11 0.01
zd32 0.00 0.00 0.03 5.96 0.77 88.80 29.88 0.00 0.19 0.31 0.18 0.02 0.01
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
loop0 0.00 0.00 0.00 0.50 0.00 2.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.50 0.00 1.00 0.00 6.00 12.00 0.01 6.00 0.00 6.00 6.00 0.60
sdd 0.00 0.00 17.50 16.50 24.00 162.00 10.94 0.01 0.35 0.69 0.00 0.35 1.20
sde 0.00 0.00 16.50 16.50 18.00 162.00 10.91 0.01 0.30 0.61 0.00 0.30 1.00
sdf 0.00 0.50 0.50 2.50 0.00 22.00 14.67 2.70 754.67 792.00 747.20 333.33 100.00
sdg 0.00 0.00 2.50 3.00 8.00 30.00 13.82 0.39 73.45 128.00 28.00 35.64 19.60
zd0 0.00 0.00 0.00 1.50 0.00 6.00 8.00 3.99 728.00 0.00 728.00 666.67 100.00
zd16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
zd32 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 20.50 6.00 1566.00 104.00 126.04 0.01 0.30 0.39 0.00 0.23 0.60
sde 0.00 0.00 20.00 6.00 1690.00 104.00 138.00 0.01 0.46 0.40 0.67 0.38 1.00
sdf 0.00 0.50 13.50 44.50 10.00 646.00 22.62 2.93 68.03 78.67 64.81 16.97 98.40
sdg 0.50 0.50 19.00 44.00 40.00 630.00 21.27 2.85 44.41 34.74 48.59 15.24 96.00
zd0 0.00 0.00 0.00 11.00 0.00 44.00 8.00 2.59 375.45 0.00 375.45 91.09 100.20
zd16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
zd32 0.00 0.00 0.00 4.00 0.00 32.00 16.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 61.00 117.00 7028.00 3236.00 115.33 0.04 0.25 0.49 0.12 0.19 3.40
sde 0.00 0.00 40.00 84.00 4680.00 3236.00 127.68 0.07 0.55 1.20 0.24 0.40 5.00
sdf 0.00 0.50 7.00 9.50 78.00 852.00 112.73 3.64 222.18 147.71 277.05 60.61 100.00
sdg 0.00 0.00 7.00 15.50 32.00 1556.00 141.16 2.89 121.60 59.71 149.55 44.44 100.00
zd0 0.00 0.00 0.00 21.00 0.00 84.00 8.00 19.72 2074.95 0.00 2074.95 47.62 100.00
zd16 0.00 0.00 0.00 1.00 0.00 4.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00
zd32 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
loop0 0.00 0.00 0.00 1.00 0.00 4.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.50 0.00 1.50 0.00 8.00 10.67 0.02 13.33 0.00 13.33 13.33 2.00
sdd 0.00 0.00 10.50 4.00 832.00 50.00 121.66 0.01 0.41 0.57 0.00 0.28 0.40
sde 0.00 0.00 8.50 4.00 576.00 50.00 100.16 0.02 1.28 0.94 2.00 1.12 1.40
sdf 0.00 2.00 5.50 11.50 12.00 1534.00 181.88 2.76 160.59 110.18 184.70 58.82 100.00
sdg 0.00 1.50 6.00 13.00 48.00 1622.00 175.79 2.86 156.42 107.67 178.92 52.63 100.00
zd0 0.00 0.00 4.00 34.50 16.00 138.00 8.00 22.63 692.10 120.00 758.43 25.97 100.00
zd16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
zd32 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 5.50 17.00 4.00 378.00 33.96 0.00 0.09 0.36 0.00 0.09 0.20
sde 0.00 0.00 7.50 6.50 42.00 98.00 20.00 0.01 0.71 0.53 0.92 0.57 0.80
sdf 0.00 1.00 7.50 11.00 28.00 1384.00 152.65 3.16 152.65 105.60 184.73 54.05 100.00
sdg 0.00 0.50 4.00 8.00 16.00 976.00 165.33 3.36 208.00 192.50 215.75 83.33 100.00
zd0 0.00 0.00 7.00 17.50 28.00 70.00 8.00 25.68 592.65 231.71 737.03 40.82 100.00
zd16 0.00 0.00 0.00 3.50 0.00 14.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00
zd32 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
loop0 0.00 0.00 0.00 0.50 0.00 2.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.50 0.00 1.00 0.00 6.00 12.00 0.01 10.00 0.00 10.00 10.00 1.00
sdd 0.00 0.00 3.00 66.50 14.00 1308.00 38.04 0.01 0.17 1.33 0.12 0.12 0.80
sde 0.00 0.00 2.50 57.00 0.00 1588.00 53.38 0.01 0.24 1.60 0.18 0.17 1.00
sdf 0.00 0.00 1.50 1.00 6.00 128.00 107.20 3.27 1056.80 1004.00 1136.00 400.00 100.00
sdg 0.00 0.00 0.00 0.50 0.00 64.00 256.00 3.62 2176.00 0.00 2176.00 2000.00 100.00
zd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 25.00 0.00 0.00 0.00 0.00 100.00
zd16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
zd32 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Here, for example, is a log from a backup; you can see how the speed starts to drop drastically (usually down to KB/s):
INFO: status: 0% (364904448/42949672960), sparse 0% (36098048), duration 3, read/write 121/109 MB/s
INFO: status: 1% (707002368/42949672960), sparse 0% (44933120), duration 6, read/write 114/111 MB/s
INFO: status: 2% (1045299200/42949672960), sparse 0% (50085888), duration 9, read/write 112/111 MB/s
INFO: status: 3% (1436811264/42949672960), sparse 0% (54272000), duration 12, read/write 130/129 MB/s
INFO: status: 4% (1816920064/42949672960), sparse 0% (54272000), duration 15, read/write 126/126 MB/s
INFO: status: 5% (2197028864/42949672960), sparse 0% (84951040), duration 18, read/write 126/116 MB/s
INFO: status: 6% (2580938752/42949672960), sparse 0% (85008384), duration 21, read/write 127/127 MB/s
INFO: status: 7% (3085434880/42949672960), sparse 0% (85286912), duration 25, read/write 126/126 MB/s
INFO: status: 8% (3450929152/42949672960), sparse 0% (89493504), duration 28, read/write 121/120 MB/s
INFO: status: 9% (3965386752/42949672960), sparse 0% (93696000), duration 32, read/write 128/127 MB/s
INFO: status: 10% (4295229440/42949672960), sparse 0% (95772672), duration 35, read/write 109/109 MB/s
INFO: status: 11% (4774166528/42949672960), sparse 0% (116699136), duration 39, read/write 119/114 MB/s
INFO: status: 12% (5241700352/42949672960), sparse 0% (116822016), duration 43, read/write 116/116 MB/s
INFO: status: 13% (5648416768/42949672960), sparse 0% (116858880), duration 46, read/write 135/135 MB/s
INFO: status: 14% (6013321216/42949672960), sparse 0% (116965376), duration 49, read/write 121/121 MB/s
INFO: status: 15% (6553075712/42949672960), sparse 0% (143826944), duration 53, read/write 134/128 MB/s
INFO: status: 16% (6898057216/42949672960), sparse 0% (148045824), duration 62, read/write 38/37 MB/s
INFO: status: 17% (7355105280/42949672960), sparse 0% (148209664), duration 217, read/write 2/2 MB/s
Usually during a transfer I will also get:
[ 363.485362] INFO: task txg_sync:785 blocked for more than 120 seconds.
[ 363.485388] Tainted: P O 4.13.13-5-pve #1
[ 363.485406] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 363.485431] txg_sync D 0 785 2 0x00000000
[ 363.485433] Call Trace:
[ 363.485441] __schedule+0x3cc/0x850
[ 363.485443] schedule+0x36/0x80
[ 363.485446] io_schedule+0x16/0x40
[ 363.485453] cv_wait_common+0xb2/0x140 [spl]
[ 363.485456] ? wait_woken+0x80/0x80
[ 363.485460] __cv_wait_io+0x18/0x20 [spl]
[ 363.485503] zio_wait+0xfd/0x1b0 [zfs]
[ 363.485548] dsl_pool_sync+0xb8/0x440 [zfs]
[ 363.485635] spa_sync+0x42d/0xdb0 [zfs]
[ 363.485694] txg_sync_thread+0x2d4/0x4a0 [zfs]
[ 363.485719] ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[ 363.485722] thread_generic_wrapper+0x72/0x80 [spl]
[ 363.485724] kthread+0x109/0x140
[ 363.485727] ? __thread_exit+0x20/0x20 [spl]
[ 363.485728] ? kthread_create_on_node+0x70/0x70
[ 363.485728] ? kthread_create_on_node+0x70/0x70
[ 363.485730] ret_from_fork+0x1f/0x30
[ 387.639108] kauditd_printk_skb: 1 callbacks suppressed
[ 387.639109] audit: type=1400 audit(1517141349.643:29): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default-cgns" name="/" pid=10490 comm="(ionclean)" flags="rw, rslave"
[ 484.317903] INFO: task txg_sync:785 blocked for more than 120 seconds.
[ 484.317931] Tainted: P O 4.13.13-5-pve #1
[ 484.317955] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.317997] txg_sync D 0 785 2 0x00000000
[ 484.318000] Call Trace:
[ 484.318007] __schedule+0x3cc/0x850
[ 484.318009] schedule+0x36/0x80
[ 484.318012] io_schedule+0x16/0x40
[ 484.318019] cv_wait_common+0xb2/0x140 [spl]
[ 484.318022] ? wait_woken+0x80/0x80
[ 484.318026] __cv_wait_io+0x18/0x20 [spl]
[ 484.318069] zio_wait+0xfd/0x1b0 [zfs]
[ 484.318094] dsl_pool_sync+0xb8/0x440 [zfs]
[ 484.318153] spa_sync+0x42d/0xdb0 [zfs]
[ 484.318197] txg_sync_thread+0x2d4/0x4a0 [zfs]
[ 484.318222] ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
[ 484.318225] thread_generic_wrapper+0x72/0x80 [spl]
[ 484.318227] kthread+0x109/0x140
[ 484.318229] ? __thread_exit+0x20/0x20 [spl]
[ 484.318230] ? kthread_create_on_node+0x70/0x70
[ 484.318231] ? kthread_create_on_node+0x70/0x70
[ 484.318233] ret_from_fork+0x1f/0x30
[ 605.154228] INFO: task txg_sync:785 blocked for more than 120 seconds.
[ 605.154258] Tainted: P O 4.13.13-5-pve #1
[ 605.154277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
(the same trace repeats 6 more times)
EDIT: Of course I also see this in iotop:
7892 be/0 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [z_null_int]
509 be/0 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [z_null_int]
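For anyone trying to reproduce these observations, a minimal sketch of watching only the threads that are actually issuing I/O is below; the exact flags and interval are suggestions, not what was used for the output above.
# Show only active I/O, per process, with accumulated totals, refreshing every 5 seconds
iotop -o -a -P -d 5
# Extended per-device statistics at a 5-second interval
iostat -x 5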
We also had a problem with high I/O load (z_null_int). The system's root was on the same ZFS pool as the data storage for the VMs. We got rid of the problem by reinstalling the system on different disks (no longer using ZFS for the system root).
When we imported the pool on the reinstalled system (using the same ZFS version), the problem was gone and has not come back - it's been over 2 months now.
Hope this info helps somehow.
I kind of wanted to use ZFS on the root for nice, quick rollbacks after updates/upgrades.
@cooljimy84 It was working for over a year before the problem first started for me, so it's something that was updated approximately 3-5 months ago.
@vlcek did you transfer your Proxmox config without any problems? I know about the script from Dan Ziemecki, but I never actually tested whether it transfers everything.
@jkalousek we are not using Proxmox (only Debian with Xen as the hypervisor), so I cannot help you with that.
@behlendorf well, I tried to install perf (from git), but during cloning the host system stopped responding (probably because of this problem), and after 15 minutes of waiting (the VMs were running fine) I did a hard reboot of the whole server, so I probably won't be able to provide a perf log...
@behlendorf This is zfs_read for me during idle (all VMs running - CPU usage ~4%, overall system load ~1); a sketch of how this history can be captured follows the log:
23 0 0x01 500 56000 4896113145 418199471302
UID start objset object level blkid aflags origin pid process
321 362336832071 0x37 33090 0 0 0x12 27903 lxc-info
322 362336878504 0x37 237046 0 0 0x12 27903 lxc-info
323 362336887754 0x37 33090 0 0 0x12 27903 lxc-info
324 362344523315 0x37 33090 0 0 0x12 27905 lxc-info
325 362344553681 0x37 237046 0 0 0x12 27905 lxc-info
326 362344579011 0x37 33090 0 0 0x12 27905 lxc-info
327 362365912938 0x37 5760 0 0 0x12 27910 zpool
328 362370411944 0x37 5760 0 0 0x12 27911 zfs
329 362393990946 0x37 5760 0 0 0x12 27920 zpool
330 362398748410 0x37 5760 0 0 0x12 27921 zfs
331 362420954872 0x37 33090 0 0 0x12 27927 lxc-info
332 362420987507 0x37 237046 0 0 0x12 27927 lxc-info
333 362420996011 0x37 33090 0 0 0x12 27927 lxc-info
334 362428193837 0x37 33090 0 0 0x12 27929 lxc-info
335 362428223267 0x37 237046 0 0 0x12 27929 lxc-info
336 362428232016 0x37 33090 0 0 0x12 27929 lxc-info
337 362436028730 0x37 33090 0 0 0x12 27931 lxc-info
338 362436064990 0x37 237046 0 0 0x12 27931 lxc-info
339 362436076290 0x37 33090 0 0 0x12 27931 lxc-info
340 362443422212 0x37 33090 0 0 0x12 27933 lxc-info
341 362443463510 0x37 237046 0 0 0x12 27933 lxc-info
342 362443474172 0x37 33090 0 0 0x12 27933 lxc-info
343 362450862768 0x37 33090 0 0 0x12 27935 lxc-info
344 362450906817 0x37 237046 0 0 0x12 27935 lxc-info
345 362450915483 0x37 33090 0 0 0x12 27935 lxc-info
346 362458441401 0x37 33090 0 0 0x12 27937 lxc-info
347 362458468888 0x37 237046 0 0 0x12 27937 lxc-info
348 362458477171 0x37 33090 0 0 0x12 27937 lxc-info
349 363454018910 0x37 182855 0 0 0x12 28236 pvesh
350 363909431688 0x37 33090 0 0 0x12 28283 lxc-info
351 363909463555 0x37 237046 0 0 0x12 28283 lxc-info
352 363909473668 0x37 33090 0 0 0x12 28283 lxc-info
354 364534546084 0x37 33090 0 0 0x12 28299 lxc-info
355 364534578974 0x37 237046 0 0 0x12 28299 lxc-info
356 364534589350 0x37 33090 0 0 0x12 28299 lxc-info
357 366849433154 0x37 9954 0 0 0x12 20154 bash
358 366849894922 0x37 0 0 6060 0x12 20154 bash
359 366908790963 0x37 193922 0 0 0x12 28327 bash
360 367111076440 0x37 5759 0 0 0x12 28341 curl
361 367111095606 0x37 6019 0 0 0x12 28341 curl
362 367111242111 0x37 6019 0 0 0x12 28341 curl
363 368491945915 0x37 182855 0 0 0x12 28676 pvesh
364 368929074347 0x37 33090 0 0 0x12 28729 lxc-info
365 368929125414 0x37 237046 0 0 0x12 28729 lxc-info
366 368929137208 0x37 33090 0 0 0x12 28729 lxc-info
367 372445153521 0x37 182855 0 0 0x12 28783 pvesh
368 372499518460 0x37 33090 0 0 0x12 28786 lxc-info
369 372499585259 0x37 237046 0 0 0x12 28786 lxc-info
370 372499594237 0x37 33090 0 0 0x12 28786 lxc-info
371 372507740835 0x37 33090 0 0 0x12 28788 lxc-info
372 372507774559 0x37 237046 0 0 0x12 28788 lxc-info
373 372507783729 0x37 33090 0 0 0x12 28788 lxc-info
374 372515457680 0x37 33090 0 0 0x12 28790 lxc-info
375 372515488026 0x37 237046 0 0 0x12 28790 lxc-info
376 372515496816 0x37 33090 0 0 0x12 28790 lxc-info
377 372522530185 0x37 33090 0 0 0x12 28792 lxc-info
378 372522558590 0x37 237046 0 0 0x12 28792 lxc-info
379 372522567165 0x37 33090 0 0 0x12 28792 lxc-info
380 372530310337 0x37 33090 0 0 0x12 28794 lxc-info
381 372530340944 0x37 237046 0 0 0x12 28794 lxc-info
382 372530350460 0x37 33090 0 0 0x12 28794 lxc-info
383 372538138211 0x37 33090 0 0 0x12 28796 lxc-info
384 372538186134 0x37 237046 0 0 0x12 28796 lxc-info
385 372538194673 0x37 33090 0 0 0x12 28796 lxc-info
386 372560022356 0x37 5760 0 0 0x12 28803 zpool
388 372579569784 0x37 5760 0 0 0x12 28808 zpool
389 372583889373 0x37 5760 0 0 0x12 28809 zfs
390 372614085411 0x37 33090 0 0 0x12 28820 lxc-info
391 372614117611 0x37 237046 0 0 0x12 28820 lxc-info
392 372614126314 0x37 33090 0 0 0x12 28820 lxc-info
393 372621667535 0x37 33090 0 0 0x12 28822 lxc-info
394 372621699255 0x37 237046 0 0 0x12 28822 lxc-info
395 372621709058 0x37 33090 0 0 0x12 28822 lxc-info
396 372628924568 0x37 33090 0 0 0x12 28824 lxc-info
397 372628952777 0x37 237046 0 0 0x12 28824 lxc-info
398 372628961092 0x37 33090 0 0 0x12 28824 lxc-info
399 372636436553 0x37 33090 0 0 0x12 28826 lxc-info
400 372636467423 0x37 237046 0 0 0x12 28826 lxc-info
401 372636478140 0x37 33090 0 0 0x12 28826 lxc-info
402 372644363955 0x37 33090 0 0 0x12 28828 lxc-info
403 372644415366 0x37 237046 0 0 0x12 28828 lxc-info
404 372644424370 0x37 33090 0 0 0x12 28828 lxc-info
405 372651960567 0x37 33090 0 0 0x12 28830 lxc-info
406 372651990497 0x37 237046 0 0 0x12 28830 lxc-info
407 372651999868 0x37 33090 0 0 0x12 28830 lxc-info
408 372790055832 0x37 5883 0 0 0x12 28837 modprobe
409 372790070342 0x37 5883 0 0 0x12 28837 modprobe
410 372790720147 0x37 197011 0 0 0x12 28837 modprobe
411 372791213698 0x37 5883 0 0 0x12 28838 modprobe
412 372791227038 0x37 5883 0 0 0x12 28838 modprobe
413 372791811790 0x37 197011 0 0 0x12 28838 modprobe
414 372792351907 0x37 5883 0 0 0x12 28839 modprobe
415 372792364914 0x37 5883 0 0 0x12 28839 modprobe
416 372792931222 0x37 197011 0 0 0x12 28839 modprobe
417 372793438935 0x37 5883 0 0 0x12 28840 modprobe
418 372793452630 0x37 5883 0 0 0x12 28840 modprobe
419 372794024196 0x37 197011 0 0 0x12 28840 modprobe
420 372794534237 0x37 5883 0 0 0x12 28841 modprobe
422 372795240218 0x37 197011 0 0 0x12 28841 modprobe
423 372901449187 0x37 33090 0 0 0x12 28844 lxc-info
424 372901482174 0x37 237046 0 0 0x12 28844 lxc-info
425 372901492740 0x37 33090 0 0 0x12 28844 lxc-info
426 373788275199 0x37 5883 0 0 0x12 29285 modprobe
427 373788292002 0x37 5883 0 0 0x12 29285 modprobe
428 373789059196 0x37 197011 0 0 0x12 29285 modprobe
429 373789670232 0x37 5883 0 0 0x12 29286 modprobe
430 373789682429 0x37 5883 0 0 0x12 29286 modprobe
431 373790303778 0x37 197011 0 0 0x12 29286 modprobe
432 373790843311 0x37 5883 0 0 0x12 29288 modprobe
433 373790855193 0x37 5883 0 0 0x12 29288 modprobe
434 373791462469 0x37 197011 0 0 0x12 29288 modprobe
435 373791965308 0x37 5883 0 0 0x12 29289 modprobe
436 373791977558 0x37 5883 0 0 0x12 29289 modprobe
437 373792523031 0x37 197011 0 0 0x12 29289 modprobe
438 373793007780 0x37 5883 0 0 0x12 29290 modprobe
439 373793019252 0x37 5883 0 0 0x12 29290 modprobe
440 373793523140 0x37 197011 0 0 0x12 29290 modprobe
441 376776933666 0x37 5883 0 0 0x12 29330 modprobe
442 376776949458 0x37 5883 0 0 0x12 29330 modprobe
443 376777496061 0x37 197011 0 0 0x12 29330 modprobe
444 376778138966 0x37 5883 0 0 0x12 29331 modprobe
445 376778151263 0x37 5883 0 0 0x12 29331 modprobe
446 376778652781 0x37 197011 0 0 0x12 29331 modprobe
447 376779195727 0x37 5883 0 0 0x12 29332 modprobe
448 376779208084 0x37 5883 0 0 0x12 29332 modprobe
449 376779726323 0x37 197011 0 0 0x12 29332 modprobe
450 376780347706 0x37 5883 0 0 0x12 29333 modprobe
451 376780359790 0x37 5883 0 0 0x12 29333 modprobe
452 376780944766 0x37 197011 0 0 0x12 29333 modprobe
453 376781553089 0x37 5883 0 0 0x12 29334 modprobe
454 376781566104 0x37 5883 0 0 0x12 29334 modprobe
456 376784834054 0x37 5883 0 0 0x12 29335 modprobe
457 376784862236 0x37 5883 0 0 0x12 29335 modprobe
458 376785359986 0x37 197011 0 0 0x12 29335 modprobe
459 376785937950 0x37 5883 0 0 0x12 29336 modprobe
460 376785950643 0x37 5883 0 0 0x12 29336 modprobe
461 376786478919 0x37 197011 0 0 0x12 29336 modprobe
462 376787123313 0x37 5883 0 0 0x12 29337 modprobe
463 376787135412 0x37 5883 0 0 0x12 29337 modprobe
464 376787635440 0x37 197011 0 0 0x12 29337 modprobe
465 376788250693 0x37 5883 0 0 0x12 29338 modprobe
466 376788263730 0x37 5883 0 0 0x12 29338 modprobe
467 376788807890 0x37 197011 0 0 0x12 29338 modprobe
468 376789357625 0x37 5883 0 0 0x12 29339 modprobe
469 376789369711 0x37 5883 0 0 0x12 29339 modprobe
470 376789898642 0x37 197011 0 0 0x12 29339 modprobe
471 376792565885 0x37 5883 0 0 0x12 29340 modprobe
472 376792578938 0x37 5883 0 0 0x12 29340 modprobe
473 376793184728 0x37 197011 0 0 0x12 29340 modprobe
474 376793721845 0x37 5883 0 0 0x12 29341 modprobe
475 376793733954 0x37 5883 0 0 0x12 29341 modprobe
476 376794576561 0x37 197011 0 0 0x12 29341 modprobe
477 376795198536 0x37 5883 0 0 0x12 29342 modprobe
478 376795212892 0x37 5883 0 0 0x12 29342 modprobe
479 376795813063 0x37 197011 0 0 0x12 29342 modprobe
480 376796382845 0x37 5883 0 0 0x12 29343 modprobe
481 376796395273 0x37 5883 0 0 0x12 29343 modprobe
482 376796927031 0x37 197011 0 0 0x12 29343 modprobe
483 376797586208 0x37 5883 0 0 0x12 29344 modprobe
484 376797598424 0x37 5883 0 0 0x12 29344 modprobe
485 376798117815 0x37 197011 0 0 0x12 29344 modprobe
486 377095243443 0x37 200711 0 6 0x12 7423 cfs_loop
487 378778664191 0x37 282046 0 0 0x12 29588 (pvesr)
488 378779919145 0x37 282046 0 0 0x12 29588 pvesr
490 378784436303 0x37 5883 0 0 0x12 29593 modprobe
491 378785604363 0x37 197011 0 0 0x12 29593 modprobe
492 378786286390 0x37 5883 0 0 0x12 29594 modprobe
493 378786300151 0x37 5883 0 0 0x12 29594 modprobe
494 378787016749 0x37 197011 0 0 0x12 29594 modprobe
495 378787549344 0x37 5883 0 0 0x12 29595 modprobe
496 378787561771 0x37 5883 0 0 0x12 29595 modprobe
497 378788187992 0x37 197011 0 0 0x12 29595 modprobe
498 378788666023 0x37 5883 0 0 0x12 29596 modprobe
499 378788678474 0x37 5883 0 0 0x12 29596 modprobe
500 378789330251 0x37 197011 0 0 0x12 29596 modprobe
501 378789780076 0x37 5883 0 0 0x12 29597 modprobe
502 378789793203 0x37 5883 0 0 0x12 29597 modprobe
503 378790406637 0x37 197011 0 0 0x12 29597 modprobe
504 378888306535 0x37 182855 0 0 0x12 29586 pvesh
505 378936547975 0x37 182855 0 0 0x12 29588 pvesr
506 379234592929 0x37 229676 0 0 0x12 29588 pvesr
507 379316741405 0x37 182855 0 0 0x12 29623 pvesh
508 379357601449 0x37 33090 0 0 0x12 29656 lxc-info
509 379357634855 0x37 237046 0 0 0x12 29656 lxc-info
510 379357645984 0x37 33090 0 0 0x12 29656 lxc-info
511 379781607065 0x37 33090 0 0 0x12 29664 lxc-info
512 379781640012 0x37 237046 0 0 0x12 29664 lxc-info
513 379781650518 0x37 33090 0 0 0x12 29664 lxc-info
514 379949499840 0x37 182855 0 0 0x12 29667 pvesh
515 380411086569 0x37 33090 0 0 0x12 29727 lxc-info
516 380411120076 0x37 237046 0 0 0x12 29727 lxc-info
517 380411130753 0x37 33090 0 0 0x12 29727 lxc-info
518 380773125922 0x37 5883 0 0 0x12 29759 modprobe
519 380773141820 0x37 5883 0 0 0x12 29759 modprobe
520 380773760739 0x37 197011 0 0 0x12 29759 modprobe
521 380774360160 0x37 5883 0 0 0x12 29760 modprobe
522 380774372763 0x37 5883 0 0 0x12 29760 modprobe
524 380775526843 0x37 5883 0 0 0x12 29761 modprobe
525 380775540409 0x37 5883 0 0 0x12 29761 modprobe
526 380776070026 0x37 197011 0 0 0x12 29761 modprobe
527 380776565179 0x37 5883 0 0 0x12 29762 modprobe
528 380776577782 0x37 5883 0 0 0x12 29762 modprobe
529 380777102377 0x37 197011 0 0 0x12 29762 modprobe
530 380777566546 0x37 5883 0 0 0x12 29763 modprobe
531 380777579325 0x37 5883 0 0 0x12 29763 modprobe
532 380778121558 0x37 197011 0 0 0x12 29763 modprobe
533 381205272749 0x37 139446 0 0 0x12 29778 mysqladmin
534 381205284907 0x37 139446 0 0 0x12 29778 mysqladmin
535 381205304456 0x37 187647 0 0 0x12 29778 mysqladmin
536 381205320925 0x37 187648 0 0 0x12 29778 mysqladmin
537 381205336032 0x37 194119 0 0 0x12 29778 mysqladmin
538 381205359599 0x37 194119 0 0 0x12 29778 mysqladmin
539 381333392668 0x37 182855 0 0 0x12 29774 pvesh
540 381620898720 0x37 5759 0 0 0x12 29791 nc
541 381620917811 0x37 6019 0 0 0x12 29791 nc
542 381621102743 0x37 6019 0 0 0x12 29791 nc
543 381943989823 0x37 182855 0 0 0x12 29797 pvesh
544 382692450943 0x37 33090 0 0 0x12 29812 lxc-info
545 382692484975 0x37 237046 0 0 0x12 29812 lxc-info
546 382692494051 0x37 33090 0 0 0x12 29812 lxc-info
547 382699969330 0x37 33090 0 0 0x12 29816 lxc-info
548 382699996451 0x37 237046 0 0 0x12 29816 lxc-info
549 382700004966 0x37 33090 0 0 0x12 29816 lxc-info
550 382707380365 0x37 33090 0 0 0x12 29818 lxc-info
551 382707408294 0x37 237046 0 0 0x12 29818 lxc-info
552 382707416851 0x37 33090 0 0 0x12 29818 lxc-info
553 382715085902 0x37 33090 0 0 0x12 29820 lxc-info
554 382715116189 0x37 237046 0 0 0x12 29820 lxc-info
555 382715125014 0x37 33090 0 0 0x12 29820 lxc-info
556 382722860781 0x37 33090 0 0 0x12 29823 lxc-info
558 382722900756 0x37 33090 0 0 0x12 29823 lxc-info
559 382731696093 0x37 33090 0 0 0x12 29825 lxc-info
560 382731730452 0x37 237046 0 0 0x12 29825 lxc-info
561 382731739636 0x37 33090 0 0 0x12 29825 lxc-info
562 382782156483 0x37 5760 0 0 0x12 29911 zpool
563 382787532977 0x37 5760 0 0 0x12 29912 zfs
564 382792210161 0x37 5760 0 0 0x12 29913 zpool
565 382796900710 0x37 5760 0 0 0x12 29914 zfs
566 382812879708 0x37 33090 0 0 0x12 29919 lxc-info
567 382812911286 0x37 237046 0 0 0x12 29919 lxc-info
568 382812919841 0x37 33090 0 0 0x12 29919 lxc-info
569 382820159321 0x37 33090 0 0 0x12 29921 lxc-info
570 382820187122 0x37 237046 0 0 0x12 29921 lxc-info
571 382820195635 0x37 33090 0 0 0x12 29921 lxc-info
572 382826695778 0x37 33090 0 0 0x12 29923 lxc-info
573 382826726290 0x37 237046 0 0 0x12 29923 lxc-info
574 382826735973 0x37 33090 0 0 0x12 29923 lxc-info
575 382833616123 0x37 33090 0 0 0x12 29925 lxc-info
576 382833643636 0x37 237046 0 0 0x12 29925 lxc-info
577 382833652375 0x37 33090 0 0 0x12 29925 lxc-info
578 382840578560 0x37 33090 0 0 0x12 29927 lxc-info
579 382840604567 0x37 237046 0 0 0x12 29927 lxc-info
580 382840612429 0x37 33090 0 0 0x12 29927 lxc-info
581 382847571322 0x37 33090 0 0 0x12 29929 lxc-info
582 382847614008 0x37 237046 0 0 0x12 29929 lxc-info
583 382847622459 0x37 33090 0 0 0x12 29929 lxc-info
584 383702656834 0x0 99 0 2 0x12 1034 txg_sync
585 383796299184 0x37 5883 0 0 0x12 30199 modprobe
586 383796314400 0x37 5883 0 0 0x12 30199 modprobe
587 383796945766 0x37 197011 0 0 0x12 30199 modprobe
588 383797384307 0x37 5883 0 0 0x12 30200 modprobe
589 383797396393 0x37 5883 0 0 0x12 30200 modprobe
590 383797967741 0x37 197011 0 0 0x12 30200 modprobe
592 383798405395 0x37 5883 0 0 0x12 30201 modprobe
593 383799037836 0x37 197011 0 0 0x12 30201 modprobe
594 383799512418 0x37 5883 0 0 0x12 30202 modprobe
595 383799525474 0x37 5883 0 0 0x12 30202 modprobe
596 383800118140 0x37 197011 0 0 0x12 30202 modprobe
597 383800522893 0x37 5883 0 0 0x12 30203 modprobe
598 383800534509 0x37 5883 0 0 0x12 30203 modprobe
599 383801108176 0x37 197011 0 0 0x12 30203 modprobe
600 383956702526 0x0 99 0 2 0x12 1034 txg_sync
601 390107328945 0x37 182855 0 0 0x12 30972 pvesh
602 392889479514 0x37 33090 0 0 0x12 31024 lxc-info
603 392889514022 0x37 237046 0 0 0x12 31024 lxc-info
604 392889523448 0x37 33090 0 0 0x12 31024 lxc-info
605 392897272189 0x37 33090 0 0 0x12 31026 lxc-info
606 392897303381 0x37 237046 0 0 0x12 31026 lxc-info
607 392897312187 0x37 33090 0 0 0x12 31026 lxc-info
608 392905313305 0x37 33090 0 0 0x12 31028 lxc-info
609 392905346870 0x37 237046 0 0 0x12 31028 lxc-info
610 392905356137 0x37 33090 0 0 0x12 31028 lxc-info
611 392913396048 0x37 33090 0 0 0x12 31030 lxc-info
612 392913431620 0x37 237046 0 0 0x12 31030 lxc-info
613 392913441008 0x37 33090 0 0 0x12 31030 lxc-info
614 392920629927 0x37 33090 0 0 0x12 31032 lxc-info
615 392920660962 0x37 237046 0 0 0x12 31032 lxc-info
616 392920669929 0x37 33090 0 0 0x12 31032 lxc-info
617 392927636889 0x37 33090 0 0 0x12 31034 lxc-info
618 392927665458 0x37 237046 0 0 0x12 31034 lxc-info
619 392927673943 0x37 33090 0 0 0x12 31034 lxc-info
620 392927694872 0x37 182855 0 0 0x12 31020 pvesh
621 392953592946 0x37 5760 0 0 0x12 31041 zpool
622 392958014123 0x37 5760 0 0 0x12 31048 zfs
623 392991699821 0x37 5760 0 0 0x12 31065 zpool
624 392996003780 0x37 5760 0 0 0x12 31066 zfs
626 393002729434 0x37 237046 0 0 0x12 31067 lxc-info
627 393002738382 0x37 33090 0 0 0x12 31067 lxc-info
628 393010662339 0x37 33090 0 0 0x12 31069 lxc-info
629 393010693834 0x37 237046 0 0 0x12 31069 lxc-info
630 393010702636 0x37 33090 0 0 0x12 31069 lxc-info
631 393019014156 0x37 33090 0 0 0x12 31071 lxc-info
632 393019066472 0x37 237046 0 0 0x12 31071 lxc-info
633 393019076155 0x37 33090 0 0 0x12 31071 lxc-info
634 393027479304 0x37 33090 0 0 0x12 31073 lxc-info
635 393027509064 0x37 237046 0 0 0x12 31073 lxc-info
636 393027517637 0x37 33090 0 0 0x12 31073 lxc-info
637 393035757895 0x37 33090 0 0 0x12 31075 lxc-info
638 393035786634 0x37 237046 0 0 0x12 31075 lxc-info
639 393035794307 0x37 33090 0 0 0x12 31075 lxc-info
640 393043154654 0x37 33090 0 0 0x12 31077 lxc-info
641 393043183874 0x37 237046 0 0 0x12 31077 lxc-info
642 393043192723 0x37 33090 0 0 0x12 31077 lxc-info
643 393385578954 0x37 33090 0 0 0x12 31086 lxc-info
644 393385629534 0x37 237046 0 0 0x12 31086 lxc-info
645 393385640858 0x37 33090 0 0 0x12 31086 lxc-info
646 393461811029 0x37 182855 0 0 0x12 31082 pvesh
647 393552308491 0x37 182855 0 0 0x12 31089 pvesh
648 394016534660 0x37 33090 0 0 0x12 31301 lxc-info
649 394016570732 0x37 237046 0 0 0x12 31301 lxc-info
650 394016582094 0x37 33090 0 0 0x12 31301 lxc-info
651 394020419488 0x37 182855 0 0 0x12 31290 pvesh
652 394458880926 0x37 33090 0 0 0x12 31311 lxc-info
653 394458915665 0x37 237046 0 0 0x12 31311 lxc-info
654 394458926509 0x37 33090 0 0 0x12 31311 lxc-info
655 394463769075 0x37 203206 0 0 0x12 31290 pvesh
656 394463795682 0x37 203206 0 0 0x12 31290 pvesh
657 394620198175 0x37 182855 0 0 0x12 31314 pvesh
658 395053336734 0x37 33090 0 0 0x12 31366 lxc-info
660 395053381839 0x37 33090 0 0 0x12 31366 lxc-info
661 396194503791 0x37 182855 0 0 0x12 31444 pvesh
662 396638925014 0x37 33090 0 0 0x12 31458 lxc-info
663 396638958874 0x37 237046 0 0 0x12 31458 lxc-info
664 396638969428 0x37 33090 0 0 0x12 31458 lxc-info
665 396809968049 0x37 182855 0 0 0x12 31461 pvesh
666 397185758215 0x37 5759 0 0 0x12 31477 curl
667 397185780127 0x37 6019 0 0 0x12 31477 curl
668 397185929217 0x37 6019 0 0 0x12 31477 curl
669 397252092377 0x37 33090 0 0 0x12 31479 lxc-info
670 397252125380 0x37 237046 0 0 0x12 31479 lxc-info
671 397252136077 0x37 33090 0 0 0x12 31479 lxc-info
672 397763922935 0x37 5883 0 0 0x12 31484 modprobe
673 397763939615 0x37 5883 0 0 0x12 31484 modprobe
674 397764544866 0x37 197011 0 0 0x12 31484 modprobe
675 397765159979 0x37 5883 0 0 0x12 31485 modprobe
676 397765172697 0x37 5883 0 0 0x12 31485 modprobe
677 397765674210 0x37 197011 0 0 0x12 31485 modprobe
678 397766280019 0x37 5883 0 0 0x12 31486 modprobe
679 397766292380 0x37 5883 0 0 0x12 31486 modprobe
680 397766790874 0x37 197011 0 0 0x12 31486 modprobe
681 397767353642 0x37 5883 0 0 0x12 31487 modprobe
682 397767366316 0x37 5883 0 0 0x12 31487 modprobe
683 397767900398 0x37 197011 0 0 0x12 31487 modprobe
684 397768419832 0x37 5883 0 0 0x12 31488 modprobe
685 397768432157 0x37 5883 0 0 0x12 31488 modprobe
686 397769008619 0x37 197011 0 0 0x12 31488 modprobe
687 398023511967 0x37 182855 0 0 0x12 31492 pvesh
688 398494912162 0x37 33090 0 0 0x12 31514 lxc-info
689 398494943302 0x37 237046 0 0 0x12 31514 lxc-info
690 398494952903 0x37 33090 0 0 0x12 31514 lxc-info
691 398663276662 0x37 182855 0 0 0x12 31517 pvesh
692 399109475680 0x37 182855 0 0 0x12 31540 pvesh
694 399145233218 0x37 237046 0 0 0x12 31647 lxc-info
695 399145242751 0x37 33090 0 0 0x12 31647 lxc-info
696 399545725606 0x37 33090 0 0 0x12 31652 lxc-info
697 399545826071 0x37 237046 0 0 0x12 31652 lxc-info
698 399545838454 0x37 33090 0 0 0x12 31652 lxc-info
699 399551224442 0x37 203206 0 0 0x12 31540 pvesh
700 399551233093 0x37 203206 0 0 0x12 31540 pvesh
701 399701115862 0x37 182855 0 0 0x12 31667 pvesh
702 400162088768 0x37 33090 0 0 0x12 31753 lxc-info
703 400162136009 0x37 237046 0 0 0x12 31753 lxc-info
704 400162145635 0x37 33090 0 0 0x12 31753 lxc-info
705 402083374905 0x37 33090 0 0 0x12 31886 lxc-info
706 402083425933 0x37 237046 0 0 0x12 31886 lxc-info
707 402083435356 0x37 33090 0 0 0x12 31886 lxc-info
708 402090948705 0x37 33090 0 0 0x12 31888 lxc-info
709 402090976854 0x37 237046 0 0 0x12 31888 lxc-info
710 402090985602 0x37 33090 0 0 0x12 31888 lxc-info
711 402100302748 0x37 33090 0 0 0x12 31890 lxc-info
712 402100354253 0x37 237046 0 0 0x12 31890 lxc-info
713 402100368917 0x37 33090 0 0 0x12 31890 lxc-info
714 402108548237 0x37 33090 0 0 0x12 31892 lxc-info
715 402108582375 0x37 237046 0 0 0x12 31892 lxc-info
716 402108592845 0x37 33090 0 0 0x12 31892 lxc-info
717 402116129903 0x37 33090 0 0 0x12 31894 lxc-info
718 402116159277 0x37 237046 0 0 0x12 31894 lxc-info
719 402116167817 0x37 33090 0 0 0x12 31894 lxc-info
720 402123224649 0x37 33090 0 0 0x12 31896 lxc-info
721 402123266844 0x37 237046 0 0 0x12 31896 lxc-info
722 402123275122 0x37 33090 0 0 0x12 31896 lxc-info
723 402143046060 0x37 5760 0 0 0x12 31902 zpool
724 402147539776 0x37 5760 0 0 0x12 31903 zfs
725 402161483196 0x37 5760 0 0 0x12 31907 zpool
726 402166381204 0x37 5760 0 0 0x12 31908 zfs
728 402196276073 0x37 237046 0 0 0x12 31920 lxc-info
729 402196284725 0x37 33090 0 0 0x12 31920 lxc-info
730 402203697886 0x37 33090 0 0 0x12 31922 lxc-info
731 402203726669 0x37 237046 0 0 0x12 31922 lxc-info
732 402203735025 0x37 33090 0 0 0x12 31922 lxc-info
733 402210982310 0x37 33090 0 0 0x12 31924 lxc-info
734 402211008680 0x37 237046 0 0 0x12 31924 lxc-info
735 402211016766 0x37 33090 0 0 0x12 31924 lxc-info
736 402218665352 0x37 33090 0 0 0x12 31926 lxc-info
737 402218749874 0x37 237046 0 0 0x12 31926 lxc-info
738 402218759581 0x37 33090 0 0 0x12 31926 lxc-info
739 402226344987 0x37 33090 0 0 0x12 31928 lxc-info
740 402226373190 0x37 237046 0 0 0x12 31928 lxc-info
741 402226382087 0x37 33090 0 0 0x12 31928 lxc-info
742 402233758101 0x37 33090 0 0 0x12 31930 lxc-info
743 402233792094 0x37 237046 0 0 0x12 31930 lxc-info
744 402233801927 0x37 33090 0 0 0x12 31930 lxc-info
745 404110315459 0x37 182855 0 0 0x12 31959 pvesh
746 404563094638 0x37 33090 0 0 0x12 32075 lxc-info
747 404563127505 0x37 237046 0 0 0x12 32075 lxc-info
748 404563138052 0x37 33090 0 0 0x12 32075 lxc-info
749 406565812513 0x37 182855 0 0 0x12 32214 pvesh
750 406996114050 0x37 33090 0 0 0x12 32225 lxc-info
751 406996147099 0x37 237046 0 0 0x12 32225 lxc-info
752 406996157127 0x37 33090 0 0 0x12 32225 lxc-info
753 407156004627 0x37 182855 0 0 0x12 32228 pvesh
754 407580610461 0x37 33090 0 0 0x12 32235 lxc-info
755 407580643882 0x37 237046 0 0 0x12 32235 lxc-info
756 407580654101 0x37 33090 0 0 0x12 32235 lxc-info
757 408063275810 0x37 182855 0 0 0x12 32242 pvesh
758 408491940820 0x37 33090 0 0 0x12 32251 lxc-info
759 408491972322 0x37 237046 0 0 0x12 32251 lxc-info
760 408491982333 0x37 33090 0 0 0x12 32251 lxc-info
762 410035648749 0x37 33090 0 0 0x12 32550 lxc-info
763 410035683387 0x37 237046 0 0 0x12 32550 lxc-info
764 410035694016 0x37 33090 0 0 0x12 32550 lxc-info
765 410209241770 0x37 182855 0 0 0x12 32559 pvesh
766 410674729152 0x37 33090 0 0 0x12 32620 lxc-info
767 410674761812 0x37 237046 0 0 0x12 32620 lxc-info
768 410674772156 0x37 33090 0 0 0x12 32620 lxc-info
769 411207471662 0x37 139446 0 0 0x12 32629 mysqladmin
770 411207482840 0x37 139446 0 0 0x12 32629 mysqladmin
771 411207502090 0x37 187647 0 0 0x12 32629 mysqladmin
772 411207519962 0x37 187648 0 0 0x12 32629 mysqladmin
773 411207537077 0x37 194119 0 0 0x12 32629 mysqladmin
774 411207543977 0x37 194119 0 0 0x12 32629 mysqladmin
775 412274574843 0x37 33090 0 0 0x12 32644 lxc-info
776 412274611483 0x37 237046 0 0 0x12 32644 lxc-info
777 412274620736 0x37 33090 0 0 0x12 32644 lxc-info
778 412282876902 0x37 33090 0 0 0x12 32646 lxc-info
779 412282928259 0x37 237046 0 0 0x12 32646 lxc-info
780 412282937515 0x37 33090 0 0 0x12 32646 lxc-info
781 412291290224 0x37 33090 0 0 0x12 32648 lxc-info
782 412291322778 0x37 237046 0 0 0x12 32648 lxc-info
783 412291347060 0x37 33090 0 0 0x12 32648 lxc-info
784 412298440076 0x37 33090 0 0 0x12 32650 lxc-info
785 412298465593 0x37 237046 0 0 0x12 32650 lxc-info
786 412298473114 0x37 33090 0 0 0x12 32650 lxc-info
787 412305593990 0x37 33090 0 0 0x12 32652 lxc-info
788 412305652469 0x37 237046 0 0 0x12 32652 lxc-info
789 412305661007 0x37 33090 0 0 0x12 32652 lxc-info
790 412312858824 0x37 33090 0 0 0x12 32654 lxc-info
791 412312887206 0x37 237046 0 0 0x12 32654 lxc-info
792 412312895641 0x37 33090 0 0 0x12 32654 lxc-info
793 412324108698 0x37 5760 0 0 0x12 32656 zpool
794 412328982887 0x37 5760 0 0 0x12 32657 zfs
796 412363929826 0x37 5760 0 0 0x12 32670 zfs
797 412386082736 0x37 33090 0 0 0x12 32676 lxc-info
798 412386116648 0x37 237046 0 0 0x12 32676 lxc-info
799 412386125216 0x37 33090 0 0 0x12 32676 lxc-info
800 412394026806 0x37 33090 0 0 0x12 32678 lxc-info
801 412394058436 0x37 237046 0 0 0x12 32678 lxc-info
802 412394066958 0x37 33090 0 0 0x12 32678 lxc-info
803 412402470876 0x37 33090 0 0 0x12 32680 lxc-info
804 412402503387 0x37 237046 0 0 0x12 32680 lxc-info
805 412402512441 0x37 33090 0 0 0x12 32680 lxc-info
806 412409750160 0x37 33090 0 0 0x12 32682 lxc-info
807 412409777332 0x37 237046 0 0 0x12 32682 lxc-info
808 412409785483 0x37 33090 0 0 0x12 32682 lxc-info
809 412416928219 0x37 33090 0 0 0x12 32684 lxc-info
810 412416958049 0x37 237046 0 0 0x12 32684 lxc-info
811 412416967291 0x37 33090 0 0 0x12 32684 lxc-info
812 412424239951 0x37 33090 0 0 0x12 32686 lxc-info
813 412424268671 0x37 237046 0 0 0x12 32686 lxc-info
814 412424277304 0x37 33090 0 0 0x12 32686 lxc-info
815 414521519827 0x37 182855 0 0 0x12 430 pvesh
816 414958523769 0x37 33090 0 0 0x12 458 lxc-info
817 414958555211 0x37 237046 0 0 0x12 458 lxc-info
818 414958564632 0x37 33090 0 0 0x12 458 lxc-info
819 416325456124 0x37 182855 0 0 0x12 637 pvesh
820 416914501002 0x37 182855 0 0 0x12 645 pvesh
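For context, this output appears to be the per-pool ZFS read-history kstat. A minimal sketch of enabling and dumping it follows; the pool name and the history depth of 500 are assumptions for illustration only.
# Keep the last 500 reads per pool (0 disables the history)
echo 500 > /sys/module/zfs/parameters/zfs_read_history
# Optionally include cache hits in the history as well
echo 1 > /sys/module/zfs/parameters/zfs_read_history_hits
# Dump the per-pool read history (replace "tank" with your pool name)
cat /proc/spl/kstat/zfs/tank/reads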
Just a simple report back:
I upgraded to 0.7.6 and have had the system running for 24 hours.
z_null_int no longer shows up in my iotop, and the ARC hit rate is also fixed in my case.
A simple report back as well (even though I didn't participate in the conversation before).
I still see z_null_int at 99.99% in iotop using
v0.7.6-1, ZFS pool version 5000, ZFS filesystem version 5
After upgrading to 0.7.6-1 on Proxmox (instead of the patched 0.7.4) on Friday, I can say that I also still see z_null_int in iotop, but after two nights full of backups the slowdowns and freezing have not occurred (unlike every backup in the past 3 months), so I will keep testing through this week and report back if I hit any performance problems.
Cherry-picking e9a77290081b578144e9a911ad734f67274df82f may yield further improvements depending on your exact kernel version. This one is also queued up for 0.7.7 (#7238).
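For anyone building from source, a rough sketch of applying that commit on top of a release tag might look like the following; the base tag and build steps are assumptions about a typical source build, not instructions from the maintainers.
# Fetch the ZFS source and start from the 0.7.6 release tag (assumed base)
git clone https://github.com/zfsonlinux/zfs.git && cd zfs
git checkout -b local-0.7.6 zfs-0.7.6
# Pull in the referenced commit
git cherry-pick e9a77290081b578144e9a911ad734f67274df82f
# Rebuild and install the modules (for 0.7.x the matching SPL must already be built and installed)
./autogen.sh && ./configure && make -j"$(nproc)" && sudo make install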
for the PVE users in this issue: https://github.com/zfsonlinux/zfs/issues/7235#issuecomment-368531631 contains information about a test build of the PVE kernel with commit e9a7729 / PR#7170 backported
I also ran into this problem with kernel 4.13.0-37 on Ubuntu 16.04, using zfs master with DKMS modules.
I also have kernel 4.4.0-97 installed.
Booting into 4.4.0-97 with the same version of ZFS, the problem went away.
Now in iotop the processes are txg_sync and l2arc_feed, which bounce around as expected; they were never listed with the 4.13 kernel.
The biggest difference is that z_null_int doesn't show up in iotop at all.
z_null_iss, z_wr_iss, and z_wr_int_0-6 weren't showing up either while this issue was present.
I can switch back and forth easily and will try to upload screenshots from iotop later this evening.
Update: the problem started with 4.11 kernels; it works fine with 4.10 and below.
@benbennett thanks for the additional information, that helps. To be clear, you were running master with commit e9a7729 and still observe the issue with 4.11 and newer kernels?
Could you summarize a minimal environment and pool config needed to reproduce the issue? Is a clean Ubuntu 16.04 install with a 4.11 kernel from the mainline PPA sufficient?
@behlendorf I am on master at cec3a0a1bb08574260a9e39301a24ce79dbfb69a, which contains the commit. I'm wondering whether any of the Meltdown/Spectre patches are affecting it, but disabling them with the nopti noretpoline options has no effect. I will have to check whether 4.10 has the KPTI/spectre_v2 changes and the Intel firmware installed.
Here are the kernels I tested; all were installed from the xenial-updates and xenial-security repos.
You have to install them manually; I just used synaptic and picked the highest version of each mainline kernel. For some reason I also had to tell it to install the headers.
The pool is raidz2 (4 SAS drives and 2 SATA drives, 2 TB each), with Samsung SSDs acting as the log and cache (2 of them). The only odd thing in my setup is that the two SSDs are in an mdadm RAID0 array, partitioned into a cache device and a log device.
Oh, and it is a clean 16.04 install.
/boot/initrd.img-4.10.0-42-lowlatency /boot/initrd.img-4.13.0-37-lowlatency
/boot/initrd.img-4.11.0-14-lowlatency
linux-image-4.11.0-14-lowlatency:
500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
ZFS module parameters (a sketch of how to apply them follows the list):
options zfs zfs_prefetch_disable=1
options zfs l2arc_feed_again=1
options zfs l2arc_feed_secs=1
options zfs l2arc_headroom=2
options zfs l2arc_headroom_boost=200
options zfs l2arc_noprefetch=0
options zfs l2arc_norw=0
options zfs l2arc_write_boost=600100100
options zfs l2arc_write_max=1100100100
options zfs zfs_arc_max=18100100100
options zfs zfs_arc_min=10100100100
options zfs zfs_dirty_data_max_max_percent=35
options zfs zfs_dirty_data_max_percent=30
options zfs zfs_immediate_write_sz=10100100100
options zfs zfs_per_txg_dirty_frees_percent=40
options zfs zfs_vdev_aggregation_limit=16777216
options zfs zfs_vdev_async_read_max_active=12
options zfs zfs_vdev_async_read_min_active=5
options zfs zfs_vdev_async_write_active_max_dirty_percent=70
options zfs zfs_vdev_async_write_active_min_dirty_percent=20
options zfs zfs_vdev_async_write_max_active=16
options zfs zfs_vdev_async_write_min_active=5
options zfs zfs_vdev_scheduler=deadline
options zfs zfs_vdev_scrub_max_active=2
options zfs zfs_vdev_scrub_min_active=1
options zfs zfs_vdev_sync_read_max_active=16
options zfs zfs_vdev_sync_read_min_active=10
options zfs zfs_vdev_sync_write_max_active=16
options zfs zfs_vdev_sync_write_min_active=10
options zfs zio_dva_throttle_enabled=0
options zfs zvol_threads=32
options zfs zfs_nocacheflush=1
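For completeness, a minimal sketch of applying options like these on a Debian-based system follows; the file name is the conventional one and whether ZFS is loaded from the initramfs on your setup is an assumption.
# Persist one example option for the next module load (conventional location)
echo "options zfs zfs_prefetch_disable=1" | sudo tee -a /etc/modprobe.d/zfs.conf
# If ZFS is loaded from the initramfs, regenerate it so the options take effect at boot
sudo update-initramfs -u
# Verify the value the running module actually picked up
cat /sys/module/zfs/parameters/zfs_prefetch_disable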
The issue got solved for me with:
root@vmx02:~# uname -a
Linux vmx02 4.13.16-2-pve #1 SMP PVE 4.13.16-47 (Mon, 9 Apr 2018 09:58:12 +0200) x86_64 GNU/Linux
root@vmx02:~# apt-cache policy zfs-initramfs
zfs-initramfs:
Installed: 0.7.7-pve1~bpo9
Candidate: 0.7.7-pve1~bpo9
Version table:
*** 0.7.7-pve1~bpo9 500
500 http://download.proxmox.com/debian/pve stretch/pve-no-subscription amd64 Packages
100 /var/lib/dpkg/status
I cannot say whether it was the kernel update or the zfs update :/
A further data point from the proxmox forums (https://forum.proxmox.com/threads/z_null_int-with-99-99-io-load-after-5-1-upgrade.38136/page-3#post-207766):
I just want to add that I needed to upgrade to 4.13.16-2-pve for the 0.7.7 fix to work.
A pure ZoL upgrade didn't work until after the kernel upgrade to the latest.
Even the relatively new 4.13.13-6-pve kernel didn't work.
So maybe the kernel got some relevant fixes too?
On Sun, Apr 29, 2018 at 11:57:34PM -0700, Florian Apolloner wrote:
in PVE the kernel contains the SPL and ZFS modules, so yes, of course the kernel package version is relevant too ;)
Oh, with all the zfs packages on PVE I got a little bit confused; my mistake.
Why is this issue still open? Is 0.7.9 affected in any way?
The primary cause of this issue was determined to be #7289, which would primarily manifest on systems with a low default kernel HZ value. The issue was resolved in 0.7.7. I'm closing this bug as resolved.
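As a side note (not part of the original resolution), one quick way to check the timer frequency your kernel was built with is:
# Print the configured HZ value of the running kernel
grep 'CONFIG_HZ' /boot/config-"$(uname -r)"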