Today I received this mail:
From: [email protected] (root)
To: [email protected]
Subject: ZFS io error for tanka on xxx.yyy.com
Date: Thu, 14 Jul 2016 07:09:43 +0200
ZFS has detected an io error:
eid: 12
class: io
host: xxx.yyy.com
time: 2016-07-14 07:09:42+0200
vtype: disk
vpath: /dev/disk/by-id/ata-ST3000DM001-9YN166_S1F0F544-part1
vguid: 0xA28F73CE25C89BFD
cksum: 0
read: 0
write: 0
pool: tanka
But the pool is clear:
# zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tanka  24,5T  2,94T  21,6T         -     5%    12%  1.19x  ONLINE  -
# zpool status
pool: tanka
state: ONLINE
scan: resilvered 498G in 52h55m with 0 errors on Sat Jul 9 02:13:26 2016
config:
NAME                                          STATE     READ WRITE CKSUM
tanka                                         ONLINE       0     0     0
  raidz2-0                                    ONLINE       0     0     0
    ata-TOSHIBA_DT01ACA300_Z37G820GS          ONLINE       0     0     0
    ata-WDC_WD30EZRZ-00Z5HB0_WD-WCC4N2TFU67F  ONLINE       0     0     0
    ata-ST3000DM001-9YN166_Z1F0NBDP           ONLINE       0     0     0
    ata-ST3000DM001-9YN166_S1F0F544           ONLINE       0     0     0
    ata-WDC_WD30EZRZ-00Z5HB0_WD-WCC4N1LET275  ONLINE       0     0     0
    ata-TOSHIBA_DT01ACA300_Z2O51H2GS          ONLINE       0     0     0
    ata-TOSHIBA_DT01ACA300_Z5QS77VKS          ONLINE       0     0     0
    ata-TOSHIBA_DT01ACA300_56U7SHTAS          ONLINE       0     0     0
    ata-TOSHIBA_DT01ACA300_Z2O51RZGS          ONLINE       0     0     0
errors: No known data errors
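The detailed records below appear to be the output of zpool events -v, the subcommand that dumps the kernel's ZFS event nvlists; to reproduce them:
# zpool events -v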
Jul 14 2016 07:09:42.874641339 ereport.fs.zfs.io
class = "ereport.fs.zfs.io"
ena = 0x9302b36c500001
detector = (embedded nvlist)
version = 0x0
scheme = "zfs"
pool = 0xd2912d8866b767d
vdev = 0xa28f73ce25c89bfd
(end detector)
pool = "tanka"
pool_guid = 0xd2912d8866b767d
pool_context = 0x0
pool_failmode = "wait"
vdev_guid = 0xa28f73ce25c89bfd
vdev_type = "disk"
vdev_path = "/dev/disk/by-id/ata-ST3000DM001-9YN166_S1F0F544-part1"
vdev_ashift = 0xc
vdev_complete_ts = 0x7009302b256d
vdev_delta_ts = 0x44489
vdev_read_errors = 0x0
vdev_write_errors = 0x0
vdev_cksum_errors = 0x0
parent_guid = 0x27b370226659d459
parent_type = "raidz"
vdev_spare_paths =
vdev_spare_guids =
zio_err = 0x5
zio_flags = 0xc0880
zio_stage = 0x200000
zio_pipeline = 0x210000
zio_delay = 0x9f
zio_timestamp = 0x70090d226539
zio_delta = 0x1f8a9a49
zio_offset = 0x9d78c98000
zio_size = 0x1800
zio_objset = 0x1459
zio_object = 0xa
zio_level = 0x0
zio_blkid = 0x5882
time = 0x57871e96 0x3421f7bb
eid = 0xc
Jul 14 2016 07:09:42.874641339 ereport.fs.zfs.io
class = "ereport.fs.zfs.io"
ena = 0x9302baa9c00001
detector = (embedded nvlist)
version = 0x0
scheme = "zfs"
pool = 0xd2912d8866b767d
vdev = 0xa28f73ce25c89bfd
(end detector)
pool = "tanka"
pool_guid = 0xd2912d8866b767d
pool_context = 0x0
pool_failmode = "wait"
vdev_guid = 0xa28f73ce25c89bfd
vdev_type = "disk"
vdev_path = "/dev/disk/by-id/ata-ST3000DM001-9YN166_S1F0F544-part1"
vdev_ashift = 0xc
vdev_complete_ts = 0x7009302b256d
vdev_delta_ts = 0x44489
vdev_read_errors = 0x0
vdev_write_errors = 0x0
vdev_cksum_errors = 0x0
parent_guid = 0x27b370226659d459
parent_type = "raidz"
vdev_spare_paths =
vdev_spare_guids =
zio_err = 0x5
zio_flags = 0xc0880
zio_stage = 0x200000
zio_pipeline = 0x210000
zio_delay = 0x9f
zio_timestamp = 0x70090d1ab417
zio_delta = 0x1f9126a1
zio_offset = 0x9b00fec000
zio_size = 0x200
zio_objset = 0x0
zio_object = 0x71
zio_level = 0x0
zio_blkid = 0x5bb4
time = 0x57871e96 0x3421f7bb
eid = 0xd
Currently the disks are configured to spin down, so ZFS sometimes has to wait a few seconds after there have been no reads or writes for minutes, but this should not count as an "I/O" error.
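As a quick sanity check you can query the drive's power state and temporarily disable its spin-down timer, e.g. with hdparm (just a sketch; the device path is derived from the vdev_path in the event above, and -S 0 turns the standby timer off):
# hdparm -C /dev/disk/by-id/ata-ST3000DM001-9YN166_S1F0F544
# hdparm -S 0 /dev/disk/by-id/ata-ST3000DM001-9YN166_S1F0F544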
That's my dmesg output:
[62997.726191] INFO: task zfs:16808 blocked for more than 120 seconds.
[62997.726259] Tainted: P O 4.6.3-1-ARCH #1
[62997.726309] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[62997.726379] zfs D ffff88083ff439f0 0 16808 9252 0x00000000
[62997.726388] ffff88083ff439f0 00ff880843fe7224 ffff88084ff54c40 ffff88083ebb8000
[62997.726394] ffff88083ff44000 ffff880843fe7220 ffff880843fe72e0 ffff880843fe7248
[62997.726400] 0000000000000000 ffff88083ff43a08 ffffffff815c372c ffff880843fe72d8
[62997.726406] Call Trace:
[62997.726418] [<ffffffff815c372c>] schedule+0x3c/0x90
[62997.726430] [<ffffffffa0135e29>] cv_wait_common+0x109/0x140 [spl]
[62997.726437] [<ffffffff810bd490>] ? wake_atomic_t_function+0x60/0x60
[62997.726445] [<ffffffffa0135e75>] __cv_wait+0x15/0x20 [spl]
[62997.726493] [<ffffffffa02e39b5>] txg_wait_synced+0xe5/0x130 [zfs]
[62997.726537] [<ffffffffa02bf194>] dsl_sync_task+0x184/0x270 [zfs]
[62997.726569] [<ffffffffa0299cc0>] ? restore_write+0x1d0/0x1d0 [zfs]
[62997.726599] [<ffffffffa029a0b0>] ? dmu_recv_end_sync+0x3f0/0x3f0 [zfs]
[62997.726627] [<ffffffffa029a0b0>] ? dmu_recv_end_sync+0x3f0/0x3f0 [zfs]
[62997.726654] [<ffffffffa0299cc0>] ? restore_write+0x1d0/0x1d0 [zfs]
[62997.726682] [<ffffffffa029c103>] dmu_recv_end+0x93/0x1b0 [zfs]
[62997.726719] [<ffffffffa030d614>] zfs_ioc_recv+0x4b4/0xbf0 [zfs]
[62997.726728] [<ffffffffa01370c9>] ? tsd_set+0x199/0x4f0 [spl]
[62997.726773] [<ffffffffa02c8768>] ? rrw_exit+0x68/0x150 [zfs]
[62997.726816] [<ffffffffa02df0ae>] ? spa_close+0xe/0x10 [zfs]
[62997.726856] [<ffffffffa02b8399>] ? dsl_pool_rele+0x29/0x30 [zfs]
[62997.726863] [<ffffffff811d419e>] ? __kmalloc+0x2e/0x230
[62997.726870] [<ffffffffa012eb75>] ? strdup+0x45/0x70 [spl]
[62997.726907] [<ffffffffa030c290>] zfsdev_ioctl+0x460/0x4f0 [zfs]
[62997.726914] [<ffffffff811f634b>] ? __vfs_read+0xcb/0x100
[62997.726920] [<ffffffff81209be3>] do_vfs_ioctl+0xa3/0x5d0
[62997.726927] [<ffffffff811f7176>] ? vfs_read+0x86/0x130
[62997.726932] [<ffffffff8120a189>] SyS_ioctl+0x79/0x90
[62997.726939] [<ffffffff815c71b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[62997.726945] INFO: task zfs:18831 blocked for more than 120 seconds.
[62997.727003] Tainted: P O 4.6.3-1-ARCH #1
[62997.727052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[62997.727120] zfs D ffff8807aa32bbc0 0 18831 16007 0x00000000
[62997.727126] ffff8807aa32bbc0 0000000000000000 ffffffff8180d500 ffff88083ed82dc0
[62997.727132] ffff8807aa32c000 ffff880843fe7220 ffff880843fe72e0 ffff880843fe7248
[62997.727138] 0000000000000000 ffff8807aa32bbd8 ffffffff815c372c ffff880843fe72d8
[62997.727143] Call Trace:
[62997.727149] [<ffffffff815c372c>] schedule+0x3c/0x90
[62997.727157] [<ffffffffa0135e29>] cv_wait_common+0x109/0x140 [spl]
[62997.727162] [<ffffffff810bd490>] ? wake_atomic_t_function+0x60/0x60
[62997.727170] [<ffffffffa0135e75>] __cv_wait+0x15/0x20 [spl]
[62997.727213] [<ffffffffa02e39b5>] txg_wait_synced+0xe5/0x130 [zfs]
[62997.727254] [<ffffffffa02bf194>] dsl_sync_task+0x184/0x270 [zfs]
[62997.727287] [<ffffffffa033c1f0>] ? dsl_destroy_snapshot_sync_impl+0xb60/0xb60 [zfs]
[62997.727320] [<ffffffffa033d040>] ? dsl_destroy_snapshot_check_impl+0x90/0x90 [zfs]
[62997.727353] [<ffffffffa033d040>] ? dsl_destroy_snapshot_check_impl+0x90/0x90 [zfs]
[62997.727386] [<ffffffffa033c1f0>] ? dsl_destroy_snapshot_sync_impl+0xb60/0xb60 [zfs]
[62997.727418] [<ffffffffa033d211>] dsl_destroy_snapshots_nvl+0x91/0x100 [zfs]
[62997.727451] [<ffffffffa033d2bf>] dsl_destroy_snapshot+0x3f/0x60 [zfs]
[62997.727486] [<ffffffffa030e547>] zfs_ioc_destroy+0x37/0x50 [zfs]
[62997.727519] [<ffffffffa030c290>] zfsdev_ioctl+0x460/0x4f0 [zfs]
[62997.727525] [<ffffffff81209be3>] do_vfs_ioctl+0xa3/0x5d0
[62997.727530] [<ffffffff815c2fb9>] ? __schedule+0x399/0xad0
[62997.727536] [<ffffffff8120a189>] SyS_ioctl+0x79/0x90
[62997.727542] [<ffffffff815c71b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[63837.687235] INFO: task txg_sync:1618 blocked for more than 120 seconds.
[63837.687306] Tainted: P O 4.6.3-1-ARCH #1
[63837.687356] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[63837.687425] txg_sync D ffff880841e97aa8 0 1618 2 0x00000000
[63837.687434] ffff880841e97aa8 00ffffff810a3c2d ffff88084ff52dc0 ffff8808420a1e80
[63837.687441] ffff880841e98000 ffff880873c95940 7fffffffffffffff ffff88050ed05f30
[63837.687446] 0000000000000001 ffff880841e97ac0 ffffffff815c372c 0000000000000000
[63837.687452] Call Trace:
[63837.687464] [<ffffffff815c372c>] schedule+0x3c/0x90
[63837.687471] [<ffffffff815c6203>] schedule_timeout+0x1d3/0x260
[63837.687479] [<ffffffff810a4ad2>] ? default_wake_function+0x12/0x20
[63837.687485] [<ffffffff810bcddd>] ? __wake_up_common+0x4d/0x80
[63837.687490] [<ffffffff810eea31>] ? ktime_get+0x41/0xb0
[63837.687495] [<ffffffff815c2bb4>] io_schedule_timeout+0xa4/0x110
[63837.687506] [<ffffffffa0135dda>] cv_wait_common+0xba/0x140 [spl]
[63837.687511] [<ffffffff810bd490>] ? wake_atomic_t_function+0x60/0x60
[63837.687520] [<ffffffffa0135eb8>] __cv_wait_io+0x18/0x20 [spl]
[63837.687555] [<ffffffffa032dd5e>] zio_wait+0x10e/0x1f0 [zfs]
[63837.687596] [<ffffffffa02b7148>] dsl_pool_sync+0xb8/0x480 [zfs]
[63837.687643] [<ffffffffa02d2c2f>] spa_sync+0x37f/0xb40 [zfs]
[63837.687650] [<ffffffff810a4ad2>] ? default_wake_function+0x12/0x20
[63837.687694] [<ffffffffa02e4208>] txg_sync_thread+0x3b8/0x630 [zfs]
[63837.687731] [<ffffffffa02e3e50>] ? txg_delay+0x180/0x180 [zfs]
[63837.687739] [<ffffffffa0130ee1>] thread_generic_wrapper+0x71/0x80 [spl]
[63837.687746] [<ffffffffa0130e70>] ? __thread_exit+0x20/0x20 [spl]
[63837.687753] [<ffffffff81099998>] kthread+0xd8/0xf0
[63837.687760] [<ffffffff815c73c2>] ret_from_fork+0x22/0x40
[63837.687766] [<ffffffff810998c0>] ? kthread_worker_fn+0x170/0x170
[63837.687781] INFO: task zfs:16808 blocked for more than 120 seconds.
[63837.687838] Tainted: P O 4.6.3-1-ARCH #1
[63837.687887] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[63837.687955] zfs D ffff88083ff439f0 0 16808 9252 0x00000000
[63837.687961] ffff88083ff439f0 00ff88083ff439d0 ffff88084ff54c40 ffff88083ebb8000
[63837.687967] ffff88083ff44000 ffff880843fe7220 ffff880843fe72e0 ffff880843fe7248
[63837.687973] 0000000000000000 ffff88083ff43a08 ffffffff815c372c ffff880843fe72d8
[63837.687978] Call Trace:
[63837.687984] [<ffffffff815c372c>] schedule+0x3c/0x90
[63837.687992] [<ffffffffa0135e29>] cv_wait_common+0x109/0x140 [spl]
[63837.687998] [<ffffffff810bd490>] ? wake_atomic_t_function+0x60/0x60
[63837.688006] [<ffffffffa0135e75>] __cv_wait+0x15/0x20 [spl]
[63837.688042] [<ffffffffa02e39b5>] txg_wait_synced+0xe5/0x130 [zfs]
[63837.688084] [<ffffffffa02bf194>] dsl_sync_task+0x184/0x270 [zfs]
[63837.688116] [<ffffffffa0299cc0>] ? restore_write+0x1d0/0x1d0 [zfs]
[63837.688147] [<ffffffffa029a0b0>] ? dmu_recv_end_sync+0x3f0/0x3f0 [zfs]
[63837.688175] [<ffffffffa029a0b0>] ? dmu_recv_end_sync+0x3f0/0x3f0 [zfs]
[63837.688203] [<ffffffffa0299cc0>] ? restore_write+0x1d0/0x1d0 [zfs]
[63837.688231] [<ffffffffa029c103>] dmu_recv_end+0x93/0x1b0 [zfs]
[63837.688268] [<ffffffffa030d614>] zfs_ioc_recv+0x4b4/0xbf0 [zfs]
[63837.688276] [<ffffffffa01370c9>] ? tsd_set+0x199/0x4f0 [spl]
[63837.688321] [<ffffffffa02c8768>] ? rrw_exit+0x68/0x150 [zfs]
[63837.688366] [<ffffffffa02df0ae>] ? spa_close+0xe/0x10 [zfs]
[63837.688406] [<ffffffffa02b8399>] ? dsl_pool_rele+0x29/0x30 [zfs]
[63837.688413] [<ffffffff811d419e>] ? __kmalloc+0x2e/0x230
[63837.688419] [<ffffffffa012eb75>] ? strdup+0x45/0x70 [spl]
[63837.688456] [<ffffffffa030c290>] zfsdev_ioctl+0x460/0x4f0 [zfs]
[63837.688463] [<ffffffff811f634b>] ? __vfs_read+0xcb/0x100
[63837.688469] [<ffffffff81209be3>] do_vfs_ioctl+0xa3/0x5d0
[63837.688476] [<ffffffff811f7176>] ? vfs_read+0x86/0x130
[63837.688481] [<ffffffff8120a189>] SyS_ioctl+0x79/0x90
[63837.688488] [<ffffffff815c71b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[63837.688493] INFO: task zfs:18831 blocked for more than 120 seconds.
[63837.688549] Tainted: P O 4.6.3-1-ARCH #1
[63837.688598] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[63837.688665] zfs D ffff8807aa32bbc0 0 18831 16007 0x00000000
[63837.688671] ffff8807aa32bbc0 0000000000000000 ffff88084ff53d00 ffff88083ed82dc0
[63837.688677] ffff8807aa32c000 ffff880843fe7220 ffff880843fe72e0 ffff880843fe7248
[63837.688683] 0000000000000000 ffff8807aa32bbd8 ffffffff815c372c ffff880843fe72d8
[63837.688688] Call Trace:
[63837.688693] [<ffffffff815c372c>] schedule+0x3c/0x90
[63837.688701] [<ffffffffa0135e29>] cv_wait_common+0x109/0x140 [spl]
[63837.688706] [<ffffffff810bd490>] ? wake_atomic_t_function+0x60/0x60
[63837.688714] [<ffffffffa0135e75>] __cv_wait+0x15/0x20 [spl]
[63837.688757] [<ffffffffa02e39b5>] txg_wait_synced+0xe5/0x130 [zfs]
[63837.688797] [<ffffffffa02bf194>] dsl_sync_task+0x184/0x270 [zfs]
[63837.688830] [<ffffffffa033c1f0>] ? dsl_destroy_snapshot_sync_impl+0xb60/0xb60 [zfs]
[63837.688863] [<ffffffffa033d040>] ? dsl_destroy_snapshot_check_impl+0x90/0x90 [zfs]
[63837.688896] [<ffffffffa033d040>] ? dsl_destroy_snapshot_check_impl+0x90/0x90 [zfs]
[63837.688928] [<ffffffffa033c1f0>] ? dsl_destroy_snapshot_sync_impl+0xb60/0xb60 [zfs]
[63837.688961] [<ffffffffa033d211>] dsl_destroy_snapshots_nvl+0x91/0x100 [zfs]
[63837.688994] [<ffffffffa033d2bf>] dsl_destroy_snapshot+0x3f/0x60 [zfs]
[63837.689030] [<ffffffffa030e547>] zfs_ioc_destroy+0x37/0x50 [zfs]
[63837.689062] [<ffffffffa030c290>] zfsdev_ioctl+0x460/0x4f0 [zfs]
[63837.689069] [<ffffffff81209be3>] do_vfs_ioctl+0xa3/0x5d0
[63837.689073] [<ffffffff815c2fb9>] ? __schedule+0x399/0xad0
[63837.689079] [<ffffffff8120a189>] SyS_ioctl+0x79/0x90
[63837.689086] [<ffffffff815c71b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[64077.676215] INFO: task txg_sync:1618 blocked for more than 120 seconds.
[64077.676286] Tainted: P O 4.6.3-1-ARCH #1
[64077.676336] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[64077.676405] txg_sync D ffff880841e97aa8 0 1618 2 0x00000000
[64077.676415] ffff880841e97aa8 00ffffff810a3c2d ffff88084ff53d00 ffff8808420a1e80
[64077.676421] ffff880841e98000 ffff880873d15940 7fffffffffffffff ffff8800763a87b0
[64077.676427] 0000000000000001 ffff880841e97ac0 ffffffff815c372c 0000000000000000
[64077.676433] Call Trace:
[64077.676445] [<ffffffff815c372c>] schedule+0x3c/0x90
[64077.676451] [<ffffffff815c6203>] schedule_timeout+0x1d3/0x260
[64077.676460] [<ffffffff810a4ad2>] ? default_wake_function+0x12/0x20
[64077.676465] [<ffffffff810bcddd>] ? __wake_up_common+0x4d/0x80
[64077.676470] [<ffffffff810eea31>] ? ktime_get+0x41/0xb0
[64077.676476] [<ffffffff815c2bb4>] io_schedule_timeout+0xa4/0x110
[64077.676487] [<ffffffffa0135dda>] cv_wait_common+0xba/0x140 [spl]
[64077.676492] [<ffffffff810bd490>] ? wake_atomic_t_function+0x60/0x60
[64077.676500] [<ffffffffa0135eb8>] __cv_wait_io+0x18/0x20 [spl]
[64077.676535] [<ffffffffa032dd5e>] zio_wait+0x10e/0x1f0 [zfs]
[64077.676575] [<ffffffffa02b7148>] dsl_pool_sync+0xb8/0x480 [zfs]
[64077.676621] [<ffffffffa02d2c2f>] spa_sync+0x37f/0xb40 [zfs]
[64077.676628] [<ffffffff810a4ad2>] ? default_wake_function+0x12/0x20
[64077.676672] [<ffffffffa02e4208>] txg_sync_thread+0x3b8/0x630 [zfs]
[64077.676709] [<ffffffffa02e3e50>] ? txg_delay+0x180/0x180 [zfs]
[64077.676718] [<ffffffffa0130ee1>] thread_generic_wrapper+0x71/0x80 [spl]
[64077.676725] [<ffffffffa0130e70>] ? __thread_exit+0x20/0x20 [spl]
[64077.676732] [<ffffffff81099998>] kthread+0xd8/0xf0
[64077.676739] [<ffffffff815c73c2>] ret_from_fork+0x22/0x40
[64077.676745] [<ffffffff810998c0>] ? kthread_worker_fn+0x170/0x170
[64077.676760] INFO: task zfs:16808 blocked for more than 120 seconds.
[64077.676816] Tainted: P O 4.6.3-1-ARCH #1
[64077.676865] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[64077.676933] zfs D ffff88083ff439f0 0 16808 9252 0x00000000
[64077.676939] ffff88083ff439f0 00ff88083ff439d0 ffffffff8180d500 ffff88083ebb8000
[64077.676945] ffff88083ff44000 ffff880843fe7220 ffff880843fe72e0 ffff880843fe7248
[64077.676951] 0000000000000000 ffff88083ff43a08 ffffffff815c372c ffff880843fe72d8
[64077.676956] Call Trace:
[64077.676962] [<ffffffff815c372c>] schedule+0x3c/0x90
[64077.676970] [<ffffffffa0135e29>] cv_wait_common+0x109/0x140 [spl]
[64077.676975] [<ffffffff810bd490>] ? wake_atomic_t_function+0x60/0x60
[64077.676983] [<ffffffffa0135e75>] __cv_wait+0x15/0x20 [spl]
[64077.677020] [<ffffffffa02e39b5>] txg_wait_synced+0xe5/0x130 [zfs]
[64077.677062] [<ffffffffa02bf194>] dsl_sync_task+0x184/0x270 [zfs]
[64077.677095] [<ffffffffa0299cc0>] ? restore_write+0x1d0/0x1d0 [zfs]
[64077.677125] [<ffffffffa029a0b0>] ? dmu_recv_end_sync+0x3f0/0x3f0 [zfs]
[64077.677154] [<ffffffffa029a0b0>] ? dmu_recv_end_sync+0x3f0/0x3f0 [zfs]
[64077.677181] [<ffffffffa0299cc0>] ? restore_write+0x1d0/0x1d0 [zfs]
[64077.677210] [<ffffffffa029c103>] dmu_recv_end+0x93/0x1b0 [zfs]
[64077.677247] [<ffffffffa030d614>] zfs_ioc_recv+0x4b4/0xbf0 [zfs]
[64077.677255] [<ffffffffa01370c9>] ? tsd_set+0x199/0x4f0 [spl]
[64077.677300] [<ffffffffa02c8768>] ? rrw_exit+0x68/0x150 [zfs]
[64077.677343] [<ffffffffa02df0ae>] ? spa_close+0xe/0x10 [zfs]
[64077.677382] [<ffffffffa02b8399>] ? dsl_pool_rele+0x29/0x30 [zfs]
[64077.677390] [<ffffffff811d419e>] ? __kmalloc+0x2e/0x230
[64077.677396] [<ffffffffa012eb75>] ? strdup+0x45/0x70 [spl]
[64077.677433] [<ffffffffa030c290>] zfsdev_ioctl+0x460/0x4f0 [zfs]
[64077.677440] [<ffffffff811f634b>] ? __vfs_read+0xcb/0x100
[64077.677446] [<ffffffff81209be3>] do_vfs_ioctl+0xa3/0x5d0
[64077.677453] [<ffffffff811f7176>] ? vfs_read+0x86/0x130
[64077.677458] [<ffffffff8120a189>] SyS_ioctl+0x79/0x90
[64077.677465] [<ffffffff815c71b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[64077.677471] INFO: task zfs:18831 blocked for more than 120 seconds.
[64077.677527] Tainted: P O 4.6.3-1-ARCH #1
[64077.677576] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[64077.677643] zfs D ffff8807aa32bbc0 0 18831 16007 0x00000000
[64077.677649] ffff8807aa32bbc0 0000000000000000 ffff88084ff54c40 ffff88083ed82dc0
[64077.677655] ffff8807aa32c000 ffff880843fe7220 ffff880843fe72e0 ffff880843fe7248
[64077.677661] 0000000000000000 ffff8807aa32bbd8 ffffffff815c372c ffff880843fe72d8
[64077.677666] Call Trace:
[64077.677671] [<ffffffff815c372c>] schedule+0x3c/0x90
[64077.677679] [<ffffffffa0135e29>] cv_wait_common+0x109/0x140 [spl]
[64077.677684] [<ffffffff810bd490>] ? wake_atomic_t_function+0x60/0x60
[64077.677692] [<ffffffffa0135e75>] __cv_wait+0x15/0x20 [spl]
[64077.677734] [<ffffffffa02e39b5>] txg_wait_synced+0xe5/0x130 [zfs]
[64077.677776] [<ffffffffa02bf194>] dsl_sync_task+0x184/0x270 [zfs]
[64077.677809] [<ffffffffa033c1f0>] ? dsl_destroy_snapshot_sync_impl+0xb60/0xb60 [zfs]
[64077.677842] [<ffffffffa033d040>] ? dsl_destroy_snapshot_check_impl+0x90/0x90 [zfs]
[64077.677875] [<ffffffffa033d040>] ? dsl_destroy_snapshot_check_impl+0x90/0x90 [zfs]
[64077.677908] [<ffffffffa033c1f0>] ? dsl_destroy_snapshot_sync_impl+0xb60/0xb60 [zfs]
[64077.677941] [<ffffffffa033d211>] dsl_destroy_snapshots_nvl+0x91/0x100 [zfs]
[64077.677974] [<ffffffffa033d2bf>] dsl_destroy_snapshot+0x3f/0x60 [zfs]
[64077.678009] [<ffffffffa030e547>] zfs_ioc_destroy+0x37/0x50 [zfs]
[64077.678042] [<ffffffffa030c290>] zfsdev_ioctl+0x460/0x4f0 [zfs]
[64077.678048] [<ffffffff81209be3>] do_vfs_ioctl+0xa3/0x5d0
[64077.678053] [<ffffffff815c2fb9>] ? __schedule+0x399/0xad0
[64077.678059] [<ffffffff8120a189>] SyS_ioctl+0x79/0x90
[64077.678065] [<ffffffff815c71b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[64677.649034] INFO: task zfs:16808 blocked for more than 120 seconds.
[64677.649091] Tainted: P O 4.6.3-1-ARCH #1
[64677.649133] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[64677.649194] zfs D ffff88083ff439f0 0 16808 9252 0x00000000
[64677.649201] ffff88083ff439f0 00ff88083ff439d0 ffff88084ff52dc0 ffff88083ebb8000
[64677.649205] ffff88083ff44000 ffff880843fe7220 ffff880843fe72e0 ffff880843fe7248
[64677.649208] 0000000000000000 ffff88083ff43a08 ffffffff815c372c ffff880843fe72d8
[64677.649212] Call Trace:
[64677.649220] [<ffffffff815c372c>] schedule+0x3c/0x90
[64677.649228] [<ffffffffa0135e29>] cv_wait_common+0x109/0x140 [spl]
[64677.649232] [<ffffffff810bd490>] ? wake_atomic_t_function+0x60/0x60
[64677.649238] [<ffffffffa0135e75>] __cv_wait+0x15/0x20 [spl]
[64677.649268] [<ffffffffa02e39b5>] txg_wait_synced+0xe5/0x130 [zfs]
[64677.649294] [<ffffffffa02bf194>] dsl_sync_task+0x184/0x270 [zfs]
[64677.649314] [<ffffffffa0299cc0>] ? restore_write+0x1d0/0x1d0 [zfs]
[64677.649333] [<ffffffffa029a0b0>] ? dmu_recv_end_sync+0x3f0/0x3f0 [zfs]
[64677.649350] [<ffffffffa029a0b0>] ? dmu_recv_end_sync+0x3f0/0x3f0 [zfs]
[64677.649367] [<ffffffffa0299cc0>] ? restore_write+0x1d0/0x1d0 [zfs]
[64677.649384] [<ffffffffa029c103>] dmu_recv_end+0x93/0x1b0 [zfs]
[64677.649407] [<ffffffffa030d614>] zfs_ioc_recv+0x4b4/0xbf0 [zfs]
[64677.649412] [<ffffffffa01370c9>] ? tsd_set+0x199/0x4f0 [spl]
[64677.649440] [<ffffffffa02c8768>] ? rrw_exit+0x68/0x150 [zfs]
[64677.649467] [<ffffffffa02df0ae>] ? spa_close+0xe/0x10 [zfs]
[64677.649492] [<ffffffffa02b8399>] ? dsl_pool_rele+0x29/0x30 [zfs]
[64677.649497] [<ffffffff811d419e>] ? __kmalloc+0x2e/0x230
[64677.649502] [<ffffffffa012eb75>] ? strdup+0x45/0x70 [spl]
[64677.649524] [<ffffffffa030c290>] zfsdev_ioctl+0x460/0x4f0 [zfs]
[64677.649528] [<ffffffff811f634b>] ? __vfs_read+0xcb/0x100
[64677.649532] [<ffffffff81209be3>] do_vfs_ioctl+0xa3/0x5d0
[64677.649536] [<ffffffff811f7176>] ? vfs_read+0x86/0x130
[64677.649539] [<ffffffff8120a189>] SyS_ioctl+0x79/0x90
[64677.649544] [<ffffffff815c71b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[64677.649548] INFO: task zfs:18831 blocked for more than 120 seconds.
[64677.649597] Tainted: P O 4.6.3-1-ARCH #1
[64677.649639] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[64677.649699] zfs D ffff8807aa32bbc0 0 18831 16007 0x00000000
[64677.649703] ffff8807aa32bbc0 0000000000000000 ffff88084ff54c40 ffff88083ed82dc0
[64677.649707] ffff8807aa32c000 ffff880843fe7220 ffff880843fe72e0 ffff880843fe7248
[64677.649710] 0000000000000000 ffff8807aa32bbd8 ffffffff815c372c ffff880843fe72d8
[64677.649714] Call Trace:
[64677.649717] [<ffffffff815c372c>] schedule+0x3c/0x90
[64677.649722] [<ffffffffa0135e29>] cv_wait_common+0x109/0x140 [spl]
[64677.649726] [<ffffffff810bd490>] ? wake_atomic_t_function+0x60/0x60
[64677.649731] [<ffffffffa0135e75>] __cv_wait+0x15/0x20 [spl]
[64677.649757] [<ffffffffa02e39b5>] txg_wait_synced+0xe5/0x130 [zfs]
[64677.649782] [<ffffffffa02bf194>] dsl_sync_task+0x184/0x270 [zfs]
[64677.649803] [<ffffffffa033c1f0>] ? dsl_destroy_snapshot_sync_impl+0xb60/0xb60 [zfs]
[64677.649823] [<ffffffffa033d040>] ? dsl_destroy_snapshot_check_impl+0x90/0x90 [zfs]
[64677.649843] [<ffffffffa033d040>] ? dsl_destroy_snapshot_check_impl+0x90/0x90 [zfs]
[64677.649864] [<ffffffffa033c1f0>] ? dsl_destroy_snapshot_sync_impl+0xb60/0xb60 [zfs]
[64677.649884] [<ffffffffa033d211>] dsl_destroy_snapshots_nvl+0x91/0x100 [zfs]
[64677.649904] [<ffffffffa033d2bf>] dsl_destroy_snapshot+0x3f/0x60 [zfs]
[64677.649926] [<ffffffffa030e547>] zfs_ioc_destroy+0x37/0x50 [zfs]
[64677.649946] [<ffffffffa030c290>] zfsdev_ioctl+0x460/0x4f0 [zfs]
[64677.649950] [<ffffffff81209be3>] do_vfs_ioctl+0xa3/0x5d0
[64677.649953] [<ffffffff815c2fb9>] ? __schedule+0x399/0xad0
[64677.649956] [<ffffffff8120a189>] SyS_ioctl+0x79/0x90
[64677.649960] [<ffffffff815c71b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
[97401.943039] Process accounting resumed
[123180.950179] ata2.00: exception Emask 0x10 SAct 0xc0000 SErr 0x40000 action 0x6 frozen
[123180.950243] ata2.00: irq_stat 0x08000000, interface fatal error
[123180.950287] ata2: SError: { CommWake }
[123180.950316] ata2.00: failed command: READ FPDMA QUEUED
[123180.950355] ata2.00: cmd 60/01:90:60:87:80/00:00:4d:00:00/40 tag 18 ncq 512 in
res c0/00:01:f1:fa:e6/00:00:4e:00:00/40 Emask 0x12 (ATA bus error)
[123180.950468] ata2.00: status: { Busy }
[123180.950496] ata2.00: failed command: WRITE FPDMA QUEUED
[123180.950536] ata2.00: cmd 61/0c:98:c0:6c:bc/00:00:4e:00:00/40 tag 19 ncq 6144 out
res c0/00:01:f1:fa:e6/00:00:4e:00:00/40 Emask 0x12 (ATA bus error)
[123180.950650] ata2.00: status: { Busy }
[123180.950679] ata2: hard resetting link
[123181.259628] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[123181.461200] ata2.00: configured for UDMA/133
[123181.461292] sd 1:0:0:0: [sdb] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[123181.461294] sd 1:0:0:0: [sdb] tag#18 Sense Key : 0xb [current] [descriptor]
[123181.461295] sd 1:0:0:0: [sdb] tag#18 ASC=0x47 ASCQ=0x0
[123181.461297] sd 1:0:0:0: [sdb] tag#18 CDB: opcode=0x88 88 00 00 00 00 00 4d 80 87 60 00 00 00 01 00 00
[123181.461299] blk_update_request: I/O error, dev sdb, sector 1300268896
[123181.461372] sd 1:0:0:0: [sdb] tag#19 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[123181.461374] sd 1:0:0:0: [sdb] tag#19 Sense Key : 0xb [current] [descriptor]
[123181.461375] sd 1:0:0:0: [sdb] tag#19 ASC=0x47 ASCQ=0x0
[123181.461377] sd 1:0:0:0: [sdb] tag#19 CDB: opcode=0x8a 8a 00 00 00 00 00 4e bc 6c c0 00 00 00 0c 00 00
[123181.461378] blk_update_request: I/O error, dev sdb, sector 1320971456
[123181.461429] ata2: EH complete
Looks like there was an I/O error, but ZFS didn't count it as one... strange!
This can be explained by the following comment: https://github.com/zfsonlinux/zfs/issues/1256#issuecomment-317913624
If you're reading a file and hit a read error, the data is reconstructed from the other drives and rewritten to the bad sectors, but the read error counter is not incremented since it "self-healed". However, unlike a checksum error, a self-healed read error will still generate an I/O event in zpool events.
Your zpool events output shows:
zio_err = 0x5 (EIO)
zio_flags = 0xc0880 (ZIO_FLAG_CANFAIL|ZIO_FLAG_DONT_CACHE|ZIO_FLAG_OPTIONAL|ZIO_FLAG_DONT_QUEUE)
These ZIOs, lacking ZIO_FLAG_IO_RETRY, don't bump the vdev stats; this is the reason your zpool status is "clear":
vdev_stat_update(zio_t *zio, uint64_t psize)
{
	...
	/*
	 * If this is an I/O error that is going to be retried, then ignore the
	 * error.  Otherwise, the user may interpret B_FAILFAST I/O errors as
	 * hard errors, when in reality they can happen for any number of
	 * innocuous reasons (bus resets, MPxIO link failure, etc).
	 */
	if (zio->io_error == EIO &&
	    !(zio->io_flags & ZIO_FLAG_IO_RETRY)) {
		return;
	}
	...
}
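The decomposition of zio_flags can be double-checked by hand. Assuming the bit values from the zio.h of that era (ZIO_FLAG_CANFAIL = 1 << 7, ZIO_FLAG_DONT_CACHE = 1 << 11, ZIO_FLAG_IO_RETRY = 1 << 15, ZIO_FLAG_OPTIONAL = 1 << 18, ZIO_FLAG_DONT_QUEUE = 1 << 19), ORing the four logged flags reproduces 0xc0880, and the IO_RETRY bit is indeed clear:
# printf '0x%x\n' $(( (1 << 7) | (1 << 11) | (1 << 18) | (1 << 19) ))
0xc0880
# printf '0x%x\n' $(( 0xc0880 & (1 << 15) ))
0x0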
Not an issue, feel free to join the discussion in the zfs-devel mailing list if you have some thoughts on the matter.
So the title is wrong; there was a reason zed sent a mail.
Since this is now considered not a bug, I would like to ask for a reopening:
A disk read error is a read error of that disk and should be displayed as such; for unrecoverable read errors there is already a line per raidz/pool to show them at that logical level.
I would never have expected zpool status to hide such read errors, even corrected ones, at the device level. This clearly breaks user expectations and should be changed.
@behlendorf what are your thoughts about that?
@RubenKelevra like I said, this is being debated on the zfs-devel mailing list, which is better suited for this kind of discussion.
You may find some interesting work already being done here: http://list.zfsonlinux.org/pipermail/zfs-devel/2017-August/000132.html
@RubenKelevra I'm actively working on a fix for this. I'll assign this bug to myself.
@tonyhutter thanks Tony, highly appreciate it!
Yes, I appreciate it too! Many thanks.
I've got a fix out for review (https://github.com/zfsonlinux/zfs/pull/7817).