Type | Version/Name
--- | ---
Distribution Name | Scientific Linux
Distribution Version | 7.5
Linux Kernel | 3.10.0-862.14.4.el7
Architecture | x86_64
ZFS Version | zfs-0.7.11-1.el7_5
SPL Version | 0.7.11-1.el7_5
On a zpool with failmode=continue, I/O continues to block, resulting in un-killable application processes.
1. `zpool create data1 single_HDD`
2. `zpool set failmode=continue data1`
3. Start applications performing I/O on the zpool and wait for the HDD to fail.
4. Attempt to kill the application processes and note that they end up in the Z state (a scripted sketch of these steps follows below).
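For reference, here is a rough scripted version of the reproduction. The device path, the `dd` writer, and dropping the disk via sysfs to stand in for a real hardware failure are all illustrative assumptions, not part of the original report:

```sh
# Illustrative reproducer -- /dev/sdX is a hypothetical scratch HDD.
zpool create data1 /dev/sdX
zpool set failmode=continue data1

dd if=/dev/zero of=/data1/testfile bs=1M &   # background writer generating dirty data
WRITER=$!

echo 1 > /sys/block/sdX/device/delete        # drop the disk to simulate a failure
sleep 60                                     # give the pool time to suspend (zpool status: UNAVAIL)

kill -9 $WRITER                              # attempt to kill the writer
ps -o pid,stat,comm -p $WRITER               # typically still shown in D or Z state, never reaped
```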
```
[root@node2126 ~]# zpool list data1
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH   ALTROOT
data1  3.62T   564K  3.62T         -     0%     0%  1.00x  UNAVAIL  -

[root@node2126 ~]# zpool status data1
  pool: data1
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-JQ
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        data1                     UNAVAIL      0     0     0  insufficient replicas
          wwn-0x5000c5009cf653f1  FAULTED      6     0     0  too many errors

errors: List of errors unavailable: pool I/O is currently suspended
errors: 7 data errors, use '-v' for a list
```
After attempting to kill application pid 33345, it remains blocked in the zombie (Z) state, holding kernel resources I need to re-use (in particular, a socket).
```
[root@node2126 ~]# top
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
33345 hdfs      20   0       0      0      0 Z   0.0  0.0   0:06.59 java

[root@node2126 ~]# lsof | awk '$2 == "33345"'
...
java    33345 35895  hdfs  877u   unix  0xffffa06b792d7000   0t0  146793  socket
java    33345 35895  hdfs  878uW   REG                0,41    31       2  /data1/in_use.lock
```
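One way to confirm where the un-killable threads are actually stuck is to dump their kernel stacks (a sketch, using the PID from the listing above; requires root and a kernel with stack tracing enabled):

```sh
# Dump the kernel stacks of all threads in the stuck process.
cat /proc/33345/task/*/stack 2>/dev/null | head -40
# For a suspended pool these traces commonly end in ZFS wait paths such as
# zio_wait() or txg_wait_synced(), i.e. the threads are waiting on I/O that
# will never complete, which is why the process cannot be reaped.
```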
What I need is for failmode=continue to not block I/O, so that this process can exit and I can start another one to manage a replacement disk in a new pool without having to reboot. In other words, I don't need to be able to destroy the original zpool, though that would be nice, as indicated in other open issues.
I think this is a dup of https://github.com/zfsonlinux/zfs/issues/6649
That ticket is for failmode=wait whereas this ticket is for failmode=continue.
Yes, but failmode isn't the issue here. The issue is how to remove a suspended pool from the system.
I was hoping that failmode=continue would obviate the need to wait for the enhancement to allow the removal of a suspended pool. My immediate need is to simply return an error and not block. I can live with an unusable suspended pool in the system until I need to reboot for another reason.
The issue seems to stem from failmode=continue not aborting existing write requests (as one might expect), but only new ones. From man zpool:

> **continue**
> Returns EIO to any new write I/O requests but allows reads to any of the remaining healthy devices. Any write requests that have yet to be committed to disk would be blocked.
Also, man zpool doesn't specify exactly what happens to reads from unhealthy devices (which, for a suspended pool, can possibly be _all of them_).
Thus, while the behaviour seen by the OP is more or less as documented, failmode=continue is IMHO quite useless when it effectively behaves identically to failmode=wait (hanging, unkillable I/O for whatever was in flight when suspension occurred). It should be made to cleanly abort _all_ outstanding I/O (writes _and_ reads) that can't complete because the pool went into suspension.
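To make the documented distinction concrete, here is a rough sketch of what an application would see on a suspended pool with failmode=continue; the paths are hypothetical and, as this issue shows, the "fails promptly" half does not always hold in practice:

```sh
# Hypothetical paths on the suspended pool data1.
dd if=/dev/zero of=/data1/new_file bs=4k count=1 conv=fsync
# A brand-new write is expected, per the man page excerpt above, to fail with
# EIO ("Input/output error") -- though in practice it may block instead.

sync /data1/existing_file
# A write that was already queued before suspension is never committed, so the
# process flushing it blocks in uninterruptible sleep -- the behaviour the OP hit.
```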
Possibly a general timeout for zios (aborting them with a clean error condition after a long enough period of inactivity) could solve the issue of I/O being stuck in an unkillable state?
This is related to the work I'm doing to support the "abandonment" of a pool from which, for example, IO has "hung" because the completions are no longer arriving (due to flaky hardware, bad driver, etc.) and for which I worked up a proof-of-concept at the OpenZFS hackathon this year. This issue is sort-of a different instance of the problem (in which a pool can't be exported).
The work to support abandoning a pool for which IO has hung is going to leverage the similarly-named "continue" mode of the zio deadman. I've got a patch almost ready to post as a PR which fixes some of the problems with zio deadman.
This particular issue will require somewhat different handling but it _is_ something I've planned on addressing as part of the larger "zpool abandon" feature.
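For reference, the zio deadman "continue" mode mentioned above is exposed through module parameters in more recent releases (names per the zfs-module-parameters man page; availability and defaults depend on the release, so treat this as a sketch rather than a recipe for this bug):

```sh
# What the deadman does when it decides a zio has hung: wait | continue | panic.
echo continue > /sys/module/zfs/parameters/zfs_deadman_failmode
# Per-zio deadline in milliseconds before the deadman considers it hung.
echo 300000 > /sys/module/zfs/parameters/zfs_deadman_ziotime_ms
```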
@dweeezil most excellent! I have a large number of unreliable HDDs in a Hadoop cluster that I would be willing to use to test a ZFS patch when it is available. I am most interested in the ability to optionally not block on "pool I/O is currently suspended"; however, I am also interested in testing the ability to abandon and destroy a zpool without having to reboot. Many thanks for working on this.
I would also like to test this feature. The deadman continue mode helps a lot, but sometimes I lose the connection and can't recover it, and I would like to not have to reboot.
WIP - Fix issues with zio deadman "continue" mode #8021
@dweeezil do you have a rough estimate on when external testing would be helpful?
Does 0.8.0 change this behavior? Or make it any easier to implement a fix?
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
I'm opposing staleness. This might be old, but it's an issue.
@GregorKopka I've tagged this issue as a defect. I've also added the "Status: Understood" tag which will prevent the bot from marking it again.