Related: #4037
Also related: #1700
We can already do this using `schedule`. The only problem is that the task can catch the exception and retry; there is no way to be sure the task ends. Maybe the interface can just be `t.state = :done`. We just need to update `schedule` to drop finished tasks.
We should probably still have `terminate(t::Task) = (t.state = :done; :ok)` defined. Just seems odd that in this particular case, we expect the user to access a member field directly, while in other cases, a Task object is effectively used as an opaque handle.
Nah, that won't be sufficient anyway. We should remove the task from whatever wait queue it is in, so it can be GC'd.
Bumping. Do we have this functionality yet? Do we want it?
Don't have it yet. We should. Killing a task should also release whatever resource it is waiting on - file fd, socket, remote reference, etc.
I don't think we should add this, since "releasing whatever resource" is generally impractical and buggy. I don't know of any APIs that have this sort of feature but don't warn you not to use it due to the infeasibility of cleaning up state afterwards:
http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html
https://msdn.microsoft.com/en-us/library/windows/desktop/ms686717%28v=vs.85%29.aspx
https://internals.rust-lang.org/t/thread-cancel-support/3056 (rust doesn't have it, this is a discussion on why not)
After reading those links, an appropriate solution would be to define an `interrupt(t::Task)`. Currently it should just throw an `InterruptException` in the target task if it is waiting on I/O, a remote reference, a Condition, etc. For compute-bound tasks, if and when we have tasks scheduled on different threads, it could send an interrupt signal to the specific thread (if possible).
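A rough sketch of what a minimal version could look like today; `interrupt` is not an existing function, and it leans on `schedule(t, exc; error=true)`, which raises the exception in the target task when the scheduler next resumes it. The manual cautions against re-scheduling arbitrary already-started tasks, and a task that is busy computing (or that catches `InterruptException`) is not reachable this way:

```julia
# Hypothetical helper, not an existing Base API.
function interrupt(t::Task)
    istaskdone(t) && return :done
    schedule(t, InterruptException(); error=true)
    return :ok
end

t = @async try
    sleep(60)                 # parked on a libuv timer
catch err
    err isa InterruptException || rethrow()
    @info "task interrupted"
end
sleep(0.1)                    # give the task a chance to start waiting
interrupt(t)
wait(t)
```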
+1 for `interrupt(t::Task)`. @amitmurthy Do you have any idea of how to throw the `InterruptException` on the task? I feel like this solution would also help me figure out how to implement a `stacktrace(t::Task)` method (i.e., run `stacktrace()` in the task for debugging).
`stacktrace(::Task)` is much easier than `interrupt(::Task)`.
Another reference on this topic is https://news.ycombinator.com/item?id=13470452
Note that the current preferred mechanism for aborting another Task is to `close` the shared resource and let the runtime clean it up synchronously. This has the benefit of being reliable, easy to code, and already existing. It is also less racy, since `close` is stateful (once closed, the resource remains closed) rather than an edge-driven event, and it is typically also an expected condition (so it doesn't require any extra effort to handle).
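For example, with a task blocked on a `Channel` (a minimal sketch using only existing APIs; the task itself decides how to unwind when the resource goes away):

```julia
ch = Channel{Int}(0)
consumer = @async try
    while true
        x = take!(ch)
        println("got ", x)
    end
catch err
    err isa InvalidStateException || rethrow()
    println("channel closed; consumer shutting down")
end

put!(ch, 1)
close(ch)          # stateful: every later take! keeps failing, so no race
wait(consumer)
```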
That works only for libuv resources implementing `close`, and for Channels. For tasks waiting on a remotecall, or waiting on a Future/RemoteChannel, users have no access to the Condition variables the task is waiting on. And implementing a `close(::Condition)` that would invalidate all current and future calls on a Condition object is, I think, not correct. If we do that, we may as well have `interrupt(::Task)` call `close` on the waiting condition, which would bring us back to the issue of proper cleanup in the libuv case. Right?
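(For what it's worth, when you do hold a reference to the Condition a task is blocked on, I believe `notify` can already raise an error in the current waiters, without needing a `close(::Condition)`. A minimal sketch:)

```julia
cond = Condition()
waiter = @async try
    wait(cond)
catch err
    @info "woken with an error" err
end
sleep(0.1)                                        # let the waiter block
notify(cond, InterruptException(); error=true)    # raise in all current waiters
wait(waiter)
```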
No, it would still be different, because it would no longer be specific to intended interruption. For example, you might end up aborting a call to `close` or `showerror` instead of the intended job.
I think the `remotecall` functions generally have async versions which return a handle to the Channel? I think in most other cases the resource is passed in as an argument, which gives the caller some leverage. In the worst case, for `remotecall`, since the resource argument is the worker pid, you could `rmprocs(p)` to kill / close the connection to that remote worker.
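Roughly like this (a sketch only; `rmprocs` and `remotecall_fetch` are used as documented, but the exact exception the abandoned fetch ends up seeing may differ):

```julia
using Distributed
p = first(addprocs(1))

# launch a long remote computation, keeping only a Task handle
t = @async remotecall_fetch(() -> (sleep(600); :done), p)

# to abandon the work, drop the worker, which closes the connection
# the waiting task depends on
rmprocs(p; waitfor=10)

try
    fetch(t)            # should now fail, e.g. with a ProcessExitedException
catch err
    @show err
end
```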
> I think the remotecall functions generally have async versions which return a handle to the Channel
The calls return a `Future`, and a wait on a `Future` results in a remote task that waits on the backing channel for data. On the caller side we are waiting on a Condition which will be triggered by a response from the remote wait.
In a statement like `@async remotecall_fetch(....)` we only have access to a Task object. `rmprocs(p)` seems like overkill but is probably the correct way to do it currently, as we don't have a means to interrupt the specific remote task.
Right, but I thought that's why `remotecall` is available. And terminating that Task wouldn't actually notify the remote worker to stop, but might confuse / corrupt it when it tries to report its results. I realize that ensuring cancel-ability may require thinking about how it'll work and threading out handles to the objects that can be used to stop the work. But I don't see how it could be done any other way. There's no guarantee in the `@async` example there that it's not actually implemented as `@async wait(@async remotecall_fetch())` (hopefully not intentionally...), so all that killing the Task directly accomplishes is destroying the monitoring process.
Someone has written an article that draws an equivalence between `@schedule`-like behavior and the evils of `goto`:
https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful
There is a Julia-specific thread to this conversation at https://discourse.julialang.org/t/schedule-considered-harmful/10540
As someone who learned programming on old-fashioned BASICs like Applesoft BASIC on the Apple II+, and then had to move to procedural programming in C, I initially had the same gut response to this article as I did back then. I'd call it "instinctive repulsion". But of course we all now recognize `goto` as evil, so I reset my thinking and gave the article an honest review.
In doing so, I have come to the conclusion that the article makes some great points. There is no good way to universally handle uncaught exceptions at the top of a `@schedule`-launched task other than to drop them on the floor. There isn't really a global (or even local) repository of outstanding Tasks created by `@schedule`, so you can't really even know what's running, or what may have been left out there by a black-box function call you have made.
But the reason I mention this article in this particular discussion is that I think having to support such unbounded `@schedule` calls may be one of the reasons that the ability to cancel a task is so hard. I haven't reviewed the Trio Python library itself or delved into the details of the Nursery concept as described, other than to recognize it as similar to wrapping `@async` in `@sync`. But it does occur to me that there may be some concepts in there relating to "checkpoints" that begin to enable task cancellation (https://trio.readthedocs.io/en/latest/reference-core.html#checkpoints). Perhaps, if `@schedule` itself is dropped and the only way to schedule a task is to wrap an `@async` inside a `@sync`, then dealing with the resulting fallout of cleaning up the resources a task is holding on to may become easier.
Anyhow, just some food for thought here. I'm not necessarily proposing anything, but rather hoping to move the idea of task cancellation back into active discussion.
I think we should look closely at the Trio approach to I/O in the future—i.e. post 1.0 (so this may have to be optional in 1.x or it may have to wait until 2.0). It has some really nice properties, including:
- Every task spawned within a function finishes before that function returns unless you spawn the task within an explicit "task nursery" that outlives the spawning function.
- There's a natural parent task to handle every child task failure—no more task failures disappearing into the void. This has been a frequent point of contention between @JeffBezanson and myself; the Trio approach provides a nice clean solution that could make both of us happy.
- There's a clear structure for cancellation of tasks and subtasks: if you kill a task, that also kills all subtasks. In particular, this means that instead of having timeout arguments on every possible blocking operation, you can do external cancellation of blocking tasks correctly—this composes better and means you don't have to wait until a potentially indefinite chain of timers expires.
The most effective way forward may be to see if I can get @JeffBezanson and @njsmith in a room together some time since Nathaniel is a much more effective explainer of and advocate for the Trio model than I am. (Hope you don't mind the ping, Nathaniel!)
> The most effective way forward may be to see if I can get @JeffBezanson and @njsmith in a room together
Sounds like a fun time to me :-)
> The most effective way forward may be to see if I can get @JeffBezanson and @njsmith in a room together some time since Nathaniel is a much more effective explainer of and advocate for the Trio model than I am.
BTW, we've just created a virtual room for cross-language discussions of structured concurrency, in case anyone is interested: https://trio.discourse.group/c/structured-concurrency
Cool, Nathan, I’ve joined :)
Nice, I've joined as well. Reading over the blog post I think this approach makes a lot of sense. (As a side note — it also means that it was "correct" to inherit loggers from their parent task. Phew!)
I think cancellation in Trio works nicely because Python forces you to write `await`, and those `await`s become the (potential) checkpoints. As `await` can only be used inside functions marked `async`, you don't need to care about cancellation inside blocking (non-`async`) functions.

Are there any plans/ideas for (1) how to make non-I/O (compute-intensive) functions cancellable and (2) how to mark them as such in a way that callers know they have to prepare for cancellation? I guess passing around cancellation tokens (see also https://vorpus.org/blog/timeouts-and-cancellation-for-humans/) and handling exit manually, as in Go's `errgroup`, would be an option. But it sounds like a very clumsy API to use.
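For concreteness, a hand-rolled token along those lines could look like the sketch below (`CancelToken`, `cancel!`, and `iscancelled` are hypothetical names, not an existing API); it also shows the clumsy part, namely that every compute-bound callee has to accept the token and poll it itself:

```julia
# Hypothetical cancellation token; none of these names exist in Base.
struct CancelToken
    flag::Threads.Atomic{Bool}
end
CancelToken() = CancelToken(Threads.Atomic{Bool}(false))
cancel!(tok::CancelToken) = (tok.flag[] = true; nothing)
iscancelled(tok::CancelToken) = tok.flag[]

# Every cancellable function has to take the token and poll it at its own checkpoints.
function crunch(tok::CancelToken, n)
    acc = 0.0
    for i in 1:n
        iscancelled(tok) && return :cancelled    # manual checkpoint
        acc += sin(i)
        i % 100_000 == 0 && yield()              # stay cooperative on a single thread
    end
    return acc
end

tok = CancelToken()
job = Threads.@spawn crunch(tok, 10^10)
sleep(0.5)
cancel!(tok)            # request cancellation; takes effect at the next checkpoint
@show fetch(job)        # => :cancelled
```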
BTW, I played around a bit to see how a Trio-like API would look and how passing around a "nursery" would work (although it's more like Go's `errgroup`, as there is no cancellation support). Here is a demo:
```julia
function bg(nursery)
    @with_nursery nursery begin
        println("in nursery")
        # these tasks outlive bg() but are still joined by the @sync in demo()
        Threads.@spawn begin
            sleep(0.1)
            Threads.@spawn begin
                sleep(0.1)
                println("world")
            end
            println("hello")
        end
    end
    println("out of nursery")
end

function demo()
    @sync begin
        println("launching background tasks")
        bg(@get_nursery)
        bg(@get_nursery)
        println("synchronising background tasks")
    end
    println("synchronized background tasks")
end
```
Quick-and-dirty implementation of `@get_nursery` and `@with_nursery` (no cancellation support at all!):
```julia
# A push!-only vector of tasks guarded by a lock, so that tasks spawned
# from different threads can register themselves safely.
struct TaskVector
    tasks::Vector{Any}
    lock::Threads.SpinLock
end
TaskVector(tasks) = TaskVector(tasks, Threads.SpinLock())

function Base.push!(v::TaskVector, x)
    lock(v.lock)
    try
        return push!(v.tasks, x)
    finally
        unlock(v.lock)
    end
end

# Piggybacks on the variable that @sync/@async/Threads.@spawn use internally
# (Base.sync_varname) to collect spawned tasks.
macro get_nursery()
    var = esc(Base.sync_varname)
    quote
        TaskVector($var)
    end
end

macro with_nursery(nursery, body)
    ex = quote
        let $(Base.sync_varname) = $nursery
            $body
        end
    end
    return esc(ex)
end
```
`demo()` should print:
```
launching background tasks
in nursery
out of nursery
in nursery
out of nursery
synchronising background tasks
hello
hello
world
world
synchronized background tasks
```
I think Kotlin could be interesting here, since it does structured concurrency without `async`/`await` keywords. Kotlin's approach to making computation cancellable is to call the `yield` function or to check the `isActive` variable. See: https://kotlinlang.org/docs/reference/coroutines/cancellation-and-timeouts.html#making-computation-code-cancellable

But it looks like there is no mechanism for callers to know whether a function is cancellable?
I started implementing a very minimal version of structured concurrency here: https://github.com/tkf/Awaits.jl. Basically I implemented what I mentioned in the last bits of https://github.com/JuliaLang/julia/issues/32677#issuecomment-520115386. The idea is to define a macro `@await body` which is expanded to (roughly speaking)
```julia
ans = $body
if ans isa Exception
    cancel!($context)
    return ans
end
ans
```
where `$context` is a variable that tracks cancellation tokens (and tasks). There is also `@go body` for (a thread-safe version of) `@spawn @await body`. Another important construct is `@check`, which expands to
```julia
shouldstop($context) && return Cancelled()
```
where `Cancelled <: Exception`. This way, compute-intensive functions can define checkpoints manually by inserting `@check`. Those functions can also "throw" an error by returning an `Exception` when they hit a condition under which the whole computation should stop. Callers of such cancellable functions can wrap the call with `@await`, which then explicitly marks a checkpoint. Sub-computations can be cancelled individually with something like
```julia
context = @cancelscope begin
    @go ...
    @go ...
end
cancel!(context)
```
I also wrote some minimal documentation (https://tkf.github.io/Awaits.jl/dev/) and tests (https://github.com/tkf/Awaits.jl/blob/master/test/test_simple.jl).
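To make the `@check`/`@await` protocol above concrete, here is a hand-expanded sketch of a cancellable compute function and its caller; `Context`, `shouldstop`, and `cancel!` below are stand-in definitions for this sketch only, not the actual Awaits.jl internals:

```julia
# Stand-in context machinery for illustration; Awaits.jl's real internals differ.
struct Cancelled <: Exception end

mutable struct Context
    cancelled::Bool
end
shouldstop(ctx::Context) = ctx.cancelled
cancel!(ctx::Context) = (ctx.cancelled = true; nothing)

# A compute-intensive function with a manual checkpoint (what @check expands to):
function search(context, xs)
    for (i, x) in enumerate(xs)
        shouldstop(context) && return Cancelled()   # @check
        x == 42 && return i
    end
    return nothing
end

# A caller wrapping the call the way @await would:
function caller(context, xs)
    ans = search(context, xs)        # @await search(context, xs)
    if ans isa Exception
        cancel!(context)
        return ans
    end
    ans
end

ctx = Context(false)
@show caller(ctx, 1:100)   # => 42
cancel!(ctx)
@show caller(ctx, 1:100)   # => Cancelled()
```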