Julia: Feature request: prevent spawning to thread 1

Created on 5 Jan 2020 · 27 comments · Source: JuliaLang/julia

Currently the scheduler, through the Threads.@spawn macro, will assign tasks to random thread numbers, including the primary thread. There may be times when there is a "traffic cop" activity on thread 1 that can be impacted if a heavy-computational (in essence, blocking) task is put onto that thread. It would be nice if there were a way to cordon off thread 1 from spawning in those cases.

multithreading

Most helpful comment

@tanmaykm and I discussed this problem recently when trying to submit logs into CloudWatch.

We're using a task to push logs over the network into CloudWatch, triggered both by the size of a queue and on a regular time interval.

If there's some highly parallel code using a lot of @spawn and mixed with a little logging, the log submission or timer tasks can be arbitrarily delayed and there's not a lot we can do about it.

I'm not sure what's best done about this, but a relatively simple thing might be to partition the threads into multiple thread pools at startup and to have the scheduler ensure that @spawn keeps any child tasks within the current pool by default. Then have a way to spawn part of the computation into a named pool? This isn't great or fully composable, but at least it would allow some kind of reliability at the whole-application level. Perhaps we could over-subscribe by one thread by default, and assign that thread to serve a latency-sensitive pool.
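One way to emulate that pool idea in user space might look like the following rough sketch. The names `ThunkPool` and `spawn_in` are hypothetical, not an existing API, and this only approximates "work stays in its pool" with long-lived worker tasks fed from a channel; it is not the scheduler change proposed above.

```julia
# Hypothetical sketch: a pool of worker tasks pulling thunks from a shared
# Channel. Work submitted via spawn_in runs only on this pool's workers.
struct ThunkPool
    work::Channel{Function}
end

function ThunkPool(nworkers::Int)
    pool = ThunkPool(Channel{Function}(Inf))
    for _ in 1:nworkers
        Threads.@spawn begin
            for f in pool.work   # blocks until work arrives or channel closes
                f()
            end
        end
    end
    return pool
end

spawn_in(f, pool::ThunkPool) = put!(pool.work, f)
```

Note this cannot keep work off thread 1 by itself, since `Threads.@spawn` may still place the workers anywhere; it only gives the per-pool queueing and back-pressure structure.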

All 27 comments

Alternately, of course, the equivalent of @spawnat would allow the same, but I don’t know (doubt, actually) how feasible that is.

Mostly a duplicate of #16350. We could add a thread reserved for latency-bound tasks.

Agreed that this issue and #34240 are the same. #16350 is highly related but a little different, in that a single thread was being used by both a mostly-blocking task and the REPL. (If the yield were not there, I suspect it would have locked completely.) So that one is just about whether the REPL gets its own thread.

I can close this one out if you want to just track the other - your guys' call.

I'll just comment that I've had a desire for something like this as well; particularly in the context of running a microservice/kernel, you want to keep a "heartbeat" thread always ready to respond/beat, while being able to spawn threaded-tasks elsewhere w/o worrying about completely saturating resources to the point that the heartbeat can't respond.

What's the path forward here? I saw that @JeffBezanson suggested potentially having a @spawn background ex macro form that would only spawn tasks on non-main threads. Would that require some functionality on the C-side? Do @JeffBezanson / @vtjnash think it'd be worth me trying to take a stab at this, or would it be pretty hairy? I'm eager to see this, so happy to jump in and help any way I can.

Is this as simple as...

task.jl:542

        tid = 0
        if ccall(:jl_enqueue_task, Cint, (Any,), t) != 0
            # if multiq is full, give to a random thread (TODO fix)
+           if t.background  # Perhaps nontrivial like this due to Core.Task mod, but you get the idea
+               tid = mod(time_ns() % Int, Threads.nthreads()-1) + 2
+           else
                tid = mod(time_ns() % Int, Threads.nthreads()) + 1
+           end
            ccall(:jl_set_task_tid, Cvoid, (Any, Cint), t, tid-1)
            push!(Workqueues[tid], t)
        end

? I'd be happy to do it and test, but I don't have the prereqs to build Julia. Is there a simple way to toggle the Base reference? (Seems like I read that at some point)

Of course, you’d need an @spawnbg or somesuch that sets the background bit before launch. I’ll see if I can get a Julia build running and PR it. (If someone more prepped wants to take it, be my guest :-) )

What would that implementation do, if Julia is started with a single thread? IMHO, it needs some dynamic thread creation to be useful. I.e. start Julia with a single thread and increase the pool once @spawn is called.

Oh, totally agreed. The above code block is in the multi-threaded branch of the scheduler anyways - no change to single-thread. If it is a truly single thread situation with no ability to grow the thread count, then a non-yielding computational thread will block things and there isn't much to be done. Building a growable thread pool is a good idea, but I think will require a lot of stakeholder input before implementation. In my mind, the above would be more short-term, aimed at solving the frozen-GUI/REPL/Thread1-thing problem until the higher-level long term solution shows up.

I think initially, something like @spawn background expr should just throw an error if there's only a single thread. It's simple and straightforward and a lot easier (I assume) than trying to support dynamically added runtime threads. We could perhaps also support allowing @spawn background expr to run the task in the main thread if desired. That could be useful in certain microservice workflows where you want to scale from 1 cpu all the way up to N+ cores.
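A minimal sketch of that error-on-single-thread behavior could look like this. The `@spawnbg` name is hypothetical (not an existing Base macro), and this only adds the guard; actually keeping the task off thread 1 still needs the scheduler support this issue is about.

```julia
# Hypothetical sketch: refuse to spawn "background" work when there is no
# background thread to run it on.
macro spawnbg(ex)
    quote
        Threads.nthreads() > 1 ||
            error("@spawnbg requires at least 2 threads; start Julia with JULIA_NUM_THREADS >= 2")
        # NOTE: Threads.@spawn may still pick thread 1; avoiding it is the
        # missing scheduler feature discussed in this issue.
        Threads.@spawn $(esc(ex))
    end
end
```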

Yes, a step in-between is absolutely a good thing. Would prefer throwing an error instead of getting stuck. Would be great getting something in shape for 1.4.

No dice on the fix above - that "multiq full" branch never gets hit on my machine. (PC, 8 threads). So the thread assignment is being handled at the C level. If I try to use the jl_set_task_tid in the scheduler to manually assign the id, the machine locks up. I'll dig into the C side later tonight, but I won't be able to test - at best to find the proper way to change an id before launch.

Digging in a little more, I thought I found a 0/1 scheduler issue, but was wrong. 0 is just an uninitialized thread id. Also, the "multiq full" conditional branch will never get hit - the limit from partr.c is 65k tasks per thread, so it isn't really part of the mainstream scheduling. The thread assignment appears to be a random number generated in multiq_insert() in partr.c. I won't be able to hack that, as I can't yet rebuild Julia on this machine.

Well, there is a hacky workaround to keep Thread 1 free of Threads.@threads activity. Code can be found at https://discourse.julialang.org/t/one-trick-to-get-to-background-only-threads/33175. The trick is to modify the generated task function to exit immediately on thread 1 and have the remaining threads take on a slightly bigger load.

Okay, folks, I have found a temporary solution for this and packaged it up in the ThreadPools.jl package I just submitted for registration. It uses the C-level jl_threading_run to set up individual Task handlers on all threads minus the primary and a primary-thread structure to manage the thread assignments.

I don't know where things will end up with the higher-level concurrency conversations going on, but hopefully the few who have run into this issue will find the package helpful. Once you all decide on an overall strategy, I'd be happy to help implement.

@quinnj - if this needs something more to help your use case, please let me know.

FWIW, jl_threading_run is mostly just a pretty inefficient wrapper around calling

function Threads.pin(t::Task, tid::UInt16)
    ccall(:jl_set_task_tid, Cvoid, (Any, Cint), t, tid-1)
    schedule(t)
    return t
end

with

Threads.pin(@task(work), tid)

(which does not currently exist but should)

For whatever reason, any time I tried to use jl_set_task_tid, I got a lockup. I assumed (and still assume) driver error on my part in some way at the C level.

Efficiency on that call isn't super important - it is a one-time call to set up the Task Channels. After that, they do all the work. Feedback on the structure is welcome - this is my first Julia package, and I've only been using the language for a couple months. Thx.

Which, I realize, gives almost the same behavior as:

@threads for i = 1:nthreads()
    i == 1 || @async do_some_work
end

(aka what you know as jl_threading_run), but just doesn't come with the extra work of doing the Threads.pin call on each separate thread and joining with them.

Calling jl_set_task_tid is not supposed to be capable of causing a thread to lockup. Misuse either should just do nothing (return failure) or result in bad scheduler performance.

I see your point; I could have implemented the setup as (pseudocode)

@threads for i = 1:nthreads()
    i == 1 || chans[i-1] = Channel{Task}...
end

instead of using jl_threading_run. Probably a bit cleaner and certainly more Julian.
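For reference, a fleshed-out version of that pseudocode might look like the sketch below. `start_handlers` is a hypothetical name (not ThreadPools.jl API), and the `:static` schedule keyword needs a newer Julia than the 1.3 discussed here; it pins iteration i to thread i so each handler loop lives on its own thread.

```julia
# Sketch: one Channel{Task} per non-primary thread, with a persistent handler
# loop on each of those threads that runs submitted tasks locally.
function start_handlers()
    chans = [Channel{Task}(Inf) for _ in 2:Threads.nthreads()]
    @async Threads.@threads :static for i in 1:Threads.nthreads()
        if i != 1
            for t in chans[i - 1]   # blocks here until the channel is closed
                schedule(t)         # per the discussion above, the child task
                wait(t)             # stays on the scheduling thread
            end
        end
    end
    return chans
end
```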

@vtjnash, thanks. I moved to your suggestion above - code is simpler, and tests all pass.

IIUC the @threads for trick only works when it is the outermost invocation?

julia> begin
       chan = Channel(Inf)
       @sync Threads.@threads for i in 1:Threads.nthreads()
           i == 1 || @async @sync Threads.@threads for j in 1:Threads.nthreads()
               j == 1 || @async push!(chan, (i, j, Threads.threadid()))
           end
       end
       close(chan)
       sort!(collect(chan))
       end
9-element Array{Any,1}:
 (2, 2, 2)
 (2, 3, 2)
 (2, 4, 2)
 (3, 2, 3)
 (3, 3, 3)
 (3, 4, 3)
 (4, 2, 4)
 (4, 3, 4)
 (4, 4, 4)

So I guess it's not a replacement of Threads.pin?

Regarding a growable thread pool: it may also be useful for avoiding deadlocks when you are ccalling external libraries that have internal locks. IIUC that's what Go does; see Dmitry Vyukov's talk "Go scheduler: Implementing language with lightweight concurrency" on YouTube. However, this also creates another problem due to the lack of flow control: https://github.com/golang/go/issues/7903.

It'd be nice to resolve this issue quickly since it is a common practice to allocate per-thread workspace assuming nthreads does not change:

https://github.com/JuliaLang/julia/blob/45a0381baf5555952caea0ccb43b377ac5ec8ae2/stdlib/Random/src/RNGs.jl#L309-L311

https://github.com/JuliaLang/julia/blob/45a0381baf5555952caea0ccb43b377ac5ec8ae2/stdlib/LinearAlgebra/src/LinearAlgebra.jl#L440-L442

https://github.com/JuliaLang/julia/blob/45a0381baf5555952caea0ccb43b377ac5ec8ae2/base/pcre.jl#L42-L43

Code like this is not forward-compatible to future Julia versions with growable thread pool.
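For concreteness, the pattern in those stdlib links boils down to something like this sketch (hypothetical names; the point is the fixed-size, `threadid()`-indexed vector):

```julia
# Sketch of the per-thread-workspace pattern: a vector sized once at init
# time and indexed by threadid() ever after, which silently assumes the
# thread count never grows.
const WORKSPACES = Vector{Vector{Float64}}()

function init_workspaces!()
    resize!(WORKSPACES, Threads.nthreads())   # breaks if nthreads() grows later
    for i in 1:Threads.nthreads()
        WORKSPACES[i] = zeros(16)
    end
end

# Out-of-bounds the moment a thread beyond the initial count appears.
workspace() = WORKSPACES[Threads.threadid()]
```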

@tkf - you are correct. The scheduler for Julia (at least for me in 1.3.1-land) only allows the primary thread to spawn tasks to alternate threads. If a non-primary thread schedules a Task, it goes to the same thread mandatorily. To see this, let's remove your thread id constraint above:

julia> begin
        chan = Channel(Inf)
        @sync Threads.@threads for i in 1:Threads.nthreads()
            @async @sync Threads.@threads for j in 1:Threads.nthreads()
                @async push!(chan, (i, j, Threads.threadid()))
            end
        end
        close(chan)
        sort!(collect(chan))
       end
16-element Array{Any,1}:
 (1, 1, 1)
 (1, 2, 2)
 (1, 3, 3)
 (1, 4, 4)
 (2, 1, 2)
 (2, 2, 2)
 (2, 3, 2)
 (2, 4, 2)
 (3, 1, 3)
 (3, 2, 3)
 (3, 3, 3)
 (3, 4, 3)
 (4, 1, 4)
 (4, 2, 4)
 (4, 3, 4)
 (4, 4, 4)

Once an inner Task is off thread 1, its children stay on that thread. I ran the same experiment with the @bgthreads macro from my package. Same result (it still uses the Julia scheduler), but everything stays off of thread 1:

julia> begin
        chan = Channel(Inf)
        @sync @bgthreads for i in 1:Threads.nthreads()
            @async @sync @bgthreads for j in 1:Threads.nthreads()
                @async push!(chan, (i, j, Threads.threadid()))
            end
        end
        close(chan)
        sort!(collect(chan))
       end
16-element Array{Any,1}:
 (1, 1, 2)
 (1, 2, 2)
 (1, 3, 2)
 (1, 4, 2)
 (2, 1, 3)
 (2, 2, 3)
 (2, 3, 3)
 (2, 4, 3)
 (3, 1, 4)
 (3, 2, 4)
 (3, 3, 4)
 (3, 4, 4)
 (4, 1, 2)
 (4, 2, 2)
 (4, 3, 2)
 (4, 4, 2)

@vtjnash, that Threads.pin implementation does seem to work like the @spawnat I was requesting. When I was getting lockups earlier, I was reworking the internals of schedule, so I was perhaps getting out of sync with Workqueues (as I said - driver error). Moving the C call to outside of schedule does seem to do the trick. I am finding the ThreadPool API convenient (at least the simple portion), but I may refactor some internal implementation along these lines.


I have a similar application, where one thread is required to have low latency (it is doing data acquisition with a certain buffer size) and the other thread is responsible for updating the user interface.

Tim Holy reported that Gtk (which runs a regular task on thread 1) massively slows down @threads, since part of the job is also spawned on thread 1, which then of course is fighting for resources while the other threads have less to do. In that case a different scheduling strategy would of course be better, but it would still be nice to prevent the first thread from being used in that scenario.

https://github.com/JuliaGraphics/Gtk.jl/issues/503

+1 for wanting the ability to have Threads.@spawn schedule off the main thread, for reasons similar to those posted above. For the moment I'm using @tro3's @tspawnat macro from ThreadPools.jl and getting the desired behaviour. Direct thread-ID binding of tasks, and/or the possibility of setting an environment variable at startup (something like we already do to control the number of threads, e.g. JULIA_THREADSPAWN_BG=1) to override the default free-for-all scheduling, would be amazing...

