Use what has been learned so far from the tokio-threadpool scheduler to refresh the internal structure.
Part of the focus will be to remove as much unnecessary logic and reduce the struct sizes.
At a high level, each thread in the threadpool is a standalone executor. When the thread is idle, it attempts to steal work from other threads. This provides good locality. Work is submitted to the threadpool using a shared queue.
A few aspects of the threadpool are proposed to change.
Use a fixed size, array MPMC queue for each thread's local queue. This is the most efficient algorithm for a concurrent queue, but comes with the drawback that the queue is fixed in size and cannot grow. If the queue is full when attempting to push, then push the task to the global shared queue.
Threads will loop, doing the following:
During normal operation, the local queues should not reach capacity. If a single thread starts to fill up, then other threads should steal, reducing that thread's load. If all threads are at capacity, then there is no advantage to using a queue that supports stealing. Additionally, when the threadpool is under load, using a shared queue has the advantage of improving fairness.
Remove logic that bounds the amount of concurrent blocking threads. Instead, when entering a blocking block, always permit this by spawning a new thread if needed. Limiting the number of concurrent blocking threads will be punted to higher level abstractions.
The pool will maintain a number of running threads equal to the number of workers assigned to the pool. When the pool is created, these threads will be eagerly spawned.
cc @stjepang
As part of this work, we should also look into tuning the "eagerness" of work stealing. I suspect that's where some of the remaining performance difference between tokio-io-pool and tokio comes from.
It would probably be possible to require a minimum queue depth to steal.
I'm been squinting at this since your reference to it, on close of #1147:
[鈥 always permit [blocking] by spawning a new thread if needed. [...] Limiting the number of concurrent blocking threads will be punted to higher level abstractions.
This would be great if possible. Would you be able to share more about how it could potentially work? I don't follow how some higher abstraction could limit thread count, at least not without blocking on the blocking call, which obviously isn't advisable.
You wouldn't limit thread count per se, but you would limit the max # of current blocking ops. You could do that w/ Semaphore.
In that case, if Semaphore::poll_permit returns Pending wouldn't blocking need to also return Pending? Besides, there would just be a 1-to-1 between permits and blocking threads and meanwhile N reactor (non-blocking) threads must be running?
@dekellum assuming you are asking about how to impl for Sink::poll_ready, the strategy would be that Sink::poll_ready would call poll_permit. The Sink is ready to send a value once the sempahore is acquired. Then, sending the value will be able to block.
Thanks @carllerche, I really appreciate that you read my use case per #1147. I see now that tokio-sync's Semaphore includes the waking mechanism I was originally asking about. Given the conditional blocking nature of my Sinks, I'd still need to buffer an Item, but waking would be taken care of.
If I arrange for a Semaphore in my own project code, then there would be some replication with the same needed by tokio-io (for AsyncWrite, etc.) and I'd no longer be able to specify a single set of (runtime::Builder::) core_threads(c) and blocking_threads(b), including for cases in my application that _can_ use AsyncWrite directly. Have or would you consider continuing to offer the blocking_threads(b) config and access to a shared Semaphore (with b number of permits) or wrapper method to obtain a Permit from the same?
Yes, I plan on having a runtime global "default" blocking set, where you can configure the max concurrency in the runtime builder. However, some things (like stdin / stdout) won't count against that anymore.
Any way I could help to implement the 芦global "default" blocking set禄, @carllerche?
@dekellum tracking issue is #588. I don't have a specific proposal in mind yet. If you want to think about it some and make a proposal, that would be :+1: I will try to leave my thoughts on #588.
Done on master!
Most helpful comment
It would probably be possible to require a minimum queue depth to steal.