Tokio: threadpool: refresh internal structure

Created on 24 Jun 2019 · 12 Comments · Source: tokio-rs/tokio

Use what has been learned so far from the tokio-threadpool scheduler to refresh the internal structure.

Part of the focus will be to remove as much unnecessary logic as possible and to reduce struct sizes.

Context

At a high level, each thread in the threadpool is a standalone executor. When the thread is idle, it attempts to steal work from other threads. This provides good locality. Work is submitted to the threadpool using a shared queue.

Proposal

A few aspects of the threadpool are proposed to change.

Queue structure.

Use a fixed-size, array-based MPMC queue for each thread's local queue. This is the most efficient algorithm for a concurrent queue, but it comes with the drawback that the queue is fixed in size and cannot grow. If the local queue is full when attempting to push, the task is pushed to the global shared queue instead.
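The overflow rule can be sketched as follows. This is only an illustration of the capacity policy, not the proposed implementation: a Mutex<VecDeque> stands in for the lock-free array queue, and the names Task, LocalQueue, GlobalQueue and schedule are hypothetical.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Hypothetical task type, used only for illustration.
pub type Task = u32;

/// Unbounded queue shared by all workers (stand-in for the global queue).
pub type GlobalQueue = Arc<Mutex<VecDeque<Task>>>;

/// Bounded per-worker queue. A Mutex<VecDeque> stands in for the
/// lock-free array queue; only the fixed-capacity policy is modeled.
pub struct LocalQueue {
    tasks: Mutex<VecDeque<Task>>,
    capacity: usize,
}

impl LocalQueue {
    pub fn new(capacity: usize) -> Self {
        LocalQueue {
            tasks: Mutex::new(VecDeque::with_capacity(capacity)),
            capacity,
        }
    }

    /// Pushes the task, or hands it back if the queue is at capacity.
    pub fn try_push(&self, task: Task) -> Result<(), Task> {
        let mut queue = self.tasks.lock().unwrap();
        if queue.len() >= self.capacity {
            Err(task)
        } else {
            queue.push_back(task);
            Ok(())
        }
    }

    /// Number of tasks currently held locally.
    pub fn len(&self) -> usize {
        self.tasks.lock().unwrap().len()
    }
}

/// Schedules a task locally, falling back to the global queue when
/// the local queue is full.
pub fn schedule(local: &LocalQueue, global: &GlobalQueue, task: Task) {
    if let Err(task) = local.try_push(task) {
        global.lock().unwrap().push_back(task);
    }
}
```

With a local capacity of 2, scheduling five tasks leaves two in the local queue and sends the remaining three to the global queue.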

Threads will loop, doing the following:

Process tasks from local queue.
Process tasks from global queue.
Attempt to steal from other workers.
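One pass of that loop might look like the sketch below. It assumes mutex-guarded queues rather than the lock-free structure described above, and the names Pool, Task and next_task are hypothetical. The order in which victims are visited and the end of the queue that is stolen from are arbitrary choices made for the example.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical task type, used only for illustration.
pub type Task = &'static str;

/// One local queue per worker plus a single shared global queue.
pub struct Pool {
    pub locals: Vec<Mutex<VecDeque<Task>>>,
    pub global: Mutex<VecDeque<Task>>,
}

impl Pool {
    pub fn new(workers: usize) -> Self {
        Pool {
            locals: (0..workers).map(|_| Mutex::new(VecDeque::new())).collect(),
            global: Mutex::new(VecDeque::new()),
        }
    }

    /// Finds the next task for worker `me`, in the order listed above.
    pub fn next_task(&self, me: usize) -> Option<Task> {
        // 1. Process tasks from the local queue.
        if let Some(task) = self.locals[me].lock().unwrap().pop_front() {
            return Some(task);
        }
        // 2. Process tasks from the global queue.
        if let Some(task) = self.global.lock().unwrap().pop_front() {
            return Some(task);
        }
        // 3. Attempt to steal from the other workers, starting with
        //    the next worker index (an arbitrary choice).
        for offset in 1..self.locals.len() {
            let victim = (me + offset) % self.locals.len();
            if let Some(task) = self.locals[victim].lock().unwrap().pop_front() {
                return Some(task);
            }
        }
        None
    }
}
```

A worker would call next_task in a loop and park or sleep when it returns None; that part is left out here.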

During normal operation, the local queues should not reach capacity. If a single thread starts to fill up, then other threads should steal, reducing that thread's load. If all threads are at capacity, then there is no advantage to using a queue that supports stealing. Additionally, when the threadpool is under load, using a shared queue has the advantage of improving fairness.

Blocking

Remove the logic that bounds the number of concurrent blocking threads. Instead, when entering a blocking section, always permit it by spawning a new thread if needed. Limiting the number of concurrent blocking threads will be punted to higher-level abstractions.

Core threads

The pool will maintain a number of running threads equal to the number of workers assigned to the pool. When the pool is created, these threads will be eagerly spawned.

Misc

Ensure the runtime can be stored in a thread local (#593, #631). Thread-locals must not be accessed in drop. This can be done by not using a thread-cached wait strategy.
Benchmarks (#425).

Source

carllerche

👍3

All 12 comments

cc @stjepang

carllerche on 24 Jun 2019

As part of this work, we should also look into tuning the "eagerness" of work stealing. I suspect that's where some of the remaining performance difference between tokio-io-pool and tokio comes from.

jonhoo on 25 Jun 2019

👍1

It would probably be possible to require a minimum queue depth to steal.

carllerche on 25 Jun 2019

👍3

I've been squinting at this since your reference to it, on close of #1147:

[…] always permit [blocking] by spawning a new thread if needed. [...] Limiting the number of concurrent blocking threads will be punted to higher level abstractions.

This would be great if possible. Would you be able to share more about how it could potentially work? I don't follow how some higher abstraction could limit thread count, at least not without blocking on the blocking call, which obviously isn't advisable.

dekellum on 12 Jul 2019

You wouldn't limit thread count per se, but you would limit the max # of concurrent blocking ops. You could do that w/ Semaphore.

carllerche on 12 Jul 2019
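(Editorial sketch, not part of the original thread.) The idea of capping concurrent blocking ops with permits, rather than capping threads, could look roughly like this. It uses only the standard library and is not tokio-sync's Semaphore: the names Limiter, Permit and try_acquire are hypothetical, and a None result here corresponds only loosely to Pending, since no task wakeup is modeled.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical permit counter capping concurrent blocking operations.
pub struct Limiter {
    available: AtomicUsize,
}

/// Held for the duration of one blocking operation; the permit is
/// returned to the limiter when this value is dropped.
pub struct Permit {
    limiter: Arc<Limiter>,
}

impl Limiter {
    pub fn new(max_concurrent: usize) -> Arc<Self> {
        Arc::new(Limiter {
            available: AtomicUsize::new(max_concurrent),
        })
    }

    /// Takes a permit if one is free. Returns None when the limit is
    /// reached; a real async semaphore would register a waker instead.
    pub fn try_acquire(this: &Arc<Self>) -> Option<Permit> {
        let mut current = this.available.load(Ordering::Acquire);
        loop {
            if current == 0 {
                return None;
            }
            match this.available.compare_exchange_weak(
                current,
                current - 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(Permit { limiter: Arc::clone(this) }),
                // Another thread changed the count; retry with the new value.
                Err(actual) => current = actual,
            }
        }
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        // Give the permit back so another blocking op may start.
        self.limiter.available.fetch_add(1, Ordering::Release);
    }
}
```

A caller would acquire a Permit before starting a blocking operation and hold it until the operation finishes, so the number of live permits bounds the number of blocking ops in flight regardless of how many threads exist.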

In that case, if Semaphore::poll_permit returns Pending wouldn't blocking need to also return Pending? Besides, there would just be a 1-to-1 between permits and blocking threads and meanwhile N reactor (non-blocking) threads must be running?

dekellum on 12 Jul 2019

@dekellum assuming you are asking about how to impl for Sink::poll_ready, the strategy would be that Sink::poll_ready would call poll_permit. The Sink is ready to send a value once the semaphore is acquired. Then, sending the value will be able to block.

carllerche on 12 Jul 2019

Thanks @carllerche, I really appreciate that you read my use case per #1147. I see now that tokio-sync's Semaphore includes the waking mechanism I was originally asking about. Given the conditional blocking nature of my Sinks, I'd still need to buffer an Item, but waking would be taken care of.

If I arrange for a Semaphore in my own project code, then there would be some replication with the same needed by tokio-io (for AsyncWrite, etc.) and I'd no longer be able to specify a single set of (runtime::Builder::) core_threads(c) and blocking_threads(b), including for cases in my application that _can_ use AsyncWrite directly. Have or would you consider continuing to offer the blocking_threads(b) config and access to a shared Semaphore (with b number of permits) or wrapper method to obtain a Permit from the same?

dekellum on 12 Jul 2019

Yes, I plan on having a runtime global "default" blocking set, where you can configure the max concurrency in the runtime builder. However, some things (like stdin / stdout) won't count against that anymore.

carllerche on 12 Jul 2019

👍1

Any way I could help to implement the «global "default" blocking set», @carllerche?

dekellum on 5 Aug 2019

@dekellum tracking issue is #588. I don't have a specific proposal in mind yet. If you want to think about it some and make a proposal, that would be 👍 I will try to leave my thoughts on #588.

carllerche on 5 Aug 2019

Done on master!

carllerche on 22 Nov 2019