0.2.14 - 0.2.19
arch linux, but no IO involved.
Please see the issue first reported at futures-rs, where I suspected that JoinAll might be at fault.
JoinHandle often reports Pending even though its task has already returned Ready. Over a collection of futures, as in a LocalSet, this can require several runs before all contained JoinHandles return Ready. When there are more tasks in the set than the budget allows, the JoinHandles may never return Ready, and the tasks awaiting them hang.
It is possible that an inefficiency in the synchronization of JoinHandle also has an impact when using a threadpool, but it might not be as noticeable there, since a threadpool just throws more resources at the problem.
I _think_ I know what's up here.
Every time LocalSet runs a local task, it wraps the run in budget:
https://github.com/tokio-rs/tokio/blob/947045b9445f15fb9314ba0892efa2251076ae73/tokio/src/task/local.rs#L406
This is identical to what tokio's other schedulers do when running tasks, and in theory should give each task its own budget every time it's polled.
_However_, LocalSet is different from other schedulers. Unlike the runtime schedulers, a LocalSet is itself a future that's run on another scheduler, in block_on. block_on _also_ sets a budget:
https://github.com/tokio-rs/tokio/blob/947045b9445f15fb9314ba0892efa2251076ae73/tokio/src/runtime/basic_scheduler.rs#L131
The docs for budget state that:
https://github.com/tokio-rs/tokio/blob/947045b9445f15fb9314ba0892efa2251076ae73/tokio/src/coop.rs#L73
This means that inside of a LocalSet, the calls to budget are no-ops. Instead, each future polled by the LocalSet is subtracting from a single global budget.
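The no-op nesting described above can be illustrated with a toy model. This is only a sketch and not tokio's coop module: the thread-local `BUDGET`, the helper names `poll_proceed`, `remaining` and `nested_spend`, and the representation of "no budget active" as `None` are all assumptions made for the illustration. The figure 128 is the budget size mentioned in this thread.

```rust
use std::cell::Cell;

// Toy model of a per-thread cooperative budget. NOT tokio's actual code.
const INITIAL: u32 = 128;

thread_local! {
    // `None` means no budget is active (unconstrained).
    static BUDGET: Cell<Option<u32>> = Cell::new(None);
}

/// Runs `f` under a budget. If a budget is already active, the call is a
/// no-op wrapper: `f` keeps drawing on the budget that is already there.
fn budget<R>(f: impl FnOnce() -> R) -> R {
    let already_active = BUDGET.with(|b| b.get().is_some());
    if already_active {
        return f();
    }
    BUDGET.with(|b| b.set(Some(INITIAL)));
    let out = f();
    BUDGET.with(|b| b.set(None));
    out
}

/// Spends one unit of budget; returns `false` once the budget is exhausted.
fn poll_proceed() -> bool {
    BUDGET.with(|b| match b.get() {
        None => true,
        Some(0) => false,
        Some(n) => {
            b.set(Some(n - 1));
            true
        }
    })
}

/// Returns the budget currently left on this thread, if one is active.
fn remaining() -> Option<u32> {
    BUDGET.with(|b| b.get())
}

/// Outer `budget` plays the role of block_on; the two inner calls play the
/// role of LocalSet polling two futures one after the other.
fn nested_spend(units: u32) -> Option<u32> {
    budget(|| {
        budget(|| {
            for _ in 0..units {
                poll_proceed();
            }
        });
        // The second inner call does not get a fresh budget.
        budget(|| remaining())
    })
}
```

In this model, a second future polled inside the outer budget only sees what the first one left behind, which is the shared-budget behaviour described above.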
LocalSet's RunUntil future polls the provided future before polling any other tasks spawned on the local set: https://github.com/tokio-rs/tokio/blob/947045b9445f15fb9314ba0892efa2251076ae73/tokio/src/task/local.rs#L525-L535
In this case, the provided future is JoinAll. Unfortunately, every time a JoinAll is polled, it polls _every_ joined future that has not yet completed. When the number of futures in the JoinAll is >= 128, this means that the JoinAll immediately exhausts the task budget. This would, in theory, be a _good_ thing --- if the JoinAll had a huge number of JoinHandles in it and none of them are ready, it would limit the time we spend polling those join handles.
However, because the LocalSet _actually_ has a single shared task budget, this means polling the JoinAll _always_ exhausts the entire budget. There is now no budget remaining to poll any other tasks spawned on the LocalSet, and they are never able to complete.
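The starvation can be reproduced in a small simulation. This is a sketch under simplifying assumptions, not a model of tokio's scheduler: the function `ticks_to_join` is hypothetical, every poll of a join handle is assumed to cost one budget unit, every spawned task is assumed to need one unit to finish, and the outer executor is assumed to grant a fresh budget of 128 on each tick. The `shared` flag switches between the single shared budget and the per-task budget that the calls to budget were meant to provide in theory.

```rust
/// Units of budget granted per poll; 128 is the figure quoted in the text.
const BUDGET: u32 = 128;

/// Simulates a toy local set whose main future joins `n` spawned tasks.
/// Returns the tick on which the join completes, or `None` if it has not
/// completed within `max_ticks`.
///
/// `shared == true`: the join and all tasks draw on one budget per tick.
/// `shared == false`: every task poll starts with a fresh budget.
fn ticks_to_join(n: usize, shared: bool, max_ticks: usize) -> Option<usize> {
    let mut task_done = vec![false; n];
    let mut handle_ready = vec![false; n];

    for tick in 1..=max_ticks {
        let mut budget = BUDGET; // granted by the outer executor

        // 1. The joining future is polled first and polls every pending
        //    handle. A handle polled with no budget left stays pending.
        for i in 0..n {
            if handle_ready[i] || budget == 0 {
                continue;
            }
            budget -= 1;
            handle_ready[i] = task_done[i];
        }
        if handle_ready.iter().all(|&r| r) {
            return Some(tick);
        }

        // 2. Then the spawned tasks run. Each needs one unit to finish.
        for done in task_done.iter_mut().filter(|d| !**d) {
            if !shared {
                budget = BUDGET; // per-task budget
            }
            if budget > 0 {
                budget -= 1;
                *done = true;
            }
        }
    }
    None
}
```

With 10 tasks the shared budget is harmless, but with 128 tasks the join uses up all 128 units on every tick and `ticks_to_join(128, true, 1000)` returns `None`, whereas the per-task variant finishes on the second tick.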
There are two possible solutions: LocalSet could use coop::limit to poll each future with a smaller budget, ensuring no one future can exhaust the entire budget. However, this would mean that tasks on a LocalSet would be given a smaller budget than comparable tasks on a regular worker or basic_scheduler. Alternatively, LocalSet could call coop::stop to reset the budget when it is polled. That way, each task spawned on the LocalSet _would_ get its own separate budget rather than starving other tasks. Since LocalSets can only be polled in block_on, and are not competing with other tasks, I think it's correct for it to exempt itself from budgeting.
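The trade-off between the two options can be put into numbers with a small arithmetic model. This is a sketch only: the `Strategy` enum and the function `first_task_budget` are hypothetical, they do not call or reproduce coop::limit or coop::stop, and the bound of 32 used in the usage note is an arbitrary example value. The model asks how much budget the first spawned task sees after the main future has tried to spend `main_wants` units.

```rust
/// Budget granted by the outer executor; 128 is the figure quoted in the text.
const INITIAL: u32 = 128;

/// Three ways the local set could hand out budget (illustrative names).
enum Strategy {
    /// Current behaviour: everything draws on one shared budget.
    Shared,
    /// First option: cap every future at `bound` units of the shared budget.
    Limit(u32),
    /// Second option: drop the outer budget so each task gets its own.
    Stop,
}

/// Budget available to the first spawned task after the main future has
/// tried to spend `main_wants` units.
fn first_task_budget(strategy: Strategy, main_wants: u32) -> u32 {
    match strategy {
        Strategy::Shared => INITIAL - main_wants.min(INITIAL),
        Strategy::Limit(bound) => {
            let cap = bound.min(INITIAL);
            // The main future can spend at most `cap` units ...
            let left = INITIAL - main_wants.min(cap);
            // ... and the task is capped in the same way.
            left.min(cap)
        }
        Strategy::Stop => INITIAL,
    }
}
```

For a main future that wants 200 units, the shared budget leaves 0 for the task, a limit of 32 leaves 32, and the stop variant leaves the full 128. The middle figure is the "smaller budget than comparable tasks" drawback mentioned above.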
That's some awesome debugging! Thanks for looking into that. There is one thing I don't understand though:
> However, because the LocalSet actually has a single shared task budget, this means polling the JoinAll always exhausts the entire budget. There is now no budget remaining to poll any other tasks spawned on the LocalSet, and they are never able to complete.
From what I gather from the logs I posted, the inner tasks have in fact always run and completed before the outer ones. After they print, they immediately return Poll::Ready. However, the JoinHandles still reported that they hadn't completed. The outer tasks that were being polled would have completed had the JoinHandles not returned Pending.
It is very late here, so I have to sleep, but tomorrow I will run the test with your PR. I imagine it will fix the hang, but I suspect there is still an underlying issue that could be addressed from another angle and may be worth fixing as well. I'll have a look at the code of JoinHandle tomorrow as well.
When the JoinHandles are polled, the budget is 0. This causes them to return Pending immediately, without checking if the task has actually completed.
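A toy future makes this ordering concrete. The sketch below is not tokio's JoinHandle: `ToyJoinHandle`, `NoopWaker` and `poll_finished_task` are hypothetical names, and the budget is modelled as a plain shared counter. The handle is assumed to check the budget before it looks at the task's state, and to wake itself when the budget is empty so that it gets polled again.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a join handle; not tokio's `JoinHandle`.
struct ToyJoinHandle {
    budget: Rc<Cell<u32>>, // shared budget, in units
    task_done: bool,       // whether the underlying task already finished
}

impl Future for ToyJoinHandle {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // The budget check comes first: with nothing left, the handle
        // yields without looking at the task's state at all.
        if self.budget.get() == 0 {
            cx.waker().wake_by_ref(); // ask to be polled again later
            return Poll::Pending;
        }
        self.budget.set(self.budget.get() - 1);
        if self.task_done {
            Poll::Ready(())
        } else {
            // A real handle would register the waker with the task here.
            Poll::Pending
        }
    }
}

/// Waker that does nothing, only needed to build a `Context`.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a handle once for an already finished task with `budget_left`
/// units available, and reports whether the handle returned `Ready`.
fn poll_finished_task(budget_left: u32) -> bool {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut handle = ToyJoinHandle {
        budget: Rc::new(Cell::new(budget_left)),
        task_done: true,
    };
    Pin::new(&mut handle).poll(&mut cx).is_ready()
}
```

Here `poll_finished_task(1)` reports Ready, while `poll_finished_task(0)` reports Pending even though the task has finished, which matches the behaviour seen in the logs.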
Yes, thanks. It dawned on me overnight that the budget was returning Pending, not the JoinHandle...