It has been reported (https://github.com/rust-lang-nursery/futures-rs/issues/1170) that there have been lost wake ups.
Downgrading to threadpool 0.1.4 reportedly solves the problem. #459 is the only meaningful change.
After thinking a bunch about the description of the problem, I don't believe that the bug being fixed is real. From the PR:
When unparking a worker thread, we do the following steps:
1) Try transitioning IDLE -> NOTIFY.
2) Lock the mutex.
3) Try transitioning SLEEP -> NOTIFY and wakeup on success.The problem occurs if the thread transitions from SLEEP to IDLE between steps 1 and 2. In that case, step 3 will fail to transition and simply return from unpark.
However, this case would not be a problem as the sleeping thread that transitions from SLEEP -> IDLE will establish a happens before relationship with the notifying thread (due to the SeqCst ordering). So, when the sleeping thread wakes up, it is guaranteed to see the change that the notifying thread is notifying for. This requires SeqCst and would not be true w/ acquire / release.
That said, that doesn't mean the PR introduced a bug...
This line is suspicious to me now... it isn't acquiring anything from notifier threads...
It probably should be a swap.
Edit: This line too: https://github.com/tokio-rs/tokio/blob/0490280d662f000aff674593cc9a4f69a1cd1171/tokio-threadpool/src/park/default_park.rs#L138
This line is suspicious to me now... it isn't acquiring anything from notifier threads...
Well, it's consuming the notification and going back to the IDLE state. The worst that could happen is a spurious wakeup. std::thread::park has the same store operation.
Edit: This line too:
std::thread::park uses a cmpxchg operation here, but again - the worst case scenario is just a spurious wakeup.
@stjepang that isn't the issue. The issue is that there is no acquire ordering established with this line.
Acquire is needed to get access to the change that the unpark thread is notifying for.
That said, on x86 assuming mfence is used it would be ok...
This code should be arch-independent which makes it hard to analyze. On x86 locked stores are totally ordered WRT locked stores and loads from the same location, so it's reasonably easy to get right.
I'm assuming that because everything is SeqCst then that also applies to other architectures with weaker ordering rules.
(The heavy use of compare_and_swap makes me wonder whether the "check state outside of the lock" is actually worthwhile - how often do the "park when already notified" and "unpark while not sleeping" cases get hit? Depending on how mutex is implemented, the cmpxchg might be more expensive, esp if the lock is not contended.)
@jsgf I understand that it _should_ be arch dependent... however, I'm trying to figure out if there currently is a bug on x86.
Either way, the issues pointed out here should be fixed.
Getting a possibly-relevant deadlock in https://github.com/remexre/garlichat ; cd garlichat && cargo run, the first character typed prints in the top region (as it should), but further characters are lost. (Note: the terminal is put into raw mode, so it's necessary to kill the garlichat process.)
eprintln!() shows the getch-ing thread receives additional characters on keypresses, but the event loop doesn't get woken up.
I am debugging this case.
It is looking like the issue in garlichat is not related :(
I believe I have a potential reproduction for this issue: https://github.com/hawkw/tokio-deadlock-repro. I've written a simple program which spawns a task that sends numbers into an MPSC, and then repeatedly blocks on reading values out of the channel. After a varying number of calls to Runtime::block_on, this program will hang. This occurs with both the current master, and tokio 0.1.7.
I've tested this reproduction on macOS 10.12.6 with rustc 1.28.0.
@stjepang Are you able to investigate this repro? ^^
If it's useful, here's logs from the threadpool prior to and while it hits (what I believe is) the deadlock:
Logs
DEBUG deadlock_repro > Some(246) received
TRACE tokio_threadpool::task > -> task complete
TRACE tokio_threadpool::worker > task complete; state=pool::State { lifecycle: Running, num_futures: 1 }
INFO deadlock_repro > 246th block_on finished
TRACE tokio_threadpool::worker > try_steal_task -- signal_work; self=2; from=1
TRACE tokio_threadpool::sender > execute; count=2
TRACE tokio_threadpool::pool > -> submitting to random; idx=3
TRACE tokio_threadpool::worker > Worker::check_run_state; delegate signal
TRACE tokio_threadpool::worker > found work while draining; signal_work
TRACE tokio_threadpool::task > Task::run; state=Running
DEBUG deadlock_repro > Some(247) received
TRACE tokio_threadpool::task > -> task complete
TRACE tokio_threadpool::worker > task complete; state=pool::State { lifecycle: Running, num_futures: 1 }
INFO deadlock_repro > 247th block_on finished
TRACE tokio_threadpool::sender > execute; count=2
TRACE tokio_threadpool::pool > -> submitting to random; idx=0
TRACE tokio_threadpool::worker > Worker::check_run_state; delegate signal
TRACE tokio_threadpool::worker > found work while draining; signal_work
TRACE tokio_threadpool::task > Task::run; state=Running
DEBUG deadlock_repro > Some(248) received
TRACE tokio_threadpool::task > -> task complete
INFO deadlock_repro > 248th block_on finished
TRACE tokio_threadpool::worker > task complete; state=pool::State { lifecycle: Running, num_futures: 1 }
TRACE tokio_threadpool::sender > execute; count=2
TRACE tokio_threadpool::pool > -> submitting to random; idx=7
TRACE tokio_threadpool::worker > Worker::sleep; worker=WorkerId(6)
TRACE tokio_threadpool::worker > sleeping -- push to stack; idx=6
TRACE tokio_threadpool::worker > Worker::sleep; worker=WorkerId(4)
TRACE tokio_threadpool::worker > -> starting to sleep; idx=6
TRACE tokio_threadpool::worker > Worker::sleep; worker=WorkerId(5)
TRACE tokio_threadpool::worker > sleeping -- push to stack; idx=4
TRACE tokio_threadpool::worker > Worker::sleep; worker=WorkerId(1)
TRACE tokio_threadpool::worker > Worker::sleep; worker=WorkerId(2)
TRACE tokio_threadpool::worker > Worker::sleep; worker=WorkerId(3)
TRACE tokio_threadpool::worker > Worker::sleep; worker=WorkerId(0)
TRACE tokio_threadpool::worker > sleeping -- push to stack; idx=5
TRACE tokio_threadpool::worker > -> starting to sleep; idx=4
TRACE tokio_threadpool::worker > sleeping -- push to stack; idx=1
TRACE tokio_threadpool::worker > sleeping -- push to stack; idx=2
TRACE tokio_threadpool::worker > sleeping -- push to stack; idx=3
TRACE tokio_threadpool::worker > sleeping -- push to stack; idx=0
TRACE tokio_threadpool::worker > -> starting to sleep; idx=5
TRACE tokio_threadpool::worker > -> starting to sleep; idx=1
TRACE tokio_threadpool::worker > -> starting to sleep; idx=2
TRACE tokio_threadpool::worker > -> starting to sleep; idx=0
TRACE tokio_threadpool::worker > -> starting to sleep; idx=3
After that point, it doesn't log anything else.
@hawkw Are you running --release or just normal build? What kind of numbers of iterations do you see before a hang?
With:
RUST_LOG=info cargo run --release
I'm not able to repo after 1.2M block_on calls. (Rust 1.28, Linux, 32 cores.)
@jsgf I've been running this in debug mode primarily, and I see it hang typically after <500 block_ons. That's interesting to hear, I'll see if it still hangs in release mode.
Hmm, I'm still seeing it hang in release mode, after anywhere from between 350-2.5k block_ons...
@hawkw can you try bisecting this?
I think I've root-caused at least some of my hang to a deadlock between backtrace and jemalloc heap profiling (specifically, dl_iterate_phdr). I'm making some changes to confirm, and I'm not sure if it completely explains what we're seeing.
@carllerche I did a bisect with my reproduction, and it appears that the first bad commit is 990186ec9d396a3cd2111c934d4dfc81d2aa866f ("Optimize spinning in Worker::run"). I haven't looked _too_ closely at that change, but it certainly seems plausible that it introduced the issue, based on what changed.
/cc @stjepang
I have a potential fix for this issue. Doing some testing before I open a PR.
I've done some additional testing, and it looks like the issue I've reproduced and fixed is macOS-specific. So, if the issue reported in rust-lang-nursery/futures-rs#1170 was observed on Linux, there's probably two+ unrelated hangs going on here...
That code doesn't look macOS specific - is there a difference in the behaviour of thread::yield_now() or something that makes it so?
I think this dates back even further? Or is.... Unrelated in a way? :confused:
Locally I pulled 9aaa8f0 (going back further wont compile), and altered @hawkw code to do the oneshot itself (block_on doesn't exist) and this still reproduces (Linux Debian 4.9.110). This will reproduce (debug build) sometimes as early as the 1st/2nd iteration.
One funny thing I am noticing however. with unbounded() this reproduces right away. if i use channel(n) it runs totally fine... As you increase the channel size (i did 500K up to 25M) you will notice a pause and then it will continue running just fine. This pause seemingly happening right ~ the number i commonly see unbounded() stop. The larger the channel bound, the longer the pause. 10M ~ 5s, 25M ~ 10s.
Is it possible the send_all is _taking over_ or something? If you run with unbounded() you will see the memory usage continues to grow even after it _hangs_ leading me to believe the process isn't totally hung, and the sender is still sending/loading the channel?, but for some reason the new oneshot/block_on isn't getting called? But if the channel reaches its max (i am assuming) then it triggers it to end up getting called anyway?
I am new to rust so I am not sure how much, if this helps at all. The ticket intrigues me :)
I hope this will be helpful, even if I'm not quite sure yet about what the best way to solve this is
(or how many different underlying bugs there are regarding this issue).
To reproduce this easier:
1000000000 to get a lost wakeup after the first iteration consistentlyTrace Output
TRACE mio::poll > registering with poller
TRACE tokio_threadpool::builder > build; num-workers=2
DEBUG tokio_reactor::background > starting background reactor
TRACE mio::sys::windows::selector > select; timeout=None
TRACE tokio_threadpool::sender > execute; count=1
TRACE mio::sys::windows::selector > polling IOCP
TRACE tokio_threadpool::pool > submit to existing worker; idx=1; state=worker::State { lifecycle: Shutdown, is_pushed: true }
TRACE tokio_threadpool::sender > execute; count=2
TRACE tokio_threadpool::worker > found work while draining; signal_work
TRACE tokio_threadpool::pool > submit to existing worker; idx=0; state=worker::State { lifecycle: Shutdown, is_pushed: true }
TRACE tokio_threadpool::worker > try run WorkerId(1)
TRACE tokio_threadpool::task > Task::run; state=Running
TRACE tokio_threadpool::worker > found work while draining; signal_work
TRACE tokio_threadpool::task > Task::run; state=Running
DEBUG deadlock_repro > Some(1) received
TRACE tokio_threadpool::task > -> task complete
INFO deadlock_repro > 1th block_on finished
TRACE tokio_threadpool::worker > task complete; state=pool::State { lifecycle: Running, num_futures: 1 }
TRACE tokio_threadpool::sender > execute; count=2
TRACE tokio_threadpool::pool > -> submitting to random; idx=0
TRACE tokio_threadpool::worker > Worker::check_run_state; delegate signal
TRACE tokio_threadpool::worker > found work while draining; signal_work
TRACE tokio_threadpool::task > Task::run; state=Running
DEBUG deadlock_repro > Some(2) received
TRACE tokio_threadpool::task > -> task complete
INFO deadlock_repro > 2th block_on finished
TRACE tokio_threadpool::sender > execute; count=2
TRACE tokio_threadpool::pool > -> submitting to random; idx=1
It actually isn't a deadlock but just unfortunate:
send_all is currently running and never stops, since data is always ready..inbound (external) queue (it was pushed there in /1/) of the worker, but stealing only works from the .worker queue (internal). Thanks everyone for digging into the problem, this is super helpful! <3
I can confirm that @hawkw's code reproduces the issue on my Linux machine.
The diagnosis by @steffengy seems like a legitimate problem. I've tried replacing inbound per-worker queues with a single globally shared queue of tasks so that spawned tasks never get stuck in a busy worker's inbound queue. However, the bug issue was still reproducing.
After that I realized it is possible for a task to be spawned just before a worker pushes itself onto the sleeper stack and goes to sleep. That way the thread that spawned the task thinks someone will pick it up while the thread going to sleep is convinced there are no tasks available. We need an additional check just after the thread is pushed onto the sleeper stack. After adding the check I can no longer reproduce the issue.
The changes in question are submitted in #621. I hope this also fixes the originally reported problems, but we'll see.
@stjepang
I can confirm that @hawkw's code reproduces the issue on my Linux machine.
Now _that_'s interesting --- I was only able to get that repro code to hang on macOS. Curious! It's possible I didn't try for long enough on Linux.
@hawkw It hangs pretty much instantly in both debug and release mode with either 2 worker threads or 4 threads - doesn't really matter. So it's pretty consistent.
Sometimes it even hangs at the second block_on:
tokio-deadlock-repro master $ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.06s
Running `target/debug/deadlock-repro`
DEBUG 2018-09-07T16:53:48Z: deadlock_repro: Some(1) received
INFO 2018-09-07T16:53:48Z: deadlock_repro: 1th block_on finished
@stjepang huh, interesting! I haven't seen it hang that fast even on my Mac. I'll run it again on Linux and see what happens.
I'm also going to try running against your changes in #621 to make sure you've fixed the same issues I've been seeing. It seems like there could possibly be one or more platform-specific issues as well, so I want to try and rule that out.
@stjepang It sounds like there are two separate. The first is the behavior described by @steffengy and the second an actual bug with submitting tasks to workers.
Is it possible to split up the two into separate PRs?
I'm not necessarily against the change, but a task that does not yield back to the executor is a rogue task (if there is only one thread, there is nothing to do). Also, this change would go against the goal of #424.
So, if possible, lets fix the bug you described. If it is broken into a dedicated PR, it will be easier for me to understand and review. Then, we can discuss moving to a single inbound queue separately.
As I mentioned in https://github.com/tokio-rs/tokio/pull/621#pullrequestreview-153481767, I've tested @stjepang's fix against my repro, and it appears to fix the problem on my Mac as well. :)
@hawkw were you able to repro the original issue using this code? https://github.com/facebookexperimental/mononoke/blob/06f619717f69cceeb856533bf57d79ead38c2597/cmds/blobimport.rs#L102
@StanislavGlebik are you still seeing the deadock? If so, we could use help reproducing it.
@carllerche No, but IIRC, I didn't spend as much time trying to repro with mononoke as I did trying to write a minimal repro --- I can make another attempt to repro with that code.
Also @jsgf, could you help repro the issue (if it exists) with mononoke?
@carllerche The only deadlocks we've seen lately is a lock-ordering deadlock between the backtrace crate and jemalloc heap profiling. That could have been our deadlock all along.
Ok, I'm going to close this issue since, as far as we know, there is no bug.
@sfackler if you find that you are still having problems and believe that it is due to the Tokio pool, please open a new issue with your findings and reference this one for context.
Most helpful comment
I have a potential fix for this issue. Doing some testing before I open a PR.