Tokio version 0.2.11 on x86_64 Linux 5.3 in debug.
I'm not actually sure if this is a bug, but I found it super confusing, and the docs on exactly how a future should be woken aren't super well-specified here.
I wrote a custom waker/Context which is passed to tokio::io::split::WriteHalf
st for the parts in tokio:
9: core::task::wake::Waker::wake
at /usr/src/rustc-1.40.0/src/libcore/task/wake.rs:252
10: tokio::sync::task::atomic_waker::AtomicWaker::wake
at /home/matt/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.11/src/sync/task/atomic_waker.rs:243
11: tokio::io::driver::Inner::register
at /home/matt/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.11/src/io/driver/mod.rs:292
12: tokio::io::registration::Registration::poll_ready
at /home/matt/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.11/src/io/registration.rs:228
13: tokio::io::registration::Registration::poll_write_ready
at /home/matt/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.11/src/io/registration.rs:193
14: tokio::io::poll_evented::PollEvented<E>::poll_write_ready
at /home/matt/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.11/src/io/poll_evented.rs:308
15: tokio::net::tcp::stream::TcpStream::poll_write_priv
at /home/matt/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.11/src/net/tcp/stream.rs:673
16: <tokio::net::tcp::stream::TcpStream as tokio::io::async_write::AsyncWrite>::poll_write
at /home/matt/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.11/src/net/tcp/stream.rs:836
17: <tokio::io::split::WriteHalf<T> as tokio::io::async_write::AsyncWrite>::poll_write
at /home/matt/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.11/src/io/split.rs:128
@TheBlueMatt I think what you are noticing is something that is in the process of being removed in #2221.
If the change goes through as is, then the first time poll_write is called it will not call wake_by_ref--instead just ensuring it's waker is registered and returning Ready(Ok(n)).
Does that answer your question?
@TheBlueMatt now that #2221 has landed, this _should_ no longer be happening. Can you give it a try on master?
Indeed, this seems to be fixed on master, thanks!
Hit this issue again on a release build (so sadly backtrace is...less than useful) on Tokio v0.2.13, though it does seem rather hard to hit (instead of reliably every time).
#1 0x0000ffffbc400ea0 in __GI___pthread_mutex_lock (mutex=0xffffb0002220) at pthread_mutex_lock.c:80
#2 0x0000aaaab4a757b8 in lightning_net_tokio::wake_socket_waker_by_ref ()
# Calls <tokio::io::split::WriteHalf<T> as tokio::io::async_write::AsyncWrite>::poll_write
#3 0x0000aaaab4a75b04 in <lightning_net_tokio::SocketDescriptor as lightning::ln::peer_handler::SocketDescriptor>::send_data ()
# async task spawned by tokio::spawn that wakes every 60 seconds
Note that in this case I don't actully know if it was going to return Ready() or some other value - the above lock is a deadlock.
Can you try to put this in your Cargo.toml to obtain a more helpful backtrace in case you hit it again?
[profile.release]
debug = true
Hmm, sadly that didn't do much for readability of the st, but here it is:
#5 lightning_net_tokio::wake_socket_waker_by_ref (orig_ptr=0xffff78000fe0) at /root/.cargo/git/checkouts/rust-lightning-61cf011b3b289b3b/d16c1a9/lightning-net-tokio/src/lib.rs:380
#6 0x0000aaaab34577d4 in core::task::wake::Waker::wake_by_ref (self=<optimized out>) at /usr/src/rustc-1.40.0/src/libcore/task/wake.rs:267
#7 tokio::io::split::Inner<T>::poll_lock (self=<optimized out>, cx=0xffff84ef6bf0) at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/io/split.rs:159
#8 <tokio::io::split::WriteHalf<T> as tokio::io::async_write::AsyncWrite>::poll_write (cx=0xffff84ef6bf0, buf=..., self=...) at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/io/split.rs:127
#9 <lightning_net_tokio::SocketDescriptor as lightning::ln::peer_handler::SocketDescriptor>::send_data (self=<optimized out>, data=..., resume_read=<optimized out>) at /root/.cargo/git/checkouts/rust-lightning-61cf011b3b289b3b/d16c1a9/lightning-net-tokio/src/lib.rs:459
Looking at the source, I have a hard time seeing how this can happen, and this has been inactive for a long time. I am closing it. Feel free to reopen if you can still reproduce it.
I've worked around this issue in my own stuff so it may definitely still be happening but I wont hit it anymore.