Tokio: How to handle Writable | Hup?

Created on 25 Jun 2018  路  25Comments  路  Source: tokio-rs/tokio

Related questions: https://github.com/tokio-rs/tokio-socks5/issues/13

When socket is Writable | Hup, https://github.com/tokio-rs/tokio/blob/master/tokio-reactor/src/poll_evented.rs#L520 poll_read_ready return Ok(Ready(Hup)), but self.get_ref().read(buf) return WouldBlock, how to handle with this situation?

any help?

C-bug

Most helpful comment

@stjepang It is always possible as Relaxed vs. Release impacts compiler optimizations. So, the code that gets run could be different.

All 25 comments

If this is true, then it is a bug. Receiving HUP should make the socket readable and the next read should return 0.

Hi, macOS version 10.13.6. Follow the steps below to reproduce the problem:

  1. add debug log to https://github.com/tokio-rs/tokio/blob/master/tokio-reactor/src/poll_evented.rs#L390
let x = self.poll_read_ready(mio::Ready::readable());
println!("poll read ready {:?}", x);
if let Async::NotReady = x? {
    return Err(io::ErrorKind::WouldBlock.into())
}

let r = self.get_mut().read(buf);
println!("poll read {:?}", r);
if is_wouldblock(&r) {
    self.clear_read_ready(mio::Ready::readable())?;
}
  1. open Mac Terminal, run cargo run --example proxy and cargo run --example echo

  2. run telnet 127.0.0.1 8081, input some data

  3. kill -9 [telnet process]

  4. proxy log:

poll read ready Ok(Ready(Readable | Hup))
poll read Ok(0)
poll read ready Ok(NotReady)

# ......
poll read ready Ok(Ready(Hup))
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup))
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup))
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup))
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup))
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup))
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup))
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup))
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Readable | Hup))
poll read Ok(0)
client wrote 6 bytes and received 6 bytes

Hi @SerhoLiu, can I get you to let me know which version of tokio-reactor you were using when you observed this behaviour?

I've been trying to reproduce this by following the steps you posted above (on macOS 10.12.6). I don't see the behaviour you described on master, but I do see it with tokio-reactor v0.1.2 (which I believe was the current version when you opened this issue) and v0.1.1.

It's possible this was fixed between versions?


Proxy logs from master

Listening on: 127.0.0.1:8081
Proxying to: 127.0.0.1:8080
poll read ready Ok(NotReady);
poll read ready Ok(NotReady);
poll read ready Ok(Ready(Readable));
poll read Ok(8)
poll read ready Ok(Ready(Readable));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(NotReady);
poll read ready Ok(NotReady);
poll read ready Ok(Ready(Readable));
poll read Ok(8)
poll read ready Ok(Ready(Readable));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Readable));
poll read Ok(9)
poll read ready Ok(Ready(Readable));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(NotReady);
poll read ready Ok(NotReady);
poll read ready Ok(Ready(Readable));
poll read Ok(9)
poll read ready Ok(Ready(Readable));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Readable | Hup));
poll read Ok(0)
poll read ready Ok(NotReady);
poll read ready Ok(Ready(Readable | Hup));
poll read Ok(0)
client wrote 17 bytes and received 17 bytes



Proxy logs from tokio-reactor v0.1.2

Listening on: 127.0.0.1:8081
Proxying to: 127.0.0.1:8080
poll read ready Ok(NotReady);
poll read ready Ok(NotReady);
poll read ready Ok(Ready(Readable));
poll read Ok(6)

// ...

poll read ready Ok(Ready(Readable | Hup));
poll read Ok(0)
poll read ready Ok(NotReady);
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Ok(0)
client wrote 19 bytes and received 19 bytes


Proxy logs from tokio-reactor v0.1.1

Listening on: 127.0.0.1:8081
Proxying to: 127.0.0.1:8080
poll read ready Ok(NotReady);
poll read ready Ok(NotReady);
poll read ready Ok(Ready(Readable));
poll read Ok(8)

// ...

poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Ok(0)
client wrote 21 bytes and received 21 bytes

It looks like the issue also exists in the current release version of tokio-reactor (v0.1.3), so we ought to cut a new release. @SerhoLiu, can I get you to confirm that the issue is fixed for you with the current master, in order to make sure that a new release would resolve your issues?


Proxy logs from tokio-reactor v0.1.3

Listening on: 127.0.0.1:8081
Proxying to: 127.0.0.1:8080
poll read ready Ok(NotReady);
poll read ready Ok(NotReady);
poll read ready Ok(Ready(Readable));
poll read Ok(5)
poll read ready Ok(Ready(Readable));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(NotReady);
poll read ready Ok(NotReady);
poll read ready Ok(Ready(Readable));
poll read Ok(5)
poll read ready Ok(Ready(Readable));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Readable));
poll read Ok(5)
poll read ready Ok(Ready(Readable));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(NotReady);
poll read ready Ok(NotReady);
poll read ready Ok(Ready(Readable));
poll read Ok(5)
poll read ready Ok(Ready(Readable));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Readable));
poll read Ok(5)
poll read ready Ok(Ready(Readable));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(NotReady);
poll read ready Ok(NotReady);
poll read ready Ok(Ready(Readable));
poll read Ok(5)
poll read ready Ok(Ready(Readable));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Readable | Hup));
poll read Ok(0)
poll read ready Ok(NotReady);
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Err(Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" })
poll read ready Ok(Ready(Hup));
poll read Ok(0)
client wrote 15 bytes and received 15 bytes

I did a git bisect and it appears that this issue was fixed in commit 989262fe6e30a0342feebf240e2e0b0d665b312c (and manually verified it again by checking that commit and the prior commit).

It looks like the only change in that commit that could actually effect behaviour was the version bump for crossbeam-deque, from 0.6.0 to 0.6.1 (the other change only effects CI). According to the crossbeam-deque changelog, the only change in that version is "Change a few Relaxed orderings to Release in order to fix false positives by tsan." (crossbeam-rs/crossbeam-deque@64b174b4db4069106062be34644753dd5fd4a8d2). Perhaps it's possible that the errors tsan reported were not actually false positives?

@stjepang, do you have any insight into this?

@hawkw Are you testing that on an x86-64 machine?

Assuming the answer is "yes", I'm surprised because Release fences compile down to noop and Release stores are equivalent to Relaxed stores. In other words, there should be absolutely no difference between crossbeam-deque v0.6.0 and v0.6.1 on x86-64 machines. 馃

(and manually verified it again by checking that commit and the prior commit)

Just double-checking: was the prior commit d91c775?

Also, I wonder whether the bug persists if we use the latest master branch, but change 0.6.1 to =0.6.0 in Cargo.toml (would try for myself, but I don't have a Mac).

@stjepang I changed the crossbeam-deque dependency to =0.6.0 on the latest commit to master, and the bug persists. So, it seems the crossbeam-deque change _did_ effect something --- I'm curious about what we'd see if we compared the assembly generated between those two commits.

Also, to answer your other questions: yes, I'm testing on x86_64 (Core i7 7920HQ); and yes, the prior commit I tested was d91c775f360d875afb36b56a812ff5f77940981a.

@hawkw I use https://github.com/tokio-rs/tokio/tree/e964c4136c91c535fb2dfee0a3327dc5a6599903 produce the problem https://github.com/tokio-rs/tokio/issues/449#issuecomment-407339147, this problem also exists on FreeBSD 11.2.

On my computer, current master also has this problem 馃槄. So I guess it might be related to the behavior of kqueue, maybe EV_EOF will only be processed when EVFILT_READ, like go and this. But kqueue man page say EV_EOF may also appear in EVFILT_WRITE, really confused.

@SerhoLiu interesting! Thanks for the additional information. I'll keep looking into it.

Tests must be written to catch this bug in the future.

@stjepang It is always possible as Relaxed vs. Release impacts compiler optimizations. So, the code that gets run could be different.

@SerhoLiu

On my computer, current master also has this problem 馃槄

I ran the reproduction against the current master a few more times, and it looks like (on my machine, at least) the issue does still exist, but it occurs intermittently. That would explain why git bisect was giving me such confusing results.

@stjepang, it seems like the crossbeam-deque dependency bump was a red herring, sorry to bother you about that.

I ran into this problem recently with tokio-0.1.16. We have two middle boxes that proxy traffic between each other. In this scenario, it is important for us to maintain half-closed state. However, futures are polled excessively when it is actually in half-closed state.

The key trigger logic is inside PollEvented2's write method.

Suppose someone has a simple future that only does "try_ready!(Socket.write(buf))". Then when this future starts to return NotReady, and the remote socket has shutdown the write channel, following will happen:

  1. calls poll_write_ready, and it returns Ready(HUP).
  2. treats this as a write ready event, and proceeds to TCP write
  3. TCP write returns EAGAIN since write is actually not ready
  4. check result is EAGAIN, and call clear_write_ready()
  5. in clear_write_ready() it calls poll_write_ready() again to test on its is_ready method
  6. it is ready (still returning Ready(HUP)), notify current task to be scheduled again.
  7. tokio threadpool worker will insert the task to local queue and soon repeats everything from step 1.

This will continue until a threshold (run local task for 32 times) is reached, so the worker will turn and call reactor(epoll). But even epoll won't break the loop unless the underlying write buffer has been cleared. HUP makes it work like a spinlock.

To fix this, we may need to distinguish RDHUP and HUP event, so that write is not affected by read shutdown on the other side. However, this also has something to do with mio, which currently does not distinguish the RDHUP and HUP (https://github.com/carllerche/mio/blob/master/src/sys/unix/epoll.rs#L231). I have no idea how to deal with this without breaking current users.

This becomes a serious problem for us. It easily freezes a multi-core machine with only a handful of connections. Wondering if we can discuss about the solution. I am more than happy to contribute to it, too.

Are you able to provide a failing test case?

yes. I attached a ready to build tarball and a README inside. Let me know if anything is unclear.
issue-91.tar.gz

You may find similar tarball in https://github.com/quininer/tokio-rustls/issues/33, which is quite related I believe. That one uses TLS, so it's more complicated. But the condition to trigger the problem is the same: client shuts down write side, and the server side PollEvented.write returns EAGAIN due to write pressure or packet loss.

@carllerche do you have any plan to look into this issue (again) recently? It's being a showstopper for us right now so I hope it can be dealt with as soon as possible. The root cause of the problem is pretty clear to us at the moment, would you like me to add more details here or in a new ticket, such as approaches that we have tried to fix the problem?

Could you open a PR w/ a failing test case illustrating the problem? I think I understand, but it would be most helpful to have this as a minimized test case that can be committed to the repo.

The problem is if a HUP is not treated as writable, then in some cases, there is no signal to the write tasks to shutdown.

One strategy would be to treat a HUP as a write event only one time (to avoid the loop).

Got it. I took "failing test case" as "some external program". I will create a PR for this. Thanks

This should be resolved by the upcoming release of mio: https://github.com/tokio-rs/mio/pull/942.

On my macOS version 10.14.4, update mio to v0.6.17 also has this problem. I can't write some minimized test case, but can follow the steps below to reproduce the problem:

  1. add some assert to mio kqueue
diff --git a/src/sys/unix/kqueue.rs b/src/sys/unix/kqueue.rs
index 371e0cf..7179d95 100644
--- a/src/sys/unix/kqueue.rs
+++ b/src/sys/unix/kqueue.rs
@@ -319,6 +319,13 @@ impl Events {
             }
         }

+        for event in &self.events {
+            if event.readiness().is_hup() {
+                assert!(event.readiness().is_writable(), "EV_EOF and EVFILT_WRITE");
+                assert!(event.readiness().is_readable(), "EV_EOF with EVFILT_WRITE not read ready");
+            }
+        }
+
         ret
     }

  1. open Mac Terminal, run cargo run --example proxy, cd tokio && cargo run --example echo

  2. run telnet 127.0.0.1 8081, input some data

  3. kill -9 [telnet process]

  4. proxy process log

thread 'tokio-runtime-worker-1' panicked at 'EV_EOF with EVFILT_WRITE not read ready', .../registry/src/github.com-1ecc6299db9ec823/mio-0.6.17/src/sys/unix/kqueue.rs:324:17
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

As I said, EV_EOF with EVFILT_WRITE does not indicate EVFILT_READ.

This should be resolved by the upcoming release of mio: tokio-rs/mio#942.

Yep. Braden and I have talked about it so epoll is good now.

@SerhoLiu Just because you don't get readable in the same call to poll doesn't mean the socket isn't readable. I'm not exactly sure what specific behavior you would like to see changed.

@carllerche ok, but https://github.com/tokio-rs/tokio/issues/449#issuecomment-407339147 problem still exists, maybe it鈥檚 not related to https://github.com/tokio-rs/mio/pull/942 ...

The latest release of mio fixes this on kqueue platforms.

Was this page helpful?
0 / 5 - 0 ratings