Hyper: AWS API errors, triggering assertion failure, killing reactor

Created on 11 Jan 2018 · 7 Comments · Source: hyperium/hyper

I am writing a syslog -> AWS Kinesis bridge using Rusoto and its hyper-0.11-compatible branch. When I spawn many Kinesis requests (on the order of 50-100), I eventually hit a situation where connections are dropped. That alone would be tolerable, but what's worse is that the hyper tokio reactor gets into a bad state:

thread '<unnamed>' panicked at 'assertion failed: !self.can_read_head() && !self.can_read_body()', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.11.11/src/proto/conn.rs:262:9
note: Run with `RUST_BACKTRACE=1` for a backtrace.
thread '<unnamed>' panicked at 'failed to retrieve response from reactor: Canceled', libcore/result.rs:916:5
### And a bit later
thread '<unnamed>' panicked at 'failed to send request to reactor: send failed because receiver is gone', /root/.cargo/git/checkouts/rusoto-async-35a81945dd339d5f/64777cd/rusoto/core/src/reactor.rs:151:13

The entire process gets horked.

You can see my bridge project here: https://github.com/tureus/log-sloth/ .

S-bug

xrl

All 7 comments

Yikes, I'll take a look!

seanmonstar on 11 Jan 2018

Adding a bit more color here: the second and third panics ("failed to retrieve response from reactor: Canceled" and "failed to send request to reactor: send failed because receiver is gone") are raised by code in rusoto.

However these panics seem to be caused by hyper itself panicking (due to the assertion failure) and taking down a thread (which holds the channels that the two other panics refer to).

The panic in hyper seems to be due to some issue with the connection pooling, but I'll let @seanmonstar sort that one out 🤞
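
The failure chain described above can be reproduced in miniature with standard-library channels. This is only a sketch: rusoto's reactor uses futures-based channels, and the names `ReactorError` and `observe_dead_reactor` are made up for this illustration. A worker thread that panics drops both its request receiver and any reply sender it holds, so the caller first sees a canceled reply and later a failed send.

```rust
use std::sync::mpsc;
use std::thread;

/// What a caller observes when the worker ("reactor") thread has died.
/// Hypothetical type, not part of rusoto or hyper.
#[derive(Debug, PartialEq)]
pub enum ReactorError {
    /// The reply sender was dropped before any reply was sent.
    Canceled,
    /// The request receiver no longer exists.
    ReceiverGone,
}

/// Spawns a stand-in reactor thread that panics on its first request, then
/// reports what the caller sees for that request and for a later one.
pub fn observe_dead_reactor() -> (ReactorError, ReactorError) {
    // Each request carries the sender half of its own reply channel.
    let (req_tx, req_rx) = mpsc::channel::<mpsc::Sender<String>>();

    let reactor = thread::spawn(move || {
        let _reply_tx = req_rx.recv().expect("request channel closed");
        // Stand-in for the failed assertion inside the HTTP client.
        panic!("assertion failed (simulated)");
    });

    // First request: it is delivered, but the reply never arrives because
    // unwinding drops the reply sender.
    let (reply_tx, reply_rx) = mpsc::channel();
    req_tx.send(reply_tx).expect("reactor is still alive here");
    let first = match reply_rx.recv() {
        Ok(_) => unreachable!("the simulated reactor never replies"),
        Err(mpsc::RecvError) => ReactorError::Canceled,
    };

    // Wait until the reactor thread has finished unwinding.
    assert!(reactor.join().is_err());

    // Later request: the request receiver was dropped along with the thread.
    let (reply_tx, _reply_rx) = mpsc::channel();
    let later = match req_tx.send(reply_tx) {
        Ok(()) => unreachable!("the receiver was dropped during unwinding"),
        Err(mpsc::SendError(_)) => ReactorError::ReceiverGone,
    };

    (first, later)
}
```

Calling `observe_dead_reactor()` yields `Canceled` for the in-flight request and `ReceiverGone` for the later one, which mirrors the order of the second and third panics in the report.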

srijs on 11 Jan 2018

@srijs thanks for chiming in! That is correct: the first one is a panic from the reactor, and the rest are from calls to a dead reactor.

I have extracted my code and made a single-shot stress tester. It spawns Kinesis writer threads that try to send data as quickly as possible.

You will need to set up your AWS credentials, either through environment variables or through the ~/.aws/credentials file. Then you will need a Kinesis stream with writeable shards (I have 80 at time of writing).

The code is here: https://github.com/tureus/kinesis-hyper-bug
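
For readers who do not want to open the repository, the general shape of such a stress tester might look like the sketch below. It is not the code from the repository: the `stress` function and its `put` callback are hypothetical, the callback stands in for the real Kinesis call, and treating `puts_size` as a payload size in bytes is an assumption.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Runs `num_threads` writer threads, each calling `put` `num_puts` times
/// with a payload of `puts_size` bytes, and returns how many calls failed.
/// `put` is a hypothetical stand-in for the real Kinesis request.
pub fn stress<F>(num_threads: usize, num_puts: usize, puts_size: usize, put: F) -> usize
where
    F: Fn(&[u8]) -> Result<(), String> + Send + Sync + 'static,
{
    let put = Arc::new(put);
    let failures = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..num_threads)
        .map(|_| {
            let put = Arc::clone(&put);
            let failures = Arc::clone(&failures);
            thread::spawn(move || {
                let payload = vec![b'x'; puts_size];
                for _ in 0..num_puts {
                    // Count the failure and keep going instead of unwrapping,
                    // so one bad response does not take the writer down.
                    if let Err(err) = (*put)(&payload) {
                        eprintln!("failed to send: {}", err);
                        failures.fetch_add(1, Ordering::Relaxed);
                    }
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("writer thread panicked");
    }
    failures.load(Ordering::Relaxed)
}
```

With this shape, the three command-line numbers in the run below would map to `num_threads`, `num_puts`, and `puts_size`.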

root@doit-1800981089-sz6zt:~/kinesis-hyper-bug# cargo build --release && RUST_LOG=info ./target/release/kinesis-hyper-bug 100 1000 500
    Finished release [optimized] target(s) in 0.0 secs
INFO:<unknown>: testing kinesis put_records num_threads=100 num_puts=1000 puts_size=500 stream_name=itsecmon-logs
ERROR:<unknown>: failed to send to kinesis: "end of file reached before parsing could complete"
ERROR:<unknown>: failed to send to kinesis: "end of file reached before parsing could complete"
thread '<unnamed>' panicked at 'assertion failed: !self.can_read_head() && !self.can_read_body()', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.11.12/src/proto/conn.rs:262:9
note: Run with `RUST_BACKTRACE=1` for a backtrace.
ERROR:<unknown>: failed to send to kinesis: "connection reset"
thread '<unnamed>' panicked at 'failed to retrieve response from reactor: Canceled', libcore/result.rs:916:5
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: ErrorImpl { code: EofWhileParsingValue, line: 1, column: 0 }', libcore/result.rs:916:5
thread 'thread '<unnamed><unnamed>' panicked at '' panicked at 'failed to retrieve response from reactor: Canceledfailed to retrieve response from reactor: Canceled', ', libcore/result.rslibcore/result.rs::916916::55

xrl on 11 Jan 2018

I believe the latest commit to master properly handles the incorrect state that this assertion is catching. Would it be possible to test with master?

seanmonstar on 11 Jan 2018

@seanmonstar unfortunately for this bug report, I have fixed my environment. I did more debugging and ruled out hyper/rusoto as the source of my slowness/errors by using other tools (I should have done that earlier... oops!).

My AWS account was using a VPN connection to a corporate network. This VPN, or something along the path to AWS services, was messing up my connections. I have moved to a direct connect setup; things are now stable, and I can no longer reproduce this bug.

xrl on 12 Jan 2018

I think this bug is now safe to close because I cannot reproduce it.

xrl on 12 Jan 2018

Well, it did seem like there was a problem with the connection being closed prematurely, but hyper shouldn't panic because of it.

I think master fixed the panic, so I'm going to close for now.
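
As a general illustration of that principle (not hyper's actual fix, which is not shown in this thread), a reader that expects a fixed number of body bytes can report an early close as an ordinary error. The `read_body` helper below is hypothetical and relies only on `std::io::Read::read_exact`, which returns `ErrorKind::UnexpectedEof` when the stream ends too soon.

```rust
use std::io::{self, Read};

/// Reads exactly `expected_len` body bytes from `conn`.
/// A peer that closes early produces `Err(UnexpectedEof)` rather than a panic.
/// Hypothetical helper for illustration only.
pub fn read_body<R: Read>(conn: &mut R, expected_len: usize) -> io::Result<Vec<u8>> {
    let mut body = vec![0u8; expected_len];
    conn.read_exact(&mut body)?;
    Ok(body)
}
```

The caller can then decide whether to retry, log, or surface the error, and the thread that owns the connection stays alive.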

seanmonstar on 12 Jan 2018
