Hyper: AWS API errors, triggering assertion failure, killing reactor

Created on 11 Jan 2018 · 7 Comments · Source: hyperium/hyper

I am writing a syslog -> AWS Kinesis bridge using Rusoto and its hyper-0.11-compatible branch. When I spawn many Kinesis requests (on the order of 50-100), I eventually hit a situation where connections are dropped. This is fine (ish), but what's worse is that the hyper tokio reactor gets into a bad state:

thread '<unnamed>' panicked at 'assertion failed: !self.can_read_head() && !self.can_read_body()', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.11.11/src/proto/conn.rs:262:9
note: Run with `RUST_BACKTRACE=1` for a backtrace.
thread '<unnamed>' panicked at 'failed to retrieve response from reactor: Canceled', libcore/result.rs:916:5
### And a bit later
thread '<unnamed>' panicked at 'failed to send request to reactor: send failed because receiver is gone', /root/.cargo/git/checkouts/rusoto-async-35a81945dd339d5f/64777cd/rusoto/core/src/reactor.rs:151:13

The entire process gets horked.

You can see my bridge project here: https://github.com/tureus/log-sloth/ .

S-bug

All 7 comments

Yikes, I'll take a look!

Adding a bit more color here: the second and third panics (`failed to retrieve response from reactor: Canceled` and `failed to send request to reactor: send failed because receiver is gone`) are raised by code in rusoto.

However, these panics seem to be a consequence of hyper itself panicking (due to the assertion failure) and taking down a thread, which holds the channels that the two other panics refer to.

The panic in hyper seems to be due to some issue with the connection pooling, but I'll let @seanmonstar sort that one out :crossed_fingers:
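To illustrate the cascade described above, here is a minimal sketch using only the standard library. It is not rusoto's or hyper's actual code: the "reactor" is a plain thread, the channels are `std::sync::mpsc` channels, and the function name `demo_dead_reactor` is made up for this example. It only shows that once the thread owning the channel ends panics, a pending reply is lost and later sends fail.

```rust
use std::sync::mpsc;
use std::thread;

/// Runs a stand-in "reactor" thread that panics on its first request and
/// returns (first reply was lost, later send failed).
/// Hypothetical helper for illustration; not part of rusoto or hyper.
fn demo_dead_reactor() -> (bool, bool) {
    // Each request carries the sender half of a reply channel.
    let (req_tx, req_rx) = mpsc::channel::<mpsc::Sender<String>>();

    let reactor = thread::spawn(move || {
        for _reply_tx in req_rx {
            // Stand-in for the failed assertion inside the event loop.
            // Unwinding drops both `_reply_tx` and the request receiver.
            panic!("assertion failed (simulated)");
        }
    });

    // First request: accepted, but the reply sender is dropped during the
    // panic, so the caller sees a cancelled/disconnected reply.
    let (reply_tx, reply_rx) = mpsc::channel();
    req_tx.send(reply_tx).expect("receiver still exists at this point");
    let first_reply_lost = reply_rx.recv().is_err();

    // Wait until the reactor thread has fully unwound.
    let _ = reactor.join();

    // Second request: the receiver is gone, so the send itself fails.
    let (reply_tx2, _reply_rx2) = mpsc::channel();
    let later_send_failed = req_tx.send(reply_tx2).is_err();

    (first_reply_lost, later_send_failed)
}

fn main() {
    let (lost, failed) = demo_dead_reactor();
    println!("first reply lost: {}, later send failed: {}", lost, failed);
}
```

If the callers `unwrap()` or `expect()` these results, each caller thread panics in turn, which would match the pattern of follow-up panics in the report.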

@srijs thanks for chiming in! That is correct: the first one is a panic from the reactor, and the rest are from calls to a dead reactor.

I have extracted my code and made a single-shot stress tester. It spawns Kinesis writer threads and tries to send data as quickly as possible.

You will need to set up your AWS credentials, either through environment variables or through the ~/.aws/credentials file. Then you will need a Kinesis stream with writable shards (I have 80 at time of writing).

The code is here: https://github.com/tureus/kinesis-hyper-bug

root@doit-1800981089-sz6zt:~/kinesis-hyper-bug# cargo build --release && RUST_LOG=info ./target/release/kinesis-hyper-bug 100 1000 500
    Finished release [optimized] target(s) in 0.0 secs
INFO:<unknown>: testing kinesis put_records num_threads=100 num_puts=1000 puts_size=500 stream_name=itsecmon-logs
ERROR:<unknown>: failed to send to kinesis: "end of file reached before parsing could complete"
ERROR:<unknown>: failed to send to kinesis: "end of file reached before parsing could complete"
thread '<unnamed>' panicked at 'assertion failed: !self.can_read_head() && !self.can_read_body()', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.11.12/src/proto/conn.rs:262:9
note: Run with `RUST_BACKTRACE=1` for a backtrace.
ERROR:<unknown>: failed to send to kinesis: "connection reset"
thread '<unnamed>' panicked at 'failed to retrieve response from reactor: Canceled', libcore/result.rs:916:5
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: ErrorImpl { code: EofWhileParsingValue, line: 1, column: 0 }', libcore/result.rs:916:5
thread 'thread '<unnamed><unnamed>' panicked at '' panicked at 'failed to retrieve response from reactor: Canceledfailed to retrieve response from reactor: Canceled', ', libcore/result.rslibcore/result.rs::916916::55
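For readers who want the shape of such a stress test without AWS access, here is a rough sketch under stated assumptions. It is not the code from the linked repository: `stress` and the fake `put` closure are hypothetical names, and the closure stands in for a real Kinesis call. The sketch counts failures instead of unwrapping them, so one failed request does not take down a writer thread.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `num_threads` writer threads, each calling `put` `num_puts` times.
/// Returns (successes, failures). `put` is a stand-in for a real request.
fn stress<F>(num_threads: usize, num_puts: usize, put: F) -> (usize, usize)
where
    F: Fn(usize, usize) -> Result<(), String> + Send + Sync + 'static,
{
    let put = Arc::new(put);
    let ok = Arc::new(AtomicUsize::new(0));
    let failed = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..num_threads)
        .map(|t| {
            let put = Arc::clone(&put);
            let ok = Arc::clone(&ok);
            let failed = Arc::clone(&failed);
            thread::spawn(move || {
                for i in 0..num_puts {
                    // Record the outcome instead of calling unwrap().
                    match (*put)(t, i) {
                        Ok(()) => ok.fetch_add(1, Ordering::Relaxed),
                        Err(_) => failed.fetch_add(1, Ordering::Relaxed),
                    };
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("writer thread panicked");
    }
    (ok.load(Ordering::Relaxed), failed.load(Ordering::Relaxed))
}

fn main() {
    // Fake sender for illustration: every tenth call reports an error.
    let (ok, failed) = stress(4, 100, |_thread, i| {
        if i % 10 == 0 {
            Err("connection reset".to_string())
        } else {
            Ok(())
        }
    });
    println!("ok={} failed={}", ok, failed);
}
```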

I believe the latest commit to master properly handles the incorrect state that this assertion is catching. Would it be possible to test with master?

@seanmonstar unfortunately for this bug report, I have fixed my environment. I did more debugging and ruled out hyper/rusoto as the source of my slowness/errors by using other tools (I should have done that earlier... oops!).

My AWS account was using a VPN connection to a corporate network. This VPN, or something along the path to AWS services, was messing up my connections. I have moved to a direct connect setup and things are stable and I can no longer reproduce this bug.

I think this bug is now safe to close because I cannot reproduce it.

Well, it did seem like there was a problem with the connection being closed prematurely, but hyper shouldn't panic because of it.

I think master fixed the panic, so I'm going to close for now.
