Vector: 0.6.0 trace output when -qq is used

Created on 19 Dec 2019 · 11 comments · Source: timberio/vector

Just upgraded to 0.6.0, and we're now seeing:

TRACE tower_limit::rate::service: rate limit exceeded, disabling service

in our logs, despite using -qq.

1 - What is this error? (I've seen other issues similar to this, but they all include the sink that's causing it)
2 - Why are trace outputs being generated when -qq is specified?

Thanks

(I should point out that this is an upgrade from 0.4.0; we skipped 0.5.0 because of https://github.com/timberio/vector/issues/1021, so this might not be new in 0.6.0.)


All 11 comments

Hey @karlseguin, thanks for raising this. It does appear to be a bug within the http client that's causing this. It's fairly easy to reproduce so hopefully we'll have a fix for it soon.

Thanks

Does this relate at all to any rate limiting (Elasticsearch and Prometheus sinks)? I assume not, because:

1 - We have very few messages
2 - I assume dropped messages would be logged at a higher level than TRACE

Just want to make sure that this message doesn't mean logs are being dropped.

There is a limitation in tracing that causes us to expose this; related conversation: https://github.com/timberio/vector/pull/1151#discussion_r342793691

cc @lukesteensen

@karlseguin but this does not mean logs are being dropped :) All this is saying is that our network stack is currently applying back pressure on the stream of logs. This can happen when the downstream service has a rate limit and we are hitting it: requests get back pressure applied and are submitted again when there is more capacity. In the meantime, logs are stored in either the memory buffer or the disk buffer, whichever one you have configured. Hopefully, that explains it a bit better.
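To make the buffering part concrete, here is a minimal sketch of a sink buffer configuration. The sink and input names, host, and sizes are placeholders, and the keys follow the 0.6-era docs, so they may differ slightly in your version:

```toml
# Hypothetical sink; substitute your real elasticsearch/prometheus/s3 sink.
[sinks.my_sink]
type   = "elasticsearch"
inputs = ["my_source"]
host   = "http://localhost:9200"

# Where events wait while the sink is rate limited / back-pressured.
[sinks.my_sink.buffer]
type      = "disk"       # or "memory"
max_size  = 104900000    # disk buffer size in bytes
when_full = "block"      # apply back pressure upstream rather than dropping events
```

With `when_full = "block"`, the topology slows down instead of dropping events when the buffer fills, which is the behavior described above.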

Yeah, this specific log line is, unfortunately, a special case right now. The goal was to make it obvious to users when their configured internal limits are being hit and leading to lower throughput.

I can definitely understand why this is confusing and potentially not desirable in all cases. I could see an argument for emitting this log line at the DEBUG level only (and potentially submitting a PR to change it in tower as well, eliminating the special case).

Hey, was this problem resolved?
I have Vector running as a daemon service in Kubernetes, and my pods regularly restart with this error.

vector version - 0.6.0
Should I upgrade to the next version?

Last printed log:

Feb 18 22:11:25.128 TRACE sink{name=s3_logs type=aws_s3}: tower_limit::rate::service: rate limit exceeded, disabling service
thread 'tokio-runtime-worker-1' panicked at 'service not ready; poll_ready must be called first', /opt/rust/registry/src/github.com-1ecc6299db9ec823/tower-limit-0.1.1/src/rate/service.rs:107:35
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Feb 18 22:12:11.613 ERROR vector::topology: Unhandled error
Feb 18 22:12:11.613 INFO vector: Shutting down.

This looks like a different issue to me

I'm seeing this in version 0.8.2, FWIW.

Apr 08 11:59:26.898 TRACE sink{name=data_stream type=aws_kinesis_streams}: tower_limit::rate::service: rate limit exceeded, disabling service

Hi @thethomp, sorry about that. As noted above, this log line is a special case. It means that you're bumping into the configured request.rate_limit_num setting. For the aws_kinesis_streams sink this shouldn't normally be raised, since the default aligns with AWS' rate limits. You should consider increasing your stream shard count if this is happening often. Otherwise, a spike here and there (depending on your use case) should be fine.
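For reference, the setting lives under the sink's `request` table. A sketch for the sink in the log above, where the region, stream name, input, and limit values are placeholders and the defaults already match AWS' published Kinesis limits (so raising them only helps together with more shards):

```toml
[sinks.data_stream]
type        = "aws_kinesis_streams"
inputs      = ["my_source"]      # placeholder
region      = "us-east-1"        # placeholder
stream_name = "my-stream"        # placeholder

# Requests allowed per rate_limit_duration_secs window.
[sinks.data_stream.request]
rate_limit_duration_secs = 1
rate_limit_num           = 10
```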

I'd like to also note that #1858 attempts to replace these static limits with dynamic ones, so you won't have to tweak any of these settings.
