Vector: 0.6.0 trace output when -qq is used

Created on 19 Dec 2019 · 11 comments · Source: timberio/vector

Just upgraded to 0.6.0, and we're now seeing:

TRACE tower_limit::rate::service: rate limit exceeded, disabling service

in our logs, despite using -qq.

1 - What is this error? (I've seen other issues similar to this, but they all include the sink that's causing it)
2 - Why are trace outputs being generated when -qq is specified?

Thanks

(I should point out that this is an upgrade from 0.4.0; we skipped 0.5.0 because of https://github.com/timberio/vector/issues/1021, so this might not be new in 0.6.0.)


All 11 comments

Hey @karlseguin, thanks for raising this. It does appear to be a bug within the http client that's causing this. It's fairly easy to reproduce so hopefully we'll have a fix for it soon.

Thanks

Does this relate at all to any rate limiting (Elasticsearch and Prometheus sinks)? I assume not, because:

1 - We have very few messages
2 - I assume dropped messages would be logged at a higher level than TRACE

Just want to make sure that this message doesn't mean logs are being dropped.

There is a limitation in tracing that causes us to expose this; related conversation: https://github.com/timberio/vector/pull/1151#discussion_r342793691

cc @lukesteensen

@karlseguin but this does not mean logs are being dropped :) All this is saying is that our network stack is currently applying back pressure on the stream of logs. This can happen when the downstream service has a rate limit and we are hitting it: requests get back pressure applied and are submitted again when there is more capacity. In the meantime, logs are stored in either the memory buffer or the disk buffer, whichever one you have configured. Hopefully, that explains it a bit better.
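To make the buffering part concrete, here is a minimal sketch of a sink buffer configuration. The sink and input names, host, and sizes are placeholders, and the keys follow the 0.6-era docs, so they may differ slightly in your version:

```toml
# Hypothetical sink; substitute your real elasticsearch/prometheus/s3 sink.
[sinks.my_sink]
type   = "elasticsearch"
inputs = ["my_source"]
host   = "http://localhost:9200"

# Where events wait while the sink is rate limited / back-pressured.
[sinks.my_sink.buffer]
type      = "disk"       # or "memory"
max_size  = 104900000    # disk buffer size in bytes
when_full = "block"      # apply back pressure upstream rather than dropping events
```

With `when_full = "block"`, the topology slows down instead of dropping events when the buffer fills, which is the behavior described above.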

Yeah, this specific log line is, unfortunately, a special case right now. The goal was to make it obvious to users when their configured internal limits are being hit and leading to lower throughput.

I can definitely understand why this is confusing and potentially not desirable in all cases. I could see an argument for emitting this log line at the DEBUG level only (and potentially submitting a PR to change it in tower as well, eliminating the special case).

Hey, was this problem resolved?
I have Vector running as a daemon service in Kubernetes, and my pods regularly restart with this error.

vector version - 0.6.0
Should I upgrade to the next version?

Last printed log:

Feb 18 22:11:25.128 TRACE sink{name=s3_logs type=aws_s3}: tower_limit::rate::service: rate limit exceeded, disabling service
thread 'tokio-runtime-worker-1' panicked at 'service not ready; poll_ready must be called first', /opt/rust/registry/src/github.com-1ecc6299db9ec823/tower-limit-0.1.1/src/rate/service.rs:107:35
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Feb 18 22:12:11.613 ERROR vector::topology: Unhandled error
Feb 18 22:12:11.613 INFO vector: Shutting down.

This looks like a different issue to me

I'm seeing this in version 0.8.2, FWIW.

Apr 08 11:59:26.898 TRACE sink{name=data_stream type=aws_kinesis_streams}: tower_limit::rate::service: rate limit exceeded, disabling service

Hi @thethomp, sorry about that. As noted above, this log line is a special case. It means that you're bumping into the configured request.rate_limit_num setting. For the aws_kinesis_streams sink this shouldn't normally be raised, since the default aligns with AWS' rate limits. You should consider increasing your stream shard count if this is happening often. Otherwise, a spike here and there (depending on your use case) should be fine.
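For reference, the setting lives under the sink's `request` table. A sketch for the sink in the log above, where the region, stream name, input, and limit values are placeholders and the defaults already match AWS' published Kinesis limits (so raising them only helps together with more shards):

```toml
[sinks.data_stream]
type        = "aws_kinesis_streams"
inputs      = ["my_source"]      # placeholder
region      = "us-east-1"        # placeholder
stream_name = "my-stream"        # placeholder

# Requests allowed per rate_limit_duration_secs window.
[sinks.data_stream.request]
rate_limit_duration_secs = 1
rate_limit_num           = 10
```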

I'd like to also note that #1858 attempts to replace these static limits with dynamic ones, so you won't have to tweak any of these settings.
