Just upgraded to 0.6.0, and we're now seeing:
TRACE tower_limit::rate::service: rate limit exceeded, disabling service
in our logs, despite using -qq.
1 - What is this error? (I've seen other issues similar to this, but they all include the sink that's causing it)
2 - Why are trace outputs being generated when -qq is specified?
Thanks
(should point out that this is an upgrade from 0.4.0...we skipped 0.5.0 because of https://github.com/timberio/vector/issues/1021, so this might not be new in 0.6.0)
Hey @karlseguin, thanks for raising this. It does appear to be a bug within the http client that's causing this. It's fairly easy to reproduce so hopefully we'll have a fix for it soon.
Thanks
Does this relate at all to any rate limiting (Elasticsearch and Prometheus sinks)? I assume not, because
1 - We have very few messages
2 - I assume dropped messages would be logged at a level higher than TRACE
Just want to make sure that this message doesn't mean logs are being dropped.
There is a limitation in tracing that causes us to expose this, related conversation https://github.com/timberio/vector/pull/1151#discussion_r342793691
cc @lukesteensen
@karlseguin but this does not mean logs are being dropped :) All this is saying is that our network stack is currently applying backpressure to the stream of logs. This can happen when the downstream service has a rate limit and we are hitting it: requests get backpressure applied and are submitted again once there is more capacity. In the meantime, logs are stored in either the memory buffer or the disk buffer, whichever one you have configured. Hopefully that explains it a bit better.
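For reference, the buffer behavior described above is configurable per sink. A minimal sketch of a disk buffer that blocks (applies backpressure) rather than dropping events; the sink name and size here are illustrative, and option names may differ slightly between versions, so check the docs for your release:

```toml
[sinks.my_sink.buffer]
  type = "disk"         # buffer events on disk instead of in memory
  max_size = 104900000  # buffer capacity in bytes
  when_full = "block"   # apply backpressure upstream rather than dropping events
</imports>
```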
Yeah, this specific log line is, unfortunately, a special case right now. The goal was to make it obvious to users when their configured internal limits are being hit and leading to lower throughput.
I can definitely understand why this is confusing and potentially not desirable in all cases. I could see an argument for emitting this log line at the DEBUG level only (and potentially submitting a PR to make that change in tower as well, eliminating the special case).
Hey, was this problem resolved?
I have vector running as a daemon service in kubernetes and my pods regularly restart with this error.
vector version - 0.6.0
Should I upgrade it to next version?
last printed log
Feb 18 22:11:25.128 TRACE sink{name=s3_logs type=aws_s3}: tower_limit::rate::service: rate limit exceeded, disabling service
thread 'tokio-runtime-worker-1' panicked at 'service not ready; poll_ready must be called first', /opt/rust/registry/src/github.com-1ecc6299db9ec823/tower-limit-0.1.1/src/rate/service.rs:107:35
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Feb 18 22:12:11.613 ERROR vector::topology: Unhandled error
Feb 18 22:12:11.613 INFO vector: Shutting down.
This looks like a different issue to me
Ok, created new issue https://github.com/timberio/vector/issues/1850
I'm seeing this in version 0.8.2, FWIW.
Apr 08 11:59:26.898 TRACE sink{name=data_stream type=aws_kinesis_streams}: tower_limit::rate::service: rate limit exceeded, disabling service
Hi @thethomp, sorry about that. As noted above, this log line is a special case. It means that you're bumping into the configured request.rate_limit_num setting. For the aws_kinesis_streams sink this generally shouldn't be raised, since the default aligns with AWS' rate limits. You should consider increasing your stream shard count if this is happening often. Otherwise, a spike here and there (depending on your use case) should be fine.
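If you do want to tune this, the rate limit can be adjusted per sink under its request table. A hedged sketch (the values are illustrative and the exact option names may vary by version, so verify against the docs for your release):

```toml
[sinks.data_stream.request]
  rate_limit_duration_secs = 1  # window over which the limit applies
  rate_limit_num = 5            # max requests per window; raising this reduces the TRACE line above
```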
I'd like to also note that #1858 attempts to replace these static limits with dynamic ones, so you won't have to tweak any of these settings.