Throttling in KafkaTransport does not work even when the server status becomes THROTTLED.
As a result, KafkaTransport continues to write messages to disk, and disk usage keeps increasing.
KafkaTransport should throttle input data when disk journal utilization is greater than 100% (or lb_throttle_threshold_percentage). KafkaTransport should then pause consuming messages until the server status becomes ALIVE or the buffer has free space again.
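The expected check can be sketched roughly as follows. This is only an illustration of the threshold comparison described above, not Graylog's actual code: the class name, method names, and the byte-based utilization calculation are all assumptions made for the example.

```java
// Hypothetical sketch: none of these names come from the Graylog code base.
public class ThrottleDecisionSketch {

    /** Journal utilization as a percentage of the configured maximum size. */
    static double utilizationPercent(long journalSizeBytes, long journalMaxBytes) {
        if (journalMaxBytes <= 0) {
            throw new IllegalArgumentException("journalMaxBytes must be positive");
        }
        return 100.0 * journalSizeBytes / journalMaxBytes;
    }

    /**
     * True when the input should pause consuming, i.e. when utilization is
     * strictly greater than the threshold (assumed to play the role of
     * lb_throttle_threshold_percentage).
     */
    static boolean shouldPause(long journalSizeBytes, long journalMaxBytes, double thresholdPercent) {
        return utilizationPercent(journalSizeBytes, journalMaxBytes) > thresholdPercent;
    }

    public static void main(String[] args) {
        long max = 1_000L;
        System.out.println(shouldPause(500L, max, 100.0));   // below the threshold
        System.out.println(shouldPause(1_200L, max, 100.0)); // above the threshold
    }
}
```

With such a check, the transport would stop polling Kafka while `shouldPause` returns true and resume once it returns false, leaving unread messages in Kafka.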
Throttling does not work. KafkaTransport continues to write to the journal even when disk journal utilization is greater than 100%.
The throttle state in ThrottleableTransport is changed by updateThrottleState, but that method is not invoked from anywhere when a node becomes throttled. The reason is that nobody posts a ThrottleState to the EventBus.
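To illustrate the failure mode, here is a minimal, self-contained sketch. It does not use Graylog's classes or a real event bus library: `SimpleEventBus`, `SketchTransport`, and the single-field `ThrottleState` are hypothetical stand-ins, and only the name `updateThrottleState` is borrowed from the description above. The point is that a subscriber's state never changes unless something actually posts the event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: stand-in types, not Graylog's implementation.
public class ThrottleEventSketch {

    /** Stand-in event carrying the current journal utilization. */
    static final class ThrottleState {
        final double journalUtilizationPercent;

        ThrottleState(double journalUtilizationPercent) {
            this.journalUtilizationPercent = journalUtilizationPercent;
        }
    }

    /** Tiny synchronous event bus that delivers each event to every subscriber. */
    static final class SimpleEventBus {
        private final List<Consumer<ThrottleState>> subscribers = new ArrayList<>();

        void register(Consumer<ThrottleState> subscriber) {
            subscribers.add(subscriber);
        }

        void post(ThrottleState event) {
            for (Consumer<ThrottleState> subscriber : subscribers) {
                subscriber.accept(event);
            }
        }
    }

    /** Stand-in transport whose throttled flag only changes when an event arrives. */
    static final class SketchTransport {
        private final double thresholdPercent;
        private volatile boolean throttled = false;

        SketchTransport(SimpleEventBus bus, double thresholdPercent) {
            this.thresholdPercent = thresholdPercent;
            bus.register(this::updateThrottleState);
        }

        void updateThrottleState(ThrottleState state) {
            throttled = state.journalUtilizationPercent > thresholdPercent;
        }

        boolean isThrottled() {
            return throttled;
        }
    }

    public static void main(String[] args) {
        SimpleEventBus bus = new SimpleEventBus();
        SketchTransport transport = new SketchTransport(bus, 100.0);

        // Nothing has been posted yet, so the transport keeps consuming.
        System.out.println("before any post: " + transport.isThrottled());

        // Once some component posts the state, the transport reacts.
        bus.post(new ThrottleState(120.0));
        System.out.println("after posting 120%: " + transport.isThrottled());

        bus.post(new ThrottleState(40.0));
        System.out.println("after posting 40%: " + transport.isThrottled());
    }
}
```

In this sketch the bug corresponds to the situation where no component ever calls `bus.post(...)`: the transport stays unthrottled no matter how full the journal gets.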
I expect there are two solutions: one based on posting ThrottleState to the EventBus, the other based on Lifecycle.
Kafka is one way to build an at-least-once data delivery system, because Kafka guarantees that. Therefore, KafkaTransport should guarantee at-least-once transport from Kafka to Elasticsearch by observing the throttled state.
We are experiencing the same issue with Graylog version 2.3.2 and Kafka 10.2.1.
Same here. Without throttling, when multiple sources feed into a Kafka cluster, the Graylog journals can easily be overrun by volume, instead of the input being throttled and the messages being left in the Kafka queue when the journals near their maximum. That kind of defeats the purpose of using Kafka. I would suggest this needs to be corrected prior to version 3.0, e.g. in 2.4.x maybe?
I also have this issue with Raw/Plaintext AMQP inputs using Graylog 2.4.4-1 Docker image. The option to "Allow throttling this input" is set, but doesn't seem to actually work. It just drains the queue and fills up the disk journal. I have plenty of space on the RabbitMQ nodes that I was hoping would reduce the need for a large disk journal in Graylog. Am I missing something? Is this expected behavior?
@gizmonicus The AMQP transport inherits from ThrottleableTransport, too. Your problem might be caused by the same bug.
@ueokande have you found any workarounds? I hadn't planned on using the Graylog disk journal for queueing when I have RabbitMQ for that, but I suppose one workaround would be to add more disk space for the journal. It's not totally pointless, since the RabbitMQ nodes can still queue messages when GL is down; it's just not ideal.
Any updates or plans for this? If you need any input to track down the issue please contact me. I am glad to help.
Thank you for the reports. This seems to be broken since https://github.com/Graylog2/graylog2-server/pull/1948.
We will work on a fix and let you know once it's done.
This will be fixed in Graylog 3.0 and the next 2.4 stable release.