Graylog2-server: Throttling does not work in ThrottleableTransport

Created on 6 Nov 2017 · 8 Comments · Source: Graylog2/graylog2-server

Throttling in KafkaTransport does not work, even when the server status becomes THROTTLED.
As a result, KafkaTransport keeps writing messages to the disk journal, and disk usage keeps increasing.

Expected Behavior

KafkaTransport throttles input data when disk journal utilization is greater than 100% (or lb_throttle_threshold_percentage). KafkaTransport then pauses consuming messages until the server status becomes ALIVE again or the buffer has free space.
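The expected behavior can be illustrated with a minimal sketch. It is not Graylog's code: the class ThrottledConsumerSketch, its fields, and its methods are all hypothetical stand-ins, with an in-memory queue playing the role of the Kafka topic and a list playing the role of the disk journal. The point is only that a throttled input should stop pulling, so unread messages stay upstream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a throttleable input; none of these names come from Graylog.
class ThrottledConsumerSketch {
    private final Queue<String> upstream;                    // stands in for the Kafka topic
    private final List<String> journal = new ArrayList<>();  // stands in for the disk journal
    private final AtomicBoolean throttled = new AtomicBoolean(false);

    ThrottledConsumerSketch(Queue<String> upstream) {
        this.upstream = upstream;
    }

    // Would be driven by the server's throttle state in a real transport.
    void setThrottled(boolean value) {
        throttled.set(value);
    }

    // Moves up to maxMessages from upstream into the journal,
    // but stops as soon as the input is throttled or upstream is empty.
    int poll(int maxMessages) {
        int consumed = 0;
        while (consumed < maxMessages && !throttled.get()) {
            String message = upstream.poll();
            if (message == null) {
                break;
            }
            journal.add(message);
            consumed++;
        }
        return consumed;
    }

    int journalSize() {
        return journal.size();
    }
}
```

While the flag is set, poll consumes nothing and the queue keeps its contents; once the flag is cleared, consumption resumes where it left off.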

Current Behavior

Throttling does not work. KafkaTransport continues to write to the journal even when disk journal utilization is greater than 100%.

Possible Solution

The throttle state in ThrottleableTransport is changed by updateThrottleState, but that method is never invoked when a node becomes throttled. The reason is that nothing posts a ThrottleState to the EventBus.

I see two possible solutions:

  1. Post ThrottleState to the EventBus.
  2. Have ThrottleableTransport subscribe to Lifecycle changes.
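The first option can be sketched as follows. This is an illustration under assumptions, not the actual fix: SimpleBus is a hypothetical minimal stand-in for an event bus, the ThrottleState class here carries a single made-up field (journalUtilizationPercent), and Transport only borrows the method name updateThrottleState from the description above. It shows that the subscriber's state changes only when something actually posts the event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; these classes are simplified stand-ins, not Graylog's implementation.
class ThrottleEventSketch {

    // Simplified event; the single field is an assumption made for this example.
    static final class ThrottleState {
        final double journalUtilizationPercent;

        ThrottleState(double journalUtilizationPercent) {
            this.journalUtilizationPercent = journalUtilizationPercent;
        }
    }

    // Minimal stand-in for an event bus: delivers each posted event to all subscribers.
    static final class SimpleBus {
        private final List<Consumer<ThrottleState>> subscribers = new ArrayList<>();

        void register(Consumer<ThrottleState> subscriber) {
            subscribers.add(subscriber);
        }

        void post(ThrottleState event) {
            for (Consumer<ThrottleState> subscriber : subscribers) {
                subscriber.accept(event);
            }
        }
    }

    // Stand-in for a throttleable transport with a utilization threshold.
    static final class Transport {
        private final double thresholdPercent;
        private volatile boolean throttled;

        Transport(double thresholdPercent) {
            this.thresholdPercent = thresholdPercent;
        }

        // Only runs when an event is posted; with no poster, throttled stays false forever.
        void updateThrottleState(ThrottleState state) {
            throttled = state.journalUtilizationPercent >= thresholdPercent;
        }

        boolean isThrottled() {
            return throttled;
        }
    }

    public static void main(String[] args) {
        SimpleBus bus = new SimpleBus();
        Transport transport = new Transport(100.0);
        bus.register(transport::updateThrottleState);

        System.out.println(transport.isThrottled()); // nothing posted yet
        bus.post(new ThrottleState(120.0));           // journal above threshold
        System.out.println(transport.isThrottled());
        bus.post(new ThrottleState(40.0));            // journal has drained
        System.out.println(transport.isThrottled());
    }
}
```

In this sketch the poster would be whatever component periodically measures journal utilization; which component should do that in Graylog is not specified here.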

Steps to Reproduce (for bugs)

  1. Stop the Elasticsearch cluster
  2. Launch a Raw/Plaintext Kafka input
  3. Wait until disk journal utilization is high enough for the node to become throttled

Context

Kafka is one way to build an at-least-once data delivery system, because Kafka guarantees at-least-once delivery. Therefore, KafkaTransport should guarantee at-least-once transport from Kafka to Elasticsearch by observing the throttled state.

Your Environment

  • Graylog Version: 2.3 (4824e52ed0cf5eae051f90a04e39fa33a96880b7) and master (f1ab02147fd1b5389a1ecf832bf2e683bde5389e)
  • Elasticsearch Version: 5.6.3
  • MongoDB Version: 2.6.10
  • Operating System: Ubuntu 16.04
  • Browser version:
bug triaged

All 8 comments

We are experiencing the same issue with Graylog version 2.3.2 and Kafka 10.2.1.

Same here. Without throttling, when multiple sources feed into a Kafka cluster, the Graylog journals can easily be overrun by the volume. The input should instead be throttled so that messages stay in the Kafka queue when the journals near their maximum. This kind of defeats the purpose of using Kafka. I would suggest this needs to be corrected prior to version 3.0, e.g. in 2.4.x maybe?

I also have this issue with Raw/Plaintext AMQP inputs using Graylog 2.4.4-1 Docker image. The option to "Allow throttling this input" is set, but doesn't seem to actually work. It just drains the queue and fills up the disk journal. I have plenty of space on the RabbitMQ nodes that I was hoping would reduce the need for a large disk journal in Graylog. Am I missing something? Is this expected behavior?

@gizmonicus The AMQP transport inherits from ThrottleableTransport, too. Your problem might be caused by the same bug.

@ueokande have you found any workarounds? I hadn't planned on using the Graylog disk journal for queueing when I have RabbitMQ for that, but I suppose one workaround would be to add more disk space for the journal. It's not totally pointless, since the RabbitMQ nodes can still queue messages when GL is down; it's just not ideal.

Any updates or plans for this? If you need any input to track down the issue please contact me. I am glad to help.

Thank you for the reports. This seems to be broken since https://github.com/Graylog2/graylog2-server/pull/1948.

We will work on a fix and let you know once it's done.

This will be fixed in Graylog 3.0 and the next 2.4 stable release.
