Logstash: [discuss] batch size maximisation vs batch read time minimisation

Created on 10 Nov 2017 · 9 comments · Source: elastic/logstash

In the PQ design stage we discussed in #5652 two different strategies for building batches:

  • The first was to minimize wait time and return as soon as possible with some data, potentially issuing a small batch when outputs are faster than inputs.
  • The other was to maximize the batch size within the given timeout period, using the timeout to its limit.

We decided to go with the wait time minimization with the side effect of building small batches.
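The two strategies can be sketched as follows (illustrative Python only, not the actual LsQueueUtils code; the function names are made up for this sketch):

```python
import queue


def read_batch_min_wait(q, batch_size, timeout_s):
    """Wait-time minimization: block until at least one event arrives,
    then drain whatever is immediately available, up to batch_size.
    Returns small batches when outputs are faster than inputs."""
    batch = []
    try:
        batch.append(q.get(timeout=timeout_s))  # block for the first event only
    except queue.Empty:
        return batch
    while len(batch) < batch_size:
        try:
            batch.append(q.get_nowait())  # no further waiting
        except queue.Empty:
            break
    return batch


def read_batch_max_size(q, batch_size, timeout_s):
    """Batch-size maximization: keep waiting until the overall timeout
    expires or the batch is full, trading latency for larger batches."""
    import time
    batch = []
    deadline = time.monotonic() + timeout_s
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```

With a trickle of input, the first function returns quickly with whatever is on hand, while the second holds the batch open for the full timeout.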

We have two issues related to that here:

  1. The batch-building semantics currently differ between the Persistent Queue (PQ) implementation and the Memory Queue (MQ) implementation. PQ uses the above strategy of wait time minimization while the MQ uses batch size maximization; see LsQueueUtils and the related fix in #8632.

  2. As discussed in #7243, this semantics difference between PQ and MQ results in very different behaviours in output plugins like the ES output, which uses the batch object for bulk sends. When using PQ and outputs are faster than inputs, small batches are created, which results in small bulk requests; this is a non-optimal side effect.

I would like to suggest the following:

  • We should unify the batch building semantics between MQ and PQ.
  • I think I'd go with unifying toward batch size maximization, similar to what MQ does.
  • Using the batch size maximization semantics (when properly implemented), one can then leverage the pipeline.batch.delay option to control batch creation latency. A low value would provide behaviour similar to the wait time minimization semantics, and a higher value would give more time to maximize size toward pipeline.batch.size.
  • A 5ms default for pipeline.batch.delay seems low. I don't have any hard evidence that it is, but it feels arbitrarily low, and I'd look into increasing it to 50ms or 100ms if we want to tend toward batch size maximization.
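For reference, the two knobs discussed above live in logstash.yml; the values below are illustrative examples, not recommendations:

```yaml
# Illustrative logstash.yml excerpt (example values only)
pipeline.batch.size: 125   # maximum number of events a worker collects per batch
pipeline.batch.delay: 50   # ms to wait for events while building a batch
```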
Labels: bug, discuss, persistent queues


All 9 comments

I commented briefly on https://github.com/elastic/logstash/pull/8632 too, but this is the long form of what I think:

  • I wouldn't change the semantics of the in-memory queue; instead, as you suggest here, I'd move PQ to do what MQ does (the linked issue https://github.com/elastic/logstash/issues/7243 is an example of why the MQ semantics fit our use cases better).

    • The only gain of minimizing wait times is lower latency; in all other respects the approach is slower and more costly in terms of memory and CPU (more lock contention + GC).

  • I also benchmarked the hard wait-time approach at one point, when implementing the set of speedups for the MQ, and it always came out behind the current approach.
  • The current approach has been around for quite a while in 5.x, and MQ is used much more widely than PQ => there is no point in messing up the pipeline-setting optimizations of the majority of users without any tangible gain.

For the sake of the discussion I will bring my comments from #8632 here. It is more appropriate to have the design discussion here; we can then follow up in #8632.

For others not aware of #8632, I am proposing that we change the behaviour of batch creation to respect pipeline.batch.delay as a hard limit on batch creation time, as was probably intended initially.

I agree this is a change in behaviour from 5.6, but keep in mind that the queue polling behaviour/strategy has also evolved/changed across versions, and what we have in 5.6 does not necessarily represent the "correct" behaviour. So I'd rather focus on the merits (or not) of the proposed change.

I get your concern about batch size maximisation, and it relates to the discussion in #8633. My concern is about behaviour expectations with respect to what pipeline.batch.delay actually means. In the end we can also decide to change the pipeline.batch.delay doc/explanation to reflect what we think its best behaviour should be.

A few points to consider:

  • Side effects of these changes will be visible only in the "always empty queue" (or faster outputs than inputs) scenarios, and this is the only case where batch creation latency would be relevant, depending on your use-case.
  • I tend to think it is easier to reason about pipeline.batch.delay as an overall maximum delay to fill a batch. That makes it easy to decide whether batch creation latency is important for your use-case.
  • If we consider it as a maximum timeout to fill a batch as proposed here:

    • using a larger value will work toward batch size maximization.

    • using a smaller value will work for wait time minimization and batch creation latency control.

  • If we consider it as a per-read & resettable timeout in the batch read loop:

    • using a larger value will work toward batch size maximization but could result in delays multiple times the original pipeline.batch.delay value

    • using a smaller value will still work toward batch size maximization but with a somewhat lower but still undefined overall batch creation latency.
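The per-read resettable variant described above can be sketched in the same style (again illustrative Python, not the actual MQ implementation):

```python
import queue


def read_batch_resettable(q, batch_size, per_read_timeout_s):
    """Per-read resettable timeout: the clock restarts on every poll, so a
    producer that keeps events trickling in can stretch the overall batch
    creation latency to roughly batch_size * per_read_timeout_s in the
    worst case, but batches tend to fill up toward batch_size."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(q.get(timeout=per_read_timeout_s))
        except queue.Empty:
            break  # one full quiet timeout ends the batch
    return batch
```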

We can probably take 2 directions here:

  • find agreement on the more desirable strategy and then validate with benchmarks, especially for use cases such as #7243
  • create benchmarks anyways to help drive the decision.

IMO pipeline.batch.delay should be at best a hint, not a guarantee of timing. I don't think anyone changing this cares about the difference between 5ms and 100ms, they functionally want to achieve more batching or less. Arguably we should kill this setting in favor of something easier to understand.

However, If we want to take it literally as the hard timeout, then I think it should default to a much higher value, like 1s.

I am in favor of larger batches at the cost of a little bit of latency. No human is going to notice the difference of 1-2 seconds in the context of data arriving at the destination.

I just want to highlight that pipeline.batch.delay should only have a measurable impact when outputs are faster than inputs, or when the queue is in an always-empty state (which should be a desirable, healthy state).

@jakelandis

IMO pipeline.batch.delay should be at best a hint, not a guarantee of timing. I don't think anyone changing this cares about the difference between 5ms and 100ms

That seems reasonable.

they functionally want to achieve more batching or less

We have 2 tuning knobs in that respect: pipeline.batch.delay and pipeline.batch.size. The former (per its definition) should control the maximum latency of forming batches (again, in the case where workers have to wait for data because outputs are faster than inputs), and the latter should control the batch sizes. In that respect, increasing pipeline.batch.delay is actually a way to say "I want to tend toward batch size maximization when the queue is in an always empty state".

However, If we want to take it literally as the hard timeout, then I think it should default to a much higher value, like 1s.

I am not sure about that; 1s would result in a weird delay in getting-started or training situations where users use LS interactively, with stdin for example. I doubt there would be a significant change in behaviour between, say, 250ms and 1s in real-life streaming use-cases, versus providing better feedback/latency in getting-started use-cases. I guess that's all testable and not critical for the sake of this discussion.

I am in favor of larger batches at the cost of a little bit of latency.

That seems fair. Should we let the users have better control over this?

@colinsurprenant @jakelandis the problem with defining a hard overall upper bound on waiting is that mathematically you get screwed on bursty large writes. Nagle in isolation suffers from this https://en.wikipedia.org/wiki/Nagle%27s_algorithm#Negative_effect_on_larger_writes and it led to a ton of algorithms trying to fix the resulting throughput problems (just Google "TCP congestion").
Moving to a hard wait for up to batch size x will mathematically always lower throughput for slow producers on the (input -> filter) path (for perfectly constant producer throughput you get exactly the same behaviour; the more bursty the producer, the bigger the hit you take).
Without a clear mathematical argument for bounded spins up to size x and time t (which I really can't see looking at the theory of queue congestion), I don't see what we'd gain from this change. In general I don't see much value in changing these semantics: we currently have a solution that introduces barely any framework overhead relative to plugins and, to my knowledge, hasn't received negative feedback (again, note that the ES output is our most important output and it heavily profits from large batch sizes). A change here would come with various problems of low batch size or low throughput for bursty inputs that would need to be worked around.
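The slow-producer part of this argument can be illustrated with a toy, time-free model (hypothetical helper names; inter-arrival gaps stand in for real timing):

```python
def events_before_hard_deadline(arrival_gaps_ms, deadline_ms):
    """Hard overall timeout: count events that arrive before the deadline."""
    t, n = 0, 0
    for gap in arrival_gaps_ms:
        t += gap
        if t > deadline_ms:
            break  # deadline hit: the batch closes regardless of producer pace
        n += 1
    return n


def events_before_resettable_timeout(arrival_gaps_ms, per_read_timeout_ms):
    """Per-read resettable timeout: the batch ends at the first quiet gap."""
    n = 0
    for gap in arrival_gaps_ms:
        if gap > per_read_timeout_ms:
            break  # a gap longer than the per-read timeout ends the batch
        n += 1
    return n
```

With a slow, steady producer emitting one event every 30ms, a hard 100ms deadline closes the batch after 3 events, while a 50ms resettable timeout lets all 10 through; that is the throughput hit for slow producers described above.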

Also, just in a "meta" sense... we have a solution that is now finally blazing fast (you can easily do 600k+ events/s generator -> stdout on a 3GHz i7), and it took quite a while to get there. No one has complained about it. The problem is mathematically very hard, and the correct solution depends heavily on the semantics of your filters and outputs. Why take the risk of changing this when the change isn't backed up by any math or open issue?

@original-brownbear I don't think I see how Nagle relates to our context. An input burst (filling the queue) would immediately translate into producing full batches without delay. As for the "clear mathematical argument", I think the inverse is also true: we would have to figure out the clear mathematical argument for every change made to the queue behaviour to date, and I'm not sure what that would buy us.

Also, the goal here is not to reduce batch size maximization but to gain a clear understanding & semantic of the pipeline.batch.delay. As I mentioned above, I believe that batch size maximization would be preserved with a higher default pipeline.batch.delay.

(1) If we agree that optimizing for batch creation latency is not important for anyone, then the current queue polling & batch creation strategy with regards to pipeline.batch.delay works (but logstash.yml needs updating for the option explanation).

(2) Do we agree that we also want to use that same strategy as (1) for PQ? It seems obvious that the answer should be yes. Does anyone disagree?

Ok so to recap per above discussion and also in #8632, here's the revised proposal:

  • Keep the current MQ per-poll/drain resettable timeout of pipeline.batch.delay which favours batch size maximization instead of strictly respecting the timeout.
  • Update logstash.yml option description of pipeline.batch.delay to reflect the intended behaviour.
  • Modify PQ to use the same batch size maximization strategy as MQ.

@jakelandis @original-brownbear @jordansissel let me know what you think.

@colinsurprenant 100% on board with it :)

closing in favor of #8674
thanks @original-brownbear @jakelandis
