During the PQ design stage in #5652 we discussed two different strategies for building batches: wait time minimization and batch size maximization.
We decided to go with wait time minimization, with the side effect of building small batches.
We have two issues related to that here:
The batch-building semantics currently differ between the Persistent Queue (PQ) implementation and the Memory Queue (MQ) implementation: PQ uses the above strategy of wait time minimization while MQ uses batch size maximization, see LsQueueUtils and the related fix in #8632.
As discussed in #7243, this semantics difference between PQ and MQ results in very different behaviours in output plugins like the ES output, which uses the batch object to bulk-send to its destination. When using PQ, and when outputs are faster than inputs, small batches are created, which results in small bulk requests; this is a non-optimal side effect.
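To make the difference concrete, here is a hypothetical Java sketch of the two draining strategies over a `BlockingQueue`. The class and method names (`BatchStrategies`, `drainMinimizeWait`, `drainMaximizeSize`) are mine, not Logstash's actual LsQueueUtils API; this illustrates the two semantics, not the real implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BatchStrategies {

    // Wait time minimization (PQ-style): block only until the first element
    // shows up (or the delay expires), then grab whatever else is already
    // queued without waiting further. Fast consumers end up with small batches.
    public static <T> List<T> drainMinimizeWait(BlockingQueue<T> q, int maxSize, long delayMs) {
        List<T> batch = new ArrayList<>(maxSize);
        try {
            T first = q.poll(delayMs, TimeUnit.MILLISECONDS);
            if (first != null) {
                batch.add(first);
                q.drainTo(batch, maxSize - 1); // non-blocking drain of whatever is left
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return batch;
    }

    // Batch size maximization (MQ-style, per #8632): keep polling, up to the
    // delay budget, until the batch is full or the time budget is exhausted.
    public static <T> List<T> drainMaximizeSize(BlockingQueue<T> q, int maxSize, long delayMs) {
        List<T> batch = new ArrayList<>(maxSize);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
        try {
            while (batch.size() < maxSize) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) break; // delay budget spent
                T e = q.poll(remaining, TimeUnit.NANOSECONDS);
                if (e == null) break; // timed out waiting for more events
                batch.add(e);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return batch;
    }
}
```

The minimize-wait variant returns as soon as anything is available, which is exactly why a consumer that is faster than its producers ends up with small batches; the maximize-size variant spends its whole delay budget trying to fill the batch.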
I would like to suggest the following:
- The pipeline.batch.delay option to control batch creation latency: a low value would provide a behaviour similar to the minimize-wait-time semantic, and a higher value would give more time to maximize size toward pipeline.batch.size.
- The current pipeline.batch.delay default seems low. I don't have any hard evidence that it is, but it feels arbitrarily low and I'd look into increasing it to 50ms or 100ms if we want to tend toward batch size maximization.

I commented on https://github.com/elastic/logstash/pull/8632 in short too, but this is the long form of what I think:
5.x + MQ is used much more widely than PQ => there is no point in messing up individual optimizations to the pipeline settings for the majority of users without any tangible gain.

For the sake of the discussion I will bring my comments from #8632 here. It is more appropriate to actually have the design discussion here; then we can follow up in #8632.
For others not aware of #8632, I am proposing that we change the behaviour of the batch creation to respect pipeline.batch.delay as a hard limit in creation timeout, as it was probably intended initially.
I agree this is a change in behaviour from 5.6, but keep in mind that the queue polling behaviour/strategy has also evolved/changed across versions, and what we have in 5.6 does not necessarily represent the "correct" behaviour. So I'd rather focus on the merits (or not) of the proposed change.
I get your concern about batch size maximisation and it relates to the discussion in #8633. My concern is about behaviour expectations with respect to what pipeline.batch.delay actually means. In the end we can also decide to change the pipeline.batch.delay doc/explanation to reflect what we think is the best behaviour it should have.
A few points to consider:
- Treat pipeline.batch.delay as being an overall maximum delay to fill a batch. It is easy to understand if batch creation latency is important for your use-case.
- The default pipeline.batch.delay value.

We can probably take 2 directions here:
IMO pipeline.batch.delay should be at best a hint, not a guarantee of timing. I don't think anyone changing this cares about the difference between 5ms and 100ms, they functionally want to achieve more batching or less. Arguably we should kill this setting in favor of something easier to understand.
However, if we want to take it literally as a hard timeout, then I think it should default to a much higher value, like 1s.
I am in favor of larger batches at the cost of a little bit of latency. No human is going to notice the difference of 1-2 seconds in the context of data arriving at the destination.
I just want to highlight that pipeline.batch.delay should basically have a measurable impact in the situation where outputs are faster than inputs, or when the queue is in an always-empty state (which should be a desirable, healthy state).
@jakelandis
IMO pipeline.batch.delay should be at best a hint, not a guarantee of timing. I don't think anyone changing this cares about the difference between 5ms and 100ms
That seems reasonable.
they functionally want to achieve more batching or less
We have 2 tuning knobs in that respect: pipeline.batch.delay and pipeline.batch.size. The former (per its definition) should control the maximum latency of forming batches (again, in the case where the workers have to wait for data because outputs are faster than inputs) and the latter should control the batch sizes. In that respect, increasing pipeline.batch.delay is actually a way to say "I want to tend toward batch size maximization when the queue is in an always-empty state".
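For reference, both knobs live in logstash.yml. The values below are purely illustrative, not recommended defaults, and the comments paraphrase the intent discussed here rather than the official option descriptions:

```yaml
# Maximum number of events a worker thread collects into one batch
# before running filters and outputs on it.
pipeline.batch.size: 125
# How long (in milliseconds) to wait for more events while filling a batch;
# a larger value tends toward batch size maximization when the queue runs empty.
pipeline.batch.delay: 50
```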
However, If we want to take it literally as the hard timeout, then I think it should default to a much higher value, like 1s.
I am not sure about that; 1s would result in a weird delay for getting-started or training situations where users use LS interactively, with stdin for example. I doubt there would be a significant change in behaviour between, say, 250ms and 1s in real-life streaming use-cases, versus providing better feedback/latency in getting-started use-cases. I guess that's all testable and not critical for the sake of this discussion.
I am in favor of larger batches at the cost of a little bit of latency.
That seems fair. Should we let the users have better control over this?
@colinsurprenant @jakelandis the problem with defining a hard overall upper bound for waiting is that mathematically you get screwed on bursty large writes. Nagle in isolation suffers from this (https://en.wikipedia.org/wiki/Nagle%27s_algorithm#Negative_effect_on_larger_writes) and it led to a ton of algorithms trying to fix the throughput problems arising from it (just Google "TCP congestion issue").
Moving to a hard wait for up to batch size x will mathematically always lower throughput for slow producers on the (input -> filter) path: for a perfectly constant-throughput producer you get exactly the same behaviour, but the more bursty the producer, the bigger the hit you take.
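A toy timing model can illustrate that hit. This is my own construction, not Logstash code, and it simplifies aggressively: processing time is ignored, and the minimize-wait side skips the batch-size cap; it only models when a batch closes, given sorted event arrival times:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchTiming {

    // Hard wait: after the first event of a batch arrives, wait up to delayMs
    // for the batch to reach maxSize; emit at the deadline if it never fills.
    public static List<Long> emitTimesHardWait(long[] arrivals, int maxSize, long delayMs) {
        List<Long> emits = new ArrayList<>();
        int i = 0;
        long clock = 0;
        while (i < arrivals.length) {
            long start = Math.max(clock, arrivals[i]); // wait for the first event
            long deadline = start + delayMs;
            int n = 0;
            while (i + n < arrivals.length && n < maxSize && arrivals[i + n] <= deadline) n++;
            // Full batch closes when its last event lands; otherwise at the deadline.
            long emit = (n == maxSize) ? Math.max(start, arrivals[i + n - 1]) : deadline;
            emits.add(emit);
            i += n;
            clock = emit;
        }
        return emits;
    }

    // Minimize wait: emit as soon as at least one event is available,
    // taking everything that has arrived by that instant.
    public static List<Long> emitTimesMinimizeWait(long[] arrivals) {
        List<Long> emits = new ArrayList<>();
        int i = 0;
        long clock = 0;
        while (i < arrivals.length) {
            long emit = Math.max(clock, arrivals[i]);
            int n = 0;
            while (i + n < arrivals.length && arrivals[i + n] <= emit) n++;
            emits.add(emit);
            i += n;
            clock = emit;
        }
        return emits;
    }
}
```

For burst arrivals at {0, 1, 2, 100, 101}ms with a batch size of 10 and a 50ms hard wait, the bursts never fill the batch, so the hard-wait strategy always sits out its full delay (emitting at 50ms and 150ms), while the minimize-wait strategy emits essentially on arrival. This is the Nagle-like penalty on bursts smaller than the batch size.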
Without a clear mathematical argument for bounded spins up to size x and time t (which I really can't see looking at the theory of queue congestion) I don't see what we'd gain from this change. In general I don't see much value in changing the semantics here; in the end we currently have a solution that introduces barely any framework overhead relative to plugins and, to my knowledge, hasn't received negative feedback (again, note that the ES output is our most important output and it heavily profits from large batch sizes). A change to this would come with various problems of low batch size or low throughput for bursty inputs that would need to be worked around.
Also, just in a "meta" sense ... we have a solution that is now finally blazing fast (you can easily do 600k+ events/s generator -> stdout on a 3GHz i7) and it took quite a while to get there. No one has complained about it. The problem is very hard mathematically, and the correct solution is heavily dependent on the semantics of your filters and outputs. Why take the risk of changing this when it's not backed by any math or open issue?
@original-brownbear I don't think I see the relation of Nagle to our context. An input burst (filling the queue) would immediately translate into producing full batches without delay. As for the "clear mathematical argument", I think the inverse is also true: we would have to figure out the clear mathematical argument for every change made to the queue behaviour to date, and I am not sure what that would buy us.
Also, the goal here is not to reduce batch size maximization but to gain a clear understanding & semantic of the pipeline.batch.delay. As I mentioned above, I believe that batch size maximization would be preserved with a higher default pipeline.batch.delay.
(1) If we agree that optimizing for batch creation latency is not important for anyone, then the current queue polling & batch creation strategy with regards to pipeline.batch.delay works (but logstash.yml needs updating for the option explanation).
(2) Do we agree that we also want to use that same strategy as (1) for PQ? It seems obvious that the answer should be yes. Does anyone disagree?
Ok so to recap per above discussion and also in #8632, here's the revised proposal:
- Keep the current strategy for pipeline.batch.delay, which favours batch size maximization instead of strictly respecting the timeout.
- Update the logstash.yml option description of pipeline.batch.delay to reflect the intended behaviour.

@jakelandis @original-brownbear @jordansissel let me know what you think.
@colinsurprenant 100% on board with it :)
closing in favor of #8674
thanks @original-brownbear @jakelandis