0.10.0
Because the Kafka client (librdkafka) does its own batching, we don't expose any batching config on the Vector side, since that would be redundant. This is a usability challenge, and we should allow passing batch configuration options into librdkafka.
Old issue
Currently, the Vector docs state that batching in the kafka sink is unsupported. However, batching would be very useful for achieving the highest throughput when dealing with large volumes of data.
This is explained well here:
https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#performance
For example, consider the Kafka-to-Kafka use case:
https://vector.dev/guides/integrate/sources/kafka/kafka/
If Vector supported Kafka batching, it would be a great alternative to Kafka MirrorMaker, Replicator, etc. in this Kafka-to-Kafka use case.
So with support for librdkafka options (https://github.com/timberio/vector/issues/1821) and the default values for librdkafka (https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md), will a kafka sink still not batch data?
Sorry for the confusion here. The kafka sink does utilize all of the standard librdkafka batching functionality, for all of the reasons you described. The docs are worded imprecisely and we will fix that.
The intended message is that the kafka sink does not expose the standard batch.* configuration options because we do not do our own independent batching ahead of librdkafka, which would be redundant. This is a little bit of a usability wart and I think it could be a good idea for us to translate those options into their librdkafka equivalents and pass them down. But there are currently no functional limits on your ability to use batching with the kafka sink.
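To make the "translate those options into their librdkafka equivalents" idea concrete, here is a rough sketch of what such a mapping might look like. It is not Vector's implementation: the function name `to_librdkafka_options` and the Vector-side keys `max_events` and `timeout_secs` are assumptions chosen for illustration, and the pairing with the librdkafka properties `batch.num.messages` and `queue.buffering.max.ms` is one plausible choice rather than a confirmed design.

```python
# Hypothetical sketch: translate Vector-style batch settings into librdkafka
# producer properties. The input key names are assumptions, not Vector's API.

SUPPORTED_KEYS = {"max_events", "timeout_secs"}


def to_librdkafka_options(batch):
    """Return a dict of librdkafka properties for the given batch settings."""
    unknown = set(batch) - SUPPORTED_KEYS
    if unknown:
        # Fail loudly rather than silently dropping an option we cannot map.
        raise ValueError(f"unsupported batch options: {sorted(unknown)}")

    options = {}
    if "max_events" in batch:
        # batch.num.messages caps how many messages go into one batch.
        options["batch.num.messages"] = str(int(batch["max_events"]))
    if "timeout_secs" in batch:
        # queue.buffering.max.ms is how long the producer waits to fill a
        # batch before sending; librdkafka expects milliseconds.
        millis = int(round(batch["timeout_secs"] * 1000))
        options["queue.buffering.max.ms"] = str(millis)
    return options


if __name__ == "__main__":
    print(to_librdkafka_options({"max_events": 10000, "timeout_secs": 0.5}))
```

Values are emitted as strings because librdkafka configuration properties are set as string key/value pairs. Options with no obvious equivalent are rejected instead of ignored, so a user is not left believing a setting took effect when it did not.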