Vector: Pass batch options to librdkafka

Created on 17 Aug 2020  ·  2 Comments  ·  Source: timberio/vector

Current Vector Version

0.10.0

Because Kafka does its own batching, we don't expose any batching configuration on the Vector side, since that would be redundant. This is a usability problem, and we should allow passing batch configuration options through to librdkafka.


Old issue

Use-cases

Currently, the Vector docs state that batching in the kafka sink is unsupported. However, batching would be very useful for achieving the highest throughput when dealing with large volumes of data:

  1. The larger the batch, the higher the likelihood of a better compression ratio.
  2. It amortizes the messaging overhead and eliminates the adverse effect of the round-trip time.

This is explained well here:
https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#performance
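
To make point 2 concrete, here is a rough back-of-the-envelope sketch in Rust. It is only a toy model under loud assumptions: the producer keeps a single request in flight, waits one full round trip per batch, and nothing else (serialization, broker work, bandwidth) costs anything. The function name `max_msgs_per_sec` is made up for illustration and is not part of Vector or librdkafka.

```rust
/// Upper bound on messages per second when a producer sends one batch,
/// waits a full round trip for the acknowledgement, then sends the next.
///
/// Simplifying assumptions (illustrative only, not librdkafka's behavior):
/// a single in-flight request, and no serialization, broker, or
/// bandwidth cost.
pub fn max_msgs_per_sec(batch_size: u32, rtt_ms: f64) -> f64 {
    // One batch completes every `rtt_ms` milliseconds, so the rate is
    // batch_size messages per rtt_ms, scaled to one second.
    f64::from(batch_size) * 1000.0 / rtt_ms
}
```

Under this model, with a 10 ms round trip, sending one message per request caps out at 100 messages per second, while batches of 1000 raise the cap to 100000. Real numbers will differ, but the round-trip cost is paid once per batch rather than once per message.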

For example, consider the Kafka-to-Kafka use case:
https://vector.dev/guides/integrate/sources/kafka/kafka/
If Vector supported Kafka batching, it would be a really great alternative to Kafka MirrorMaker, Replicator, etc. for this use case.

kafka enhancement

All 2 comments

So, with support for librdkafka options (https://github.com/timberio/vector/issues/1821) and given the default values for librdkafka (https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md), will a kafka sink still not batch data?

Sorry for the confusion here. The kafka sink does utilize all of the standard librdkafka batching functionality, for all of the reasons you described. The docs are worded imprecisely and we will fix that.

The intended message is that the kafka sink does not expose the standard batch.* configuration options because we do not do our own independent batching ahead of librdkafka, which would be redundant. This is a little bit of a usability wart and I think it could be a good idea for us to translate those options into their librdkafka equivalents and pass them down. But there are currently no functional limits on your ability to use batching with the kafka sink.
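
For illustration, the translation mentioned above might look roughly like the following Rust sketch. This is not Vector's actual implementation: the `BatchSettings` struct, its field names, and `to_librdkafka_options` are hypothetical, and the choice of which librdkafka property each field maps to is an assumption. The property names themselves (`batch.num.messages`, `batch.size`, `queue.buffering.max.ms`) come from librdkafka's CONFIGURATION.md.

```rust
use std::collections::BTreeMap;

/// Hypothetical Vector-style batch settings. The field names are
/// illustrative and are not taken from Vector's real configuration.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct BatchSettings {
    /// Maximum number of events per batch.
    pub max_events: Option<u64>,
    /// Maximum batch size in bytes.
    pub max_bytes: Option<u64>,
    /// Maximum time to wait before flushing a batch, in seconds.
    pub timeout_secs: Option<u64>,
}

/// Translate the hypothetical batch settings into librdkafka
/// configuration properties. Unset fields are skipped so that
/// librdkafka's own defaults stay in effect.
pub fn to_librdkafka_options(batch: &BatchSettings) -> BTreeMap<String, String> {
    let mut opts = BTreeMap::new();
    if let Some(n) = batch.max_events {
        opts.insert("batch.num.messages".to_string(), n.to_string());
    }
    if let Some(b) = batch.max_bytes {
        opts.insert("batch.size".to_string(), b.to_string());
    }
    if let Some(s) = batch.timeout_secs {
        // librdkafka expects milliseconds; saturate instead of overflowing.
        opts.insert(
            "queue.buffering.max.ms".to_string(),
            s.saturating_mul(1000).to_string(),
        );
    }
    opts
}
```

The resulting key/value pairs could then be passed down to the client the same way any other librdkafka option is, for example through the options map discussed in issue 1821.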
