kafka-node: What is the difference between HighLevelProducer and Producer?

Created on 21 Nov 2015 · 13 Comments · Source: SOHU-Co/kafka-node

question

Most helpful comment

By default, a HighLevelProducer writes to all available partitions in a topic on a round-robin basis. For example, given

Topic: name: myTopic1, number of partitions: 20

a HighLevelProducer sends its first message to partition 0, then 1, then 2, then 3 ... then 18, then 19, then 0, then 1, and so on.
This leads to an even distribution of messages across partitions.
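The round-robin selection described above can be sketched in a few lines. This is only an illustration of the idea, not kafka-node's actual partitioner code; the class name `CyclicPartitioner` and its method are made up for the example.

```javascript
// Minimal sketch of round-robin ("cyclic") partition selection.
// CyclicPartitioner is a hypothetical name, not a kafka-node export.
class CyclicPartitioner {
  constructor() {
    this.next = 0; // index of the partition to use for the next message
  }

  // partitions: array of partition ids for the topic, e.g. [0, 1, 2]
  getPartition(partitions) {
    const chosen = partitions[this.next % partitions.length];
    this.next = (this.next + 1) % partitions.length;
    return chosen;
  }
}

const partitions = [0, 1, 2];
const partitioner = new CyclicPartitioner();
const assigned = [];
for (let i = 0; i < 7; i++) {
  assigned.push(partitioner.getPartition(partitions));
}
console.log(assigned.join(',')); // 0,1,2,0,1,2,0
```

With three partitions and seven messages, the assignment wraps around after partition 2, which is the same pattern as the 20-partition example.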

A "normal" producer sends a message to a specified partition. You can specify it in the payload:

{
   topic: 'topicName',
   messages: ['message body'], // multiple messages should be an array; a single message can be a string or a KeyedMessage instance
   partition: 1, // default: 0
   attributes: 2 // default: 0
}

If you omit the partition field in a payload, the message defaults to partition 0. If the topic has only one partition, the Producer and HighLevelProducer do the same thing.

Pretty much the same is true for HighLevelConsumers vs normal Consumers. A HighLevelConsumer will receive messages from all available partitions, whereas the normal Consumer only receives messages from the specified partition when adding the topic, or by default just from partition 0.
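If you want a normal Consumer to read every partition, one option is to list each partition explicitly when adding the topic. A minimal sketch, assuming you already know the partition count; `buildPayloads` is a hypothetical helper, not part of kafka-node:

```javascript
// Builds one fetch request per partition so that a plain Consumer
// can be subscribed to the whole topic.
// buildPayloads is a hypothetical helper, not a kafka-node function.
function buildPayloads(topic, partitionCount) {
  const payloads = [];
  for (let partition = 0; partition < partitionCount; partition++) {
    payloads.push({ topic, partition });
  }
  return payloads;
}

const payloads = buildPayloads('myTopic1', 3);
console.log(payloads);
```

An array like this would be passed as the second argument to `new kafka.Consumer(client, payloads, options)`. Leaving `partition` out of a payload falls back to partition 0, as described above.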

All 13 comments

Anybody know the answer to this? Same with Consumer vs. HighLevelConsumer. In one setup I'm working with, Producer works fine, but Consumer does not. I had to use HighLevelConsumer.

Would be great if the difference was documented in the README.

If anybody is wondering, while doing some trial and error, there's at least one difference I can surface between Consumer and HighLevelConsumer.

While Consumer allows you to set an offset, HighLevelConsumer ignores it. To use an offset with Consumer, you must have fromOffset set to true.

In addition, when you retrieve the offset, make sure you subtract one from it if you want to start receiving new messages from the moment you create your consumer.
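As a rough sketch of what this comment describes, the helper below builds the payloads and options for a Consumer from a latest offset that has already been retrieved. The function name is hypothetical, and whether the subtraction is needed may depend on how the offset was obtained, so treat it as an assumption taken from the comment rather than a rule.

```javascript
// Hypothetical helper: turns a previously retrieved latest offset into the
// payloads and options a plain Consumer would be created with.
function consumerArgsFromLatestOffset(topic, partition, latestOffset) {
  // Subtract one as the comment above suggests; never go below offset 0.
  const offset = Math.max(latestOffset - 1, 0);
  return {
    payloads: [{ topic, partition, offset }],
    // Per the comment above, the offset is only honoured when fromOffset is true.
    options: { fromOffset: true }
  };
}

const args = consumerArgsFromLatestOffset('myTopic1', 0, 42);
console.log(JSON.stringify(args));
```

The two fields would then be passed on as `new kafka.Consumer(client, args.payloads, args.options)`.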

When I diff the code for the high level producer and the producer they're almost identical. The only significant difference appears to be the default partitioner type, which is configurable on both anyway. Am I missing something?

@cressie176 you are correct. There's duplicated code for a small difference. It can be cleaned up. PRs are welcome!

Thanks for the quick response @hyperlink. Would consider submitting a PR, but unless I've misread there's no reliable way to throttle incoming messages (been reading some of the pause() issues). This is likely to be important for us, so I'm going to investigate alternative clients.

We throttle outside of the module using async.queue. We pause() in our message handler and push the message to the queue. Once the queue is drained we call resume().

It has worked well so far. Good luck!
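The pause / queue / resume pattern described above can be sketched roughly as follows. To keep the example self-contained, `FakeConsumer` and `createQueue` are hypothetical stand-ins for a kafka-node consumer and for async.queue; only `pause()`, `resume()` and the 'message' event mirror the real consumer, and the manual `finishCurrent()` calls simulate asynchronous work completing.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for a kafka-node consumer: it emits 'message'
// events only while it is not paused.
class FakeConsumer extends EventEmitter {
  constructor(messages) {
    super();
    this.pending = messages.slice();
    this.paused = false;
  }
  pause() { this.paused = true; }
  resume() { this.paused = false; this.pump(); }
  pump() {
    while (!this.paused && this.pending.length > 0) {
      this.emit('message', this.pending.shift());
    }
  }
}

// Hypothetical stand-in for async.queue with a concurrency of 1:
// runs one item at a time and calls onDrain when nothing is left.
function createQueue(worker, onDrain) {
  const items = [];
  let running = false;
  function next() {
    if (items.length === 0) { running = false; onDrain(); return; }
    running = true;
    worker(items.shift(), next);
  }
  return {
    push(item) { items.push(item); if (!running) next(); }
  };
}

const started = [];       // messages whose processing has begun
let finishCurrent = null; // call this to mark the in-flight message as done

const queue = createQueue(
  (message, done) => { started.push(message); finishCurrent = done; },
  () => consumer.resume() // queue drained: let the consumer deliver again
);

const consumer = new FakeConsumer(['a', 'b', 'c']);
consumer.on('message', (message) => {
  consumer.pause();       // stop further deliveries while work is queued
  queue.push(message);
});

consumer.pump();                    // delivers 'a', then stops because we paused
const afterFirst = started.slice(); // only 'a' so far
finishCurrent();                    // 'a' done, queue drains, 'b' is delivered
finishCurrent();                    // 'b' done, 'c' is delivered
finishCurrent();                    // 'c' done, nothing left to deliver
console.log(afterFirst, started);
```

The point of the design is that the consumer never has more than one message in flight: delivery stops in the handler and only restarts from the queue's drain callback.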

so would it be correct to say that high-level versions should only be used when partitions are being used to "randomly" split the load across partitions for load-balancing when the order in which messages are processed __does not__ matter?

@tony-kerz that sounds reasonable.

@hyperlink , @peterjuras
How do I specify the number of partitions while creating topics through 'HighLevelProducer'? I see that we could specify it in the normal 'Producer' createTopic method but not in the createTopics method of 'HighLevelProducer'.

I think you have to statically specify the number of partitions in the kafka config of each node and can't do that dynamically on the creation of a new topic. I might be wrong though.

Also, dynamic creation of topics might not always be what you want (see e.g. https://stackoverflow.com/questions/43563977/can-a-kafka-producer-create-topics-and-partitions/43625219)

@peterjuras
That makes sense. Thank you.

@MUI-Pop not published yet, but #958 added the ability to create topics using Kafka's admin protocol and gives you control over the number of replicas and partitions.
