We have some producers in production that are causing high CPU usage (60% of the CPU usage on the service). Our current throughput is about 200k messages/300 MB per second.
Confluent.Kafka Version="1.0.1.1"
librdkafka.redist Version="1.0.1"
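```csharp
using Confluent.Kafka;

var bootstrapServers = "x"; // broker list redacted in the original report
var clientConfig = new ClientConfig
{
    BootstrapServers = bootstrapServers
};
// Only the bootstrap servers are set; every other producer setting is left at its default.
var producerConfig = new ProducerConfig(clientConfig);
```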
We are running on a Windows machine (Windows Server 2016 Datacenter) with .NET Core 3.0.
We are using one of the queue abstractions from our library, https://github.com/takenet/elephant/blob/master/src/Take.Elephant.Kafka/KafkaSenderQueue.cs, and we produce through its EnqueueAsync method.
That doesn't seem outrageous to me at first glance. You might like to try the Produce method, which is significantly more performant because it avoids the overhead of tasks. It may be significant that the high CPU relates to StartPollTask, because it handles the completion of the tasks (or, in the Produce case, the execution of the delivery callback). In the case of tasks, though, I'm 95% sure it should be offloading execution of application logic to thread-pool threads in all cases (I'd need to read the code again).
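To make the Produce/ProduceAsync difference concrete, here is a minimal sketch against the Confluent.Kafka 1.x API. The class and method names are hypothetical, and the Null key / string value types are assumptions chosen for brevity. As described above, the delivery handler passed to Produce is executed by the library's poll loop, so it should stay cheap.

```csharp
using System;
using System.Threading.Tasks;
using Confluent.Kafka;

// Hypothetical helper contrasting the two send styles of Confluent.Kafka 1.x.
public static class SendStyles
{
    // Task-based: every call creates a Task that completes when the delivery
    // report arrives; the Task faults with ProduceException if delivery fails.
    public static Task<DeliveryResult<Null, string>> SendWithTaskAsync(
        IProducer<Null, string> producer, string topic, string value) =>
        producer.ProduceAsync(topic, new Message<Null, string> { Value = value });

    // Callback-based: Produce returns once the message is queued locally; the
    // handler runs later on the library's poll thread, so it should be quick.
    public static void SendWithCallback(
        IProducer<Null, string> producer, string topic, string value,
        Action<DeliveryReport<Null, string>> onDelivered) =>
        producer.Produce(topic, new Message<Null, string> { Value = value }, onDelivered);
}
```

A caller that awaits each SendWithTaskAsync before sending the next keeps only one message in flight at a time; issuing many Produce calls and handling results in the callback lets librdkafka batch them.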
@mhowlett I'm trying to understand the code for that method. I'm guessing that the native Poll function has some externally visible behavior (and from what I managed to read, that seems to be the case); otherwise this StartPollTask method would be apparently useless, since the only thing it seems to do is increment a private instance field that doesn't seem to be used anywhere else (unless someone is using reflection?!). I still have a couple of questions, though.
Does the native Poll function block the calling thread in some way? If not, then the while (true) in StartPollTask clearly explains the high CPU usage we're seeing.
Does Poll need to be called even if Produce/ProduceAsync is never called? If so, why? I understand this is probably Kafka behavior, and I searched about it, but I still couldn't really understand the mechanism. And if not, why use a while (true) instead of, say, calling Poll only after ProduceAsync is called?
Poll blocks until librdkafka is ready to inform the application of a new event. It doesn't busy-wait, so it won't cause high CPU if there are no events. You don't need to call poll for every produce call, just periodically. When it's called, callbacks are executed for every currently outstanding event (if a corresponding callback is registered; otherwise the event is dropped). Events include delivery callback notifications, error events, and log events. This all happens behind the scenes in the .NET library.
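The .NET library runs this poll loop for you on a background thread. To see what driving it yourself would look like, here is a sketch that assumes Confluent.Kafka 1.x's ProducerConfig.EnableBackgroundPoll setting (dotnet.producer.enable.background.poll) and IProducer.Poll(TimeSpan); ManualPoll and its methods are hypothetical names. Because Poll blocks for up to its timeout, the loop waits inside librdkafka rather than spinning.

```csharp
using System;
using System.Threading;
using Confluent.Kafka;

// Hypothetical helper: a producer whose events are served by the caller, not by
// the library's background poll thread.
public static class ManualPoll
{
    public static IProducer<Null, string> Build(string bootstrapServers) =>
        new ProducerBuilder<Null, string>(new ProducerConfig
        {
            BootstrapServers = bootstrapServers,
            EnableBackgroundPoll = false // dotnet.producer.enable.background.poll
        }).Build();

    // Each Poll call waits up to 100 ms for events and runs the registered callbacks
    // (delivery reports, errors, logs) for those that are ready, so an idle
    // producer makes this loop sleep inside librdkafka instead of burning CPU.
    public static void PollUntilCancelled(IProducer<Null, string> producer, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            producer.Poll(TimeSpan.FromMilliseconds(100));
        }
    }
}
```

You would run PollUntilCancelled on its own thread alongside your Produce calls. That is roughly what the library's background loop already does, so leaving background polling enabled is normally the simpler choice.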
What can be done to use Produce instead of ProduceAsync more effectively while still guaranteeing that each message was sent successfully? Are there best practices for this case? Or is there a case where we can still use the async method with manual polling? I was a little confused reading the source code and documentation.
We also tried tuning the LingerMs and batch size parameters, without success.
You might like to read up on the guarantees offered by the idempotent producer, and on transactions (coming in 1.4). The Confluent blog has some good articles.
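One way to use Produce and still know whether each message made it is sketched below: enable the idempotent producer with acks=all, check DeliveryReport.Error in the delivery handler, and Flush before shutting down. CheckedSender is a hypothetical wrapper, and simply counting failures stands in for whatever logging or retry policy you actually need.

```csharp
using System;
using System.Threading;
using Confluent.Kafka;

// Hypothetical wrapper: fire-and-forget Produce calls with per-message success checks.
public sealed class CheckedSender : IDisposable
{
    private readonly IProducer<Null, string> producer;
    private long delivered;
    private long failed;

    public CheckedSender(string bootstrapServers, int messageTimeoutMs = 300000)
    {
        var config = new ProducerConfig
        {
            BootstrapServers = bootstrapServers,
            EnableIdempotence = true,          // internal retries cannot duplicate or reorder messages
            Acks = Acks.All,                   // wait for all in-sync replicas
            MessageTimeoutMs = messageTimeoutMs
        };
        producer = new ProducerBuilder<Null, string>(config).Build();
    }

    public long Delivered => Interlocked.Read(ref delivered);
    public long Failed => Interlocked.Read(ref failed);

    public void Send(string topic, string value) =>
        producer.Produce(topic, new Message<Null, string> { Value = value }, report =>
        {
            // Runs on the delivery-report thread; Error.IsError means the message was NOT persisted.
            if (report.Error.IsError) Interlocked.Increment(ref failed);
            else Interlocked.Increment(ref delivered);
        });

    // Blocks until outstanding messages are delivered or fail, or the timeout expires;
    // returns how many are still in flight.
    public int Drain(TimeSpan timeout) => producer.Flush(timeout);

    public void Dispose() => producer.Dispose();
}
```

Only once Drain returns 0 do Delivered and Failed account for every message sent.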
@mhowlett Just so we're on the same page: is this expected, though? Is it normal for the library alone (actually, _that method alone_) to be responsible for ~80% of the total CPU usage? Knowing whether that's expected will at least point us in the right direction.
And by the way, for context: the per-machine request rates are actually much smaller. Those 200k are divided horizontally among several machines, so each F16s machine handles only a fraction of them.
You don't say how many messages per machine, but it seems like a reasonably large fraction of 200k. It also seems as though your messages are quite large (~1.5 MB). For context, a single producer can do somewhere between 150k and 600k msg/s for tiny messages, and this will be CPU bound. Based on that, it seems like you might be pushing things quite hard, and high CPU is not unexpected.
It may be worth experimenting with a very simple producer application in different environments (Linux and Windows; under the hood there are a few significant differences, and results may vary) to get a feel for what is possible. Vary the message size up and down. Also feel free to paste a (simple) code snippet here and I'll tell you if anything is obviously wrong with it.
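A very simple benchmark along those lines might look like the following sketch. ProducerBench, the byte[] payload, and the 60-second flush limit are assumptions; point it at a test topic and vary count and size. It retries when Produce throws Local_QueueFull, which happens when librdkafka's local queue is full.

```csharp
using System;
using System.Diagnostics;
using Confluent.Kafka;

// Hypothetical throughput probe: produce `count` messages of `size` bytes and
// report messages per second, timed until Flush has drained everything.
public static class ProducerBench
{
    // Random bytes so the payload is exactly `size` bytes (fixed seed for repeatability).
    public static byte[] MakePayload(int size)
    {
        var payload = new byte[size];
        new Random(42).NextBytes(payload);
        return payload;
    }

    public static double Run(string bootstrapServers, string topic, int count, int size)
    {
        var config = new ProducerConfig { BootstrapServers = bootstrapServers };
        using (var producer = new ProducerBuilder<Null, byte[]>(config).Build())
        {
            var message = new Message<Null, byte[]> { Value = MakePayload(size) };
            var stopwatch = Stopwatch.StartNew();
            var sent = 0;
            while (sent < count)
            {
                try
                {
                    producer.Produce(topic, message); // no handler: delivery failures are not detected here
                    sent++;
                }
                catch (ProduceException<Null, byte[]> e) when (e.Error.Code == ErrorCode.Local_QueueFull)
                {
                    // Local queue is full; wait briefly for it to drain, then retry this message.
                    producer.Flush(TimeSpan.FromMilliseconds(100));
                }
            }
            var remaining = producer.Flush(TimeSpan.FromSeconds(60));
            if (remaining > 0)
                throw new TimeoutException($"{remaining} messages still in flight after 60 s");
            return count / stopwatch.Elapsed.TotalSeconds;
        }
    }
}
```

For example, `ProducerBench.Run("broker:9092", "bench-topic", 1_000_000, 1500)` with a placeholder broker and topic, repeated with smaller and larger sizes on both Linux and Windows.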
@mhowlett Actually, sorry, those numbers were wrong. Incidentally, even with the numbers in the OP, 300 MB / 200k messages works out to 1.5 KB per message, not 1.5 MB, so I don't think message size is the issue.
In any case, the numbers in the OP are actually per minute, not per second. Based on this information and what you said, I'll assume that this CPU utilization is not expected. We'll do some more investigation on our side and see what we can find. Thanks a lot for your help!
Edit: Oh, and by the way, we currently have seven F16 machines behind a load balancer handling these requests.
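Working through the corrected figures, assuming decimal units (1 MB = 10^6 bytes) and that the load balancer spreads traffic evenly across the seven machines:

$$
\begin{aligned}
\text{average size} &= \frac{300\ \text{MB}}{200{,}000\ \text{messages}} = 1.5\ \text{kB per message} \\
\text{aggregate rate} &= \frac{200{,}000\ \text{messages}}{60\ \text{s}} \approx 3{,}333\ \text{msg/s}, \qquad \frac{300\ \text{MB}}{60\ \text{s}} = 5\ \text{MB/s} \\
\text{per machine} &= \frac{3{,}333\ \text{msg/s}}{7} \approx 476\ \text{msg/s}, \qquad \frac{5\ \text{MB/s}}{7} \approx 0.71\ \text{MB/s}
\end{aligned}
$$

Roughly 476 msg/s per machine is more than 300 times below the 150k msg/s low end quoted earlier for a single producer, which is why the CPU usage looks out of proportion to the load.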
@andre-ss6 Which confluent-kafka-dotnet version are you on? There have been producer performance issues in librdkafka v1.0.0 through v1.2.1. I suggest upgrading to v1.3.0.
Hi @edenhill. Sorry for the long pause on this issue...
We have upgraded librdkafka.redist to 1.6.1 and still see the same behavior.
We have also tried changing settings such as linger.ms (set to 10) and batch.num.messages (currently back at its default of 10000), but without any significant improvement.
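For reference, the two settings mentioned above map to these Confluent.Kafka ProducerConfig properties. The values are the ones from this comment (linger.ms = 10, batch.num.messages at its 10000 default), not tuned recommendations, and BatchingConfig is a hypothetical helper name.

```csharp
using Confluent.Kafka;

// Hypothetical helper showing the batching settings from this thread as typed properties.
public static class BatchingConfig
{
    public static ProducerConfig Create(string bootstrapServers) => new ProducerConfig
    {
        BootstrapServers = bootstrapServers,
        LingerMs = 10,           // linger.ms: wait up to 10 ms for more messages before sending a batch
        BatchNumMessages = 10000 // batch.num.messages: upper bound on messages per batch
    };
}
```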
Currently we create one producer per topic. Do you think reusing the same producer for all topics on an instance would improve CPU usage?
Do you have any other suggestions?
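Because the topic is an argument to every Produce call, a single producer can serve all topics. Below is a minimal sketch of that, assuming one shared instance per process (Confluent.Kafka producers are thread-safe); SharedKafkaSender is a hypothetical name. Each producer instance owns its own librdkafka threads, broker connections, and .NET poll loop, so consolidating may reduce overhead, but that is something to measure rather than assume.

```csharp
using System;
using Confluent.Kafka;

// Hypothetical wrapper: one producer per process, shared by every topic.
public sealed class SharedKafkaSender : IDisposable
{
    private readonly IProducer<Null, string> producer;

    public SharedKafkaSender(ProducerConfig config) =>
        producer = new ProducerBuilder<Null, string>(config).Build();

    // The topic is chosen per message, so nothing here is tied to a single topic.
    public void Send(string topic, string value, Action<DeliveryReport<Null, string>> onDelivered = null) =>
        producer.Produce(topic, new Message<Null, string> { Value = value }, onDelivered);

    public int Flush(TimeSpan timeout) => producer.Flush(timeout);

    public void Dispose() => producer.Dispose();
}
```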