Merely subscribing seems to be enough. This is on a Windows machine (Win 8).
I've tested this using the example AdvancedProducer, passing "consume" as the argument (any topic) and rewriting the loop into this:
while (!cancelled)
{
    System.Threading.Thread.Sleep(10000);
}
The program uses 13% of my CPU. That translates to roughly one full core.
I noticed this in a program I'm developing where I have 3 consumers and the CPU usage goes to 50%. I've tried deleting kafka-logs and zookeeper data multiple times. With a few more consumers I can get it up to almost 100%. The consumer just seems to sit there taking up CPU. Is there something wrong on my end?
Actually I don't even have to subscribe it seems. And it's the same when using "poll". If I add "debug" => "all" to my consumer config without subscribing I get this:
7|2017-03-03 11:45:04.737|rdkafka#consumer-1|SEND| [thrd:10.0.0.14:9092/0]: 10.0.0.14:9092/0: Sent HeartbeatRequest (v0, 97 bytes @ 0, CorrId 6)
7|2017-03-03 11:45:04.737|rdkafka#consumer-1|RECV| [thrd:10.0.0.14:9092/0]: 10.0.0.14:9092/0: Received HeartbeatResponse (v0, 2 bytes, CorrId 6, rtt 0.00ms)
7|2017-03-03 11:45:04.767|rdkafka#consumer-1|HEARTBEAT| [thrd:main]: 10.0.0.14:9092/0: Heartbeat for group "advanced-csharp-consumer" generation id 1
7|2017-03-03 11:45:05.737|rdkafka#consumer-1|SEND| [thrd:10.0.0.14:9092/0]: 10.0.0.14:9092/0: Sent HeartbeatRequest (v0, 97 bytes @ 0, CorrId 7)
7|2017-03-03 11:45:05.737|rdkafka#consumer-1|RECV| [thrd:10.0.0.14:9092/0]: 10.0.0.14:9092/0: Received HeartbeatResponse (v0, 2 bytes, CorrId 7, rtt 0.00ms)
7|2017-03-03 11:45:05.783|rdkafka#consumer-1|HEARTBEAT| [thrd:main]: 10.0.0.14:9092/0: Heartbeat for group "advanced-csharp-consumer" generation id 1
7|2017-03-03 11:45:06.737|rdkafka#consumer-1|SEND| [thrd:10.0.0.14:9092/0]: 10.0.0.14:9092/0: Sent HeartbeatRequest (v0, 97 bytes @ 0, CorrId 8)
7|2017-03-03 11:45:06.737|rdkafka#consumer-1|RECV| [thrd:10.0.0.14:9092/0]: 10.0.0.14:9092/0: Received HeartbeatResponse (v0, 2 bytes, CorrId 8, rtt 0.00ms)
7|2017-03-03 11:45:06.799|rdkafka#consumer-1|HEARTBEAT| [thrd:main]: 10.0.0.14:9092/0: Heartbeat for group "advanced-csharp-consumer" generation id 1
7|2017-03-03 11:45:07.737|rdkafka#consumer-1|SEND| [thrd:10.0.0.14:9092/0]: 10.0.0.14:9092/0: Sent HeartbeatRequest (v0, 97 bytes @ 0, CorrId 9)
7|2017-03-03 11:45:07.737|rdkafka#consumer-1|RECV| [thrd:10.0.0.14:9092/0]: 10.0.0.14:9092/0: Received HeartbeatResponse (v0, 2 bytes, CorrId 9, rtt 0.00ms)
Which seems like nothing out of the ordinary.
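For anyone who wants to capture the same kind of log, here is a minimal sketch of a consumer configuration with "debug" set to "all". The class and method names (`DebugConfig`, `Build`) are made up for illustration; the broker address and group name are the ones visible in the log above.

```csharp
using System;
using System.Collections.Generic;

static class DebugConfig
{
    // Consumer configuration with librdkafka debug logging switched on.
    // DebugConfig and Build are hypothetical names used only for this sketch.
    public static Dictionary<string, object> Build(string brokers, string groupId)
    {
        return new Dictionary<string, object>
        {
            { "bootstrap.servers", brokers },
            { "group.id", groupId },
            { "debug", "all" }   // enable every librdkafka debug context
        };
    }

    static void Main()
    {
        // Broker address and group name taken from the log output above.
        var config = Build("10.0.0.14:9092", "advanced-csharp-consumer");
        foreach (var entry in config)
            Console.WriteLine($"{entry.Key} = {entry.Value}");
    }
}
```

The resulting dictionary would be passed to the Consumer constructor in the same way as the config in the examples in this thread.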
Which client version are you using?
I'm using the 0.9.4-preview8.
As to Kafka and Zookeeper I'm using:
kafka 2.12-0.10.2.0
zookeeper-3.4.9
Hi @zsebastian - just confirmed this issue on Windows 10 / .NET Core 1.03 / confluent-kafka-dotnet 0.9.4. This is high priority, we're looking into it.
happens immediately after the call to SafeKafkaHandle.Create(RdKafkaType.Consumer, configPtr); in the Consumer constructor.
the rdkafka_consumer_example_cpp example that comes with librdkafka also shows the high CPU problem for me (using version 0.9.3-pre1 - have not tried to update to/compile 0.9.4 yet).
here's a dump of my debug=all output taken during the main loop of the advanced consumer example running the 'consume' method (poll time =100ms).
7|2017-03-06 08:58:05.608|rdkafka#consumer-1|FETCH| [thrd:10.211.55.2:9092/0]: 10.211.55.2:9092/0: Topic aabb [0] MessageSet size 0, error "Success", MaxOffset 0, Ver 2/2
7|2017-03-06 08:58:05.611|rdkafka#consumer-1|FETCH| [thrd:10.211.55.2:9092/0]: 10.211.55.2:9092/0: Fetch topic aabb [0] at offset 0 (v2)
7|2017-03-06 08:58:05.612|rdkafka#consumer-1|FETCH| [thrd:10.211.55.2:9092/0]: 10.211.55.2:9092/0: Fetch 1/1/1 toppar(s)
7|2017-03-06 08:58:05.612|rdkafka#consumer-1|SEND| [thrd:10.211.55.2:9092/0]: 10.211.55.2:9092/0: Sent FetchRequest (v1, 63 bytes @ 0, CorrId 1287)
7|2017-03-06 08:58:05.713|rdkafka#consumer-1|RECV| [thrd:10.211.55.2:9092/0]: 10.211.55.2:9092/0: Received FetchResponse (v1, 36 bytes, CorrId 1287, rtt 94.00ms)
7|2017-03-06 08:58:05.714|rdkafka#consumer-1|FETCH| [thrd:10.211.55.2:9092/0]: 10.211.55.2:9092/0: Topic aabb [0] MessageSet size 0, error "Success", MaxOffset 0, Ver 2/2
7|2017-03-06 08:58:05.717|rdkafka#consumer-1|FETCH| [thrd:10.211.55.2:9092/0]: 10.211.55.2:9092/0: Fetch topic aabb [0] at offset 0 (v2)
7|2017-03-06 08:58:05.717|rdkafka#consumer-1|FETCH| [thrd:10.211.55.2:9092/0]: 10.211.55.2:9092/0: Fetch 1/1/1 toppar(s)
7|2017-03-06 08:58:05.717|rdkafka#consumer-1|SEND| [thrd:10.211.55.2:9092/0]: 10.211.55.2:9092/0: Sent FetchRequest (v1, 63 bytes @ 0, CorrId 1288)
7|2017-03-06 08:58:05.819|rdkafka#consumer-1|RECV| [thrd:10.211.55.2:9092/0]: 10.211.55.2:9092/0: Received FetchResponse (v1, 36 bytes, CorrId 1288, rtt 94.00ms)
7|2017-03-06 08:58:05.819|rdkafka#consumer-1|FETCH| [thrd:10.211.55.2:9092/0]: 10.211.55.2:9092/0: Topic aabb [0] MessageSet size 0, error "Success", MaxOffset 0, Ver 2/2
7|2017-03-06 08:58:05.822|rdkafka#consumer-1|FETCH| [thrd:10.211.55.2:9092/0]: 10.211.55.2:9092/0: Fetch topic aabb [0] at offset 0 (v2)
7|2017-03-06 08:58:05.823|rdkafka#consumer-1|FETCH| [thrd:10.211.55.2:9092/0]: 10.211.55.2:9092/0: Fetch 1/1/1 toppar(s)
7|2017-03-06 08:58:05.823|rdkafka#consumer-1|SEND| [thrd:10.211.55.2:9092/0]: 10.211.55.2:9092/0: Sent FetchRequest (v1, 63 bytes @ 0, CorrId 1289)
Thanks Matt.
That output looks okay, fetch.wait.max.ms defaults to 100ms so those wakeups are expected.
Will need to put this beauty through the visual studio profiler.
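To check the claim that these wakeups match the 100ms fetch wait, the gaps between the "Sent FetchRequest" lines in the dump can be computed directly. The sketch below is only illustrative: `FetchIntervals` and its methods are made-up names, and the log lines are truncated after the tag because only the timestamp column is needed.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class FetchIntervals
{
    // Parse the timestamp field (second '|'-separated column) of a librdkafka debug line.
    public static DateTime ParseTimestamp(string logLine)
    {
        string field = logLine.Split('|')[1];
        return DateTime.ParseExact(field, "yyyy-MM-dd HH:mm:ss.fff", CultureInfo.InvariantCulture);
    }

    // Milliseconds elapsed between each pair of consecutive log lines.
    public static double[] IntervalsMs(string[] logLines)
    {
        DateTime[] times = logLines.Select(ParseTimestamp).ToArray();
        return times.Zip(times.Skip(1), (a, b) => (b - a).TotalMilliseconds).ToArray();
    }

    static void Main()
    {
        // The three "Sent FetchRequest" lines from the dump above (CorrId 1287 to 1289).
        string[] sent =
        {
            "7|2017-03-06 08:58:05.612|rdkafka#consumer-1|SEND|",
            "7|2017-03-06 08:58:05.717|rdkafka#consumer-1|SEND|",
            "7|2017-03-06 08:58:05.823|rdkafka#consumer-1|SEND|",
        };
        foreach (double ms in IntervalsMs(sent))
            Console.WriteLine($"{ms:F0} ms between fetch requests");
    }
}
```

For the three lines above this gives gaps of 105 ms and 106 ms, i.e. roughly one fetch per fetch.wait.max.ms, which on its own would not explain a fully loaded core.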
Any update on this issue? I'm having the same problem. I am using 0.9.4.
Any news on this issue? It's a big problem for us since we are using multiple consumers on the same server.
We've scheduled to look into this issue early next week, will keep you posted.
@mhowlett, met you at Elasticon, good to see you're working on this. We are eager to get this fix. Thanks.
Hi @gcampbell-epiq - was great to meet you.
we're acutely aware of this problem and it's near the top of the priority list.
same issue here running in .net core
Hhhhmmmmmmm librdkafka seems like it has not had any work done on it in quite some time
@ByronAP Most recent commits are from earlier today... https://github.com/edenhill/librdkafka/commits/master
werd tnx, any reason why it is not under this account?
@ByronAP We sync to https://github.com/confluentinc/librdkafka as needed, but the project started outside of Confluent.
yeah linkedin right, just need to get this fixed even if I have to do it so I just needed the working repo, we are migrating a lot of things to .net core and Kafka so vested interest makes fixes worth looking into :-)
@ByronAP - are you seeing 100% CPU simply on constructing a Consumer instance (i.e. before you even start using it)? It would be good to understand the circumstances around this in more detail - i've not personally come across this issue on any linux system (I maintain a fairly substantial system that uses the dotnet client on ubuntu). note that librdkafka is used at scale in many production systems (e.g. by Blizzard). librdkafka is presently used a lot less on Windows, but demand is accelerating (we did not start supporting a dotnet client by accident...) and it's a safe bet to rely on librdkafka being maintained / supported for that platform going forward.
@ByronAP Can you provide some more details?:
It only manifests on Windows (both Core and net45). Spinning starts as soon as the consumer is instantiated (holding that handle), and it maxes out 1 thread. I have seen this before when a loop is spinning too fast but not doing any real work: http://imgur.com/a/l7KGT
NET45 simple example used for debug
using System;
using System.Collections.Generic;
using System.Net;
using Confluent.Kafka;

namespace KafkaNet45Test
{
    class Program
    {
        private static Consumer<Null, Null> _consumer;

        static void Main(string[] args)
        {
            var config = new Dictionary<string, object>
            {
                {"bootstrap.servers", "redacted-host:1234"},
                {"group.id", $"{Dns.GetHostName()}feed"},
                {
                    "default.topic.config", new Dictionary<string, object>
                    {
                        {"auto.offset.reset", "latest"}
                    }
                }
            };
            try
            {
                _consumer = new Consumer<Null, Null>(config, null, null);
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
            }
            Console.ReadLine();
        }
    }
}
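As a generic illustration of the symptom described above (a loop spinning without doing real work), the following sketch compares the CPU time used by a busy wait with that of a blocking wait. It is not the librdkafka code path and says nothing about the actual cause; `SpinDemo` and its methods are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class SpinDemo
{
    // CPU time consumed by the current process while the given action runs.
    public static TimeSpan CpuTimeOf(Action action)
    {
        TimeSpan before = Process.GetCurrentProcess().TotalProcessorTime;
        action();
        return Process.GetCurrentProcess().TotalProcessorTime - before;
    }

    // Wait by re-checking the clock in a tight loop: no useful work, but one core stays busy.
    public static void BusyWait(TimeSpan duration)
    {
        var clock = Stopwatch.StartNew();
        while (clock.Elapsed < duration) { }
    }

    // Wait by blocking: the thread is descheduled and uses almost no CPU.
    public static void BlockingWait(TimeSpan duration)
    {
        Thread.Sleep(duration);
    }

    static void Main()
    {
        var wait = TimeSpan.FromMilliseconds(500);
        Console.WriteLine($"busy wait:     {CpuTimeOf(() => BusyWait(wait)).TotalMilliseconds:F0} ms CPU");
        Console.WriteLine($"blocking wait: {CpuTimeOf(() => BlockingWait(wait)).TotalMilliseconds:F0} ms CPU");
    }
}
```

A thread stuck in the first kind of loop shows up exactly as reported here: one core fully used while the application appears idle.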
I was just playing with this a little: The issue isn't specific to the .NET client - in the librdkafka example cpp consumer, CPU goes to 100% immediately after the call to rd_kafka_brokers_add0 in rd_kafka_new. Presumably this means the librdkafka main thread starts doing something meaningful at that point. I'll hopefully get a bit more time to keep stepping through this soon to pinpoint the problem more specifically.
I have no idea if this is related or not, but: https://github.com/edenhill/librdkafka/pull/1121
If not, I'm suspecting this issue is something like the above.
v. sorry to have this issue open for so long, we're very time constrained ATM.
This is related. I found this bug while using the C# wrapper.
This has been fixed by @orthrus on librdkafka master.
Until there's a new NuGet package I suggest finding librdkafka.dll in .nuget\packages\librdkafka\0.9.4\runtimes\win7-..\native and replacing it with the corresponding Release librdkafka.dll from Appveyor
Do note that the above dll features other fixes and enhancements since 0.9.4, not only the CPU fix, and should not be used in production until a formal release is made.
Any idea when a new nuget package could be available?
I can confirm that this fix helped us with our applications. CPU is now low and not killing our servers. Just need a NuGet package now so we can build and ship like regular workflow.
@raskolnikoov Thanks! Will let you know when we have a date
@edenhill I tried the fix you provided for the CPU issue. CPU usage has come down drastically, but I feel it raises another concern: when the server rebalances the clients, multiple clients in the same group receive messages from the same topic-partition.
I am not sure whether this is caused by this fix alone.
Thanks for verifying the fix, @suganthkumar
The issue with receiving the same message is most likely not related, can you file a new issue and provide some more information:
Thanks
Hi @edenhill, I tried your suggestion of replacing the librdkafka.dll.
"This has been fixed by @orthrus on librdkafka master.
Until there's a new NuGet package I suggest finding librdkafka.dll in .nuget\packages\librdkafka\0.9.4\runtimes\win7-..\native and replacing it with the corresponding Release librdkafka.dll from Appveyor
Do note that the above dll features other fixes and enhancements since 0.9.4, not only the CPU fix, and should not be used in production until a formal release is made."
However, I'm now getting the below exception:
"Unable to load DLL 'librdkafka': The specified module could not be found. (Exception from HRESULT: 0x8007007E)"
And here's the stack trace:
" at Confluent.Kafka.Impl.LibRdKafka.NativeMethods.rd_kafka_version()\r\n at Confluent.Kafka.Impl.LibRdKafka.version()\r\n at Confluent.Kafka.Impl.LibRdKafka..cctor()"
Any help or ideas would be appreciated.
Thanks!
Hey @mhowlett , can you follow my instructions above and see if replacing librdkafka.dll works for you or you see the same issue as @thebmusic ?
I think I got it figured out. I had a misconfigured app pool. Sorry for the confusion.
@edenhill ready for release? :)
I was scratching my head for this issue and @edenhill 's instruction on 3/19 saved the day! Hooray!
Hi,
If I use librdkafka.dll from Appveyor, I get the error below. With the current version of Confluent.Kafka available on GitHub, my app works, but with high CPU usage.
System.TypeInitializationException was caught
_HResult=-2146233036
_message=The type initializer for 'Confluent.Kafka.Impl.LibRdKafka' threw an exception.
HResult=-2146233036
IsTransient=false
Message=The type initializer for 'Confluent.Kafka.Impl.LibRdKafka' threw an exception.
Source=Confluent.Kafka
TypeName=Confluent.Kafka.Impl.LibRdKafka
StackTrace:
at Confluent.Kafka.Impl.LibRdKafka.conf_new()
at Confluent.Kafka.Impl.SafeConfigHandle.Create()
at Confluent.Kafka.Consumer..ctor(IEnumerable`1 config)
at InDrive.RdKafkaQueueProcessor.DequeueMessages(Object obj) in c:\ProjectsStashMT\hum_mt\Common\RdKafkaQueueProcessor.cs:line 289
InnerException: System.DllNotFoundException
_HResult=-2146233052
_message=Unable to load DLL 'librdkafka': The specified module could not be found. (Exception from HRESULT: 0x8007007E)
HResult=-2146233052
IsTransient=false
Message=Unable to load DLL 'librdkafka': The specified module could not be found. (Exception from HRESULT: 0x8007007E)
Source=Confluent.Kafka
ResourceId=0
TypeName=""
StackTrace:
at Confluent.Kafka.Impl.LibRdKafka.NativeMethods.rd_kafka_version()
at Confluent.Kafka.Impl.LibRdKafka.version()
at Confluent.Kafka.Impl.LibRdKafka..cctor()
InnerException:
Hmm.. I use librdkafka.dll from Appveyor and my producers and consumers work stable.
in that case, can you share your one? i can try that if that works.
@vaibhavg1985apr - the error suggests you might have put it in the wrong location?
My apps run as x64
I realized that this dll is for x64 and not x86. But, i can't switch to x64 as other dlls used by our app are for x86. So, could you please share x86 version for that dll?
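A quick way to confirm which native build a process needs is to print its bitness at startup. This is a minimal sketch using only standard .NET APIs; `NativeArch` and `ForPointerSize` are hypothetical helper names, and the mapping assumes only the x86 and x64 builds discussed in this thread.

```csharp
using System;

static class NativeArch
{
    // Architecture name of the native DLL a process with this pointer size can load.
    // Assumes only x86 (4-byte pointers) and x64 (8-byte pointers) are relevant.
    public static string ForPointerSize(int pointerSize)
    {
        return pointerSize == 8 ? "x64" : "x86";
    }

    static void Main()
    {
        Console.WriteLine($"64-bit OS:      {Environment.Is64BitOperatingSystem}");
        Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
        Console.WriteLine($"librdkafka.dll must be built for: {ForPointerSize(IntPtr.Size)}");
    }
}
```

A 32-bit process on a 64-bit machine still needs the x86 DLL, so it is the process line, not the OS line, that matters when picking the replacement binary.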
Sorry, I don't have the x86 version. Hopefully they will release a new nuget package this week with the fix. :)
Hi Matt/Eden,
Could you please help soon. We have to go live soon. With high CPU usages, we can't as it will crash our boxes.
Here is the AppVeyor build history for librdkafka:
https://ci.appveyor.com/project/edenhill/librdkafka/history
I believe the build @edenhill was pointing you to is:
https://ci.appveyor.com/project/edenhill/librdkafka/build/0.9.4-R-post23
you should be able to use the x86 .dll from here:
https://ci.appveyor.com/project/edenhill/librdkafka/build/0.9.4-R-post23/job/qunugqxfbbpmyh0w/artifacts
sorry we haven't got the release out yet.
Thanks Matt. That worked
@edenhill @mhowlett please, any updates on when a nuget is going to be released? :(
soon. @treziac's #133 effort helps immensely as we really want that resolved as well.
👌👌👌
Hey guys,
We've had this exact issue as well, however it appears our issue was sprung from not specifying the consumer.Poll timeout parameter. See for example:

When we didn't specify this, our application would peg at about 20% CPU usage. Surprisingly, even when the socket was closed, the app would not drop the usage (so as more clients connected, the CPU would gradually just hit 100%).
you see this behaviour when using the replacement .dll?
note: you should generally be using a timeout to make sure your app never gets stuck in the poll loop.
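The advice above about always passing a timeout can be sketched as follows. This assumes the 0.9.x Confluent.Kafka API as used by the examples in this thread (`Consumer<TKey, TValue>`, `OnMessage`, `Subscribe`, `Poll(TimeSpan)`); the broker address, group id and topic name are placeholders, and the code needs the Confluent.Kafka package and a reachable broker to run.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using Confluent.Kafka;
using Confluent.Kafka.Serialization;

class PollWithTimeout
{
    static void Main()
    {
        var config = new Dictionary<string, object>
        {
            { "bootstrap.servers", "localhost:9092" },  // placeholder broker
            { "group.id", "poll-timeout-example" }      // placeholder group
        };

        // Ctrl+C asks the loop to stop instead of killing the process.
        bool cancelled = false;
        Console.CancelKeyPress += (_, e) => { e.Cancel = true; cancelled = true; };

        using (var consumer = new Consumer<Null, string>(config, null, new StringDeserializer(Encoding.UTF8)))
        {
            consumer.OnMessage += (_, msg) =>
                Console.WriteLine($"{msg.Topic} [{msg.Partition}] @{msg.Offset}: {msg.Value}");

            consumer.Subscribe(new List<string> { "my-topic" });  // placeholder topic

            while (!cancelled)
            {
                // Return after at most 100 ms so the loop can notice cancellation.
                consumer.Poll(TimeSpan.FromMilliseconds(100));
            }
        }
    }
}
```

With a bounded timeout the loop condition is re-evaluated at least every 100 ms, so the application can shut down cleanly even when no messages arrive.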
Not using the replacement .dll but can't quite test that just yet.
Would it be worth adding a comment to the parameterless Poll() method? Just explaining the potential risk of not providing a timeout? Alternatively, removing/deprecating it and forcing the user to specify a '0' timeout if they don't want one (Although I am not sure the use case as to why people would not want one, thus this comment)
@mhowlett when will the fix be released? We're waiting... :)
@NicholasFaneDev - I just added #142 for this.
@raskolnikoov - thanks for the hassling... @edenhill prepared the binaries late last week. @treziac has provided great input on sorting out the issues with packaging of those. After the new librdkafka.redist package is out, the .NET side is straightforward. Shouldn't be too long...
Great!
just an update on this. there is an unlisted package 0.9.5-RC2 on nuget.org which is working for .NET core, but packaging is broken for .NET Framework projects. we're working on it.
@mhowlett any idea when the .NET Framework version can be released? That's the one I need. Thanks
@raskolnikoov and many others -
Confluent.Kafka version 0.9.5-RC3 is now on nuget.org and resolves this issue.
big thanks to @orthrus for finding and resolving the problem and @treziac for help with the packaging of 0.9.5.
closing - 0.9.5 has been released.
I have upgraded to Confluent.Kafka v0.9.5, but I am still seeing 100% CPU consumption. I have created a Windows service that is responsible for listening to around 40 topics, so it is running 40 consumers, each on a separate thread. The server is Windows 2003 R2 with SP1, x64, 4 processors, 8GB RAM. The Windows service application is written in C# on .NET 4.6.1.
Can you check you are using librdkafka 0.9.5 (and not just confluent.kafka 0.9.5)? You can check the dll or call Confluent.Kafka.Library.VersionString
how many messages per second are being consumed?
do you know if there is a problem when no messages are being consumed?
what poll timeout are you using?
have you run any tests where you are only using a single consumer? two consumers? when does CPU usage become an issue?
I have not personally tested the scenario you describe, though if there is little load on the consumers, 100% CPU would seem unexpected.
Also note: I'm not sure what your use case is, but it's unusual to have 40 consumers operating in the one process.
I am in the process of migrating a JMS based consumer application to Kafka. These consumers receive around 500K messages / hour (peak time throughput). They are spread over 10 topics with 4 parallel consumers per topic. This is my initial testing attempt. The current setup was ported as is from JMS based design. I have configured poll timeout somewhere between 1 sec to 30 secs based on the anticipated throughput on each topic. I will be going thru a capacity planning exercise to figure out the right balance of processes / topics and number of consumers.
Anyway, the problem was with the librdkafka version. It was still pointing at 0.9.4 even though I upgraded to Confluent.Kafka 0.9.5. Once I wiped the binaries from the bin folder and rebuilt the application, it showed the correct version 0.9.5 and I no longer see the 100% CPU spike. Thank you @treziac and @mhowlett for your insights.
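To catch this kind of mismatch early, the application can log the native library version at startup using the `Confluent.Kafka.Library.VersionString` property mentioned earlier in the thread. This is a minimal sketch; the class name is made up, and it needs the Confluent.Kafka package referenced to compile.

```csharp
using System;
using Confluent.Kafka;

class PrintLibrdkafkaVersion
{
    static void Main()
    {
        // Reports the version of the native librdkafka that was actually loaded,
        // which can differ from the Confluent.Kafka package version if a stale DLL is in bin.
        Console.WriteLine($"librdkafka version: {Library.VersionString}");
    }
}
```

If this still prints 0.9.4 after upgrading, clearing the bin folder and rebuilding, as described above, is the first thing to try.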