Node-rdkafka: Memory leak on disconnect() method usage with producer

Created on 17 Dec 2019 · 18 Comments · Source: Blizzard/node-rdkafka

Hi,

In one of our applications we run a Node.js-based microservice that uses the node-rdkafka module to send messages through a Kafka producer. A runtime call, scheduled at a fixed interval, performs the following operations:

  • Invoking the Kafka producer's connect method
  • Calling the producer's produce method to send messages
  • Invoking the Kafka producer's disconnect method

Disconnect method usage:

const Kafka = require('node-rdkafka');

let prod = new Kafka.Producer({
  // config properties
});
prod.disconnect();

However, we have been noticing a memory leak when using the node-rdkafka producer, and we suspect that the disconnect method is causing it.
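For reference, one scheduled cycle of the kind described above might look roughly like the following. This is only a sketch, not our production code: the function name `sendOnce`, the 5000 ms flush timeout, and passing the module in as a `Kafka` argument are assumptions made for illustration.

```javascript
// Sketch of one scheduled send cycle: connect, produce, flush, disconnect.
// `Kafka` is the node-rdkafka module, passed in as an argument (an assumption
// of this sketch) so the function can also be exercised with a stand-in.
function sendOnce(Kafka, config, topic, payloads) {
  return new Promise((resolve, reject) => {
    const producer = new Kafka.Producer(config);

    // Surface client-level errors as a rejection.
    producer.once('event.error', reject);

    producer.once('ready', () => {
      try {
        for (const payload of payloads) {
          // produce(topic, partition, message, key); a null partition
          // leaves partition selection to the library.
          producer.produce(topic, null, Buffer.from(payload), null);
        }
      } catch (err) {
        reject(err);
        return;
      }
      // Wait for queued messages before tearing the connection down.
      producer.flush(5000, (flushErr) => {
        producer.disconnect((disconnectErr) => {
          const err = flushErr || disconnectErr;
          if (err) {
            reject(err);
          } else {
            resolve(payloads.length);
          }
        });
      });
    });

    producer.connect();
  });
}
```

Each scheduled run would call something like `sendOnce(require('node-rdkafka'), { 'metadata.broker.list': 'localhost:9092' }, 'my-topic', ['hello'])`, where the broker address and topic name are placeholders. A new Producer instance is created on every call, which is the pattern under which we see memory grow.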

Thanks!

All 18 comments

What version of Node and node-rdkafka are you using? For clarification, are you creating multiple Producer instances and then disconnecting? Or is there just a few Producer instances instantiated at startup? A simple gist that can isolate and reproduce the problem would be very helpful.
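As a starting point for such a gist, a small harness along these lines could show whether heap usage keeps growing across repeated connect/disconnect cycles. It is a rough sketch: `measureHeapGrowth` and its `cycle` argument are hypothetical names, and `heapUsed` only covers the V8 heap, so growth in native memory would not show up here.

```javascript
// Runs `cycle` (an async function performing one connect/produce/disconnect
// round) `iterations` times and records heapUsed after each round.
// Start node with --expose-gc so that global.gc is available; without it the
// samples are noisier because garbage may not have been collected yet.
async function measureHeapGrowth(cycle, iterations) {
  const samples = [];
  for (let i = 0; i < iterations; i++) {
    await cycle(i);
    if (typeof global.gc === 'function') {
      global.gc();
    }
    samples.push(process.memoryUsage().heapUsed);
  }
  return {
    samples,
    // Difference between the last and the first sample, in bytes.
    growthBytes: samples.length > 1 ? samples[samples.length - 1] - samples[0] : 0,
  };
}
```

If `growthBytes` keeps climbing as `iterations` increases, even with forced garbage collection, that points to objects still being referenced after disconnect.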

@codeburke Yes, we create a new producer instance for each runtime call (scheduled at a specific time interval), and every call invokes connect() and disconnect().

Node version - 10
node-rdkafka - 2.7.4

Hi, I think I'm encountering the same issue with KafkaConsumer in my use of node-rdkafka in KafkaSSE. I also connect and then disconnect many KafkaConsumer instances.

I believe the issue has something to do with event_cb. If I set this to false, the memory leak goes away. From what I can tell, the native code is holding on to a reference of the _client in its event handler.

In client.js:

  if (!no_event_cb) {
    this._client.onEvent(function eventHandler(eventType, eventData) {
      ...
    });
  }

In connection.cc's NodeOnEvent:

  Connection* obj = ObjectWrap::Unwrap<Connection>(info.This());

  v8::Local<v8::Function> cb = info[0].As<v8::Function>();
  obj->m_event_cb.dispatcher.AddCallback(cb);

I found this after debugging a memory leak in what I thought was my code. KafkaConsumer references were piling up, but I didn't have any JS references to them. I found the reference to 'self' (The KafkaConsumer instance) in the eventHandler function in client.js:

[Screenshot, 2020-03-18 19:56:24]

I'm not sure how to fix this, but likely something needs to remove dispatcher/callback in the C++ code after the client finishes disconnecting.

This doesn't help with the leak, but have you considered not doing that? I've just reviewed, by coincidence, a list of recommendations on mistakes to avoid when using Kafka, and one of the things on the list was continuously reconnecting. But perhaps this isn't possible in your use case.
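Where the use case allows it, avoiding continuous reconnects could look something like this: create one producer lazily and hand the same connected instance to every caller. This is a sketch under assumptions, not a recommendation from the library authors; `createSharedProducer` and `getProducer` are made-up names, and the module is passed in as `Kafka` for illustration.

```javascript
// Returns a function that lazily creates one producer and gives the same
// connected instance to every caller. `Kafka` is the node-rdkafka module.
function createSharedProducer(Kafka, config) {
  let ready = null; // Promise resolving to the connected producer

  return function getProducer() {
    if (!ready) {
      ready = new Promise((resolve, reject) => {
        const producer = new Kafka.Producer(config);
        // connect(metadataOptions, callback): the callback receives an
        // error if the connection could not be established.
        producer.connect({}, (err) => {
          if (err) {
            ready = null; // allow a later call to retry
            reject(err);
          } else {
            resolve(producer);
          }
        });
      });
    }
    return ready;
  };
}
```

Each scheduled run would then `await getProducer()` and call `produce` on the result, disconnecting only when the process shuts down.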

I'm not really re-connecting that often. KafkaSSE is a Kafka -> HTTP bridge. Each new HTTP connection results in a new KafkaConsumer. The HTTP connections themselves are long lived, but over time as the service operates the memory leak eventually shows up in a big way. In Wikimedia's usage in EventStreams, it currently takes about 8 hours before a process reaches its memory limits and is killed and restarted.

[Screenshot, 2020-03-19 14:01:11]

> Each new HTTP connection results in a new KafkaConsumer.

I think that's pretty much the definition of "often"!

Maybe consumers can be cached? Though if each consumer has parameters specific to the connection, you are out of luck.

> I think that's pretty much the definition of "often"!

Each connection is long lived and unending. The HTTP response body is streamed to the client using chunked transfer encoding. We have about 60-80 concurrent connections, with what currently looks like 6-10 reconnects per minute (which still seems like a lot to me; likely some remote client is not doing it right :p).

Each consumer has specific subscription params (topics, offsets, etc.)

BTW all our grafana dashboards are public :D
https://grafana.wikimedia.org/d/znIuUcsWz/eventstreams-k8s

Setting event_cb = false does indeed fix the memory leak.

[Screenshot, 2020-03-23 11:08:07]
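For anyone wanting to try the same workaround, it amounts to something like the following. The helper name `withoutEventCallback` is made up for this sketch, and the broker address and group id in the usage comment are placeholders.

```javascript
// Returns a copy of a node-rdkafka global config with the event callback
// disabled, so the client does not register the native event handler.
// The input object is left untouched.
function withoutEventCallback(config) {
  return Object.assign({}, config, { event_cb: false });
}

// Hypothetical usage (placeholder broker address and group id):
// const Kafka = require('node-rdkafka');
// const consumer = new Kafka.KafkaConsumer(
//   withoutEventCallback({
//     'group.id': 'example-group',
//     'metadata.broker.list': 'localhost:9092',
//   }),
//   {}
// );
```

The trade-off, as far as I can tell, is that events delivered through that handler (for example event.log or event.stats) would presumably no longer be emitted, so this is a stopgap rather than a fix.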

Hi, I think there is a second memory leak affecting the HighLevelProducer, which is pretty similar to the one related to event_cb. This time it's related to dr_cb and dr_msg_cb. As with the event_cb bug, the native code is holding a reference to self in its event handler.

https://github.com/Blizzard/node-rdkafka/blob/583d24dc0769011eaf5d2e6d853c4fd0c17783ac/lib/producer.js#L89-L98

Unfortunately, applying the event_cb workaround here by setting dr_cb and dr_msg_cb to false won't work, because the HighLevelProducer automatically sets dr_cb to true again.

https://github.com/Blizzard/node-rdkafka/blob/583d24dc0769011eaf5d2e6d853c4fd0c17783ac/lib/producer/high-level-producer.js#L89

We have seen this issue on our producer as well. As a workaround, we set dr_cb = false when creating the producer. Is there any plan to address this type of mem leak due to callback?

> Is there any plan to address this type of mem leak due to callback?

Yes. I'm working on it.

I've pushed a preliminary fix to the fix-producer-leaks branch. I still need help with testing. It would be great if some of you could try it out, see whether it prevents the Producer memory leaks, and also check that events are still flowing in as expected.

Also, it seems that the Consumer leaks when rebalance_cb or offset_commit_cb is set, and that needs to be addressed as well.

Hi @iradul,
I just tried out your fix and it looks like the memory leak in HighLevelProducer is gone now! The memory allocation timeline of our service running with your fix shows a much cleaner heap.

[Image: memory allocation timeline of the service running with the fix]

In comparison, this is the memory allocation timeline of our service running on the latest master.

[Image: memory allocation timeline of the service running on master]

The active objects allocated in the middle of the master graph are HighLevelProducer instances, which should already have been garbage collected by the end of the recording.

@ArneSchulze thanks for testing this!

This is fixed in the latest version, 2.9.0.

I think the same issue is present in the consumer as well. Even after the consumer disconnects, queued messages are kept in memory. The following code snippet leaks memory.

Library version: 2.9.1

```
this.consumer = new Kafka.KafkaConsumer({
    'group.id': Math.random().toString(),
    'metadata.broker.list': this.kafkaUrl,
    'enable.auto.commit': false,
    'queued.max.messages.kbytes': 10240, // 10 MB queue size
}, { 'auto.offset.reset': 'earliest' });

this.consumer.assign([{
    topic: this.topic,
    partition: 0,
    offset: this.from_offset,
}]);

this.consumer.unassign();
this.consumer.disconnect();
this.consumer = null;
```

@PT10 please open a new issue and include a complete working example that reproduces the memory leak.
