Hi, this is my first bug report for node, hopefully I get it right!
TLDR: http2session.request appears to be leaking memory outside of javascript (heap is clean, resident set size keeps growing).
I am attaching some rather crude test code which demonstrates the issue and reproduces it every time.
The zip file attached contains client.js, server.js and "keys" folder containing self signed SSL keys.
Run `node server.js`, then `node --expose-gc client.js`. Depending on the performance of your machine, you should see the "test output" in < 15 seconds or so.
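For the sake of discussion, the client boils down to something like the sketch below; the real code is in the attached client.js, and the port, authority and options here are illustrative only.

```js
// Rough sketch of what the attached client does: open a single http2 session
// against the self-signed test server and fire a large number of requests on it.
const http2 = require('http2');

// Port and authority are illustrative; the real values are in client.js.
const session = http2.connect('https://localhost:8443', {
  rejectUnauthorized: false, // the test server uses the self-signed keys from the zip
});

function sendRequest(callback) {
  const req = session.request({ ':path': '/' });
  req.resume();             // drain the response body, we only care about memory
  req.on('end', callback);  // 'end' fires once the response has been fully read
  req.end();                // no request body
}
```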
What happens
- Record `process.memoryUsage()` for later comparison.
- Once the requests have finished, call `process.memoryUsage()` again and calculate the delta for rss, heapTotal and heapUsed.

You should get an output similar to this:
Deltas: 0 KB Heap Total, -52 KB Heap Used, 10436 KB RSS
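The deltas come from a comparison along these lines (simplified; it assumes the client was started with --expose-gc and that all requests, e.g. via sendRequest() from the sketch above, have completed):

```js
// Take a baseline sample first, then force a GC and compare once the
// request loop is done. Run with --expose-gc so global.gc() exists.
const before = process.memoryUsage();

// ... fire the requests here ...

function report() {
  global.gc(); // collect everything that is collectable before sampling again
  const after = process.memoryUsage();
  const kb = (bytes) => Math.round(bytes / 1024);
  console.log(
    `Deltas: ${kb(after.heapTotal - before.heapTotal)} KB Heap Total, ` +
    `${kb(after.heapUsed - before.heapUsed)} KB Heap Used, ` +
    `${kb(after.rss - before.rss)} KB RSS`
  );
}
```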
Feel free to play with the number of requests you make; obviously a higher number takes longer to execute, but it also leaves a much higher RSS behind.
Here is a graph of rss, heap total and heap used over 24h on a system that does ~30k requests per day; it has been running for several days (hence the larger rss).

Sorry I wasn't able to narrow it down any further (it took me over a month to trace the issue this far in my system, as it kept leaking rather slowly and heap snapshot comparisons never revealed anything).
@nodejs/http2
Thanks @cTn-dev for the extremely comprehensive bug report, that helps.
/cc @addaleax maybe you know what's wrong here?
@ryzokuken Looked into it, but couldn’t find anything yet… It’s not https://github.com/nodejs/node/pull/21336, that just popped up while investigating. :)
I wish it were related and we had a fix ready :)
I'll try reproducing this first. Do you think we should also ping the v8 peeps, because this might be something with the GC (highly unlikely, but maybe)?
@ryzokuken It’s an increase in RSS but not in heap size … it’s not what we’d expect from a typical leak in JS
Also, as I understand it, this only happens with http2?
@addaleax If you were referring to http/https, I haven't written a detailed test to have a look there, but after switching to https in my project the abnormal RSS increase completely went away, so I would say https is fine as of right now.
I'll investigate over the weekend if this isn't fixed by then.
@ctn-dev I couldn’t find anything so far … while they don’t directly affect the results in your reproduction, could you check whether https://github.com/nodejs/node/pull/21336 and/or https://github.com/nodejs/node/pull/21373 solve the problem for you in production?
I couldn't reproduce the result under valgrind --tool=massif, perhaps I am doing something wrong?
Also, I don't observe RSS growing linearly as the total request count increases. This doesn't look like a memory leak to me.
I expect the inconsistencies in the observed RSS memory usage to be caused by the memory allocator here.
Moreover, calling gc() often (e.g. every 100 requests) seems to decrease the observed RSS change significantly. With total=100000, parallel=5, gc=100 it changes the RSS difference from ~35 MiB down to ~5 MiB. That works even with scavenges (gc(true)).
My test code (client only, server is the same): https://gist.github.com/ChALkeR/aa392f5bb957279f0527e7f74e3b72da
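The frequent-gc variant boils down to this pattern (a reduced sketch; the helper names are made up, the actual code is in the gist above):

```js
// Reduced sketch of the frequent-gc experiment (run with --expose-gc).
// sendRequest() stands in for whatever fires a single http2 request.
let completed = 0;
const GC_EVERY = 100;

function onRequestFinished() {
  completed++;
  if (completed % GC_EVERY === 0) {
    global.gc();        // full collection every 100 requests
    // global.gc(true); // per the note above, even a scavenge keeps RSS growth low
  }
}
```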
I would say that something suboptimal is going on with memory allocation and garbage collection here, but this doesn't look like a memory leak. I.e. the memory is freed correctly, just RSS doesn't decrease completely because of how the memory allocator works.
@addaleax could you point me to a nightly build that contains these changes? (I don't really have much experience with compiling node, nor do I have the environment set up for that.)
I can test it in production to see if it behaves differently; I can definitely see #21336 being related, as there are some "short" custom headers in use.
Considering what @ChALkeR said, I am less sure about this being a leak in http2; if that's the case, I am really sorry for pointing you in the wrong direction :(
What initially put me off the "this is just how GC behaves" explanation is the fact that the observed RSS growth appears to completely ignore --max_old_space_size.
@cTn-dev
RSS not going down is not always a sign of a memory leak; it can also be caused by memory fragmentation. E.g., if you malloc (with the glibc impl) 1 GiB worth of small strings in pure C, then free() them all except the _last one_, RSS would still stay at 1 GiB. That's because glibc uses sbrk for small allocations, and that memory cannot be partially shrunk back in that case. There are memory allocators that don't use brk, e.g. the OpenBSD malloc, but they pay for that by being slower or more memory-consuming in some (likely common) cases.
Based on running your testcase with valgrind massif I don't see a memleak (it uses its own memory allocator to track things afaik), so I assume that the RSS not going down is caused by memory fragmentation.
My testcase with frequent gc calls shows that those objects are indeed collected, and if gc() is called more often, fragmentation seems to be lower because there is more freed memory to reuse instead of allocating new memory.
RSS slowly growing might indeed be caused by GC optimizations, but from my observations 30k, 100k, and 1000k requests (with the same concurrency=5) all result in an identical increase of ~37 MiB, so this looks to be bounded. _Note: measured without frequent manual gc() calls._
This indeed looks to be not affected by --max-old-space-size. Not sure if it should be, cc @addaleax? Perhaps V8 is not aware of some of the native http2 objects and is not applying proper gc pressure?
Note: I am not entirely sure if my analysis above is correct, let's wait for what @addaleax thinks on this. I might have missed something.
@addaleax Since #21336 made it into the 10.5.0 release, I have been running tests on several of the production systems and it's looking good.
I will make sure to put #21373 and the new custom memory allocator #21374 through testing once they make it to the next release.
When it comes to my original issue, this appears to be fully resolved in 10.5.0, yay o/
I will keep this ticket open for now in case there is anything more to add here.
@cTn-dev I might've misread but it seems like the issues have been resolved? If so, I will close this out but do feel free to reopen if I misunderstood or if you find more issues. I'm strictly doing this so it's easier for everyone to identify which issues still exist and to focus on fixing them.