Hi, this is my first bug report for node, hopefully I get it right!
TLDR: http2session.request appears to be leaking memory outside of javascript (heap is clean, resident set size keeps growing).
I am attaching some rather crude test code which demonstrates the issue and reproduces it every time.
The zip file attached contains client.js, server.js and "keys" folder containing self signed SSL keys.
Run `node server.js`, then `node --expose-gc client.js`. Depending on the performance of your machine, you should see the "test output" in < 15 seconds or so.
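For the sake of discussion, the client boils down to something like the sketch below; the real code is in the attached client.js, and the port, authority and options here are illustrative only.

```js
// Rough sketch of what the attached client does: open a single http2 session
// against the self-signed test server and fire a large number of requests on it.
const http2 = require('http2');

// Port and authority are illustrative; the real values are in client.js.
const session = http2.connect('https://localhost:8443', {
  rejectUnauthorized: false, // the test server uses the self-signed keys from the zip
});

function sendRequest(callback) {
  const req = session.request({ ':path': '/' });
  req.resume();             // drain the response body, we only care about memory
  req.on('end', callback);  // 'end' fires once the response has been fully read
  req.end();                // no request body
}
```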
What happens
- Record `process.memoryUsage()` for later comparison.
- Once the requests have finished, call `process.memoryUsage()` again and calculate the delta for rss, heapTotal and heapUsed.

You should get an output similar to this:
Deltas: 0 KB Heap Total, -52 KB Heap Used, 10436 KB RSS
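The deltas come from a comparison along these lines (simplified; it assumes the client was started with --expose-gc and that all requests, e.g. via sendRequest() from the sketch above, have completed):

```js
// Take a baseline sample first, then force a GC and compare once the
// request loop is done. Run with --expose-gc so global.gc() exists.
const before = process.memoryUsage();

// ... fire the requests here ...

function report() {
  global.gc(); // collect everything that is collectable before sampling again
  const after = process.memoryUsage();
  const kb = (bytes) => Math.round(bytes / 1024);
  console.log(
    `Deltas: ${kb(after.heapTotal - before.heapTotal)} KB Heap Total, ` +
    `${kb(after.heapUsed - before.heapUsed)} KB Heap Used, ` +
    `${kb(after.rss - before.rss)} KB RSS`
  );
}
```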
Feel free to play with the number of requests you make; obviously a higher number takes longer to execute, but it also leaves a much higher RSS behind.
Here is a graph of rss, heap total and heap used over 24h on a system that does ~30k requests per day; it has been running for several days (hence the larger rss).

Sorry I wasn't able to narrow it down any further (it took me over a month to trace the issue this far in my system, as it kept leaking rather slowly and heap snapshot comparisons never revealed anything).
@nodejs/http2
Thanks @cTn-dev for the extremely comprehensive bug report, that helps.
/cc @addaleax maybe you know what's wrong here?
@ryzokuken Looked into it, but couldn’t find anything yet… It’s not https://github.com/nodejs/node/pull/21336, that just popped up while investigating. :)
I wish it were related and we had a fix ready :)
I'll try reproducing this first. Do you think we should also ping the v8 peeps, because this might be something with the GC (highly unlikely, but maybe)?
@ryzokuken It’s an increase in RSS but not in heap size … it’s not what we’d expect from a typical leak in JS
Also, as I understand it, this only happens with http2?
@addaleax If you were referring to http/https, I haven't written a detailed test to have a look there, but after switching to https in my project the abnormal RSS increase completely went away, so I would say https is fine as of right now.
I'll investigate over the weekend if this isn't fixed by then.
@ctn-dev I couldn’t find anything so far … while they don’t directly affect the results in your reproduction, could you check whether https://github.com/nodejs/node/pull/21336 and/or https://github.com/nodejs/node/pull/21373 solve the problem for you in production?
I couldn't reproduce the result under valgrind --tool=massif, perhaps I am doing something wrong?
Also, I don't observe RSS growing linearly as the total request count increases. This doesn't look like a memory leak to me.
I expect the inconsistencies in the observed RSS memory usage to be caused by the memory allocator here.
Moreover, calling gc() often (e.g. every 100 requests) seems to decrease the observed RSS change significantly. With total=100000, parallel=5, gc=100 it changes the RSS difference from ~35 MiB down to ~5 MiB. That works even with scavenges (gc(true)).
My test code (client only, server is the same): https://gist.github.com/ChALkeR/aa392f5bb957279f0527e7f74e3b72da
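The frequent-gc variant boils down to this pattern (a reduced sketch; the helper names are made up, the actual code is in the gist above):

```js
// Reduced sketch of the frequent-gc experiment (run with --expose-gc).
// sendRequest() stands in for whatever fires a single http2 request.
let completed = 0;
const GC_EVERY = 100;

function onRequestFinished() {
  completed++;
  if (completed % GC_EVERY === 0) {
    global.gc();        // full collection every 100 requests
    // global.gc(true); // per the note above, even a scavenge keeps RSS growth low
  }
}
```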
I would say that something suboptimal is going on with memory allocation and garbage collection here, but this doesn't look like a memory leak. I.e. the memory is freed correctly, just RSS doesn't decrease completely because of how the memory allocator works.
@addaleax could you point me to a nightly build that contains these changes? (I don't really have much experience with compiling node, nor do I have the environment set up for that.)
I can test it in production to see if it behaves differently; I can definitely see #21336 being related, as there are some "short" custom headers in use.
Considering what @ChALkeR said, I am less sure about this being a leak in http2; if that's the case, I am really sorry for pointing you in the wrong direction :(
What initially put me off the "this is just how GC behaves" explanation is the fact that the observed RSS growth appears to completely ignore --max_old_space_size.
@cTn-dev
RSS not going down is not always a sign of a memory leak; it can also be caused by memory fragmentation. E.g., if you malloc (with the glibc impl) 1 GiB worth of small strings in pure C, then free() them all except the _last one_, RSS would still stay at 1 GiB. That's because glibc uses sbrk for small allocations, and that memory cannot be partially shrunk back in that case. There are memory allocators that don't use brk, e.g. the OpenBSD malloc, but they pay for that by being slower or more memory-consuming in some (likely common) cases.
Based on running your testcase with valgrind massif I don't see a memleak (it uses its own memory allocator to track things afaik), so I assume that the RSS not going down is caused by memory fragmentation.
My testcase with frequent gc calls shows that those objects are indeed collected, and if gc() is called more often, fragmentation seems to be lower because there is more freed memory to reuse instead of allocating new memory.
RSS slowly growing might indeed be caused by GC optimizations, but from my observations 30k, 100k, and 1000k requests (with the same concurrency=5) all result in an identical increase of ~37 MiB, so this looks to be bounded. _Note: measured without frequent manual gc() calls._
This indeed looks to be not affected by --max-old-space-size. Not sure if it should be, cc @addaleax? Perhaps V8 is not aware of some of the native http2 objects and is not applying proper gc pressure?
Note: I am not entirely sure if my analysis above is correct, let's wait for what @addaleax thinks on this. I might have missed something.
@addaleax Since #21336 made it into the 10.5.0 release, I have been running tests on several of the production systems and it's looking good.
I will make sure to put #21373 and the new custom memory allocator #21374 through testing once they make it to the next release.
When it comes to my original issue, this appears to be fully resolved in 10.5.0, yay o/
I will keep this ticket open for now in case there is anything more to add here.
@cTn-dev I might've misread but it seems like the issues have been resolved? If so, I will close this out but do feel free to reopen if I misunderstood or if you find more issues. I'm strictly doing this so it's easier for everyone to identify which issues still exist and to focus on fixing them.