Below is a screenshot of the past 12 hours, which have been a total disaster for us, to say the least. After updating to Node 4.2 (from 0.10) in production, we immediately exceeded all memory quotas and experienced a high volume of timeouts (even with no load and memory under the 1GB limit).
First, I apologize if this is not the place for this. I am happy to move the discussion somewhere else, and we will help diagnose whatever you need. We went through this same parade with Node 0.12 and had to downgrade to 0.10.
Second, and I guess the real question here: is Heroku's 512MB of RAM simply not enough to run Node 4.x? If that is the case, cool, but the memory constraints definitely need to be made clearer.
Timeline:
In general you can see the memory is all over the place; maybe that is expected with newer versions of V8...
Although I don't have a screenshot, you can see in the first part of the graph, running Node 0.10, that memory stays almost perfectly flat at 256MB of RAM. Under any load, that was consistent.
For reference, here is a load test we did in a dev environment running Node 4.2.1, cluster forked to 4 processes, at about 5k requests per minute. It also immediately hit the higher 1GB memory limit. We then dropped this down to 2 forked processes with the same result.
@rvagg @trevnorris @Fishrock123 ... ideas?
What happens when you start with (for example) `node --max_old_space_size=256`? The GC is lazy (lazier than v0.10) and the default upper limit is about 1.5 GB, so you'll need to cap it if your machine has less memory.
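For anyone following along, a quick way to confirm the cap actually took effect is to read V8's reported heap limit from inside the process. A minimal sanity-check sketch, assuming Node 4+ where the `v8` module exposes `getHeapStatistics()` (the file name is made up):

```js
// check-heap.js (hypothetical file name)
// Run with: node --max_old_space_size=256 check-heap.js
var v8 = require('v8');

var limitMb = v8.getHeapStatistics().heap_size_limit / 1024 / 1024;
// With the flag above this should print roughly 256 MB plus a little
// headroom for the other spaces; without the flag it will be near
// V8's ~1.5 GB default on 64-bit.
console.log('heap_size_limit: ' + limitMb.toFixed(1) + ' MB');
```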
Okay, just tried running with `--max_old_space_size=256` using cluster and 4 processes. Hit the limit immediately, so dropping down to 1 process again. Is the 256MB per process?
```
2015-10-14T16:46:29.457954+00:00 heroku[web.1]: source=web.1 dyno=heroku.18327029.28d659a4-2e6e-4ee5-9fb9-18dc6d99e1de sample#memory_total=578.31MB sample#memory_rss=445.58MB sample#memory_cache=0.00MB sample#memory_swap=132.73MB sample#memory_pgpgin=338334pages sample#memory_pgpgout=225287pages
```
> Is the 256MB per process?

Yes.
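To make the per-process point concrete: with `cluster`, every worker is its own V8 instance, so each one gets its own old-space cap. A minimal sketch under that assumption (`./server` is a placeholder entry point; workers inherit the master's `--max_old_space_size` via the default `execArgv`):

```js
// Run with: node --max_old_space_size=256 cluster-example.js
// Each worker inherits the flag, so 4 workers can together use roughly
// 4 x 256 MB of old space (plus new space, Buffers, code, etc.).
var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  var workers = parseInt(process.env.WEB_CONCURRENCY, 10) || os.cpus().length;
  for (var i = 0; i < workers; i++) {
    cluster.fork();
  }
} else {
  require('./server'); // placeholder for the actual app entry point
}
```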
Dropping down to 1 process, `--max_old_space_size=256` and `--max_old_space_size=128` both returned VERY consistent results. Huge difference. Here is running at 128:
And here is running at `--max_old_space_size=512`:
@bnoordhuis ... looking at this, we definitely want to make sure this is documented better
Yes very eye-opening for me at least. Fingers crossed, but I think you guys just saved us from having to revert everything again. Really appreciate the quick response. Going to leave this open through end of day while we do a few more tests on our end.
@jasnell I don't disagree but where would you put it? Maybe it's time we add a FAQ.
FWIW I just added a FAQ to the wiki here and it is linked to on the main wiki page. Perhaps this could also be linked to somewhere on nodejs.org and other places?
/cc @hunterloftis
Perhaps dig through a few old issues that shared the same characteristics so Google/SEO/etc. can drive people in the right direction?
@friism I recommend `--max_old_space_size` when this occasionally pops up; the biggest downside is that when you specify that flag and then exceed the limit, your app will shut down hard. Heroku allows you to burst 5x the memory limit, so it's not always best to crash an app instead of letting it temporarily exceed soft limits.
This is probably documentation we should provide as well in a 'Memory-management with Node on Heroku' article.
@hunterloftis From what we've seen, the second we hit a memory limit, everything pretty much starts timing out. I'd actually rather have a hard kill at that point so we can get a fresh node up that can respond to requests again.
You mention that if we specify `--max_old_space_size` the node will shut down immediately if we hit that limit, but doesn't that flag have to do with V8's heap and when it should start cleaning things up? I don't see how that relates to Heroku's memory limit and when a node is killed (other than exceeding Heroku's limit, of course). In the tests above where we passed `--max_old_space_size=128`, the memory on the node was around 220MB.
@AndrewBarba yeah it looks like your app is particularly hard-hit by lazy collection, and in that case the lesser evil is shutting down a process.
I'd be interested in hearing from @bnoordhuis on this but, from what I've seen, if your app _requires_ more space than `max_old_space_size` allows (i.e., you're actually storing > X MB in objects, vs. storing < X MB in objects and needing more frequent sweeps to clear out things you don't store anymore)... then you'll get 'allocation failed' errors (hard shutdowns).
Also keep in mind that 'old space' is just one part of the application's memory footprint, so you can expect the app as a whole to take more than `max_old_space_size`.
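A small illustration of that last point: `process.memoryUsage()` reports the resident set alongside the V8 heap, and the RSS normally sits well above `heapTotal`, which is consistent with the process capped at 128 above still using around 220MB. Nothing here is Heroku-specific:

```js
// The V8 heap is only one part of the footprint; Buffers, code, stacks,
// and other native allocations all live outside "old space".
var usage = process.memoryUsage();

console.log('rss:       ' + (usage.rss / 1048576).toFixed(1) + ' MB');
console.log('heapTotal: ' + (usage.heapTotal / 1048576).toFixed(1) + ' MB');
console.log('heapUsed:  ' + (usage.heapUsed / 1048576).toFixed(1) + ' MB');
```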
Might try playing with flags like `--gc_interval` and/or `--always_compact`. They will slow down the application, but if there's a memory crunch then it's worth giving them a go.
For a kitchen sink example with arbitrary values:

```
node --gc_global --optimize_for_size --max_old_space_size=960 --use_idle_notification --always_compact --max_executable_size=64 --gc_interval=100 --expose_gc server.js
```
@hunterloftis Yes, `--max_old_space_size` is a hard limit. `--use_idle_notification` is currently a no-op, by the way.
Just weighing in to share my experience: I had this happen to me with 0.12.x and 4.0.x. Before I migrated everything to 4.0, Heroku Support recommended `--max_old_space_size` and it worked like a charm. I still get some instances of the error every now and then, but the affected dyno shuts down and no great harm is done, just a couple dozen timed-out requests once a week or so (out of thousands per minute).
Also, migrating to 4.0.x cut memory usage down from an average of about 500MB to 270MB per dyno.
@fansworld-claudio That's great to hear. What size dyno are you running, and what did you end up setting `max_old_space_size` to? Also, have you used this setting with cluster?
@AndrewBarba At first I had the Standard-2X (1GB) dynos and max_old at 960; then, after a couple weeks of stability, I scaled it down to Standard-1X (512MB) and max_old at 480, and it has been stable for a couple of months. YMMV though, as this is a REST API that does mostly I/O with Redis and Mongo.
It's kind of a pain to set this for all your Heroku apps (and remember to vary it based on dyno size), so I made heroku-node, which wraps node and sets `max_old_space_size` based on the values of `$WEB_MEMORY` and `$WEB_CONCURRENCY`, which Heroku sets based on the size of dyno you've chosen.
@apechimp I just gave that a go and this showed up in the console immediately:

```
2015-10-31T14:51:14.352577+00:00 app[web.1]: <--- Last few GCs --->
2015-10-31T14:51:14.352577+00:00 app[web.1]:
2015-10-31T14:51:14.352579+00:00 app[web.1]: 14856 ms: Mark-sweep 48.0 (86.0) -> 48.0 (86.0) MB, 93.5 / 0 ms [allocation failure] [GC in old space requested].
2015-10-31T14:51:14.352580+00:00 app[web.1]: 14967 ms: Mark-sweep 48.0 (86.0) -> 48.0 (86.0) MB, 111.4 / 0 ms [allocation failure] [GC in old space requested].
2015-10-31T14:51:14.352581+00:00 app[web.1]: 15055 ms: Mark-sweep 48.0 (86.0) -> 48.0 (86.0) MB, 87.9 / 0 ms [last resort gc].
2015-10-31T14:51:14.352582+00:00 app[web.1]: 15178 ms: Mark-sweep 48.0 (86.0) -> 48.0 (86.0) MB, 122.7 / 0 ms [last resort gc].
2015-10-31T14:51:14.352582+00:00 app[web.1]:
2015-10-31T14:51:14.352583+00:00 app[web.1]:
2015-10-31T14:51:14.352583+00:00 app[web.1]: <--- JS stacktrace --->
2015-10-31T14:51:14.352584+00:00 app[web.1]:
2015-10-31T14:51:14.352585+00:00 app[web.1]: ==== JS stack trace =========================================
```
Looks like you are dividing `WEB_MEMORY` by some factor of concurrency, but `WEB_MEMORY`, as explained in the article you reference, is the recommended memory for each process. As is, running this on a Performance-M dyno would give you:

```
WEB_CONCURRENCY = 5;
WEB_MEMORY = 512;
WEB_MEMORY / (2 * WEB_CONCURRENCY) = 51.2MB
```

51.2MB of RAM is not enough to even start up. I think you can just simplify to `WEB_MEMORY / 2` or even `WEB_MEMORY / 1.5`. For us 256 seems to be the sweet spot, so we'll stick with `/ 2`.
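For what the corrected arithmetic could look like in practice, here is a sketch of the `WEB_MEMORY / 2` suggestion as a small launcher. This is not the heroku-node module's actual code; the wrapper file name and `server.js` entry point are placeholders:

```js
// start.js (hypothetical wrapper): derive the old-space cap from Heroku's
// WEB_MEMORY (recommended MB per process) and re-launch node with the flag.
var spawn = require('child_process').spawn;

var webMemory = parseInt(process.env.WEB_MEMORY, 10) || 512;
var oldSpace = Math.floor(webMemory / 2); // e.g. 256 on a 512 MB dyno

var child = spawn(process.execPath,
  ['--max_old_space_size=' + oldSpace, 'server.js'],
  { stdio: 'inherit' });

child.on('exit', function (code) {
  process.exit(code === null ? 1 : code);
});
```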
@AndrewBarba, _mea culpa_. That's fixed as of [email protected].
Couldn't the garbage collector also be executed when getting an out-of-memory exception? This would help in a more generic way on systems with really low memory constraints; for example, when executing npm on NodeOS it starts killing processes on QEMU with the default memory settings (128MB), and the same would happen on a Raspberry Pi without swap...
@piranna What exact exception do you mean? If you are having allocation failures in mind, catching all of those in some generic way is a terrible idea — many things could go wrong from that. If you are speaking about the system oom killer, intercepting it requires specific settings on the system level.
> @piranna What exact exception do you mean?

The error I get is an Out of Memory exception, thrown at the process level:
It happens when calling `malloc()` and the system doesn't have enough free memory or a contiguous free chunk big enough for the requested size. I propose that, when creating a new JavaScript object, a failed memory reservation be caught in the C++ layer; if that happens, run the garbage collector and try the reservation again. Probably this is more of a V8 issue...
@piranna That screenshot you have posted shows the OOM killer being triggered. It's not interceptable by default.
Also, it's not necessarily caused by the `node` process. Your process could allocate some memory at the beginning, then do nothing and still be killed by the OOM killer later (while doing nothing).
> @piranna That screenshot you have posted shows the OOM killer being triggered. It's not interceptable by default.

What a shame, I thought it would be a good feature :-(
> Also, it's not necessarily caused by the node process. Your process could allocate some memory at the beginning, then do nothing and still be killed by the OOM killer later (while doing nothing).

Yeah, I know it kills random processes after that. In fact, in that screenshot the error was caused by slap, but nsh stopped working too :-/
For what it's worth, here's the solution we're trying now (I sure wish I had found this conversation before!!):
```sh
if [ ! "$WEB_MEMORY" = "" ]; then
  if [ $WEB_MEMORY -le 512 ]; then
    NODE_FLAGS="--max_semi_space_size=2 --max_old_space_size=256 --max_executable_size=192"
  elif [ $WEB_MEMORY -le 768 ]; then
    NODE_FLAGS="--max_semi_space_size=8 --max_old_space_size=512 --max_executable_size=384"
  elif [ $WEB_MEMORY -le 1024 ]; then
    NODE_FLAGS="--max_semi_space_size=16 --max_old_space_size=1024 --max_executable_size=512"
  fi
fi

node $NODE_FLAGS "$@"
```

I'm getting the values for those flags from the V8 defaults, and it seems to be working great so far (Heroku / 512MB).
> the default upper limit is about 1.5 GB

Could it be possible to lower it when the sum of physical memory + swap is smaller than this? It doesn't make sense to have such a big limit if it's impossible to reach it... This would help on memory-constrained systems.
@piranna see https://github.com/nodejs/node/issues/3370#issuecomment-148514862 and https://github.com/nodejs/node/issues/3370#issuecomment-158521878
Yes, I've read them. What I'm asking for is to calculate and set them automatically on start, instead of setting the values with flags.
Or even a single flag could be a nice option: `node --physical_memory=$WEB_MEMORY`
I also added custom settings for modulus.io's 396MB servos:

```sh
if [ $1 -le 396 ]; then
  NODE_FLAGS="--max_semi_space_size=1 --max_old_space_size=198 --max_executable_size=148"
elif ...
```
> Or even a single flag could be a nice option: `node --physical_memory=$WEB_MEMORY`

Isn't the `max_old_space_size` flag equivalent to this? Maybe it could be extended to accept `auto` to calculate it from the current system memory...
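There is no such `auto` value today, but the idea is easy to approximate in userland, building on the wrapper sketch above. A rough sketch that falls back to the machine's physical memory via `os.totalmem()` when `WEB_MEMORY` isn't set (file name, `server.js`, and the `/ 2` ratio are all arbitrary assumptions):

```js
// auto-heap.js (hypothetical launcher approximating an "auto" cap)
var os = require('os');
var spawn = require('child_process').spawn;

var totalMb = parseInt(process.env.WEB_MEMORY, 10) ||
              Math.floor(os.totalmem() / 1048576);
var oldSpace = Math.floor(totalMb / 2); // leave headroom for the rest of the process

spawn(process.execPath, ['--max_old_space_size=' + oldSpace, 'server.js'],
  { stdio: 'inherit' });
```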
So because setting `max-old-space-size` just crashes your application when the limit is reached, it isn't very helpful. I made https://github.com/HustleInc/regiment, which will seamlessly create and preemptively replace workers so that you never hit `max-old-space-size`.
Follow up: this is pure win. I'm running several 2X dynos in production for a very leaky app with `Regiment.middleware.MemoryFootprint(750)` and I've had no dropped requests, no memory overuse, and my aggregate memory usage hovers around 700-750MB/dyno.
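This is not Regiment's actual implementation, but the general technique of preemptively recycling workers by memory footprint can be sketched with nothing more than `cluster` and `process.memoryUsage()` (the `./server` entry point and the 10-second poll interval are assumptions; 750 mirrors the `MemoryFootprint(750)` value above):

```js
var cluster = require('cluster');

if (cluster.isMaster) {
  cluster.fork();
  // Whenever a worker exits (including a memory-triggered recycle below),
  // replace it so there is always a worker serving requests.
  cluster.on('exit', function () {
    cluster.fork();
  });
} else {
  require('./server'); // placeholder for the actual app

  var LIMIT_MB = 750;
  setInterval(function () {
    var rssMb = process.memoryUsage().rss / 1048576;
    if (rssMb > LIMIT_MB) {
      // Stop accepting new connections, let in-flight requests finish,
      // then exit; the master forks a fresh worker to take over.
      cluster.worker.disconnect();
    }
  }, 10000).unref();
}
```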
You do not want to just set `--max-semi-space-size`, `--max-old-space-size`, `--max-executable-size`, etc., because when those limits are reached your node process simply crashes and any currently running requests are cut short.
Thank you for the detailed analysis.
Just spent all day debugging a critical infrastructure performance issue, and finally found that one of the services had been moved to v4.2.8. After rolling back to v0.12, everything returned to normal.
@TylerBrock you mentioned that:

> So because setting max-old-space-size just crashes your application when the limit is reached

I've been experimenting and noticed that if `--max-old-space-size` is used without `--max_semi_space_size` also being adjusted then, yes, the application will get killed. If `--max_semi_space_size` is set to `1`, then usually it doesn't get killed.
If anyone's interested, I've written more about the experiment here. If you have any ideas on how I can improve the experiment and my understanding, I'd really appreciate it.
I am testing `--max-old-space-size=128` and I can confirm that when the application exceeds 128MB of actually used memory, the process gets killed.
Hello, the info in this issue was very useful. I made a module based on @joanniclaborde's suggestion.
It is here: https://github.com/damianmr/heroku-node-settings
The good thing is that the node process never reaches its maximum available memory, so it never crashes on Heroku.
We have been using it in a real project with thousands of requests per minute and have had no crashes whatsoever.
I encourage you to try it out and report any problems you find.
Nice!! We've been using those same settings for a few months now, and it's running fine. Good idea to turn that code into a module, @damianmr!
🔥 I had a terrible memory leak somewhere: it turned out to be another node module that my system depends on. I took it out and my memory usage went from an average of 8GB down to 4GB. That still meant that I had to pay for the $500 a month plan to keep my app from exceeding the memory limit of the lower plans. I looked everywhere and haven't been able to find why my app is still consuming so much memory.
💯 🙌 Then I came across this today and tried @damianmr's `heroku-node-settings`. Now I have more time to look into any potential leaks without having to fork over $500/month, since I've downgraded to the 2X plan and can actually run more than one dyno! Thanks everyone for the tips that led to the solution. Here's a pic that's worth sharing:
Strangely enough, my app's memory usage is showing as roughly constant now rather than the exponential rise from before.