Below is a screenshot of the past 12 hours, which have been a total disaster for us, to say the least. After updating to Node 4.2 (from 0.10) in production, we immediately exceeded all memory quotas and experienced a high volume of timeouts (even with no load and memory under the 1GB limit).
First, I apologize if this is not the place for this. I am happy to move the discussion somewhere else, and we will help diagnose whatever you need. We went through this same parade with Node 0.12 and had to downgrade to 0.10.
Second, and I guess the real question here: is Heroku's 512MB of RAM simply not enough to run Node 4.x? If that is the case, cool, but the memory constraints definitely need to be made clearer.
Timeline:
In general you can see the memory is all over the place; maybe that is expected with newer versions of V8...
Although I don't have a screenshot, you can see in the first part of the graph, running Node 0.10, that memory stays almost perfectly flat at 256MB of RAM. Under any load, that was consistent.
For reference, here is a load test we did in a dev environment running Node 4.2.1, cluster forked to 4 processes, at about 5k requests per minute. It also immediately hit the higher 1GB memory limit. We then dropped this down to 2 forked processes with the same result.
@rvagg @trevnorris @Fishrock123 ... ideas?
What happens when you start with (for example) `node --max_old_space_size=256`? The GC is lazy (lazier than v0.10) and the default upper limit is about 1.5 GB, so you'll need to cap it if your machine has less memory.
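For anyone following along, a quick way to confirm the cap actually took effect is to read V8's reported heap limit from inside the process. A minimal sanity-check sketch, assuming Node 4+ where the `v8` module exposes `getHeapStatistics()` (the file name is made up):

```js
// check-heap.js (hypothetical file name)
// Run with: node --max_old_space_size=256 check-heap.js
var v8 = require('v8');

var limitMb = v8.getHeapStatistics().heap_size_limit / 1024 / 1024;
// With the flag above this should print roughly 256 MB plus a little
// headroom for the other spaces; without the flag it will be near
// V8's ~1.5 GB default on 64-bit.
console.log('heap_size_limit: ' + limitMb.toFixed(1) + ' MB');
```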
Okay, just tried running with `--max_old_space_size=256` using cluster and 4 processes. Hit the limit immediately, so dropping down to 1 process again. Is the 256MB per process?
```
2015-10-14T16:46:29.457954+00:00 heroku[web.1]: source=web.1 dyno=heroku.18327029.28d659a4-2e6e-4ee5-9fb9-18dc6d99e1de sample#memory_total=578.31MB sample#memory_rss=445.58MB sample#memory_cache=0.00MB sample#memory_swap=132.73MB sample#memory_pgpgin=338334pages sample#memory_pgpgout=225287pages
```
> Is the 256MB per process?

Yes.
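To make the per-process point concrete: with `cluster`, every worker is its own V8 instance, so each one gets its own old-space cap. A minimal sketch under that assumption (`./server` is a placeholder entry point; workers inherit the master's `--max_old_space_size` via the default `execArgv`):

```js
// Run with: node --max_old_space_size=256 cluster-example.js
// Each worker inherits the flag, so 4 workers can together use roughly
// 4 x 256 MB of old space (plus new space, Buffers, code, etc.).
var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  var workers = parseInt(process.env.WEB_CONCURRENCY, 10) || os.cpus().length;
  for (var i = 0; i < workers; i++) {
    cluster.fork();
  }
} else {
  require('./server'); // placeholder for the actual app entry point
}
```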
Dropping down to 1 process, `--max_old_space_size=256` and `--max_old_space_size=128` both returned VERY consistent results. Huge difference. Here is running at 128:
And here is running at `--max_old_space_size=512`:
@bnoordhuis ... looking at this, we definitely want to make sure this is documented better
Yes very eye-opening for me at least. Fingers crossed, but I think you guys just saved us from having to revert everything again. Really appreciate the quick response. Going to leave this open through end of day while we do a few more tests on our end.
@jasnell I don't disagree but where would you put it? Maybe it's time we add a FAQ.
FWIW I just added a FAQ to the wiki here and it is linked to on the main wiki page. Perhaps this could also be linked to somewhere on nodejs.org and other places?
/cc @hunterloftis
Perhaps dig through a few old issues that shared the same characteristics so Google/SEO/etc. can drive people in the right direction?
@friism I recommend `--max_old_space_size` when this occasionally pops up; the biggest downside is that when you specify that flag and then exceed the limit, your app will shut down hard. Heroku allows you to burst 5x the memory limit, so it's not always best to crash an app instead of letting it temporarily exceed soft limits.
This is probably documentation we should provide as well in a 'Memory-management with Node on Heroku' article.
@hunterloftis From what we've seen, the second we hit a memory limit, everything pretty much starts timing out. I'd actually rather have a hard kill at that point so we can get a fresh node up that can respond to requests again.
You mention that if we specify `--max_old_space_size` the node will shut down immediately if we hit that limit, but doesn't that flag have to do with V8's heap and when it should start cleaning things up? I don't see how that relates to Heroku's memory limit and when a node is killed (other than exceeding Heroku's limit, of course). In the tests above where we passed `--max_old_space_size=128`, the memory on the node was around 220MB.
@AndrewBarba yeah it looks like your app is particularly hard-hit by lazy collection, and in that case the lesser evil is shutting down a process.
I'd be interested in hearing from @bnoordhuis on this but, from what I've seen, if your app _requires_ more space than `max_old_space_size` allows (i.e., you're actually storing > X MB in objects, vs. storing < X MB in objects and needing more frequent sweeps to clear out things you don't store anymore)... then you'll get 'allocation failed' errors (hard shutdowns).
Also keep in mind that 'old space' is just one part of the application's memory footprint, so you can expect the app as a whole to take more than `max_old_space_size`.
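A small illustration of that last point: `process.memoryUsage()` reports the resident set alongside the V8 heap, and the RSS normally sits well above `heapTotal`, which is consistent with the process capped at 128 above still using around 220MB. Nothing here is Heroku-specific:

```js
// The V8 heap is only one part of the footprint; Buffers, code, stacks,
// and other native allocations all live outside "old space".
var usage = process.memoryUsage();

console.log('rss:       ' + (usage.rss / 1048576).toFixed(1) + ' MB');
console.log('heapTotal: ' + (usage.heapTotal / 1048576).toFixed(1) + ' MB');
console.log('heapUsed:  ' + (usage.heapUsed / 1048576).toFixed(1) + ' MB');
```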
Might try playing with flags like `--gc_interval` and/or `--always_compact`. They will slow down the application, but if there's a memory crunch then it's worth giving them a go.
For a kitchen sink example with arbitrary values:

```
node --gc_global --optimize_for_size --max_old_space_size=960 --use_idle_notification --always_compact --max_executable_size=64 --gc_interval=100 --expose_gc server.js
```
@hunterloftis Yes, `--max_old_space_size` is a hard limit. `--use_idle_notification` is currently a no-op, by the way.
Just weighing in to share my experience: I had this happen to me with 0.12.x and 4.0.x. Before I migrated everything to 4.0, Heroku Support recommended `--max_old_space_size` and it worked like a charm. I still get some instances of the error every now and then, but the affected dyno shuts down and no great harm is done, just a couple dozen timed-out requests once a week or so (out of thousands per minute).
Also, migrating to 4.0.x cut memory usage down from an average of about 500MB to 270MB per dyno.
@fansworld-claudio That's great to hear. What size dyno are you running, and what did you end up setting `max_old_space_size` to? Also, have you used this setting with cluster?
@AndrewBarba At first I had the Standard-2X (1GB) dynos and max_old at 960; then, after a couple weeks of stability, I scaled it down to Standard-1X (512MB) and max_old at 480, and it has been stable for a couple of months. YMMV though, as this is a REST API that does mostly I/O with Redis and Mongo.
It's kind of a pain to set this for all your Heroku apps (and remember to vary it based on dyno size), so I made heroku-node, which wraps node and sets `max_old_space_size` based on the values of `$WEB_MEMORY` and `$WEB_CONCURRENCY`, which Heroku sets based on the size of dyno you've chosen.
@apechimp I just gave that a go and this showed up in the console immediately:

```
2015-10-31T14:51:14.352577+00:00 app[web.1]: <--- Last few GCs --->
2015-10-31T14:51:14.352577+00:00 app[web.1]:
2015-10-31T14:51:14.352579+00:00 app[web.1]: 14856 ms: Mark-sweep 48.0 (86.0) -> 48.0 (86.0) MB, 93.5 / 0 ms [allocation failure] [GC in old space requested].
2015-10-31T14:51:14.352580+00:00 app[web.1]: 14967 ms: Mark-sweep 48.0 (86.0) -> 48.0 (86.0) MB, 111.4 / 0 ms [allocation failure] [GC in old space requested].
2015-10-31T14:51:14.352581+00:00 app[web.1]: 15055 ms: Mark-sweep 48.0 (86.0) -> 48.0 (86.0) MB, 87.9 / 0 ms [last resort gc].
2015-10-31T14:51:14.352582+00:00 app[web.1]: 15178 ms: Mark-sweep 48.0 (86.0) -> 48.0 (86.0) MB, 122.7 / 0 ms [last resort gc].
2015-10-31T14:51:14.352582+00:00 app[web.1]:
2015-10-31T14:51:14.352583+00:00 app[web.1]:
2015-10-31T14:51:14.352583+00:00 app[web.1]: <--- JS stacktrace --->
2015-10-31T14:51:14.352584+00:00 app[web.1]:
2015-10-31T14:51:14.352585+00:00 app[web.1]: ==== JS stack trace =========================================
```
Looks like you are dividing `WEB_MEMORY` by some factor of concurrency, but `WEB_MEMORY`, as explained in the article you reference, is the recommended memory for each process. As is, running this on a Performance-M dyno would give you:

```
WEB_CONCURRENCY = 5;
WEB_MEMORY = 512;
WEB_MEMORY / (2 * WEB_CONCURRENCY) = 51.2MB
```

51.2MB of RAM is not enough to even start up. I think you can just simplify to `WEB_MEMORY / 2` or even `WEB_MEMORY / 1.5`. For us 256 seems to be the sweet spot, so we'll stick with `/ 2`.
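For what the corrected arithmetic could look like in practice, here is a sketch of the `WEB_MEMORY / 2` suggestion as a small launcher. This is not the heroku-node module's actual code; the wrapper file name and `server.js` entry point are placeholders:

```js
// start.js (hypothetical wrapper): derive the old-space cap from Heroku's
// WEB_MEMORY (recommended MB per process) and re-launch node with the flag.
var spawn = require('child_process').spawn;

var webMemory = parseInt(process.env.WEB_MEMORY, 10) || 512;
var oldSpace = Math.floor(webMemory / 2); // e.g. 256 on a 512 MB dyno

var child = spawn(process.execPath,
  ['--max_old_space_size=' + oldSpace, 'server.js'],
  { stdio: 'inherit' });

child.on('exit', function (code) {
  process.exit(code === null ? 1 : code);
});
```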
@AndrewBarba, _mea culpa_. That's fixed as of [email protected].
Couldn't the garbage collector also be executed when getting an out-of-memory exception? This would help in a more generic way on systems with really low memory constraints; for example, when executing npm on NodeOS it starts killing processes on QEMU with the default memory settings (128MB), and the same would happen on a Raspberry Pi without swap...
@piranna What exact exception do you mean? If you are having allocation failures in mind, catching all of those in some generic way is a terrible idea — many things could go wrong from that. If you are speaking about the system oom killer, intercepting it requires specific settings on the system level.
> @piranna What exact exception do you mean?

The error I get is an Out of Memory exception, thrown at the process level:
It happens when calling `malloc()` and the system doesn't have enough free memory or a contiguous free chunk big enough for the requested size. I propose that, when creating a new JavaScript object, a failed memory reservation be caught in the C++ layer; if that happens, run the garbage collector and try the reservation again. Probably this is more of a V8 issue...
@piranna That screenshot you have posted shows the OOM killer being triggered. It's not interceptable by default.
Also, it's not necessarily caused by the `node` process. Your process could allocate some memory at the beginning, then do nothing and still be killed by the OOM killer later (while doing nothing).
> @piranna That screenshot you have posted shows the OOM killer being triggered. It's not interceptable by default.

What a shame, I thought it would be a good feature :-(
> Also, it's not necessarily caused by the node process. Your process could allocate some memory at the beginning, then do nothing and still be killed by the OOM killer later (while doing nothing).

Yeah, I know it kills random processes after that. In fact, in that screenshot the error was caused by slap, but nsh stopped working too :-/
For what it's worth, here's the solution we're trying now (I sure wish I had found this conversation before!!):
```sh
if [ ! "$WEB_MEMORY" = "" ]; then
  if [ $WEB_MEMORY -le 512 ]; then
    NODE_FLAGS="--max_semi_space_size=2 --max_old_space_size=256 --max_executable_size=192"
  elif [ $WEB_MEMORY -le 768 ]; then
    NODE_FLAGS="--max_semi_space_size=8 --max_old_space_size=512 --max_executable_size=384"
  elif [ $WEB_MEMORY -le 1024 ]; then
    NODE_FLAGS="--max_semi_space_size=16 --max_old_space_size=1024 --max_executable_size=512"
  fi
fi

node $NODE_FLAGS "$@"
```

I'm getting the values for those flags from the V8 defaults, and it seems to be working great so far (Heroku / 512MB).
> the default upper limit is about 1.5 GB

Could it be possible to lower it when the sum of physical memory + swap is smaller than this? It doesn't make sense to have such a big limit if it's impossible to reach it... This would help on memory-constrained systems.
@piranna see https://github.com/nodejs/node/issues/3370#issuecomment-148514862 and https://github.com/nodejs/node/issues/3370#issuecomment-158521878
Yes, I've read them. What I'm asking for is to calculate and set them automatically on start, instead of setting the values with flags.
Or even a single flag could be a nice option: `node --physical_memory=$WEB_MEMORY`
I also added custom settings for modulus.io's 396MB servos:

```sh
if [ $1 -le 396 ]; then
  NODE_FLAGS="--max_semi_space_size=1 --max_old_space_size=198 --max_executable_size=148"
elif ...
```
> Or even a single flag could be a nice option: `node --physical_memory=$WEB_MEMORY`

Isn't the `max_old_space_size` flag equivalent to this? Maybe it could be extended to accept `auto` to calculate it from the current system memory...
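There is no such `auto` value today, but the idea is easy to approximate in userland, building on the wrapper sketch above. A rough sketch that falls back to the machine's physical memory via `os.totalmem()` when `WEB_MEMORY` isn't set (file name, `server.js`, and the `/ 2` ratio are all arbitrary assumptions):

```js
// auto-heap.js (hypothetical launcher approximating an "auto" cap)
var os = require('os');
var spawn = require('child_process').spawn;

var totalMb = parseInt(process.env.WEB_MEMORY, 10) ||
              Math.floor(os.totalmem() / 1048576);
var oldSpace = Math.floor(totalMb / 2); // leave headroom for the rest of the process

spawn(process.execPath, ['--max_old_space_size=' + oldSpace, 'server.js'],
  { stdio: 'inherit' });
```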
So because setting `max-old-space-size` just crashes your application when the limit is reached, it isn't very helpful. I made https://github.com/HustleInc/regiment, which will seamlessly create and preemptively replace workers so that you never hit `max-old-space-size`.
Follow up: this is pure win. I'm running several 2X dynos in production for a very leaky app with `Regiment.middleware.MemoryFootprint(750)` and I've had no dropped requests, no memory overuse, and my aggregate memory usage hovers around 700-750MB/dyno.
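This is not Regiment's actual implementation, but the general technique of preemptively recycling workers by memory footprint can be sketched with nothing more than `cluster` and `process.memoryUsage()` (the `./server` entry point and the 10-second poll interval are assumptions; 750 mirrors the `MemoryFootprint(750)` value above):

```js
var cluster = require('cluster');

if (cluster.isMaster) {
  cluster.fork();
  // Whenever a worker exits (including a memory-triggered recycle below),
  // replace it so there is always a worker serving requests.
  cluster.on('exit', function () {
    cluster.fork();
  });
} else {
  require('./server'); // placeholder for the actual app

  var LIMIT_MB = 750;
  setInterval(function () {
    var rssMb = process.memoryUsage().rss / 1048576;
    if (rssMb > LIMIT_MB) {
      // Stop accepting new connections, let in-flight requests finish,
      // then exit; the master forks a fresh worker to take over.
      cluster.worker.disconnect();
    }
  }, 10000).unref();
}
```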
You do not want to just set `--max-semi-space-size`, `--max-old-space-size`, `--max-executable-size`, etc., because when those limits are reached your node process simply crashes and any currently running requests are cut short.
Thank you for the detailed analysis.
Just spent all day debugging a critical infrastructure performance issue, and finally found that one of the services had been moved to v4.2.8. After rolling back to v0.12, everything returned to normal.
@TylerBrock you mentioned that:

> So because setting max-old-space-size just crashes your application when the limit is reached

I've been experimenting and noticed that if `--max-old-space-size` is used without `--max_semi_space_size` also being adjusted then, yes, the application will get killed. If `--max_semi_space_size` is set to `1`, then usually it doesn't get killed.
If anyone's interested, I've written more about the experiment here. If you have any ideas on how I can improve the experiment and my understanding, I'd really appreciate it.
I am testing `--max-old-space-size=128` and I can confirm that when the application exceeds 128MB of actually used memory, the process gets killed.
Hello, the info in this issue was very useful. I made a module based on @joanniclaborde's suggestion.
It is here: https://github.com/damianmr/heroku-node-settings
The good thing is that the node process never reaches its maximum available memory, so it never crashes on Heroku.
We have been using it in a real project with thousands of requests per minute and have had no crashes whatsoever.
I encourage you to try it out and report any problems you find.
Nice!! We've been using those same settings for a few months now, and it's running fine. Good idea to turn that code into a module, @damianmr!
🔥 I had a terrible memory leak somewhere: it turned out to be another node module that my system depends on. I took it out and my memory usage went from an average of 8GB down to 4GB. That still meant that I had to pay for the $500 a month plan to keep my app from exceeding the memory limit of the lower plans. I looked everywhere and haven't been able to find why my app is still consuming so much memory.
💯 🙌 Then I came across this today and tried @damianmr's `heroku-node-settings`. Now I have more time to look into any potential leaks without having to fork over $500/month, since I've downgraded to the 2X plan and can actually run more than one dyno! Thanks everyone for the tips that led to the solution. Here's a pic that's worth sharing:
Strangely enough, my app's memory usage is showing as roughly constant now rather than the exponential rise from before.