When requests are made too frequently (e.g. an infinite loop sending requests), the server fails to respond to them.
...
After the first couple hundred requests, the server stops responding and also starts to use a lot of RAM, until the server crashes because of the leak.
...
Run a server with pm2 and send a lot of requests.
...
OS: Ubuntu 16.04, CentOS 7, CentOS 6
Node.js: 6.x
PM2: 1.1.3
Fork mode? Cluster mode? HTTP framework?
Cluster mode is even slower; this happens in fork mode too, with both Express and the bare `http` module, which behave the same. Just write a simple echo server, request it in an infinite loop, and compare `pm2 start app.js` against `node app.js`. As I mentioned, don't even bother comparing cluster mode, it's way slower.
Sample script, please?
I'm also seeing that after a few hours/days my node server will stop responding to requests. I'm not sure why - it's pretty hard to debug. I'm also using PM2 1.1.3, cluster mode, and noticed the problem within the past 2 weeks.
https://stackoverflow.com/questions/29812692/node-js-server-timeout-problems-ec2-express-pm2 same problem, a year ago though. Might not be related to PM2. Any advice on debugging this?
I'm going to go out on a limb and guess that this isn't a problem with PM2 itself, but rather with the underlying application. Sometimes your process will get stuck in an infinite loop or forget to respond to a request.
I'm going to handle this by detecting when a request hasn't been responded to in > 10 seconds and then calling `process.exit(0);` to get PM2 to restart the process. I'll also include a notification specifying which request failed, which should give me an idea of where the bug lies.
I'll report back if I learn more.
I've learned a bit more.
What I changed: I added an 8-second timer to every HTTPS request. If it doesn't respond within that time, I log the route it got stuck on and call `process.exit(0);` so that PM2 will restart my process (see implementation below).
What happened: I waited 2 days and was just notified that the restart happened. The route that was logged is one that gets called tens of thousands of times per day, so it's very strange that it works seamlessly for two days straight and then doesn't.
There are a number of things that could cause this:
Although previously, when it got stuck, it didn't get unstuck until I restarted the process... so maybe the entire DB connection is going down and I'm not reconnecting properly? Hard to tell. Either way, I don't think this is an issue with PM2 itself. The fix is good enough for my needs for now and not hard to implement at all.
On request init:
```javascript
this._timeout = setTimeout(() => {
  this._timeout = null;
  var error = new Error('[' + new Date() + '] Request did not respond within 8 seconds: ' +
    path + ', ' + JSON.stringify(queryData));
  pmx.notify(error);
  console.error(error);
  process.exit(0);
}, 8000);
```
On request response:
```javascript
if (this._timeout) {
  clearTimeout(this._timeout);
  this._timeout = null;
}
```
@arasmussen Is it possible that your instance ran out of memory? From previous weird issues, we know that pmx can, in certain unknown circumstances, leak memory on every request, so maybe the problem comes from that?
@vmarchaud I don't think so, my memory usage over the past 2 weeks is mostly flat although I've seen multiple restarts. It's not clear how the process gets in this state or what the root problem is... But I'm able to detect when it happens and restart so I'm still successfully serving 99.9% of requests.
@arasmussen I'll close the issue then; if you have a suspicion about where the issue comes from, re-open it with more information! I advise you to update to the latest PM2 version, released last week, which has multiple improvements.
This issue seems to be related to #1484
At least what @arasmussen described is exactly what users in #1484 noticed.
This problem can have two causes:
If anyone has this problem, please +1 to show that it's a recurring problem, thanks
@arasmussen I am using your solution (`_timeout`), but I have one more problem: I reset the timeout when sending the response, so when a request comes in on a route I haven't defined, no response is sent and the server restarts. How can I reset the timeout for routes I haven't defined?
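One way to avoid per-route timer bookkeeping (a hypothetical sketch, not from this thread; `withWatchdog` is my own name) is to arm and clear the timer in a single wrapper that every unit of work passes through, so no individual route has to remember to clear it:

```javascript
// Hypothetical helper: arm a watchdog around any unit of work; if done()
// isn't called within `ms` milliseconds, log the label and exit so PM2
// restarts the process.
function withWatchdog(ms, label, fn) {
  let timer = setTimeout(() => {
    timer = null;
    console.error('[' + new Date() + '] ' + label + ' did not finish within ' + ms + 'ms');
    process.exit(0); // PM2 restarts the process
  }, ms);
  fn(function done() {
    if (timer) {
      clearTimeout(timer);
      timer = null;
    }
  });
}
```

A handler would call `done()` exactly once when it has responded; the wrapper owns the timer, not the route.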
@vmarchaud Can I switch to fork mode for this problem? I am using cluster mode on my dev server and fork mode on my prod server, but I am facing the same problem on both. On my prod server I am not using Docker, only PM2.
On the prod server I am using:
node = v0.10.25
pm2 = 2.0.12
We run multiple clusters under Node v6.9 & PM2 v2.2 behind nginx (config), and it works great for us.
@arasmussen I used the same solution you mentioned above, and it is restarting my server continuously. The first issue is that I am not able to use the admin panel of my website, because the session expires on every restart. The second issue is that I am getting the error `pmx.notify() is not a function`. Help me.
@vmarchaud I have updated to Node v7.10.0 and PM2 2.4.6, but I am still getting the same exception. Please tell me where the issue lies: Node, PM2, or something else? And if you have a solution, please tell me how to fix this problem.
What is the best way to use a mongoose connection across different models? I am using an Express server.
I have the same issue, and it happens constantly.
Nginx error log:

```
upstream timed out (110: Connection timed out) while connecting to upstream
```

The upstream is pm2.
pm2 = 2.0.12
node = v4.5.0
This is still an issue that remains unresolved.
I have submitted a ticket with Keymetrics, still unresolved, about having to run `pm2 link` daily: after many "buffer too large" errors it eventually unlinks itself. Additionally, the memory of the "PM2 KM Agent" swells because of this activity until the server runs out of memory and the applications crash.
This causes other issues: sometimes the processes don't die correctly and multiple instances end up running, so `pm2 restart` fails and we have to run `pm2 update` to get back into a running state. That's if the server responds enough to connect at all; most of the time it takes up so much memory that it requires a full server reboot.
There is no visibility on Keymetrics into the resources taken up by the KM agent process, so this little issue has been crashing our application over and over with memory exceptions while our visible processes in Keymetrics are at a total of 20-30% of RAM.
We have had to completely stop the KM agent with `pm2 interact stop`, and we have encountered no issues at all since then.
Hi @jdforsythe
Did you try with the latest pm2 (not yet released) and the new Keymetrics interface?
I know the agent + APM have been completely rewritten, as have parts of the backend.
Maybe that will solve your problem, or at least bring more information about the root cause.
But your point about monitoring the agent and PM2 itself could be a good feature!