Pm2: PM2 sporadically dies and loses processes

Created on 11 Mar 2016 · 17 Comments · Source: Unitech/pm2

It's happened a few times already. I notice that the processes are gone because metrics are not being generated and pm2 ls gives:

$ pm2 ls
[PM2] Spawning PM2 daemon
[PM2] PM2 Successfully daemonized

which would indicate that the PM2 daemon was dead. However there is nothing in $HOME/.pm2/pm2.log that would indicate why this happens. Running on OS X. I monitor load and memory usage of the machine and there is nothing interesting happening in that regard.

Any ideas?

In progress

All 17 comments

I may have found why this occurs. lsof shows node holding a large number of open handles on $HOME/.pm2/rpc.sock, which could be explained by my incorrect use of the PM2 API.

Make sure that you call pm2.disconnect() before exiting your app
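
A minimal sketch of the connect-once, always-disconnect pattern, assuming the PM2 programmatic API's connect(cb), list(cb) and disconnect() calls. The helper name withPm2 is hypothetical, and the PM2 client is passed in as a parameter rather than assuming how your app loads it.

```javascript
// Hypothetical helper: open one connection to the PM2 daemon, run `work`,
// and always disconnect afterwards, even when `work` reports an error.
// `pm2` is expected to be the object returned by require('pm2').
function withPm2(pm2, work, done) {
  pm2.connect((connectErr) => {
    if (connectErr) return done(connectErr);
    work(pm2, (workErr, result) => {
      // Release the rpc.sock connection before reporting back.
      pm2.disconnect();
      done(workErr, result);
    });
  });
}

// Usage (assumes pm2 is installed):
// withPm2(require('pm2'), (pm2, cb) => pm2.list(cb), (err, list) => {
//   if (err) { console.error(err); process.exit(2); }
//   console.log(list.map((p) => p.name));
// });
```

Routing every API call through one helper like this keeps a single open connection per run, instead of one per call.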

Added to PM2 documentation: https://github.com/pm2-hive/pm2-hive.github.io/commit/67ff803c2f9c0ca16410b30ea77471d541cd69ba

Would this issue be linked with #1983?

I fixed my code by making the app connect only once and disconnect on exit, and had high hopes that it would fix the issue. However, the problem just recurred. The main pm2 daemon had disappeared and, once again, there is no meaningful output in $HOME/.pm2/pm2.log. Not sure how to debug this.

_PM2 v2.1.5 died with these logs:_
_Node v5.12.0_

2016-11-19 06:00:02: ===============================================================================
2016-11-19 06:00:02: --- PM2 global error caught ---------------------------------------------------
2016-11-19 06:00:02: Time                 : Sat Nov 19 2016 06:00:02 GMT+0700 (ICT)
2016-11-19 06:00:02: Cannot read property 'on' of undefined
2016-11-19 06:00:02: TypeError: Cannot read property 'on' of undefined
    at buildProcessTree (/usr/lib/node_modules/pm2/lib/TreeKill.js:81:14)
    at module.exports (/usr/lib/node_modules/pm2/lib/TreeKill.js:32:9)
    at Object.God.killProcess (/usr/lib/node_modules/pm2/lib/God/Methods.js:223:7)
    at Object.God.stopProcessId (/usr/lib/node_modules/pm2/lib/God/ActionMethods.js:261:9)
    at Object.God.deleteProcessId (/usr/lib/node_modules/pm2/lib/God/ActionMethods.js:313:9)
    at Worker.onListen (/usr/lib/node_modules/pm2/lib/God/Reload.js:134:18)
    at Worker.g (events.js:273:16)
    at emitOne (events.js:90:13)
    at Worker.emit (events.js:182:7)
    at listening (cluster.js:504:12)
2016-11-19 06:00:02: ===============================================================================

@tuanna222 Which platform are you on?

@vmarchaud
Here is my platform info:

Linux app-server 2.6.32-573.el6.x86_64 #1 SMP Thu Jul 23 15:44:03 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description:    CentOS release 6.8 (Final)
Release:    6.8
Codename:   Final

Cause of the error: I run my app in cluster mode and have a cron job that runs the command: pm2 reload all
If I run the app in fork mode with the same daily cron job to reload all, PM2 does not crash.

We will do a small patch to avoid this but please note that you are using Node.js 5.x (Node v5.12.0) that is NOT a stable version. You need to upgrade your Node.js version to at least 6.x. It will certainly fix this issue.

The failure is equivalent to what would happen with this snippet:

const spawn = require('child_process').spawn;
const ls = spawn('ls', ['-lh', '/usr']);

ls.stdout.on('data', (data) => {
  console.log(`stdout: ${data}`);
});

ls.stderr.on('data', (data) => {
  console.log(`stderr: ${data}`);
});

ls.on('close', (code) => {
  console.log(`child process exited with code ${code}`);
});

In that snippet, it is as if ls were null.
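
A hedged sketch of how such a call site could defend itself. It assumes that on some spawn failures (the logs in this thread show EAGAIN) the returned child may lack its stdout/stderr streams. The helper names onStreamData and spawnGuarded are hypothetical and are not PM2's actual patch.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: attach a data listener only if the stream exists.
// Returns true when the listener was attached, false otherwise.
function onStreamData(stream, label, log) {
  if (!stream) return false;
  stream.on('data', (data) => log(`${label}: ${data}`));
  return true;
}

// Hypothetical wrapper around spawn() that never dereferences a missing
// stdout/stderr and never leaves the 'error' event unhandled.
function spawnGuarded(command, args, log = console.log) {
  const child = spawn(command, args);
  // Without this listener, a failed spawn surfaces as an
  // "Unhandled 'error' event" exception.
  child.on('error', (err) => log(`spawn failed: ${err.message}`));
  onStreamData(child.stdout, 'stdout', log);
  onStreamData(child.stderr, 'stderr', log);
  child.on('close', (code) => log(`child process exited with code ${code}`));
  return child;
}

// Usage: spawnGuarded('ls', ['-lh', '/usr']);
```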

@vmarchaud
I updated Node.js from 5.12.0 to 7.0.0, but it still didn't work in cluster mode. PM2 still stopped after reloading.

2016-11-21 09:44:51: Starting execution sequence in -cluster mode- for app name:npm-api id:17
2016-11-21 09:44:51: Stopping app:npm-api id:_old_17
2016-11-21 09:44:51: ===============================================================================
2016-11-21 09:44:51: --- PM2 global error caught ---------------------------------------------------
2016-11-21 09:44:51: Time                 : Mon Nov 21 2016 09:44:51 GMT+0700 (ICT)
2016-11-21 09:44:51: Cannot read property 'on' of undefined
2016-11-21 09:44:51: TypeError: Cannot read property 'on' of undefined
    at buildProcessTree (/usr/lib/node_modules/pm2/lib/TreeKill.js:81:14)
    at module.exports (/usr/lib/node_modules/pm2/lib/TreeKill.js:32:9)
    at Object.God.killProcess (/usr/lib/node_modules/pm2/lib/God/Methods.js:223:7)
    at Object.God.stopProcessId (/usr/lib/node_modules/pm2/lib/God/ActionMethods.js:261:9)
    at Object.God.deleteProcessId (/usr/lib/node_modules/pm2/lib/God/ActionMethods.js:313:9)
    at Timeout._onTimeout (/usr/lib/node_modules/pm2/lib/God/Reload.js:158:18)
    at tryOnTimeout (timers.js:224:11)
    at Timer.listOnTimeout (timers.js:198:5)
2016-11-21 09:44:51: ===============================================================================
2016-11-21 09:44:51: [PM2][%s] Resurrecting PM2
2016-11-21 09:44:51: App name:npm-api id:17 online
events.js:154
      throw er; // Unhandled 'error' event
      ^

Error: spawn ps EAGAIN
    at exports._errnoException (util.js:893:11)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:182:32)
    at onErrorNT (internal/child_process.js:348:16)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickDomainCallback (internal/process/next_tick.js:122:9)
2016-11-21 09:45:13: ===============================================================================
2016-11-21 09:45:13: --- New PM2 Daemon started ----------------------------------------------------
2016-11-21 09:45:13: Time                 : Mon Nov 21 2016 09:45:13 GMT+0700 (ICT)
2016-11-21 09:45:13: PM2 version          : 2.1.5
2016-11-21 09:45:13: Node.js version      : 7.0.0
2016-11-21 09:45:13: Current arch         : x64
2016-11-21 09:45:13: PM2 home             : /home/app24h-user/.pm2
2016-11-21 09:45:13: PM2 PID file         : /home/app24h-user/.pm2/pm2.pid
2016-11-21 09:45:13: RPC socket file      : /home/app24h-user/.pm2/rpc.sock
2016-11-21 09:45:13: BUS socket file      : /home/app24h-user/.pm2/pub.sock
2016-11-21 09:45:13: Application log path : /home/app24h-user/.pm2/logs
2016-11-21 09:45:13: Process dump file    : /home/app24h-user/.pm2/dump.pm2
2016-11-21 09:45:13: Concurrent actions   : 2
2016-11-21 09:45:13: SIGTERM timeout      : 1600
2016-11-21 09:45:13: ===============================================================================

Could you please try the development branch:

$ npm install Unitech/pm2#development -g
$ pm2 update

It will work now

@Unitech
It worked with development branch.
Thank you so much!

Hello and thanks for looking into it. Just wanted to mention I ran into this problem yesterday and in a minute I will try the development branch. Before I do, here is a reference to my current setup (Arch Linux on an old Intel laptop)

[Image: pm2]

@tuanna222 great!
@ordtrogen could you please show me the content of ~/.pm2/pm2.log?

Oh, I hadn't checked that log ... I'm getting a lot of

2016-11-22 14:52:11: Error: ENOSPC: no space left on device, write

This is what happened when I tried to upgrade to the development branch. I guess I should look into that first, because I don't understand how the disk can be full ...
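
For a quick triage of ~/.pm2/pm2.log, here is a minimal sketch that counts the failure signatures quoted in this thread (ENOSPC, spawn ... EAGAIN, and the "PM2 global error caught" banner). The function name summarizePm2Log and the category names are hypothetical; this is not a PM2 feature.

```javascript
// Hypothetical triage helper: count known failure signatures in pm2.log text.
// The signatures are taken from the log excerpts quoted in this thread.
const SIGNATURES = {
  diskFull: /ENOSPC/,
  spawnLimit: /spawn \S+ EAGAIN/,
  daemonCrash: /PM2 global error caught/,
};

function summarizePm2Log(text) {
  const counts = { diskFull: 0, spawnLimit: 0, daemonCrash: 0 };
  for (const line of text.split('\n')) {
    for (const [name, pattern] of Object.entries(SIGNATURES)) {
      if (pattern.test(line)) counts[name] += 1;
    }
  }
  return counts;
}

// Usage (log path as given in this thread):
// const fs = require('fs');
// const os = require('os');
// const path = require('path');
// const log = fs.readFileSync(path.join(os.homedir(), '.pm2', 'pm2.log'), 'utf8');
// console.log(summarizePm2Log(log));
```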

Oh, embarrassment ... After freeing some space, it's working again. I think I'll postpone the upgrade unless you recommend otherwise.

It looks like the problem is back:

  • Linux, Ubuntu 18.04.1 LTS
  • Node v8.10.0
  • PM2 3.2.2
  • Just updated to 3.2.4, hoping this is resolved in the new version
    * Any chance of adding a condition to ignore the "'on' of undefined" case?

It also works great for a couple of hours, then the daemon randomly stops with this message:

2018-12-23T00:09:23: PM2 log: ===============================================================================
2018-12-23T00:09:23: PM2 log: --- PM2 global error caught ---------------------------------------------------
2018-12-23T00:09:23: PM2 log: Time                 : Sun Dec 23 2018 00:09:23 GMT+0000 (UTC)
2018-12-23T00:09:23: PM2 error: Cannot read property 'on' of undefined
2018-12-23T00:09:23: PM2 error: TypeError: Cannot read property 'on' of undefined
    at /usr/local/lib/node_modules/pm2/lib/God/ForkMode.js:142:19
    at /usr/local/lib/node_modules/pm2/node_modules/async/internal/once.js:12:16
    at next (/usr/local/lib/node_modules/pm2/node_modules/async/waterfall.js:21:29)
    at /usr/local/lib/node_modules/pm2/node_modules/async/internal/onlyOnce.js:12:16
    at WriteStream.<anonymous> (/usr/local/lib/node_modules/pm2/lib/Utility.js:181:13)
    at emitOne (events.js:116:13)
    at WriteStream.emit (events.js:211:7)
    at fs.open (fs.js:2162:10)
    at FSReqWrap.oncomplete (fs.js:135:15)
2018-12-23T00:09:23: PM2 log: ===============================================================================
2018-12-23T00:09:23: PM2 error: [PM2] Resurrecting PM2
events.js:183
      throw er; // Unhandled 'error' event
      ^

Error: spawn node EAGAIN
    at _errnoException (util.js:1022:11)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
    at onErrorNT (internal/child_process.js:372:16)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at Immediate._tickDomainCallback [as _onImmediate] (internal/process/next_tick.js:218:9)
    at runCallback (timers.js:794:20)
    at tryOnImmediate (timers.js:752:5)
    at processImmediate [as _immediateCallback] (timers.js:729:5)