Rocket.Chat vs. NodeJS 8.11.1 (or rather > 8.9.4): Random SEGV (segmentation violation)

Created on 4 Apr 2018 · 13 comments · Source: RocketChat/Rocket.Chat

Description:

This happens when running Rocket.Chat 0.61.2 as well as 0.63.0 on NodeJS 8.11.1. Neither version exhibits this behaviour when run on NodeJS 8.9.4. I also had NodeJS 8.11.1 and Rocket.Chat running for a while on my testing instance without any crashes, which leads me to think that the culprit is NodeJS 8.11.1 in combination with:

  • either the load of the Rocket.Chat server
  • or the data in MongoDB

Server Setup Information:

  • Version of Rocket.Chat Server: 0.63.0 & 0.61.2 (this may affect other versions)
  • Operating System: Oracle Linux 7
  • Deployment Method(snap/docker/tar/etc): tar
  • Number of Running Instances: 1
  • DB Replicaset Oplog: -
  • Node Version: 8.11.1
  • MongoDB Version: 2.6.12

Steps to Reproduce:

I can only guess here:

  • run a decently sized (in terms of number of users) Rocket.Chat server on NodeJS 8.11.1 (one way to check which Node binary the service actually uses is sketched right below this list)
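For reference, a way to confirm which Node binary a systemd-managed Rocket.Chat actually executes (the unit name matches this report's logs; the binary path shown is only an example, your install will differ):

systemctl show rocketchat.service -p ExecStart
/usr/local/bin/node --version

The first command prints the unit's ExecStart line, i.e. the exact node binary being launched; running that binary with --version confirms whether it is 8.11.1 or 8.9.4.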

Expected behavior:

No SEGV

Actual behavior:

SEGV at random intervals, each time followed by an automatic restart of Rocket.Chat (due to the systemd unit definition).
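The automatic restarts come from the systemd unit's restart policy. A minimal sketch of the relevant part of such a unit (an assumption about this setup, not the actual file from the server; the ExecStart path is an example):

[Service]
ExecStart=/usr/local/bin/node /opt/Rocket.Chat/main.js
Restart=always
RestartSec=3

With Restart=always, systemd respawns the process after every SIGSEGV, which is why the service keeps coming back up and then crashing again at random intervals.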

Relevant logs:

An strace of the NodeJS process is available, but I will only share it with one of the Rocket.Chat developers, and only as a last resort, as it possibly contains private/sensitive information.
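For anyone wanting to produce a comparable capture, something along these lines should work (the exact invocation used for the trace below is not known, so treat this as an assumption):

# attach to the running node process, follow forks/threads,
# timestamp every syscall and write the trace to a file
strace -f -tt -p $(pidof node) -o /tmp/rocketchat-node.strace

Be aware that such a trace can contain message payloads and other sensitive data, as noted above.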

Last lines in strace before SEGV:

read(12, "\27\3\3\0\265\252\244\276\253\262\345\32\335\230b\255\311H\331p\2200\10\245\222.\26\313\2035\210\327"..., 16384) = 9462
rt_sigprocmask(SIG_SETMASK, [], [], 8)  = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
+++ killed by SIGSEGV +++

auditd-log of failing node processes:

ANOM_ABEND: triggered when a process ends abnormally (with a signal that could cause a core dump, if core dumps are enabled).

root@chat01 [/var/log] # ausearch --comm node
----
time->Wed Apr  4 14:35:17 2018
type=ANOM_ABEND msg=audit(1522845317.790:75): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=1043 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 14:47:00 2018
type=ANOM_ABEND msg=audit(1522846020.846:1164): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=4595 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 14:49:03 2018
type=ANOM_ABEND msg=audit(1522846143.096:1227): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5458 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 14:50:34 2018
type=ANOM_ABEND msg=audit(1522846234.113:1269): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5562 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 14:57:58 2018
type=ANOM_ABEND msg=audit(1522846678.970:1448): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5643 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 14:58:05 2018
type=ANOM_ABEND msg=audit(1522846685.473:1460): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5878 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 14:59:29 2018
type=ANOM_ABEND msg=audit(1522846769.954:1477): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5929 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 15:02:32 2018
type=ANOM_ABEND msg=audit(1522846952.269:1538): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=6007 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 15:23:04 2018
type=ANOM_ABEND msg=audit(1522848184.122:2496): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=9055 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 15:35:35 2018
type=ANOM_ABEND msg=audit(1522848935.572:3405): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=11501 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 15:40:33 2018
type=ANOM_ABEND msg=audit(1522849233.904:3470): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=11899 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 15:43:02 2018
type=ANOM_ABEND msg=audit(1522849382.823:3519): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=12040 comm="node" 

/var/log/messages (note that the times are identical to the auditd logs above)

root@chat01 [~] # grep SEGV /var/log/messages
Apr  4 14:35:17 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 14:47:00 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 14:49:03 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 14:50:34 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 14:57:59 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 14:58:05 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 14:59:29 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 15:02:32 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 15:23:04 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 15:35:35 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 15:40:33 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 15:43:02 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
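The audit records carry epoch timestamps, so the match with syslog can be verified directly; for example, for the first audit entry above (assuming the server's local timezone):

date -d @1522845317
# prints Wed Apr  4 14:35:17 2018 in the server's local timezone,
# matching the first SEGV line in /var/log/messages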

All 13 comments

@TwizzyDizzy weird.. 0.63.0 in combo with Node.js 8.11.1 should have actually solved this seg fault

https://github.com/meteor/meteor/blob/devel/History.md#v1611-2018-04-02

According to the Meteor release we updated to for 0.63.0... Node.js 8.11.1 actually solved the seg fault.

So for sure 0.63.0 in combo with Node.js 8.11.1 gave the seg fault and not some other combo?

Hi Aaron,

yes, I just replicated this: Rocket.Chat 0.63.0 vs. NodeJS 8.11.1.

  • Upgrade to NodeJS 8.11.1
  • do a clean (except for data in MongoDB) install via ansible
  • Server gets killed after some (not always the same) time.

... downgrade to NodeJS 8.9.4: no such behaviour anymore.
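A quick way to confirm that the crashes really stopped after the downgrade is to re-run the checks from the report above (a sketch; --start today is just one convenient ausearch filter):

ausearch --comm node --start today
grep SEGV /var/log/messages | tail

No new ANOM_ABEND or SEGV entries should appear once the service runs on 8.9.4.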

Cheers
Thomas

Same here. Still crashing. Rocketchat 0.63.0 / NodeJS 8.11.1. Downgrading NodeJS to 8.9.4 solves it.

cheers
t.

@rodrigok @sampaiodiego thoughts? This seems to be doing the complete opposite of what upgrading to 8.11.1 was supposed to give us

Well, they said the patch that should solve the problem would be in 8.11.1. Maybe it's not? Here's the Node.js issue for reference: https://github.com/nodejs/node/issues/19274

thanks and cheers

https://github.com/meteor/meteor/pull/9783#issuecomment-377533852 yup, looks like they didn't include the segfault fix in 8.11.1; instead it might land in 8.11.2 :roll_eyes:

How do I downgrade the NodeJS version within the Rocket.Chat server (snap)? (for dummies?)

I don't think you can @trstn70 .. but @geekgonecrazy released a fix yesterday, please try running sudo snap refresh rocketchat-server
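After the refresh, a quick check that the fixed build is actually installed (a sketch):

snap list rocketchat-server

This lists the installed version and revision of the snap, so you can confirm the refresh landed before testing again.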

~Closing this~ due to the merging of #10351 and the release of Rocket.Chat v0.63.1 :) @geekgonecrazy informs me that a snap release will follow suit in a day or so. :D

Still crashing with 0.63.1 and Node 8.11.1 here. Please re-open.

Revert to 8.9.4 solve the problem.

We are not using the Snap release.

If you are using 8.11.1, please downgrade your Node version to 8.9.4. Unfortunately, until Node.js releases another hotfix, we have no other choice. In snap installs we just downgraded to keep people from being affected. Docker images are already downgraded. It's only manual installs left where you have to downgrade Node.js yourself if you did upgrade.
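For manual (tar) installs, a possible downgrade path, assuming the official binary tarball and an install prefix of /usr/local (adjust paths and architecture to your setup):

# fetch and unpack the 8.9.4 binaries from the official dist server
curl -O https://nodejs.org/dist/v8.9.4/node-v8.9.4-linux-x64.tar.xz
tar -xJf node-v8.9.4-linux-x64.tar.xz

# put them in place of the newer version, then restart the service
sudo cp -r node-v8.9.4-linux-x64/{bin,lib,include,share} /usr/local/
node --version        # should now report v8.9.4
sudo systemctl restart rocketchat.service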

NodeJS 8.11.2 is out. I've just upgraded my production instance and the behaviour described in this issue does not occur anymore. This is why I am closing this issue.

Cheers
Thomas
