This happens when running Rocket.Chat 0.61.2 as well as 0.63.0 on NodeJS 8.11.1. Neither version exhibits this behaviour when run on NodeJS 8.9.4. I had NodeJS 8.11.1 and Rocket.Chat running for a while on my testing instance, which didn't exhibit this behaviour. This leads me to think that the suspect is NodeJS 8.11.1 in combination with:
I can only guess here:
No SEGV
SEGV. Restart (due to systemd unit definition) of Rocket.Chat at random intervals
strace of the NodeJS process is available, but I will only share it as a last resort with one of the Rocket.Chat developers, as it possibly contains private/sensitive information.
Last lines in strace before SEGV:
read(12, "\27\3\3\0\265\252\244\276\253\262\345\32\335\230b\255\311H\331p\2200\10\245\222.\26\313\2035\210\327"..., 16384) = 9462
rt_sigprocmask(SIG_SETMASK, [], [], 8) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
+++ killed by SIGSEGV +++
auditd-log of failing node processes:
ANOM_ABEND
: Triggered when a process ends abnormally (with a signal that could cause a core dump, if enabled).
root@chat01 [/var/log] # ausearch --comm node
----
time->Wed Apr 4 14:35:17 2018
type=ANOM_ABEND msg=audit(1522845317.790:75): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=1043 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 14:47:00 2018
type=ANOM_ABEND msg=audit(1522846020.846:1164): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=4595 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 14:49:03 2018
type=ANOM_ABEND msg=audit(1522846143.096:1227): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5458 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 14:50:34 2018
type=ANOM_ABEND msg=audit(1522846234.113:1269): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5562 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 14:57:58 2018
type=ANOM_ABEND msg=audit(1522846678.970:1448): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5643 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 14:58:05 2018
type=ANOM_ABEND msg=audit(1522846685.473:1460): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5878 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 14:59:29 2018
type=ANOM_ABEND msg=audit(1522846769.954:1477): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5929 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 15:02:32 2018
type=ANOM_ABEND msg=audit(1522846952.269:1538): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=6007 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 15:23:04 2018
type=ANOM_ABEND msg=audit(1522848184.122:2496): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=9055 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 15:35:35 2018
type=ANOM_ABEND msg=audit(1522848935.572:3405): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=11501 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 15:40:33 2018
type=ANOM_ABEND msg=audit(1522849233.904:3470): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=11899 comm="node" reason="memory violation" sig=11
----
time->Wed Apr 4 15:43:02 2018
type=ANOM_ABEND msg=audit(1522849382.823:3519): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=12040 comm="node"
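For anyone who wants to post-process records like the ones above, here is a minimal sketch of pulling the timestamp, PID and signal out of an `ANOM_ABEND` line. The function name `parse_abend` and the returned dictionary keys are made up for illustration; the only thing taken from the log is the record layout. The epoch in `audit(...)` is converted to UTC, so it will differ from the local `time->` header by the host's UTC offset.

```python
import re
from datetime import datetime, timezone

# Patterns for the fields visible in the records above.
AUDIT_ID = re.compile(r"audit\((\d+)\.(\d+):(\d+)\)")  # epoch seconds, millis, serial
PID = re.compile(r"\bpid=(\d+)")                        # \b avoids matching inside other keys
SIG = re.compile(r"\bsig=(\d+)")


def parse_abend(record):
    """Parse one ANOM_ABEND record (hypothetical helper, not part of auditd).

    Returns None for lines that are not ANOM_ABEND records. The signal is
    None when the record is cut off before the sig= field.
    """
    if "type=ANOM_ABEND" not in record:
        return None
    ident = AUDIT_ID.search(record)
    pid = PID.search(record)
    if not ident or not pid:
        return None
    sig = SIG.search(record)
    return {
        "when_utc": datetime.fromtimestamp(int(ident.group(1)), tz=timezone.utc),
        "serial": int(ident.group(3)),
        "pid": int(pid.group(1)),
        "sig": int(sig.group(1)) if sig else None,
    }


if __name__ == "__main__":
    first = (
        'type=ANOM_ABEND msg=audit(1522845317.790:75): auid=4294967295 uid=990 '
        'gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 '
        'pid=1043 comm="node" reason="memory violation" sig=11'
    )
    print(parse_abend(first))
```

Separator lines (`----`) and `time->` headers simply return `None`, so the function can be mapped over raw `ausearch` output.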
/var/log/messages (note that the times match the auditd log above to within a second)
root@chat01 [~] # grep SEGV /var/log/messages
Apr 4 14:35:17 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 14:47:00 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 14:49:03 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 14:50:34 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 14:57:59 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 14:58:05 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 14:59:29 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 15:02:32 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 15:23:04 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 15:35:35 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 15:40:33 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr 4 15:43:02 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
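To put a number on the "random intervals", the gaps between the crashes can be computed from the systemd lines. This is only a sketch: `crash_times` and `crash_gaps` are hypothetical helper names, and the year is passed in as an assumption because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches systemd lines like the ones in the grep output above.
SEGV_LINE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+\S+\s+systemd: .*status=11/SEGV"
)


def crash_times(lines, year=2018):
    """Return one datetime per SEGV line; the year is an assumption."""
    times = []
    for line in lines:
        match = SEGV_LINE.match(line)
        if match:
            month, day, clock = match.groups()
            stamp = f"{year} {month} {day} {clock}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def crash_gaps(times):
    """Seconds between consecutive crashes."""
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    sample = [
        "Apr 4 14:35:17 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV",
        "Apr 4 14:47:00 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV",
        "Apr 4 14:49:03 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV",
        "Apr 4 14:50:34 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV",
    ]
    print(crash_gaps(crash_times(sample)))  # [703, 123, 91]
```

For the first four crashes the gaps are roughly 12 minutes, 2 minutes and 1.5 minutes, which fits the description of restarts at irregular intervals.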
@TwizzyDizzy weird.. 0.63.0 in combo with Node.js 8.11.1 should have actually solved this seg fault
https://github.com/meteor/meteor/blob/devel/History.md#v1611-2018-04-02
According to the meteor release which we updated to for the 0.63.0 release... Node.js 8.11.1 actually solved the seg fault.
So for sure 0.63.0 in combo with Node.js 8.11.1 gave the seg fault and not some other combo?
Hi Aaron,
yes, I just replicated this: Rocket.Chat 0.63.0 vs. NodeJS 8.11.1.
... downgrade to NodeJS 8.9.4: no such behaviour anymore.
Cheers
Thomas
Same here. Still crashing. Rocketchat 0.63.0 / NodeJS 8.11.1. Downgrading NodeJS to 8.9.4 solves it.
cheers
t.
@rodrigok @sampaiodiego thoughts? This seems to be doing the complete opposite of what upgrading to 8.11.1 was supposed to give us
Well, they said the patch that should solve the problem would be in 8.11.1. Maybe it's not? Here is the Node.js issue for reference: https://github.com/nodejs/node/issues/19274
thanks and cheers
https://github.com/meteor/meteor/pull/9783#issuecomment-377533852 yup, looks like they didn't include the segfault fix in 8.11.1; instead it might be in 8.11.2 :roll_eyes:
How do I downgrade the NodeJs Version within the Rocketchat server (snap)? (for dummies?) ..
I don't think you can, @trstn70, but @geekgonecrazy released a fix yesterday. Please try running `sudo snap refresh rocketchat-server`
~Closing this~ Due to the merging of #10351 and the release of Rocket.Chat v0.63.1 :) @geekgonecrazy informs me that a snaps release will follow suit in a day or so. :D
Still crashing with 0.63.1 and Node 8.11.1 here. Please re-open.
Reverting to 8.9.4 solves the problem.
We are not using the Snap release.
If you are using 8.11.1, please downgrade Node.js to 8.9.4. Unfortunately, until Node.js releases another hotfix, we have no other choice. In snap installs we downgraded to keep people from being affected. Docker images are already downgraded. Only manual installs are left: there you have to downgrade Node.js yourself if you upgraded.
Also updated release notes with this note
https://forums.rocket.chat/t/rocket-chat-0-63-0-released-updated-for-0-63-1/479
NodeJS 8.11.2 is out. I've just upgraded my production instance and the behaviour described in this issue does not occur anymore. This is why I am closing this issue.
Cheers
Thomas
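For manual installs, a small check along these lines could flag the one Node.js version reported as crashing in this thread. It is a sketch, not an official compatibility list: the sets below contain only the versions people reported here, and `status` is a hypothetical helper name. Its input is a version string in the form printed by `node --version`.

```python
# Versions as reported in this thread only; anything else is unknown here.
REPORTED_CRASHING = {(8, 11, 1)}
REPORTED_STABLE = {(8, 9, 4), (8, 11, 2)}


def parse_version(text):
    """Turn 'v8.11.1' or '8.11.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def status(version_text):
    """Classify a Node.js version string against the reports in this thread."""
    version = parse_version(version_text)
    if version in REPORTED_CRASHING:
        return "crashing"
    if version in REPORTED_STABLE:
        return "stable"
    return "unknown"


if __name__ == "__main__":
    for candidate in ("v8.9.4", "v8.11.1", "v8.11.2"):
        print(candidate, status(candidate))
```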