Rocket.chat: Enable app cause a high CPU load and server crash

Created on 31 Mar 2020  Â·  26Comments  Â·  Source: RocketChat/Rocket.Chat

Description:

After upgrade to 3.0.5 I have noticed that Poll application is disabled. When I click to enable it our server hang and CPUs overloded.
In log I noticed a lot of entries
observeKey: '{"ordered":false,"collectionName":"users","selector":{"_id":"FycBZn8fbuihdXKWt","services.resume.loginTokens.hashedToken":"3Oi+I1eOl8oNGBRf/7p5IyXAT4tYeFVV2HRb7pLURgM="},"options":{"transform":null,"fields":{"_id":1}}}', writesToCommitWhenWeReachSteadyLength: 0, cursorDescription: '{"collectionName":"users","selector":{"_id":"FycBZn8fbuihdXKWt","services.resume.loginTokens.hashedToken":"3Oi+I1eOl8oNGBRf/7p5IyXAT4tYeFVV2HRb7pLURgM="},"options":{"transform":null,"fields":{"_id":1}}}' } Exception in setInterval callback: SwitchedToQuery {}

Steps to reproduce:

  1. Upgrade to 3.0.5
  2. Try to enable Poll app
  3. Check rocketchat logs

Expected behavior:

Application start without impact on performance

Actual behavior:

Rocket Chat crash due to high CPU load

Server Setup Information:

  • Version of Rocket.Chat Server: 8.0.5
  • Operating System: Centos 7
  • Deployment Method: Docker Compose
  • Number of Running Instances: 30
  • DB Replicaset Oplog: YES
  • NodeJS Version: v12.14.0
  • MongoDB Version: 4.0.14

Client Setup Information

  • Desktop App or Browser Version: All clients
  • Operating System: All
apps bug

All 26 comments

Hi @emikolajczak

We are currently investigating this issue.

you can give us your docker-compose.yml and we will try it

We are currently investigating this issue.

Hi, @rodrigok
Is there any investigation results?
Yesterday we had a half day downtime because of that issue #17310

Do you really think this is related to the poll app? Does the cpu load increase right after enabling the app?

Do you really think this is related to the poll app? Does the cpu load increase right after enabling the app?

I thinks that enabling app-poll not related to real cause of that issue.
It's just coincided.

Hi we Have similar issues on 3.0.5. Try with 3.0.12 is significant more stable.

Do you really think this is related to the poll app? Does the cpu load increase right after enabling the app?

I thinks that enabling app-poll not related to real cause of that issue.
It's just coincided.

@sampaiodiego FYI
I might be wrong.
We are on version 3.0.3 now and I saw, that Pool App is Disabled like issue author had.
I enabled it in saw a spikes:
image
image
image

it's network, CPU and nginx requests and MongoDB operations.

and in logs i see this:


    April 18th 2020, 08:37:32.883   s-rc-srv-02 rocketchat_rocketchat_2_1   at /app/bundle/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/fiber_pool.js:43:40
    April 18th 2020, 08:37:32.882   s-rc-srv-02 rocketchat_rocketchat_2_1   at getRoom (server/routes/avatar/room.js:17:11)
    April 18th 2020, 08:37:32.882   s-rc-srv-02 rocketchat_rocketchat_2_1   at runWithEnvironment (packages/meteor.js:1286:24)
    April 18th 2020, 08:37:32.882   s-rc-srv-02 rocketchat_rocketchat_2_1   at server/routes/avatar/room.js:25:15
    April 18th 2020, 08:37:32.882   s-rc-srv-02 rocketchat_rocketchat_2_1   at packages/meteor.js:1299:14
    April 18th 2020, 08:37:32.881   s-rc-srv-02 rocketchat_rocketchat_2_1   Exception in callback of async function: TypeError: Cannot read property 'prid' of undefined
    April 18th 2020, 08:36:33.853   s-rc-srv-02 rocketchat_rocketchat_1_1   at /app/bundle/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/fiber_pool.js:43:40
    April 18th 2020, 08:36:33.853   s-rc-srv-02 rocketchat_rocketchat_1_1   at runWithEnvironment (packages/meteor.js:1286:24)
    April 18th 2020, 08:36:33.853   s-rc-srv-02 rocketchat_rocketchat_1_1   at packages/meteor.js:1299:14
    April 18th 2020, 08:36:33.852   s-rc-srv-02 rocketchat_rocketchat_1_1   at server/routes/avatar/room.js:25:15
    April 18th 2020, 08:36:33.852   s-rc-srv-02 rocketchat_rocketchat_1_1   at getRoom (server/routes/avatar/room.js:17:11)
    April 18th 2020, 08:36:33.851   s-rc-srv-02 rocketchat_rocketchat_1_1   Exception in callback of async function: TypeError: Cannot read property 'prid' of undefined
    April 18th 2020, 08:35:32.885   s-rc-srv-02 rocketchat_rocketchat_3_1   at /app/bundle/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/fiber_pool.js:43:40
    April 18th 2020, 08:35:32.884   s-rc-srv-02 rocketchat_rocketchat_3_1   at packages/meteor.js:1299:14
    April 18th 2020, 08:35:32.882   s-rc-srv-02 rocketchat_rocketchat_3_1   at runWithEnvironment (packages/meteor.js:1286:24)
    April 18th 2020, 08:35:32.881   s-rc-srv-02 rocketchat_rocketchat_3_1   Exception in callback of async function: TypeError: Cannot read property 'prid' of undefined
    April 18th 2020, 08:35:32.881   s-rc-srv-02 rocketchat_rocketchat_3_1   at server/routes/avatar/room.js:25:15
    April 18th 2020, 08:35:32.881   s-rc-srv-02 rocketchat_rocketchat_3_1   at getRoom (server/routes/avatar/room.js:17:11)
    April 18th 2020, 08:34:32.883   s-rc-srv-02 rocketchat_rocketchat_4_1   at /app/bundle/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/fiber_pool.js:43:40
    April 18th 2020, 08:34:32.880   s-rc-srv-02 rocketchat_rocketchat_4_1   at packages/meteor.js:1299:14
    April 18th 2020, 08:34:32.880   s-rc-srv-02 rocketchat_rocketchat_4_1   at runWithEnvironment (packages/meteor.js:1286:24)
    April 18th 2020, 08:34:32.879   s-rc-srv-02 rocketchat_rocketchat_4_1   Exception in callback of async function: TypeError: Cannot read property 'prid' of undefined
    April 18th 2020, 08:34:32.879   s-rc-srv-02 rocketchat_rocketchat_4_1   at getRoom (server/routes/avatar/room.js:17:11)
    April 18th 2020, 08:34:32.879   s-rc-srv-02 rocketchat_rocketchat_4_1   at server/routes/avatar/room.js:25:15

Hi we Have similar issues on 3.0.5. Try with 3.0.12 is significant more stable.

Do you have Poll App installed on 3.0.12?

We have the same issue on v2.4.11

@ankar84 we've made some good improvements on how apps send real time data to clients during 3.0.x patch releases, starting at 3.0.8 by looking at change logs..

so I highly recommend upgrading _at least_ to 3.0.12.. I'd suggest considering upgrading to 3.1.x as it has other improvements, but as it also has other updates that can impact other things, I see it can be more stressful.

btw, @rodrigok @d-gubert is there any other improvements pending to do to solve this issue? I think we can close it already.

We still have some issues of high CPU when enabling apps @sampaiodiego

@ankar84 we've made some good improvements on how apps send real time data to clients during 3.0.x patch releases, starting at 3.0.8 by looking at change logs..

so I highly recommend upgrading _at least_ to 3.0.12.. I'd suggest considering upgrading to 3.1.x as it has other improvements, but as it also has other updates that can impact other things, I see it can be more stressful.

@sampaiodiego Hi, Diego! Thanks for your advice!
We now on our way to 3.1.1, it is in test development right now.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Is that issue resolved?

Is that issue resolved?

how to disable Poll app(where?)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Issue still here on 3.6.3
But a little lower CPU peaks

After enable 'Poll app', I can see massive operations on rocketchat_apps_logs and rocketchat_apps. Version 3.6.3

image

May be :there is something wrong with app or cloud

After enable 'Poll app', I can see massive operations on rocketchat_apps_logs and rocketchat_apps. Version 3.6.3

image

this could happen depending on how many rocket.chat instances you have running in the cluster, it should not related to the app itself (probably happens when enabling any app).

After enable 'Poll app', I can see massive operations on rocketchat_apps_logs and rocketchat_apps. Version 3.6.3
image

this could happen depending on how many rocket.chat instances you have running in the cluster, it should not related to the app itself (probably happens when enabling any app).

Diego, can I drop rocketchat_apps_logs collection? Is it safe? It's kind a big in our deployment right now.

Diego, can I drop rocketchat_apps_logs collection? Is it safe? It's kind a big in our deployment right now.

yes you can, it is safe to remove them.

yes you can, it is safe to remove them.

It wasn't so safe to remove that documents in that collection really.

rs0:PRIMARY> db.rocketchat_apps_logs.remove({})
WriteResult({ "nRemoved" : 13220647 })

And all RC instances felt in high load on CPU, RAM and Network
Only full restart of all 25 instances after collection clean was finished fixed that problem.
So, be careful!
image

@ankar84 oh, sry about that.. my thought was regarding breaking some functionality.. but have in mind that handling any big amount of data (removing, updating, inserting) will end up hammering rocket.chat servers with lots of oplog entries and that could cause that

@ankar84 oh, sry about that.. my thought was regarding breaking some functionality.. but have in mind that handling any big amount of data (removing, updating, inserting) will end up hammering rocket.chat servers with lots of oplog entries and that could cause that

Nevermind, Diego! I was asking currently about if removing that collection could cause some functionality problems, so your first answer is fully satisfied me.
But about high load of RC servers - I just will do such tasks on a weekends.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

engelgabriel picture engelgabriel  Â·  3Comments

Buzzele picture Buzzele  Â·  3Comments

neha1deshmukh picture neha1deshmukh  Â·  3Comments

mddvul22 picture mddvul22  Â·  3Comments

Kiran-Rao picture Kiran-Rao  Â·  3Comments