Rocket.chat: snap fails to start after every update; restart works

Created on 10 Oct 2018  ·  16Comments  ·  Source: RocketChat/Rocket.Chat

Description:

Steps to reproduce:

Either: Run snap refresh rocketchat-server in case there's an update available.
Or: Run snap disable rocketchat-server && snap enable rocketchat-server

Expected behavior:

snap.rocketchat-server.rocketchat-server.service is in started state

Actual behavior:

snap.rocketchat-server.rocketchat-server.service is in failed state:

$ sudo systemctl status snap.rocketchat-server.rocketchat-server.service 
● snap.rocketchat-server.rocketchat-server.service - Service for snap application rocketchat-server.rocketchat-server
   Loaded: loaded (/etc/systemd/system/snap.rocketchat-server.rocketchat-server.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2018-10-10 20:18:17 CEST; 17s ago
  Process: 10423 ExecStart=/usr/bin/snap run rocketchat-server (code=exited, status=1/FAILURE)
 Main PID: 10423 (code=exited, status=1/FAILURE)
      CPU: 673ms

Okt 10 20:18:17  systemd[1]: snap.rocketchat-server.rocketchat-server.service: Unit entered failed state.
Okt 10 20:18:17  systemd[1]: snap.rocketchat-server.rocketchat-server.service: Failed with result 'exit-code'.
Okt 10 20:18:17 systemd[1]: snap.rocketchat-server.rocketchat-server.service: Service hold-off time over, scheduling restart.
Okt 10 20:18:17  systemd[1]: Stopped Service for snap application rocketchat-server.rocketchat-server.
Okt 10 20:18:17 systemd[1]: snap.rocketchat-server.rocketchat-server.service: Start request repeated too quickly.
Okt 10 20:18:17  systemd[1]: Failed to start Service for snap application rocketchat-server.rocketchat-server.
Okt 10 20:18:17 systemd[1]: snap.rocketchat-server.rocketchat-server.service: Unit entered failed state.
Okt 10 20:18:17 systemd[1]: snap.rocketchat-server.rocketchat-server.service: Failed with result '

The mongo service is working. The rocketchat-server service starts working again with a simple systemctl restart

Server Setup Information:

  • Version of Rocket.Chat Server: 0.69.2
  • Operating System: Debian stretch (9.5)
  • Deployment Method: snap
  • Number of Running Instances: 1
  • DB Replicaset Oplog: ?
  • NodeJS Version: these come from the snap
  • MongoDB Version: same

Server logs

Okt 10 20:18:17  systemd[1]: snap.rocketchat-server.rocketchat-server.service: Service hold-off time over, scheduling restart.
Okt 10 20:18:17  systemd[1]: Stopped Service for snap application rocketchat-server.rocketchat-server.
Okt 10 20:18:17  systemd[1]: Started Service for snap application rocketchat-server.rocketchat-server.
Okt 10 20:18:17  /usr/bin/snap[10423]: cmd.go:105: DEBUG: restarting into "/snap/core/current/usr/bin/snap"
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]: Checking if oplog has been enabled, and enabling if not
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]: MongoDB shell version: 3.2.7
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]: connecting to: test
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]: 2018-10-10T20:18:17.171+0200 W NETWORK  [thread1] Failed to connect to 127.0.0.1:27017, reason: errno:111 Connection refused
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]: 2018-10-10T20:18:17.171+0200 E QUERY    [thread1] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed :
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]: connect@src/mongo/shell/mongo.js:229:14
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]: @(connect):1:6
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]: exception: connect failed
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]: /snap/rocketchat-server/1324/programs/server/node_modules/fibers/future.js:313
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:                                                 throw(ex);
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:                                                 ^
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]: MongoError: failed to connect to server [localhost:27017] on first connect [MongoError: connect ECONNREFUSED 127.0.0.1:27017]
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:     at Pool.<anonymous> (/snap/rocketchat-server/1324/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:     at emitOne (events.js:116:13)
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:     at Pool.emit (events.js:211:7)
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:     at Connection.<anonymous> (/snap/rocketchat-server/1324/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-co
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:     at Object.onceWrapper (events.js:317:30)
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:     at emitTwo (events.js:126:13)
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:     at Connection.emit (events.js:214:7)
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:     at Socket.<anonymous> (/snap/rocketchat-server/1324/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/l
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:     at Object.onceWrapper (events.js:315:30)
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:     at emitOne (events.js:116:13)
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:     at Socket.emit (events.js:211:7)
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:     at emitErrorNT (internal/streams/destroy.js:64:8)
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:     at _combinedTickCallback (internal/process/next_tick.js:138:11)
Okt 10 20:18:17  rocketchat-server.rocketchat-server[10423]:     at process._tickCallback (internal/process/next_tick.js:180:9)
Okt 10 20:18:17  systemd[1]: snap.rocketchat-server.rocketchat-server.service: Main process exited, code=exited, status=1/FAILURE
Okt 10 20:18:17  systemd[1]: snap.rocketchat-server.rocketchat-server.service: Unit entered failed state.
Okt 10 20:18:17  systemd[1]: snap.rocketchat-server.rocketchat-server.service: Failed with result 'exit-code'.
Okt 10 20:18:17  systemd[1]: snap.rocketchat-server.rocketchat-server.service: Service hold-off time over, scheduling restart.
Okt 10 20:18:17  systemd[1]: Stopped Service for snap application rocketchat-server.rocketchat-server.
Okt 10 20:18:17  systemd[1]: snap.rocketchat-server.rocketchat-server.service: Start request repeated too quickly.

This is a serious problem for us because we can't disable snap's auto updates and then rocketchat breaks on every release at a random time of day until an admin can restart it. :-/

Most helpful comment

Thanks for sharing your work around!

@lulugo should we maybe add more delay between checks so it will fail out a bit slower? Unfortunately via the snap I don’t think we can set this retry override. :(

All 16 comments

Found the server startup log.

So it seems it can't connect to mongo DB. Even though mongo DB starts successfully. This might be a race condition then?

It seems this is fixed by adding explicit After= and Requires= dependencies to the -server service.

For now I've done that in an override file:

# /etc/systemd/system/snap.rocketchat-server.rocketchat-server.service
[Unit]
# Auto-generated, DO NOT EDIT
Description=Service for snap application rocketchat-server.rocketchat-server
Requires=snap-rocketchat\x2dserver-1324.mount
Wants=network-online.target
After=snap-rocketchat\x2dserver-1324.mount network-online.target
X-Snappy=yes

[Service]
ExecStart=/usr/bin/snap run rocketchat-server
SyslogIdentifier=rocketchat-server.rocketchat-server
Restart=on-failure
WorkingDirectory=/var/snap/rocketchat-server/1324
TimeoutStopSec=30
Type=simple

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/snap.rocketchat-server.rocketchat-server.service.d/override.conf
[Unit]
After=snap.rocketchat-server.rocketchat-mongo.service
Requires=snap.rocketchat-server.rocketchat-mongo.service

Nice you got it sorted.
Another for @geekgonecrazy
Seems that maybe there is an issue with the systemd unit file on snap installs?

Technically this shouldn't be an issue. The snaps actually has a check and loop in place so that if mongo isn't ready it will retry for a good bit waiting on it to be ready. Definitely would be nice if snaps had support for service dependency. Maybe they have added this recently.

The snaps actually has a check and loop in place so that if mongo isn't ready it will retry for a good bit waiting on it to be ready.

I think I saw this message at some point during my testing, something like retrying in 10 s or so.

But somehow this isn't triggering in my original case. :thinking:

Interesting.. sounds like we need to keep an eye on this and see if others experience the same

For other users making changes to the unit file be aware that a 'disable/enable' will trash your modifications.
Back them up before you try.

For other users making changes to the unit file be aware that a 'disable/enable' will trash your modifications.

Yes.

Back them up before you try.

Better yet, don't modify those files but use systemctl edit <unit> to create and edit a override file.

I had this exact same issue. However the above fix of adding the 'after' and 'requires' did not help in my case.
What fixed it for me was adding:
[Service]
RestartSec=8

I did this in addition to the 'after' and 'requires' items above.

This also fixed my reboot issue where rocketchat would fail to start a boot. MongoDB seems to report to systemd that its 'running' but hasn't quite loaded the database which seems to take another 6 seconds.

ubuntu 16.04.5 LTS using the typical snap install.

Yes, I also had another failure after my "fix" above.

Will try this suggestion

On October 28, 2018 3:35:49 AM GMT+01:00, CSW4Lyfe notifications@github.com wrote:

I had this exact same issue. However the above fix of adding the
'after' and 'requires' did not help in my case.
What fixed it for me was adding:
[Service]
RestartSec=8

I did this in addition to the 'after' and 'requires' items above.

This also fixed my reboot issue where rocketchat would fail to start a
boot. MongoDB seems to report to systemd that its 'running' but hasn't
quote loaded the database which seems to take another 6 seconds.

ubuntu 16.04.5 LTS using the typical snap install.

@rocket-cat close

Closing this as nothing else has been reported for a while

Hello community,
My organization has been suffering from this symptom as well, it took us much time to figure out exactly what is the culprit. So now it's time to contribute back to the community because we all love Rocket.Chat.

According to this article, systemd service will stop trying to restart any service if the restart counter reaches 5 in less than a 10 sec window.

By default, rocket.chat restarts every 100ms.

We've checked our log and verified this is actually the case, as the rocket.chat service stopped after restart counter reaches 6, it no longer restarts. On the other hand, mongoDB server is still undergoing upgrade procedure.

So here's a simple fix:

sudo vim /etc/systemd/system/snap.rocketchat-server.rocketchat-server.service.d/override.conf

[Service]
StartLimitIntervalSec=0
RestartSec=20

Let me briefly explain:
StartLimitIntervalSec=0 turns off the timer, in essence this means it will restart forever
RestartSec=20 gives a 20 second window between each restart, reducing load on server.

Feel free to modify RestartSec to any number as you see fit, as long as StartLimitIntervalSec is set to 0.

Thanks for sharing your work around!

@lulugo should we maybe add more delay between checks so it will fail out a bit slower? Unfortunately via the snap I don’t think we can set this retry override. :(

@geekgonecrazy
Do you think putting these 2 lines in override.conf won't work for snap? We believe we've found the answer but we don't know if this is the right way to solve it. Still not sure how snap works in this regard, and we can't test the solution until next snap upgrade happens ....

I think it will work as a workaround.

We just are unable to bundle that up as part of the snap. With snaps we are limited to what we can do. The snap system hooks up the daemon we only get to provide it a few options and we are unable to touch those files generated as they are outside the snap confines.

Your solution looks like a very safe workaround. But we do want to make it not needed if we can

I see, thank you for your feedback, I will keep this post updated when we can confirm that the workaround actually works.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

danpospisil picture danpospisil  ·  3Comments

royalaid picture royalaid  ·  3Comments

neha1deshmukh picture neha1deshmukh  ·  3Comments

amayer5125 picture amayer5125  ·  3Comments

karlprieb picture karlprieb  ·  3Comments