Rocket.chat: [resolved] snap refresh fail

Created on 12 May 2020  路  37Comments  路  Source: RocketChat/Rocket.Chat

snap has been running fine, it auto-refreshed today:

7 Done today at 14:22 EDT today at 14:22 EDT Auto-refresh snap "rocketchat-server"

and mongo promptly died.

systemctl status snap.rocketchat-server.rocketchat-mongo.service
芒 snap.rocketchat-server.rocketchat-mongo.service - Service for snap application rocketchat-server.rocketchat-mongo
Loaded: loaded (/etc/systemd/system/snap.rocketchat-server.rocketchat-mongo.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2020-05-12 14:52:23 EDT; 7min ago
Process: 1056 ExecStart=/usr/bin/snap run rocketchat-server.rocketchat-mongo (code=exited, status=100)
Main PID: 1056 (code=exited, status=100)

May 12 14:52:23 rocketchat systemd[1]: snap.rocketchat-server.rocketchat-mongo.service: Service hold-off time over, scheduling restart.
May 12 14:52:23 rocketchat systemd[1]: snap.rocketchat-server.rocketchat-mongo.service: Scheduled restart job, restart counter is at 5.
May 12 14:52:23 rocketchat systemd[1]: Stopped Service for snap application rocketchat-server.rocketchat-mongo.
May 12 14:52:23 rocketchat systemd[1]: snap.rocketchat-server.rocketchat-mongo.service: Start request repeated to
o quickly.
May 12 14:52:23 rocketchat systemd[1]: snap.rocketchat-server.rocketchat-mongo.service: Failed with result 'exit-
code'.
May 12 14:52:23 rocketchat systemd[1]: Failed to start Service for snap application rocketchat-server.rocketchat-
mongo.

Most helpful comment

These are the steps to revert to previous version (From ubuntu-snap channel):

snap revert rocketchat-server # Revert to 2.4.11
sudo snap run --shell rocketchat-server
snapctl get snap-refreshing #which returned "true"
snapctl set snap-refreshing=false
exit

Then I restarted the service and it came up

All 37 comments

Here's me thinking I am getting hacked. Thank god someone else is alert!
I will post some info in a sec.

$ snap refresh --time
timer: 00:00~24:00/4
last: today at 18:27 BST
next: tomorrow at 03:21 BST

3 mins later I get a nagios alert for Connection refused

Looking at syslog. A couple of key lines that stand out.

2020/05/12 18:28:04 loading Caddyfile via flag: open /var/snap/rocketchat-server/1436/Caddyfile: no such file or directory

May 12 18:28:14 rocketchat rocketchat-server.rocketchat-server[10098]: Error: ENOENT: no such file or directory, open '/snap/rocketchat-server/1436/star.json'

in the mean time:

snap revert rocketchat-server

didn't work either, mongo is back up but now caddy is down:

systemctl status snap.rocketchat-server.rocketchat-caddy.service
芒 snap.rocketchat-server.rocketchat-caddy.service - Service for snap application rocketchat-server.rocketchat-caddy
Loaded: loaded (/etc/systemd/system/snap.rocketchat-server.rocketchat-caddy.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2020-05-12 15:16:51 EDT; 3s ago
Process: 2563 ExecStart=/usr/bin/snap run rocketchat-server.rocketchat-caddy (code=exited, status=1/FAILURE)
Main PID: 2563 (code=exited, status=1/FAILURE)

May 12 15:16:50 rocketchat systemd[1]: snap.rocketchat-server.rocketchat-caddy.service: Main process exited, code
=exited, status=1/FAILURE
May 12 15:16:50 rocketchat systemd[1]: snap.rocketchat-server.rocketchat-caddy.service: Failed with result 'exit-
code'.
May 12 15:16:51 rocketchat systemd[1]: snap.rocketchat-server.rocketchat-caddy.service: Service hold-off time over, scheduling restart.
May 12 15:16:51 rocketchat systemd[1]: snap.rocketchat-server.rocketchat-caddy.service: Scheduled restart job, restart counter is at 5.
May 12 15:16:51 rocketchat systemd[1]: Stopped Service for snap application rocketchat-server.rocketchat-caddy.
May 12 15:16:51 rocketchat systemd[1]: snap.rocketchat-server.rocketchat-caddy.service: Start request repeated to
o quickly.
May 12 15:16:51 rocketchat systemd[1]: snap.rocketchat-server.rocketchat-caddy.service: Failed with result 'exit-
code'.
May 12 15:16:51 rocketchat systemd[1]: Failed to start Service for snap application rocketchat-server.rocketchat-
caddy.

We are getting same issue... because of snap refresh. See logs below

journalctl -u snap.rocketchat-server.rocketchat-mongo

2020-05-12T18:08:21.306+0000 I CONTROL [initandlisten] db version v3.4.20
2020-05-12T18:08:21.306+0000 I CONTROL [initandlisten] git version: 447847d93d6e0a21b018d5df45528e815c7c13d8
2020-05-12T18:08:21.306+0000 I CONTROL [initandlisten] allocator: tcmalloc
2020-05-12T18:08:21.306+0000 I CONTROL [initandlisten] modules: none
2020-05-12T18:08:21.306+0000 I CONTROL [initandlisten] build environment:
2020-05-12T18:08:21.306+0000 I CONTROL [initandlisten] distarch: x86_64
2020-05-12T18:08:21.306+0000 I CONTROL [initandlisten] target_arch: x86_64
2020-05-12T18:08:21.306+0000 I CONTROL [initandlisten] options: { net: { bindIp: "127.0.0.1" }, processManagement: { pidFilePath: "/va
2020-05-12T18:08:21.334+0000 I - [initandlisten] Detected data files in /var/snap/rocketchat-server/common created by the 'wired
2020-05-12T18:08:21.334+0000 I STORAGE [initandlisten]
2020-05-12T18:08:21.334+0000 I STORAGE [initandlisten] * WARNING: Using the XFS filesystem is strongly recommended with the WiredTige
2020-05-12T18:08:21.334+0000 I STORAGE [initandlisten] *
See http://dochub.mongodb.org/core/prodnotes-filesystem
2020-05-12T18:08:21.334+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=7511M,session_max=20000,eviction=(thr
2020-05-12T18:08:21.394+0000 E STORAGE [initandlisten] WiredTiger error (-31802) [1589306901:394671][3772:0x7fe87395ed00], txn-recover
2020-05-12T18:08:21.394+0000 E STORAGE [initandlisten] WiredTiger error (0) [1589306901:394752][3772:0x7fe87395ed00], txn-recover: Wir
2020-05-12T18:08:21.394+0000 E STORAGE [initandlisten] WiredTiger error (0) [1589306901:394767][3772:0x7fe87395ed00], txn-recover: Thi
2020-05-12T18:08:21.394+0000 E STORAGE [initandlisten] WiredTiger error (0) [1589306901:394779][3772:0x7fe87395ed00], txn-recover: You
2020-05-12T18:08:21.394+0000 E STORAGE [initandlisten] WiredTiger error (-31802) [1589306901:394796][3772:0x7fe87395ed00], txn-recover
2020-05-12T18:08:21.396+0000 I - [initandlisten] Assertion: 28595:-31802: WT_ERROR: non-specific WiredTiger error src/mongo/db/s
2020-05-12T18:08:21.450+0000 I STORAGE [initandlisten] exception in initAndListen: 28595 -31802: WT_ERROR: non-specific WiredTiger err
2020-05-12T18:08:21.450+0000 I NETWORK [initandlisten] shutdown: going to close listening sockets...
2020-05-12T18:08:21.450+0000 I NETWORK [initandlisten] removing socket file: /tmp/mongodb-27017.sock
2020-05-12T18:08:21.450+0000 I NETWORK [initandlisten] shutdown: going to flush diaglog...
2020-05-12T18:08:21.450+0000 I CONTROL [initandlisten] now exiting
2020-05-12T18:08:21.450+0000 I CONTROL [initandlisten] shutting down with code:100
chat-mongo.service: Main process exited, code=exited, status=100/n/a
chat-mongo.service: Unit entered failed state.
chat-mongo.service: Failed with result 'exit-code'.

same here.

WireddTiger error (-31802) [1589310954:167122][1803:0x7fd8068cad00], txn-recover: unsupported WiredTiger file version: this build only supports major/minor versions up to 1/0, and the file is version 2/0: WT_ERROR: non-specific WiredTiger error

Looks like as if the new snap has an older Monogodb engine than the one before.

reverting the snap breaks somehow rocketchat-server

2020-05-12T21:07:23.164451+02:00 xxxx systemd[1]: Started Service for snap application rocketchat-server.rocketchat-server.
2020-05-12T21:07:23.342642+02:00 xxxx systemd[1]: snap.rocketchat-server.rocketchat-server.service: Succeeded.

but status is inactive

So ATM no chance to get the service back up.

These are the steps to revert to previous version (From ubuntu-snap channel):

snap revert rocketchat-server # Revert to 2.4.11
sudo snap run --shell rocketchat-server
snapctl get snap-refreshing #which returned "true"
snapctl set snap-refreshing=false
exit

Then I restarted the service and it came up

Thank you @racquetmaster !

Does this happen often....? Quite an alarming set of server logs!

@jarmitage I've been running a rocketchat server for 3-4 years. First time something like this has happened...

thank you @racquetmaster for the added steps to revert

After reverting I still had the problem that caddy didn't came up.
I symlinked
/snap/rocketchat-server/1427/resources/Caddyfile /var/snap/rocketchat-server/1427/Caddyfile
to get it up.

@jarmitage it's the second time in some years that the server didn't worked with a new snap and we had to revert.

Oooh this was a stressful experience. Is there an in-app way to control when/how the snaps are installed so this kind of thing doesn't catch us unaware? Something like how MacOS asks to install updates {now, in an hour, tomorrow}.

@alexweissman
you can set the times or disable it alltogether and do it manually:
https://snapcraft.io/docs/keeping-snaps-up-to-date

It looks like our snap was updated to 2.4.12 (from 2.4.11) and now we get a 502 Bad Gateway if we try to open the website.

Is this problem related to this one?

same problem here, had to revert the snap using @racquetmaster's commands

probably, follow the revert procedure posted by racquetmaster

Does steps by @racquetmaster's commands

works for anyone?

worked for me

@captainwasabi Thanks, let me try then.

Ours crashed out as well. We followed @racquetmaster directions and we also had to run the symbolic link that @morticah posted as well to get caddy back up.

I have the same issue

our two rocket chat servers just did the same thing going from 2.4.11 to 2.4.12, 502 bad gateway.
Thank you for the revert instructions @racquetmaster

Now the question is: why did we need to revert?

The command from @racquetmaster worked for me as well...
Steps ---

1) Open terminal
2) Login with root
sudo -s
3) Run below commands...
snap revert rocketchat-server

output as Revert to 2.4.11

sudo snap run --shell rocketchat-server
snapctl get snap-refreshing

output which returned "true"

snapctl set snap-refreshing=false
exit

4) Restart mongo DB...
sudo service snap.rocketchat-server.rocketchat-mongo restart

5) Restart Rocket Chat service...
sudo service snap.rocketchat-server.rocketchat-server restart

Same issue for us. Revert steps resolved it.

The command from @racquetmaster and @AmrishSingh worked. just had to reboot the machine as well.

puuuhhh work now ... but how this happend?

We've rolled back to the previous revision. So if currently broke you can do: sudo snap refresh rocketchat-server if you already reverted then you should be good.

Something broke in the 2.4.12 release. We're investigating and will make sure its addressed before rolling it back out.

Thanks!

They reverted the change to latest/stable

After fumbling around, I found ...
snap refresh rocketchat-server --channel stable/fix-migration

Which doesn't exist right now, but the refresh pulled 2.4.11 back down as stable. So that worked as well.

I installed on Ubuntu 18.04 LTS (snap, obviously) about 45 days ago. It looks like 2.4.12 downgraded Mongo to 3.4,20 from previous 3.6.14.

@klou yup thats what i'm seeing also. Looks like some how the snapcraft.yaml we used to generate 2.4.12 wasn't up to date. We will be correcting that and running through several more tests to make sure its fully fixed before we publish new revision.

going to open back and leave this open for a bit in case others go looking for an issue so we keep all conversation here

I'd also recommend everyone look at: https://forums.rocket.chat/t/introducing-snap-tracks/5890 consider using tracks so you can control when you go between major versions. Because while we try to avoid this ever happening on any track. The odds are a bit higher that will happen between major versions. So good opportunity to remind everyone to check it out

I was looking at that earlier. Will latest/stable ever move to 3.x (and when would that be expected?), or is this going to replace that? i.e you have to manually switch tracks. It sounds like new installs are going to the 3.x track already.

Indeed we鈥檝e directed new installs to 3.x track. The problem is when we went from 1.x to 2.x the support headache that came with that was pretty significant for snaps. A lot of people that installed and maybe knew nothing about server maintenance showed up out of the woodwork with issues. Some took a lot of effort to help to navigate without loosing data. So we really want people to stop and think about upgrading between major versions.. to do a backup like good a good sysadmin does before a major upgrade.

Not to mention major releases introduce breaking changes sometimes. Maybe we deprecate an api that a snap user has integrated. If we auto upgrade to next major release then what? If they didn鈥檛 backup they can鈥檛 roll back. Major releases almost always contain a lot of migrations.

All that being said.. I guess TBD? More feedback needed on the forum post I guess need to see what people think.

Word.

I can take a snapshot, then if it fails go back. Easy peasy. Might give it a try over one of these weekends.

Thanks for a great piece of software.

we are running several instances v2.4.11 using snap,,.they are on auto-update ..last night all went down but they were back automatically today without touching anything.

@geekgonecrazy thanks for the fix!

If we did @racquetmaster's temporary fix, do we need to do anything to revert that?

I did this:

    sudo snap run --shell rocketchat-server
    snapctl set snap-refreshing=true
    exit

I'll close this as it has been resolved. If any further help is needed, please refer to our open Support Channel or the forums. Thanks!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Buzzele picture Buzzele  路  3Comments

zeigerpuppy picture zeigerpuppy  路  3Comments

amayer5125 picture amayer5125  路  3Comments

antn89 picture antn89  路  3Comments

brendanheywood picture brendanheywood  路  3Comments