Rocket.chat: Failed mongodb after 2.1.1 Ubuntu snap auto-refresh

Created on 6 Nov 2019  路  13Comments  路  Source: RocketChat/Rocket.Chat

Description:

After our Rocket.Chat server, installed as a snap on Ubuntu 18.04, did an auto-refresh and updated to version 2.1.1, the mongodb component now fails to start. It now throws a fatal error around needing to upgrade data files to 3.4 before attempting to upgrade to 3.6, and returning error code 62.

While this now seems technically a mongodb issue, it's one that's triggered by following the documented snap-install method of rocketchat-server, hence my post here. I've seem what looks like a workaround for this, by manually installing older copies of mongodb and manually updating data one version at a time, but this doesn't seem like it should be the way forward when encountering this specific issue.

Steps to reproduce:

  1. Install old version of rocketchat-server snap, with included mongo earlier than 3.2
  2. Run a snap refresh to trigger the update to v2.1.1
  3. Check status of mongodb service via sudo service snap.rocketchat-server.rocketchat-mongo status

Expected behavior:

Should show Active: success

Actual behavior:

Shows Active: failed

Server Setup Information:

  • Version of Rocket.Chat Server: 2.1.1
  • Operating System: Ubuntu 18.04
  • Deployment Method: snap
  • Number of Running Instances: 1
  • DB Replicaset Oplog: n/a
  • NodeJS Version: v8.15.1
  • MongoDB Version: v3.6.14

Client Setup Information

n/a

Additional context

n/a

Relevant logs:

/var/log/syslog:

Nov  6 03:50:35 <servername> rocketchat-server.rocketchat-server[17325]: Was unable to connect to Mongo.  Please make sure Mongo has started successfully: sudo systemctl status snap.rocketchat-server.rocketchat-mongo to view logs: sudo journalctl -u snap.rocketchat-server.rocketchat-mongo
Nov  6 03:50:35 <servername> systemd[1]: snap.rocketchat-server.rocketchat-server.service: Main process exited, code=exited, status=1/FAILURE
Nov  6 03:50:35 <servername> systemd[1]: snap.rocketchat-server.rocketchat-server.service: Failed with result 'exit-code'.
Nov  6 03:50:35 <servername> systemd[1]: snap.rocketchat-server.rocketchat-server.service: Service hold-off time over, scheduling restart.

sudo /usr/bin/snap run rocketchat-server.rocketchat-mongo output:

2019-11-06T03:54:47.505+0000 I CONTROL  [initandlisten] MongoDB starting : pid=18790 port=27017 dbpath=/var/snap/rocketchat-server/common 64-bit host=haw-i-rct-02
2019-11-06T03:54:47.505+0000 I CONTROL  [initandlisten] db version v3.6.14
2019-11-06T03:54:47.505+0000 I CONTROL  [initandlisten] git version: cbef87692475857c7ee6e764c8f5104b39c342a1
2019-11-06T03:54:47.505+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2019-11-06T03:54:47.505+0000 I CONTROL  [initandlisten] modules: none
2019-11-06T03:54:47.505+0000 I CONTROL  [initandlisten] build environment:
2019-11-06T03:54:47.505+0000 I CONTROL  [initandlisten]     distarch: x86_64
2019-11-06T03:54:47.505+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2019-11-06T03:54:47.505+0000 I CONTROL  [initandlisten] options: { net: { bindIp: "127.0.0.1" }, processManagement: { pidFilePath: "/var/snap/rocketchat-server/common/mongod.pid" }, replication: { replSet: "rs0" }, storage: { dbPath: "/var/snap/rocketchat-server/common", journal: { enabled: true }, mmapv1: { smallFiles: true } } }
2019-11-06T03:54:47.505+0000 I -        [initandlisten] Detected data files in /var/snap/rocketchat-server/common created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2019-11-06T03:54:47.505+0000 I STORAGE  [initandlisten] 
2019-11-06T03:54:47.505+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2019-11-06T03:54:47.505+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2019-11-06T03:54:47.505+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1460M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),compatibility=(release="3.0",require_max="3.0"),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2019-11-06T03:54:48.106+0000 I STORAGE  [initandlisten] WiredTiger message [1573012488:106050][18790:0x7facf6143a40], txn-recover: Main recovery loop: starting at 459/768
2019-11-06T03:54:48.196+0000 I STORAGE  [initandlisten] WiredTiger message [1573012488:196054][18790:0x7facf6143a40], txn-recover: Recovering log 459 through 460
2019-11-06T03:54:48.252+0000 I STORAGE  [initandlisten] WiredTiger message [1573012488:252871][18790:0x7facf6143a40], txn-recover: Recovering log 460 through 460
2019-11-06T03:54:48.296+0000 I STORAGE  [initandlisten] WiredTiger message [1573012488:296323][18790:0x7facf6143a40], txn-recover: Set global recovery timestamp: 0
2019-11-06T03:54:48.326+0000 I STORAGE  [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
2019-11-06T03:54:48.326+0000 I STORAGE  [initandlisten] The size storer reports that the oplog contains 7923447 records totaling to 3725872921 bytes
2019-11-06T03:54:48.326+0000 I STORAGE  [initandlisten] Sampling from the oplog between Feb 13 05:27:00:8 and Nov  6 00:38:35:1 to determine where to place markers for truncation
2019-11-06T03:54:48.326+0000 I STORAGE  [initandlisten] Taking 999 samples and assuming that each section of oplog contains approximately 79243 records totaling to 37262740 bytes
2019-11-06T03:54:48.441+0000 I STORAGE  [initandlisten] Placing a marker at optime Feb 16 02:00:00:204
<snip duplicates>
2019-11-06T03:54:48.480+0000 W STORAGE  [initandlisten] Detected configuration for non-active storage engine mmapv1 when current storage engine is wiredTiger
2019-11-06T03:54:48.480+0000 I CONTROL  [initandlisten] 
2019-11-06T03:54:48.480+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2019-11-06T03:54:48.480+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2019-11-06T03:54:48.480+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2019-11-06T03:54:48.480+0000 I CONTROL  [initandlisten] 
2019-11-06T03:54:48.530+0000 F CONTROL  [initandlisten] ** IMPORTANT: UPGRADE PROBLEM: The data files need to be fully upgraded to version 3.4 before attempting an upgrade to 3.6; see http://dochub.mongodb.org/core/3.6-upgrade-fcv for more details.
2019-11-06T03:54:48.530+0000 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2019-11-06T03:54:48.530+0000 I NETWORK  [initandlisten] removing socket file: /tmp/mongodb-27017.sock
2019-11-06T03:54:48.530+0000 I REPL     [initandlisten] shutdown: removing all drop-pending collections...
2019-11-06T03:54:48.530+0000 I REPL     [initandlisten] shutdown: removing checkpointTimestamp collection...
2019-11-06T03:54:48.530+0000 I REPL     [initandlisten] shutting down replication subsystems
2019-11-06T03:54:48.531+0000 W REPL     [initandlisten] ReplicationCoordinatorImpl::shutdown() called before startup() finished.  Shutting down without cleaning up the replication system
2019-11-06T03:54:48.531+0000 I STORAGE  [WTOplogJournalThread] oplog journal thread loop shutting down
2019-11-06T03:54:48.531+0000 I STORAGE  [initandlisten] WiredTigerKVEngine shutting down
2019-11-06T03:54:48.789+0000 I STORAGE  [initandlisten] WiredTiger message [1573012488:789263][18790:0x7facf6143a40], txn-recover: Main recovery loop: starting at 460/4352
2019-11-06T03:54:48.876+0000 I STORAGE  [initandlisten] WiredTiger message [1573012488:876526][18790:0x7facf6143a40], txn-recover: Recovering log 460 through 461
2019-11-06T03:54:48.926+0000 I STORAGE  [initandlisten] WiredTiger message [1573012488:926387][18790:0x7facf6143a40], txn-recover: Recovering log 461 through 461
2019-11-06T03:54:48.972+0000 I STORAGE  [initandlisten] WiredTiger message [1573012488:972617][18790:0x7facf6143a40], txn-recover: Set global recovery timestamp: 0
2019-11-06T03:54:49.084+0000 I STORAGE  [initandlisten] shutdown: removing fs lock...
2019-11-06T03:54:49.084+0000 I CONTROL  [initandlisten] now exiting
2019-11-06T03:54:49.084+0000 I CONTROL  [initandlisten] shutting down with code:62

Note that the suggested fix provided in the URL linked in the error message seems to require a mongodb instance that's already running, which can't happen until the upgrade issue is already resolved, rather a chicken and egg scenario.

Most helpful comment

We are having the same issue with mongodb and RocketChat as mentioned above, but the solution in the link above does not work...

_MongoDB shell version v3.6.14
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
2019-11-06T18:25:57.292+1100 W NETWORK [thread1] Failed to connect to 127.0.0.1:27017, in(checking socket for error after poll), reason: Connection refused
2019-11-06T18:25:57.292+1100 E QUERY [thread1] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:263:13
@(connect):1:6
exception: connect failed

Please note that I tried to revert from RocketChat 1416 back to 1414, but that did not resolve the issue...

I have no idea what the issue is and any help would be much appreciated. Note that I am new to Linux and not familiar at all with RocketChat, Mongo or Snap...

All 13 comments

Thinking about this one further, this really seems less like a Rocket.Chat Server bug as such, and more a pair of faults in both the snap system and mongodb. The snap system should not consider the refresh successful if the snap is no longer running properly afterwards (and should at the least enable a rollback), and mongodb should be able to handle upgrading data to latest version, regardless of source version. Feel free to close this bug report, unless wants to add any other thoughts.

We are having the same issue with mongodb and RocketChat as mentioned above, but the solution in the link above does not work...

_MongoDB shell version v3.6.14
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
2019-11-06T18:25:57.292+1100 W NETWORK [thread1] Failed to connect to 127.0.0.1:27017, in(checking socket for error after poll), reason: Connection refused
2019-11-06T18:25:57.292+1100 E QUERY [thread1] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:263:13
@(connect):1:6
exception: connect failed

Please note that I tried to revert from RocketChat 1416 back to 1414, but that did not resolve the issue...

I have no idea what the issue is and any help would be much appreciated. Note that I am new to Linux and not familiar at all with RocketChat, Mongo or Snap...

Please follow all of the directions there. Just reverting will not solve. https://open.rocket.chat/channel/ubuntu-snap or replying to forum post is best place to proceed

I have same problems as @Lee1310 all 3 instances failed to auto upgrade from 1.3.2. Just finished recovering from backups + blocking snapcraft dns as revert didint work and after restoring from backup snap would autoupgrade immediatly to broken version again...

PS my rocketchat instances are lxd machines and I m using nginx as a proxy.

Sorry I am new to all this and not sure the best place to post... When I follow the instructions I get:

snap revert --revision=1416 rocketchat-server

error: cannot revert "rocketchat-server": cannot find revision 1416 for snap "rocketchat-server"

rocketchat-server.mongo --eval "printjson(db.adminCommand ({getParameter: 1, featureCompatibilityVersion: 1}))"

MongoDB shell version v3.6.14
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
2019-11-06T19:36:20.306+1100 W NETWORK [thread1] Failed to connect to 127.0.0.1:27017, in(checking socket for error after poll), reason: Connection refused
2019-11-06T19:36:20.306+1100 E QUERY [thread1] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:263:13
@(connect):1:6
exception: connect failed

rocketchat-server.mongo --eval "printjson(db.adminCommand ({ setFeatureCompatibilityVersion: "3.4" }))"

MongoDB shell version v3.6.14
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
2019-11-06T19:38:08.875+1100 W NETWORK [thread1] Failed to connect to 127.0.0.1:27017, in(checking socket for error after poll), reason: Connection refused
2019-11-06T19:38:08.875+1100 E QUERY [thread1] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed :
connect@src/mongo/shell/mongo.js:263:13
@(connect):1:6
exception: connect failed

snap refresh rocketchat-server

snap "rocketchat-server" has no updates available

Everything I have tried has not worked...

Hello,
As others, my snap auto-refreshed this morning and can't start anymore with message :
** IMPORTANT: UPGRADE PROBLEM: The data files need to be fully upgraded to version 3.4 before attempting an upgrade to 3.6; see http://dochub.mongodb.org/core/3.6-upgrade-fcv for more details.

So I followed procedure : revert to rev 1416 and now, mongodb refuse to start with :
[initandlisten] WiredTiger error (-31802) [1573029936:363524][8276:0x7f304d2afd00], txn-recover: unsupported WiredTiger file version: this build only supports major/minor versions up to 1/0, and the file is version 2/0: WT_ERROR: non-specific WiredTiger error

Seems the DB version is completely stuck between 3.4 and 3.6. Any idea as we don't have access to mongodb binary to start a with --repair (or I didn't found how).

Last solution would be to start with mongodb fresh data but what about chat history and/or parameters ?

Thanks for yoru help,
Julien

I solved with this.
https://forums.rocket.chat/t/snap-news-rocket-chat-2-1/4890/13

Stop Mongo

sudo systemctl stop snap.rocketchat-server.rocketchat-mongo

Revert version back

snap revert --revision=1416 rocketchat-server

If that fails run:

snap refresh rocketchat-server --channel stable/fix-migration

Give it a few to start up and then run:

rocketchat-server.mongo --eval "printjson(db.adminCommand ({getParameter: 1, featureCompatibilityVersion: 1}))"

If it outputs 3.2 then run

rocketchat-server.mongo --eval "printjson(db.adminCommand ({ setFeatureCompatibilityVersion: "3.4" }))"

Check again and make sure its 3.4

rocketchat-server.mongo --eval "printjson(db.adminCommand ({getParameter: 1, featureCompatibilityVersion: 1}))"

Finally then you should be able to refresh back to latest

snap refresh rocketchat-server --channel=stable

Yes please refer to https://forums.rocket.chat/t/snap-news-rocket-chat-2-1/4890/14 I鈥檝e updated several times already so best place to refer.

@Azylog go back to latest and repeat with latest from the linked above

Thanks vvolas... Your instructions above finally resolved my issues... Much appreciated...

I also have been able to solve the issue with help in the link above (rocket chat forum) but without having to change channel.

With no new comments here. I'm going to go ahead and close this.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

zeigerpuppy picture zeigerpuppy  路  3Comments

mddvul22 picture mddvul22  路  3Comments

marceloschmidt picture marceloschmidt  路  3Comments

mattlin picture mattlin  路  3Comments

amayer5125 picture amayer5125  路  3Comments