Previously we were tracking stable, I think it was 3.5, though some of them may have been on candidate up until a few days ago.
Ubuntu Bionic, kernel is 4.15.0-1021-gcp, lxd is backed on ZFS.
Oct 14 21:20:14 server-name lxd.daemon[21959]: => Starting LXD
Oct 14 21:20:45 server-name lxd.daemon[21959]: lvl=warn msg="Failed to rollback transaction after error (failed to apply update 11: Failed to add project_id column: driver: bad connection): driver: bad connection" t=2018-10-14T21:20:45+0000
Oct 14 21:20:56 server-name lxd.daemon[21959]: lvl=eror msg="Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to begin transaction: failed to create dqlite connection: no available dqlite leader server found" t=2018-10-14T21:20:56+0000
Oct 14 21:20:56 server-name lxd.daemon[21959]: lvl=eror msg="Dqlite: aborting (fd=23 state=body msg=response write error: broken pipe (EPIPE))" t=2018-10-14T21:20:56+0000
Oct 14 21:20:56 server-name lxd.daemon[21959]: Error: failed to open cluster database: failed to ensure schema: failed to begin transaction: failed to create dqlite connection: no available dqlite leader server found
Oct 14 21:20:56 server-name lxd.daemon[21959]: => LXD failed to start
Per instructions in another issue, stopped lxd units, ran lxd --debug --group lxd:
DBUG[10-14|21:37:43] Connecting to a local LXD over a Unix socket
DBUG[10-14|21:37:43] Sending request to LXD etag= method=GET url=http://unix.socket/1.0
INFO[10-14|21:37:44] LXD 3.6 is starting in normal mode path=/var/snap/lxd/common/lxd
INFO[10-14|21:37:44] Kernel uid/gid map:
INFO[10-14|21:37:44] - u 0 0 4294967295
INFO[10-14|21:37:44] - g 0 0 4294967295
INFO[10-14|21:37:44] Configured LXD uid/gid map:
INFO[10-14|21:37:44] - u 0 1000000 1000000000
INFO[10-14|21:37:44] - g 0 1000000 1000000000
DBUG[10-14|21:37:44] Failed to set NETLINK_DUMP_STRICT_CHK socket option
DBUG[10-14|21:37:44] Running kernel does not support netnsid-based network retrieval
DBUG[10-14|21:37:44] Running kernel does not support uevent injection
DBUG[10-14|21:37:44] System supports unprivileged file capabilities
INFO[10-14|21:37:44] Initializing local database
INFO[10-14|21:37:44] Initializing database gateway
INFO[10-14|21:37:44] Start database node address= id=1
INFO[10-14|21:37:44] Raft: Restored from snapshot 1-25819-1539515095227
INFO[10-14|21:37:48] Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]
INFO[10-14|21:37:48] Raft: Node at 0 [Leader] entering Leader state
INFO[10-14|21:37:48] Dqlite: starting event loop
DBUG[10-14|21:37:48] Dqlite: accepting connections
INFO[10-14|21:37:48] LXD isn't socket activated
DBUG[10-14|21:37:48] Connecting to a local LXD over a Unix socket
DBUG[10-14|21:37:48] Sending request to LXD etag= method=GET url=http://unix.socket/1.0
DBUG[10-14|21:37:48] Detected stale unix socket, deleting
INFO[10-14|21:37:48] Starting /dev/lxd handler:
INFO[10-14|21:37:48] - binding devlxd socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[10-14|21:37:48] REST API daemon:
INFO[10-14|21:37:48] - binding Unix socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[10-14|21:37:48] - binding TCP socket socket=10.240.0.92:8443
INFO[10-14|21:37:48] Initializing global database
INFO[10-14|21:37:48] Dqlite: handling new connection (fd=20)
INFO[10-14|21:37:48] Dqlite: connected address=0 attempt=0
ERROR: can't write page 2182 of file db.bin which has only 2180 pages
panic: checkpoint: disk I/O error
trace:
2018-10-14 21:37:44.85310: fsm: restore=25819 start
2018-10-14 21:37:44.85310: fsm: restore=25819 database size: 8929280
2018-10-14 21:37:44.85951: fsm: restore=25819 wal size: 3073552
2018-10-14 21:37:44.86159: fsm: restore=25819 filename: db.bin
2018-10-14 21:37:44.86160: fsm: restore=25819 transaction ID:
2018-10-14 21:37:44.87305: fsm: restore=25819 open follower: db.bin
2018-10-14 21:37:44.87436: fsm: restore=25819 done
2018-10-14 21:37:48.97013: fsm: term=1 index=25820 cmd=frames txn=25819 pages=3 commit=1 start
2018-10-14 21:37:48.97028: fsm: term=1 index=25820 cmd=frames txn=25819 pages=3 commit=1 unregister txn
2018-10-14 21:37:48.97028: fsm: term=1 index=25820 cmd=frames txn=25819 pages=3 commit=1 done
2018-10-14 21:37:48.97029: fsm: term=1 index=25821 cmd=frames txn=25820 pages=10 commit=1 start
2018-10-14 21:37:48.97036: fsm: term=1 index=25821 cmd=frames txn=25820 pages=10 commit=1 unregister txn
2018-10-14 21:37:48.97036: fsm: term=1 index=25821 cmd=frames txn=25820 pages=10 commit=1 done
2018-10-14 21:37:48.97053: fsm: term=1 index=25822 cmd=frames txn=25821 pages=2 commit=1 start
2018-10-14 21:37:48.97057: fsm: term=1 index=25822 cmd=frames txn=25821 pages=2 commit=1 unregister txn
2018-10-14 21:37:48.97059: fsm: term=1 index=25822 cmd=frames txn=25821 pages=2 commit=1 done
2018-10-14 21:37:48.97060: fsm: term=1 index=25823 cmd=frames txn=25822 pages=2 commit=1 start
2018-10-14 21:37:48.97063: fsm: term=1 index=25823 cmd=frames txn=25822 pages=2 commit=1 unregister txn
2018-10-14 21:37:48.97063: fsm: term=1 index=25823 cmd=frames txn=25822 pages=2 commit=1 done
2018-10-14 21:37:48.97070: fsm: term=1 index=25824 cmd=frames txn=25823 pages=3 commit=1 start
2018-10-14 21:37:48.97084: fsm: term=1 index=25824 cmd=frames txn=25823 pages=3 commit=1 unregister txn
2018-10-14 21:37:48.97084: fsm: term=1 index=25824 cmd=frames txn=25823 pages=3 commit=1 done
2018-10-14 21:37:48.97133: fsm: term=1 index=25825 cmd=frames txn=25824 pages=18 commit=1 start
2018-10-14 21:37:48.97163: fsm: term=1 index=25825 cmd=frames txn=25824 pages=18 commit=1 unregister txn
2018-10-14 21:37:48.97164: fsm: term=1 index=25825 cmd=frames txn=25824 pages=18 commit=1 done
2018-10-14 21:37:48.97165: fsm: term=1 index=25826 cmd=frames txn=25825 pages=3 commit=1 start
2018-10-14 21:37:48.97171: fsm: term=1 index=25826 cmd=frames txn=25825 pages=3 commit=1 unregister txn
2018-10-14 21:37:48.97171: fsm: term=1 index=25826 cmd=frames txn=25825 pages=3 commit=1 done
2018-10-14 21:37:48.97175: fsm: term=1 index=25827 cmd=frames txn=25826 pages=17 commit=1 start
2018-10-14 21:37:48.97200: fsm: term=1 index=25827 cmd=frames txn=25826 pages=17 commit=1 unregister txn
2018-10-14 21:37:48.97200: fsm: term=1 index=25827 cmd=frames txn=25826 pages=17 commit=1 done
2018-10-14 21:37:48.97212: fsm: term=1 index=25828 cmd=frames txn=25827 pages=9 commit=1 start
2018-10-14 21:37:48.97224: fsm: term=1 index=25828 cmd=frames txn=25827 pages=9 commit=1 unregister txn
2018-10-14 21:37:48.97228: fsm: term=1 index=25828 cmd=frames txn=25827 pages=9 commit=1 done
2018-10-14 21:37:48.97230: fsm: term=1 index=25829 cmd=frames txn=25828 pages=2 commit=1 start
2018-10-14 21:37:48.97253: fsm: term=1 index=25829 cmd=frames txn=25828 pages=2 commit=1 unregister txn
2018-10-14 21:37:48.97254: fsm: term=1 index=25829 cmd=frames txn=25828 pages=2 commit=1 done
2018-10-14 21:37:48.97288: fsm: term=1 index=25830 cmd=frames txn=25829 pages=3 commit=1 start
2018-10-14 21:37:48.97295: fsm: term=1 index=25830 cmd=frames txn=25829 pages=3 commit=1 unregister txn
2018-10-14 21:37:48.97295: fsm: term=1 index=25830 cmd=frames txn=25829 pages=3 commit=1 done
2018-10-14 21:37:48.97298: fsm: term=1 index=25831 cmd=frames txn=25830 pages=10 commit=1 start
2018-10-14 21:37:48.97305: fsm: term=1 index=25831 cmd=frames txn=25830 pages=10 commit=1 unregister txn
2018-10-14 21:37:48.97305: fsm: term=1 index=25831 cmd=frames txn=25830 pages=10 commit=1 done
2018-10-14 21:37:48.97306: fsm: term=1 index=25832 cmd=frames txn=25831 pages=2 commit=1 start
2018-10-14 21:37:48.97310: fsm: term=1 index=25832 cmd=frames txn=25831 pages=2 commit=1 unregister txn
2018-10-14 21:37:48.97310: fsm: term=1 index=25832 cmd=frames txn=25831 pages=2 commit=1 done
2018-10-14 21:37:48.97314: fsm: term=1 index=25833 cmd=frames txn=25832 pages=2 commit=1 start
2018-10-14 21:37:48.97317: fsm: term=1 index=25833 cmd=frames txn=25832 pages=2 commit=1 unregister txn
2018-10-14 21:37:48.97317: fsm: term=1 index=25833 cmd=frames txn=25832 pages=2 commit=1 done
2018-10-14 21:37:48.97318: fsm: term=1 index=25834 cmd=frames txn=25833 pages=3 commit=1 start
2018-10-14 21:37:48.97323: fsm: term=1 index=25834 cmd=frames txn=25833 pages=3 commit=1 unregister txn
2018-10-14 21:37:48.97323: fsm: term=1 index=25834 cmd=frames txn=25833 pages=3 commit=1 done
2018-10-14 21:37:48.97327: fsm: term=1 index=25835 cmd=frames txn=25834 pages=15 commit=1 start
2018-10-14 21:37:48.97338: fsm: term=1 index=25835 cmd=frames txn=25834 pages=15 commit=1 unregister txn
2018-10-14 21:37:48.97338: fsm: term=1 index=25835 cmd=frames txn=25834 pages=15 commit=1 done
2018-10-14 21:37:48.97341: fsm: term=1 index=25836 cmd=frames txn=25835 pages=6 commit=1 start
2018-10-14 21:37:48.97348: fsm: term=1 index=25836 cmd=frames txn=25835 pages=6 commit=1 unregister txn
2018-10-14 21:37:48.97348: fsm: term=1 index=25836 cmd=frames txn=25835 pages=6 commit=1 done
2018-10-14 21:37:48.97350: fsm: term=1 index=25837 cmd=frames txn=25836 pages=15 commit=1 start
2018-10-14 21:37:48.97361: fsm: term=1 index=25837 cmd=frames txn=25836 pages=15 commit=1 unregister txn
2018-10-14 21:37:48.97361: fsm: term=1 index=25837 cmd=frames txn=25836 pages=15 commit=1 done
2018-10-14 21:37:48.97367: fsm: term=1 index=25838 cmd=frames txn=25837 pages=9 commit=1 start
2018-10-14 21:37:48.97377: fsm: term=1 index=25838 cmd=frames txn=25837 pages=9 commit=1 unregister txn
2018-10-14 21:37:48.97377: fsm: term=1 index=25838 cmd=frames txn=25837 pages=9 commit=1 done
2018-10-14 21:37:48.97379: fsm: term=1 index=25839 cmd=frames txn=25838 pages=3 commit=1 start
2018-10-14 21:37:48.97425: fsm: term=1 index=25839 cmd=frames txn=25838 pages=3 commit=1 unregister txn
2018-10-14 21:37:48.97425: fsm: term=1 index=25839 cmd=frames txn=25838 pages=3 commit=1 done
2018-10-14 21:37:48.97428: fsm: term=1 index=25840 cmd=frames txn=25839 pages=18 commit=1 start
2018-10-14 21:37:48.97441: fsm: term=1 index=25840 cmd=frames txn=25839 pages=18 commit=1 unregister txn
2018-10-14 21:37:48.97441: fsm: term=1 index=25840 cmd=frames txn=25839 pages=18 commit=1 done
2018-10-14 21:37:48.97453: fsm: term=1 index=25841 cmd=frames txn=25840 pages=3 commit=1 start
2018-10-14 21:37:48.97460: fsm: term=1 index=25841 cmd=frames txn=25840 pages=3 commit=1 unregister txn
2018-10-14 21:37:48.97460: fsm: term=1 index=25841 cmd=frames txn=25840 pages=3 commit=1 done
2018-10-14 21:37:48.97482: fsm: term=1 index=25842 cmd=frames txn=25841 pages=13 commit=1 start
2018-10-14 21:37:48.97492: fsm: term=1 index=25842 cmd=frames txn=25841 pages=13 commit=1 unregister txn
2018-10-14 21:37:48.97492: fsm: term=1 index=25842 cmd=frames txn=25841 pages=13 commit=1 done
2018-10-14 21:37:48.97494: fsm: term=1 index=25843 cmd=frames txn=25842 pages=9 commit=1 start
2018-10-14 21:37:48.97501: fsm: term=1 index=25843 cmd=frames txn=25842 pages=9 commit=1 unregister txn
2018-10-14 21:37:48.97501: fsm: term=1 index=25843 cmd=frames txn=25842 pages=9 commit=1 done
2018-10-14 21:37:48.97503: fsm: term=1 index=25844 cmd=frames txn=25843 pages=2 commit=1 start
2018-10-14 21:37:48.97505: fsm: term=1 index=25844 cmd=frames txn=25843 pages=2 commit=1 unregister txn
2018-10-14 21:37:48.97505: fsm: term=1 index=25844 cmd=frames txn=25843 pages=2 commit=1 done
2018-10-14 21:37:48.97510: fsm: term=1 index=25845 cmd=frames txn=25844 pages=2 commit=1 start
2018-10-14 21:37:48.97514: fsm: term=1 index=25845 cmd=frames txn=25844 pages=2 commit=1 unregister txn
2018-10-14 21:37:48.97514: fsm: term=1 index=25845 cmd=frames txn=25844 pages=2 commit=1 done
2018-10-14 21:37:48.97515: fsm: term=1 index=25846 cmd=frames txn=25845 pages=3 commit=1 start
2018-10-14 21:37:48.97518: fsm: term=1 index=25846 cmd=frames txn=25845 pages=3 commit=1 unregister txn
2018-10-14 21:37:48.97518: fsm: term=1 index=25846 cmd=frames txn=25845 pages=3 commit=1 done
2018-10-14 21:37:48.97520: fsm: term=1 index=25847 cmd=frames txn=25846 pages=10 commit=1 start
2018-10-14 21:37:48.97529: fsm: term=1 index=25847 cmd=frames txn=25846 pages=10 commit=1 unregister txn
2018-10-14 21:37:48.97529: fsm: term=1 index=25847 cmd=frames txn=25846 pages=10 commit=1 done
2018-10-14 21:37:48.97530: fsm: term=1 index=25848 cmd=frames txn=25847 pages=2 commit=1 start
2018-10-14 21:37:48.97534: fsm: term=1 index=25848 cmd=frames txn=25847 pages=2 commit=1 unregister txn
2018-10-14 21:37:48.97535: fsm: term=1 index=25848 cmd=frames txn=25847 pages=2 commit=1 done
2018-10-14 21:37:48.97535: fsm: term=1 index=25849 cmd=frames txn=25848 pages=2 commit=1 start
2018-10-14 21:37:48.97538: fsm: term=1 index=25849 cmd=frames txn=25848 pages=2 commit=1 unregister txn
2018-10-14 21:37:48.97538: fsm: term=1 index=25849 cmd=frames txn=25848 pages=2 commit=1 done
2018-10-14 21:37:48.97541: fsm: term=1 index=25850 cmd=frames txn=25849 pages=3 commit=1 start
2018-10-14 21:37:48.97543: fsm: term=1 index=25850 cmd=frames txn=25849 pages=3 commit=1 unregister txn
2018-10-14 21:37:48.97543: fsm: term=1 index=25850 cmd=frames txn=25849 pages=3 commit=1 done
2018-10-14 21:37:48.97545: fsm: term=1 index=25851 cmd=frames txn=25850 pages=10 commit=1 start
2018-10-14 21:37:48.97551: fsm: term=1 index=25851 cmd=frames txn=25850 pages=10 commit=1 unregister txn
2018-10-14 21:37:48.97551: fsm: term=1 index=25851 cmd=frames txn=25850 pages=10 commit=1 done
2018-10-14 21:37:48.97551: fsm: term=1 index=25852 cmd=frames txn=25851 pages=2 commit=1 start
2018-10-14 21:37:48.97554: fsm: term=1 index=25852 cmd=frames txn=25851 pages=2 commit=1 unregister txn
2018-10-14 21:37:48.97554: fsm: term=1 index=25852 cmd=frames txn=25851 pages=2 commit=1 done
2018-10-14 21:37:48.97555: fsm: term=1 index=25853 cmd=frames txn=25852 pages=2 commit=1 start
2018-10-14 21:37:48.97557: fsm: term=1 index=25853 cmd=frames txn=25852 pages=2 commit=1 unregister txn
2018-10-14 21:37:48.97557: fsm: term=1 index=25853 cmd=frames txn=25852 pages=2 commit=1 done
2018-10-14 21:37:48.97559: fsm: term=1 index=25854 cmd=frames txn=25853 pages=3 commit=1 start
2018-10-14 21:37:48.97567: fsm: term=1 index=25854 cmd=frames txn=25853 pages=3 commit=1 unregister txn
2018-10-14 21:37:48.97567: fsm: term=1 index=25854 cmd=frames txn=25853 pages=3 commit=1 done
2018-10-14 21:37:48.97568: fsm: term=1 index=25855 cmd=frames txn=25854 pages=11 commit=1 start
2018-10-14 21:37:48.97575: fsm: term=1 index=25855 cmd=frames txn=25854 pages=11 commit=1 unregister txn
2018-10-14 21:37:48.97575: fsm: term=1 index=25855 cmd=frames txn=25854 pages=11 commit=1 done
2018-10-14 21:37:48.97575: fsm: term=1 index=25856 cmd=frames txn=25855 pages=3 commit=1 start
2018-10-14 21:37:48.97579: fsm: term=1 index=25856 cmd=frames txn=25855 pages=3 commit=1 unregister txn
2018-10-14 21:37:48.97580: fsm: term=1 index=25856 cmd=frames txn=25855 pages=3 commit=1 done
2018-10-14 21:37:48.97581: fsm: term=1 index=25857 cmd=frames txn=25856 pages=10 commit=1 start
2018-10-14 21:37:48.97588: fsm: term=1 index=25857 cmd=frames txn=25856 pages=10 commit=1 unregister txn
2018-10-14 21:37:48.97588: fsm: term=1 index=25857 cmd=frames txn=25856 pages=10 commit=1 done
2018-10-14 21:37:48.97589: fsm: term=1 index=25858 cmd=frames txn=25857 pages=9 commit=1 start
2018-10-14 21:37:48.97598: fsm: term=1 index=25858 cmd=frames txn=25857 pages=9 commit=1 unregister txn
2018-10-14 21:37:48.97598: fsm: term=1 index=25858 cmd=frames txn=25857 pages=9 commit=1 done
2018-10-14 21:37:48.97599: fsm: term=1 index=25859 cmd=checkpoint file=db.bin start
2018-10-14 21:37:48.97631: fsm: term=1 index=25859 cmd=checkpoint failed: checkpoint: disk I/O error
goroutine 55 [running]:
github.com/CanonicalLtd/go-dqlite/internal/trace.(*Tracer).Panic(0xc00031e000, 0x1159e31, 0x2, 0xc000cf7d68, 0x1, 0x1)
/build/lxd/parts/lxd/go/src/github.com/CanonicalLtd/go-dqlite/internal/trace/tracer.go:59 +0x12a
github.com/CanonicalLtd/go-dqlite/internal/replication.(*FSM).Apply(0xc000180ee0, 0xc000e8aff0, 0x0, 0x0)
/build/lxd/parts/lxd/go/src/github.com/CanonicalLtd/go-dqlite/internal/replication/fsm.go:84 +0x136
github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc000e861c0)
/build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/fsm.go:57 +0x155
github.com/hashicorp/raft.(*Raft).runFSM(0xc00033c000)
/build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/fsm.go:120 +0x2ef
github.com/hashicorp/raft.(*Raft).runFSM-fm()
/build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/api.go:506 +0x2a
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc00033c000, 0xc000178630)
/build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/state.go:146 +0x53
created by github.com/hashicorp/raft.(*raftState).goFunc
/build/lxd/parts/lxd/go/src/github.com/hashicorp/raft/state.go:144 +0x66
If there's any further information required or anything I can test, you can ping me on #lxcontainers for a faster response.
Can you do:
Cheers, hope this is helpful:
root@server-name:~# systemctl stop snap.lxd.daemon.service snap.lxd.daemon.unix.socket
root@server-name:~# pkill -9 lxd
root@server-name:~# lxd --debug --group lxd
DBUG[10-14|21:51:29] Connecting to a local LXD over a Unix socket
DBUG[10-14|21:51:29] Sending request to LXD etag= method=GET url=http://unix.socket/1.0
INFO[10-14|21:51:29] LXD 3.6 is starting in normal mode path=/var/snap/lxd/common/lxd
INFO[10-14|21:51:29] Kernel uid/gid map:
INFO[10-14|21:51:29] - u 0 0 4294967295
INFO[10-14|21:51:29] - g 0 0 4294967295
INFO[10-14|21:51:29] Configured LXD uid/gid map:
INFO[10-14|21:51:29] - u 0 1000000 1000000000
INFO[10-14|21:51:29] - g 0 1000000 1000000000
DBUG[10-14|21:51:29] Failed to set NETLINK_DUMP_STRICT_CHK socket option
DBUG[10-14|21:51:29] Running kernel does not support netnsid-based network retrieval
DBUG[10-14|21:51:29] Running kernel does not support uevent injection
DBUG[10-14|21:51:29] System supports unprivileged file capabilities
INFO[10-14|21:51:29] Initializing local database
INFO[10-14|21:51:29] Initializing database gateway
INFO[10-14|21:51:29] Start database node address= id=1
INFO[10-14|21:51:29] Raft: Restored from snapshot 1-25819-1539515095227
INFO[10-14|21:51:33] Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]
INFO[10-14|21:51:33] Raft: Node at 0 [Leader] entering Leader state
INFO[10-14|21:51:33] Dqlite: starting event loop
DBUG[10-14|21:51:33] Dqlite: accepting connections
INFO[10-14|21:51:33] LXD isn't socket activated
DBUG[10-14|21:51:33] Connecting to a local LXD over a Unix socket
DBUG[10-14|21:51:33] Sending request to LXD etag= method=GET url=http://unix.socket/1.0
DBUG[10-14|21:51:33] Detected stale unix socket, deleting
DBUG[10-14|21:51:33] Detected stale unix socket, deleting
INFO[10-14|21:51:33] Starting /dev/lxd handler:
INFO[10-14|21:51:33] - binding devlxd socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[10-14|21:51:33] REST API daemon:
INFO[10-14|21:51:33] - binding Unix socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[10-14|21:51:33] - binding TCP socket socket=10.240.0.92:8443
INFO[10-14|21:51:33] Initializing global database
INFO[10-14|21:51:33] Dqlite: handling new connection (fd=20)
INFO[10-14|21:51:33] Dqlite: connected address=0 attempt=0
INFO[10-14|21:51:34] Dqlite: closing client
INFO[10-14|21:51:34] Dqlite: handling new connection (fd=22)
INFO[10-14|21:51:39] Dqlite: closing client
DBUG[10-14|21:51:39] Dqlite: server connection failed err=failed to send Leader request: failed to receive response: failed to receive header: failed to receive header: read unix @->@00064: i/o timeout address=0 attempt=0
DBUG[10-14|21:51:39] Dqlite: connection failed err=no available dqlite leader server found attempt=0
INFO[10-14|21:51:41] Dqlite: handling new connection (fd=24)
INFO[10-14|21:51:41] Dqlite: connected address=0 attempt=1
INFO[10-14|21:51:41] Updating the LXD global schema. Backup made as "global.bak"
DBUG[10-14|21:51:44] Updating global DB schema from 10 to 11
DBUG[10-14|21:51:44] Updating global DB schema from 11 to 12
INFO[10-14|21:51:50] Dqlite: closing client
WARN[10-14|21:51:50] Failed to rollback transaction after error (failed to apply update 11: Failed to add project_id column: driver: bad connection): driver: bad connection
DBUG[10-14|21:51:50] Database error: &errors.errorString{s:"failed to apply update 11: Failed to add project_id column: driver: bad connection"}
DBUG[10-14|21:51:50] Retry failed db interaction (failed to apply update 11: Failed to add project_id column: driver: bad connection)
INFO[10-14|21:51:50] Dqlite: handling new connection (fd=22)
INFO[10-14|21:51:55] Dqlite: closing client
DBUG[10-14|21:51:55] Dqlite: server connection failed err=failed to send Leader request: failed to receive response: failed to receive header: failed to receive header: read unix @->@00064: i/o timeout address=0 attempt=0
DBUG[10-14|21:51:55] Dqlite: connection failed err=no available dqlite leader server found attempt=0
INFO[10-14|21:52:00] Dqlite: closing client
DBUG[10-14|21:52:00] Dqlite: server connection failed err=failed to send Leader request: failed to receive response: failed to receive header: failed to receive header: read unix @->@00064: i/o timeout address=0 attempt=1
DBUG[10-14|21:52:00] Dqlite: connection failed err=no available dqlite leader server found attempt=1
DBUG[10-14|21:52:00] Database error: failed to begin transaction: failed to create dqlite connection: no available dqlite leader server found
EROR[10-14|21:52:00] Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to begin transaction: failed to create dqlite connection: no available dqlite leader server found
INFO[10-14|21:52:00] Starting shutdown sequence
INFO[10-14|21:52:00] Stopping REST API handler:
INFO[10-14|21:52:00] - closing socket socket=10.240.0.92:8443
INFO[10-14|21:52:00] - closing socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[10-14|21:52:00] Stopping /dev/lxd handler:
INFO[10-14|21:52:00] - closing socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[10-14|21:52:00] Stop database gateway
INFO[10-14|21:52:00] Stop raft instance
EROR[10-14|21:52:00] Dqlite: aborting (fd=24 state=body msg=response write error: broken pipe (EPIPE))
INFO[10-14|21:52:00] Dqlite: handling new connection (fd=14)
INFO[10-14|21:52:00] Dqlite: stopping event loop
INFO[10-14|21:52:00] Dqlite: event loop stopped
DBUG[10-14|21:52:00] Not unmounting temporary filesystems (containers are still running)
INFO[10-14|21:52:00] Saving simplestreams cache
INFO[10-14|21:52:00] Saved simplestreams cache
Error: failed to open cluster database: failed to ensure schema: failed to begin transaction: failed to create dqlite connection: no available dqlite leader server found
Summarising my conversation with St茅phane in case anyone searches for this in the future - I believe this was caused by running 3.5 on the candidate stream, downgrading to 3.5/stable when 3.6 caused us issues, and then stable being updated to 3.6 again. In short, going 3.5->3.6->3.5->3.6 broke things in dqlite. :(
stgraber sent over a debug 3.5 binary that brought the most crucial daemons back up, but there are a few that won't start still even with it in place. Here's the debug output from the binary:
INFO[10-15|08:17:44] LXD 3.5 is starting in normal mode path=/var/snap/lxd/common/lxd
INFO[10-15|08:17:44] Kernel uid/gid map:
INFO[10-15|08:17:44] - u 0 0 4294967295
INFO[10-15|08:17:44] - g 0 0 4294967295
INFO[10-15|08:17:44] Configured LXD uid/gid map:
INFO[10-15|08:17:44] - u 0 1000000 1000000000
INFO[10-15|08:17:44] - g 0 1000000 1000000000
DBUG[10-15|08:17:44] Running kernel does not support netnsid-based network retrieval
INFO[10-15|08:17:44] Initializing local database
INFO[10-15|08:17:44] Initializing database gateway
INFO[10-15|08:17:44] Start database node address= id=1
INFO[10-15|08:17:44] Raft: Restored from snapshot 1-10492-1539503278922
INFO[10-15|08:17:50] Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]
INFO[10-15|08:17:50] Raft: Node at 0 [Leader] entering Leader state
INFO[10-15|08:17:50] Dqlite: starting event loop
DBUG[10-15|08:17:50] Dqlite: accepting connections
INFO[10-15|08:17:50] LXD isn't socket activated
INFO[10-15|08:17:50] Starting /dev/lxd handler:
INFO[10-15|08:17:50] - binding devlxd socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[10-15|08:17:50] REST API daemon:
INFO[10-15|08:17:50] - binding Unix socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[10-15|08:17:50] - binding TCP socket socket=10.240.0.16:8443
INFO[10-15|08:17:50] Initializing global database
INFO[10-15|08:17:50] Dqlite: handling new connection (fd=20)
INFO[10-15|08:17:50] Dqlite: connected address=0 attempt=0
INFO[10-15|08:17:51] Dqlite: closing client
INFO[10-15|08:17:51] Dqlite: handling new connection (fd=22)
INFO[10-15|08:17:56] Dqlite: closing client
DBUG[10-15|08:17:56] Dqlite: server connection failed err=failed to send Leader request: failed to receive response: failed to receive header: failed to receive header: read unix @->@062c1: i/o timeout address=0 attempt=0
DBUG[10-15|08:17:56] Dqlite: connection failed err=no available dqlite leader server found attempt=0
INFO[10-15|08:18:01] Dqlite: closing client
DBUG[10-15|08:18:01] Dqlite: server connection failed err=failed to send Leader request: failed to receive response: failed to receive header: failed to receive header: read unix @->@062c1: i/o timeout address=0 attempt=1
DBUG[10-15|08:18:01] Dqlite: connection failed err=no available dqlite leader server found attempt=1
DBUG[10-15|08:18:02] Database error: failed to begin transaction: failed to create dqlite connection: no available dqlite leader server found
EROR[10-15|08:18:02] Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to begin transaction: failed to create dqlite connection: no available dqlite leader server found
INFO[10-15|08:18:02] Starting shutdown sequence
INFO[10-15|08:18:02] Stopping REST API handler:
INFO[10-15|08:18:02] - closing socket socket=10.240.0.16:8443
INFO[10-15|08:18:02] - closing socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[10-15|08:18:02] Stopping /dev/lxd handler:
INFO[10-15|08:18:02] - closing socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[10-15|08:18:02] Stop database gateway
INFO[10-15|08:18:02] Stop raft instance
INFO[10-15|08:18:02] Dqlite: handling new connection (fd=15)
INFO[10-15|08:18:03] Dqlite: stopping event loop
INFO[10-15|08:18:03] Dqlite: event loop stopped
DBUG[10-15|08:18:03] Not unmounting temporary filesystems (containers are still running)
INFO[10-15|08:18:03] Saving simplestreams cache
INFO[10-15|08:18:03] Saved simplestreams cache
Error: failed to open cluster database: failed to ensure schema: failed to begin transaction: failed to create dqlite connection: no available dqlite leader server found
In the database dir, global/logs.db is nearly 5GB on this machine, presumably from systemd bouncing it over and over and nothing ever got cleaned up. Let me know if I can provide any further information to help unbreak this, thanks!
Is there anything that can be done to revert successfully to 3.5? this has killed production lxd for us. current containers are running but any launch or attempt to create snapshots fails.
Once we have this fixed. How do we stop these auto upgrades in snap?
Merged free's fix, will immediately push to candidate and roll it out in stable so it should be available in the next couple of hours.
I have recent exports of all my containers, created with "lxd export --container-only". I tried importing one of them to a new Ubuntu 18.04 system with LXD 3.6, and it seems to work, after working around 3 problems:
I'll probably wait for the snap update. In the meantime, this is an exercise/test of LXD backup procedures.
Snap is currently building, ETA for candidate is about 30min
Unfortunately it doesn't work @stgraber after switching to channel "candidate", build 9210 we experience the following error:
lvl=info msg="LXD 3.6 is starting in normal mode" path=/var/snap/lxd/common/lxd t=2018-10-15T14:28:02+0000
lvl=info msg="Kernel uid/gid map:" t=2018-10-15T14:28:02+0000
lvl=info msg=" - u 0 0 4294967295" t=2018-10-15T14:28:02+0000
lvl=info msg=" - g 0 0 4294967295" t=2018-10-15T14:28:02+0000
lvl=info msg="Configured LXD uid/gid map:" t=2018-10-15T14:28:02+0000
lvl=info msg=" - u 0 1000000 1000000000" t=2018-10-15T14:28:02+0000
lvl=info msg=" - g 0 1000000 1000000000" t=2018-10-15T14:28:02+0000
lvl=info msg="Initializing local database" t=2018-10-15T14:28:02+0000
lvl=info msg="Initializing database gateway" t=2018-10-15T14:28:02+0000
address= id=1 lvl=info msg="Start database node" t=2018-10-15T14:28:02+0000
lvl=info msg="Raft: Restored from snapshot 1-881483-1539609801319" t=2018-10-15T14:28:03+0000
lvl=info msg="Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]" t=2018-10-15T14:28:03+0000
lvl=info msg="Raft: Node at 0 [Leader] entering Leader state" t=2018-10-15T14:28:03+0000
lvl=info msg="Dqlite: starting event loop" t=2018-10-15T14:28:03+0000
lvl=info msg="LXD is socket activated" t=2018-10-15T14:28:03+0000
lvl=info msg="Starting /dev/lxd handler:" t=2018-10-15T14:28:03+0000
lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock t=2018-10-15T14:28:03+0000
lvl=info msg="REST API daemon:" t=2018-10-15T14:28:03+0000
lvl=info msg=" - binding Unix socket" socket=/var/snap/lxd/common/lxd/unix.socket t=2018-10-15T14:28:03+0000
lvl=info msg=" - binding TCP socket" socket=10.240.0.85:8443 t=2018-10-15T14:28:03+0000
lvl=info msg="Initializing global database" t=2018-10-15T14:28:03+0000
lvl=info msg="Dqlite: handling new connection (fd=23)" t=2018-10-15T14:28:03+0000
lvl=info msg="Dqlite: connected address=0 attempt=0" t=2018-10-15T14:28:03+0000
lvl=info msg="Updating the LXD global schema. Backup made as \"global.bak\"" t=2018-10-15T14:28:03+0000
lvl=info msg="Dqlite: closing client" t=2018-10-15T14:29:17+0000
lvl=warn msg="Failed to rollback transaction after error (failed to apply update 11: Failed to add project_id column: driver: bad connection): driver: bad connection" t=2018-10-15T14:29:17+0000
lvl=info msg="Dqlite: handling new connection (fd=26)" t=2018-10-15T14:29:17+0000
lvl=info msg="Dqlite: closing client" t=2018-10-15T14:29:22+0000
lvl=info msg="Dqlite: closing client" t=2018-10-15T14:29:27+0000
lvl=eror msg="Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to begin transaction: failed to create dqlite connection: no available dqlite leader server found" t=2018-10-15T14:29:27+0000
lvl=info msg="Starting shutdown sequence" t=2018-10-15T14:29:27+0000
lvl=info msg="Stopping REST API handler:" t=2018-10-15T14:29:27+0000
lvl=info msg=" - closing socket" socket=10.240.0.85:8443 t=2018-10-15T14:29:27+0000
lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/unix.socket t=2018-10-15T14:29:27+0000
lvl=info msg="Stopping /dev/lxd handler:" t=2018-10-15T14:29:27+0000
lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/devlxd/sock t=2018-10-15T14:29:27+0000
lvl=info msg="Stop database gateway" t=2018-10-15T14:29:27+0000
lvl=info msg="Stop raft instance" t=2018-10-15T14:29:27+0000
lvl=eror msg="Dqlite: aborting (fd=23 state=body msg=response write error: broken pipe (EPIPE))" t=2018-10-15T14:29:27+0000
lvl=info msg="Dqlite: handling new connection (fd=8)" t=2018-10-15T14:29:27+0000
lvl=info msg="Dqlite: stopping event loop" t=2018-10-15T14:29:27+0000
lvl=info msg="Dqlite: event loop stopped" t=2018-10-15T14:29:27+0000
lvl=info msg="Saving simplestreams cache" t=2018-10-15T14:29:27+0000
lvl=info msg="Saved simplestreams cache" t=2018-10-15T14:29:27+0000
lxd --debug --group lxd
DBUG[10-15|14:36:51] Connecting to a local LXD over a Unix socket
DBUG[10-15|14:36:51] Sending request to LXD etag= method=GET url=http://unix.socket/1.0
INFO[10-15|14:38:17] LXD 3.6 is starting in normal mode path=/var/snap/lxd/common/lxd
INFO[10-15|14:38:17] Kernel uid/gid map:
INFO[10-15|14:38:17] - u 0 0 4294967295
INFO[10-15|14:38:17] - g 0 0 4294967295
INFO[10-15|14:38:17] Configured LXD uid/gid map:
INFO[10-15|14:38:17] - u 0 1000000 1000000000
INFO[10-15|14:38:17] - g 0 1000000 1000000000
DBUG[10-15|14:38:17] Failed to set NETLINK_DUMP_STRICT_CHK socket option
DBUG[10-15|14:38:17] Running kernel does not support netnsid-based network retrieval
DBUG[10-15|14:38:17] Running kernel does not support uevent injection
DBUG[10-15|14:38:17] System supports unprivileged file capabilities
INFO[10-15|14:38:17] Initializing local database
INFO[10-15|14:38:17] Initializing database gateway
INFO[10-15|14:38:17] Start database node address= id=1
INFO[10-15|14:38:18] Raft: Restored from snapshot 1-881483-1539609801319
INFO[10-15|14:38:18] Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]
INFO[10-15|14:38:18] Raft: Node at 0 [Leader] entering Leader state
INFO[10-15|14:38:18] Dqlite: starting event loop
DBUG[10-15|14:38:18] Dqlite: accepting connections
INFO[10-15|14:38:18] LXD isn't socket activated
DBUG[10-15|14:38:18] Connecting to a local LXD over a Unix socket
DBUG[10-15|14:38:18] Sending request to LXD etag= method=GET url=http://unix.socket/1.0
log output:
lvl=info msg="LXD 3.6 is starting in normal mode" path=/var/snap/lxd/common/lxd t=2018-10-15T14:36:51+0000
lvl=info msg="Kernel uid/gid map:" t=2018-10-15T14:36:51+0000
lvl=info msg=" - u 0 0 4294967295" t=2018-10-15T14:36:51+0000
lvl=info msg=" - g 0 0 4294967295" t=2018-10-15T14:36:51+0000
lvl=info msg="Configured LXD uid/gid map:" t=2018-10-15T14:36:51+0000
lvl=info msg=" - u 0 1000000 1000000000" t=2018-10-15T14:36:51+0000
lvl=info msg=" - g 0 1000000 1000000000" t=2018-10-15T14:36:51+0000
lvl=info msg="Initializing local database" t=2018-10-15T14:36:51+0000
lvl=info msg="Initializing database gateway" t=2018-10-15T14:36:51+0000
address= id=1 lvl=info msg="Start database node" t=2018-10-15T14:36:51+0000
lvl=info msg="Raft: Restored from snapshot 1-881483-1539609801319" t=2018-10-15T14:36:52+0000
lvl=info msg="Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]" t=2018-10-15T14:36:52+0000
lvl=info msg="Raft: Node at 0 [Leader] entering Leader state" t=2018-10-15T14:36:52+0000
lvl=info msg="Dqlite: starting event loop" t=2018-10-15T14:36:52+0000
lvl=info msg="LXD is socket activated" t=2018-10-15T14:36:52+0000
lvl=info msg="Starting /dev/lxd handler:" t=2018-10-15T14:36:52+0000
lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock t=2018-10-15T14:36:52+0000
lvl=info msg="REST API daemon:" t=2018-10-15T14:36:52+0000
lvl=info msg=" - binding Unix socket" socket=/var/snap/lxd/common/lxd/unix.socket t=2018-10-15T14:36:52+0000
lvl=info msg=" - binding TCP socket" socket=10.240.0.85:8443 t=2018-10-15T14:36:52+0000
lvl=info msg="Initializing global database" t=2018-10-15T14:36:52+0000
lvl=info msg="Dqlite: handling new connection (fd=24)" t=2018-10-15T14:36:52+0000
lvl=info msg="Dqlite: connected address=0 attempt=0" t=2018-10-15T14:36:52+0000
lvl=info msg="Updating the LXD global schema. Backup made as \"global.bak\"" t=2018-10-15T14:36:52+0000
lvl=info msg="Dqlite: closing client" t=2018-10-15T14:38:07+0000
lvl=warn msg="Failed to rollback transaction after error (failed to apply update 11: Failed to add project_id column: driver: bad connection): driver: bad connection" t=2018-10-15T14:38:07+0000
lvl=info msg="Dqlite: handling new connection (fd=53)" t=2018-10-15T14:38:07+0000
lvl=info msg="Dqlite: closing client" t=2018-10-15T14:38:12+0000
lvl=info msg="Dqlite: closing client" t=2018-10-15T14:38:17+0000
lvl=eror msg="Failed to start the daemon: failed to open cluster database: failed to ensure schema: failed to begin transaction: failed to create dqlite connection: no available dqlite leader server found" t=2018-10-15T14:38:17+0000
lvl=info msg="Starting shutdown sequence" t=2018-10-15T14:38:17+0000
lvl=info msg="Stopping REST API handler:" t=2018-10-15T14:38:17+0000
lvl=info msg=" - closing socket" socket=10.240.0.85:8443 t=2018-10-15T14:38:17+0000
lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/unix.socket t=2018-10-15T14:38:17+0000
lvl=info msg="Stopping /dev/lxd handler:" t=2018-10-15T14:38:17+0000
lvl=info msg=" - closing socket" socket=/var/snap/lxd/common/lxd/devlxd/sock t=2018-10-15T14:38:17+0000
lvl=info msg="Stop database gateway" t=2018-10-15T14:38:17+0000
lvl=info msg="Stop raft instance" t=2018-10-15T14:38:17+0000
lvl=eror msg="Dqlite: aborting (fd=24 state=body msg=response write error: broken pipe (EPIPE))" t=2018-10-15T14:38:17+0000
lvl=info msg="Dqlite: handling new connection (fd=8)" t=2018-10-15T14:38:17+0000
lvl=info msg="Dqlite: stopping event loop" t=2018-10-15T14:38:17+0000
lvl=info msg="Dqlite: event loop stopped" t=2018-10-15T14:38:17+0000
lvl=info msg="Saving simplestreams cache" t=2018-10-15T14:38:17+0000
lvl=info msg="Saved simplestreams cache" t=2018-10-15T14:38:17+0000
tail: /var/snap/lxd/common/lxd/logs/lxd.log: file truncated
lvl=info msg="LXD 3.6 is starting in normal mode" path=/var/snap/lxd/common/lxd t=2018-10-15T14:38:18+0000
lvl=info msg="Kernel uid/gid map:" t=2018-10-15T14:38:18+0000
lvl=info msg=" - u 0 0 4294967295" t=2018-10-15T14:38:18+0000
lvl=info msg=" - g 0 0 4294967295" t=2018-10-15T14:38:18+0000
lvl=info msg="Configured LXD uid/gid map:" t=2018-10-15T14:38:18+0000
lvl=info msg=" - u 0 1000000 1000000000" t=2018-10-15T14:38:18+0000
lvl=info msg=" - g 0 1000000 1000000000" t=2018-10-15T14:38:18+0000
lvl=info msg="Initializing local database" t=2018-10-15T14:38:18+0000
lvl=info msg="Initializing database gateway" t=2018-10-15T14:38:18+0000
address= id=1 lvl=info msg="Start database node" t=2018-10-15T14:38:18+0000
lvl=eror msg="Failed to start the daemon: Failed to create raft factory: failed to create bolt store for raft logs: timeout" t=2018-10-15T14:38:23+0000
lvl=info msg="Starting shutdown sequence" t=2018-10-15T14:38:23+0000
lvl=info msg="Saving simplestreams cache" t=2018-10-15T14:38:23+0000
lvl=info msg="Saved simplestreams cache" t=2018-10-15T14:38:23+0000
tail: /var/snap/lxd/common/lxd/logs/lxd.log: file truncated
lvl=info msg="LXD 3.6 is starting in normal mode" path=/var/snap/lxd/common/lxd t=2018-10-15T14:38:25+0000
lvl=info msg="Kernel uid/gid map:" t=2018-10-15T14:38:25+0000
lvl=info msg=" - u 0 0 4294967295" t=2018-10-15T14:38:25+0000
lvl=info msg=" - g 0 0 4294967295" t=2018-10-15T14:38:25+0000
lvl=info msg="Configured LXD uid/gid map:" t=2018-10-15T14:38:25+0000
lvl=info msg=" - u 0 1000000 1000000000" t=2018-10-15T14:38:25+0000
lvl=info msg=" - g 0 1000000 1000000000" t=2018-10-15T14:38:25+0000
lvl=info msg="Initializing local database" t=2018-10-15T14:38:25+0000
lvl=info msg="Initializing database gateway" t=2018-10-15T14:38:25+0000
address= id=1 lvl=info msg="Start database node" t=2018-10-15T14:38:25+0000
lvl=eror msg="Failed to start the daemon: Failed to create raft factory: failed to create bolt store for raft logs: timeout" t=2018-10-15T14:38:30+0000
lvl=info msg="Starting shutdown sequence" t=2018-10-15T14:38:30+0000
lvl=info msg="Saving simplestreams cache" t=2018-10-15T14:38:30+0000
lvl=info msg="Saved simplestreams cache" t=2018-10-15T14:38:30+0000
tail: /var/snap/lxd/common/lxd/logs/lxd.log: file truncated
lvl=info msg="LXD 3.6 is starting in normal mode" path=/var/snap/lxd/common/lxd t=2018-10-15T14:38:32+0000
lvl=info msg="Kernel uid/gid map:" t=2018-10-15T14:38:32+0000
lvl=info msg=" - u 0 0 4294967295" t=2018-10-15T14:38:32+0000
lvl=info msg=" - g 0 0 4294967295" t=2018-10-15T14:38:32+0000
lvl=info msg="Configured LXD uid/gid map:" t=2018-10-15T14:38:32+0000
lvl=info msg=" - u 0 1000000 1000000000" t=2018-10-15T14:38:32+0000
lvl=info msg=" - g 0 1000000 1000000000" t=2018-10-15T14:38:32+0000
lvl=info msg="Initializing local database" t=2018-10-15T14:38:32+0000
lvl=info msg="Initializing database gateway" t=2018-10-15T14:38:32+0000
address= id=1 lvl=info msg="Start database node" t=2018-10-15T14:38:32+0000
lvl=eror msg="Failed to start the daemon: Failed to create raft factory: failed to create bolt store for raft logs: timeout" t=2018-10-15T14:38:37+0000
lvl=info msg="Starting shutdown sequence" t=2018-10-15T14:38:37+0000
lvl=info msg="Saving simplestreams cache" t=2018-10-15T14:38:37+0000
lvl=info msg="Saved simplestreams cache" t=2018-10-15T14:38:37+0000
tail: /var/snap/lxd/common/lxd/logs/lxd.log: file truncated
lvl=info msg="LXD 3.6 is starting in normal mode" path=/var/snap/lxd/common/lxd t=2018-10-15T14:38:38+0000
lvl=info msg="Kernel uid/gid map:" t=2018-10-15T14:38:38+0000
lvl=info msg=" - u 0 0 4294967295" t=2018-10-15T14:38:38+0000
lvl=info msg=" - g 0 0 4294967295" t=2018-10-15T14:38:38+0000
lvl=info msg="Configured LXD uid/gid map:" t=2018-10-15T14:38:38+0000
lvl=info msg=" - u 0 1000000 1000000000" t=2018-10-15T14:38:38+0000
lvl=info msg=" - g 0 1000000 1000000000" t=2018-10-15T14:38:38+0000
lvl=info msg="Initializing local database" t=2018-10-15T14:38:38+0000
lvl=info msg="Initializing database gateway" t=2018-10-15T14:38:38+0000
address= id=1 lvl=info msg="Start database node" t=2018-10-15T14:38:38+0000
lvl=eror msg="Failed to start the daemon: Failed to create raft factory: failed to create bolt store for raft logs: timeout" t=2018-10-15T14:38:44+0000
lvl=info msg="Starting shutdown sequence" t=2018-10-15T14:38:44+0000
lvl=info msg="Saving simplestreams cache" t=2018-10-15T14:38:44+0000
lvl=info msg="Saved simplestreams cache" t=2018-10-15T14:38:44+0000
tail: /var/snap/lxd/common/lxd/logs/lxd.log: file truncated
lvl=info msg="LXD 3.6 is starting in normal mode" path=/var/snap/lxd/common/lxd t=2018-10-15T14:38:45+0000
lvl=info msg="Kernel uid/gid map:" t=2018-10-15T14:38:45+0000
lvl=info msg=" - u 0 0 4294967295" t=2018-10-15T14:38:45+0000
lvl=info msg=" - g 0 0 4294967295" t=2018-10-15T14:38:45+0000
lvl=info msg="Configured LXD uid/gid map:" t=2018-10-15T14:38:45+0000
lvl=info msg=" - u 0 1000000 1000000000" t=2018-10-15T14:38:45+0000
lvl=info msg=" - g 0 1000000 1000000000" t=2018-10-15T14:38:45+0000
lvl=info msg="Initializing local database" t=2018-10-15T14:38:45+0000
lvl=info msg="Initializing database gateway" t=2018-10-15T14:38:45+0000
address= id=1 lvl=info msg="Start database node" t=2018-10-15T14:38:45+0000
lvl=eror msg="Failed to start the daemon: Failed to create raft factory: failed to create bolt store for raft logs: timeout" t=2018-10-15T14:38:50+0000
lvl=info msg="Starting shutdown sequence" t=2018-10-15T14:38:50+0000
lvl=info msg="Saving simplestreams cache" t=2018-10-15T14:38:50+0000
lvl=info msg="Saved simplestreams cache" t=2018-10-15T14:38:50+0000
tail: /var/snap/lxd/common/lxd/logs/lxd.log: file truncated
lvl=info msg="LXD 3.6 is starting in normal mode" path=/var/snap/lxd/common/lxd t=2018-10-15T14:38:52+0000
lvl=info msg="Kernel uid/gid map:" t=2018-10-15T14:38:52+0000
lvl=info msg=" - u 0 0 4294967295" t=2018-10-15T14:38:52+0000
lvl=info msg=" - g 0 0 4294967295" t=2018-10-15T14:38:52+0000
lvl=info msg="Configured LXD uid/gid map:" t=2018-10-15T14:38:52+0000
lvl=info msg=" - u 0 1000000 1000000000" t=2018-10-15T14:38:52+0000
lvl=info msg=" - g 0 1000000 1000000000" t=2018-10-15T14:38:52+0000
lvl=info msg="Initializing local database" t=2018-10-15T14:38:52+0000
lvl=info msg="Initializing database gateway" t=2018-10-15T14:38:52+0000
address= id=1 lvl=info msg="Start database node" t=2018-10-15T14:38:52+0000
lvl=eror msg="Failed to start the daemon: Failed to create raft factory: failed to create bolt store for raft logs: timeout" t=2018-10-15T14:38:57+0000
lvl=info msg="Starting shutdown sequence" t=2018-10-15T14:38:57+0000
lvl=info msg="Saving simplestreams cache" t=2018-10-15T14:38:57+0000
lvl=info msg="Saved simplestreams cache" t=2018-10-15T14:38:57+0000
tail: /var/snap/lxd/common/lxd/logs/lxd.log: file truncated
lvl=info msg="LXD 3.6 is starting in normal mode" path=/var/snap/lxd/common/lxd t=2018-10-15T14:38:59+0000
lvl=info msg="Kernel uid/gid map:" t=2018-10-15T14:38:59+0000
lvl=info msg=" - u 0 0 4294967295" t=2018-10-15T14:38:59+0000
lvl=info msg=" - g 0 0 4294967295" t=2018-10-15T14:38:59+0000
lvl=info msg="Configured LXD uid/gid map:" t=2018-10-15T14:38:59+0000
lvl=info msg=" - u 0 1000000 1000000000" t=2018-10-15T14:38:59+0000
lvl=info msg=" - g 0 1000000 1000000000" t=2018-10-15T14:38:59+0000
lvl=info msg="Initializing local database" t=2018-10-15T14:38:59+0000
lvl=info msg="Initializing database gateway" t=2018-10-15T14:38:59+0000
address= id=1 lvl=info msg="Start database node" t=2018-10-15T14:38:59+0000
@MarkGavalda any chance you can send a tarball of /var/snap/lxd/common/lxd/database to [email protected]?
It worked for me:
sudo snap switch --channel=candidate lxd
sudo snap refresh
What should I do when the fix is incorporated to the stable channel?
@melato It is now in the stable channel, so you can run snap refresh --channel=stable lxd which should switch you back.
Note that the fix for this will not cause the huge logs.db to shrink back to a useful size, we're looking at a separate change that'd allow that particular database to get re-generated from a newly created snapshot, which will then allow us to resize all those databases down to a more reasonable size.
For anyone following this, the fix is in stable now. If you were on candidate temporarily, refresh to stable with snap refresh lxd --stable.
Is there an open issue for the huge logs.db? Mine is 2GB.
https://github.com/lxc/lxd/issues/5177 filed now
Most helpful comment
Note that the fix for this will not cause the huge logs.db to shrink back to a useful size, we're looking at a separate change that'd allow that particular database to get re-generated from a newly created snapshot, which will then allow us to resize all those databases down to a more reasonable size.