Lxd: Snap v3.6 candidate bug

Created on 11 Oct 2018  路  9Comments  路  Source: lxc/lxd

After automatically upgrading to v3.6 in the "candidate" snap channel multiple machines are affected by this bug:

Oct 11 05:51:24 test1 lxd.daemon[63988]: => Killing conflicting LXD (pid=58025)
Oct 11 05:51:24 test1 lxd.daemon[63988]: => Starting LXD
Oct 11 05:51:24 test1 lxd.daemon[63988]: lxd: src/vfs.c:1409: dqlite__vfs_open: Assertion `filename != NULL' failed.
Oct 11 05:51:24 test1 lxd.daemon[63988]: SIGABRT: abort
Oct 11 05:51:24 test1 lxd.daemon[63988]: PC=0x7fee28ade428 m=13 sigcode=18446744073709551610
Oct 11 05:51:24 test1 lxd.daemon[63988]: signal arrived during cgo execution
Oct 11 05:51:24 test1 lxd.daemon[63988]: goroutine 38 [syscall, locked to thread]:
Oct 11 05:51:24 test1 lxd.daemon[63988]: runtime.cgocall(0xf25c80, 0xc000ac1f20, 0x0)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/go/build/src/runtime/cgocall.go:128 +0x5e fp=0xc000ac1ef0 sp=0xc000ac1eb8 pc=0x40e7ee
Oct 11 05:51:24 test1 lxd.daemon[63988]: github.com/CanonicalLtd/go-dqlite/internal/bindings._Cfunc_dqlite_server_run(0x7fee14000d90, 0x7fee00000000)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         _cgo_gotypes.go:435 +0x49 fp=0xc000ac1f20 sp=0xc000ac1ef0 pc=0x7dae09
Oct 11 05:51:24 test1 lxd.daemon[63988]: github.com/CanonicalLtd/go-dqlite/internal/bindings.(*Server).Run.func1(0x7fee14000d90, 0xc000909f90)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/lxd/go/src/github.com/CanonicalLtd/go-dqlite/internal/bindings/server.go:137 +0x56 fp=0xc000ac1f58 sp=0xc000ac1f20 pc=0x7e1386
Oct 11 05:51:24 test1 lxd.daemon[63988]: github.com/CanonicalLtd/go-dqlite/internal/bindings.(*Server).Run(0x7fee14000d90, 0x11de758, 0x0)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/lxd/go/src/github.com/CanonicalLtd/go-dqlite/internal/bindings/server.go:137 +0x2f fp=0xc000ac1fa0 sp=0xc000ac1f58 pc=0x7de96f
Oct 11 05:51:24 test1 lxd.daemon[63988]: github.com/CanonicalLtd/go-dqlite.(*Server).run(0xc0001520a0)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/lxd/go/src/github.com/CanonicalLtd/go-dqlite/server.go:131 +0x54 fp=0xc000ac1fd8 sp=0xc000ac1fa0 pc=0x928244
Oct 11 05:51:24 test1 lxd.daemon[63988]: runtime.goexit()
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/go/build/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc000ac1fe0 sp=0xc000ac1fd8 pc=0x467e01
Oct 11 05:51:24 test1 lxd.daemon[63988]: created by github.com/CanonicalLtd/go-dqlite.NewServer
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/lxd/go/src/github.com/CanonicalLtd/go-dqlite/server.go:88 +0x29c
Oct 11 05:51:24 test1 lxd.daemon[63988]: goroutine 1 [IO wait]:
Oct 11 05:51:24 test1 lxd.daemon[63988]: internal/poll.runtime_pollWait(0x7fee29dcfbc0, 0x72, 0xc000ace800)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/go/build/src/runtime/netpoll.go:173 +0x66
Oct 11 05:51:24 test1 lxd.daemon[63988]: internal/poll.(*pollDesc).wait(0xc000199118, 0x72, 0xffffffffffffff00, 0x131fa60, 0x1c89958)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/go/build/src/internal/poll/fd_poll_runtime.go:85 +0x9a
Oct 11 05:51:24 test1 lxd.daemon[63988]: internal/poll.(*pollDesc).waitRead(0xc000199118, 0xc000038500, 0x8, 0x8)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/go/build/src/internal/poll/fd_poll_runtime.go:90 +0x3d
Oct 11 05:51:24 test1 lxd.daemon[63988]: internal/poll.(*FD).Read(0xc000199100, 0xc000038570, 0x8, 0x8, 0x0, 0x0, 0x0)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/go/build/src/internal/poll/fd_unix.go:169 +0x179
Oct 11 05:51:24 test1 lxd.daemon[63988]: net.(*netFD).Read(0xc000199100, 0xc000038570, 0x8, 0x8, 0xc000ace940, 0x5a08cf, 0xc000199100)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/go/build/src/net/fd_unix.go:202 +0x4f
Oct 11 05:51:24 test1 lxd.daemon[63988]: net.(*conn).Read(0xc0001fe608, 0xc000038570, 0x8, 0x8, 0x0, 0x0, 0x0)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/go/build/src/net/net.go:177 +0x68
Oct 11 05:51:24 test1 lxd.daemon[63988]: github.com/CanonicalLtd/go-dqlite/internal/client.(*Client).recvFill(0xc000ba07d0, 0xc000038570, 0x8, 0x8, 0x0, 0x0, 0xc000ace9f0)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/lxd/go/src/github.com/CanonicalLtd/go-dqlite/internal/client/client.go:246 +0x74
Oct 11 05:51:24 test1 lxd.daemon[63988]: github.com/CanonicalLtd/go-dqlite/internal/client.(*Client).recvPeek(0xc000ba07d0, 0xc000038570, 0x8, 0x8, 0x0, 0x80)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/lxd/go/src/github.com/CanonicalLtd/go-dqlite/internal/client/client.go:230 +0x9c
Oct 11 05:51:24 test1 lxd.daemon[63988]: github.com/CanonicalLtd/go-dqlite/internal/client.(*Client).recvHeader(0xc000ba07d0, 0xc000bac230, 0x0, 0x0)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/lxd/go/src/github.com/CanonicalLtd/go-dqlite/internal/client/client.go:186 +0x4f
Oct 11 05:51:24 test1 lxd.daemon[63988]: github.com/CanonicalLtd/go-dqlite/internal/client.(*Client).recv(0xc000ba07d0, 0xc000bac230, 0x0, 0x0)
Oct 11 05:51:24 test1 lxd.daemon[63988]:         /build/lxd/parts/lxd/go/src/github.com/CanonicalLtd/go-dqlite/internal/client/client.go:174 +0x39
Oct 11 05:51:24 test1 lxd.daemon[63988]: github.com/CanonicalLtd/go-dqlite/internal/client.(*Client).Call(0xc000ba07d0, 0x132b880, 0xc0000380a8, 0xc000bac1d0, 0xc000bac230, 0x0, 0x0)
Incomplete

Most helpful comment

And fix in edge and candidate now, a refresh should get you back online.

All 9 comments

What version are you upgrading from?

Could you please run (as root):

systemctl stop snap.lxd.daemon
systemctl stop snap.lxd.daemon.unix.socket
rm /var/snap/lxd/common/lxd/unix.socket

and the paste the output of:

LXD_DIR=/var/snap/lxd/common/lxd/ LD_LIBRARY_PATH=/snap/lxd/current/lib /snap/lxd/current/bin/lxd --debug

?

$ sudo systemctl stop snap.lxd.daemon
$ sudo systemctl stop snap.lxd.daemon.unix.socket
$ sudo rm /var/snap/lxd/common/lxd/unix.socket
$ LXD_DIR=/var/snap/lxd/common/lxd/ LD_LIBRARY_PATH=/snap/lxd/current/lib /snap/lxd/current/bin/lxd --debug
/snap/lxd/current/bin/lxd: error while loading shared libraries: libuv.so.1: cannot open shared object file: No such file or directory

Actually, just stopping the units above and running lxd --debug --group lxd should be enough to capture the full trace.

Note that I did get hit by this issue myself, so we at least do have a system we can poke at and prepare a fix but having a few more full traces would let us confirm we're dealing with the same thing.

Excellent, lines up with what I have here, except that you're going through two DB update while I'm only going through one.

@freeekanayaka has a copy of my database and can reproduce this issue on his system now, so hopefully won't be long until we have a fix

Looks like we have a fix, I'm refreshing edge and candidate now, should be fixed in the next couple of hours

And fix in edge and candidate now, a refresh should get you back online.

Can confirm, back online

Was this page helpful?
0 / 5 - 0 ratings