snap have updated automatically to 3.4, and nothing works anymore.
Distribution: ubuntu
The output of "lxc info" or if that fails:
/snap/bin/lxc info
Error: Failed to fetch http://unix.socket/1.0: 500 Internal Server Error
Kernel version: 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:55:48 UTC 2018 i686 i686 i686 GNU/Linux
LXD version:
snap list
Name Version Rev Tracking Publisher Notes
core 16-2.34.3 5146 stable canonical core
lxd 3.4 8385 stable canonical -
Storage backend in use: btrfs, compressed with lzo
snap have updated lxd to 3.4. Since then, nothing works any more.
dmesg:
[255820.008382] audit: type=1400 audit(1534561683.200:135): apparmor="DENIED" operation="open" profile="snap.lxd.hook.configure" name="/var/lib/snapd/hostfs/usr/lib/os-release" pid=8759 comm="snap-exec" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[255820.071777] audit: type=1400 audit(1534561683.260:136): apparmor="DENIED" operation="open" profile="snap.lxd.hook.configure" name="/var/lib/snapd/hostfs/usr/lib/os-release" pid=8768 comm="snapctl" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[255820.133003] audit: type=1400 audit(1534561683.324:137): apparmor="DENIED" operation="open" profile="snap.lxd.hook.configure" name="/var/lib/snapd/hostfs/usr/lib/os-release" pid=8777 comm="snapctl" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[255820.197571] audit: type=1400 audit(1534561683.388:138): apparmor="DENIED" operation="open" profile="snap.lxd.hook.configure" name="/var/lib/snapd/hostfs/usr/lib/os-release" pid=8783 comm="snapctl" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[255820.256379] audit: type=1400 audit(1534561683.448:139): apparmor="DENIED" operation="open" profile="snap.lxd.hook.configure" name="/var/lib/snapd/hostfs/usr/lib/os-release" pid=8792 comm="snapctl" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[255820.313201] audit: type=1400 audit(1534561683.504:140): apparmor="DENIED" operation="open" profile="snap.lxd.hook.configure" name="/var/lib/snapd/hostfs/usr/lib/os-release" pid=8800 comm="snapctl" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[264675.302793] audit: type=1400 audit(1534570538.492:141): apparmor="DENIED" operation="capable" profile="snap.lxd.lxc" pid=12601 comm="lxc" capability=2 capname="dac_read_search"
[264675.302809] audit: type=1400 audit(1534570538.492:142): apparmor="DENIED" operation="capable" profile="snap.lxd.lxc" pid=12601 comm="lxc" capability=1 capname="dac_override"
[264675.309940] audit: type=1400 audit(1534570538.500:143): apparmor="DENIED" operation="capable" profile="snap.lxd.lxc" pid=12644 comm="cat" capability=2 capname="dac_read_search"
[264675.309959] audit: type=1400 audit(1534570538.500:144): apparmor="DENIED" operation="capable" profile="snap.lxd.lxc" pid=12644 comm="cat" capability=1 capname="dac_override"
lxc monitor while reproducing the issue)^
This is a duplicate of #4944.
There's a good chance that the issue was not actually fixed as it supposed to be. @stgraber any way we can confirm that @gaetanquentin's snap revision was built with dqlite master 515e6f7 or later? I believe that's the case, but want to double check.
@freeekanayaka yeah, 8385 has those commits.
@gaetanquentin can you snap refresh lxd? That should get you up to 8393. I don't think it'll help with this particular problem, but should take care of any further leftover LXD processes.
Assuming the issue still happens then, can you make a tarball of /var/snap/lxd/common/lxd/database and send that to stgraber at ubuntu dot com?
@stgraber
root@nas:/var/snap/lxd# snap refresh lxd
le snap "lxd" n'a aucune mise à jour disponible
root@nas:/var/snap/lxd# ls -la
total 4
drwxr-xr-x 1 root root 50 août 18 05:07 .
drwxr-xr-x 1 root root 14 févr. 12 2018 ..
drwxr-xr-x 1 root root 6 févr. 14 2018 8298
drwxr-xr-x 1 root root 6 févr. 14 2018 8351
drwxr-xr-x 1 root root 6 févr. 14 2018 8385
drwxr-xr-x 1 root root 92 août 18 08:34 common
lrwxrwxrwx 1 root root 4 août 18 05:07 current -> 8385
Ah, 8385 is current but i386, that's why the revs didn't line up.
@gaetanquentin can you provide the content of /var/snap/lxd/common/lxd/logs/lxd.log?
Ideally, starting at when the daemon starts.
And assuming that just restarting the daemon doesn't fix this, it'd be useful if you could do:
Which would then make the logs much more verbose. If you do that and still run into the issue, then again, please provide lxd.log starting at the point where LXD started and until you start seeing the repetitive log entries.
@gaetanquentin ah and a full ps fauxww prior to restarting the daemon (while it still gives you the 500 errors) would be nice too.
strangely now it works again . always with the same version. (8385)
lxd.log were still attached in this thread, with lxd-strace.zip, a strace of lxd
i won't stop it again for now since i need the containers running.
@gaetanquentin sorry I missed the tarballs, okay, so the beginning of it looks like:
lvl=info msg="Updating images" t=2018-08-18T08:35:12+0200
lvl=warn msg="Failed to rollback transaction after error (driver: bad connection): unexpected response type 7" t=2018-08-18T08:35:16+0200
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-08-18T08:35:17+0200
lvl=warn msg="Failed to rollback transaction after error (Failed to fetch nodes: unexpected response type 6): unexpected response type 7" t=2018-08-18T08:35:17+0200
lvl=warn msg="Failed to get current cluster nodes: Failed to fetch nodes: unexpected response type 6" t=2018-08-18T08:35:17+0200
lvl=info msg="Done updating instance types" t=2018-08-18T08:35:17+0200
lvl=warn msg="Failed to rollback transaction after error (unexpected response type 6): unexpected response type 7" t=2018-08-18T08:35:18+0200
err="unexpected response type 6" lvl=eror msg="Failed to expire logs" t=2018-08-18T08:35:18+0200
lvl=info msg="Done expiring log files" t=2018-08-18T08:35:18+0200
lvl=warn msg="Failed to rollback transaction after error (unexpected response type 6): unexpected response type 7" t=2018-08-18T08:35:18+0200
err="unexpected response type 6" fp=4d574fd7ecf8d7f758cb212d92e8fef12876a8c0f03e5048cabbf179482919c7 lvl=eror msg="Error loading image" t=2018-08-18T08:35:18+0200
lvl=info msg="Done updating images" t=2018-08-18T08:35:18+0200
@freeekanayaka
@stgraber did you receive the database content on your email?
@gaetanquentin yes he did and he forwarded it to me. I could not reproduce the issue so far. Did it happen again btw? If so, it'd be great if you can attach the log again, since it might be that the failure mode shown in the first logs you sent is now fixed, but there's another failure mode which we still have to catch.
@freeekanayaka for the moment, containers are working and lxd commands run fast. Have you tried on a slow i386 archi?
sometimes there are errors:
lvl=dbug msg="Database error: &errors.errorString{s:\"sql: no rows in result set\"}" t=2018-08-20T09:31:34+0200
i am trying to stop a container, it does nothing, command is still running. in logs there is that:
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:08:51+0200 url=/1.0/operations/863c56f9-4b12-4a47-bddc-969f1366c069
action=shutdown created=2017-03-14T05:47:59+0100 ephemeral=false lvl=info msg="Shutting down container" name=showfinders t=2018-08-20T14:08:51+0200 timeout=-1s used=2018-08-19T15:31:55+0200
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:12:05+0200 url=/1.0
with stop -f , it works
snap stop lxd, worked too
snap start lxd launched the lxd daemon
but after:
lxc list
Error: Failed to fetch http://unix.socket/1.0: 500 Internal Server Error
i had to wait a minute or more for the lxd answer commands again. but it very slow again, unusable.
in logs:
lvl=dbug msg="Database error: &errors.errorString{s:\"cannot fetch node config from database: unexpected response type 6\"}" t=2018-08-20T14:17:36+0200
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:17:36+0200 url=/1.0
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:36+0200
lvl=dbug msg="Database error: &errors.errorString{s:\"cannot fetch node config from database: unexpected response type 6\"}" t=2018-08-20T14:17:36+0200
lvl=warn msg="Failed to rollback transaction after error (Failed to fetch nodes: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:37+0200
lvl=dbug msg="Database error: Failed to fetch nodes: unexpected response type 6" t=2018-08-20T14:17:37+0200
lvl=warn msg="Failed to get current cluster nodes: Failed to fetch nodes: unexpected response type 6" t=2018-08-20T14:17:37+0200
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:37+0200
lvl=dbug msg="Database error: &errors.errorString{s:\"cannot fetch node config from database: unexpected response type 6\"}" t=2018-08-20T14:17:37+0200
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:17:37+0200 url=/1.0
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:38+0200
lvl=dbug msg="Database error: &errors.errorString{s:\"cannot fetch node config from database: unexpected response type 6\"}" t=2018-08-20T14:17:38+0200
lvl=warn msg="Failed to rollback transaction after error (Failed to fetch nodes: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:38+0200
lvl=dbug msg="Database error: Failed to fetch nodes: unexpected response type 6" t=2018-08-20T14:17:38+0200
lvl=warn msg="Failed to get current cluster nodes: Failed to fetch nodes: unexpected response type 6" t=2018-08-20T14:17:38+0200
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:39+0200
lvl=dbug msg="Database error: &errors.errorString{s:\"cannot fetch node config from database: unexpected response type 6\"}" t=2018-08-20T14:17:39+0200
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:17:39+0200 url=/1.0
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:39+0200
lvl=dbug msg="Database error: &errors.errorString{s:\"cannot fetch node config from database: unexpected response type 6\"}" t=2018-08-20T14:17:39+0200
lvl=warn msg="Failed to rollback transaction after error (Failed to fetch nodes: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:40+0200
lvl=dbug msg="Database error: Failed to fetch nodes: unexpected response type 6" t=2018-08-20T14:17:40+0200
lvl=warn msg="Failed to get current cluster nodes: Failed to fetch nodes: unexpected response type 6" t=2018-08-20T14:17:40+0200
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:40+0200
lvl=dbug msg="Database error: &errors.errorString{s:\"cannot fetch node config from database: unexpected response type 6\"}" t=2018-08-20T14:17:40+0200
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:17:40+0200 url=/1.0
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:41+0200
lvl=dbug msg="Database error: &errors.errorString{s:\"cannot fetch node config from database: unexpected response type 6\"}" t=2018-08-20T14:17:41+0200
lvl=warn msg="Failed to rollback transaction after error (Failed to fetch nodes: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:41+0200
lvl=dbug msg="Database error: Failed to fetch nodes: unexpected response type 6" t=2018-08-20T14:17:41+0200
lvl=warn msg="Failed to get current cluster nodes: Failed to fetch nodes: unexpected response type 6" t=2018-08-20T14:17:41+0200
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:42+0200
lvl=dbug msg="Database error: &errors.errorString{s:\"cannot fetch node config from database: unexpected response type 6\"}" t=2018-08-20T14:17:42+0200
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:17:42+0200 url=/1.0
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:42+0200
lvl=dbug msg="Database error: &errors.errorString{s:\"cannot fetch node config from database: unexpected response type 6\"}" t=2018-08-20T14:17:42+0200
lvl=warn msg="Failed to rollback transaction after error (Failed to fetch nodes: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:43+0200
lvl=dbug msg="Database error: Failed to fetch nodes: unexpected response type 6" t=2018-08-20T14:17:43+0200
lvl=warn msg="Failed to get current cluster nodes: Failed to fetch nodes: unexpected response type 6" t=2018-08-20T14:17:43+0200
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-08-20T14:17:43+0200
lvl=dbug msg="Database error: &errors.errorString{s:\"cannot fetch node config from database: unexpected response type 6\"}" t=2018-08-20T14:17:43+0200
[ ...]
vl=info msg="Dqlite: connected address=0 attempt=0" t=2018-08-20T14:18:25+0200
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:18:26+0200 url=/1.0
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:18:27+0200 url=/internal/ready
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:18:48+0200 url=/1.0
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:18:50+0200 url="/1.0/containers?recursion=2"
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:20:13+0200 url=/1.0
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:20:15+0200 url="/1.0/containers?recursion=2"
lvl=info msg="Raft: Starting snapshot up to 170473" t=2018-08-20T14:20:29+0200
lvl=info msg="Raft: Compacting logs from 168041 to 169964" t=2018-08-20T14:20:29+0200
lvl=info msg="Raft: Snapshot to 170473 complete" t=2018-08-20T14:20:30+0200
lvl=info msg="Dqlite: closing client" t=2018-08-20T14:20:36+0200
lvl=info msg="Dqlite: handling new connection (fd=28)" t=2018-08-20T14:20:36+0200
lvl=info msg="Dqlite: connected address=0 attempt=0" t=2018-08-20T14:20:36+0200
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:21:32+0200 url=/1.0
ip=@ lvl=dbug method=GET msg=handling t=2018-08-20T14:21:34+0200 url="/1.0/containers?recursion=2"
i think he is doing things at launch time, very time/io consuming. On slow systems, it make it hangs.
didn't you seen something on the strace zip?
There is a lot of io:
Total DISK READ : 0.00 B/s | Total DISK WRITE : 293.53 K/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 533.69 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
11892 be/4 root 0.00 B/s 220.15 K/s 0.00 % 34.46 % lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd --debug
11869 be/4 root 0.00 B/s 73.38 K/s 0.00 % 10.54 % lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd --debug
now lxc list report this:
lxc list
+----------------+-------+------+------+------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+----------------+-------+------+------+------------+-----------+
| xxx | ERROR | | | PERSISTENT | |
+----------------+-------+------+------+------------+-----------+
| yyy | ERROR | | | PERSISTENT | |
+----------------+-------+------+------+------------+-----------+
@gaetanquentin could you please attach again the content of /var/snap/lxd/common/lxd/logs/lxd.log ? and possibly of the last rotated logs in the same directory
I have a similar problem. Rebooting the machine does not fix it.
Command:
LD_LIBRARY_PATH=/snap/lxd/current/lib/ LXD_DIR=/var/snap/lxd/common/lxd/ /snap/lxd/current/bin/lxd --verbose --debug --group lxd
Output:
INFO[08-20|23:04:09] LXD 3.4 is starting in normal mode path=/var/snap/lxd/common/lxd
INFO[08-20|23:04:09] Kernel uid/gid map:
INFO[08-20|23:04:09] - u 0 0 4294967295
INFO[08-20|23:04:09] - g 0 0 4294967295
INFO[08-20|23:04:09] Configured LXD uid/gid map:
INFO[08-20|23:04:09] - u 0 165536 65536
INFO[08-20|23:04:09] - g 0 165536 65536
WARN[08-20|23:04:09] CGroup memory swap accounting is disabled, swap limits will be ignored.
INFO[08-20|23:04:09] Initializing local database
INFO[08-20|23:04:09] Initializing database gateway
INFO[08-20|23:04:09] Start database node address= id=1
INFO[08-20|23:04:10] Raft: Restored from snapshot 1-9086-1534764785477
INFO[08-20|23:04:10] Raft: Initial configuration (index=1): [{Suffrage:Voter ID:1 Address:0}]
INFO[08-20|23:04:10] Raft: Node at 0 [Leader] entering Leader state
INFO[08-20|23:04:10] Dqlite: starting event loop
DBUG[08-20|23:04:10] Dqlite: accepting connections
INFO[08-20|23:04:10] LXD isn't socket activated
INFO[08-20|23:04:10] Starting /dev/lxd handler:
INFO[08-20|23:04:10] - binding devlxd socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[08-20|23:04:10] REST API daemon:
INFO[08-20|23:04:10] - binding Unix socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[08-20|23:04:10] Initializing global database
INFO[08-20|23:04:10] Dqlite: handling new connection (fd=18)
INFO[08-20|23:04:10] Dqlite: connected address=0 attempt=0
INFO[08-20|23:04:11] Initializing storage pools
DBUG[08-20|23:04:11] Initializing and checking storage pool "default"
DBUG[08-20|23:04:12] Checking BTRFS storage pool "default"
INFO[08-20|23:04:12] Initializing networks
INFO[08-20|23:04:15] Dqlite: closing client
INFO[08-20|23:04:15] Dqlite: handling new connection (fd=20)
INFO[08-20|23:04:16] Dqlite: connected address=0 attempt=0
INFO[08-20|23:04:22] Pruning leftover image files
INFO[08-20|23:04:22] Done pruning leftover image files
INFO[08-20|23:04:22] Loading configuration
DBUG[08-20|23:04:22] Initialized inotify with file descriptor 17
INFO[08-20|23:04:30] Connected to MAAS controller
INFO[08-20|23:04:30] Pruning expired images
INFO[08-20|23:04:31] Done pruning expired images
INFO[08-20|23:04:31] Expiring log files
INFO[08-20|23:04:31] Updating instance types
INFO[08-20|23:04:32] Done expiring log files
INFO[08-20|23:04:41] Done updating instance types
WARN[08-20|23:04:58] Failed to rollback transaction after error (Failed to fetch nodes: driver: bad connection): unexpected response type 7
DBUG[08-20|23:04:58] Database error: Failed to fetch nodes: driver: bad connection
DBUG[08-20|23:04:58] Retry failed db interaction (Failed to fetch nodes: driver: bad connection)
WARN[08-20|23:04:59] Failed to rollback transaction after error (Failed to fetch nodes: unexpected response type 6): unexpected response type 7
DBUG[08-20|23:04:59] Database error: Failed to fetch nodes: unexpected response type 6
I also tried to replace the database with /var/snap/lxd/common/lxd/database/global /var/snap/lxd/common/lxd/database/global.bak temporarily. It lets me use lxc but the paths have changed since the bak was taken. I can't start containers anyway. It seems the problem is around the global database.
FYI, it hangs again:
lxc list
Error: Failed to fetch http://unix.socket/1.0: 500 Internal Server Error
I've been experiencing the same error, which I've been able to reproduce on several occasions by running watch lxc list for a period of several hours.
Just a quick update, we have a reproducer that lets me trigger this within seconds on a test system.
It appears to be tied to I/O performance against the LXD database, which explains why it's intermittent and why most of our users are immune (thanks to SSDs I guess).
And we have what looks like a potential fix, expecting it to release in the next few hours.
The fix is now available in the stable channel, please all run snap refresh lxd.
works fine, thanks.
It works for me too. Thanks
@stgraber unfortunately doesn't work for me. I'm on lxd 3.5 8774 stable.
[..]
Sep 23 14:03:26 [HOST] lxd.daemon[12593]: lvl=warn msg="Failed to rollback transaction after error (Failed to fetch nodes: unexpected response
type 6): unexpected response type 7" t=2018-09-23T14:03:26+0200
Sep 23 14:03:26 [HOST] lxd.daemon[12593]: lvl=warn msg="Failed to get current cluster nodes: Failed to fetch nodes: unexpected response type 6"
t=2018-09-23T14:03:26+0200
Sep 23 14:03:27 [HOST] lxd.daemon[12593]: lvl=warn msg="Failed to rollback transaction after error (Failed to fetch nodes: unexpected response
type 6): unexpected response type 7" t=2018-09-23T14:03:27+0200
Sep 23 14:03:27 [HOST] lxd.daemon[12593]: lvl=warn msg="Failed to get current cluster nodes: Failed to fetch nodes: unexpected response type 6"
t=2018-09-23T14:03:27+0200
Sep 23 14:03:28 [HOST] lxd.daemon[12593]: lvl=warn msg="Failed to rollback transaction after error (Failed to fetch nodes: unexpected response
type 6): unexpected response type 7" t=2018-09-23T14:03:28+0200
Sep 23 14:03:28 [HOST] lxd.daemon[12593]: lvl=warn msg="Failed to get current cluster nodes: Failed to fetch nodes: unexpected response type 6"
t=2018-09-23T14:03:28+0200
[..]
Seeing this as well on
lxd 3.5 9010 stable canonical✓ -
~~~~
lvl=warn msg="Failed to get current cluster nodes: Failed to fetch nodes: unexpected response type 6" t=2018-10-06T16:09:34+0200
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-10-06T16:09:35+0200
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-10-06T16:09:35+0200
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-10-06T16:09:35+0200
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-10-06T16:09:35+0200
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-10-06T16:09:35+0200
lvl=warn msg="Failed to rollback transaction after error (cannot fetch node config from database: unexpected response type 6): unexpected response type 7" t=2018-10-06T16:09:35+0200
~~~~
@sabaini if you're still stuck in this state, try systemctl reload snap.lxd.daemon that will hopefully unstick you.
@freeekanayaka we've seen a few reports of this happening (have seen it on some of our machines), any ideas?
I think this bug should be re-opened.
I've been seeing this occasionally, and just had a report of it on a jenkins slave, both running 3.6.
I can semi-consistently trigger it on my laptop, while I've been writing/testing integration tests that use pylxd to create and destroy containers. While doing so, I'm running a session with watch lxc list. If I do that long enough, I will hit the error and have to restart the service.
Are there any logs I can capture that'll help isolate this?
We're also still seeing this on Bionic, is the fix backported to the deb version available on Bionic? We are deploying with juju so we do not necessarily have the luxury of choosing to use the snap version of LXD.
3.0.3 currently in proposed has the updated dqlite, so hopefully that'll do the trick
The same here on Xenial:
ii lxd 3.0.2-0ubuntu1~16.04.1 amd64 Container hypervisor based on LXC - daemon
same here.. bionic.
Anyone hitting this, please make sure you're on 3.0.3 or higher, then if it keeps happening, we can look into it.
Got the same issue today on my BETA machine, but not on my PROD machine fortunately.
The BETA machine is much slower than PROD, in case this information might be useful.
the snap refresh lxd did fix the lxc listissue => unexpected response type 6
however I noticed there were the last 2 containers missing (the last 2 I've recently created, a couple of days ago). The list shows up 8 of 10 containers I expected.
Looks very weird to me.
By the way, the 2 hidden containers were still running in the background (ping worked).
However I could not enter in (lxc exec), probably because they were still hidden (not listed).
So, I had the idea to restart lxd, sudo systemctl restart snap.lxd.daemon.service. It seems OK apparently. Unfortunately, the last 2 containers don't exist anymore, no more ping. It looks like I've definitely erased these 2 (hidden) containers. Wow, frustrating.
These containers were 'staging' containers, so it's not a big deal. But I wonder what if it would happen on production containers. Feeling a bit scared.
@pvincent974b what version of LXD was this machine running? The 3.0.x stable series or the 3.x development one?
@freeekanayaka
The 3.0.x stable series, I guess so.
Months ago, I've installed with this command snap install lxd.
Quite sure, I did not try a development branch.
Good news, I re-create the 2 containers by using the same names.
Noticed a message of type 'Error: Common start logic: The container is already running'
Then, exec -- bash into these containers, and all the previous data were still there.
Cool.
hum, weeks ago, I have started playing with CRIU.
I know it is still experimental. That may explained why the last 2 containers (CRIU enabled) I created were missing after upgrade.
Hope it helps
@sabaini if you're still stuck in this state, try
systemctl reload snap.lxd.daemonthat will hopefully unstick you.@freeekanayaka we've seen a few reports of this happening (have seen it on some of our machines), any ideas?
I have been getting a similar error. Any calls to lxc binary were just failing.
lxc --debug info
DBUG[09-05|10:21:34] Connecting to a local LXD over a Unix socket
DBUG[09-05|10:21:34] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
Error: failed to begin transaction: exec
t=2019-09-04T11:49:37+0000 lvl=warn msg="Failed to get current cluster nodes: Failed to fetch nodes: driver: bad connection"
t=2019-09-04T11:49:38+0000 lvl=warn msg="Failed to get current cluster nodes: Failed to fetch nodes: driver: bad connection"
t=2019-09-04T11:49:39+0000 lvl=warn msg="Failed to get current cluster nodes: Failed to fetch nodes: driver: bad connection"
t=2019-09-04T11:49:40+0000 lvl=warn msg="Failed to get current cluster nodes: Failed to fetch nodes: driver: bad connection"
t=2019-09-04T11:49:40+0000 lvl=warn msg="Failed to get current cluster nodes: failed to begin transaction: exec"
t=2019-09-04T11:49:41+0000 lvl=warn msg="Failed to get current cluster nodes: failed to begin transaction: exec"
t=2019-09-04T11:49:42+0000 lvl=warn msg="Failed to get current cluster nodes: failed to begin transaction: exec"
t=2019-09-04T11:49:43+0000 lvl=warn msg="Failed to get current cluster nodes: failed to begin transaction: exec"
t=2019-09-04T11:49:43+0000 lvl=eror msg="Failed to load containers for scheduled snapshots" err="failed to begin transaction: exec"
t=2019-09-04T11:49:43+0000 lvl=eror msg="Failed to load containers for snapshot expiry" err="failed to begin transaction: exec"
The containers in the cluster, however, were just fine with the web-services running on them responding to the external calls and pings.
The command by @stgraber fixed this issue.
systemctl reload snap.lxd.daemon