snapd is restarting lxd, probably when installing a new version. Yesterday it upgraded it from v3 to v4, and lxd hung for many hours while snapd was spinning on a core. lxd itself was quiesced and unresponsive to API requests.
I rebooted and it seemed to cure it. Today snapd seems to have restarted it overnight and the same thing has happened again. If I kill the lxd process it will restart and hang again. The last thing in the lxd log is "Initializing global database".
As above.
t=2020-05-27T08:17:09+1000 lvl=info msg="LXD 4.1 is starting in normal mode" path=/var/snap/lxd/common/lxd
t=2020-05-27T08:17:09+1000 lvl=info msg="Kernel uid/gid map:"
t=2020-05-27T08:17:09+1000 lvl=info msg=" - u 0 0 4294967295"
t=2020-05-27T08:17:09+1000 lvl=info msg=" - g 0 0 4294967295"
t=2020-05-27T08:17:09+1000 lvl=info msg="Configured LXD uid/gid map:"
t=2020-05-27T08:17:09+1000 lvl=info msg=" - u 0 1000000 1000000000"
t=2020-05-27T08:17:09+1000 lvl=info msg=" - g 0 1000000 1000000000"
t=2020-05-27T08:17:09+1000 lvl=info msg="Kernel features:"
t=2020-05-27T08:17:09+1000 lvl=info msg=" - netnsid-based network retrieval: yes"
t=2020-05-27T08:17:09+1000 lvl=info msg=" - uevent injection: yes"
t=2020-05-27T08:17:09+1000 lvl=info msg=" - seccomp listener: yes"
t=2020-05-27T08:17:09+1000 lvl=info msg=" - seccomp listener continue syscalls: yes"
t=2020-05-27T08:17:09+1000 lvl=info msg=" - unprivileged file capabilities: yes"
t=2020-05-27T08:17:09+1000 lvl=info msg=" - cgroup layout: hybrid"
t=2020-05-27T08:17:09+1000 lvl=warn msg=" - Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored"
t=2020-05-27T08:17:09+1000 lvl=warn msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
t=2020-05-27T08:17:09+1000 lvl=info msg=" - shiftfs support: disabled"
t=2020-05-27T08:17:09+1000 lvl=info msg="Initializing local database"
t=2020-05-27T08:17:10+1000 lvl=info msg="Starting /dev/lxd handler:"
t=2020-05-27T08:17:10+1000 lvl=info msg=" - binding devlxd socket" socket=/var/snap/lxd/common/lxd/devlxd/sock
t=2020-05-27T08:17:10+1000 lvl=info msg="REST API daemon:"
t=2020-05-27T08:17:10+1000 lvl=info msg=" - binding Unix socket" inherited=true socket=/var/snap/lxd/common/lxd/unix.socket
t=2020-05-27T08:17:10+1000 lvl=info msg=" - binding TCP socket" socket=[::]:8443
t=2020-05-27T08:17:10+1000 lvl=info msg="Initializing global database"
@freeekanayaka
@bigjools so to unstick things you'll want to:
 * pkill -9 lxd (that usually motivates it)
 * lxd --debug --group lxd to see if you get more details when it hangs
In most cases, it's stuck on a corrupted or otherwise weird last segment; identifying it and removing it usually fixes things.
Is there going to be any private info in the database? I don't care about IP addresses, just stuff from inside the containers. Cheers.
root@jeeves:/var/snap/lxd/common/lxd/database/global# ls -l
total 119852
-rw------- 1 root root 909312 Jan 26 08:26 0000000000027273-0000000000027362
-rw------- 1 root root 599304 Jan 27 04:31 0000000000027363-0000000000027421
-rw------- 1 root root 268680 Jan 27 10:36 0000000000027422-0000000000027448
-rw------- 1 root root 764616 Jan 28 11:36 0000000000027449-0000000000027523
-rw------- 1 root root 4484184 Feb 4 09:51 0000000000027524-0000000000027959
-rw------- 1 root root 1864032 Feb 7 05:01 0000000000027960-0000000000028142
-rw------- 1 root root 7331640 Feb 18 11:52 0000000000028143-0000000000028853
-rw------- 1 root root 830832 Feb 19 15:47 0000000000028854-0000000000028936
-rw------- 1 root root 1198560 Feb 21 10:18 0000000000028937-0000000000029053
-rw------- 1 root root 574464 Feb 22 05:07 0000000000029054-0000000000029109
-rw------- 1 root root 1756488 Feb 24 21:02 0000000000029110-0000000000029280
-rw------- 1 root root 5387688 Mar 2 21:38 0000000000029281-0000000000029782
-rw------- 1 root root 380064 Mar 2 21:48 0000000000029783-0000000000029817
-rw------- 1 root root 7322688 Mar 13 10:52 0000000000029818-0000000000030518
-rw------- 1 root root 586416 Mar 14 03:21 0000000000030519-0000000000030569
-rw------- 1 root root 2231760 Mar 17 11:32 0000000000030570-0000000000030786
-rw------- 1 root root 227352 Mar 17 17:21 0000000000030787-0000000000030809
-rw------- 1 root root 475320 Mar 18 08:36 0000000000030810-0000000000030856
-rw------- 1 root root 785328 Mar 19 10:21 0000000000030857-0000000000030934
-rw------- 1 root root 3223632 Mar 24 10:17 0000000000030935-0000000000031247
-rw------- 1 root root 1285296 Mar 26 08:16 0000000000031248-0000000000031372
-rw------- 1 root root 867936 Mar 27 13:42 0000000000031373-0000000000031457
-rw------- 1 root root 206688 Mar 27 18:01 0000000000031458-0000000000031478
-rw------- 1 root root 785280 Mar 28 19:12 0000000000031479-0000000000031555
-rw------- 1 root root 1735872 Mar 31 09:31 0000000000031556-0000000000031725
-rw------- 1 root root 2087112 Apr 3 11:37 0000000000031726-0000000000031928
-rw------- 1 root root 2401296 Apr 7 00:52 0000000000031929-0000000000032163
-rw------- 1 root root 1058088 Apr 8 13:07 0000000000032164-0000000000032267
-rw------- 1 root root 2625288 Apr 10 15:17 0000000000032268-0000000000032478
-rw------- 1 root root 4896888 Apr 11 15:27 0000000000032479-0000000000032832
-rw------- 1 root root 3280992 Apr 16 05:58 0000000000032833-0000000000033144
-rw------- 1 root root 310008 Apr 16 14:52 0000000000033145-0000000000033175
-rw------- 1 root root 520824 Apr 17 05:37 0000000000033176-0000000000033227
-rw------- 1 root root 433992 Apr 17 19:22 0000000000033228-0000000000033270
-rw------- 1 root root 433992 Apr 18 09:12 0000000000033271-0000000000033313
-rw------- 1 root root 2768712 Apr 22 10:02 0000000000033314-0000000000033578
-rw------- 1 root root 4401528 Apr 29 04:58 0000000000033579-0000000000034006
-rw------- 1 root root 5251776 May 6 11:28 0000000000034007-0000000000034501
-rw------- 1 root root 5868384 May 15 08:58 0000000000034502-0000000000035067
-rw------- 1 root root 7190568 May 26 01:43 0000000000035068-0000000000035757
-rw------- 1 root root 48 May 26 06:34 0000000000035758-0000000000035758
-rw------- 1 root root 124032 May 26 06:40 0000000000035759-0000000000035771
-rw------- 1 root root 48 May 26 09:18 0000000000035772-0000000000035772
-rw------- 1 root root 48 May 26 09:52 0000000000035773-0000000000035773
-rw------- 1 root root 644448 May 27 01:22 0000000000035774-0000000000035832
-rw------- 1 root root 48 May 27 08:05 0000000000035833-0000000000035833
-rw------- 1 root root 48 May 27 08:17 0000000000035834-0000000000035834
-rw------- 1 root root 421888 May 27 01:22 db.bin
-rw------- 1 root root 32768 Sep 16 2018 db.bin-shm
-rw------- 1 root root 3506152 May 27 01:22 db.bin-wal
-rw------- 1 root root 32 Dec 9 16:00 metadata1
-rw------- 1 root root 32 Dec 9 16:00 metadata2
-rw------- 1 root root 8388608 May 27 08:17 open-1
-rw------- 1 root root 8388608 May 27 08:17 open-2
-rw------- 1 root root 8388608 May 27 08:17 open-3
-rw------- 1 root root 2424280 May 6 07:59 snapshot-30-34492-5566138967
-rw------- 1 root root 56 May 6 07:59 snapshot-30-34492-5566138967.meta
-rw------- 1 root root 681520 May 22 03:58 snapshot-30-35516-6934105325
-rw------- 1 root root 56 May 22 03:58 snapshot-30-35516-6934105325.meta
root@jeeves:/var/snap/lxd/common/lxd/database/global# lxd --debug --group lxd
DBUG[05-27|11:31:32] Connecting to a local LXD over a Unix socket
DBUG[05-27|11:31:32] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
INFO[05-27|11:31:32] LXD 4.1 is starting in normal mode path=/var/snap/lxd/common/lxd
INFO[05-27|11:31:32] Kernel uid/gid map:
INFO[05-27|11:31:32] - u 0 0 4294967295
INFO[05-27|11:31:32] - g 0 0 4294967295
INFO[05-27|11:31:32] Configured LXD uid/gid map:
INFO[05-27|11:31:32] - u 0 1000000 1000000000
INFO[05-27|11:31:32] - g 0 1000000 1000000000
INFO[05-27|11:31:32] Kernel features:
INFO[05-27|11:31:32] - netnsid-based network retrieval: yes
INFO[05-27|11:31:32] - uevent injection: yes
INFO[05-27|11:31:32] - seccomp listener: yes
INFO[05-27|11:31:32] - seccomp listener continue syscalls: yes
INFO[05-27|11:31:32] - unprivileged file capabilities: yes
INFO[05-27|11:31:32] - cgroup layout: hybrid
WARN[05-27|11:31:32] - Couldn't find the CGroup blkio.weight, I/O weight limits will be ignored
WARN[05-27|11:31:32] - Couldn't find the CGroup memory swap accounting, swap limits will be ignored
INFO[05-27|11:31:32] - shiftfs support: yes
INFO[05-27|11:31:32] Initializing local database
DBUG[05-27|11:31:32] Initializing database gateway
DBUG[05-27|11:31:32] Start database node address= role=voter id=1
DBUG[05-27|11:31:33] Connecting to a local LXD over a Unix socket
DBUG[05-27|11:31:33] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
DBUG[05-27|11:31:33] Detected stale unix socket, deleting
DBUG[05-27|11:31:33] Detected stale unix socket, deleting
INFO[05-27|11:31:33] Starting /dev/lxd handler:
INFO[05-27|11:31:33] - binding devlxd socket socket=/var/snap/lxd/common/lxd/devlxd/sock
INFO[05-27|11:31:33] REST API daemon:
INFO[05-27|11:31:33] - binding Unix socket socket=/var/snap/lxd/common/lxd/unix.socket
INFO[05-27|11:31:33] - binding TCP socket socket=[::]:8443
INFO[05-27|11:31:33] Initializing global database
DBUG[05-27|11:31:33] Dqlite: connected address=1 attempt=0
No, the DB doesn't contain any data about the content of the containers.
It contains the list of all containers, snapshots, images, networks, storage pools, storage volumes, ... and their config.
But LXD doesn't store any runtime data from within the instances in the database.
Sending that over shortly then, thanks.
Ok, so let's try with the 3 segments from today, one at a time:
One at a time, move them aside (or remove them since you have a backup tarball), then try running lxd --debug --group lxd and see if it makes it through then.
I moved the one ending in 34 (first in your list) and it gives:
EROR[05-27|12:11:14] Failed to start the daemon: Failed to start dqlite server: raft_start(): io: closed segment 0000000000035835-0000000000035835 is past last snapshot snapshot-30-35516-6934105325
Should I move aside each one cumulatively or each one on its own?
Hmm, I didn't expect that particular result, so it still knows about 35835 somehow...
Can you move back any segment you moved away and instead move away the 3 files named open-?
I need @freeekanayaka to give me another lesson about the on-disk structure of dqlite :)
I thought I understood snapshots (raft snapshots on clean shutdown, worst case we can wipe everything except them and you'll be back to the 22nd of May), then segments are transactions on top of that getting you to the current state. But I don't recall what's going on with the 3 open files. Presumably those are open segments that haven't been committed yet, but I'm a bit confused as to why we have 3 of those...
I've removed open* and it hangs as before.
ok, then slightly bigger hammer time, move away:
That should make it forget about the 27th of May and hopefully go back up this time?
Ok done. It comes up but hangs again.
FWIW it recreated the open- files, which are still there after I ctrl-c
Ok and it hangs at the exact same spot of Dqlite: connected address=1 attempt=0?
yes
ok, looks like we'll need that tarball to poke at it some more and track it down as it seems to have gotten into a funny state somehow.
What filesystem does /var/snap/lxd/common/lxd/database reside on?
That's on ext4
Ok, so that should support all the fancy async io bits that dqlite uses, and the kernel matches what we use on most of our test systems too, so really wondering what's making it hang.
Can you e-mail the tarball to stgraber at ubuntu dot com and free dot ekanayaka at canonical dot com?
I'm way past EOD and about to go to bed but Free should be up in the next 4-5 hours and can figure it out for you.
It fixed it last time I rebooted so it doesn't seem permanent.
Maybe some state hanging around somewhere under /tmp?
Ok tarball sent. Cheers!
I had to reboot because for some reason the messing around above stopped all my containers, and one of them is running my DNS. Everything came back up ok afterwards. Hopefully this will give you a clue as to what's wrong.
Hmm, I didn't expect that particular result, so it still knows about 35835 somehow...
Can you move back any segment you moved away and instead move away the 3 files named open-?
This happened because there is a mismatch between the directory listing posted above and the actual directory listing at the time @bigjools tried to remove 0000000000035834-0000000000035834. The actual listing has an additional 0000000000035835-0000000000035835 segment which presumably was added after the initial directory listing was posted in this issue (this would indicate that the database was working fine, or at least to the extent of committing a new transaction).
The additional segment is present in the tarball we got by email, and I can reproduce the same result that surprised @stgraber. However if instead of removing 0000000000035834-0000000000035834 I remove 0000000000035835-0000000000035835 (which is the actual last one), the daemon starts fine, as you were probably expecting to see.
Note though that removing 0000000000035835-0000000000035835 does not seem to be necessary: the database included in the tarball does not seem to be in a corrupted state, and the daemon starts fine with it (in terms of bringing up the database).
Ok so if the DB is OK, why is the daemon hanging at startup :)
That's a good question :) We'd probably need more logging in the daemon to understand where it gets stuck. The fact that you're seeing Dqlite: connected address=1 attempt=0 seems to indicate that the database is opened fine, and probably we get stuck somewhere later (remains to be seen where exactly).
I need @freeekanayaka to give me another lesson about the on-disk structure of dqlite :)
Hehe. There is a bit of documentation in the include/raft/uv.h file, and I've now created https://github.com/canonical/raft/pull/140 to make it a bit more visible. It should help in getting some more insight into the disk format.
I thought I understood snapshots (raft snapshots on clean shutdown, worst case we can wipe everything except them and you'll be back to the 22nd of May)
That's correct. Except that we don't quite take snapshots at shutdown (we probably used to in the hashicorp/raft implementation), we do however dump the database into db.bin, just for safety.
then segments are transactions on top of that getting you to the current state. But I don't recall what's going on with the 3 open files, presumably those are open segments that haven't been committed yet but I'm a bit confused as to why we have 3 of those...
Almost correct. The segment files named MMM-NNN are closed (no more entries will be added to them), while the segment files named open-N are open (more entries might be added to them). There are several open-N segments because they get fallocate'd ahead of time, so we can start writing to a new open segment as soon as the current one gets full. A segment gets closed when it gets full. Note that commitment is in general an orthogonal concept: if you have more than one node you'll typically have entries written on disk (either in open or closed segments) that aren't committed yet, because no quorum was reached yet.
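The naming scheme described above can be sketched in Go. This is only an illustration, not dqlite or raft code: the classify and lastClosed helpers and both regular expressions are assumptions inferred from the directory listing posted earlier in this issue. It tells closed segments (MMM-NNN) apart from open ones (open-N) and picks the closed segment with the highest last index, which is the one that mattered when 0000000000035835-0000000000035835 turned out to exist.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// closedRe matches closed segment names such as
// 0000000000035834-0000000000035834 (first index, last index).
var closedRe = regexp.MustCompile(`^(\d+)-(\d+)$`)

// openRe matches preallocated open segments such as open-1.
var openRe = regexp.MustCompile(`^open-(\d+)$`)

// classify returns "closed", "open" or "other" for a file name.
// Snapshots, db.bin and metadata files all fall under "other" here.
func classify(name string) string {
	switch {
	case closedRe.MatchString(name):
		return "closed"
	case openRe.MatchString(name):
		return "open"
	default:
		return "other"
	}
}

// lastClosed returns the closed segment whose last index is highest,
// or "" when the listing contains no closed segment.
func lastClosed(names []string) string {
	best, bestEnd := "", uint64(0)
	for _, n := range names {
		m := closedRe.FindStringSubmatch(n)
		if m == nil {
			continue
		}
		end, err := strconv.ParseUint(m[2], 10, 64)
		if err != nil {
			continue
		}
		if best == "" || end > bestEnd {
			best, bestEnd = n, end
		}
	}
	return best
}

func main() {
	// A few names taken from the listing in this issue.
	names := []string{
		"0000000000035833-0000000000035833",
		"0000000000035834-0000000000035834",
		"0000000000035835-0000000000035835",
		"open-1", "db.bin", "metadata1",
	}
	for _, n := range names {
		fmt.Printf("%-40s %s\n", n, classify(n))
	}
	fmt.Println("last closed segment:", lastClosed(names))
}
```

Working from a fresh directory listing rather than a pasted one avoids the mismatch that happened here, where a newer segment had appeared after the listing was posted.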
Looking at the source code, the next log message we should see after Initializing global database is Firewall loaded driver, which however doesn't appear in the output above.
@stgraber @tomponline perhaps we get stuck in the
d.firewall = firewall.New()
call in daemon.go:828?
Just thinking out loud.
@freeekanayaka if you enable debug logging you should see some additional output such as Firewall detected "nftables" incompatibility if it was trying to load multiple drivers. The function itself runs some commands to detect which firewall driver is in use, so it's possible I suppose, but it doesn't seem likely unless one of those external commands is hanging.
When I ran from the CLI with --debug, the last message was still Initializing global database.
And looking at ps fauxww doesn't show any sub-process hanging under lxd?
Built a debug version of 4.1 which will log a LOT more around that part of startup.
If it's hanging in firewall rather than dqlite, with that one, we'll know :)
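To use it, do:
 * curl https://dl.stgraber.org/lxd-7446 -o /var/snap/lxd/common/lxd.debug
 * chmod +x /var/snap/lxd/common/lxd.debug
 * lxd --debug --group lxd
sha256 is 2541b4ea07f6a83b02aacd36f68a2a74ce5716e5eba9d1cbe106ab6d67caada3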
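For anyone who wants to check a downloaded binary against the sha256 quoted above before running it as root, here is a minimal Go sketch. The matchesSHA256 helper and the command-line shape (file path, then expected digest) are made up for illustration and are not part of LXD.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"strings"
)

// matchesSHA256 reports whether data hashes to the given hex digest.
// The expected digest is trimmed and lower-cased before comparing.
func matchesSHA256(data []byte, wantHex string) bool {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]) == strings.ToLower(strings.TrimSpace(wantHex))
}

func main() {
	// Hypothetical usage: verify FILE SHA256HEX
	if len(os.Args) != 3 {
		fmt.Println("usage: verify FILE SHA256HEX")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if !matchesSHA256(data, os.Args[2]) {
		fmt.Fprintln(os.Stderr, "checksum mismatch")
		os.Exit(1)
	}
	fmt.Println("checksum OK")
}
```

It would be run with /var/snap/lxd/common/lxd.debug as the file and the digest from the comment as the second argument.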
Built a debug version of 4.1 which will log a LOT more around that part of startup.
If it's hanging in firewall rather than dqlite, with that one, we'll know :)
To use it, do:
 * curl https://dl.stgraber.org/lxd-7446 -o /var/snap/lxd/common/lxd.debug
 * chmod +x /var/snap/lxd/common/lxd.debug
 * lxd --debug --group lxd
sha256 is 2541b4ea07f6a83b02aacd36f68a2a74ce5716e5eba9d1cbe106ab6d67caada3
Cool, can you paste the diff wrt 4.1? If you didn't yet, maybe I can add some other debug messages to narrow down the database setup phase as well.
That's on top of current master as 4.1 -> master doesn't have any changes that would prevent a subsequent downgrade.
Ok, looks noisy enough to get started :)
Snapd has not tried to restart lxd today so it's still working fine. I have done a manual restart (systemctl restart snap.lxd.daemon) and it reproduced the issue immediately.
root@jeeves:~# ps -fe|grep lxd
root 1126 1 0 09:10 ? 00:00:00 /bin/sh /snap/lxd/15161/commands/daemon.start
root 1317 1 0 09:10 ? 00:00:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
root 1389 1126 2 09:10 ? 00:00:01 lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
root 1390 1126 0 09:10 ? 00:00:00 lxd waitready
root 1391 1126 0 09:10 ? 00:00:00 /bin/sh /snap/lxd/15161/commands/daemon.start
Doing that has also killed all my running containers :angry:
EROR[05-28|09:20:35] stgraber: open db
INFO[05-28|09:20:35] Initializing global database
DBUG[05-28|09:20:35] Dqlite: connected address=1 attempt=0
EROR[05-28|09:20:35] stgraber: OpenCluster success
EROR[05-28|09:20:35] stgraber: DB opened
EROR[05-28|09:20:35] stgraber: in Firewall New
EROR[05-28|09:20:35] stgraber: checking nft
EROR[05-28|09:20:35] stgraber: in nftables compat
EROR[05-28|09:20:35] stgraber: nft: got uname
EROR[05-28|09:20:35] stgraber: nft: found binary
EROR[05-28|09:20:35] stgraber: nft: got version
EROR[05-28|09:20:35] stgraber: nft: compared version
EROR[05-28|09:20:35] stgraber: nft parse: running nft --json nn list ruleset
EROR[05-28|09:20:35] stgraber: nft parse: spawned subcommand
EROR[05-28|09:20:35] stgraber: nft parse: parsed output
EROR[05-28|09:20:35] stgraber: nft parse: processed output
EROR[05-28|09:20:35] stgraber: nft parse: returning
EROR[05-28|09:20:35] stgraber: nft: parsed the rules
EROR[05-28|09:20:35] stgraber: checking xtables
EROR[05-28|09:20:35] stgraber: in xtables compat
EROR[05-28|09:20:35] stgraber: xtables: looking for iptables
EROR[05-28|09:20:35] stgraber: xtables: found iptables
EROR[05-28|09:20:35] stgraber: xtables: checked iptables for compat
EROR[05-28|09:20:35] stgraber: xtables: looking for ip6tables
EROR[05-28|09:20:35] stgraber: xtables: found ip6tables
EROR[05-28|09:20:35] stgraber: xtables: checked ip6tables for compat
EROR[05-28|09:20:35] stgraber: xtables: looking for ebtables
EROR[05-28|09:20:35] stgraber: xtables: found ebtables
EROR[05-28|09:20:35] stgraber: xtables: checked ebtables for compat
EROR[05-28|09:20:35] stgraber: xtables: checking iptables use
and then it hangs
Interesting, thanks for that. Do you see any hanging iptables processes running? It's interesting that the logs are apparently out of order too.
@freeekanayaka @stgraber this potentially may be the issue https://github.com/lxc/lxd/pull/7453 as each iptables call to check whether each table is in use is using a defer cmd.Wait() which could mean multiple iptables processes waiting to be finished fully. This could conceivably cause locking issues, although I've not been able to re-create the issue so not sure.
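To illustrate the concern about deferred waits, here is a hedged Go sketch of the safer pattern, not the code from the PR. One way such a hang can arise (an assumption, not something confirmed in this issue) is that a child whose stdout pipe is never fully read blocks once the pipe fills up, so a later cmd.Wait() never returns. The producesOutput helper below is hypothetical. It uses exec.Cmd.Output, which reads all of stdout and waits for the process to exit before returning, so each check is finished before the next one starts.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// producesOutput runs a command to completion and reports whether it
// printed anything on stdout. Output() drains stdout and reaps the
// child before returning, so nothing is left pending for a deferred
// Wait at the end of the caller.
func producesOutput(name string, args ...string) (bool, error) {
	out, err := exec.Command(name, args...).Output()
	if err != nil {
		return false, err
	}
	return len(bytes.TrimSpace(out)) > 0, nil
}

func main() {
	// Stand-in commands so the sketch runs without root. A real check
	// might run something like "iptables -t nat -S" (assumption).
	checks := [][]string{
		{"sh", "-c", "echo rule"},
		{"sh", "-c", "true"},
	}
	for _, c := range checks {
		used, err := producesOutput(c[0], c[1:]...)
		fmt.Println(c, used, err)
	}
}
```

The trade-off is that the whole output is buffered in memory, which should be acceptable for rule listings even on hosts with many rules.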
Interesting, thanks for that. Do you see any hanging iptables processes running? It's interesting that the logs are apparently out of order too.
Sorry, I rebooted before I saw your message. I'll check next time.
@bigjools could you also post the output of iptables-save thanks.
@bigjools could you also post the output of iptables-save thanks.
It's huge, I am running fail2ban on my ssh port!
@bigjools OK, I suspect that may be the issue, I've updated my PR above to use a single call of iptables-save rather than call iptables once for each table, so hopefully that will resolve it.
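As a rough sketch of the single-call idea (not the code in the PR, which was later changed back to calling iptables), the output of one iptables-save run can be scanned once to see which tables contain rules. The rulesPerTable helper and the sample dump are made up for illustration; the only format details relied on are that iptables-save starts each table with a *name line and writes rules as -A lines.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// rulesPerTable counts "-A" rule lines under each "*table" header of
// iptables-save style output. Chain declarations (":NAME ..."),
// comments and COMMIT lines are ignored.
func rulesPerTable(dump string) map[string]int {
	counts := map[string]int{}
	table := ""
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "*"):
			table = strings.TrimPrefix(line, "*")
			counts[table] = 0
		case strings.HasPrefix(line, "-A ") && table != "":
			counts[table]++
		}
	}
	return counts
}

// sample is an invented dump, loosely modelled on a host running fail2ban.
const sample = `# Generated by iptables-save
*filter
:INPUT ACCEPT [0:0]
:f2b-sshd - [0:0]
-A INPUT -p tcp -m multiport --dports 22 -j f2b-sshd
-A f2b-sshd -j RETURN
COMMIT
*nat
:PREROUTING ACCEPT [0:0]
COMMIT
`

func main() {
	for table, n := range rulesPerTable(sample) {
		fmt.Printf("table %s: %d rule(s)\n", table, n)
	}
}
```

A table with a non-zero count would be treated as in use; how LXD actually decides that is not shown in this issue.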
That would make sense, if lxd gets started at bootup before fail2ban restores the tables.
@bigjools please can you try my custom build with the PR above using the same steps as @stgraber posted earlier, but using https://tomp.uk/files/lxd-7446 instead. Thanks
Will do, I have a plan to do an upgrade to 20.04 later so I'll add that into the mix.
@bigjools thanks, it would be good to test this before the upgrade so that nothing else changes in the test. Thanks
That's the plan since I will have to reboot anyway
I can confirm that your 7446 binary works a charm, thanks!
@bigjools thanks, please can you re-download the binary and re-test as I've modified it to revert to using iptables rather than iptables-save but with fixes to avoid the hang. Thanks
@bigjools also once this lands in the official snap, be sure to remove /var/snap/lxd/common/lxd.debug and reload the LXD service to ensure you're not stuck on the custom build. Thanks
Yes, works fine. Just to confirm, sha256 is:
db4102bdef1874ad0d78127b4c9da4af47f01a4f8bee16af5e758fedfa5d0761
Why is it restarting all the containers when I restart lxd? That seems unnecessary.
@bigjools yep, that's the correct hash.
@stgraber is restarting lxd intentionally restarting instances? Does "reload" just restart lxd?
systemctl reload snap.lxd.daemon is what you want to just bounce LXD.
systemctl restart snap.lxd.daemon will restart all instances.
Ok thanks. I'm not sure how that's at all useful and seems very dangerous. If you want to restart all containers I think it should be a command under lxd, given that if you restart the daemon, it needs it to start the containers up anyway. Please consider changing this behaviour.
restart is just stop and start as far as the service is concerned and we can't distinguish stop as part of a restart from stop because the system is going down.
Some of that is also made extra hard because of the snap being in charge of the systemd units, so some of the tricks we could use to distinguish between a restart and a host shutdown are not directly accessible to us.
Is there any reason restart can't just be a link to reload ?
Systemd doesn't let you alias restart to reload. systemctl does offer a reload-or-restart command though.
Ok, I'll get to the root problem.
What's really unacceptable is that Snap can restart my containers via
restarting LXD, with no warning, and this Git Issue has highlighted that
for me. I'm at the point where I am going to uninstall snap, because using
this stuff on servers running essential services is bonkers. I don't know
who decided snap should do this by default and have no way to turn it off,
but I don't know a single person who thinks it's a good idea apart from
people at Canonical (that's not to say anything about you guys, BTW).
Thanks for the help with this one anyway, LXD and you guys keep showing me
how good it is.
Cheers
@bigjools This is actually one of the reasons that forced a decision to pack my own deb packages for LXD. You are welcome to use them also.
One thing worth noting here is that LXD snap refreshes do not call restart, they effectively call reload, specifically because we agree that it's unacceptable for the containers to be restarted on upgrades.
As for the auto-refresh behavior of snaps, there are many ways to control this behavior.
@tomponline is working on a LXD focused bit of documentation for that, explaining the various options available to choose what you may get through refresh (by selecting one of our specific channels), when those refreshes can happen, how to hold a specific snap on your system (or systems) for up to 3 months on a particular revision, how companies can centrally control what revision to deploy and how you can also just bring the whole thing offline if you feel the need to.
All that said, we're certainly fine with folks building from source or using alternative packages, so long as they're able to update such packages to a supported combination of components when reporting issues.