Lxd: "lxc stop" hangs

Created on 2 Oct 2015 · 35 Comments · Source: lxc/lxd

Hi,
I have been experimenting with LXD, with the aim of moving from "plain" LXC.
(using LXD 0.18-0ubuntu2~ubuntu14. from the lxd-stable PPA)

lxc profile create my-default
lxc profile device add my-default "mnt_shared" disk "source=/lxd-mounts/guests.shared"  "path=/lxd-mnt/_shared" readonly=false
lxc launch :ubuntu1404 :d-glu -p my-default -p default
lxc stop   :d-glu
lxc config show :d-glu
lxc config  device add :d-glu "mnt_data" disk source=/lxd-mounts/guests.data/d-glu path=/lxd-mnt/data  readonly=false
lxc config  device add :d-glu "mnt_bak" disk source=/lxd-mounts/guests.bak/d-glu path=/lxd-mnt/bak   readonly=false
lxc config  device add :d-glu "mnt_glu1" disk source=/lxd-mounts/d-glu.glu1 path=/lxd-mnt/glu1  readonly=false
lxc config show :d-glu

lxc config set :d-glu boot.autostart   true
lxc config set :d-glu boot.autostart.delay   10
lxc config set :d-glu boot.autostart.priority   10
lxc config set :d-glu environment.ENV_IS_HERE   yes-it-is
lxc config set :d-glu limits.cpus   2
lxc config set :d-glu limits.memory   1024

When I run the commands above, "lxc stop" hangs (and the container keeps running).
If I kill "lxc stop" and then try it again, it exits with "error: exit status 254", even though the container ends up being stopped.

Ideally I would like to do this w/o having to start/stop the container, ie, by cloning it and leaving it stopped.
So, alternative ways of creating the container with this strategy would also be appreciated.

Thx

joaocc

All 35 comments

you could use "lxc init" instead of "lxc launch"

stgraber on 2 Oct 2015
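
A minimal sketch of the "lxc init" route suggested above: create the container in a stopped state and configure it without ever starting it. The function names create_stopped and add_disk are hypothetical helpers, and any image, profile, device name, or path passed to them is a placeholder rather than a tested setup.

```shell
# Create a container without starting it, then configure it while stopped.
# Usage: create_stopped IMAGE NAME PROFILE
create_stopped() {
    image=$1
    name=$2
    profile=$3

    # "lxc init" creates the container but, unlike "lxc launch", leaves it stopped.
    lxc init "$image" "$name" -p "$profile" || return 1

    # Config keys can be set on a stopped container.
    lxc config set "$name" boot.autostart true || return 1
}

# Attach a disk device to the (still stopped) container.
# Usage: add_disk NAME DEVICE SOURCE PATH
add_disk() {
    lxc config device add "$1" "$2" disk "source=$3" "path=$4"
}
```

For example, "create_stopped IMAGE d-glu my-default" followed by "add_disk d-glu mnt_data /lxd-mounts/guests.data/d-glu /lxd-mnt/data" would mirror the original report without a start/stop cycle, with IMAGE replaced by whatever image alias is available locally.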

It also looks like GitHub ate chunks of your commands with its Markdown processor; you may have to escape some characters to make the report readable :)

stgraber on 2 Oct 2015

Hi. I meant actions "above" :) Thx for the "lxc init" hint.
In the meantime I went back to the trusted lxc/lxc 1.1.
But, regarding the hang-up of "lxc stop", I only get the exit status above. Any ideas?
Thx

joaocc on 2 Oct 2015

Not sure, lxc info containername --show-log may help there

stgraber on 2 Oct 2015

I am facing a similar issue

Environment:

Host OS: Mac OSX
Guest OS: Ubuntu Desktop 14.04

System Info

~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.3 LTS
Release:        14.04
Codename:       trusty


~$ uname -a
Linux openring 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux


~$ sudo lxc version
0.19


~$ sudo lxc list
+-------+---------+-----------+------+-----------+-----------+
| NAME  |  STATE  |   IPV4    | IPV6 | EPHEMERAL | SNAPSHOTS |
+-------+---------+-----------+------+-----------+-----------+
| node1 | RUNNING | 10.0.3.26 |      | YES       | 0         |
+-------+---------+-----------+------+-----------+-----------+

# This hangs
~$ sudo lxc stop node1

Workaround

~$ ps aux | grep "containers node1"
root      6619  0.0  0.2  75032  4280 ?        Ss   19:27   0:00 [lxc monitor] /var/lib/lxd/containers node1
pratz     7745  0.0  0.0  15916  2028 pts/1    S+   19:35   0:00 grep --color=auto containers node1


~$ sudo kill -9 6619


~$ sudo lxc list
+-------+---------+------+------+-----------+-----------+
| NAME  |  STATE  | IPV4 | IPV6 | EPHEMERAL | SNAPSHOTS |
+-------+---------+------+------+-----------+-----------+
| node1 | STOPPED |      |      | YES       | 0         |
+-------+---------+------+------+-----------+-----------+

pr4th4m on 6 Oct 2015

What's running in the container?

stgraber on 6 Oct 2015

lxc stop does a clean shutdown, in that it sends SIGPWR to the container's init process.

If init in the container doesn't react to the signal or fails to shut down the container, you get a hang (unless you specify a timeout, in which case you'd get an error).

If your container can't be shut down properly, you can pass --force to lxc stop, which will instead kill the init process, instantly killing the whole container.

stgraber on 6 Oct 2015

Container info

~$ sudo lxc image info 50045c285f19
Fingerprint: 50045c285f19fafb411410c28094779c5aa7ec69a6096bfad5c38674fb059f89
Size: 57MB
Architecture: x86_64
Public: no
Timestamps:
    Created: 2015/10/06 03:22 UTC
    Uploaded: 2015/10/06 13:30 UTC
    Expires: never
Properties:
    description: Centos 7 (amd64)
Aliases:

pr4th4m on 6 Oct 2015

The --force option is good, but I do not want to lose the assigned IP address.
If I kill the container, I have to launch it again and a new IP is assigned.

Also, I am not sure if there is some issue with the container image, as I am using the image from

http://images.linuxcontainers.org/images/centos/7/amd64/

pr4th4m on 6 Oct 2015

Yeah, it wouldn't surprise me if CentOS 7's init system doesn't know about SIGPWR.

stgraber on 6 Oct 2015

As for the IP, that seems odd to me. The MAC of the container is static and that's handled at startup time, so even an unclean shutdown should keep the IP address.

stgraber on 6 Oct 2015

stgraber@castiana:~$ lxc launch images:centos/7/amd64 centos
Creating centos done.
Starting centos done.
stgraber@castiana:~$ lxc list centos
+--------+---------+------------+------+-----------+-----------+
|  NAME  |  STATE  |    IPV4    | IPV6 | EPHEMERAL | SNAPSHOTS |
+--------+---------+------------+------+-----------+-----------+
| centos | RUNNING | 10.0.3.225 |      | NO        | 0         |
+--------+---------+------------+------+-----------+-----------+
stgraber@castiana:~$ lxc stop centos --force
stgraber@castiana:~$ lxc start centos
stgraber@castiana:~$ lxc list centos
+--------+---------+------------+------+-----------+-----------+
|  NAME  |  STATE  |    IPV4    | IPV6 | EPHEMERAL | SNAPSHOTS |
+--------+---------+------------+------+-----------+-----------+
| centos | RUNNING | 10.0.3.225 |      | NO        | 0         |
+--------+---------+------------+------+-----------+-----------+
stgraber@castiana:~$

stgraber on 6 Oct 2015

Sorry, not working for me :(

~$ sudo lxc list
+-------+---------+------------+------+-----------+-----------+
| NAME  |  STATE  |    IPV4    | IPV6 | EPHEMERAL | SNAPSHOTS |
+-------+---------+------------+------+-----------+-----------+
| node1 | RUNNING | 10.0.3.243 |      | YES       | 0         |
+-------+---------+------------+------+-----------+-----------+

~$ sudo lxc stop node1 --force

~$ sudo lxc start node1
error: not found

pr4th4m on 6 Oct 2015

Your container is ephemeral so that's expected :)

stgraber on 6 Oct 2015

An ephemeral container is deleted once stopped, that's the very definition of ephemeral.

stgraber on 6 Oct 2015

Oh man!!! my bad, works well

~$ sudo lxc list
+-------+---------+------------+------+-----------+-----------+
| NAME  |  STATE  |    IPV4    | IPV6 | EPHEMERAL | SNAPSHOTS |
+-------+---------+------------+------+-----------+-----------+
| node1 | RUNNING | 10.0.3.244 |      | NO        | 0         |
+-------+---------+------------+------+-----------+-----------+

~$ sudo lxc stop node1 --force

~$ sudo lxc start node1

~$ sudo lxc list
+-------+---------+------------+------+-----------+-----------+
| NAME  |  STATE  |    IPV4    | IPV6 | EPHEMERAL | SNAPSHOTS |
+-------+---------+------------+------+-----------+-----------+
| node1 | RUNNING | 10.0.3.244 |      | NO        | 0         |
+-------+---------+------------+------+-----------+-----------+

pr4th4m on 6 Oct 2015

If SIGPWR does not work on some systems, can we have a fallback to SIGKILL?
I do not have much OS knowledge, so this is just a suggestion in case it helps.

pr4th4m on 6 Oct 2015

We can't, because LXD itself can't know whether it worked or not. The signal has always been supported by Linux, so we'll never get an error sending it, and LXD has no knowledge of what's running in the container, so it can't know whether the container is ignoring the signal or just taking very long to shut down.

stgraber on 6 Oct 2015

That's why we have the --timeout and --force options: you can try a clean shutdown, time out after 30s, and then do a forced shutdown.

stgraber on 6 Oct 2015
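
A sketch of the clean-then-forced shutdown described in the previous comment, using the --timeout and --force options of lxc stop. The wrapper name stop_or_force and its "clean"/"forced" output are illustrative assumptions, not part of LXD.

```shell
# Try a clean shutdown first; fall back to a forced stop if it does not finish in time.
# Usage: stop_or_force NAME [TIMEOUT_SECONDS]    (timeout defaults to 30)
stop_or_force() {
    name=$1
    timeout=${2:-30}

    # Clean shutdown: LXD signals the container's init and waits up to $timeout seconds.
    if lxc stop "$name" --timeout "$timeout"; then
        echo "clean"
        return 0
    fi

    # The clean shutdown failed or timed out: kill the container's init instead.
    lxc stop "$name" --force || return 1
    echo "forced"
}
```

Calling "stop_or_force node1" would wait up to 30 seconds before forcing the stop. As noted earlier in the thread, an ephemeral container is deleted once stopped, whichever path is taken.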

Ok sounds good (y).
Thank you for contributing to LXD.

pr4th4m on 6 Oct 2015

"lxc stop" always hangs for me

fresh install of Ubuntu 15.10 && Updates / Upgrades

apt-get install bridge-utils (change eth0 to br0 in /etc/network/interfaces)
add-apt-repository ppa:ubuntu-lxc/lxd-stable 
apt-get update
apt-get dist-upgrade
apt-get install lxd 
lxc profile edit default (change bridge to br0)
lxc remote add images images.linuxcontainers.org
lxc launch images:centos/7/amd64 centos

lxc stop centos

hang

lxc launch images:debian/jessie/amd64 debian
lxc stop debian

hang

Easy to reproduce. Just follow the steps above.

Debianer on 18 Dec 2015

I tested lxc stop with a wily guest on a wily host. It works.

CentOS and Debian guests don't. See above.

Debianer on 22 Dec 2015

This might be the same?
https://github.com/lxc/lxc/issues/736

tomposmiko on 22 Dec 2015

I don't know. I use ext4.

Debianer on 22 Dec 2015

The problem here is that init in your container doesn't understand SIGPWR; you can work around it by just always using --force.

tych0 on 23 Dec 2015

Just ran into this myself. For reference, good old sysvinit does work, so using it instead of systemd will make lxc stop work:

lxc exec jessie-amd64-sysvinit -- /bin/bash
# apt-get update && apt-get install sysvinit-core
# exit
lxc stop jessie-amd64-sysvinit --force
lxc start jessie-amd64-sysvinit
lxc exec jessie-amd64-sysvinit -- /bin/bash
# apt-get remove --purge --auto-remove systemd
# exit
lxc stop jessie-amd64-sysvinit

saghul on 12 Jan 2016

I hit the same issue. I found a few suggestions to get the systemd containers like CentOS v7 to respect SIGPWR. The below seems to work well.

1) Log into the container (lxc exec /bin/bash)

2) Create a new sigpwr.target like so:
ln -s /usr/lib/systemd/system/halt.target /etc/systemd/system/sigpwr.target

3) Force the container to restart so the change takes effect
lxc stop --force
lxc start

Result:
Now you can stop the container with a simple "lxc stop "

I found this in a comment in the below thread:
http://lxc-users.linuxcontainers.narkive.com/ekRrTST6/lxc-stop-doesn-t-stop-centos-waits-for-the-timeout

dmacbride on 14 Jan 2016
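
The three steps in the comment above could be wrapped up roughly as follows. This is only a sketch: link_sigpwr is a hypothetical helper name, the optional TARGET argument is an addition, and ln is called with -f (not in the original command) so that an existing link is replaced.

```shell
# Point sigpwr.target at a shutdown target inside a systemd container, then restart it.
# Usage: link_sigpwr NAME [TARGET]    (TARGET defaults to halt.target)
link_sigpwr() {
    name=$1
    target=${2:-halt.target}

    # Create the symlink inside the container; -f replaces an existing link.
    lxc exec "$name" -- ln -sf "/usr/lib/systemd/system/$target" \
        /etc/systemd/system/sigpwr.target || return 1

    # The change only takes effect after a restart, and a clean stop
    # does not work yet, so force it.
    lxc stop "$name" --force || return 1
    lxc start "$name"
}
```

After "link_sigpwr NAME" has run once, a plain "lxc stop NAME" is the thing to test. Whether it then works appears to depend on the guest, as later comments in this thread show.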


@dmacbride I checked a vanilla Debian Jessie image and sigpwr.target is already pointing to halt.target, but it still hangs.

root@jessietest:~# ls -l /etc/systemd/system
total 28
lrwxrwxrwx 1 root root   37 Jan 10 22:49 default.target -> /lib/systemd/system/multi-user.target
-rw-r--r-- 1 root root  306 Jan 10 22:49 getty-static.service
drwxr-xr-x 2 root root 4096 Jan 10 22:49 getty.target.wants
-rw-r--r-- 1 root root 1538 Jan 10 22:49 [email protected]
drwxr-xr-x 2 root root 4096 Jan 10 22:48 halt.target.wants
drwxr-xr-x 2 root root 4096 Jan 10 22:49 multi-user.target.wants
drwxr-xr-x 2 root root 4096 Jan 10 22:48 poweroff.target.wants
drwxr-xr-x 2 root root 4096 Jan 10 22:48 reboot.target.wants
lrwxrwxrwx 1 root root   31 Jan 10 22:49 sigpwr.target -> /lib/systemd/system/halt.target
lrwxrwxrwx 1 root root    9 Jan 10 22:49 systemd-udevd.service -> /dev/null
lrwxrwxrwx 1 root root    9 Jan 10 22:49 udev.service -> /dev/null

So far, switching to sysvinit has been the only way I found to get Debian Jessie containers to stop gracefully.

saghul on 15 Jan 2016
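
The check in the previous comment (looking at where sigpwr.target points) can be done from the host without opening a shell in the container. A small sketch, assuming readlink is available in the guest; sigpwr_target is a hypothetical helper name.

```shell
# Print where sigpwr.target points inside a container, or "unset" if the link is missing.
# Usage: sigpwr_target NAME
sigpwr_target() {
    # readlink fails (non-zero exit) when the path is absent or not a symlink.
    dest=$(lxc exec "$1" -- readlink /etc/systemd/system/sigpwr.target) || dest=""
    if [ -n "$dest" ]; then
        echo "$dest"
    else
        echo "unset"
    fi
}
```

On the Jessie image listed above this would be expected to print /lib/systemd/system/halt.target, which shows that the link alone is not sufficient there.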

Hi, I'm also facing this issue.
The host is Ubuntu 16 with ZFS and the container is CentOS 7 (I have also updated systemctl / systemd to the latest, 221), but no luck.

Here [https://bbs.archlinux.org/viewtopic.php?id=181032] I found a trick:

(from the container)
ln -s /usr/lib/systemd/system/poweroff.target /etc/systemd/system/sigpwr.target

and finally lxc stop works as expected

x86fantini on 9 Jun 2016


Similar problem with debian/jessie/amd64

lxc info

  driver: lxc
  driverversion: 2.0.1
  kernel: Linux
  kernelarchitecture: x86_64
  kernelversion: 4.4.0-28-generic
  server: lxd
  serverpid: 2674
  serverversion: 2.0.2
  storage: zfs
  storageversion: "5"
config:
  core.https_address: 0.0.0.0:8443
  core.https_allowed_headers: Content-Type
  core.https_allowed_methods: GET, POST, PUT, DELETE, OPTIONS
  core.https_allowed_origin: '*'
  core.trust_password: true
  storage.zfs_pool_name: zfspool
public: false

lxc init images:debian/jessie/amd64 debian2

 lxc list
+---------+---------+------+------+------------+-----------+
|  NAME   |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+---------+---------+------+------+------------+-----------+
| debian2 | STOPPED |      |      | PERSISTENT | 0         |
+---------+---------+------+------+------------+-----------+

lxc start debian2
+---------+---------+------+------+------------+-----------+
|  NAME   |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+---------+---------+------+------+------------+-----------+
| debian2 | RUNNING |      |      | PERSISTENT | 0         |
+---------+---------+------+------+------------+-----------+

lxc stop --force=true debian2

---------------------------------------------------------
lxc monitor
metadata:
  context:
    ip: '@'
    method: GET
    url: /1.0/containers/debian2
  level: info
  message: handling
timestamp: 2016-07-10T14:42:54.802717899+01:00
type: logging


metadata:
  context:
    ip: '@'
    method: PUT
    url: /1.0/containers/debian2/state
  level: info
  message: handling
timestamp: 2016-07-10T14:42:54.816220805+01:00
type: logging


metadata:
  class: task
  created_at: 2016-07-10T14:42:54.818145207+01:00
  err: ""
  id: 45389589-f4b3-4903-8258-f622984972a9
  may_cancel: false
  metadata: null
  resources:
    containers:
    - /1.0/containers/debian2
  status: Running
  status_code: 103
  updated_at: 2016-07-10T14:42:54.818145207+01:00
timestamp: 2016-07-10T14:42:54.818572589+01:00
type: operation


metadata:
  context: {}
  level: dbug
  message: 'New task operation: 45389589-f4b3-4903-8258-f622984972a9'
timestamp: 2016-07-10T14:42:54.818187735+01:00
type: logging


metadata:
  class: task
  created_at: 2016-07-10T14:42:54.818145207+01:00
  err: ""
  id: 45389589-f4b3-4903-8258-f622984972a9
  may_cancel: false
  metadata: null
  resources:
    containers:
    - /1.0/containers/debian2
  status: Pending
  status_code: 105
  updated_at: 2016-07-10T14:42:54.818145207+01:00
timestamp: 2016-07-10T14:42:54.818467611+01:00
type: operation


metadata:
  context: {}
  level: dbug
  message: 'Started task operation: 45389589-f4b3-4903-8258-f622984972a9'
timestamp: 2016-07-10T14:42:54.818538649+01:00
type: logging


metadata:
  context:
    ip: '@'
    method: GET
    url: /1.0/operations/45389589-f4b3-4903-8258-f622984972a9/wait
  level: info
  message: handling
timestamp: 2016-07-10T14:42:54.826518814+01:00
type: logging
------------------------------------------------------------------
lxc info --show-log debian2
            lxc 20160710145748.548 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.237 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.548 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.627 INFO     lxc_confile - confile.c:config_idmap:1500 - read uid map: type u nsid 0 hostid 165536 range 65536
            lxc 20160710145749.627 INFO     lxc_confile - confile.c:config_idmap:1500 - read uid map: type g nsid 0 hostid 165536 range 65536
            lxc 20160710145749.628 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.639 INFO     lxc_confile - confile.c:config_idmap:1500 - read uid map: type u nsid 0 hostid 165536 range 65536
            lxc 20160710145749.639 INFO     lxc_confile - confile.c:config_idmap:1500 - read uid map: type g nsid 0 hostid 165536 range 65536
            lxc 20160710145749.640 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.640 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.641 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.647 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.648 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.648 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.648 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.648 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.648 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.649 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.649 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.649 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.649 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.649 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.659 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.659 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.659 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.659 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.659 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.659 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.660 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.660 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.660 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.660 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected
            lxc 20160710145749.664 DEBUG    lxc_commands - commands.c:lxc_cmd_handler:893 - peer has disconnected

Something interesting, after I wait for a while ....

lxc list
+---------+----------+------+------+------------+-----------+
|  NAME   |  STATE   | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+---------+----------+------+------+------------+-----------+
| debian2 | FREEZING |      |      | PERSISTENT | 0         |
+---------+----------+------+------+------------+-----------+

rst0git on 11 Jul 2016

Hi, so we've recently merged https://github.com/lxc/lxc/pull/1086 in LXC, which aims to detect whether SIGRTMIN+3 is in the blocked signal set of the container's init process. If so, it sends SIGRTMIN+3 as the shutdown signal instead of SIGPWR. This should take care of sending the correct shutdown signal to systemd-based containers, as systemd is the only init system (to our knowledge) which uses SIGRTMIN+3. So the ln -s hack will not be needed anymore.

brauner on 24 Jul 2016
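
To illustrate the detection idea in the previous comment, here is a rough shell sketch that reads the SigBlk mask from a /proc/PID/status-style file and picks a shutdown signal accordingly. It is not LXC's actual code: the function names are hypothetical, and signal number 37 for SIGRTMIN+3 assumes the common glibc numbering where SIGRTMIN is 34.

```shell
# Report whether a process blocks a given signal, using the SigBlk mask
# from a /proc/<pid>/status-style file.
# Usage: blocks_signal STATUS_FILE SIGNUM
blocks_signal() {
    # SigBlk is a hex bitmask; bit (SIGNUM - 1) is set when SIGNUM is blocked.
    mask=$(awk '/^SigBlk:/ { print $2 }' "$1")
    [ -n "$mask" ] || return 2
    [ $(( (0x$mask >> ($2 - 1)) & 1 )) -eq 1 ]
}

# Pick the shutdown signal for an init process, given its status file.
# 37 is SIGRTMIN+3 under the assumed glibc numbering (SIGRTMIN = 34).
shutdown_signal() {
    if blocks_signal "$1" 37; then
        echo "SIGRTMIN+3"
    else
        echo "SIGPWR"
    fi
}
```

On the host, "shutdown_signal /proc/PID/status" with PID set to the container's init process would print which signal this heuristic selects. Masks with the highest bit set may overflow arithmetic in some shells, so treat this as a demonstration of the bit test only.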

@saghul thanks for the solution. In my case, neither Debian Jessie amd64 nor CentOS 7 amd64 works with the ln -s .../poweroff.target .../sigpwr.target hack.