Podman: Expected behaviour of kill SIGKILL from host to podman

Created on 10 Jun 2019 · 16 comments · Source: containers/podman

BUG REPORT?: not sure...

/kind bug

Description

Upon running go get ./cmd/... in an application, the container consistently hangs when fetching a specific module. The container becomes unusable, and the only way to kill it is with kill -9 $(pidof podman). This leaves many artifacts behind that affect the behaviour of simple commands like podman images.

Steps to reproduce the issue:

  1. Start the golang container and copy across the project
$ podman run -u myuser -it --name app-c -v $PWD:/home/myuser/projects/app:Z -w /home/myuser/projects/app --hostname="app-c"  localhost/el7-base-go /bin/bash
  2. Log in and issue go get ./cmd/...
[myuser@app-c app]$ go get ./cmd/... --verbose
go: finding github.com/go-sql-driver/mysql v1.4.1
go: finding github.com/jmoiron/sqlx v1.2.0
go: finding github.com/gorilla/handlers v1.4.0
go: finding github.com/gorilla/mux v1.7.0
go: finding gopkg.in/yaml.v2 v2.2.2
go: finding github.com/go-sql-driver/mysql v1.4.0



Although the issue appears to be go-module related, I am unsure how to handle the following situation. The container just hangs with that output.

Describe the results you received:

The following info is available:

$ podman attach app-c
(no output; the attach just hangs)


$ podman stats app-c
Error: unable to load cgroup at /libpod_parent/libpod-26c70b225c7728ae16fba9e764bdf98e4633829653183e22e93b875473f917e6: cgroups: cgroup deleted
$ podman stop app-c
Error: container 26c70b225c7728ae16fba9e764bdf98e4633829653183e22e93b875473f917e6 did not die within timeout

$ podman kill app-c
26c70b225c7728ae16fba9e764bdf98e4633829653183e22e93b875473f917e6

$ podman ps
CONTAINER ID  IMAGE                         COMMAND    CREATED         STATUS             PORTS  NAMES
26c70b225c77  localhost/el7-base-go:latest  /bin/bash  15 minutes ago  Up 15 minutes ago         app-c

$ podman events
2019-06-10 20:07:22.732887043 +1000 AEST container stop 26c70b225c7728ae16fba9e764bdf98e4633829653183e22e93b875473f917e6 (image=localhost/el7-base-go:latest, name=app-c)
2019-06-10 20:07:51.875878169 +1000 AEST container kill 26c70b225c7728ae16fba9e764bdf98e4633829653183e22e93b875473f917e6 (image=localhost/el7-base-go:latest, name=app-c)

I am now forced to kill podman from the host.

$ for p in $(pidof podman);do kill -9 $p;done
$ ps aux | grep podman
deefin   13926  0.0  0.0  77860  1884 ?        Ssl  19:52   0:00 /usr/libexec/podman/conmon -c 26c70b225c7728ae16fba9e764bdf98e4633829653183e22e93b875473f917e6 -u 26c70b225c7728ae16fba9e764bdf98e4633829653183e22e93b875473f917e6 -r /usr/bin/runc -b /home/deefin/.local/share/containers/storage/overlay-containers/26c70b225c7728ae16fba9e764bdf98e4633829653183e22e93b875473f917e6/userdata -p /tmp/1000/overlay-containers/26c70b225c7728ae16fba9e764bdf98e4633829653183e22e93b875473f917e6/userdata/pidfile -l /home/deefin/.local/share/containers/storage/overlay-containers/26c70b225c7728ae16fba9e764bdf98e4633829653183e22e93b875473f917e6/userdata/ctr.log --exit-dir /run/user/1000/libpod/tmp/exits --conmon-pidfile /home/deefin/.local/share/containers/storage/overlay-containers/26c70b225c7728ae16fba9e764bdf98e4633829653183e22e93b875473f917e6/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/deefin/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /tmp/1000 --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir --exit-command-arg /run/user/1000/libpod/tmp --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 26c70b225c7728ae16fba9e764bdf98e4633829653183e22e93b875473f917e6 --socket-dir-path /run/user/1000/libpod/tmp/socket -t --log-level error
$ kill -9 13926
$ podman images
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x130 pc=0x5611916de493]

goroutine 1 [running]:
panic(0x561192133180, 0x56119320d0f0)
    /usr/lib/golang/src/runtime/panic.go:565 +0x2c9 fp=0xc0008e9730 sp=0xc0008e96a0 pc=0x561190afe7b9
runtime.panicmem(...)
    /usr/lib/golang/src/runtime/panic.go:82
runtime.sigpanic()
    /usr/lib/golang/src/runtime/signal_unix.go:390 +0x415 fp=0xc0008e9760 sp=0xc0008e9730 pc=0x561190b14315
github.com/containers/libpod/libpod/image.(*Runtime).GetImages(0xc0009184b0, 0x2, 0x2, 0xc00072fb80, 0x3e, 0xc0008e9c40)
    /builddir/build/BUILD/libpod-7210727e205c333af9a2d0ed0bb66adcf92a6369/_build/src/github.com/containers/libpod/libpod/image/image.go:443 +0x43 fp=0xc0008e9bc0 sp=0xc0008e9760 pc=0x5611916de493
github.com/containers/libpod/pkg/adapter.(*LocalRuntime).GetImages(0xc000872e50, 0x0, 0x0, 0x0, 0x0, 0x561191aacdd9)
    /builddir/build/BUILD/libpod-7210727e205c333af9a2d0ed0bb66adcf92a6369/_build/src/github.com/containers/libpod/pkg/adapter/runtime.go:74 +0x4d fp=0xc0008e9c50 sp=0xc0008e9bc0 pc=0x561191990d2d
main.imagesCmd(0x56119328dec0, 0x0, 0x0)
    /builddir/build/BUILD/libpod-7210727e205c333af9a2d0ed0bb66adcf92a6369/_build/src/github.com/containers/libpod/cmd/podman/images.go:173 +0x2cf fp=0xc0008e9d70 sp=0xc0008e9c50 pc=0x561191a4f59f
main.glob..func50(0x561193220280, 0x5611932ac908, 0x0, 0x0, 0x0, 0x0)
    /builddir/build/BUILD/libpod-7210727e205c333af9a2d0ed0bb66adcf92a6369/_build/src/github.com/containers/libpod/cmd/podman/images.go:100 +0x88 fp=0xc0008e9d98 sp=0xc0008e9d70 pc=0x561191a94fa8
github.com/containers/libpod/vendor/github.com/spf13/cobra.(*Command).execute(0x561193220280, 0xc00000e090, 0x0, 0x0, 0x561193220280, 0xc00000e090)
    /builddir/build/BUILD/libpod-7210727e205c333af9a2d0ed0bb66adcf92a6369/_build/src/github.com/containers/libpod/vendor/github.com/spf13/cobra/command.go:762 +0x467 fp=0xc0008e9e80 sp=0xc0008e9d98 pc=0x561190ca7ed7
github.com/containers/libpod/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x561193229a80, 0xc0000becc0, 0x7ffdbd950031, 0x6)
    /builddir/build/BUILD/libpod-7210727e205c333af9a2d0ed0bb66adcf92a6369/_build/src/github.com/containers/libpod/vendor/github.com/spf13/cobra/command.go:852 +0x2ee fp=0xc0008e9f50 sp=0xc0008e9e80 pc=0x561190ca897e
github.com/containers/libpod/vendor/github.com/spf13/cobra.(*Command).Execute(...)
    /builddir/build/BUILD/libpod-7210727e205c333af9a2d0ed0bb66adcf92a6369/_build/src/github.com/containers/libpod/vendor/github.com/spf13/cobra/command.go:800
main.main()
    /builddir/build/BUILD/libpod-7210727e205c333af9a2d0ed0bb66adcf92a6369/_build/src/github.com/containers/libpod/cmd/podman/main.go:142 +0x8a fp=0xc0008e9f98 sp=0xc0008e9f50 pc=0x561191a5a58a
runtime.main()
    /usr/lib/golang/src/runtime/proc.go:200 +0x214 fp=0xc0008e9fe0 sp=0xc0008e9f98 pc=0x561190b00514
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0008e9fe8 sp=0xc0008e9fe0 pc=0x561190b2c4a1

goroutine 2 [force gc (idle)]:
runtime.gopark(0x56119231d068, 0x561193284410, 0x1410, 0x1)
    /usr/lib/golang/src/runtime/proc.go:301 +0xf5 fp=0xc000074fb0 sp=0xc000074f90 pc=0x561190b00915
runtime.goparkunlock(...)
    /usr/lib/golang/src/runtime/proc.go:307
runtime.forcegchelper()
    /usr/lib/golang/src/runtime/proc.go:250 +0xbb fp=0xc000074fe0 sp=0xc000074fb0 pc=0x561190b007ab
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc000074fe8 sp=0xc000074fe0 pc=0x561190b2c4a1
created by runtime.init.6
    /usr/lib/golang/src/runtime/proc.go:239 +0x37

goroutine 3 [GC sweep wait]:
runtime.gopark(0x56119231d068, 0x5611932849c0, 0x140c, 0x1)
    /usr/lib/golang/src/runtime/proc.go:301 +0xf5 fp=0xc0000757a8 sp=0xc000075788 pc=0x561190b00915
runtime.goparkunlock(...)
    /usr/lib/golang/src/runtime/proc.go:307
runtime.bgsweep(0xc00009c000)
    /usr/lib/golang/src/runtime/mgcsweep.go:89 +0x138 fp=0xc0000757d8 sp=0xc0000757a8 pc=0x561190af3cc8
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0000757e0 sp=0xc0000757d8 pc=0x561190b2c4a1
created by runtime.gcenable
    /usr/lib/golang/src/runtime/mgc.go:208 +0x5a

goroutine 4 [finalizer wait]:
runtime.gopark(0x56119231d068, 0x5611932ac868, 0x140f, 0x1)
    /usr/lib/golang/src/runtime/proc.go:301 +0xf5 fp=0xc000075f58 sp=0xc000075f38 pc=0x561190b00915
runtime.goparkunlock(...)
    /usr/lib/golang/src/runtime/proc.go:307
runtime.runfinq()
    /usr/lib/golang/src/runtime/mfinal.go:175 +0xaa fp=0xc000075fe0 sp=0xc000075f58 pc=0x561190aea80a
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc000075fe8 sp=0xc000075fe0 pc=0x561190b2c4a1
created by runtime.createfing
    /usr/lib/golang/src/runtime/mfinal.go:156 +0x63

goroutine 5 [timer goroutine (idle)]:
runtime.gopark(0x56119231d068, 0x561193293b20, 0x5611912b1414, 0x1)
    /usr/lib/golang/src/runtime/proc.go:301 +0xf5 fp=0xc000074760 sp=0xc000074740 pc=0x561190b00915
runtime.goparkunlock(...)
    /usr/lib/golang/src/runtime/proc.go:307
runtime.timerproc(0x561193293b20)
    /usr/lib/golang/src/runtime/time.go:303 +0x277 fp=0xc0000747d8 sp=0xc000074760 pc=0x561190b1d7e7
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0000747e0 sp=0xc0000747d8 pc=0x561190b2c4a1
created by runtime.(*timersBucket).addtimerLocked
    /usr/lib/golang/src/runtime/time.go:169 +0x110

goroutine 6 [syscall]:
runtime.notetsleepg(0x5611932acf60, 0xffffffffffffffff, 0x0)
    /usr/lib/golang/src/runtime/lock_futex.go:227 +0x38 fp=0xc000076798 sp=0xc000076768 pc=0x561190adcf88
os/signal.signal_recv(0x0)
    /usr/lib/golang/src/runtime/sigqueue.go:139 +0x9e fp=0xc0000767c0 sp=0xc000076798 pc=0x561190b1511e
os/signal.loop()
    /usr/lib/golang/src/os/signal/signal_unix.go:23 +0x24 fp=0xc0000767e0 sp=0xc0000767c0 pc=0x561191062e24
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0000767e8 sp=0xc0000767e0 pc=0x561190b2c4a1
created by os/signal.init.0
    /usr/lib/golang/src/os/signal/signal_unix.go:29 +0x43

goroutine 7 [GC worker (idle)]:
runtime.gopark(0x56119231cf00, 0xc000451ac0, 0x1417, 0x0)
    /usr/lib/golang/src/runtime/proc.go:301 +0xf5 fp=0xc000076f60 sp=0xc000076f40 pc=0x561190b00915
runtime.gcBgMarkWorker(0xc000054000)
    /usr/lib/golang/src/runtime/mgc.go:1836 +0x105 fp=0xc000076fd8 sp=0xc000076f60 pc=0x561190aee325
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc000076fe0 sp=0xc000076fd8 pc=0x561190b2c4a1
created by runtime.gcBgMarkStartWorkers
    /usr/lib/golang/src/runtime/mgc.go:1784 +0x79

goroutine 8 [GC worker (idle)]:
runtime.gopark(0x56119231cf00, 0xc000451ad0, 0x1417, 0x0)
    /usr/lib/golang/src/runtime/proc.go:301 +0xf5 fp=0xc000077760 sp=0xc000077740 pc=0x561190b00915
runtime.gcBgMarkWorker(0xc000056500)
    /usr/lib/golang/src/runtime/mgc.go:1836 +0x105 fp=0xc0000777d8 sp=0xc000077760 pc=0x561190aee325
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0000777e0 sp=0xc0000777d8 pc=0x561190b2c4a1
created by runtime.gcBgMarkStartWorkers
    /usr/lib/golang/src/runtime/mgc.go:1784 +0x79

goroutine 9 [GC worker (idle)]:
runtime.gopark(0x56119231cf00, 0xc000451ae0, 0x1417, 0x0)
    /usr/lib/golang/src/runtime/proc.go:301 +0xf5 fp=0xc000077f60 sp=0xc000077f40 pc=0x561190b00915
runtime.gcBgMarkWorker(0xc000058a00)
    /usr/lib/golang/src/runtime/mgc.go:1836 +0x105 fp=0xc000077fd8 sp=0xc000077f60 pc=0x561190aee325
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc000077fe0 sp=0xc000077fd8 pc=0x561190b2c4a1
created by runtime.gcBgMarkStartWorkers
    /usr/lib/golang/src/runtime/mgc.go:1784 +0x79

goroutine 10 [GC worker (idle)]:
runtime.gopark(0x56119231cf00, 0xc000451af0, 0x1417, 0x0)
    /usr/lib/golang/src/runtime/proc.go:301 +0xf5 fp=0xc000070760 sp=0xc000070740 pc=0x561190b00915
runtime.gcBgMarkWorker(0xc00005af00)
    /usr/lib/golang/src/runtime/mgc.go:1836 +0x105 fp=0xc0000707d8 sp=0xc000070760 pc=0x561190aee325
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0000707e0 sp=0xc0000707d8 pc=0x561190b2c4a1
created by runtime.gcBgMarkStartWorkers
    /usr/lib/golang/src/runtime/mgc.go:1784 +0x79

goroutine 18 [GC worker (idle)]:
runtime.gopark(0x56119231cf00, 0xc000451b00, 0x1417, 0x0)
    /usr/lib/golang/src/runtime/proc.go:301 +0xf5 fp=0xc0004f8760 sp=0xc0004f8740 pc=0x561190b00915
runtime.gcBgMarkWorker(0xc00005d400)
    /usr/lib/golang/src/runtime/mgc.go:1836 +0x105 fp=0xc0004f87d8 sp=0xc0004f8760 pc=0x561190aee325
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0004f87e0 sp=0xc0004f87d8 pc=0x561190b2c4a1
created by runtime.gcBgMarkStartWorkers
    /usr/lib/golang/src/runtime/mgc.go:1784 +0x79

goroutine 11 [GC worker (idle)]:
runtime.gopark(0x56119231cf00, 0xc000451b10, 0x1417, 0x0)
    /usr/lib/golang/src/runtime/proc.go:301 +0xf5 fp=0xc000070f60 sp=0xc000070f40 pc=0x561190b00915
runtime.gcBgMarkWorker(0xc00005f900)
    /usr/lib/golang/src/runtime/mgc.go:1836 +0x105 fp=0xc000070fd8 sp=0xc000070f60 pc=0x561190aee325
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc000070fe0 sp=0xc000070fd8 pc=0x561190b2c4a1
created by runtime.gcBgMarkStartWorkers
    /usr/lib/golang/src/runtime/mgc.go:1784 +0x79

goroutine 12 [GC worker (idle)]:
runtime.gopark(0x56119231cf00, 0xc000504000, 0x1417, 0x0)
    /usr/lib/golang/src/runtime/proc.go:301 +0xf5 fp=0xc000071760 sp=0xc000071740 pc=0x561190b00915
runtime.gcBgMarkWorker(0xc000062000)
    /usr/lib/golang/src/runtime/mgc.go:1836 +0x105 fp=0xc0000717d8 sp=0xc000071760 pc=0x561190aee325
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0000717e0 sp=0xc0000717d8 pc=0x561190b2c4a1
created by runtime.gcBgMarkStartWorkers
    /usr/lib/golang/src/runtime/mgc.go:1784 +0x79

goroutine 13 [GC worker (idle)]:
runtime.gopark(0x56119231cf00, 0xc000504010, 0x1417, 0x0)
    /usr/lib/golang/src/runtime/proc.go:301 +0xf5 fp=0xc000071f60 sp=0xc000071f40 pc=0x561190b00915
runtime.gcBgMarkWorker(0xc000064500)
    /usr/lib/golang/src/runtime/mgc.go:1836 +0x105 fp=0xc000071fd8 sp=0xc000071f60 pc=0x561190aee325
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc000071fe0 sp=0xc000071fd8 pc=0x561190b2c4a1
created by runtime.gcBgMarkStartWorkers
    /usr/lib/golang/src/runtime/mgc.go:1784 +0x79

goroutine 14 [chan receive]:
runtime.gopark(0x56119231d068, 0xc000378058, 0x170d, 0x3)
    /usr/lib/golang/src/runtime/proc.go:301 +0xf5 fp=0xc0004fa6d0 sp=0xc0004fa6b0 pc=0x561190b00915
runtime.goparkunlock(...)
    /usr/lib/golang/src/runtime/proc.go:307
runtime.chanrecv(0xc000378000, 0xc0004fa7b0, 0xc0002e2001, 0xc000378000)
    /usr/lib/golang/src/runtime/chan.go:524 +0x2ee fp=0xc0004fa760 sp=0xc0004fa6d0 pc=0x561190ad806e
runtime.chanrecv2(0xc000378000, 0xc0004fa7b0, 0x0)
    /usr/lib/golang/src/runtime/chan.go:411 +0x2b fp=0xc0004fa790 sp=0xc0004fa760 pc=0x561190ad7d6b
github.com/containers/libpod/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x561193285bc0)
    /builddir/build/BUILD/libpod-7210727e205c333af9a2d0ed0bb66adcf92a6369/_build/src/github.com/containers/libpod/vendor/github.com/golang/glog/glog.go:882 +0x8d fp=0xc0004fa7d8 sp=0xc0004fa790 pc=0x5611913f056d
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0004fa7e0 sp=0xc0004fa7d8 pc=0x561190b2c4a1
created by github.com/containers/libpod/vendor/github.com/golang/glog.init.0
    /builddir/build/BUILD/libpod-7210727e205c333af9a2d0ed0bb66adcf92a6369/_build/src/github.com/containers/libpod/vendor/github.com/golang/glog/glog.go:410 +0x274

goroutine 34 [syscall]:
runtime.notetsleepg(0x561193293c40, 0x6fc234bfe, 0x0)
    /usr/lib/golang/src/runtime/lock_futex.go:227 +0x38 fp=0xc0004f5f60 sp=0xc0004f5f30 pc=0x561190adcf88
runtime.timerproc(0x561193293c20)
    /usr/lib/golang/src/runtime/time.go:311 +0x2ee fp=0xc0004f5fd8 sp=0xc0004f5f60 pc=0x561190b1d85e
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc0004f5fe0 sp=0xc0004f5fd8 pc=0x561190b2c4a1
created by runtime.(*timersBucket).addtimerLocked
    /usr/lib/golang/src/runtime/time.go:169 +0x110
[1]    18721 abort (core dumped)  podman images

Only after deleting the bolt_state.db does podman become usable again.

$ rm /home/deefin/.local/share/containers/storage/libpod/bolt_state.db

I then have to force-remove the image:

$ buildah rmi a6beb02e9bd8
Could not remove image "a6beb02e9bd8" (must force) - container "26c70b225c7728ae16fba9e764bdf98e4633829653183e22e93b875473f917e6" is using its reference image: image is in use by a container
ERRO[0000] exit status 1                                

$ buildah rmi -f a6beb02e9bd8
a6beb02e9bd8ca0da34a034e51d28a36762fb2df29017718cfb113febb0b2600

Describe the results you expected:

Is this what's expected when a container freezes?

Is there a standard or better way to handle this?

Output of podman version:

$ podman version
Version:            1.3.1
RemoteAPI Version:  1
Go Version:         go1.12.2
OS/Arch:            linux/amd64

Output of podman info --debug:

$ podman info --debug
debug:
  compiler: gc
  git commit: ""
  go version: go1.12.2
  podman version: 1.3.1
host:
  BuildahVersion: 1.8.2
  Conmon:
    package: podman-1.3.1-1.git7210727.fc30.x86_64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 1.12.0-dev, commit: c9a4c48d1bff85033b7fc9b62d25961dd5048689'
  Distribution:
    distribution: fedora
    version: "30"
  MemFree: 3593756672
  MemTotal: 16670965760
  OCIRuntime:
    package: runc-1.0.0-93.dev.gitb9b6cc6.fc30.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc8+dev
      commit: e3b4c1108f7d1bf0d09ab612ea09927d9b59b4e3
      spec: 1.0.1-dev
  SwapFree: 8405381120
  SwapTotal: 8405381120
  arch: amd64
  cpus: 8
  hostname: deefin
  kernel: 5.1.6-300.fc30.x86_64
  os: linux
  rootless: true
  uptime: 3h 45m 17s (Approximately 0.12 days)
registries:
  blocked: null
  insecure:
  - registry.local:5000
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /home/deefin/.config/containers/storage.conf
  ContainerStore:
    number: 6
  GraphDriverName: overlay
  GraphOptions:
  - overlay.mount_program=/usr/bin/fuse-overlayfs
  GraphRoot: /home/deefin/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 2
  RunRoot: /tmp/1000
  VolumePath: /home/deefin/.local/share/containers/storage/volumes

Additional environment details (AWS, VirtualBox, physical, etc.):

$ cat /etc/*-release
Fedora release 30 (Thirty)
NAME=Fedora
VERSION="30 (Workstation Edition)"
ID=fedora
VERSION_ID=30
VERSION_CODENAME=""
PLATFORM_ID="platform:f30"
PRETTY_NAME="Fedora 30 (Workstation Edition)"
ANSI_COLOR="0;34"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:30"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f30/system-administrators-guide/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=30
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=30
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Workstation Edition"
VARIANT_ID=workstation
Fedora release 30 (Thirty)
Fedora release 30 (Thirty)
Labels: kind/bug, rootless


All 16 comments

@mheon PTAL

To update: I can only reproduce this behaviour when using go get with a go.mod file. Outside the project directory, I can go get github.com/go-sql-driver/mysql without issues...

[myuser@app-c1 app]$ go version
go version go1.12.5 linux/amd64

There seem to be two issues here.

The first is a stalled go get in a container, which could be a bug.

The second is kill -9 leaving artifacts around. That one's a lot easier to answer. We recommend that you never manually SIGKILL a container. If you have a nonresponsive container, podman kill and podman stop should be used instead; when the container exits, the Podman process will detect this and exit. If you're hitting Podman itself with a -9, you're also not killing the container, just the frontend; the container is daemonized with Conmon monitoring it.

That said, it seems like your container has managed to survive SIGKILL from podman stop and is, as near as Podman can tell, still running. Since that shouldn't be possible under normal circumstances, it's probably zombified, stuck in uninterruptible IO sleep. That's definitely not good, but I don't think it's Podman's fault - more likely whatever is running in the container.

The segfault on podman images looks like a real, serious bug. I'll poke around.

(Also, if you podman stop a container and it's still running, kill -9 isn't going to help - we've already done that as part of podman stop)
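
For context, this is roughly the escalation podman stop performs (a sketch only; app-c is the container from this report, and 10 seconds is podman's default stop timeout):

$ podman stop -t 10 app-c            # SIGTERM, wait up to 10s, then SIGKILL
# Approximately the same escalation done by hand:
$ podman kill --signal TERM app-c
$ sleep 10
$ podman kill --signal KILL app-c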

The images segfault is probably a nil pointer in the image runtime's store - which can potentially happen if we're running rootless, but probably shouldn't.

Hm. @giuseppe he's manually killing conmon - aren't we using conmon to hold open the namespaces for rootless? What happens if that gets a SIGKILL?

Hm. @giuseppe he's manually killing conmon - aren't we using conmon to hold open the namespaces for rootless? What happens if that gets a SIGKILL?

that should not be an issue anymore with the pause process.

The issue seems related to fuse-overlayfs, could you try the last version from bodhi? It solves exactly an issue where flock(2) would hang the fuse-overlayfs process and the container: https://bodhi.fedoraproject.org/updates/FEDORA-2019-fff1ded16e
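
One way to pull that build before it reaches the stable repos (assuming Fedora 30 and that the update is still in updates-testing; adjust if the update has since gone stable):

$ sudo dnf upgrade --enablerepo=updates-testing fuse-overlayfs
$ rpm -q fuse-overlayfs              # confirm the new build is installed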

Thanks for the response @mheon

(Also, if you podman stop a container and it's still running, kill -9 isn't going to help - we've already done that as part of podman stop)

I disagree; podman kill fails to kill the running container, and this is evident when issuing podman ps.

Thus far the only solution I have had is to kill -9 from my host user. I'm more than happy to keep debugging the go get side of things. I'll have a look through the podman code and see if I can get it to spew anywhere... If you have any recommendations on where to start, please feel free to advise :)

images segfault probably a nil pointer in the image runtime's store - which can potentially happen if we're running rootless, but probably shouldn't.

I can 100% confirm this implementation has been rootless.

Hm. @giuseppe he's manually killing conmon - aren't we using conmon to hold open the namespaces for rootless? What happens if that gets a SIGKILL?

After replicating, I can see no change in namespaces pre/post killing conmon. I do get booted from my podman unshare namespace when I kill with -9.

$ lsns | grep 'fuse\|bash'
4026532674 user       12 17308 deefin /usr/bin/fuse-overlayfs -o lowerdir=/home/deefin/.local/share/containers/storage/overlay/l/UL73MVPSNKKGHH4JPF75PQVPK7:/home/deefin/.local/share/containers/storage/overlay/l/KQ4JDPWGAIXVOTFKW67CSRUL6F,upperdir=/home/deefin/.local/share/containers/storage/overlay/99def45a2bab9ae00900acf0ad7bf2022a6a310be2795622c5d74d29979a2165/diff,workdir=/home/deefin/.local/share/containers/storage/overlay/99def45a2bab9ae00900acf0ad7bf2022a6a310be2795622c5d74d29979a2165/work,context="system_u:object_r:container_file_t:s0:c374,c444" /home/deefin/.local/share/containers/storage/overlay/99def45a2bab9ae00900acf0ad7bf2022a6a310be2795622c5d74d29979a2165/merged
4026532685 mnt         2 17308 deefin /usr/bin/fuse-overlayfs -o lowerdir=/home/deefin/.local/share/containers/storage/overlay/l/UL73MVPSNKKGHH4JPF75PQVPK7:/home/deefin/.local/share/containers/storage/overlay/l/KQ4JDPWGAIXVOTFKW67CSRUL6F,upperdir=/home/deefin/.local/share/containers/storage/overlay/99def45a2bab9ae00900acf0ad7bf2022a6a310be2795622c5d74d29979a2165/diff,workdir=/home/deefin/.local/share/containers/storage/overlay/99def45a2bab9ae00900acf0ad7bf2022a6a310be2795622c5d74d29979a2165/work,context="system_u:object_r:container_file_t:s0:c374,c444" /home/deefin/.local/share/containers/storage/overlay/99def45a2bab9ae00900acf0ad7bf2022a6a310be2795622c5d74d29979a2165/merged
4026532686 mnt        10 17322 100999 /bin/bash
4026532687 uts        10 17322 100999 /bin/bash
4026532688 ipc        10 17322 100999 /bin/bash
4026532689 pid        10 17322 100999 /bin/bash
4026532691 net        10 17322 100999 /bin/bash

In reference to my earlier output below:

$ podman stats app-c
Error: unable to load cgroup at /libpod_parent/libpod-26c70b225c7728ae16fba9e764bdf98e4633829653183e22e93b875473f917e6: cgroups: cgroup deleted

Might this be related? I see it when debugging podman run:

WARN[0000] Failed to add conmon to cgroupfs sandbox cgroup: mkdir /sys/fs/cgroup/systemd/libpod_parent: permission denied

Hm. @giuseppe he's manually killing conmon - aren't we using conmon to hold open the namespaces for rootless? What happens if that gets a SIGKILL?

that should not be an issue anymore with the pause process.

The issue seems related to fuse-overlayfs, could you try the last version from bodhi? It solves exactly an issue where flock(2) would hang the fuse-overlayfs process and the container: https://bodhi.fedoraproject.org/updates/FEDORA-2019-fff1ded16e

this resolved the go get issue :)

Is that WARN anything to worry about then?

Thanks for the response @mheon

(Also, if you podman stop a container and it's still running, kill -9 isn't going to help - we've already done that as part of podman stop)

I disagree; podman kill fails to kill the running container, and this is evident when issuing podman ps.

Thus far the only solution I have had is to kill -9 from my host user. I'm more than happy to keep debugging the go get side of things. I'll have a look through the podman code and see if I can get it to spew anywhere... If you have any recommendations on where to start, please feel free to advise :)

Again, that won't help - we've tried that already as part of podman stop. You're killing Podman's command line process and conmon, our container monitor process, but the container itself is still running - if you grep through ps I bet you'll find a go get still running, zombified.
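
A quick way to spot such a process (a sketch: a zombie shows Z in the STAT column, and a process stuck in uninterruptible IO sleep shows D):

$ ps -eo pid,stat,wchan:30,cmd | awk '$2 ~ /^[DZ]/'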

You can kill the Podman frontend without consequence - it's not actually frozen, just attached to a frozen container. You should be able to detach from it (default keys Control-p Control-q) or hit it with a SIGINT and it will close.
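
For reference, the detach sequence can also be overridden per invocation; the key combination below is only an example, and app-c is the container from this report:

$ podman attach --detach-keys ctrl-x,ctrl-y app-c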

Killing conmon is more problematic. It's monitoring the container, and we use it to determine what the container's status is. The container itself is frozen here, so conmon is sitting there doing nothing, waiting for it to exit. I'd strongly recommend that you leave conmon around in cases like this - it'll keep Podman running smoothly, though you'll probably need a reboot to get rid of the zombie processes.

@mheon Thanks for the reply.

I was never able to drop out of the container with SIGINT (ctrl-c); I always had to open a new terminal and kill it that way.

Killing conmon is more problematic. It's monitoring the container, and we use it to determine what the container's status is. The container itself is frozen here, so conmon is sitting there doing nothing, waiting for it to exit. I'd strongly recommend that you leave conmon around in cases like this - it'll keep Podman running smoothly, though you'll probably need a reboot to get rid of the zombie processes.

So is there a recommendation on how to handle the following conditions:

  • frozen container state
  • unable to exit container with SIGINT
  • Unable to kill/stop with podman X

Seems like the only thing to do (besides rebooting) is to kill all the podman processes and leave conmon alone?

Is that WARN anything to worry about then?

no, that is expected as rootless containers cannot (yet) use cgroups.
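
A quick check of which cgroup version the host is using (Fedora 30 defaults to cgroup v1, where rootless cgroups are unavailable and the warning above is expected):

$ stat -fc %T /sys/fs/cgroup/
# "tmpfs" means cgroup v1; "cgroup2fs" means cgroup v2, where rootless
# cgroup support becomes possible.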

This is a fuse-overlayfs issue. I think we can close it, as it has already been addressed and new packages are on their way.

Please update and test against the latest fuse-overlayfs.

Is that WARN anything to worry about then?

no, that is expected as rootless containers cannot (yet) use cgroups.

This is a fuse-overlayfs issue. I think we can close it, as it has already been addressed and new packages are on their way.

Yes, the latest fuse-overlayfs pkg resolved the go get hang.

Still unsure how to handle that container state, or is handling this state no longer relevant given the resolution?

Ideally this happens only very rarely; processes should not often get themselves stuck such that SIGKILL doesn't get rid of them. In those cases, I think that manually removing Podman processes and leaving conmon around is the right call.
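
A minimal sketch of that advice, assuming the CLI binary is named podman and conmon is the separate /usr/libexec/podman/conmon binary seen earlier (pkill -x matches the exact process name, so conmon is left untouched):

$ pkill -9 -x podman                 # kill only the podman CLI processes
$ pgrep -a conmon                    # verify conmon is still monitoring the container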


Thank you all @mheon @rhatdan @giuseppe for your time. Much appreciated.
