Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
The Podman healthcheck command never changes the status from the 'starting' state.
$ podman inspect rqlite-5.4.0 | jq '.[0]["State"]["Healthcheck"]'
{
"Status": "starting",
"FailingStreak": 0,
"Log": null
}
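One way to watch for a transition out of 'starting' is to keep polling the same jq query; a sketch (the 5-second interval is arbitrary):
$ watch -n 5 "podman inspect rqlite-5.4.0 | jq '.[0].State.Healthcheck.Status'"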
Steps to reproduce the issue:
podman pull rqlite/rqlite:5.4.0
podman create --name rqlite-5.4.0 rqlite/rqlite:5.4.0
podman generate systemd --files --new --name --restart-policy=always rqlite-5.4.0
Edit the generated unit file's ExecStart to:
ExecStart=/usr/bin/podman run \
--conmon-pidfile %t/container-rqlite-5.4.0.pid \
--cidfile %t/container-rqlite-5.4.0.ctr-id \
--cgroups=no-conmon \
-d \
--replace \
--publish 4001:4001 \
--publish 4002:4002 \
--healthcheck-command 'CMD-SHELL curl http://localhost:4001 && curl http://localhost:4002 && echo "Okay" && exit 0 || exit 1' \
--healthcheck-start-period 5s \
--healthcheck-retries 5 \
--name rqlite-5.4.0 \
rqlite/rqlite:5.4.0
cp container-rqlite-5.4.0.service $HOME/.config/systemd/user/container-rqlite-5.4.0.service
systemctl --user enable container-rqlite-5.4.0
systemctl --user start container-rqlite-5.4.0
systemctl --user status container-rqlite-5.4.0
Describe the results you received:
$ podman inspect rqlite-5.4.0 | jq '.[0]["State"]["Healthcheck"]'
{
"Status": "starting",
"FailingStreak": 0,
"Log": null
}
Describe the results you expected:
The healthcheck status eventually transitions from 'starting' to 'healthy' (or to 'unhealthy' once the failing streak exceeds the retry limit).
Additional information you deem important (e.g. issue happens only occasionally):
Reproducible.
Output of podman version:
$ podman version
Version: 2.0.4
API Version: 1
Go Version: go1.14.4
Built: Thu Jan 1 10:00:00 1970
OS/Arch: linux/amd64
Output of podman info --debug:
host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.18, commit: '
  cpus: 2
  distribution:
    distribution: ubuntu
    version: "18.04"
  eventLogger: file
  hostname: desktop.local.lan
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.4.0-42-generic
  linkmode: dynamic
  memFree: 258641920
  memTotal: 16652058624
  ociRuntime:
    name: runc
    package: 'containerd.io: /usr/bin/runc'
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10
      commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
      spec: 1.0.1-dev
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 0.4.3
      commit: unknown
  swapFree: 4149723136
  swapTotal: 4156551168
  uptime: 42h 46m 44.24s (Approximately 1.75 days)
registries:
  search:
  - registry.access.redhat.com
  - docker.io
store:
  configFile: /home/<redacted>/.config/containers/storage.conf
  containerStore:
    number: 15
    paused: 0
    running: 1
    stopped: 14
  graphDriverName: vfs
  graphOptions: {}
  graphRoot: /home/<redacted>/.local/share/containers/storage
  graphStatus: {}
  imageStore:
    number: 77
  runRoot: /run/user/1000/containers
  volumePath: /home/<redacted>/.local/share/containers/storage/volumes
version:
  APIVersion: 1
  Built: 0
  BuiltTime: Thu Jan 1 10:00:00 1970
  GitCommit: ""
  GoVersion: go1.14.4
  OsArch: linux/amd64
  Version: 2.0.4
Package info (e.g. output of rpm -q podman or apt list podman):
$ apt list podman
Listing... Done
podman/unknown,now 2.0.4~1 amd64 [installed]
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):
Physical
Does systemctl --user status <full ctr id>.service work and show healthy in the log?
As requested, I believe this shows the service is started without error:
$ systemctl --user status container-rqlite-5.4.0.service
* container-rqlite-5.4.0.service - Podman container-rqlite-5.4.0.service
Loaded: loaded (/home/<redacted>/.config/systemd/user/container-rqlite-5.4.0.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2020-08-31 14:36:34 AEST; 15s ago
Docs: man:podman-generate-systemd(1)
Process: 8016 ExecStopPost=/usr/bin/podman rm --ignore -f --cidfile /run/user/1000/container-rqlite-5.4.0.ctr-id (code=exited, status=0/SUCCESS)
Process: 8056 ExecStart=/usr/bin/podman run --conmon-pidfile /run/user/1000/container-rqlite-5.4.0.pid --cidfile /run/user/1000/container-rqlite-5.4.0.ctr-id --cgroups=no-conmon --detach --replace --publish 4001:4001 --publish 4002:4002 --healthcheck-command CMD-SHELL curl http://localhost:4001 && curl http://localhost:4002 && echo "Healthy" && exit 0 || exit 1 --healthcheck-start-period 5s --healthcheck-retries 5 --name rqlite-5.4.0 rqlite/rqlite:5.4.0 (code=exited, status=0/SUCCESS)
Process: 8055 ExecStartPre=/bin/rm -f /run/user/1000/container-rqlite-5.4.0.pid /run/user/1000/container-rqlite-5.4.0.ctr-id (code=exited, status=0/SUCCESS)
Main PID: 8250 (conmon)
CGroup: /user.slice/user-1000.slice/[email protected]/container-rqlite-5.4.0.service
|-8213 /usr/bin/slirp4netns --disable-host-loopback --mtu 65520 --enable-sandbox --enable-seccomp -c -e 3 -r 4 --netns-type=path /run/user/1000/netns/cni-b8e8efbd-1387-3d19-c79c-582a0af8794d tap0
|-8215 containers-rootlessport
|-8223 containers-rootlessport-child
|-8250 /usr/libexec/podman/conmon --api-version 1 -c 612c66a9b900d4736d03647a7328a009f20d4695355488f81295d9bb8a2c4e85 -u 612c66a9b900d4736d03647a7328a009f20d4695355488f81295d9bb8a2c4e85 -r /usr/bin/runc -b /home/<redacted>/.local/share/containers/storage/vfs-containers/612c66a9b900d4736d03647a7328a009f20d4695355488f81295d9bb8a2c4e85/userdata -p /run/user/1000/containers/vfs-containers/612c66a9b900d4736d03647a7328a009f20d4695355488f81295d9bb8a2c4e85/userdata/pidfile -n rqlite-5.4.0 --exit-dir /run/user/1000/libpod/tmp/exits --socket-dir-path /run/user/1000/libpod/tmp/socket -l k8s-file:/home/<redacted>/.local/share/containers/storage/vfs-containers/612c66a9b900d4736d03647a7328a009f20d4695355488f81295d9bb8a2c4e85/userdata/ctr.log --log-level error --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/user/1000/containers/vfs-containers/612c66a9b900d4736d03647a7328a009f20d4695355488f81295d9bb8a2c4e85/userdata/oci-log --conmon-pidfile /run/user/1000/container-rqlite-5.4.0.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/<redacted>/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1000/containers --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir --exit-command-arg /run/user/1000/libpod/tmp --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg vfs --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg 612c66a9b900d4736d03647a7328a009f20d4695355488f81295d9bb8a2c4e85
`-8267 rqlited -http-addr 0.0.0.0:4001 -raft-addr 0.0.0.0:4002 /rqlite/file/data
Aug 31 14:36:30 desktop.local.lan systemd[1569]: Starting Podman container-rqlite-5.4.0.service...
Aug 31 14:36:34 desktop.local.lan podman[8056]: time="2020-08-31T14:36:34+10:00" level=error msg="exit status 1"
Aug 31 14:36:34 desktop.local.lan podman[8056]: time="2020-08-31T14:36:34+10:00" level=error msg="Unit 612c66a9b900d4736d03647a7328a009f20d4695355488f81295d9bb8a2c4e85.service not found."
Aug 31 14:36:34 desktop.local.lan podman[8056]: 612c66a9b900d4736d03647a7328a009f20d4695355488f81295d9bb8a2c4e85
Aug 31 14:36:34 desktop.local.lan systemd[1569]: Started Podman container-rqlite-5.4.0.service.
On the host I am able to check that ports 4001 (and 4002) respond:
$ curl localhost:4001
$ echo $?
0
I am not sure what to make of the two systemctl --user status ... entries:
Aug 31 14:36:34 desktop.local.lan podman[8056]: time="2020-08-31T14:36:34+10:00" level=error msg="exit status 1"
Aug 31 14:36:34 desktop.local.lan podman[8056]: time="2020-08-31T14:36:34+10:00" level=error msg="Unit 612c66a9b900d4736d03647a7328a009f20d4695355488f81295d9bb8a2c4e85.service not found."
No, I really mean systemctl --user status <full ctr id>.service, because podman creates a transient .service and .timer unit with this name to run the healthcheck. I get output like this:
$ systemctl --user status eca453dab7b219b3f43cf726da4e12f08f66e08c72e9d65b9c8f6f1d4bd56d20
● eca453dab7b219b3f43cf726da4e12f08f66e08c72e9d65b9c8f6f1d4bd56d20.service - /home/paul/go/src/github.com/containers/libpod/bin/podman healthcheck run eca453dab7b219b3f43cf726da4e12f08f66e08>
Loaded: loaded (/run/user/1000/systemd/transient/eca453dab7b219b3f43cf726da4e12f08f66e08c72e9d65b9c8f6f1d4bd56d20.service; transient)
Transient: yes
Active: failed (Result: exit-code) since Mon 2020-08-31 13:11:14 CEST; 9s ago
TriggeredBy: ● eca453dab7b219b3f43cf726da4e12f08f66e08c72e9d65b9c8f6f1d4bd56d20.timer
Process: 12280 ExecStart=/home/paul/go/src/github.com/containers/libpod/bin/podman healthcheck run eca453dab7b219b3f43cf726da4e12f08f66e08c72e9d65b9c8f6f1d4bd56d20 (code=exited, status=1>
Main PID: 12280 (code=exited, status=1/FAILURE)
CPU: 88ms
Aug 31 13:11:14 paul-pc systemd[1265]: Started /home/paul/go/src/github.com/containers/libpod/bin/podman healthcheck run eca453dab7b219b3f43cf726da4e12f08f66e08c72e9d65b9c8f6f1d4bd56d20.
Aug 31 13:11:14 paul-pc podman[12280]: 2020-08-31 13:11:14.407376131 +0200 CEST m=+0.062986276 container exec eca453dab7b219b3f43cf726da4e12f08f66e08c72e9d65b9c8f6f1d4bd56d20 (image=docker.i>
Aug 31 13:11:14 paul-pc podman[12280]: unhealthy
Aug 31 13:11:14 paul-pc systemd[1265]: eca453dab7b219b3f43cf726da4e12f08f66e08c72e9d65b9c8f6f1d4bd56d20.service: Main process exited, code=exited, status=1/FAILURE
Aug 31 13:11:14 paul-pc systemd[1265]: eca453dab7b219b3f43cf726da4e12f08f66e08c72e9d65b9c8f6f1d4bd56d20.service: Failed with result 'exit-code'.
$ podman inspect eca453dab7b219b3f43cf726da4e12f08f66e08c72e9d65b9c8f6f1d4bd56d20 | jq '.[0]["State"]["Healthcheck"]'
{
"Status": "unhealthy",
"FailingStreak": 8,
"Log": [
{
"Start": "2020-08-31T13:09:10.382942105+02:00",
"End": "2020-08-31T13:09:10.481246474+02:00",
"ExitCode": 1,
"Output": ""
},
{
"Start": "2020-08-31T13:09:41.382298249+02:00",
"End": "2020-08-31T13:09:41.488733922+02:00",
"ExitCode": 1,
"Output": ""
},
{
"Start": "2020-08-31T13:10:12.385250849+02:00",
"End": "2020-08-31T13:10:12.461637263+02:00",
"ExitCode": 1,
"Output": ""
},
{
"Start": "2020-08-31T13:10:43.386354714+02:00",
"End": "2020-08-31T13:10:43.492748757+02:00",
"ExitCode": 1,
"Output": ""
},
{
"Start": "2020-08-31T13:11:14.382488726+02:00",
"End": "2020-08-31T13:11:14.451178764+02:00",
"ExitCode": 1,
"Output": ""
}
]
}
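For reference, the transient units can be enumerated directly; a sketch assuming the container from this report ($ctr is just shorthand for the full container ID):
$ ctr=$(podman inspect rqlite-5.4.0 --format '{{.Id}}')
$ systemctl --user list-timers "$ctr.timer"
$ systemctl --user cat "$ctr.service"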
I am not sure what to make of the two systemctl --user status ... entries:
Aug 31 14:36:34 desktop.local.lan podman[8056]: time="2020-08-31T14:36:34+10:00" level=error msg="exit status 1"
Aug 31 14:36:34 desktop.local.lan podman[8056]: time="2020-08-31T14:36:34+10:00" level=error msg="Unit 612c66a9b900d4736d03647a7328a009f20d4695355488f81295d9bb8a2c4e85.service not found."
The error seems to indicate that this special unit did not get created by Podman in your case.
@baude PTAL
Outside the context of the systemd business, it seemed to work perfectly. I'm going to ask @vrothberg to take a peek at this to see if anything systemd-related is interfering.
@Luap99 correct, while the container is running ...
$ podman container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
612c66a9b900 docker.io/rqlite/rqlite:5.4.0 rqlited -http-add... 14 hours ago Up 14 hours ago 0.0.0.0:4001-4002->4001-4002/tcp rqlite-5.4.0
... the intermediate service is not created.
$ systemctl --user status 612c66a9b900d4736d03647a7328a009f20d4695355488f81295d9bb8a2c4e85.service
Unit 612c66a9b900d4736d03647a7328a009f20d4695355488f81295d9bb8a2c4e85.service could not be found.
I cannot reproduce the issue. The .service and .timer are always created on my F32 workstation with the latest Podman.
@bbros-dev, can you try running the container manually and see if healthchecks work outside of a systemd unit?
@baude, if I use CMD-SHELL ls / the run fails, but it succeeds without the CMD-SHELL prefix. I don't see this documented. What's the purpose of it?
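For background: in the Docker/OCI healthcheck config, a test of the form ["CMD-SHELL", "<string>"] is run via /bin/sh -c inside the container (so && and || work), whereas ["CMD", "arg", ...] execs the argv directly with no shell involved. A sketch for checking which form a container ended up with, reusing the container name from above:
$ podman inspect rqlite-5.4.0 --format '{{.Config.Healthcheck.Test}}'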
@vrothberg
First
$ systemctl --user stop container-rqlite-5.4.0.service
* container-rqlite-5.4.0.service - Podman container-rqlite-5.4.0.service
Loaded: loaded (/home/<redacted>/.config/systemd/user/container-rqlite-5.4.0.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2020-09-08 10:45:42 AEST; 8min ago
Docs: man:podman-generate-systemd(1)
Process: 2811 ExecStopPost=/usr/bin/podman rm --ignore -f --cidfile /run/user/1000/container-rqlite-5.4.0.ctr-id (code=exited, status=0/SUCCESS)
Process: 2744 ExecStop=/usr/bin/podman stop --ignore --cidfile /run/user/1000/container-rqlite-5.4.0.ctr-id -t 10 (code=exited, status=0/SUCCESS)
Process: 26251 ExecStart=/usr/bin/podman run --conmon-pidfile /run/user/1000/container-rqlite-5.4.0.pid --cidfile /run/user/1000/container-rqlite-5.4.0.ctr-id --cgroups=no-conmon --detach --replace --publish 4001:4001 --publish 4002:4002 --healthcheck-command CMD-SHELL curl http://localhost:4001 && curl http://localhost:4002 && echo "Healthy" && exit 0 || exit 1 --healthcheck-start-period 5s --healthcheck-retries 5 --name rqlite-5.4.0 rqlite/rqlite:5.4.0 (code=exited, status=0/SUCCESS)
Process: 26242 ExecStartPre=/bin/rm -f /run/user/1000/container-rqlite-5.4.0.pid /run/user/1000/container-rqlite-5.4.0.ctr-id (code=exited, status=0/SUCCESS)
Main PID: 26496 (code=exited, status=2)
CGroup: /user.slice/user-1000.slice/[email protected]/container-rqlite-5.4.0.service
`-2572 /usr/bin/podman
Sep 08 10:41:39 desktop.local.lan podman[26251]: time="2020-09-08T10:41:39+10:00" level=error msg="exit status 1"
Sep 08 10:41:39 desktop.local.lan podman[26251]: time="2020-09-08T10:41:39+10:00" level=error msg="Unit 9b592768bcca63ccf736bc66aedded6ea8ba543c7c798decb099fc83aa447d37.service not found."
Sep 08 10:41:39 desktop.local.lan podman[26251]: 9b592768bcca63ccf736bc66aedded6ea8ba543c7c798decb099fc83aa447d37
Sep 08 10:41:39 desktop.local.lan systemd[2126]: Started Podman container-rqlite-5.4.0.service.
Sep 08 10:45:40 desktop.local.lan systemd[2126]: Stopping Podman container-rqlite-5.4.0.service...
Sep 08 10:45:41 desktop.local.lan podman[2744]: 9b592768bcca63ccf736bc66aedded6ea8ba543c7c798decb099fc83aa447d37
Sep 08 10:45:41 desktop.local.lan systemd[2126]: container-rqlite-5.4.0.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 08 10:45:42 desktop.local.lan podman[2811]: 9b592768bcca63ccf736bc66aedded6ea8ba543c7c798decb099fc83aa447d37
Sep 08 10:45:42 desktop.local.lan systemd[2126]: container-rqlite-5.4.0.service: Failed with result 'exit-code'.
Sep 08 10:45:42 desktop.local.lan systemd[2126]: Stopped Podman container-rqlite-5.4.0.service.
then
$ /usr/bin/podman run --conmon-pidfile /run/user/1000/container-rqlite-5.4.0.pid --cidfile /run/user/1000/container-rqlite-5.4.0.ctr-id --cgroups=no-conmon --detach --replace --publish 4001:4001 --publish 4002:4002 --healthcheck-command CMD-SHELL curl http://localhost:4001 && curl http://localhost:4002 && echo "Healthy" && exit 0 || exit 1 --healthcheck-start-period 5s --healthcheck-retries 5 --name rqlite-5.4.0 qlite/rqlite:5.4.0
Error: container id file exists. Ensure another container is not using it or delete /run/user/1000/container-rqlite-5.4.0.ctr-id
exit
bash: exit: too many arguments
then
$ cat /run/user/1000/container-rqlite-5.4.0.ctr-id
9b592768bcca63ccf736bc66aedded6ea8ba543c7c798decb099fc83aa447d37
I won't clean this up in case you require some additional data from the current state.
Standing-by.
Thanks for checking, @bbros-dev!
The Error: container id file exists. [...] forces us to remove the specified id file before we can run the container. Could you remove the file(s) and try again? Once the container is up and running, try running healthchecks again and please also check if the transient systemd timer and service exist (e.g., via systemctl --user status $containerID.{service,timer}).
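A minimal cleanup sketch along those lines, reusing the paths from the generated unit (run while the service is stopped):
$ rm -f /run/user/1000/container-rqlite-5.4.0.pid /run/user/1000/container-rqlite-5.4.0.ctr-id
$ podman rm --ignore -f rqlite-5.4.0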
Possibly more informative error:
$ /usr/bin/podman run --conmon-pidfile /run/user/1000/container-rqlite-5.4.0.pid --cidfile /run/user/1000/container-rqlite-5.4.0.ctr-id --cgroups=no-conmon --detach --replace --publish 4001:4001 --publish 4002:4002 --healthcheck-command CMD-SHELL 'curl http://localhost:4001 && curl http://localhost:4002 && echo "Healthy" && exit 0 || exit 1' --healthcheck-start-period 5s --healthcheck-retries 5 --name rqlite-5.4.0 rqlite/rqlite:5.4.0
Error: invalid reference format
please also check if the transient systemd timer and service exist (e.g., via systemctl --user status $containerID.{service,timer}).
The file /run/user/1000/container-rqlite-5.4.0.ctr-id exists but is empty, so there are no containerID unit files to look up.
Possibly more informative error:
$ /usr/bin/podman run --conmon-pidfile /run/user/1000/container-rqlite-5.4.0.pid --cidfile /run/user/1000/container-rqlite-5.4.0.ctr-id --cgroups=no-conmon --detach --replace --publish 4001:4001 --publish 4002:4002 --healthcheck-command CMD-SHELL 'curl http://localhost:4001 && curl http://localhost:4002 && echo "Healthy" && exit 0 || exit 1' --healthcheck-start-period 5s --healthcheck-retries 5 --name rqlite-5.4.0 rqlite/rqlite:5.4.0
Error: invalid reference format
^ this is missing quotes around the --healthcheck-command argument. Adding quotes around it works for me:
/usr/bin/podman run --conmon-pidfile /run/user/1000/container-rqlite-5.4.0.pid --cidfile /run/user/1000/container-rqlite-5.4.0.ctr-id --cgroups=no-conmon --detach --replace --publish 4001:4001 --publish 4002:4002 --healthcheck-command 'CMD-SHELL curl http://localhost:4001 && curl http://localhost:4002 && echo "Healthy" && exit 0 || exit 1' --healthcheck-start-period 5s --healthcheck-retries 5 --name rqlite-5.4.0 rqlite/rqlite:5.4.0
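To confirm the corrected quoting took effect, one can also trigger a single run by hand; exit status 0 means the check passed:
$ podman healthcheck run rqlite-5.4.0 && echo healthy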
@bbros-dev, does it work with the corrected quoting?
Apologies. This will take a few days to get back to.
Apologies @vrothberg, that command was missing quotes - not sure why that was.
But if I go back to the original command, which has the quotes in the correct place, I still see the error:
$ /usr/bin/podman run --conmon-pidfile ./container-rqlite-5.4.0.pid --cidfile ./container-rqlite-5.4.0.ctr-id --cgroups=no-conmon -d --replace --publish 4001:4001 --publish 4002:4002 --healthcheck-command 'CMD-SHELL curl http://localhost:4001 && curl http://localhost:4002 && echo "Okay" && exit 0 || exit 1' --healthcheck-start-period 5s --healthcheck-retries 5 --name rqlite-5.4.0 rqlite/rqlite:5.4.0
Trying to pull docker.io/rqlite/rqlite:5.4.0...
Getting image source signatures
Copying blob 7e6591854262 done
Copying blob 9c461696bc09 done
Copying blob 45085432511a done
Copying blob 089d60cb4e0a done
Copying blob 54aee0b95676 done
Copying blob 9697ac90a2b5 done
Copying blob ae590f327014 done
Copying config 452a727bb4 done
Writing manifest to image destination
Storing signatures
ERRO[0041] exit status 1
ERRO[0041] Unit 8bde150132b0773c3360763ff63636a1d371ff4e28f4eccaf4adc4a8702ef4e0.service not found.
8bde150132b0773c3360763ff63636a1d371ff4e28f4eccaf4adc4a8702ef4e0
Thanks for coming back! I'll set up an Ubuntu VM and see if I can reproduce there.
I can finally reproduce on Ubuntu 18.04.
Systemd does not like the user namespace:
$ podman unshare
root@ubuntu:~/podman# strace -s1000 -e trace=%network systemctl --user
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
getsockopt(3, SOL_SOCKET, SO_RCVBUF, [212992], [4]) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted)
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [8388608], 4) = 0
getsockopt(3, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = -1 EPERM (Operation not permitted)
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [8388608], 4) = 0
connect(3, {sa_family=AF_UNIX, sun_path="/run/user/1000/systemd/private"}, 32) = 0
getsockopt(3, SOL_SOCKET, SO_PEERCRED, {pid=1475, uid=0, gid=0}, [12]) = 0
getsockopt(3, SOL_SOCKET, SO_PEERSEC, "unconfined", [64->10]) = 0
getsockopt(3, SOL_SOCKET, SO_PEERGROUPS, "\376\377\0\0\376\377\0\0\376\377\0\0\376\377\0\0\376\377\0\0\376\377\0\0\0\0\0\0", [256->28]) = 0
getsockopt(3, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0
getsockname(3, {sa_family=AF_UNIX}, [128->2]) = 0
sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0AUTH EXTERNAL ", iov_len=15}, {iov_base="30", iov_len=2}, {iov_base="\r\nNEGOTIATE_UNIX_FD\r\nBEGIN\r\n", iov_len=28}], msg_iovlen=3, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 45
getsockopt(3, SOL_SOCKET, SO_PEERCRED, {pid=1475, uid=0, gid=0}, [12]) = 0
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="REJECTED\r\nERROR\r\nERROR\r\n", iov_len=256}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
Failed to list units: Access denied
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=29541, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
+++ exited with 1 +++
@giuseppe, you may know what's going on? :)
The systemd version here is 237.
Doing a better search helped. It's a known bug in systemd (see https://bugzilla.redhat.com/show_bug.cgi?id=1838081): it rejects cross-uid-namespace connections.
Unfortunately, there's nothing Podman can do. I suggest opening a bug against Ubuntu. Note that it works as root.
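The distinction can be checked quickly; a sketch contrasting the two cases (the rejected call matches the strace output above):
$ podman unshare systemctl --user
Failed to list units: Access denied
$ systemctl --user
(the same call succeeds outside the user namespace, and as root)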
Thanks again for opening the issue and for your help debugging it!
@baude @rhatdan FYI
Nice work @vrothberg