podman 🚀 - After podman 2 upgrade, systemd fails to start in containers on cgroups v1 hosts

can you make the image in question available ?

baude on 23 Jun 2020

No.

markstos on 23 Jun 2020

@baude Any idea why ports could be assigned to a second "pause" container instead of the intended one?

markstos on 23 Jun 2020

How is the pod created? Can you provide the command that was used to launch the pod?

mheon on 23 Jun 2020

Also, podman inspect output for both pod and container would be appreciated.

mheon on 23 Jun 2020

@markstos when using pods, all of the ports are assigned to the infra container. That is normal. Then each subsequent container in the pod joins the infra containers namespace. That is one of our definitions of a pod. As @mheon asked, can you provide the pod command used?

baude on 23 Jun 2020

I used a docker-compose.yml file like this:

version: "3.8"
services:
  devenv:
    image: devenv-img
    build:
      context: ./docker/ubuntu-18.04
      args:
        GITHUB_USERS: "markstos"
    container_name: devenv
    security_opt:
      - seccomp:unconfined
         # Expose port 2222 so you can ssh -p 2222 root@localhost 
    ports:
      - "127.0.0.1:2222:22"
      - "127.0.0.1:3000:3000"
    tmpfs:
      - /tmp
      - /run
      - /run/lock
    volumes:
      - "/sys/fs/cgroup:/sys/fs/cgroup:ro"
      - "./:/home/amigo/unity"

podman-compose was used, but had to be patched first:
https://github.com/containers/podman-compose/pull/200/commits/af832769a78fa906c34fff9960b938ef6453f63e

podman-compose up -d
using podman version: podman version 2.0.0
podman pod create --name=unity --share net -p 127.0.0.1:3000:3000 -p 127.0.0.1:2222:22
f7829db54fc270e903fa55be97ae192d131c89a3c476ef0220a3942c8e1192fa
0
podman run --name=devenv -d --pod=unity --security-opt seccomp=unconfined --label io.podman.compose.config-hash=123 --label io.podman.compose.project=unity --label io.podman.compose.version=0.0.1 --label com.docker.compose.container-number=1 --label com.docker.compose.service=devenv --tmpfs /tmp --tmpfs /run --tmpfs /run/lock -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /home/mark/git/unity/./:/home/amigo/unity --add-host devenv:127.0.0.1 --add-host devenv:127.0.0.1 devenv-img
50edda8bf329296490f771a8785c605415ca3be36171b3970ecba71211a825b9

Here's the inspect output for the container:

 podman inspect devenv
[
    {
        "Id": "50edda8bf329296490f771a8785c605415ca3be36171b3970ecba71211a825b9",
        "Created": "2020-06-23T15:52:29.053978355-04:00",
        "Path": "/usr/bin/fish",
        "Args": [
            "-c",
            "exec /sbin/init --log-target=journal 3>&1"
        ],
        "State": {
            "OciVersion": "1.0.2-dev",
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 2457442,
            "ConmonPid": 2457430,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2020-06-23T15:52:32.468351379-04:00",
            "FinishedAt": "0001-01-01T00:00:00Z",
            "Healthcheck": {
                "Status": "",
                "FailingStreak": 0,
                "Log": null
            }
        },
        "Image": "471497bb87d25cf7d9a2df9acf516901e38c34d93732b628a42ce3e2a2fc5099",
        "ImageName": "localhost/devenv-img:latest",
        "Rootfs": "",
        "Pod": "f7829db54fc270e903fa55be97ae192d131c89a3c476ef0220a3942c8e1192fa",
        "ResolvConfPath": "/run/user/1000/containers/vfs-containers/4054570f5694e73f1297c76e4d59ec482b5e03cf006bc5ebfe63fe44362a6235/userdata/resolv.conf",
        "HostnamePath": "/run/user/1000/containers/vfs-containers/50edda8bf329296490f771a8785c605415ca3be36171b3970ecba71211a825b9/userdata/hostname",
        "HostsPath": "/run/user/1000/containers/vfs-containers/4054570f5694e73f1297c76e4d59ec482b5e03cf006bc5ebfe63fe44362a6235/userdata/hosts",
        "StaticDir": "/home/mark/.local/share/containers/storage/vfs-containers/50edda8bf329296490f771a8785c605415ca3be36171b3970ecba71211a825b9/userdata",
        "OCIConfigPath": "/home/mark/.local/share/containers/storage/vfs-containers/50edda8bf329296490f771a8785c605415ca3be36171b3970ecba71211a825b9/userdata/config.json",
        "OCIRuntime": "runc",
        "LogPath": "/home/mark/.local/share/containers/storage/vfs-containers/50edda8bf329296490f771a8785c605415ca3be36171b3970ecba71211a825b9/userdata/ctr.log",
        "LogTag": "",
        "ConmonPidFile": "/run/user/1000/containers/vfs-containers/50edda8bf329296490f771a8785c605415ca3be36171b3970ecba71211a825b9/userdata/conmon.pid",
        "Name": "devenv",
        "RestartCount": 0,
        "Driver": "vfs",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "EffectiveCaps": [
            "CAP_AUDIT_WRITE",
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_MKNOD",
            "CAP_NET_BIND_SERVICE",
            "CAP_NET_RAW",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
        ],
        "BoundingCaps": [
            "CAP_AUDIT_WRITE",
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FOWNER",
            "CAP_FSETID",
            "CAP_KILL",
            "CAP_MKNOD",
            "CAP_NET_BIND_SERVICE",
            "CAP_NET_RAW",
            "CAP_SETFCAP",
            "CAP_SETGID",
            "CAP_SETPCAP",
            "CAP_SETUID",
            "CAP_SYS_CHROOT"
        ],
        "ExecIDs": [],
        "GraphDriver": {
            "Name": "vfs",
            "Data": null
        },
        "Mounts": [
            {
                "Type": "bind",
                "Name": "",
                "Source": "/sys/fs/cgroup",
                "Destination": "/sys/fs/cgroup",
                "Driver": "",
                "Mode": "",
                "Options": [
                    "noexec",
                    "nosuid",
                    "nodev",
                    "rbind"
                ],
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Name": "",
                "Source": "/home/mark/Documents/RideAmigos/git/unity",
                "Destination": "/home/amigo/unity",
                "Driver": "",
                "Mode": "",
                "Options": [
                    "rbind"
                ],
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Dependencies": [
            "4054570f5694e73f1297c76e4d59ec482b5e03cf006bc5ebfe63fe44362a6235"
        ],
        "NetworkSettings": {
            "EndpointID": "",
            "Gateway": "",
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "MacAddress": "",
            "Bridge": "",
            "SandboxID": "",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": [],
            "SandboxKey": ""
        },
        "ExitCommand": [
            "/usr/bin/podman",
            "--root",
            "/home/mark/.local/share/containers/storage",
            "--runroot",
            "/run/user/1000/containers",
            "--log-level",
            "error",
            "--cgroup-manager",
            "cgroupfs",
            "--tmpdir",
            "/run/user/1000/libpod/tmp",
            "--runtime",
            "runc",
            "--storage-driver",
            "vfs",
            "--events-backend",
            "file",
            "container",
            "cleanup",
            "50edda8bf329296490f771a8785c605415ca3be36171b3970ecba71211a825b9"
        ],
        "Namespace": "",
        "IsInfra": false,
        "Config": {
            "Hostname": "50edda8bf329",
            "Domainname": "",
            "User": "root",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:~/.yarn/bin",
                "TERM=xterm",
                "container=podman",
                "YARN_VERSION=1.10.1",
                "MONGO_VERSION=4.2.8",
                "NODE_VERSION=12.15.0",
                "LANG=C.UTF-8",
                "MONGO_MAJOR=4.2",
                "GPG_KEYS=E162F504A20CDF15827F718D4B7C549A058F8B6B",
                "HOME=/root",
                "NPM_CONFIG_LOGLEVEL=info",
                "HOSTNAME=50edda8bf329"
            ],
            "Cmd": [
                "-c",
                "exec /sbin/init --log-target=journal 3>&1"
            ],
            "Image": "localhost/devenv-img:latest",
            "Volumes": null,
            "WorkingDir": "/unity",
            "Entrypoint": "/usr/bin/fish",
            "OnBuild": null,
            "Labels": {
                "com.docker.compose.container-number": "1",
                "com.docker.compose.service": "devenv",
                "io.podman.compose.config-hash": "123",
                "io.podman.compose.project": "unity",
                "io.podman.compose.version": "0.0.1",
                "maintainer": "[email protected]"
            },
            "Annotations": {
                "io.container.manager": "libpod",
                "io.kubernetes.cri-o.ContainerType": "container",
                "io.kubernetes.cri-o.Created": "2020-06-23T15:52:29.053978355-04:00",
                "io.kubernetes.cri-o.SandboxID": "unity",
                "io.kubernetes.cri-o.TTY": "false",
                "io.podman.annotations.autoremove": "FALSE",
                "io.podman.annotations.init": "FALSE",
                "io.podman.annotations.privileged": "FALSE",
                "io.podman.annotations.publish-all": "FALSE",
                "io.podman.annotations.seccomp": "unconfined",
                "org.opencontainers.image.stopSignal": "37"
            },
            "StopSignal": 37,
            "CreateCommand": [
                "podman",
                "run",
                "--name=devenv",
                "-d",
                "--pod=unity",
                "--security-opt",
                "seccomp=unconfined",
                "--label",
                "io.podman.compose.config-hash=123",
                "--label",
                "io.podman.compose.project=unity",
                "--label",
                "io.podman.compose.version=0.0.1",
                "--label",
                "com.docker.compose.container-number=1",
                "--label",
                "com.docker.compose.service=devenv",
                "--tmpfs",
                "/tmp",
                "--tmpfs",
                "/run",
                "--tmpfs",
                "/run/lock",
                "-v",
                "/sys/fs/cgroup:/sys/fs/cgroup:ro",
                "-v",
                "/home/mark/Documents/RideAmigos/git/unity/./:/home/amigo/unity",
                "--add-host",
                "devenv:127.0.0.1",
                "--add-host",
                "devenv:127.0.0.1",
                "devenv-img"
            ]
        },
        "HostConfig": {
            "Binds": [
                "/sys/fs/cgroup:/sys/fs/cgroup:ro,rprivate,noexec,nosuid,nodev,rbind",
                "/home/mark/Documents/RideAmigos/git/unity:/home/amigo/unity:rw,rprivate,rbind"
            ],
            "CgroupMode": "host",
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "k8s-file",
                "Config": null
            },
            "NetworkMode": "container:4054570f5694e73f1297c76e4d59ec482b5e03cf006bc5ebfe63fe44362a6235",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": [],
            "CapDrop": [],
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": [
                "devenv:127.0.0.1",
                "devenv:127.0.0.1"
            ],
            "GroupAdd": [],
            "IpcMode": "private",
            "Cgroup": "",
            "Cgroups": "default",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "private",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": [
                "seccomp=unconfined"
            ],
            "Tmpfs": {
                "/run": "rw,rprivate,nosuid,nodev,tmpcopyup",
                "/run/lock": "rw,rprivate,nosuid,nodev,tmpcopyup",
                "/tmp": "rw,rprivate,nosuid,nodev,tmpcopyup"
            },
            "UTSMode": "private",
            "UsernsMode": "",
            "ShmSize": 65536000,
            "Runtime": "oci",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "/libpod_parent/f7829db54fc270e903fa55be97ae192d131c89a3c476ef0220a3942c8e1192fa",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": 0,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": [],
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        }
    }
]

I don't see an option to run podman inspect on pods.

markstos on 23 Jun 2020

podman pod inspect

baude on 23 Jun 2020

any chance we can sync up on irc? freenode.net #podman

baude on 23 Jun 2020

btw, can couple of simple things we should have asked. apologies if i missed the information.

can you see the ssh process running with ps
can you ssh directly to the container without involving the port mapping ? i.e. use :22

baude on 23 Jun 2020

all of the ports are assigned to the infra container.

Did I miss this in the docs? It's not intuitive to have port mappings appear on a container other than the one I installed. I wasn't thrilled to see the "pause" container from a third-party service on the internet that I had no intention of pulling down content from either.

markstos on 23 Jun 2020

can you see the ssh process running with ps

No. I presume that means I happened to break my own container about the time I also upgraded podman. I'm trying to get the container running under Docker now as a second point of reference.

markstos on 23 Jun 2020

Network mode is set to another container, which I'm assuming is the infra container (I don't see the ID in question in your first podman ps so perhaps you recreated). Container config on the whole seems fine, so I no longer believe this is a network issue, but is probably related to the SSH daemon itself.

What init are you using in the container, systemd or something else?

mheon on 23 Jun 2020

@baude One obvious thing: podman ps isn't displaying ports correctly.

1.9:

b4b47beefd3d  registry.fedoraproject.org/fedora:latest  bash     1 second ago    Up 1 second ago           0.0.0.0:2222->22/tcp  serene_tu
182529b785b3  registry.fedoraproject.org/fedora:latest  bash     15 seconds ago  Exited (0) 9 seconds ago  0.0.0.0:2222->22/tcp  pensive_chaum
64d111e06042  k8s.gcr.io/pause:3.2                               35 seconds ago  Up 15 seconds ago         0.0.0.0:2222->22/tcp  46ce3d0db44c-infra

2.0:

182529b785b3  registry.fedoraproject.org/fedora:latest  bash     20 seconds ago  Exited (0) 13 seconds ago                        pensive_chaum
3f4e33ba8a41  registry.fedoraproject.org/fedora:latest  bash     5 days ago      Exited (0) 5 days ago                            testctr1
64d111e06042  k8s.gcr.io/pause:3.2                               39 seconds ago  Up 19 seconds ago          0.0.0.0:2222->22/tcp  46ce3d0db44c-infra

mheon on 23 Jun 2020

Hm. It's also ordering containers incorrectly... I'd expect sort to be by time of creation, not by ID.

mheon on 23 Jun 2020

I'm using systemd. I was ssh'ing in fine before the upgrade. But I also have been tweaking the configuration all day, so it could be something on my end.

markstos on 23 Jun 2020

I built a test setup as close to yours as I could given provided information (pod with port 2222 forwarded, container in that pod with systemd as init + sshd, added a user, SSH'd in from another machine to public port, all rootless) and everything worked locally, so I think this is either environment, or some detail of the pod that is not clear from what is given here.

mheon on 23 Jun 2020

I'm on Kubernetes Slack server now. I forgot my IRC password.

markstos on 23 Jun 2020

@mheon Thanks for the attention. I'll test more with Docker as a control group reference and see if I can pinpoint some bug on my end that I introduced.

markstos on 23 Jun 2020

It booted fine with docker-compose up -d but not podman-compose up -d.

The plot thickens.

I'll see if I can some more useful case for you to reproduce from.

markstos on 23 Jun 2020

I've temporarily posted my Dockerfile here:

https://gist.github.com/markstos/9f7b982bc73106e4bb5a73e5524a3ec6

Once you've grabbed it, I'm going to take down the Gist.

markstos on 23 Jun 2020

I believe the last two things I was changing before it broke were setting fish_user_paths, and looping over users to add their SSH keys to authorized_keys-- both happen in the last 20 lines of the file.

markstos on 23 Jun 2020

Grabbed, thanks. It's a little late in the day here, but I'll pick this up tomorrow and see if I can chase something down.

Might be a compose-specific bug, or might be a result of upgrading an existing 1.9 system to 2.0

mheon on 23 Jun 2020

I've reduced the test a case a bit. Here's a script I successfully used to launch the container with 1.9 that fails with 2.0:

#!/bin/bash
podman run --detach \
  --name devenv \
  --security-opt "seccomp=unconfined" \
  --tmpfs /tmp \
  --tmpfs /run \
  --tmpfs /run/lock \
  --volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
  --volume '../../:/home/amigo/unity' \
  --publish "127.0.0.1:2222:22" \
  --publish "127.0.0.1:3000:3000" \
 devenv-img

The result is the same-- it starts without apparent error, but I can't SSH in. This eliminates anything to do with pods.

Using ps I can confirm that there's an init process running running under the expected user account but no sshd process.

I'm going to try to rollback recent changes to my Dockerfile assuming that my changes broke it, not podman.

markstos on 24 Jun 2020

I'd recommend checking the journal within the container to see why sshd is failing. Also, checking if port forwarding works at all would be helpful - if you use 8080:80 with a simple nginx container, can you access it?

mheon on 24 Jun 2020

Partial fix for the podman ps issues I noticed in #6761

mheon on 24 Jun 2020

@mheon how I can check the journal in the container if I can't get into it?

I tried this to narrow down the issue: I rewrote my start command to give me an interactive shell instead of starting system. Then within the shell I started sshd manually with sshd -D-- that's how systemd would start it. Then I tried to SSH in, and that worked. I double checked that systemd is set to start SSH at boot. So something changed which resulted in sshd not running when booted with systemd.

I don't think port-forwarding is the issue, since ps shows no sshd process running.

markstos on 24 Jun 2020

@markstos podman exec -t -i $CONTAINERNAME journalctl?

mheon on 24 Jun 2020

one idea that might pay off would be to run the container (even in the pod) manually with -it /bin/sh and then run the sshd binary by itself. this shoudl let you see if the binary actually runs and the you can check the "wiring". btw, is anything being puked out in the container logs?

baude on 24 Jun 2020

@mheon The command worked, but found no logs.

@baude I did that (noted in a comment from about 30 minutes ago). sshd runs in that context. podman container logs devenv shows nothing.

markstos on 24 Jun 2020

I found the root cause by running with systemd but also -ti so I could watch it boot. I don't know what this means, though:

Welcome to Ubuntu 18.04.4 LTS!

Set hostname to <26058b6f356f>.
Failed to read AF_UNIX datagram queue length, ignoring: No such file or directory
Failed to create /user.slice/user-1000.slice/session-2.scope/init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
[!!!!!!] Failed to allocate manager object, freezing.
Freezing execution.

markstos on 24 Jun 2020

I stepping AFK now, but am in the Kubernetes Slack server if that's helpful.

markstos on 24 Jun 2020

There's a little more logging before that final error message:

systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
Detected virtualization container-other.
Detected architecture x86-64.

markstos on 24 Jun 2020

Hm. Can you try removing --volume /sys/fs/cgroup:/sys/fs/cgroup:ro and adding --systemd=always?

This will cause Podman to automatically prepare the container for having systemd run in it, including configuring an appropriate cgroup mount.

mheon on 24 Jun 2020

@mheon I tried that, but got the same result. I found a related RedHat bug about it:

markstos on 25 Jun 2020

https://bugzilla.redhat.com/show_bug.cgi?id=1811920

markstos on 25 Jun 2020

People also ran into the same issue in the past few months after upgrading LXC: https://bugs.funtoo.org/browse/FL-6897

Unprivileged systemd containers quit working for them too.

markstos on 25 Jun 2020

This issue also sounds related, and was fixed only for the root case, not the rootless case.

https://github.com/containers/libpod/issues/1226

markstos on 25 Jun 2020

I tried to generate a reduced-test-case with a container that just contained systemd and sshd, but that triggered a different failure:

podman run --systemd always --privileged -d -p "127.0.0.1:2222:22" minimum2scp/systemd-stretch
Trying to pull docker.io/minimum2scp/systemd-stretch...
Getting image source signatures
Copying blob eec13681aaa4 done
Copying blob 2217437ef5a2 done
Copying blob 82ed86786e13 done
Copying blob 063d2793dea0 done
Copying blob 11a85ad34c0b done
Copying config f03c1e5ac4 done
Writing manifest to image destination
Storing signatures
Error: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"tmpfs\\\" to rootfs \\\"/home/mark/.local/share/containers/storage/vfs/dir/1c2d9d6c99338794c37601792cfe73a34ade17655c87a9ab8f6b8f2c65605ad7\\\" at \\\"/tmp/runctop733130215/runctmpdir915618330\\\" caused \\\"tmpcopyup: failed to copy /home/mark/.local/share/containers/storage/vfs/dir/1c2d9d6c99338794c37601792cfe73a34ade17655c87a9ab8f6b8f2c65605ad7/run to /tmp/runctop733130215/runctmpdir915618330: lchown /tmp/runctop733130215/runctmpdir915618330/initctl: no such file or directory\\\"\"": OCI runtime command not found error

A variation of that that same image produced the same failure mode:

 podman run  -d -p "127.0.0.1:2222:22" minimum2scp/systemd
Trying to pull docker.io/minimum2scp/systemd...
Getting image source signatures
Copying blob d4aaedabb7de done
Copying blob 2b2c197bb397 done
Copying blob c1e7846c2b6e done
Copying blob e51a3c06332d done
Copying blob 938abdf43fa0 done
Copying config af8b425bf0 done
Writing manifest to image destination
Storing signatures
Error: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"tmpfs\\\" to rootfs \\\"/home/mark/.local/share/containers/storage/vfs/dir/ec4f769dd31bda883f7271f7cc68ae37484e36ec29a19881c972b1c8c6fc35f1\\\" at \\\"/tmp/runctop953882458/runctmpdir205823217\\\" caused \\\"tmpcopyup: failed to copy /home/mark/.local/share/containers/storage/vfs/dir/ec4f769dd31bda883f7271f7cc68ae37484e36ec29a19881c972b1c8c6fc35f1/run to /tmp/runctop953882458/runctmpdir205823217: lchown /tmp/runctop953882458/runctmpdir205823217/initctl: no such file or directory\\\"\"": OCI runtime command not found error

markstos on 25 Jun 2020

Great, I have a one-line reduced-test for you that fails in the same way:

 podman run  -d -p "127.0.0.1:2222:22" solita/ubuntu-systemd-ssh

After running this, I can't ssh to the container and ps shows no sshd process running. I'm going to debug a bit more now.

markstos on 25 Jun 2020

Yep, there you go, this is my issue in a nutshell:

 podman run  --systemd=always  -it -p "127.0.0.1:2222:22" solita/ubuntu-systemd-ssh
systemd 229 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN)
Detected virtualization container-other.
Detected architecture x86-64.

Welcome to Ubuntu 16.04.5 LTS!

Set hostname to <81f350354616>.
Initializing machine ID from D-Bus machine ID.
Failed to read AF_UNIX datagram queue length, ignoring: No such file or directory
Failed to install release agent, ignoring: Permission denied
Failed to create /user.slice/user-1000.slice/session-2.scope/init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
[!!!!!!] Failed to allocate manager object, freezing.
Freezing execution.

markstos on 25 Jun 2020

Does anyone have a copy of podman 1.9 handy to confirm if the reduced test case above worked before 2.0?

markstos on 25 Jun 2020

Is the host on Fedora with cgroup V2 or V1? Ubuntu V1?

rhatdan on 25 Jun 2020

Does everything work if you run as root?

rhatdan on 25 Jun 2020

The issue might be with cgroup V1 and systemd not being allowed to write to it.

rhatdan on 25 Jun 2020

The host is Ubuntu 20.04.

On Thu, Jun 25, 2020, 8:21 AM Daniel J Walsh notifications@github.com
wrote:

Does everything work if you run as root?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/containers/libpod/issues/6734#issuecomment-649507331,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAAGJZPINXFKMPSWSH6RB5LRYM6NVANCNFSM4OF56IIQ
.

markstos on 25 Jun 2020

@rhatdan, first it fails as root on Ubuntu 20.04 as well, but with a different error:

sudo podman run  --systemd=always  -it -p "127.0.0.1:2222:22" solita/ubuntu-systemd-ssh
Error: AppArmor profile "container-default" specified but not loaded

The system supports cgroupsv2:

$ grep cgroup /proc/filesystems
nodev   cgroup
nodev   cgroup2

markstos on 25 Jun 2020

@rhatdan Does this work on your system?

podman run  --systemd=always  -it -p "127.0.0.1:2222:22" solita/ubuntu-systemd-ssh

markstos on 25 Jun 2020

I think it's cgroups related:

12:cpuset:/
11:pids:/user.slice/user-1000.slice/session-18506.scope
10:rdma:/
9:perf_event:/
8:blkio:/user.slice
7:net_cls,net_prio:/
6:cpu,cpuacct:/user.slice
5:hugetlb:/
4:devices:/user.slice
3:memory:/user.slice
2:freezer:/
1:name=systemd:/user.slice/user-1000.slice/session-18506.scope
0::/user.slice/user-1000.slice/session-18506.scope

The user slice is persisting from the host.. in the container shouldn't systemd be /?

goochjj on 25 Jun 2020

Maybe rootless can't change that.

goochjj on 25 Jun 2020

@goochjj Does this container start for you with podman 2.0? This worked with 1.9.

podman run  --systemd=always  -it -p "127.0.0.1:2222:22" solita/ubuntu-systemd-ssh

markstos on 25 Jun 2020

Not rootless. I get the same error as you

goochjj on 25 Jun 2020

Do you have any avc messages in dmesg?

goochjj on 25 Jun 2020

It doesn't work on FCOS either on podman 1.9.2:
podman --log-level=debug run --systemd=always --rm -it -p "127.0.0.1:2222:22" solita/ubuntu-systemd-ssh sh -c 'exec /sbin/init --log-level=debug --log-target=console 3>&1'

Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.
Freezing execution.

cgroups v2

Jun 25 16:49:31 FCOS audit[1674495]: AVC avc:  denied  { write } for  pid=1674495 comm="systemd" name="libpod-ac7041dcb1d73d819e20e8c15e84a646670b7f204689cce2d4780c0ef0bf30e9.scope" dev="cgroup2" ino=69019 scontext=system_u:system_r:container_t:s0:c303,c698 tcontext=unconfined_u:object_r:cgroup_t:s0 tclass=dir permissive=1
Jun 25 16:49:31 FCOS audit[1674495]: AVC avc:  denied  { add_name } for  pid=1674495 comm="systemd" name="systemd" scontext=system_u:system_r:container_t:s0:c303,c698 tcontext=unconfined_u:object_r:cgroup_t:s0 tclass=dir permissive=1
Jun 25 16:49:32 FCOS kernel: audit: type=1400 audit(1593103771.993:5619): avc:  denied  { write } for  pid=1674495 comm="systemd" name="libpod-ac7041dcb1d73d819e20e8c15e84a646670b7f204689cce2d4780c0ef0bf30e9.scope" dev="cgroup2" ino=69019 scontext=system_u:system_r:container_t:s0:c303,c698 tcontext=unconfined_u:object_r:cgroup_t:s0 tclass=dir permissive=1
Jun 25 16:49:32 FCOS kernel: audit: type=1400 audit(1593103771.993:5619): avc:  denied  { add_name } for  pid=1674495 comm="systemd" name="systemd" scontext=system_u:system_r:container_t:s0:c303,c698 tcontext=unconfined_u:object_r:cgroup_t:s0 tclass=dir permissive=1
Jun 25 16:49:32 FCOS kernel: audit: type=1400 audit(1593103771.993:5619): avc:  denied  { create } for  pid=1674495 comm="systemd" name="systemd" scontext=system_u:system_r:container_t:s0:c303,c698 tcontext=system_u:object_r:cgroup_t:s0 tclass=dir permissive=1
Jun 25 16:49:31 FCOS audit[1674495]: AVC avc:  denied  { create } for  pid=1674495 comm="systemd" name="systemd" scontext=system_u:system_r:container_t:s0:c303,c698 tcontext=system_u:object_r:cgroup_t:s0 tclass=dir permissive=1

goochjj on 25 Jun 2020

Hm. I think we do expect this to work as root. Perhaps systemd updated and now requires some additional permissions, or similar?

mheon on 25 Jun 2020

setsebool -P container_manage_cgroup 1
Need to turn on this boolean

rhatdan on 25 Jun 2020

container_connect_any --> off
container_manage_cgroup --> on
container_use_cephfs --> on

Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.
Freezing execution.

For that matter setenforce doesn't fix it. It looks like my problem is different from @markstos though so I don't want to derail this. I'll see if Ubuntu 20.04 works any different.

goochjj on 25 Jun 2020

@rhatdan SELinux disabled, is that expected?

getsebool container_manage_cgroup
getsebool:  SELinux is disabled

I did another run with more debugging enabled. One thing that stands out to me is these three lines:

DEBU[0000] using runtime "/usr/bin/runc"
DEBU[0000] using runtime "/usr/bin/crun"
WARN[0000] Error initializing configured OCI runtime kata: no valid executable found for OCI runtime kata: invalid argument

Why would podman be using both the runc and crun runtimes? (and attempting to use the kata runtime as well?)

Here's the full debug output.

podman --log-level=debug run --systemd=always --rm -it
 -p "127.0.0.1:2222:22" solita/ubuntu-systemd-ssh sh -c 'exec /sbin/init --log-level=debug --log-target=console 3>&1'
INFO[0000] podman filtering at log level debug
DEBU[0000] Called run.PersistentPreRunE(podman --log-level=debug run --systemd=always --rm -it -p 127.0.0.1:2222:22 solita/ubuntu-systemd-ssh sh -c exec /sbin/init --log-level=debug --log-target=console 3>&1)
DEBU[0000] Ignoring libpod.conf EventsLogger setting "/home/mark/.config/containers/containers.conf". Use "journald" if you want to change this setting and remove libpod.conf files.
DEBU[0000] Reading configuration file "/usr/share/containers/containers.conf"
DEBU[0000] Merged system config "/usr/share/containers/containers.conf": &{{[] [] container-default [] host enabled [CAP_AUDIT_WRITE CAP_CHOWN CAP_DAC_OVERRIDE CAP_FOWNER CAP_FSETID CAP_KILL CAP_MKNOD CAP_NET_BIND_SERVICE CAP_NET_RAW CAP_SETFCAP CAP_SETGID CAP_SETPCAP CAP_SETUID CAP_SYS_CHROOT] [] []  [] [] [] false [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] false false false  private k8s-file -1 slirp4netns false 2048 private /usr/share/containers/seccomp.json 65536k private host 65536} {false systemd [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] [/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] ctrl-p,ctrl-q true /run/user/1000/libpod/tmp/events/events.log file [/usr/share/containers/oci/hooks.d] docker:// /pause k8s.gcr.io/pause:3.2 /usr/libexec/podman/catatonit shm   false 2048 runc map[crun:[/usr/bin/crun /usr/sbin/crun /usr/local/bin/crun /usr/local/sbin/crun /sbin/crun /bin/crun /run/current-system/sw/bin/crun] kata:[/usr/bin/kata-runtime /usr/sbin/kata-runtime /usr/local/bin/kata-runtime /usr/local/sbin/kata-runtime /sbin/kata-runtime /bin/kata-runtime /usr/bin/kata-qemu /usr/bin/kata-fc] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc]] missing false   [] [crun runc] [crun] [kata kata-runtime kata-qemu kata-fc] {false false false false false false} /etc/containers/policy.json false 3 /home/mark/.local/share/containers/storage/libpod 10 /run/user/1000/libpod/tmp /home/mark/.local/share/containers/storage/volumes} {[/usr/libexec/cni /usr/lib/cni /usr/local/lib/cni /opt/cni/bin] podman /etc/cni/net.d/}}
DEBU[0000] Reading configuration file "/etc/containers/containers.conf"
DEBU[0000] Merged system config "/etc/containers/containers.conf": &{{[] [] container-default [] host enabled [CAP_AUDIT_WRITE CAP_CHOWN CAP_DAC_OVERRIDE CAP_FOWNER CAP_FSETID CAP_KILL CAP_MKNOD CAP_NET_BIND_SERVICE CAP_NET_RAW CAP_SETFCAP CAP_SETGID CAP_SETPCAP CAP_SETUID CAP_SYS_CHROOT] [] []  [] [] [] false [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] false false false  private k8s-file -1 slirp4netns false 2048 private /usr/share/containers/seccomp.json 65536k private host 65536} {false systemd [PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] [/usr/libexec/podman/conmon /usr/local/libexec/podman/conmon /usr/local/lib/podman/conmon /usr/bin/conmon /usr/sbin/conmon /usr/local/bin/conmon /usr/local/sbin/conmon /run/current-system/sw/bin/conmon] ctrl-p,ctrl-q true /run/user/1000/libpod/tmp/events/events.log file [/usr/share/containers/oci/hooks.d] docker:// /pause k8s.gcr.io/pause:3.2 /usr/libexec/podman/catatonit shm   false 2048 runc map[crun:[/usr/bin/crun /usr/sbin/crun /usr/local/bin/crun /usr/local/sbin/crun /sbin/crun /bin/crun /run/current-system/sw/bin/crun] kata:[/usr/bin/kata-runtime /usr/sbin/kata-runtime /usr/local/bin/kata-runtime /usr/local/sbin/kata-runtime /sbin/kata-runtime /bin/kata-runtime /usr/bin/kata-qemu /usr/bin/kata-fc] runc:[/usr/bin/runc /usr/sbin/runc /usr/local/bin/runc /usr/local/sbin/runc /sbin/runc /bin/runc /usr/lib/cri-o-runc/sbin/runc /run/current-system/sw/bin/runc]] missing false   [] [crun runc] [crun] [kata kata-runtime kata-qemu kata-fc] {false false false false false false} /etc/containers/policy.json false 3 /home/mark/.local/share/containers/storage/libpod 10 /run/user/1000/libpod/tmp /home/mark/.local/share/containers/storage/volumes} {[/usr/libexec/cni /usr/lib/cni /usr/local/lib/cni /opt/cni/bin] podman /etc/cni/net.d/}}
DEBU[0000] Using conmon: "/usr/libexec/podman/conmon"
DEBU[0000] Initializing boltdb state at /home/mark/.local/share/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver vfs
DEBU[0000] Using graph root /home/mark/.local/share/containers/storage
DEBU[0000] Using run root /run/user/1000/containers
DEBU[0000] Using static dir /home/mark/.local/share/containers/storage/libpod
DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp
DEBU[0000] Using volume path /home/mark/.local/share/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "vfs"
DEBU[0000] Initializing event backend file
DEBU[0000] using runtime "/usr/bin/runc"
DEBU[0000] using runtime "/usr/bin/crun"
WARN[0000] Error initializing configured OCI runtime kata: no valid executable found for OCI runtime kata: invalid argument
INFO[0000] Setting parallel job count to 13
DEBU[0000] Adding port mapping from 2222 to 22 length 1 protocol ""
DEBU[0000] parsed reference into "[vfs@/home/mark/.local/share/containers/storage+/run/user/1000/containers]docker.io/solita/ubuntu-systemd-ssh:latest"
DEBU[0000] parsed reference into "[vfs@/home/mark/.local/share/containers/storage+/run/user/1000/containers]docker.io/solita/ubuntu-systemd-ssh:latest"
DEBU[0000] parsed reference into "[vfs@/home/mark/.local/share/containers/storage+/run/user/1000/containers]@356e2dfcfe16debeee7569ff50a20f396ad367d487b0352f0a9ceca4df67c6e3"
DEBU[0000] exporting opaque data as blob "sha256:356e2dfcfe16debeee7569ff50a20f396ad367d487b0352f0a9ceca4df67c6e3"
DEBU[0000] parsed reference into "[vfs@/home/mark/.local/share/containers/storage+/run/user/1000/containers]docker.io/solita/ubuntu-systemd-ssh:latest"
DEBU[0000] parsed reference into "[vfs@/home/mark/.local/share/containers/storage+/run/user/1000/containers]@356e2dfcfe16debeee7569ff50a20f396ad367d487b0352f0a9ceca4df67c6e3"
DEBU[0000] exporting opaque data as blob "sha256:356e2dfcfe16debeee7569ff50a20f396ad367d487b0352f0a9ceca4df67c6e3"
DEBU[0000] No hostname set; container's hostname will default to runtime default
DEBU[0000] Loading seccomp profile from "/usr/share/containers/seccomp.json"
DEBU[0000] Allocated lock 46 for container 5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36
DEBU[0000] parsed reference into "[vfs@/home/mark/.local/share/containers/storage+/run/user/1000/containers]@356e2dfcfe16debeee7569ff50a20f396ad367d487b0352f0a9ceca4df67c6e3"
DEBU[0000] exporting opaque data as blob "sha256:356e2dfcfe16debeee7569ff50a20f396ad367d487b0352f0a9ceca4df67c6e3"
DEBU[0003] created container "5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36"
DEBU[0003] container "5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36" has work directory "/home/mark/.local/share/containers/storage/vfs-containers/5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36/userdata"
DEBU[0003] container "5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36" has run directory "/run/user/1000/containers/vfs-containers/5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36/userdata"
DEBU[0003] container "5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36" has CgroupParent "/libpod_parent/libpod-5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36"
DEBU[0003] Handling terminal attach
DEBU[0003] Made network namespace at /run/user/1000/netns/cni-e0cf7e15-895f-3736-e006-219152f5189e for container 5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36
DEBU[0003] mounted container "5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36" at "/home/mark/.local/share/containers/storage/vfs/dir/e758af005751d167334f1f9fb5f028369e88d8c363e4eb7f75aeaaf3364127be"
DEBU[0003] slirp4netns command: /usr/bin/slirp4netns --disable-host-loopback --mtu 65520 --enable-sandbox --enable-seccomp -c -e 3 -r 4 --netns-type=path /run/user/1000/netns/cni-e0cf7e15-895f-3736-e006-219152f5189e tap0
DEBU[0003] rootlessport: time="2020-06-25T13:28:12-04:00" level=info msg="starting parent driver"
DEBU[0003] rootlessport: time="2020-06-25T13:28:12-04:00" level=info msg="opaque=map[builtin.readypipepath:/run/user/1000/libpod/tmp/rootlessport909813263/.bp-ready.pipe builtin.socketpath:/run/user/1000/libpod/tmp/rootlessport909813263/.bp.sock]"
                       time="2020-06-25T13:28:12-04:00" level=info msg="starting child driver in child netns (\"/proc/self/exe\" [containers-rootlessport-child])"
DEBU[0003] rootlessport: time="2020-06-25T13:28:12-04:00" level=info msg="waiting for initComplete"
DEBU[0003] rootlessport: time="2020-06-25T13:28:12-04:00" level=info msg="initComplete is closed; parent and child established the communication channel"
DEBU[0003] rootlessport: time="2020-06-25T13:28:12-04:00" level=info msg="exposing ports [{2222 22 tcp 127.0.0.1}]"
DEBU[0003] rootlessport is ready
DEBU[0003] rootlessport: time="2020-06-25T13:28:12-04:00" level=info msg=ready
                      time="2020-06-25T13:28:12-04:00" level=info msg="waiting for exitfd to be closed"
DEBU[0003] Created root filesystem for container 5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36 at /home/mark/.local/share/containers/storage/vfs/dir/e758af005751d167334f1f9fb5f028369e88d8c363e4eb7f75aeaaf3364127be
DEBU[0003] skipping loading default AppArmor profile (rootless mode)
INFO[0003] No non-localhost DNS nameservers are left in resolv.conf. Using default external servers: [nameserver 8.8.8.8 nameserver 8.8.4.4]
INFO[0003] IPv6 enabled; Adding default IPv6 external servers: [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]
DEBU[0003] /etc/system-fips does not exist on host, not mounting FIPS mode secret
DEBU[0003] reading hooks from /usr/share/containers/oci/hooks.d
DEBU[0003] Created OCI spec for container 5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36 at /home/mark/.local/share/containers/storage/vfs-containers/5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36/userdata/config.json
DEBU[0003] /usr/libexec/podman/conmon messages will be logged to syslog
DEBU[0003] running conmon: /usr/libexec/podman/conmon    args="[--api-version 1 -c 5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36 -u 5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36 -r /usr/bin/runc -b /home/mark/.local/share/containers/storage/vfs-containers/5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36/userdata -p /run/user/1000/containers/vfs-containers/5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36/userdata/pidfile -n wonderful_cerf --exit-dir /run/user/1000/libpod/tmp/exits --socket-dir-path /run/user/1000/libpod/tmp/socket -l k8s-file:/home/mark/.local/share/containers/storage/vfs-containers/5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36/userdata/ctr.log --log-level debug --syslog -t --conmon-pidfile /run/user/1000/containers/vfs-containers/5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/mark/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1000/containers --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir --exit-command-arg /run/user/1000/libpod/tmp --exit-command-arg --runtime --exit-command-arg runc --exit-command-arg --storage-driver --exit-command-arg vfs --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg 5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36]"
WARN[0003] Failed to add conmon to cgroupfs sandbox cgroup: error creating cgroup for pids: mkdir /sys/fs/cgroup/pids/libpod_parent: permission denied
DEBU[0003] Received: 3833883
INFO[0003] Got Conmon PID as 3833872
DEBU[0003] Created container 5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36 in OCI runtime
DEBU[0003] Attaching to container 5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36
DEBU[0003] connecting to socket /run/user/1000/libpod/tmp/socket/5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36/attach
DEBU[0003] Starting container 5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36 with command [sh -c exec /sbin/init --log-level=debug --log-target=console 3>&1]
DEBU[0003] Received a resize event: {Width:56 Height:16}
DEBU[0003] Started container 5b68adcaff700f9cd207e5819343a4d23f39329d47781e9f4147214d4afe0c36
systemd 229 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN)
Detected virtualization container-other.
Detected architecture x86-64.

Welcome to Ubuntu 16.04.5 LTS!

Set hostname to <5b68adcaff70>.
Initializing machine ID from D-Bus machine ID.
Failed to read AF_UNIX datagram queue length, ignoring: No such file or directory
Using cgroup controller name=systemd. File system hierarchy is at /sys/fs/cgroup/systemd/user.slice/user-1000.slice/session-2.scope.
Failed to install release agent, ignoring: Permission denied
Failed to create /user.slice/user-1000.slice/session-2.scope/init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
Closing left-over fd 3
[!!!!!!] Failed to allocate manager object, freezing.
Freezing execution.

markstos on 25 Jun 2020

@markstos It checks for all the runtimes in your libpod.conf.

You can see from the conmon line running conmon: /usr/libexec/podman/conmon .... that it's using runc.

You can force it by using podman --runtime /usr/bin/crun run (other stuff)

goochjj on 25 Jun 2020

Does this run?
podman run --rm -it --systemd=always fedora /sbin/init

goochjj on 25 Jun 2020

No, same kind of failure:

podman run --rm -it --systemd=always fedora /sbin/init

Trying to pull docker.io/library/fedora...
Getting image source signatures
Copying blob 4c69497db035 done
Copying config adfbfa4a11 done
Writing manifest to image destination
Storing signatures
systemd v243.8-1.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Fedora 31 (Container Image)!

Set hostname to <f45ad95031fd>.
Initializing machine ID from random generator.
Failed to create /user.slice/user-1000.slice/session-2.scope/init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

I'm sure what else changed besides the podman upgrade, but I was definitely successfully launching and logging into containers with podman on Ubuntu 20.04 around the time of the podman 2 upgrade. I'll review my package update history and see what other packages were upgraded at the the same time.

markstos on 25 Jun 2020

I just asked because the image you were using was based on systemd 229. I'm pretty sure it was failing on my system because I have cgroups v2.

Can you ls -l /sys/fs/cgroup so I can see what cgroups you're running? It's obviously v1 of some sort.

goochjj on 25 Jun 2020

 ls -l /sys/fs/cgroup/
total 0
dr-xr-xr-x 6 root root  0 Jun 20 14:09 blkio/
lrwxrwxrwx 1 root root 11 Jun 20 14:09 cpu -> cpu,cpuacct/
lrwxrwxrwx 1 root root 11 Jun 20 14:09 cpuacct -> cpu,cpuacct/
dr-xr-xr-x 6 root root  0 Jun 20 14:09 cpu,cpuacct/
dr-xr-xr-x 3 root root  0 Jun 20 14:09 cpuset/
dr-xr-xr-x 7 root root  0 Jun 20 14:09 devices/
dr-xr-xr-x 9 root root  0 Jun 20 14:09 freezer/
dr-xr-xr-x 3 root root  0 Jun 20 14:09 hugetlb/
dr-xr-xr-x 6 root root  0 Jun 20 14:09 memory/
lrwxrwxrwx 1 root root 16 Jun 20 14:09 net_cls -> net_cls,net_prio/
dr-xr-xr-x 3 root root  0 Jun 20 14:09 net_cls,net_prio/
lrwxrwxrwx 1 root root 16 Jun 20 14:09 net_prio -> net_cls,net_prio/
dr-xr-xr-x 3 root root  0 Jun 20 14:09 perf_event/
dr-xr-xr-x 6 root root  0 Jun 20 14:09 pids/
dr-xr-xr-x 2 root root  0 Jun 20 14:09 rdma/
dr-xr-xr-x 6 root root  0 Jun 20 14:09 systemd/
dr-xr-xr-x 5 root root  0 Jun 20 14:09 unified/

I thought this indicated cgroupsv2 support:

$ grep cgroup /proc/filesystems
nodev   cgroup
nodev   cgroup2

markstos on 25 Jun 2020

Yep, this works for me on FCOS, with selinux enforcing, after upgrading the systemd.
I.e.

git clone https://github.com/solita/docker-systemd-ssh.git newer-test
cd newer-test
sed -i 's|16.04|18.04|g' Dockerfile 
buildah bud -t systemd-ssh .

[core@MININT-2M30JS4 build]$ podman run --rm -it --systemd=always -p "127.0.0.1:2222:22" localhost/systemd-ssh
systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
Detected virtualization container-other.
Detected architecture x86-64.

Welcome to Ubuntu 18.04.1 LTS!

Set hostname to <f68a45a1ea8e>.
Failed to read AF_UNIX datagram queue length, ignoring: No such file or directory
File /lib/systemd/system/systemd-journald.service:36 configures an IP firewall (IPAddressDeny=any), but the local system does not support BPF/cgroup based firewalling.
Proceeding WITHOUT firewalling in effect! (This warning is only shown for the first loaded unit using IP firewalling.)
[  OK  ] Created slice System Slice.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Created slice system-getty.slice.
[  OK  ] Reached target Swap.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.

(and more)

And I can access it on port 2222

goochjj on 25 Jun 2020

Here's some recent package install history. I opened this issue on the 23rd after the podman upgrade from 1.9.3 to 2.0. You can see some other package history before and after that where I was un-installing and re-installing Docker to try to get some container solution to work.

It's possible that my steps created some kind of unexpected configuration state relateds to containers if failures were not directly caused by the podman 2 upgrade.

Start-Date: 2020-06-22  12:01:59
Commandline: apt-get remove docker docker.io containerd runc
Requested-By: mark (1000)
Install: cri-o-runc:amd64 (1.0.0-3~dev2, automatic)
Upgrade: buildah:amd64 (1.14.9~1, 1.15.0~1)
Remove: runc:amd64 (1.0.0~rc10-0ubuntu1)
End-Date: 2020-06-22  12:02:06

Start-Date: 2020-06-22  12:06:38
Commandline: apt-get install -y -qq --no-install-recommends docker-ce
Requested-By: mark (1000)
Install: containerd.io:amd64 (1.2.13-2, automatic), docker-ce:amd64 (5:19.03.11~3-0~ubuntu-focal), docker-ce-cli:amd64 (5:19.03.11~3-0~ubuntu-focal, automatic)
End-Date: 2020-06-22  12:06:55

Start-Date: 2020-06-22  12:08:40
Commandline: apt remove docker-ce docker-ce-cli
Requested-By: mark (1000)
Remove: docker-ce:amd64 (5:19.03.11~3-0~ubuntu-focal), docker-ce-cli:amd64 (5:19.03.11~3-0~ubuntu-focal)
End-Date: 2020-06-22  12:08:45

Start-Date: 2020-06-23  12:09:41
Commandline: aptdaemon role='role-commit-packages' sender=':1.460'
Upgrade: update-manager-core:amd64 (1:20.04.10, 1:20.04.10.1), libgirepository-1.0-1:amd64 (1.64.0-2, 1.64.1-1~ubuntu20.04.1), update-manager:amd64 (1:20.04.10, 1:20.04.10.1), podman:amd64 (1.9.3~1, 2.0.0~1), conmon:amd64 (2.0.16~2, 2.0.18~1), gir1.2-freedesktop:amd64 (1.64.0-2, 1.64.1-1~ubuntu20.04.1), nautilus:amd64 (1:3.36.2-0ubuntu1, 1:3.36.3-0ubuntu1), libnautilus-extension1a:amd64 (1:3.36.2-0ubuntu1, 1:3.36.3-0ubuntu1), gir1.2-glib-2.0:amd64 (1.64.0-2, 1.64.1-1~ubuntu20.04.1), python3-update-manager:amd64 (1:20.04.10, 1:20.04.10.1), grub-efi-amd64-signed:amd64 (1.142+2.04-1ubuntu26, 1.142.1+2.04-1ubuntu26), nautilus-data:amd64 (1:3.36.2-0ubuntu1, 1:3.36.3-0ubuntu1)
End-Date: 2020-06-23  12:10:36

Start-Date: 2020-06-23  16:42:27
Commandline: apt-get install docker-ce docker-ce-cli containerd.io
Requested-By: mark (1000)
Install: aufs-tools:amd64 (1:4.14+20190211-1ubuntu1, automatic), cgroupfs-mount:amd64 (1.4, automatic), pigz:amd64 (2.4-1, automatic), docker-ce:amd64 (5:19.03.12~3-0~ubuntu-focal), docker-ce-cli:amd64 (5:19.03.12~3-0~ubuntu-focal)
End-Date: 2020-06-23  16:42:42

markstos on 25 Jun 2020

I'll try to reproduce the result above.

markstos on 25 Jun 2020

 ls -l /sys/fs/cgroup/
total 0
dr-xr-xr-x 6 root root  0 Jun 20 14:09 blkio/
lrwxrwxrwx 1 root root 11 Jun 20 14:09 cpu -> cpu,cpuacct/
lrwxrwxrwx 1 root root 11 Jun 20 14:09 cpuacct -> cpu,cpuacct/
dr-xr-xr-x 6 root root  0 Jun 20 14:09 cpu,cpuacct/
dr-xr-xr-x 3 root root  0 Jun 20 14:09 cpuset/
dr-xr-xr-x 7 root root  0 Jun 20 14:09 devices/
dr-xr-xr-x 9 root root  0 Jun 20 14:09 freezer/
dr-xr-xr-x 3 root root  0 Jun 20 14:09 hugetlb/
dr-xr-xr-x 6 root root  0 Jun 20 14:09 memory/
lrwxrwxrwx 1 root root 16 Jun 20 14:09 net_cls -> net_cls,net_prio/
dr-xr-xr-x 3 root root  0 Jun 20 14:09 net_cls,net_prio/
lrwxrwxrwx 1 root root 16 Jun 20 14:09 net_prio -> net_cls,net_prio/
dr-xr-xr-x 3 root root  0 Jun 20 14:09 perf_event/
dr-xr-xr-x 6 root root  0 Jun 20 14:09 pids/
dr-xr-xr-x 2 root root  0 Jun 20 14:09 rdma/
dr-xr-xr-x 6 root root  0 Jun 20 14:09 systemd/
dr-xr-xr-x 5 root root  0 Jun 20 14:09 unified/

I thought this indicated cgroupsv2 support:

$ grep cgroup /proc/filesystems
nodev   cgroup
nodev   cgroup2

You're running hybrid.I n your case, unified is the only part that's cgroups v2.

This is mine - note I don't have cpu/pids/etc... that's all cgroup v1.

$ ls -l /sys/fs/cgroup
total 0
-r--r--r--.  1 root root 0 Jun 18 19:13 cgroup.controllers
-rw-r--r--.  1 root root 0 Jun 18 19:16 cgroup.max.depth
-rw-r--r--.  1 root root 0 Jun 18 19:16 cgroup.max.descendants
-rw-r--r--.  1 root root 0 Jun 18 19:13 cgroup.procs
-r--r--r--.  1 root root 0 Jun 18 19:16 cgroup.stat
-rw-r--r--.  1 root root 0 Jun 22 14:16 cgroup.subtree_control
-rw-r--r--.  1 root root 0 Jun 18 19:16 cgroup.threads
-rw-r--r--.  1 root root 0 Jun 18 19:16 cpu.pressure
-r--r--r--.  1 root root 0 Jun 18 19:16 cpuset.cpus.effective
-r--r--r--.  1 root root 0 Jun 18 19:16 cpuset.mems.effective
drwxr-xr-x.  2 root root 0 Jun 18 20:07 init.scope
-rw-r--r--.  1 root root 0 Jun 18 19:16 io.cost.model
-rw-r--r--.  1 root root 0 Jun 18 19:16 io.cost.qos
-rw-r--r--.  1 root root 0 Jun 18 19:16 io.pressure
drwxr-xr-x.  5 root root 0 Jun 25 19:58 machine.slice
-rw-r--r--.  1 root root 0 Jun 18 19:16 memory.pressure
drwxr-xr-x. 76 root root 0 Jun 25 20:49 system.slice
drwxr-xr-x.  4 root root 0 Jun 25 20:26 user.slice

goochjj on 25 Jun 2020

The podman packages came from where - kubic?

goochjj on 25 Jun 2020

Yes, the packages are from kubic.

The 18.04 version of the container fails for me-- which is not surprising because my original container that exhibited this behavior was Ubuntu 18.04:

podman run --rm -it --systemd=always -p "127.0.0.1:2222:22" localhost/systemd-ssh
systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
Detected virtualization container-other.
Detected architecture x86-64.

Welcome to Ubuntu 18.04.1 LTS!

Set hostname to <4f3806c08277>.
Failed to read AF_UNIX datagram queue length, ignoring: No such file or directory
Failed to install release agent, ignoring: Permission denied
Failed to create /user.slice/user-1000.slice/session-2.scope/init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
[!!!!!!] Failed to allocate manager object, freezing.
Freezing execution.

markstos on 25 Jun 2020

Yeah ok. I will stand up a 20.04 VM for testing sometime soon.

goochjj on 25 Jun 2020

(Since kubic's 18.04 packages are still at 1.9.3)

goochjj on 25 Jun 2020

Thanks, @goochjj.

markstos on 25 Jun 2020

When podman starts, does it do a validation step to confirm that all the OS and kernel features are present before proceeding? That could help pre-empt hard to diagnose failures later due to unexpected or unsupported environment configurations.

markstos on 26 Jun 2020

My Ubuntu 20.04 LTS installation doesn't suffer from same problem. Using podman 2.0 with fuse-overlayfs.

Command: podman run --rm -it --systemd=always fedora /sbin/init

Produces:

systemd v243.8-1.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Fedora 31 (Container Image)!

Set hostname to <d737fda4545b>.
Initializing machine ID from random generator.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
[  OK  ] Reached target Local File Systems.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Slices.
[  OK  ] Reached target Swap.
[  OK  ] Listening on Process Core Dump Socket.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
ldconfig.service: unit configures an IP firewall, but the local system does not support BPF/cgroup firewalling.
(This warning is only shown for the first unit using IP firewalling.)
         Starting Rebuild Dynamic Linker Cache...
         Starting Journal Service...
         Starting Create System Users...
[  OK  ] Started Create System Users.
[  OK  ] Started Rebuild Dynamic Linker Cache.
[  OK  ] Started Journal Service.
         Starting Flush Journal to Persistent Storage...
[  OK  ] Started Flush Journal to Persistent Storage.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Create Volatile Files and Directories.
         Starting Rebuild Journal Catalog...
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Started Rebuild Journal Catalog.
         Starting Update is Completed...
[  OK  ] Started Update is Completed.
[  OK  ] Reached target System Initialization.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timers.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
         Starting Permit User Sessions...
[  OK  ] Started Permit User Sessions.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Started Update UTMP about System Runlevel Changes.

Podman info for reference:

host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.18, commit: '
  cpus: 4
  distribution:
    distribution: ubuntu
    version: "20.04"
  eventLogger: file
  hostname: ubuntu
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.4.0-37-generic
  linkmode: dynamic
  memFree: 289198080
  memTotal: 8348520448
  ociRuntime:
    name: runc
    package: 'containerd.io: /usr/bin/runc'
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10
      commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
      spec: 1.0.1-dev
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.0.0
      commit: unknown
      libslirp: 4.2.0
  swapFree: 1945247744
  swapTotal: 1964396544
  uptime: 79h 3m 20.92s (Approximately 3.29 days)
registries:
  search:
  - registry.access.redhat.com
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.centos.org
store:
  configFile: /home/someuser/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: 'fuse-overlayfs: /usr/bin/fuse-overlayfs'
      Version: |-
        fusermount3 version: 3.9.0
        fuse-overlayfs: version 0.7.6
        FUSE library version 3.9.0
        using FUSE kernel interface version 7.31
  graphRoot: /home/someuser/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 11
  runRoot: /run/user/1000/containers
  volumePath: /home/someuser/.local/share/containers/storage/volumes
version:
  APIVersion: 1
  Built: 0
  BuiltTime: Thu Jan  1 02:00:00 1970
  GitCommit: ""
  GoVersion: go1.13.8
  OsArch: linux/amd64
  Version: 2.0.0

I can ssh to port 2222 on localhost, after executing command:

podman run --systemd=always -it -p "127.0.0.1:2222:22" solita/ubuntu-systemd-ssh

@markstos I did complete cleanup on podman storage folders after upgrade and ran podman system migrate (just in case). The only differences that I can see between our configurations (unless you've hardened your ubuntu) is that I'm using fuse-overlayfs (which seems to work now well enough) and that you have a lot more images in your image store (which might indicate, that you didn't clean up your storage directories)

skorhone on 27 Jun 2020

I'm linking #6724 which is a permissions issue that appeared at the same time for podman on Ubuntu, but broken the "root" case instead of the "rootless" case.

markstos on 27 Jun 2020

OK, I'm not crazy.

Although Kubic doesn't offer downgrade packages, I found the 1.9.3 deb in /var/cache/archives. Changing nothing else, I un-installed podman 2.0 and re-installed podman 1.9.3 and re-ran the test case. It worked. There is definitely some kind of regression here. Considering that #6724 a parallel regression for the "root" case on Ubuntu during the 2.0 upgrade, presume thie rootless case is related. I'm not still not clear how my Ubuntu 20.04 install is different from the "clean" Ubuntu 20.04 VM that was tested above.

It seems like a really valuable feature for podman to have a way to check that all the permissions and OS features it needs are enabled.

I guess my personal problem is solved now-- I'll just make sure to pin my podman version to 1.9.3 until this gets sorted out!

With podman 1.9.3:

sudo podman run  --systemd=always  -it -p "127.0.0.1:2222:22" solita/ubuntu-systemd-ssh
systemd 229 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN)
Detected virtualization docker.
Detected architecture x86-64.

Welcome to Ubuntu 16.04.5 LTS!

Set hostname to <215f38849211>.
Initializing machine ID from D-Bus machine ID.
[  OK  ] Listening on Journal Socket.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Reached target Local File Systems.
[  OK  ] Reached target Swap.
[  OK  ] Created slice System Slice.
[  OK  ] Reached target Slices.
         Starting Journal Service...
         Starting Create Volatile Files and Directories...
[  OK  ] Reached target Paths.
[  OK  ] Started Journal Service.
[  OK  ] Started Create Volatile Files and Directories.
[  OK  ] Reached target System Initialization.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timers.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
         Starting Permit User Sessions...
         Starting LSB: Set the CPU Frequency Scaling governor to "ondemand"...
         Starting /etc/rc.local Compatibility...
         Starting Generate SSH host keys...
[  OK  ] Started D-Bus System Message Bus.
[  OK  ] Started Permit User Sessions.
[  OK  ] Started /etc/rc.local Compatibility.
[  OK  ] Started LSB: Set the CPU Frequency Scaling governor to "ondemand".
[  OK  ] Started Generate SSH host keys.
         Starting OpenBSD Secure Shell server...
[  OK  ] Started OpenBSD Secure Shell server.
[  OK  ] Reached target Multi-User System.

markstos on 27 Jun 2020

@skorhone Notice some differences in our different cases:

_podman 1.9.3 working:_ Detected virtualization _docker_.
_podman 2.0.0 not working:_ Detected virtualization _container-other_.
_podman 2.0.0 working:_ Detected virtualization _podman_.

Could that be related?

markstos on 27 Jun 2020

👍1

What causes podman to detect virtualization as docker vs container-other vs podman? I searched the whole containers Github organization for container-other and couldn't find any mentions of it in code.

What would cause some Ubuntu 20.04 systems to detect the same image as podman vs container-other when both are running podman 2.0.0?

markstos on 27 Jun 2020

@markstos You are getting packages from stable kubic repository?

skorhone on 27 Jun 2020

I'll double-check, but I'm fairly certain the tell-tales we leave in the container (most notably, the CONTAINER=podman environment variable) are identical between 1.9.x and 2.0.x. I'm fairly certain that's how systemd detects virtualization.

mheon on 27 Jun 2020

Hmmmm. They're not.

1.9.3, specifically when used with a Fedora-based image, has CONTAINER=oci (it looks like Fedora images deliberately define that, and it overrides our default on 1.9.3, but not 2.0).

You could try adding --env container=oci to your 2.0 containers and see if that changes what systemd detected

mheon on 27 Jun 2020

If I dopodman run --rm -it --systemd=always --env container=oci fedora /sbin/init to fedora, I get:

systemd v243.8-1.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization container-other.
Detected architecture x86-64.
...

Command podman run --rm -it --systemd=always fedora env:

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
TERM=xterm
container=podman
DISTTAG=f31container
FGC=f31
FBR=f31
HOSTNAME=cd4d6e620aef
HOME=/root

If this variable was overridden at system level, where would it be set?

Also, might be worth mentioning, that I have (or had) docker-ce installed on my ubuntu system. Could it have effect on different kind of results?

Edit: If I'm not mistaken, apparmor isn't active in rootless (podman complains with an error that describes it's not allowed in rootless, if you try to enable it). So it can't be apparmor.

skorhone on 28 Jun 2020

@markstos You are getting packages from stable kubic repository?

They come from here, which confusingly includes both "devel" and "stable" in the path:

deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_20.04/ /

markstos on 28 Jun 2020

Also, might be worth mentioning, that I have (or had) docker-ce installed on my ubuntu system. Could it have effect on different kind of results?

Could both be referencing settings in /etc/containers or does only Podman use those?

markstos on 28 Jun 2020

I investigated my package versions and did some strange things. First, I have a package called containers-common installed which is at version 1.0.0, but the "containers/common" project on Github hasn't reached 1.0.0 yet.

It looks like 2.0.1 has been released now on Kubic. I'll try upgrading to that and see if that helps.

markstos on 28 Jun 2020

We don't share config files we Docker, so I doubt that's it.

I do wonder if we're looking at different systemd versions here with different support for containers.

mheon on 28 Jun 2020

For reference my systemd version using systemctl --version:

systemd 245 (245.4-4ubuntu3.1)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid

skorhone on 28 Jun 2020

@markstos I have containers-common at version 1.0.0 as well:

containers-common/unknown,now 1.0.0~2 all [installed,automatic]
  Configuration files for working with image signatures.

skorhone on 28 Jun 2020

@skorhone I believe I have the same:

systemd 245 (245.4-4ubuntu3.1)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL > +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid

markstos on 28 Jun 2020

Actually, would this make any difference.

docker-ce package dependencies list containerd.io

docker-ce
  Depends: containerd.io

podman has runc, which is marked as conflicting package:

podman
 |Depends: runc
    containerd.io

Now executing sudo apt install runc would do following:

The following packages will be REMOVED:
  containerd.io
The following NEW packages will be installed:
  runc

My install order for virtualbox was docker-ce first followed by a podman installation

skorhone on 28 Jun 2020

If podman and docker-ce have dependencies to conflicting packages (which they shouldn't since many will run them on same machine), odd things will happen

skorhone on 28 Jun 2020

Here's my results from further testing today:

Upgrading to 2.0.1 did not resolve the issue for me.
Adding --env container=oci or --env CONTAINER=oci did not result change that container-other virtualization was detected.
I've lost the ability to do rootless builds. This was not due to the 2.0.1 upgrade. I'm no sure what caused it. Building as root still works. Here's the build failure:

error committing container for step {Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] Command:run Args:[apt update && apt install -y wget xz-utils git python-minimal make g++ libfontconfig curl libfreetype6 libfontconfig1 gpg rsync] Flags:[] Attrs:map[] Message:RUN apt update && apt install -y wget xz-utils git python-minimal make g++ libfontconfig curl libfreetype6 libfontconfig1 gpg rsync Original:RUN apt update && apt install -y wget xz-utils git python-minimal make g++ libfontconfig curl libfreetype6 libfontconfig1 gpg rsync}: error copying layers and metadata for container "a1c594215ea907acbe297bf74afc5f985a1bd08c97667fb37b3a4da53b3b02b9": Error committing the finished image: error adding layer with blob "sha256:fde75a5d68c37b09370ec8ac7176b0fa00364d51b849c1a3c6d90584c581fab4": Error processing tar file(exit status 1): operation not permitted

If I knew which files were involved, I could check the permissions, but maybe "operations not permitted" refers to Control Group or AppArmo violation? I'm not sure.

markstos on 28 Jun 2020

The containerd.io version I have installed is 1.2.13-2.

markstos on 28 Jun 2020

To those it may concern... I just built two identical Ubuntu 20.04 machines. Installed docker-ce and kubic packages. (Just for completeness)

One of them I set systemd.unified_cgroup_hierarchy=1 in the grub settings so I would get CgroupsV2, instead of the default which is hybrid. This one works great!

podman run --rm -it fedora cat /proc/self/cgroup
0::/

And podman run --rm -it fedora /sbin/init boots and systemd has absolutely 0 problems.

On the Cgroup1 machine however...

$ podman run --rm -it fedora /sbin/init
systemd v243.8-1.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Fedora 31 (Container Image)!

Set hostname to <be5d2bdcd154>.
Initializing machine ID from random generator.
Failed to create /user.slice/user-1000.slice/session-8.scope/init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

Cgroups:

$ podman run --rm -it fedora cat /proc/self/cgroup
12:perf_event:/
11:devices:/user.slice
10:net_cls,net_prio:/
9:blkio:/user.slice
8:memory:/user.slice/user-1000.slice/session-8.scope
7:rdma:/
6:hugetlb:/
5:cpuset:/
4:freezer:/
3:pids:/user.slice/user-1000.slice/session-8.scope
2:cpu,cpuacct:/user.slice
1:name=systemd:/user.slice/user-1000.slice/session-8.scope
0::/user.slice/user-1000.slice/session-8.scope

Doesn't jive with the Cgroups 2 install.
Crun doesn't want to run it at all:

$ podman --runtime /usr/bin/crun run --rm -it fedora cat /proc/self/cgroup
Error: cannot set limits without cgroups: OCI runtime error

WARN[0000] Failed to add conmon to cgroupfs sandbox cgroup: error creating cgroup for cpu: mkdir /sys/fs/cgroup/cpu/libpod_parent: permission denied
DEBU[0000] Received: -1
DEBU[0000] Cleaning up container c0aa72000d221c5cc8c66a130a37e6cdec33318feef748a52789b0a2cd409713
DEBU[0000] Tearing down network namespace at /run/user/1000/netns/cni-372d5028-1bb7-ee8b-6b5c-9d1cc64e4714 for container c0aa72000d221c5cc8c66a130a37e6cdec33318feef748a52789b0a2cd409713
DEBU[0000] unmounted container "c0aa72000d221c5cc8c66a130a37e6cdec33318feef748a52789b0a2cd409713"
DEBU[0000] Removing container c0aa72000d221c5cc8c66a130a37e6cdec33318feef748a52789b0a2cd409713
DEBU[0000] Removing all exec sessions for container c0aa72000d221c5cc8c66a130a37e6cdec33318feef748a52789b0a2cd409713
DEBU[0000] Cleaning up container c0aa72000d221c5cc8c66a130a37e6cdec33318feef748a52789b0a2cd409713
DEBU[0000] Network is already cleaned up, skipping...
DEBU[0000] Container c0aa72000d221c5cc8c66a130a37e6cdec33318feef748a52789b0a2cd409713 storage is already unmounted, skipping...
DEBU[0000] Container c0aa72000d221c5cc8c66a130a37e6cdec33318feef748a52789b0a2cd409713 storage is already unmounted, skipping...
DEBU[0000] ExitCode msg: "cannot set limits without cgroups: oci runtime error"
Error: cannot set limits without cgroups: OCI runtime error



md5-46082bb2d01530cda7b3521a76d7d869



$ ls -l /sys/fs/cgroup/
total 0
dr-xr-xr-x 6 root root  0 Jun 28 11:43 blkio/
lrwxrwxrwx 1 root root 11 Jun 28 11:43 cpu -> cpu,cpuacct/
dr-xr-xr-x 6 root root  0 Jun 28 11:43 cpu,cpuacct/
lrwxrwxrwx 1 root root 11 Jun 28 11:43 cpuacct -> cpu,cpuacct/
dr-xr-xr-x 3 root root  0 Jun 28 11:43 cpuset/
dr-xr-xr-x 6 root root  0 Jun 28 11:43 devices/
dr-xr-xr-x 4 root root  0 Jun 28 11:43 freezer/
dr-xr-xr-x 3 root root  0 Jun 28 11:43 hugetlb/
dr-xr-xr-x 6 root root  0 Jun 28 11:43 memory/
lrwxrwxrwx 1 root root 16 Jun 28 11:43 net_cls -> net_cls,net_prio/
dr-xr-xr-x 3 root root  0 Jun 28 11:43 net_cls,net_prio/
lrwxrwxrwx 1 root root 16 Jun 28 11:43 net_prio -> net_cls,net_prio/
dr-xr-xr-x 3 root root  0 Jun 28 11:43 perf_event/
dr-xr-xr-x 6 root root  0 Jun 28 11:43 pids/
dr-xr-xr-x 2 root root  0 Jun 28 11:43 rdma/
dr-xr-xr-x 6 root root  0 Jun 28 11:43 systemd/
dr-xr-xr-x 5 root root  0 Jun 28 11:43 unified/



md5-7cc257069e550856e6fd0c57b9bb1e9a



$ podman info --debug
host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.0.18, commit: '
  cpus: 2
  distribution:
    distribution: ubuntu
    version: "20.04"
  eventLogger: file
  hostname: FocalCG1Dev
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.4.0-39-generic
  linkmode: dynamic
  memFree: 145498112
  memTotal: 1028673536
  ociRuntime:
    name: runc
    package: 'containerd.io: /usr/bin/runc'
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10
      commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
      spec: 1.0.1-dev
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: 'slirp4netns: /usr/bin/slirp4netns'
    version: |-
      slirp4netns version 1.0.0
      commit: unknown
      libslirp: 4.2.0
  swapFree: 8583102464
  swapTotal: 8589930496
  uptime: 48m 36.55s
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /home/mrwizard/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: 'fuse-overlayfs: /usr/bin/fuse-overlayfs'
      Version: |-
        fusermount3 version: 3.9.0
        fuse-overlayfs: version 0.7.6
        FUSE library version 3.9.0
        using FUSE kernel interface version 7.31
  graphRoot: /home/mrwizard/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 1
  runRoot: /run/user/1000/containers
  volumePath: /home/mrwizard/.local/share/containers/storage/volumes
version:
  APIVersion: 1
  Built: 0
  BuiltTime: Wed Dec 31 19:00:00 1969
  GitCommit: ""
  GoVersion: go1.13.8
  OsArch: linux/amd64
  Version: 2.0.1

goochjj on 28 Jun 2020

I think I figured out why we are seeing different behavior. My cgroups looks following:

12:hugetlb:/
11:freezer:/
10:blkio:/user.slice
9:cpu,cpuacct:/user.slice
8:perf_event:/
7:memory:/user.slice/user-1000.slice/[email protected]
6:cpuset:/
5:pids:/user.slice/user-1000.slice/[email protected]
4:devices:/user.slice
3:rdma:/
2:net_cls,net_prio:/
1:name=systemd:/user.slice/user-1000.slice/[email protected]/apps.slice/apps-org.gnome.Terminal.slice/vte-spawn-6773d329-9ee1-450e-ae44-d5e4810e64a2.scope/ef312c6459eb19dc7f99f918f0eb90c7a231c59f50e40e9141018baa13c48b35
0::/user.slice/user-1000.slice/[email protected]/apps.slice/apps-org.gnome.Terminal.slice/vte-spawn-6773d329-9ee1-450e-ae44-d5e4810e64a2.scope

As you can see, I'm running my commands using gnome / xterm. Now, if I drop to console mode, I see exactly same behavior:

systemd v243.8-1.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Fedora 31 (Container Image)!

Set hostname to <9d16995357a6>.
Initializing machine ID from random generator.
Failed to create /user.slice/user-1000.slice/session-232.scope/init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

skorhone on 28 Jun 2020

👍1

Cgroups output when running podman from console:

12:hugetlb:/
11:freezer:/
10:blkio:/user.slice
9:cpu,cpuacct:/user.slice
8:perf_event:/
7:memory:/user.slice/user-1000.slice/session-232.scope
6:cpuset:/
5:pids:/user.slice/user-1000.slice/session-232.scope
4:devices:/user.slice
3:rdma:/
2:net_cls,net_prio:/
1:name=systemd:/user.slice/user-1000.slice/session-232.scope
0::/user.slice/user-1000.slice/session-232.scope

File system permissions:

total 0
drwxr-xr-x  6 root    root    0 kesä   28 16:27 .
drwxr-xr-x  3 root    root    0 kesä   28 16:27 ..
-rw-r--r--  1 root    root    0 kesä   28 20:18 cgroup.clone_children
-rw-r--r--  1 root    root    0 kesä   28 20:18 cgroup.procs
-rw-r--r--  1 root    root    0 kesä   28 20:18 notify_on_release
drwxr-xr-x  2 root    root    0 kesä   28 20:10 session-232.scope
drwxr-xr-x  2 root    root    0 kesä   28 20:18 session-2.scope
-rw-r--r--  1 root    root    0 kesä   28 20:18 tasks
drwxr-xr-x 44 someuser someuser 0 kesä   28 20:04 [email protected]
drwxr-xr-x  2 root    root    0 kesä   28 20:18 [email protected]

skorhone on 28 Jun 2020

👍1

I was able to execute podman succesfully using: systemd-run --user -P podman run --rm -it fedora /sbin/init

Clearly something has changed in podman, if you see difference in behavior.

skorhone on 28 Jun 2020

This works.
podman --runtime /usr/bin/crun run --rm -it --pids-limit 0 --cgroup-manager systemd fedora /sbin/init

crun doesn't run, because the spec contains:

  "linux": {
    "resources": {
      "pids": {
        "limit": 2048
      }
    },

And crun knows it can't implement that with cgroupsv1 (it seems)

To get crun to run it I need to specify both
--pids-limit 0 to prevent the resources from being set, and
--cgroup-manager systemd to get it to use systemd to do the cgroups manipulation.

runc (compiled from master) says this:
Error: cgroup v2 not enabled on this host, can't use systemd (rootless) as cgroups manager: OCI runtime error

So it must be using cgroupfs and not doing the right thing..???

systemd-run --user -P podman run --rm -it fedora /sbin/init

Also works for me. It seems this takes it out of my SSH session's scope:

$ systemd-run --user -P podman run --rm -it fedora cat /proc/self/cgroup
Running as unit: run-u98.service
12:perf_event:/
11:devices:/user.slice
10:net_cls,net_prio:/
9:blkio:/user.slice
8:memory:/user.slice/user-1000.slice/[email protected]
7:rdma:/
6:hugetlb:/
5:cpuset:/
4:freezer:/
3:pids:/user.slice/user-1000.slice/[email protected]
2:cpu,cpuacct:/user.slice
1:name=systemd:/user.slice/user-1000.slice/[email protected]/run-u98.service/ab4328bc6f7d6ad7f28e6c4f7b55364386627b5fa773f1f45ed2acd3b48cde21
0::/user.slice/user-1000.slice/[email protected]/run-u98.service

goochjj on 28 Jun 2020

Yeah

$ ls -l /sys/fs/cgroup/*/user.slice/user-1000.slice/[email protected]/cgroup.procs
-rw-r--r-- 1 root     root     0 Jun 28 14:35 '/sys/fs/cgroup/memory/user.slice/user-1000.slice/[email protected]/cgroup.procs'
-rw-r--r-- 1 root     root     0 Jun 28 14:35 '/sys/fs/cgroup/pids/user.slice/user-1000.slice/[email protected]/cgroup.procs'
-rw-r--r-- 1 mrwizard mrwizard 0 Jun 28 14:35 '/sys/fs/cgroup/systemd/user.slice/user-1000.slice/[email protected]/cgroup.procs'
-rw-r--r-- 1 mrwizard mrwizard 0 Jun 28 14:35 '/sys/fs/cgroup/unified/user.slice/user-1000.slice/[email protected]/cgroup.procs'

ls -l /sys/fs/cgroup/*/user.slice/user-1000.slice/cgroup.procs
-rw-r--r-- 1 root root 0 Jun 28 14:35 /sys/fs/cgroup/memory/user.slice/user-1000.slice/cgroup.procs
-rw-r--r-- 1 root root 0 Jun 28 14:35 /sys/fs/cgroup/pids/user.slice/user-1000.slice/cgroup.procs
-rw-r--r-- 1 root root 0 Jun 28 14:36 /sys/fs/cgroup/systemd/user.slice/user-1000.slice/cgroup.procs
-rw-r--r-- 1 root root 0 Jun 28 14:36 /sys/fs/cgroup/unified/user.slice/user-1000.slice/cgroup.procs

I set [email protected] Delegate=yes to see if it would help, but it didn't.

So since my ssh session is under my slice, but not under the systemd service, I don't have access to create a subgroup under systemd/unified.

Podman 1.9.3 seems to work around this beautifully.

$ bin/podman run --rm -it alpine cat /proc/self/cgroup
12:freezer:/
11:rdma:/
10:cpuset:/
9:perf_event:/
8:memory:/user.slice/user-1000.slice/[email protected]
7:blkio:/user.slice
6:cpu,cpuacct:/user.slice
5:pids:/user.slice/user-1000.slice/[email protected]
4:devices:/user.slice
3:hugetlb:/
2:net_cls,net_prio:/
1:name=systemd:/user.slice/user-1000.slice/[email protected]/user.slice/podman-6443.scope/98b82d3716af4136be4596617a3dc99bc20fea81af8c203a963d15697eb2a3d7
0::/user.slice/user-1000.slice/[email protected]/user.slice/podman-6443.scope

vs 2.0.1

$ podman run --rm -it alpine cat /proc/self/cgroup
12:freezer:/
11:rdma:/
10:cpuset:/
9:perf_event:/
8:memory:/user.slice/user-1000.slice/session-1.scope
7:blkio:/user.slice
6:cpu,cpuacct:/user.slice
5:pids:/user.slice/user-1000.slice/session-1.scope
4:devices:/user.slice
3:hugetlb:/
2:net_cls,net_prio:/
1:name=systemd:/user.slice/user-1000.slice/session-1.scope
0::/user.slice/user-1000.slice/session-1.scope

Notable differences:
1) systemd+unified cgroups are descended from [email protected], not my session scope.
2) memory and pids are descended from [email protected], not my session scope

goochjj on 28 Jun 2020

Since I did those last tests with the exact same OCI runtime it has to be in libpod somewhere.

goochjj on 28 Jun 2020

I think that the change that caused this issue is this one: #6569

skorhone on 28 Jun 2020

Hmm. @giuseppe Thoughts, given you're the one that wrote it?

mheon on 28 Jun 2020

I reviewed #6569. It was was fixing #4483 -- The change was to fix one or two dbus processes leaking at most. Considering the "fix" for that is causing show stopping container crashes for for a common case, one option is to revert the patch and get a 2.0.2 release out quickly, while continuing to look for a solution for solves the dbus process leak without introducing a crashing regression.

markstos on 29 Jun 2020

For people just finding the ticket, I updated the description with the simplified test case that was eventually discovered.

Would it be helpful if I tried to contribute a system test for this case? I have some experience with bash-based test suites.

markstos on 29 Jun 2020

Tests are always welcome

rhatdan on 30 Jun 2020

The change was to fix one or two dbus processes leaking at most. Considering the "fix" for that is causing show stopping container crashes for for a common case, one option is to revert the patch and get a 2.0.2 release out quickly, while continuing to look for a solution for solves the dbus process leak without introducing a crashing regression.

I think the change is correct as we should not try to use systemd when the cgroup manager is set to cgroupfs.

systemd only needs the name=systemd hierarchy to be usable. Can you show me the output for cat /proc/self/cgroup from the environment where the issue is happening?

Does it work if your wrap the podman command with systemd-run --scope --user podman ...?

giuseppe on 30 Jun 2020

@giuseppe Yes, wrapping podman command with systemd-run works. I actually mentioned it earlier - but it's a long chain

Edit: I missed that you wanted that we try --scope. systemd-run --scope --user podman works as well.

Results of cat /proc/self/cgroup:

12:hugetlb:/
11:freezer:/
10:blkio:/user.slice
9:cpu,cpuacct:/user.slice
8:perf_event:/
7:memory:/user.slice/user-1000.slice/session-616.scope
6:cpuset:/
5:pids:/user.slice/user-1000.slice/session-616.scope
4:devices:/user.slice
3:rdma:/
2:net_cls,net_prio:/
1:name=systemd:/user.slice/user-1000.slice/session-616.scope
0::/user.slice/user-1000.slice/session-616.scope

At filesystem level notable difference between session scope and user service is that latter is owned by user.

skorhone on 30 Jun 2020

So is a solution for podman to re-exec itself as systemd-run --scope --user podman in this case?

Considering this is a show-stopping crashing regression, is there is a goal for when the fix will be released?

markstos on 1 Jul 2020

What is your cgroups manager?

goochjj on 1 Jul 2020

Did you try adding --cgroup-manager systemd?

goochjj on 1 Jul 2020

What is your cgroups manager?

@goochjj I did some reserach but can't figure out how to determine what my managing my cgroups.

Did you try adding --cgroup-manager systemd?

That fails, but with a different error:

podman run --cgroup-manager systemd --systemd=always -it -p "127.0.0.1:2222:22" solita
/ubuntu-systemd-ssh
Error: systemd cgroup flag passed, but systemd support for managing cgroups is not    available: OCI runtime error

markstos on 1 Jul 2020

Interesting, after adding cgroup-manager systemd it popped right up for me.

What's your cat /proc/self/cgroup?

goochjj on 1 Jul 2020

On a Ubuntu 20.04 host:

cat /proc/self/cgroup
12:cpu,cpuacct:/
11:pids:/user.slice/user-1000.slice/session-2.scope
10:freezer:/
9:devices:/user.slice
8:memory:/user.slice/user-1000.slice/session-2.scope
7:hugetlb:/
6:blkio:/user.slice
5:perf_event:/
4:net_cls,net_prio:/
3:cpuset:/
2:rdma:/
1:name=systemd:/user.slice/user-1000.slice/session-2.scope
0::/user.slice/user-1000.slice/session-2.scope

markstos on 1 Jul 2020

$ cat /proc/self/cgroup
12:pids:/user.slice/user-1000.slice/session-39.scope
11:rdma:/
10:devices:/user.slice
9:hugetlb:/
8:blkio:/user.slice
7:perf_event:/
6:memory:/user.slice/user-1000.slice/session-39.scope
5:cpuset:/
4:cpu,cpuacct:/user.slice
3:freezer:/
2:net_cls,net_prio:/
1:name=systemd:/user.slice/user-1000.slice/session-39.scope
0::/user.slice/user-1000.slice/session-39.scope

$ podman run --pids-limit 0 --cgroup-manager systemd --systemd=always -it -p "127.0.0.1:2222:22" fedora /sbin/init
systemd v243.8-1.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Fedora 31 (Container Image)!

Not sure why mine works and yours doesn't.

Maybe post a podman --log-level debug run .....?

goochjj on 1 Jul 2020

Could it be related that as part of trying to solve this, I added this kernel option in /etc/default/grub?

$ grep systemd /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash systemd.unified_cgroup_hierarchy=1"

markstos on 1 Jul 2020

If that were loaded, your cgroup file would look like this:

0::/user.slice/user-1000.slice/session-61.scope

What does /proc/cmdline say?

goochjj on 1 Jul 2020

@goochjj You are right, it doesn't show that argument as being active:

cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.4.0-37-generic root=/dev/mapper/vgubuntu-root ro quiet splash vt.handoff=7

But if I look in /boot/grub/grub.cfg, it's there:

grep systemd /boot/grub/grub.cfg
        linux   /vmlinuz-5.4.0-39-generic root=/dev/mapper/vgubuntu-root ro  quiet splash systemd.unified_cgroup_hierarchy=1 $vt_handoff
                linux   /vmlinuz-5.4.0-39-generic root=/dev/mapper/vgubuntu-root ro  quiet splash systemd.unified_cgroup_hierarchy=1 $vt_handoff
                linux   /vmlinuz-5.4.0-37-generic root=/dev/mapper/vgubuntu-root ro  quiet splash systemd.unified_cgroup_hierarchy=1 $vt_handoff

But I also see that a new kernel has been added since I last rebooted, so maybe the change wasn't fully activated when I made it, but perhaps since this new kernel was added, the change will be activated next time I reboot? I have some tasks to unwind, but I'll give that a shot soon.

markstos on 1 Jul 2020

Here's the latest from testing on a Ubuntu 20.04 host with /etc/default/grub modified to enabled systemd.unified_cgroup_hierarchy=1. After rebooting, I confirmed the change was in effect:

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.4.0-39-generic root=/dev/mapper/vgubuntu-root ro quiet splash systemd.unified_cgroup_hierarchy=1

I also found an even simpler test image, which is just Ubuntu + Systemd.

I can now boot the Ubuntu 18.04 systemd container rootless:

$ podman  run -it jrei/systemd-ubuntu:18.04
systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
Detected virtualization container-other.
Detected architecture x86-64.

Welcome to Ubuntu 18.04.4 LTS!

Set hostname to <6841bede8441>.
Failed to read AF_UNIX datagram queue length, ignoring: No such file or directory
File /lib/systemd/system/systemd-journald.service:36 configures an IP firewall (IPAddressDeny=any), but the local system does not support BPF/cgroup based firewalling.
Proceeding WITHOUT firewalling in effect! (This warning is only shown for the first loaded unit using IP firewalling.)
[  OK  ] Reached target Swap.
[  OK  ] Reached target Paths.
[  OK  ] Created slice System Slice.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
[  OK  ] Reached target Slices.
[  OK  ] Reached target Local File Systems.
         Starting Create Volatile Files and Directories...
         Starting Journal Service...
[  OK  ] Started Create Volatile Files and Directories.
[  OK  ] Started Journal Service.
[  OK  ] Reached target System Initialization.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timers.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.

However, attempting to run the Ubuntu 16.04 container the same way fails:

podman  run -it jrei/systemd-ubuntu:16.04
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.
Freezing execution.

Personally, I don't need run Ubuntu 16.04 systemd containers but thought I would test one anyway.

I tried adding some flags for 16.04, but they didn't help:

podman  run -it --volume /sys/fs/cgroup:/sys/fs/cgroup:ro --systemd=always --privileged jrei/systemd-ubuntu:16.04

markstos on 2 Jul 2020

I highly doubt that Ubuntu 16.04 and systemd v229 have any clue what cgroupsv2 looks like.

goochjj on 2 Jul 2020

Yes - we've seen the same thing with RHEL/CentOS 7 on to of v2. Supposedly there's a way to mount cgroups v1 into just the one container to enable it, but when we looked into it, it was a major pain.

mheon on 2 Jul 2020

Also, we've vastly diverged from your original compliant. To wit:

podman193 run --rm -it jrei/systemd-ubuntu:16.04
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems, freezing.
Freezing execution.

So... Where exactly do we stand on this? :-D

goochjj on 2 Jul 2020

Supposedly there's a way to mount cgroups v1 into just the one container to enable it, but when we looked into it, it was a major pain.

crun 0.14 will allow to mount cgroup v1 inside of a container. It is just the name=systemd controller, but that is what systemd needs.

giuseppe on 2 Jul 2020

if you are using the development version of crun, you could try adding an annotation run.oci.systemd.force_cgroup_v1=/sys/fs/cgroup to the container.

giuseppe on 2 Jul 2020

@markstos can we have a summary at this point of
1) what the problem was
2) what the solution was (since I think it's resolved at this point), or if a workaround is needed (i.e. systemd-run), or what have you
3) what still needs resolution

This thread is really long - I think a reset/summary is a good idea

goochjj on 2 Jul 2020

Problem Statement

@goochjj There is a regression in the 2.0 release that prevents some systemd containers from launching in some environments. I think you captured it best in a comment four days ago:

https://github.com/containers/libpod/issues/6734#issuecomment-650806112

Here are the differences you noticed from 1.9 to 2.0:

Notable differences:

1. systemd+unified cgroups are descended from [email protected], not my session scope.
2. memory and pids are descended from [email protected], not my session scope

Root Cause

@skorhone believes the regression was introduced in this PR: https://github.com/containers/libpod/pull/6569

Known Workarounds

Enable cgroupsv2. Update /etc/default/grub to add systemd.unified_cgroup_hierarchy=1 to GRUB_CMDLINE_LINUX_DEFAULT and follow standard procedures to update Grub and reboot.
Don't use images with old versions of systemd, such as Ubuntu 16.04.

markstos on 2 Jul 2020

Known Workarounds

Enable cgroupsv2. Update /etc/default/grub to add systemd.unified_cgroup_hierarchy=1 to GRUB_CMDLINE_LINUX_DEFAULT and follow standard procedures to update Grub and reboot.

Don't use images with old versions of systemd, such as Ubuntu 16.04.

But to be clear, you tested this on Cgroups V2, and podman 1.9.3 can't run Systemd from Ubuntu 16.04 on Cgroups V2 either.

Did you test 16.04 on cgroups 1?

I show on Cgroups v1, you CAN run 16.04 images and old systemd images. It requires the same workarounds as before, systemd-run or whatever, to work on podman v2.

So ultimately.

1) Enable cgroupsv2. In your case, you do that with grub config, in FCOS, use kargs, etc. In F32, it's already default. Realize when you enable Cgroups V2, you lose the ability to run OLD systemd, in any form, in any container. Unless @giuseppe 's patch to crun, dev, allows this. That said, you can happily use 16.04 for whatever you want. The limitation is that systemd didn't get anything involving cgroups v2 until at least v233.

2) Launch your podman in a proper systemd scope, using systemd-run --scope --user or systemd-run --user -P
3) Manually specify cgroup-manager systemd:

(focal)mrwizard@FocalCG1Dev:~/src/podman
$ podman run --rm -it --cgroup-manager systemd jrei/systemd-ubuntu:16.04
systemd 229 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN)
Detected virtualization container-other.
Detected architecture x86-64.

Welcome to Ubuntu 16.04.6 LTS!

Set hostname to <91b5f3e21931>.
Failed to read AF_UNIX datagram queue length, ignoring: No such file or directory
Failed to install release agent, ignoring: Permission denied
[  OK  ] Created slice System Slice.
[  OK  ] Listening on Journal Socket.
[  OK  ] Reached target Slices.

If you want to be able to do sshd without systemd... you can.
podman --init --rm -it ubuntu:16.04 sh -c 'apt-get update && apt-get install -y openssh-server && /usr/sbin/sshd -D'

Obviously you'd do that in your build, not on the command line.
You could also use podman exec instead of ssh.
We don't know your use case, so we can't advise better.

NOW, if someone who knows more than me wants to figure out WHY --cgroup-manager systemd is required, why podman isn't automatically detecting that as an option or using it... those are actionable.

I can also say that setting

[engine]

# Cgroup management implementation used for the runtime.
# Valid options “systemd” or “cgroupfs”
#
cgroup_manager = "systemd"

Makes no difference, and the only warning I see cgroup related is:
time="2020-07-02T13:51:15-04:00" level=warning msg="Failed to add conmon to cgroupfs sandbox cgroup: error creating cgroup for blkio: mkdir /sys/fs/cgroup/blkio/libpod_parent: permission denied"

Which isn't entirely unexpected on CGv1. But the big question is - WHY is podman using cgroupfs?

goochjj on 2 Jul 2020

🚀1

To be clear - the regression wasn't introduced _directly_ in that patch, your real problem is cgroup-manager is wrong.

goochjj on 2 Jul 2020

@giuseppe fyi I can't get the annotation to work.

mrwizard@fedora32dev:~/src/podman
$ podman run --rm -it --annotation run.oci.systemd.force_cgroup_v1=/sys/fs/cgroup jrei/systemd-ubuntu:16.04
Error: mount `cgroup` to '/sys/fs/cgroup/systemd': Operation not permitted: OCI runtime permission denied error

goochjj on 2 Jul 2020

You need to be root. It is a privileged operation

giuseppe on 2 Jul 2020

👍1

$ sudo podman --runtime /usr/local/bin/crun run --rm -it --annotation run.oci.systemd.force_cgroup_v1=/sys/fs/cgroup jrei/systemd-ubuntu:16.04  ls -l /sys/fs/cgroup/
Error: mount `cgroup` to '/sys/fs/cgroup/systemd': Operation not permitted: OCI runtime permission denied error

goochjj on 2 Jul 2020

Using tag-0.14

goochjj on 2 Jul 2020

I wasn't aware of the issue as I could have documented it better, but it seems that it is necessary to mount the named hierarchy in the host first:

# mkdir /sys/fs/cgroup/systemd && mount -t cgroup cgroup -o none,name=systemd,xattr /sys/fs/cgroup/systemd

also please enforce the systemd mode with --systemd always unless your init binary is /sbin/init or systemd

If you create a subcgroup like:

# mkdir /sys/fs/cgroup/systemd/1000
# chown 1000:1000 mkdir /sys/fs/cgroup/systemd/1000
# echo $ROOTLESS_TERMINAL_PROCESS_PID > /sys/fs/cgroup/systemd/1000/cgroup.procs

you'll be able to use the feature also as rootless (user 1000 assumed in my example above).

giuseppe on 2 Jul 2020

👍1

OK that is just really cool. This is how I would implement.

As root: (likely wrap this up in a systemd unit to run at boot time, maybe systemd-cgv1.service)

mkdir /sys/fs/cgroup/systemd
mount -t cgroup cgroup -o none,name=systemd,xattr /sys/fs/cgroup/systemd

And wrap this up in something like [email protected]:

mkdir -P /sys/fs/cgroup/systemd/user.slice/user-1000.slice
chown -R 1000.1000 /sys/fs/cgroup/systemd/user.slice/user-1000.slice

Since we're making our own convention it could even use username instead of uid...

Note this essentially creates a kinda unified hybrid... Host systemd doesn't know about the v1 cgroup, which I suppose is fine.

$ cat /proc/self/cgroup
1:name=systemd:/
0::/user.slice/user-1000.slice/session-16.scope

So, as user - add this to your .bash_profile (or similar):

echo $BASHPID > "/sys/fs/cgroup/systemd/user.slice/user-$UID.slice/cgroup.procs"

And then you're good to go.
podman run --rm -it --annotation run.oci.systemd.force_cgroup_v1=/sys/fs/cgroup jrei/systemd-ubuntu:16.04

That works for me. That's pretty cool @giuseppe

goochjj on 3 Jul 2020

👍1

So are there ultimate any action items here?

markstos on 13 Jul 2020

I don't think we can revert the patch (https://github.com/containers/podman/pull/6569) that caused this, given it does solve another bug. Also, interacting with systemd with cgroup-manager is set to cgroupfs is inherently risky (we might be running on a system without systemd, so we can't rely on it being present). We could potentially look into setting up a fresh systemd cgroup as part of systemd-integrated containers only, but that will run into similar problems. Giuseppe's mount-v1-in-v2 solution is similarly interesting, but requires root for the initial setup, so I don't think we can rely on it by default.

mheon on 13 Jul 2020

This is the only part that's actionable IMHO:

NOW, if someone who knows more than me wants to figure out WHY --cgroup-manager systemd is required, why podman isn't automatically detecting that as an option or using it... those are actionable.

goochjj on 13 Jul 2020

I'm going to rename the issue for visibility, so people landing on the issue tracker with this can find it.

@giuseppe - Is it possible we can make the removed call to systemd to make a cgroup part of the systemd integration code? We'll still potentially leak dbus processes, but only in the case where the user was trying to start systemd-in-a-container on a non-systemd distro, from my understanding?

mheon on 17 Jul 2020

bump

dustymabe on 24 Jul 2020

@giuseppe PTAL

vrothberg on 24 Jul 2020

if reverting (#6569) solves your issue, you can force a new scope wrapping podman with systemd-run as systemd-run --user --scope podman ....

In your case it will be: systemd-run --user --scope podman run -it jrei/systemd-ubuntu:16.04

giuseppe on 11 Aug 2020

👍1

In your case it will be: systemd-run --user --scope podman run -it jrei/systemd-ubuntu:16.04

Thanks for this. It's useful for Molecule users with this problem. Molecule works again with Podman 2 on Ubuntu when running alias podman="systemd-run --user --scope podman".

c-goes on 14 Aug 2020

Note that we've got this issue flagged as something to be fixed before we switch to podman 2.x in Fedora CoreOS. Is there any resolution or more information that we should be using to inform our decision here?

Context: https://github.com/coreos/fedora-coreos-tracker/issues/575

dustymabe on 14 Aug 2020

@giuseppe is this something you can look at?

baude on 17 Aug 2020

PR: https://github.com/containers/podman/pull/7339

Can anyone who is on cgroup v1 please try it?

giuseppe on 17 Aug 2020

@dustymabe ^^ Mind testing this?

mheon on 17 Aug 2020

I can test if someone gives me a link to an RPM. Sorry for the delayed response.

dustymabe on 20 Aug 2020

I tried compiling podman today
This is the output of testing podman master branch:

ubuntu@test:~$ podman version
Version:      2.1.0-dev
API Version:  1
Go Version:   go1.13.8
Git Commit:   f99954c7ca4428e501676fa47a63b5cecadd9454
Built:        Wed Aug 26 22:23:48 2020
OS/Arch:      linux/amd64

ubuntu@test:~$ podman run --name systemd1 --privileged  --security-opt=seccomp=unconfined -it --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro geerlingguy/docker-ubuntu2004-ansible:latest
systemd 244.3-1ubuntu1 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Ubuntu Focal Fossa (development branch)!

Set hostname to <3c67712e3767>.
Failed to create /user.slice/user-1000.slice/session-34.scope/init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

Rootless with systemd-run:

ubuntu@test:~$ systemd-run --user --scope podman run --name systemd1 --privileged  --security-opt=seccomp=unconfined -it --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro geerlingguy/docker-ubuntu2004-ansible:latest
Running scope as unit: run-r7a966731f3244da7995d2bd80fa9ae3c.scope
systemd 244.3-1ubuntu1 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Ubuntu Focal Fossa (development branch)!

Set hostname to <bf428a64a7e5>.
Couldn't move remaining userspace processes, ignoring: Input/output error
/usr/lib/systemd/system-generators/systemd-crontab-generator failed with exit status 1.
/lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/dbus/system_bus_socket → /run/dbus/system_bus_socket; please update the unit file accordingly.
Unnecessary job for /dev/sda1 was removed.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
proc-sys-fs-binfmt_misc.automount: Failed to initialize automounter: Operation not permitted
proc-sys-fs-binfmt_misc.automount: Failed with result 'resources'.
[FAILED] Failed to set up automount Arbitrary Executable File Formats File System Automount Point.
See 'systemctl status proc-sys-fs-binfmt_misc.automount' for details.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Slices.
[  OK  ] Reached target Swap.
[  OK  ] Listening on Syslog Socket.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
         Mounting Kernel Debug File System...
         Starting Journal Service...
         Starting Remount Root and Kernel File Systems...
         Starting Apply Kernel Variables...
sys-kernel-debug.mount: Mount process exited, code=exited, status=32/n/a
sys-kernel-debug.mount: Failed with result 'exit-code'.
[FAILED] Failed to mount Kernel Debug File System.
See 'systemctl status sys-kernel-debug.mount' for details.
[  OK  ] Started Remount Root and Kernel File Systems.
         Starting Create System Users...
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Started Journal Service.
         Starting Flush Journal to Persistent Storage...
[  OK  ] Started Flush Journal to Persistent Storage.
[  OK  ] Started Create System Users.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Started Create Static Device Nodes in /dev.
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Create Volatile Files and Directories.
         Starting Network Name Resolution...
[  OK  ] Reached target System Time Set.
[  OK  ] Reached target System Time Synchronized.
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Reached target System Initialization.
[  OK  ] Started systemd-cron path monitor.
[  OK  ] Started Daily apt download activities.
[  OK  ] Started Daily apt upgrade and clean activities.
[  OK  ] Started systemd-cron daily timer.
[  OK  ] Started systemd-cron hourly timer.
[  OK  ] Started systemd-cron monthly timer.
[  OK  ] Started systemd-cron weekly timer.
[  OK  ] Started Periodic ext4 Online Metadata Check for All Filesystems.
[  OK  ] Started Message of the Day.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target systemd-cron.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Timers.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
[  OK  ] Started D-Bus System Message Bus.
[  OK  ] Started Save initial kernel messages after boot.
         Starting Remove Stale Online ext4 Metadata Check Snapshots...
         Starting System Logging Service...
         Starting Login Service...
         Starting Permit User Sessions...
[  OK  ] Started System Logging Service.
[  OK  ] Started Permit User Sessions.
[  OK  ] Started Network Name Resolution.
[  OK  ] Reached target Host and Network Name Lookups.
[  OK  ] Started Login Service.
[  OK  ] Reached target Multi-User System.
[  OK  ] Started Remove Stale Online ext4 Metadata Check Snapshots.
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Started Update UTMP about System Runlevel Changes.

Rootfull works however:

ubuntu@test:~$ sudo podman run --name systemd1 --privileged  --security-opt=seccomp=unconfined -it --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro geerlingguy/docker-ubuntu2004-ansible:latest
systemd 244.3-1ubuntu1 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Ubuntu Focal Fossa (development branch)!

Set hostname to <106593ccc268>.
/lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/dbus/system_bus_socket → /run/dbus/system_bus_socket; please update the unit file accordingly.
Unnecessary job for /dev/sda1 was removed.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
[  OK  ] Set up automount Arbitrary Executable File Formats File System Automount Point.
[  OK  ] Reached target Local Encrypted Volumes.
[  OK  ] Reached target Remote File Systems.
[  OK  ] Reached target Slices.
[  OK  ] Reached target Swap.
[  OK  ] Listening on Syslog Socket.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[  OK  ] Listening on Journal Audit Socket.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
         Mounting Huge Pages File System...
         Mounting Kernel Debug File System...
         Starting Journal Service...
         Mounting FUSE Control File System...
         Starting Remount Root and Kernel File Systems...
         Starting Apply Kernel Variables...
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Mounted Kernel Debug File System.
[  OK  ] Mounted FUSE Control File System.
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Started Remount Root and Kernel File Systems.
         Starting Create System Users...
[  OK  ] Started Create System Users.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Started Create Static Device Nodes in /dev.
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
[  OK  ] Started Journal Service.
         Starting Flush Journal to Persistent Storage...
[  OK  ] Started Flush Journal to Persistent Storage.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Create Volatile Files and Directories.
         Starting Network Name Resolution...
[  OK  ] Reached target System Time Set.
[  OK  ] Reached target System Time Synchronized.
         Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Reached target System Initialization.
[  OK  ] Started systemd-cron path monitor.
[  OK  ] Started Daily apt download activities.
[  OK  ] Started Daily apt upgrade and clean activities.
[  OK  ] Started systemd-cron daily timer.
[  OK  ] Started systemd-cron hourly timer.
[  OK  ] Started systemd-cron monthly timer.
[  OK  ] Started systemd-cron weekly timer.
[  OK  ] Started Periodic ext4 Online Metadata Check for All Filesystems.
[  OK  ] Started Message of the Day.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target systemd-cron.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Timers.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
[  OK  ] Started D-Bus System Message Bus.
[  OK  ] Started Save initial kernel messages after boot.
         Starting Remove Stale Online ext4 Metadata Check Snapshots...
         Starting System Logging Service...
         Starting Login Service...
         Starting Permit User Sessions...
[  OK  ] Started System Logging Service.
[  OK  ] Started Permit User Sessions.
[  OK  ] Started Network Name Resolution.
[  OK  ] Reached target Host and Network Name Lookups.
[  OK  ] Started Login Service.
[  OK  ] Reached target Multi-User System.
[  OK  ] Started Remove Stale Online ext4 Metadata Check Snapshots.
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Started Update UTMP about System Runlevel Changes.

c-goes on 27 Aug 2020

@c-goes - see https://github.com/containers/podman/issues/7441. I think you're hitting that.

dustymabe on 27 Aug 2020

👍1

Podman: After podman 2 upgrade, systemd fails to start in containers on cgroups v1 hosts

All 148 comments

Problem Statement

Root Cause

Recommended Solutions

Known Workarounds

Known Workarounds

Related issues