This is an RFE after talking with @mheon for a bit in IRC (thanks for that, sorry I kept you so late). In the shortest form I can think of, the enhancement would be: facilitate podman/conmon interacting with systemd in a way that provides console output for systemctl and journalctl. In bullet form:

- Create a "system" user with a /sbin/nologin shell
- Create a unit file (/etc/systemd/system/<unit>.service) that specifies that "system" user in User=
- systemctl start <unit>.service and be able to see the console output of the container
- journalctl -u <unit>.service and be able to see the historical console output of the container

My use case is that I want to use podman to run images that are essentially "system" services, but as a "user" because I want the rootless isolation. I've been consuming podman for a bit now (starting with 1.8.2) and am likely stuck on that version because in newer versions my approach gets broken: I lose all logging from the container. I have tried --log-driver=journald but have no idea how to find a hand-hold for the console output (what -u should I be looking for? It's not the unit name).
Here is an example with mattermost; under 1.8.2 this works how I'd like it to work (i.e. I'm getting console output). I'm doing some things that are different from what podman generate systemd offers, but that's because my explicit goal is to:

- get console output through systemd/journald rather than having to dig it out by hand (sudo -u <user> -h <home> podman logs <container-name>)

[root@vault ~]# systemctl cat podman-mattermost.service
# /etc/systemd/system/podman-mattermost.service
[Unit]
Description=Podman running mattermost
Wants=network.target
After=network-online.target
Requires=podman-mattermost-postgres.service
[Service]
WorkingDirectory=/app/gitlab
User=gitlab
Group=gitlab
Restart=no
ExecStartPre=/usr/bin/rm -f %T/%N.pid %T/%N.cid
ExecStartPre=/usr/bin/podman rm --ignore -f mattermost
ExecStart=/usr/bin/podman run --conmon-pidfile %T/%N.pid --cidfile %T/%N.cid --cgroups=no-conmon \
--name=mattermost \
--env-file /app/gitlab/mattermost/mattermost.env \
--publish 127.0.0.1:8065:8065 \
--security-opt label=disable \
--health-cmd=none \
--volume /app/gitlab/mattermost/data:/mattermost/data \
--volume /app/gitlab/mattermost/logs:/mattermost/logs \
--volume /app/gitlab/mattermost/config:/mattermost/config \
--volume /app/gitlab/mattermost/plugins:/mattermost/client/plugins \
docker.io/mattermost/mattermost-team-edition:release-5.24
ExecStop=/usr/bin/podman stop --ignore mattermost -t 30
ExecStopPost=/usr/bin/podman rm --ignore -f mattermost
ExecStopPost=/usr/bin/rm -f %T/%N.pid %T/%N.cid
KillMode=none
Type=simple
[Install]
WantedBy=multi-user.target default.target
[root@vault ~]# systemctl cat podman-mattermost-postgres.service
# /etc/systemd/system/podman-mattermost-postgres.service
[Unit]
Description=Podman running postgres for mattermost
Wants=network.target
After=network-online.target podman-mattermost.service
PartOf=podman-mattermost.service
[Service]
WorkingDirectory=/app/gitlab
User=gitlab
Group=gitlab
Restart=no
ExecStartPre=/usr/bin/rm -f %T/%N.pid %T/%N.cid
ExecStartPre=/usr/bin/podman rm --ignore -f postgres
ExecStart=/usr/bin/podman run --conmon-pidfile %T/%N.pid --cidfile %T/%N.cid --cgroups=no-conmon \
--name=postgres \
--env-file /app/gitlab/mattermost/postgres.env \
--net=container:mattermost \
--volume /app/gitlab/mattermost/postgres:/var/lib/postgresql/data:Z \
docker.io/postgres:12
ExecStop=/usr/bin/podman stop --ignore postgres -t 30
ExecStopPost=/usr/bin/podman rm --ignore -f postgres
ExecStopPost=/usr/bin/rm -f %T/%N.pid %T/%N.cid
KillMode=none
Type=simple
[Install]
WantedBy=multi-user.target default.target
With these units above I am able to:
- Run the containers attached so their output reaches systemd (Type=simple and lack of -d)
- See the console output with systemctl <unit> and journalctl -u <unit>
- Have podman clean up before and after each run (ExecPre and ExecStop)
- Make podman-mattermost.service require the podman-mattermost-postgres.service (Requires=)
- Ensure podman-mattermost-postgres.service will get a stop signal if I stop podman-mattermost.service (PartOf=); however, podman-mattermost.service closes out the networking namespace before podman-mattermost-postgres.service can finish up (I think), so it's not ideal... I'd be interested in suggestions.

Tagging @lsm5 as well, since I think for my use case I'm relegated to using 1.8.2 in F32 for the time being... so I am wondering if that is going away any time soon?
If I didn't hit it clearly: I did try to adopt 1.9.2. It requires a couple of things (but ultimately does not work well); #6084 has some more information as well:

- loginctl enable-linger on the "system" user
- the kind of settings podman generate systemd produces, like -d and Type=forking

Starting this up, you can only see console output from the container by doing sudo -u <user> -h <home> podman logs <containername>; systemctl/journalctl give you nothing.
The --log-driver=journald doesn't allow for anything better... because I can't figure out what the unit is to actually query logs from (I think it might be some composite of the container id?)... and when you do a sudo -u <user> -h <home> podman logs <containername> you get nothing.
you can get 1.8.2-2 from https://koji.fedoraproject.org/koji/buildinfo?buildID=1479547
I'll save it to my fedorapeople page as well and send you the URL later.
If you enable linger mode and there is already a user session running, is there any disadvantage in installing the .service file into ~/.config/systemd/user/?
@giuseppe for you to be able to do that you'd need a shell for that "system" account. Above I'm creating the user as root with a /sbin/nologin shell. To access the systemctl --user session you'd actually need to log in, or you'd need to set the XDG_RUNTIME_DIR variable... I think... (it could also be DBUS_SESSION_BUS_ADDRESS) like XDG_RUNTIME_DIR=/run/user/$UID systemctl --user status. It generally gets messy.
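For reference, a sketch of poking at such a session without a login shell; it assumes linger is enabled so /run/user/<uid> exists, and the account name is only a placeholder:

# Talk to the user manager of a nologin "system" account by pointing at its runtime dir
sudo -u mysvcuser env XDG_RUNTIME_DIR=/run/user/$(id -u mysvcuser) systemctl --user status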
Also, this suggestion doesn't address what I'm primarily asking for above: wanting console output from the running container to be seen by systemd/journald. Without being able to see that combination through systemctl and journalctl, you have an extremely hard time figuring out what is going on with the system (you have to look in multiple places to piece together the state of errors).
@vrothberg The core ask here (viewing logs for systemd-managed Podman) seems to be a pretty valid one - our current Type=forking approach does break this, and podman logs becomes very inconvenient when the services are running rootless and you have to sudo into each of them to get logs.
I was thinking that it ought to be possible for the journald log driver to write straight to the logs for the unit file if we know it, and we did add something similar for auto-update?
There was a very similar request by @lucab: https://github.com/coreos/fedora-coreos-docs/pull/75#issuecomment-633512357
I was also thinking about the log driver :+1:
@ashley-cui Could you look into the --log-driver changes?
@storrgie I've been pursuing similar things recently.
Do -d, and keep the forking.
Enable --log-driver journald
That alone should take care of all container logs showing up in journald, you just need to do
journalctl CONTAINER_NAME=mattermost
As conmon will be providing those keys - CONTAINER_ID and CONTAINER_NAME. I've been doing lots of testing; basically what I've been doing is: start a container, generate the output to journald, then use journalctl -n 10 to grab the last 10 lines and find a line it logged, tweaking for 20 or 30 lines or whatever it takes. Then journalctl -n 10 -o json-pretty or -o json to get the raw line and figure out what other metadata you have to work with.
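A small sketch of that inspection loop (the container name is just a placeholder; the field names are the ones described above):

# Dump the last few journal records with all structured fields visible
journalctl -n 10 -o json-pretty
# Then match on the conmon-provided fields
journalctl CONTAINER_NAME=mattermost
journalctl CONTAINER_ID=<short id taken from the JSON output>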
You could use CONTAINER_TAG too... i.e. add --log-opt tag=WhateverYouWant and find it with
journalctl CONTAINER_TAG=WhateverYouWant
If you want it to show under the unit, like I do, I do this:
--cgroup-parent=/system.slice/%n --cgroup-manager cgroupfs
Note, my container is root, not rootless, and the host is running Flatcar. My guess is you can get similar results by possibly tweaking the cgroup-parent. By putting the processes under the cgroup, systemd finds that they're associated with a unit - but I'd expect conmon being in the correct cgroup SHOULD be all you need.
The added benefit of running all the processes in the systemd service's cgroup is that bind mounted /dev/log ALSO associates to the unit file, automagically. You don't get the automagic CONTAINER_NAME from conmon journald records, but you DO get anything you put in the service file as a LogExtraField - so you could use that to find your logs as well.
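As a rough illustration of that, assuming the container's processes really are attributed to the unit (the MYAPP field name is purely made up):

[Service]
# journald attaches this extra field to every record it attributes to this unit,
# so it becomes another handle for finding the container's logs.
LogExtraFields=MYAPP=mattermost

and then query it with journalctl MYAPP=mattermost.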
I'm running rootless containers on Fedora Server. I'm able to see logs using --log-opt tag=<tag> and journalctl CONTAINER_TAG=<tag>. However, when I add --cgroup-parent=/system.slice/%n --cgroup-manager cgroupfs, my units fail with result 'exit-code'. @rhatdan, are they failing because they're rootless?
I really do not recommend running --cgroup-manager=cgroupfs with systemd-managed Podman - you end up with both systemd and Podman potentially altering the same cgroup, and I think there's the potential for them to trample each other. If you want to stay in the systemd cgroup, I'd recommend using the crun OCI runtime and passing --cgroups=disabled to prevent Podman from creating a container cgroup. We lose the ability to set resource limits, but you can just set them from within the systemd unit, so it's not a big loss.
(There is also --cgroups=no-conmon to only place Conmon in the systemd cgroup - we use that by default in unit files from podman generate systemd)
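A minimal sketch of that recommendation, assuming crun is installed at /usr/bin/crun and the resource limits move into the unit (the limit and names are illustrative):

[Service]
# systemd owns the cgroup and its limits; Podman skips creating a container cgroup.
MemoryMax=2G
ExecStart=/usr/bin/podman --runtime /usr/bin/crun run --cgroups=disabled \
    --name=mattermost docker.io/mattermost/mattermost-team-edition:release-5.24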
I see traffic on the mailing list from @rhatdan about an FAQ... I'm feeling more and more as I learn about this project that the idea this can "replace" docker is basically gimmicky at this stage. There is no clear golden pathway for running containers as daemons on systems with podman+systemd. It seems fraught with edge cases. I'd really love to see this ticket be taken seriously as I think there are a LOT of people trying to depart docker land and systemd+podman is a way to rid yourself of the docker monolithic daemon.
I think we definitely need a single page containing everything we recommend about running containers inside units (best practices, and the reasons for them). I've probably explained why we made the choice for forking vs simple five times at this point; having a single page with a definitive answer on that would be greatly helpful to everyone. We'll need to hash some things out as part of this, especially the use of rootless Podman + root systemd as this issue asks, but even getting the basics written down would be a start.
@mheon that would indeed help, but I'm not sure that's going to solve much. For example, from the thread at https://github.com/coreos/fedora-coreos-docs/pull/75, that content currently exists in the form of a blog post (https://www.redhat.com/sysadmin/podman-shareable-systemd-services) which unfortunately is:

- already stale at this point (podman-generate doesn't generate that unit anymore)
- not really integrating well with systemd service handling (e.g. journald, sd-notify, user setting, etc.)
- somehow concerning/fragile (e.g. KillMode=none)

I think it would be better to first devise a podman mode which works well when integrated in the systemd ecosystem, and only then document it.
As a sidenote, many containerized services (eg. etcd, haproxy, etc.) do use sd-notify in order to signal when they are actually initialized and ready to start serving requests. For that kind of autoscale-friendly logic to work, a Type=notify service unit would be required.
I believe the reason we can't auto-generate Type=notify is because things are not good if the app in the container does not support it (Podman can hang), but it should work if you set it (though I'm actually not sure if it respects our PID files - if it acts like Type=simple in that respect it will never be really safe to use).

On the rest, I think the most important thing is getting logging via Journald working properly. Some things like KillMode I do not expect to be resolved, and I honestly don't view it as a problem - our design here is different than typical services by necessity (running without a daemon forced this), so we don't quite fit into the usual pattern Systemd expects. Podman will still guarantee that things are cleaned up on stop, as we would if we are not managed by Systemd.
On the user setting specifically - I still believe that is an issue with Systemd. We're in contact with the Systemd team to try and find a solution.
We got the User setting working, it was mainly a problem with -d, no? Unless there's something else outstanding, I think that's solved. Similarly, the journald log-driver works well for me... unless you try to log a tty, which would be a bad idea anyway, now that exec is fixed.
Systemd integration isn't great with docker either - docker's log-driver is exactly analogous to what conmon does, docker's containers are launched by the daemon which puts them in another cgroup, unless you use cgroup-parent tricks, and sometimes getting the container to work right w.r.t. logging and groups requires hacks like systemd-docker which throws a hacky shim around sd-notify. So are we really saying podman+systemd is somehow worse? Or just not better? Because it seems better to me. Doesn't seem like Docker has a golden pathway either.
I've run docker w/ cgroup-parent sharing the unit's cgroup and systemd-docker (even though it's unsupported) for over a year, and haven't had any problems with systemd and docker fighting. I'm not sure why podman would... but I defer to the experts.
The only thing I have with docker now that I don't have with podman is bind mounting /dev/log works - because I put the docker container in the same cgroup as the unit. Without that, I'd need some sort of syslog proxy, which would probably have to live in conmon, and is a whole other discussion and probably only relevant to me.
@mheon that would indeed help, but I'm not sure that's going to solve much. For example, from the thread at coreos/fedora-coreos-docs#75, that content currently exists in the form of a blog post which unfortunately is:
* already stale at this point (podman-generate doesn't generate that unit anymore)
That's not accurate. We just updated the blog post last week and do that regularly. The units are still generated the same way. Once Podman v2 is out, we need to create some upstream docs as a living document and point the blog post there.
* not really integrating well with systemd service handling (e.g. journald, sd-notify, user setting, etc)
We only support Type=forking with podman generate systemd.
* somehow concerning/fragile (e.g. `KillMode=none`)
We've been discussing that already in depth. We want Podman to handle shutdown (and killing) and prevent signal races with systemd which does not know the order in which all processes should be killed.
I think it would be better to first devise a podman mode which works well when integrated in the systemd ecosystem, and only then document it.
As a sidenote, many containerized services (e.g. etcd, haproxy, etc.) do use sd-notify in order to signal when they are actually initialized and ready to start serving requests. For that kind of autoscale-friendly logic to work, a Type=notify service unit would be required.
Type=notify is supported but we don't generate them with podman generate systemd. I guess this could be part of an upstream doc?
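For what it's worth, a hand-written Type=notify unit might look roughly like the sketch below. It assumes the application inside the container actually sends READY=1 over the notify socket and that Podman passes NOTIFY_SOCKET through; as noted earlier in the thread, Podman can hang if the app never notifies. The name and image are placeholders.

[Service]
Type=notify
# Accept readiness notifications from any process in the unit's cgroup, not just the main PID.
NotifyAccess=all
ExecStart=/usr/bin/podman run --name=<name> <image-that-calls-sd_notify>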
I think we definitely need a single page containing everything we recommend about running containers inside units (best practices, and the reasons for them). I've probably explained why we made the choice for forking vs simple five times at this point; having a single page with a definitive answer on that would be greatly helpful to everyone. We'll need to hash some things out as part of this, especially the use of rootless Podman + root systemd as this issue asks, but even getting the basics written down would be a start.
I agree and made a similar conclusion last week when working with support on some issues. Once v2 is out (and all fixes are in), I'd love us to create a living upstream document that the blog post can link to.
I opened https://github.com/containers/libpod/issues/6604 to break out the logging discussion.
@vrothberg thanks! I shouldn't have piled up more topics in here, sorry for that.
If you prefer, I can split the other ones (e.g. sd-notify) to their own tickets, so they can be incrementally closed as soon as we are done.
No worries at all, @lucab! All input and feedback is much appreciated.
If you prefer, I can split the other ones (e.g. sd-notify) to their own tickets, so they can be incrementally closed as soon as we are done.
That would be great, sure. While we support sd-notify, we don't generate these types. Having a dedicated issue will help us agree on what such a unit should look like and eventually get that into upstream docs (and man pages). Thanks a lot!
Since we're having this discussion, and there's plenty of talk about KillMode, and cgroups, and where things should reside - it makes sense to me that podman's integration with systemd already has a blueprint: systemd-nspawn. The systemd-nspawn@.service unit includes things like:
KillMode=mixed
Delegate=yes
Slice=machine.slice
This means (among other things) you end up with
/machine.slice/unit.service/supervisor - which contains the systemd-nspawn ("conmon"-esque) process, and
/machine.slice/unit.service/payload - which contains the contained processes
And systemd has no problem monitoring the supervisor Pid, I'm guessing because Delegate is set, and it's a sub-cgroup.
nspawn has options like --slice, --property, --register, and --keep-unit - probably all of which should be implemented similarly in podman... and the caveats are already spelled out in the documentation.
https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html
nspawn also has options for the journal - how it's bind mounted and supported, plus setting the machine ID properly for those logs... etc.
I'd imagine we'd want nspawn to be the template?
And doing Delegate and sub-cgroups like that also means systemctl status knows the Main PID is the supervisor, but shows the full process tree including the payload clearly in the status output, and the service type is sd-notify, so I imagine it's talking back to systemd to let it know these things.
For that matter I've wondered if it's possible to use/wrap/hack/mangle something into place to allow systemd-nspawn itself to be the OCI container runtime, instead of crun or runc. Moreso a thought experiment than anything else, but the key hangup seems to be nspawn wants a specific mount to use, which podman can provide since it already did all the work to create the appropriate overlay bindmount.
Probably involves reading config.json and turning it into command line arguments? I'm unclear separation-wise which parts of the above fit into which parts of the execution lifecycle.
There was talk about making nspawn accept OCI specs, even that may not be necessary. I don't know how well it would interface with Conmon though.
On the Delegate change - I'd have to think more about what this means for containers which forward host cgroups into the container (we'll need a way to guarantee that the entire unit cgroup isn't forwarded). I also think we'll need to ensure that the container remembers it was started with cgroupfs, so that other Podman commands launched from outside the unit file that require cgroups (e.g. podman stats) still work.
to simulate what nspawn does we'd need to tell the OCI runtime to use the cgroup already created by conmon instead of creating a new one.
Next crun version will automatically create a /container subcgroup in the same way nspawn does.
I think we can go a step further and get closer to what nspawn does by having a single cgroup for conmon+container payload
PoC implementation: https://github.com/containers/libpod/pull/6666
@giuseppe, do we need to additionally set delegation in the units?
yes, we need to add Delegate=true under [Service] so podman is able to manage the cgroup. For rootless it should happen only with cgroup v2.
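Based on that, a minimal sketch of a root unit using the split mode from the PoC (image and name are illustrative):

[Service]
# Delegate the unit's cgroup subtree so podman/conmon can create the
# supervisor and container sub-cgroups inside it.
Delegate=true
ExecStart=/usr/bin/podman run --cgroups=split --name=mattermost \
    docker.io/mattermost/mattermost-team-edition:release-5.24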
I am trying to use this rootless on Fedora CoreOS which runs cgroups v1 and I am getting
Error: mkdir /sys/fs/cgroup/pids/system.slice/elasticsearch.service/supervisor: permission denied
Here is my unit:
[Unit]
Description=Elasticsearch Service
Wants=network.target
After=network-online.target
After=mycool-pod.service
[Service]
Delegate=true
User=mycool
Group=mycool
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=on-failure
ExecStartPre=-/usr/local/bin/podman pull elasticsearch:7.5.2
ExecStartPre=-/usr/local/bin/podman volume create %N
ExecStart=/usr/local/bin/podman run --replace --rm -d --log-driver=journald --log-opt tag="{{.ImageName}}" --pod mycool-pod --name %N -e ES_JAVA_OPTS="-Xms512m -Xmx512m" --conmon /usr/local/bin/conmon --cgroups=split --conmon-pidfile=%T/%N.pid --env-file /opt/mycool/envs/elasticsearch.env --volume %N:/usr/share/elasticsearch/data:Z elasticsearch:7.5.2
ExecStop=/usr/local/bin/podman stop -t 10 %N
ExecStopPost=/usr/local/bin/podman stop -t 10 %N
PIDFile=%T/%N.pid
KillMode=none
Type=forking
SyslogIdentifier=%N
[Install]
WantedBy=multi-user.target default.target
Will this only work with rootless on cgroups v2?
I don't think this helps much for cgroups v1 + rootless.
1) Delegate=true only gives ownership to the user for "unified" and "systemd" controllers.
2) Podman could be modified to skip pids, cpu, etc and only do the systemd ones for the supervisor group... but.
3) Rootless podman will not tell runc or crun to move the cgroup... because runc and crun can only receive a single cgroup path, and systemd will spawn this process with various cgroups (devices will be in /, others will be in /user.slice, etc). Even if passed it seems like runc throws it away when it can't set the cpuset group, and crun just ignores it.
So you end up with your entire container and conmon running in %N.service/supervisor, which is no different than just using cgroups enabled - everything will just be in %N.service
split does work with root containers - mainly because all containers can be delegated at that point.
I just rebooted Fedora CoreOS into cgroups v2 and I am seeing this now when setting --cgroups=split
Jun 29 21:32:21 mycool mycool-elasticsearch[2765]: Error: cannot set limits without cgroups: OCI runtime error
Jun 29 21:31:39 mycool systemd[1]: Starting Forem 12345 Elasticsearch Service...
Jun 29 21:31:40 mycool mycool-elasticsearch[1754]: Trying to pull registry.fedoraproject.org/elasticsearch:7.5.2...
Jun 29 21:31:40 mycool mycool-elasticsearch[1754]: manifest unknown: manifest unknown
Jun 29 21:31:40 mycool mycool-elasticsearch[1754]: Trying to pull registry.access.redhat.com/elasticsearch:7.5.2...
Jun 29 21:31:40 mycool mycool-elasticsearch[1754]: name unknown: Repo not found
Jun 29 21:31:40 mycool mycool-elasticsearch[1754]: Trying to pull registry.centos.org/elasticsearch:7.5.2...
Jun 29 21:31:40 mycool mycool-elasticsearch[1754]: manifest unknown: manifest unknown
Jun 29 21:31:40 mycool mycool-elasticsearch[1754]: Trying to pull docker.io/library/elasticsearch:7.5.2...
Jun 29 21:31:42 mycool mycool-elasticsearch[1754]: Getting image source signatures
Jun 29 21:31:42 mycool mycool-elasticsearch[1754]: Copying blob sha256:c82eff1e95f223957666595df82e112d158b37b577d3e3525bdd58890d3ffb0a
Jun 29 21:31:42 mycool mycool-elasticsearch[1754]: Copying blob sha256:63248d573ce9f12efb8d5de9d49e8b7beb5ce9c2b4ed1f2bd8c43fa123ec4781
Jun 29 21:31:42 mycool mycool-elasticsearch[1754]: Copying blob sha256:ab5ef0e5819490abe86106fd9f4381123e37a03e80e650be39f7938d30ecb530
Jun 29 21:31:42 mycool mycool-elasticsearch[1754]: Copying blob sha256:ac819c75e084c8a2b60fec278e7b0b4109aad3f68b4c549566dc99bd51e4ccca
Jun 29 21:31:42 mycool mycool-elasticsearch[1754]: Copying blob sha256:cca059a702d34723ea312d8b4fd3ab4943eb36cbec50252b443a25fc7c1683a7
Jun 29 21:31:42 mycool mycool-elasticsearch[1754]: Copying blob sha256:4a32d65abda10235eac68cfba8dc027d034247ebd09b40beefef9e7574750ec2
Jun 29 21:31:43 mycool mycool-elasticsearch[1754]: Copying blob sha256:6ce84b7d8f2193b31b410756c95cac55a575e133757dce774a9398007b78727d
Jun 29 21:32:01 mycool mycool-elasticsearch[1754]: Copying config sha256:929d271f17988709f8e34bc2e907265f6dc9fc5742326349e0ad808bb213f97a
Jun 29 21:32:01 mycool mycool-elasticsearch[1754]: Writing manifest to image destination
Jun 29 21:32:01 mycool mycool-elasticsearch[1754]: Storing signatures
Jun 29 21:32:20 mycool mycool-elasticsearch[1754]: 929d271f17988709f8e34bc2e907265f6dc9fc5742326349e0ad808bb213f97a
Jun 29 21:32:20 mycool mycool-elasticsearch[2725]: mycool-elasticsearch
Jun 29 21:32:21 mycool mycool-elasticsearch[2765]: Error: cannot set limits without cgroups: OCI runtime error
Jun 29 21:32:21 mycool systemd[1]: mycool-elasticsearch.service: Control process exited, code=exited, status=126/n/a
Jun 29 21:32:21 mycool mycool-elasticsearch[2914]: Error: container d07027fe16dd460d2409c1153ea0b3d4fa614aa9b7f6d8f05cc918585ebafefe does not exist in database: no such container
Jun 29 21:32:21 mycool systemd[1]: mycool-elasticsearch.service: Control process exited, code=exited, status=125/n/a
Jun 29 21:32:21 mycool systemd[1]: mycool-elasticsearch.service: Failed with result 'exit-code'.
Jun 29 21:32:21 mycool systemd[1]: Failed to start My Cool Elasticsearch Service.
Jun 29 21:32:21 mycool systemd[1]: mycool-elasticsearch.service: Consumed 27.470s CPU time.
Jun 29 21:32:21 mycool systemd[1]: mycool-elasticsearch.service: Scheduled restart job, restart counter is at 1.
$ rpm -qa runc crun
runc-1.0.0-144.dev.gite6555cc.fc32.x86_64
crun-0.13-2.fc32.x86_64
$ /usr/local/bin/podman --version
podman version 2.1.0-dev
$ /usr/local/bin/conmon --version
conmon version 2.0.19-dev
commit: ab8f5e5a9b808f7ab3c2098eeada04795914a161
$ cat /etc/os-release
NAME=Fedora
VERSION="32.20200625.1.0 (CoreOS)"
ID=fedora
VERSION_ID=32
VERSION_CODENAME=""
PLATFORM_ID="platform:f32"
PRETTY_NAME="Fedora CoreOS 32.20200625.1.0"
ANSI_COLOR="0;34"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:32"
HOME_URL="https://getfedora.org/coreos/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora-coreos/"
SUPPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
BUG_REPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=32
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=32
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="CoreOS"
VARIANT_ID=coreos
OSTREE_VERSION='32.20200625.1.0'
Here is my fcct config for Ignition:
variant: fcos
version: 1.0.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-ed25519 snip
    - name: mycool
      system: false
storage:
  directories:
    - path: /opt/mycool/pids
      mode: 0755
      user:
        name: mycool
      group:
        name: mycool
    - path: /opt/mycool/envs
      mode: 0750
      user:
        name: mycool
      group:
        name: mycool
    - path: /opt/mycool/configs
      mode: 0750
      user:
        name: mycool
      group:
        name: mycool
    - path: /opt/mycool/tmp
      mode: 0755
      user:
        name: mycool
      group:
        name: mycool
  files:
    - path: /etc/systemd/system.conf.d/accounting.conf
      mode: 0644
      contents:
        inline: |
          [Manager]
          DefaultCPUAccounting=yes
          DefaultMemoryAccounting=yes
          DefaultBlockIOAccounting=yes
    - path: /etc/sysctl.d/max-user-watches.conf
      mode: 0644
      contents:
        inline: |
          fs.inotify.max_user_watches=16184
    - path: /etc/zincati/config.d/55-updates-strategy.toml
      mode: 0644
      contents:
        inline: |
          [updates]
          strategy = "fleet_lock"
          [updates.fleet_lock]
          base_url = "https://updates.forem.com/"
    - path: /etc/zincati/config.d/90-disable-auto-updates.toml
      mode: 0644
      contents:
        inline: |
          [updates]
          enabled = false
    - path: /etc/hostname
      mode: 0644
      contents:
        inline: |
          mycool
    - path: /usr/local/bin/podman
      contents:
        source: https://joedoss.com/downloads/podman.gz
        compression: gzip
        verification:
          hash: sha512-edcca442e664c64eef694b19beac69ad88efd19916494a1845021d489715df7890a677de302357b4422ec2db623beddcf446b304c76a180b74cafad2c4c55fa0
      mode: 0555
    - path: /usr/local/bin/conmon
      contents:
        source: https://joedoss.com/downloads/conmon.gz
        compression: gzip
        verification:
          hash: sha512-b85087042347de5fe417266ce4300d23475cc7d9a089c87d4337830c52cbbe434899659b471356f27f285bb37f17f411244fe2aa9d3b7bb27b05d123fc35bdc3
      mode: 0555
    - path: /opt/mycool/envs/elasticsearch.env
      mode: 0640
      user:
        name: mycool
      group:
        name: mycool
      contents:
        inline: |
          discovery.type=single-node
          cluster.name=forem
          bootstrap.memory_lock=true
          discovery.type=single-node
          xpack.security.enabled=false
          xpack.monitoring.enabled=false
          xpack.graph.enabled=false
          xpack.watcher.enabled=false
systemd:
  units:
    - name: enable-cgroups-v2.service
      enabled: true
      contents: |
        [Unit]
        Description=Enable cgroups v2 (systemd.unified_cgroup_hierarchy=0)
        ConditionFirstBoot=true
        Wants=basic.target
        Before=multi-user.target mycool-pod.service
        [Service]
        Type=oneshot
        ExecStart=/usr/bin/rpm-ostree kargs --delete systemd.unified_cgroup_hierarchy=0 --reboot
        ExecStartPost=/usr/bin/sleep infinity
        [Install]
        WantedBy=basic.target
    - name: mycool-pod.service
      enabled: true
      contents: |
        [Unit]
        Description=My Cool pod service
        Wants=network.target
        After=network-online.target
        Before=mycool-elasticsearch.service
        [Service]
        User=mycool
        Group=mycool
        Environment=PODMAN_SYSTEMD_UNIT=%n
        Restart=on-failure
        ExecStartPre=-/usr/local/bin/podman pod create --conmon /usr/local/bin/conmon --infra-conmon-pidfile %T/%N.pid --name %N -p 443:443 -p 80:80 -p 9090:9090
        ExecStart=/usr/local/bin/podman pod start %N
        ExecStop=/usr/local/bin/podman pod stop -t 10 %N
        ExecStopPost=/usr/local/bin/podman pod stop -t 10 %N
        PIDFile=%T/%N.pid
        KillMode=none
        Type=forking
        SyslogIdentifier=%N
        [Install]
        WantedBy=multi-user.target default.target
    - name: mycool-elasticsearch.service
      enabled: true
      contents: |
        [Unit]
        Description=My Cool Elasticsearch Service
        Wants=network.target
        After=network-online.target
        After=mycool-pod.service
        [Service]
        Delegate=true
        User=mycool
        Group=mycool
        Environment=PODMAN_SYSTEMD_UNIT=%n
        Restart=on-failure
        ExecStartPre=-/usr/local/bin/podman pull elasticsearch:7.5.2
        ExecStartPre=-/usr/local/bin/podman volume create %N
        ExecStart=/usr/local/bin/podman run --replace --rm -d --log-driver=journald --log-opt tag="{{.ImageName}}" --pod mycool-pod --name %N -e ES_JAVA_OPTS="-Xms512m -Xmx512m" --conmon /usr/local/bin/conmon --cgroups=split --conmon-pidfile=%T/%N.pid --env-file /opt/mycool/envs/elasticsearch.env --volume %N:/usr/share/elasticsearch/data:Z elasticsearch:7.5.2
        ExecStop=/usr/local/bin/podman stop -t 10 %N
        ExecStopPost=/usr/local/bin/podman stop -t 10 %N
        PIDFile=%T/%N.pid
        KillMode=none
        Type=forking
        SyslogIdentifier=%N
        [Install]
        WantedBy=multi-user.target default.target
Some super sweet podman run --log-level debug logs below:
Jun 30 00:13:22 mycool mycool-elasticsearch[33637]: time="2020-06-30T00:13:22Z" level=debug msg="running conmon: /usr/local/bin/conmon" args="[--api-version 1 -c 1221b4fc5b0884db2b86641b33c59723e10e035222ba9739f47c9a25b5433657 -u 1221b4fc5b0884db2b86641b33c59723e10e035222ba9739f47c9a25b5433657 -r /usr/bin/crun -b /var/home/forem-12345/.local/share/containers/storage/overlay-containers/1221b4fc5b0884db2b86641b33c59723e10e035222ba9739f47c9a25b5433657/userdata -p /tmp/run-1001/containers/overlay-containers/1221b4fc5b0884db2b86641b33c59723e10e035222ba9739f47c9a25b5433657/userdata/pidfile -n mycool-elasticsearch --exit-dir /tmp/run-1001/libpod/tmp/exits --socket-dir-path /tmp/run-1001/libpod/tmp/socket -l journald --log-level debug --syslog --log-tag docker.io/library/elasticsearch:7.5.2 --conmon-pidfile /tmp/mycool-elasticsearch.pid --exit-command /var/usrlocal/bin/podman --exit-command-arg --root --exit-command-arg /var/home/forem-12345/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /tmp/run-1001/containers --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /tmp/run-1001/libpod/tmp --exit-command-arg --runtime --exit-command-arg /usr/bin/crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mount_program=/usr/bin/fuse-overlayfs --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg --syslog --exit-command-arg true --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg 1221b4fc5b0884db2b86641b33c59723e10e035222ba9739f47c9a25b5433657]"
Jun 30 00:13:22 mycool mycool-elasticsearch[33649]: [conmon:d]: failed to write to /proc/self/oom_score_adj: Permission denied
Jun 30 00:13:22 mycool conmon[33649]: conmon 1221b4fc5b0884db2b86 <ndebug>: failed to write to /proc/self/oom_score_adj: Permission denied
Jun 30 00:13:22 mycool conmon[33650]: conmon 1221b4fc5b0884db2b86 <ninfo>: attach sock path: /tmp/run-1001/libpod/tmp/socket/1221b4fc5b0884db2b86641b33c59723e10e035222ba9739f47c9a25b5433657/attach
Jun 30 00:13:22 mycool conmon[33650]: conmon 1221b4fc5b0884db2b86 <ninfo>: addr{sun_family=AF_UNIX, sun_path=/tmp/run-1001/libpod/tmp/socket/1221b4fc5b0884db2b86641b33c59723e10e035222ba9739f47c9a25b5433657/attach}
Jun 30 00:13:22 mycool conmon[33650]: conmon 1221b4fc5b0884db2b86 <ninfo>: terminal_ctrl_fd: 13
Jun 30 00:13:22 mycool conmon[33650]: conmon 1221b4fc5b0884db2b86 <ninfo>: winsz read side: 15, winsz write side: 15
Jun 30 00:13:22 mycool conmon[33651]: conmon 1221b4fc5b0884db2b86 <nwarn>: Failed to chown stdin
Jun 30 00:13:22 mycool conmon[33650]: conmon 1221b4fc5b0884db2b86 <error>: Failed to create container: exit status 1
Jun 30 00:13:22 mycool mycool-elasticsearch[33637]: time="2020-06-30T00:13:22Z" level=debug msg="Received: -1"
Jun 30 00:13:22 mycool mycool-elasticsearch[33637]: time="2020-06-30T00:13:22Z" level=debug msg="Cleaning up container 1221b4fc5b0884db2b86641b33c59723e10e035222ba9739f47c9a25b5433657"
Jun 30 00:13:22 mycool mycool-elasticsearch[33637]: time="2020-06-30T00:13:22Z" level=debug msg="unmounted container \"1221b4fc5b0884db2b86641b33c59723e10e035222ba9739f47c9a25b5433657\""
Jun 30 00:13:22 mycool mycool-elasticsearch[33637]: time="2020-06-30T00:13:22Z" level=debug msg="ExitCode msg: \"cannot set limits without cgroups: oci runtime error\""
Jun 30 00:13:22 mycool mycool-elasticsearch[33637]: Error: cannot set limits without cgroups: OCI runtime error
Jun 30 00:13:22 mycool systemd[1]: mycool-elasticsearch.service: Control process exited, code=exited, status=126/n/a
Jun 29 21:32:21 mycool mycool-elasticsearch[2765]: Error: cannot set limits without cgroups: OCI runtime error
That is a problem in crun that is fixed upstream. I am going to cut a new release in the next few days.
@giuseppe awesome! I just RTFMed a bit but I couldn't find any info on setting podman to use a different crun binary. Is there a flag I can set to call a different binary until this fix gets merged into upstream crun and eventually into the FCOS next stream?
@jdoss you could specify it on the command line like podman --runtime /path/to/the/other/executable/crun .. or you can override its path from the containers.conf file
@jdoss If you're using selinux, I suggest you compile and place the crun binary in /usr/local/bin, as that folder is recognized in the policy. If you're going to have a local podman or runc or crun it should be there, and chcon'd to match, i.e. chcon --reference=/usr/bin/crun /usr/local/bin/crun
In /etc/containers/containers.conf:
runtime = "crun"
[engine.runtimes]
crun = [ "/usr/local/bin/crun" ]
Or specify it on the command line as @giuseppe indicated.
@goochjj and @giuseppe I just compiled crun from master and put it in /usr/local/bin/crun and it's still getting the same error:
# /usr/local/bin/crun --version
crun version 0.13.227-d38b
commit: d38b8c28fc50a14978a27fa6afc69a55bfdd2c11
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
Jun 30 15:48:12 mycool mycool-elasticsearch[65963]: [conmon:d]: failed to write to /proc/self/oom_score_adj: Permission denied
Jun 30 15:48:12 mycool conmon[65963]: conmon c0cf8da55a1936150298 <ndebug>: failed to write to /proc/self/oom_score_adj: Permission denied
Jun 30 15:48:12 mycool conmon[65964]: conmon c0cf8da55a1936150298 <ninfo>: attach sock path: /tmp/run-1001/libpod/tmp/socket/c0cf8da55a1936150298fdf9608ad04154a9de89c774c0fc93e2809b489a97b7/attach
Jun 30 15:48:12 mycool conmon[65964]: conmon c0cf8da55a1936150298 <ninfo>: addr{sun_family=AF_UNIX, sun_path=/tmp/run-1001/libpod/tmp/socket/c0cf8da55a1936150298fdf9608ad04154a9de89c774c0fc93e2809b489a97b7/attach}
Jun 30 15:48:12 mycool conmon[65964]: conmon c0cf8da55a1936150298 <ninfo>: terminal_ctrl_fd: 13
Jun 30 15:48:12 mycool conmon[65964]: conmon c0cf8da55a1936150298 <ninfo>: winsz read side: 15, winsz write side: 15
Jun 30 15:48:12 mycool conmon[65965]: conmon c0cf8da55a1936150298 <nwarn>: Failed to chown stdin
Jun 30 15:48:12 mycool conmon[65964]: conmon c0cf8da55a1936150298 <error>: Failed to create container: exit status 1
Jun 30 15:48:12 mycool mycool-elasticsearch[65952]: time="2020-06-30T15:48:12Z" level=debug msg="Received: -1"
Jun 30 15:48:12 mycool mycool-elasticsearch[65952]: time="2020-06-30T15:48:12Z" level=debug msg="Cleaning up container c0cf8da55a1936150298fdf9608ad04154a9de89c774c0fc93e2809b489a97b7"
Jun 30 15:48:12 mycool mycool-elasticsearch[65952]: time="2020-06-30T15:48:12Z" level=debug msg="unmounted container \"c0cf8da55a1936150298fdf9608ad04154a9de89c774c0fc93e2809b489a97b7\""
Jun 30 15:48:12 mycool mycool-elasticsearch[65952]: time="2020-06-30T15:48:12Z" level=debug msg="ExitCode msg: \"cannot set limits without cgroups: oci runtime error\""
Jun 30 15:48:12 mycool mycool-elasticsearch[65952]: Error: cannot set limits without cgroups: OCI runtime error
Jun 30 15:48:12 mycool systemd[1]: mycool-elasticsearch.service: Control process exited, code=exited, status=126/n/a
Add --pids-limit 0 to your run args
Wait you're cgroups v2 now? I don't have that problem under cgroups v2 rootless. What does cat /proc/self/cgroup show?
--pids-limit 0 does let the containers start, but yea, I booted FCOS into cgroups v2 with rootless here. I have a non-root user mycool that is being used via systemd to launch these containers.
[core@mycool ~]$ cat /proc/self/cgroup
0::/user.slice/user-1000.slice/session-1.scope
I can't get the infra container to start, because you're binding to ports 80 and 443 as non-root...
setting /proc/sys/net/ipv4/ip_unprivileged_port_start
Hmm and there it is
If I remove your --pod it works
- path: /etc/sysctl.d/90-ip-unprivileged-port-start.conf
  mode: 0644
  contents:
    inline: |
      net.ipv4.ip_unprivileged_port_start = 0
To allow the pod to bind to those ports.
I think it's because you're using a pod.
When I run this as the user, rootless, I get this:
Pod creates:
/user.slice/user-(uid).slice/user@(uid).service/user.slice/user-libpod_pod_(podid).slice/libpod-(infracid).scope/container
/user.slice/user-(uid).slice/user@(uid).service/user.slice/user-libpod_pod_(podid).slice/libpod-conmon-(infracid).scope
Container (without split) creates:
/user.slice/user-(uid).slice/user@(uid).service/user.slice/user-libpod_pod_(podid).slice/libpod-(escid).scope/container
/user.slice/user-(uid).slice/user@(uid).service/user.slice/user-libpod_pod_(podid).slice/libpod-conmon-(escid).scope
Through Systemd as the user, I get this:
Pod creates:
/user.slice/user-(uid).slice/user@(uid).service/user.slice/user-libpod_pod_(podid).slice/libpod-(infracid).scope/container
/system.slice/mycool-pod.service
Container (without split) creates:
/user.slice/user-(uid).slice/user@(uid).service/user.slice/user-libpod_pod_(podid).slice/libpod-(escid).scope/container
/system.slice/mycool-elasticsearch.service
TLDR, @giuseppe would have to modify/extend another PR to handle pods.
It looks like when a container is spawned in a pod, it assumes its parent slice will be the parent cgroup path (which is reasonable). Since pod create doesn't have a --cgroups split option, the pod's conmon is attached to the service cgroup, and the pod's slice is in the user slice, divorced from the service's cgroup.
You can't simultaneously have a service (i.e. elasticsearch) be part of the unit's service, and also the pod's slice. Nor can you have a second systemd unit muck around with the pod's cgroup - that's probably a bad idea.
What's your desired outcome here, @jdoss?
/system.slice/mycool-pod.service/supervisor -> pod conmon
/system.slice/mycool-pod.service/container -> infra container
/system.slice/mycool-elasticsearch.service/supervisor -> conmon
/system.slice/mycool-elasticsearch.service/container -> ES processes
Then ALL the pod services aren't contained in a slice.
Right now it's
/system.slice/mycool-pod.service -> pod conmon
/(user's systemd service)/user.slice/user-libpod_pod_(podid).slice/libpod-(cid).scope/container -> infra procs
/system.slice/mycool-elasticsearch.service -> conmon
/(user's systemd service)/user.slice/user-libpod_pod_(podid).slice/libpod-(cid).scope/container -> elasticsearch procs
Is this insufficient in some way?
Or maybe we should do this in a more systemd-like way?
i.e. Slice=machines-mycool_pod.slice
Pod
/machines.slice/machines-mycool_pod.slice/mycool-pod.service/supervisor -> pod conmon
/machines.slice/machines-mycool_pod.slice/mycool-pod.service/container -> infra container
/machines.slice/machines-mycool_pod.slice/mycool-elasticsearch.service/supervisor -> conmon
/machines.slice/machines-mycool_pod.slice/mycool-elasticsearch.service/container -> ES processes
Then everything is properly in a parent slice - is this what we'd want split to do with pods?
If so, the --cgroups split would have to be set at the pod create level, and child services would have to know if split is passed, to not inherit the cgroup-parent of the pod.
--pids-limit 0 does let the containers start, but yea, I booted FCOS into cgroups v2 with rootless here. I have a non-root user mycool that is being used via systemd to launch these containers.
[core@mycool ~]$ cat /proc/self/cgroup
0::/user.slice/user-1000.slice/session-1.scope
@giuseppe I don't know what's causing this - but there are times when I need to set --pids-limit 0. It seems like there's a default of pids-limit 2048 coming from somewhere, not the config file and not the command line, and then when crun sees it can't do cgroups with pids-limit, it throws the runtime error.
If you happen to get the cgroup right - i.e. it's something crun can modify and it has a pids controller, then the error isn't present.
@goochjj I am trying to set things up so I can have many pods running under a rootless user (or users) via systemd units with the User= directive, one per stack of applications running as rootless containers inside the pod. Having everything in its own pod namespace as a rootless user is pretty great; I don't need to juggle ports for each application stack, just the pod's ports. I also like the isolation pods give each application stack deployment.
Since FCOS doesn't support user systemd units via Ignition, I have to set them up as system units. Which is fine, since I like using system units over user units anyway to prevent them from being modified by non-root users.
Right, but all this works for you without --cgroups split, correct? Is there something you're hoping to gain with --cgroups split?
The pids-limit is probably Podman automatically trying to set the maximum available for that rlimit - we should code that to only happen if cgroups are present.
@goochjj I was running FCOS with cgroups v1 up until I saw this thread that introduced --cgroups split so I started down this road of giving it a try with cgroups v2. Trying my old setup that works on FCOS cgroups v1 on FCOS with cgroups v2 doesn't work at all without setting --pids-limit 0.
I am not trying to gain anything specific by using --cgroups split. I thought it would help provide me with a better setup for my use case.
@mheon I'm unclear on why cgroups aren't present... let alone that default.
It's really annoying, and seems to be cgroupsv1 specific. Should I create this as a separate issue?
I believe that's a requirement forced on us by cgroups v1 not being safe for rootless use, unless I'm greatly misunderstanding?
@mheon I'm fine with that, as long as it doesn't explicitly require me to --pids-limit 0 everything, which it's currently doing.
This code
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 302) // then ignore the settings. If the caller asked for a
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 303) // non-default, then try to use it.
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 304) setPidLimit := true
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 305) if rootless.IsRootless() {
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 306) cgroup2, err := cgroups.IsCgroup2UnifiedMode()
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 307) if err != nil {
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 308) return nil, err
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 309) }
4352d58549 (Daniel J Walsh 2020-03-27 10:13:51 -0400 310) if (!cgroup2 || (runtimeConfig != nil && runtimeConfig.Engine.CgroupManager != cconfig.SystemdCgroupsManager)) && config.Resources.PidsLimit == sysinfo.GetDefaultPidsLimit() {
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 311) setPidLimit = false
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 312) }
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 313) }
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 314) if setPidLimit {
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 315) g.SetLinuxResourcesPidsLimit(config.Resources.PidsLimit)
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 316) addedResources = true
118cf1fc63 (Daniel J Walsh 2019-09-14 06:21:10 -0400 317) }
in pkg/spec/spec.go seems to indicate it should already be ignoring the default on cgroups v1. I'm digging.
Cuz this isn't great.
(focal)mrwizard@FocalCG1Dev:~/src/podman
$ podman run --rm -it alpine sh
Error: cannot set limits without cgroups: OCI runtime error
This is definitely a bug. Is this 2.0? pkg/spec is deprecated, we've moved to pkg/specgen/generate - so the offending code likely lives there.
2.1.0-dev. Actually, master, plus my sdnotify
So, sounds like I should create a new issue.
:-D
Fixed in master.