/kind bug
Hello,
Env:
rootless container in a user namespace
6/6 containers were running fine
managed by systemd
Crash:
home partition 100% full
Current:
5 of 6 containers are working again,
1 has problems.
Infos:
/bin/podman run --rm --name test_service --image-volume=ignore --authfile /home/cadmin/.podman_creds.json registry.example/test/alpine:3.12.0
Error: error creating container storage: the container name "test_service" is already in use by "6e5d7bcf14a33187db1667493281a2a939859954b4a90c54de168243411fada9". You have to remove that container to be able to reuse that name.: that name is already in use
/bin/podman ps -a
It shows no container other than the 5 running ones.
I would expect a stopped/exited/created one.
Also tried --sync.
/bin/podman rm -f --storage 6e5d7bcf14a33187db1667493281a2a939859954b4a90c54de168243411fada9
Error: error unmounting container "6e5d7bcf14a33187db1667493281a2a939859954b4a90c54de168243411fada9": layer not known
Debug level also didn't show any other errors.
Where does podman search for these names?
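For reference, the names that c/storage knows about live in overlay-containers/containers.json under the graph root. A minimal sketch for listing them (paths assume the rootless defaults from the info output below; the jq invocation and the id/names field names are my assumption about the file layout, so verify against your own containers.json first):

# List container IDs and names known to the storage library (not to the libpod DB)
jq -r '.[] | .id + "  " + (.names | join(","))' \
  ~/.local/share/containers/storage/overlay-containers/containers.json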
Output of podman version:
Version: 2.0.4
API Version: 1
Go Version: go1.13.4
Built: Thu Jan 1 01:00:00 1970
OS/Arch: linux/amd64
Output of podman info --debug:
host:
  arch: amd64
  buildahVersion: 1.15.0
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.20-1.el8.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.20, commit: 838d2c05b5b53eff3f1cd1a06dbd81d8153feea3'
  cpus: 4
  distribution:
    distribution: '"centos"'
    version: "8"
  eventLogger: file
  hostname: herewasahostname
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 231072
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1002
      size: 1
    - container_id: 1
      host_id: 231072
      size: 65536
  kernel: 4.18.0-193.6.3.el8_2.x86_64
  linkmode: dynamic
  memFree: 8551780352
  memTotal: 16644939776
  ociRuntime:
    name: runc
    package: runc-1.0.0-65.rc10.module_el8.2.0+305+5e198a41.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  os: linux
  remoteSocket:
    path: /run/user/1002/podman/podman.sock
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-0.4.2-3.git21fdece.module_el8.2.0+305+5e198a41.x86_64
    version: |-
      slirp4netns version 0.4.2+dev
      commit: 21fdece2737dc24ffa3f01a341b8a6854f8b13b4
  swapFree: 5000392704
  swapTotal: 5003800576
  uptime: 7h 12m 54.6s (Approximately 0.29 days)
registries:
  search:
  - registry.example.de
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 8
    paused: 0
    running: 8
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /bin/fuse-overlayfs
      Package: fuse-overlayfs-0.7.2-5.module_el8.2.0+305+5e198a41.x86_64
      Version: |-
        fuse-overlayfs: version 0.7.2
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  graphRoot: /home/user/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 17
  runRoot: /tmp/run-1002
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 1
  Built: 0
  BuiltTime: Thu Jan 1 01:00:00 1970
  GitCommit: ""
  GoVersion: go1.13.4
  OsArch: linux/amd64
  Version: 2.0.4
Package info (e.g. output of rpm -q podman or apt list podman):
podman-2.0.4-1.el8.x86_64
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Yes, I checked the Troubleshooting Guide.
(I changed the names in the output above.)
podman system reset
This should clean up all containers and images and reset you to the initial state.
Currently this isn't a solution for me because it would result in downtime for all containers.
Well, you could remove your libpod database, which will make Podman lose the containers but still keep all of the images in storage.
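For context, a rough sketch of what that could look like for this rootless setup (paths assume the defaults from the info output above; the BoltDB file name bolt_state.db is my assumption and should be verified, and the unit name is hypothetical):

# Stop the affected unit first, then move the libpod database out of the way
systemctl --user stop test_service.service
mv ~/.local/share/containers/storage/libpod/bolt_state.db \
   ~/.local/share/containers/storage/libpod/bolt_state.db.bak
# Podman recreates an empty database; images stay in storage,
# but podman ps no longer lists the previously known containers
podman ps -a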
This one seems kind of bizarre - c/storage is complaining that the name is in use, but simultaneously that the associated layer does not exist. @nalind Are we looking at potentially inconsistent c/storage state here?
@rhatdan can you give me some instructions on what to do exactly? Shall I remove the .local/share/containers/storage/libpod folder with the database file in it? Does "lose" mean I won't see any container with "podman ps" anymore?
PS: I will go for the reset next week if we release a new version.
Fixed it today, with downtime:
mv ~/.local ~/.local_old
I think this is mostly the same as what "podman system reset" does.
I manually removed the entry whose ID matches the one in the error message from storage/overlay-containers/containers.json. After that it seems to work fine.
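A sketch of that manual edit with jq, keyed on the ID from the error message above (back up the file first; the path assumes the rootless defaults, and this only removes the JSON entry, not any leftover per-container directory like the ones mentioned further down):

cd ~/.local/share/containers/storage/overlay-containers
cp containers.json containers.json.bak
# Drop the orphaned entry by ID and write the filtered list back
jq 'map(select(.id != "6e5d7bcf14a33187db1667493281a2a939859954b4a90c54de168243411fada9"))' \
  containers.json.bak > containers.json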
I saw this entry too, but I was too scared to remove it and cause an even more inconsistent state.
Thanks for the hint.
Hey there! I'm experiencing the same error, and manually altering containers.json fixes it. The difference in my case is that this happened after some sort of forceful system reset (power loss).
I think podman ps -a should indeed show containers if they can still be found in:
/run/containers/storage/overlay-containers/bbc080aab414c5812eea011d3af6afaf548cdba9fa1b6b092f03b279f17bc185
/var/lib/containers/storage/overlay-containers/bbc080aab414c5812eea011d3af6afaf548cdba9fa1b6b092f03b279f17bc185
/var/lib/containers/storage/overlay-containers/containers.json
As @mheon said, I too think this is an inconsistency. I can't trust the podman ps command, I cannot use it to resolve the problem, and this situation occurs not only on a full disk but also in industrial applications where the system state might not always be handled gracefully. I think you understand the problem when someone in a power plant disconnects a device running podman and on the next boot containers randomly fail to start because of this inconsistent state.
Usually my systemd units have an ExecStartPre=-/usr/bin/podman rm "whatever", which ensures that any leftovers are removed before attempting to create a new container. In this case, this command returns that there is no such container. Creating a container with the name "whatever" then fails: although podman said before that a container with that name does not exist, it now complains that the name is already taken by a container with an ID which podman ps also does not show.
I know it would be better to file a PR with a fix instead of just telling you all this from my personal experience, but honestly I don't know what the actual problem is, and from what I read, the issue does not receive the attention it maybe should, given its implications.
As of Podman v2.1.1, you can use podman ps --storage to see containers that are not in Podman's database but are present in the storage library. They can then be removed via podman rm --storage on the container ID.
(The rm --storage bit has worked for quite a long time - since 1.6.x, I believe?)
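Put together, a cleanup along the lines described above might look like this (ID shortened; the extra --all flag and the trimmed-down run flags are illustrative assumptions):

# Show containers known only to the storage library (Podman >= 2.1.1)
podman ps --all --storage
# Remove the orphaned storage entry by ID, then re-create the container
podman rm --storage 6e5d7bcf14a3
podman run --rm --name test_service registry.example/test/alpine:3.12.0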
@mheon thanks a lot for your response! Could you elaborate on this? Naturally I'd assume that using podman rm deletes any remains in order to return to a coherent state. podman rm --storage seems to only remove containers from storage, and only does so if the container is not in the libpod database. Although that makes sense to some degree, I'm still missing a single command with the semantics "remove this container such that it's as if it never existed".
In the systemd-based example, I'd like the ExecStartPre commands to ensure a state in which the actual podman container run|create commands will always work.
I assume the currently expected way to do this is to have two ExecStartPre= statements, one with ExecStartPre=-/usr/bin/podman rm <whatever> and the next with ExecStartPre=-/usr/bin/podman rm --storage <whatever>?
In my opinion, the way this works with these two commands is quite unintuitive and too technical for a user, given the distinction between the libpod DB and podman storage.
Can you point me to some rationale in a commit or something? There is surely a good explanation of why you decided to do it like this. :)
As this now tends to become a little off-topic, I'll reach out via Matrix/IRC to elaborate a bit more about the whole systemd/podman stuff.
At this point, if you're seeing this as a consistent problem, something is seriously wrong - we've done a lot to make sure that c/storage is reliably removed at the same time as the container, even in cases of error. We have, however, had some known issues where improperly-written systemd unitfiles can do this; any chance you can post the unit file in question?
There you go:
[Unit]
Description=NodeRed
[Service]
Type=simple
TimeoutStartSec=5m
Environment="NODERED_CONTAINER_VERSION=latest"
Environment="TZ=Europe/Berlin"
EnvironmentFile=-/etc/default/nodered
ExecStartPre=-/usr/bin/podman stop "nodered-runtime"
ExecStartPre=-/usr/bin/podman rm "nodered-runtime"
ExecStartPre=-/usr/bin/podman rm --storage "nodered-runtime"
ExecStartPre=/usr/bin/mkdir -p /var/srv/nodered/
ExecStartPre=-/usr/bin/cp --no-clobber /etc/node-red/initial-flows.json /var/srv/nodered/flows.json
ExecStartPre=/usr/bin/chown 1000:1000 -R /var/srv/nodered/
ExecStartPre=/usr/bin/chcon -Rt container_file_t /var/srv/nodered/
# Group dialout = 18
# Group tty = 5
# Use privileged and network=host for debugging purposes
ExecStart=/usr/bin/podman run \
--name "nodered-runtime" \
--authfile /etc/nodered.auth \
--read-only \
--memory=750M \
--systemd=true \
--privileged \
--group-add=18 \
--group-add=5 \
--user=root \
--network=host \
-v /var/srv/nodered/:/data \
-v /dev:/dev \
-e NODERED_ADMIN_PW=${NODERED_ADMIN_PASSWORD} \
-e TZ=${TZ} \
docker.io/nodered/node-red:${NODERED_CONTAINER_VERSION}
ExecReload=/usr/bin/podman stop "nodered-runtime"
ExecStop=/usr/bin/podman stop "nodered-runtime"
Restart=always
RestartSec=30
[Install]
WantedBy=multi-user.target
RequiredBy=boot-complete.target
I'm noting a few issues immediately:
- KillMode=none on unit files launching Podman. We launch several processes after the container exits to clean up after it, and systemd has an annoying habit of shutting down these cleanup processes mid-execution when it wants to stop or restart a unit, which can lead to issues depending on when it was stopped.
- Type=forking and using PID files to manage Podman under systemd. The container is not actually a direct child of Podman (it's a child of a monitor process we launch called Conmon, which double-forks to daemonize before launching the container) and, as part of creating the container, we also leave the cgroup of the systemd unit - so it can't actually track the state of the container itself unless given a PID file.
You can use podman generate systemd --new to generate a sample unit file to show our recommended format for these.
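For example, regenerating a unit for the container above could look like this (the container name is taken from the unit file earlier in the thread; the generated file name container-nodered-runtime.service is what podman generate systemd produces when --name is given, and the install path is an assumption for a root unit):

# Generate a recommended unit from an existing container definition;
# --new makes the unit create and remove the container on start/stop
podman generate systemd --new --files --name nodered-runtime
# Install and enable the generated unit
mv container-nodered-runtime.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now container-nodered-runtime.service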
A friendly reminder that this issue had no activity for 30 days.
I am going to close this, since @mheon gave you some solutions.