**Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)**

/kind bug

**Description**
I'd like failed containers (those that exit with a non-zero exit code) to be restarted automatically by Podman in a deployment run with `podman play kube`. AFAIK, the default restart policy is `Always` if nothing else is explicitly specified. However, when my container dies (`Exited (1) 10 minutes ago`), it does not get restarted automatically. I tried setting the policy explicitly with `restartPolicy: Always`, but that changed nothing: the exited container remains exited, with no visible attempts to restart it.
**Steps to reproduce the issue:**

1. Create a Kubernetes YAML file with a pod containing a container that fails on startup (a minimal example follows below).
2. Run `podman play kube` on that file.
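For example, a manifest along these lines (the file name and image are placeholders; any container that exits non-zero at startup will do):

```yaml
# restart-test.yaml -- placeholder reproducer: a single-container pod
# whose command exits with code 1 immediately.
apiVersion: v1
kind: Pod
metadata:
  name: restart-test
spec:
  restartPolicy: Always   # explicitly set, but apparently ignored
  containers:
  - name: failing
    image: docker.io/library/alpine:latest
    command: ["/bin/sh", "-c", "exit 1"]
```

played with:

```
$ podman play kube restart-test.yaml
```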
**Describe the results you received:**

The container fails and does not get restarted.
**Describe the results you expected:**

Podman tries to restart the exited container.
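In other words, I expected the pod's container to behave like a plain `podman run` container with a restart policy (the container name below is made up for illustration):

```
$ podman run -d --name restart-demo --restart=always docker.io/library/alpine:latest sh -c 'exit 1'
$ podman ps -a --filter name=restart-demo
# expectation: with --restart=always, Podman keeps restarting the container
# after each non-zero exit
```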
**Additional information you deem important (e.g. issue happens only occasionally):**
**Output of `podman version`:**

```
Version:            1.6.4
RemoteAPI Version:  1
Go Version:         go1.13.4
OS/Arch:            linux/amd64
```
**Output of `podman info --debug`:**

```yaml
debug:
  compiler: gc
  git commit: ""
  go version: go1.13.4
  podman version: 1.6.4
host:
  BuildahVersion: 1.12.0-dev
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.6-1.module_el8.2.0+305+5e198a41.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.6, commit: a2b11288060ebd7abd20e0b4eb1a834bbf0aec3e'
  Distribution:
    distribution: '"centos"'
    version: "8"
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  MemFree: 11370590208
  MemTotal: 16644763648
  OCIRuntime:
    name: runc
    package: runc-1.0.0-65.rc10.module_el8.2.0+305+5e198a41.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 0
  SwapTotal: 0
  arch: amd64
  cpus: 4
  eventlogger: journald
  hostname: podman.novalocal
  kernel: 4.18.0-193.14.2.el8_2.x86_64
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: slirp4netns-0.4.2-3.git21fdece.module_el8.2.0+305+5e198a41.x86_64
    Version: |-
      slirp4netns version 0.4.2+dev
      commit: 21fdece2737dc24ffa3f01a341b8a6854f8b13b4
  uptime: 219h 25m 40.99s (Approximately 9.12 days)
registries:
  blocked: null
  insecure: null
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  ConfigFile: /home/centos/.config/containers/storage.conf
  ContainerStore:
    number: 21
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-0.7.2-5.module_el8.2.0+305+5e198a41.x86_64
      Version: |-
        fuse-overlayfs: version 0.7.2
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  GraphRoot: /home/centos/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 11
  RunRoot: /run/user/1000
  VolumePath: /home/centos/.local/share/containers/storage/volumes
```
**Package info (e.g. output of `rpm -q podman` or `apt list podman`):**

```
podman-1.6.4-10.module_el8.2.0+305+5e198a41.x86_64
```
**Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?**

Yes
**Additional environment details (AWS, VirtualBox, physical, etc.):**

CentOS 8 VM
Thanks for opening the issue. I am currently working on https://github.com/containers/podman/issues/7645 and consider this to be the same issue, so I am closing this one here. Feel free to move over to https://github.com/containers/podman/issues/7645.
Sorry, my brain tricked me into mixing up the "restart" and "pull" policies. The issues are different, so I am going to reopen :)

In the meantime, I'll grab a fresh coffee.
Have you tried the latest version, such as v2.0.6?
Just tried this:

```
Version:      2.0.6
API Version:  1
Go Version:   go1.13.4
Built:        Tue Sep 8 19:37:13 2020
OS/Arch:      linux/amd64
```
Same problem: the container failed but was not restarted:
```json
...
"State": {
    "OciVersion": "1.0.2-dev",
    "Status": "exited",
    "Running": false,
    "Paused": false,
    "Restarting": false,
    "OOMKilled": false,
    "Dead": false,
    "Pid": 0,
    "ExitCode": 1,
    "Error": "",
    "StartedAt": "2020-09-16T17:05:52.611114598Z",
    "FinishedAt": "2020-09-16T17:05:52.827035449Z",
    "Healthcheck": {
        "Status": "",
        "FailingStreak": 0,
        "Log": null
    }
},
...
```
P.S. You might consider clarifying your issue template a bit, because "Have you tested with the latest version of Podman?" is, IMHO, not quite clear (I thought you were referring to the latest version available in the CentOS 8 repo, which is far behind the actual latest version).
Just to add some more context: my container fails on its first initialization attempt, but if, a couple of seconds later, I manually run `podman restart ...`, it boots up normally. So it is possible to run this container; Podman just doesn't seem to even attempt a restart.
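Concretely, the manual workaround looks like this (`tsd-db` is the container in question):

```
$ podman ps -a --filter name=tsd-db   # the container shows up as Exited (1)
$ podman restart tsd-db               # run a few seconds later; it then boots up normally
```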
To verify that the policy is not being set correctly, can you provide the output of `podman inspect $CTRNAME | jq '.[0].HostConfig.RestartPolicy'`?
```
$ podman inspect tsd-db | jq '.[0].HostConfig.RestartPolicy'
{
  "Name": "",
  "MaximumRetryCount": 0
}
```
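The empty `Name` suggests no policy was set at all. If I read the inspect format correctly, a container that actually carries the policy (e.g. one created with `podman run --restart=always`) should report something like:

```json
{
  "Name": "always",
  "MaximumRetryCount": 0
}
```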
I don't see us dealing with this in the YAML files at all.
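If someone wants to pick this up: roughly, the fix would be to translate the pod-level `restartPolicy` onto each container when `play kube` builds the containers. A minimal sketch of that mapping, with illustrative names rather than actual libpod identifiers:

```go
package main

import "fmt"

// kubeRestartPolicyToPodman maps the pod-level Kubernetes restartPolicy
// onto the restart-policy string Podman understands. This is a sketch
// only, not the actual play kube code.
func kubeRestartPolicyToPodman(policy string) (string, error) {
	switch policy {
	case "", "Always": // Kubernetes treats an unset restartPolicy as Always
		return "always", nil
	case "OnFailure":
		return "on-failure", nil
	case "Never":
		return "no", nil
	default:
		return "", fmt.Errorf("unsupported restartPolicy %q", policy)
	}
}

func main() {
	policy, err := kubeRestartPolicyToPodman("Always")
	if err != nil {
		panic(err)
	}
	fmt.Println(policy) // prints "always"
}
```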
Anyone interested in fixing this issue?
@ashley-cui PTAL if no one in the community looks into it.