When running a Job whose pod contents form a non-terminating loop and the object is specced with an `activeDeadlineSeconds` value, the pod does not terminate within the specified number of seconds, or ever.
```
openshift v1.3.0-alpha.3+bbeb2f3
kubernetes v1.3.0+507d3a7
```
```yaml
apiVersion: extensions/v1beta1
kind: Job
metadata:
  name: pi
spec:
  activeDeadlineSeconds: 5
  selector:
    matchLabels:
      app: pi
  parallelism: 1
  completions: 1
  template:
    metadata:
      name: pi
      labels:
        app: pi
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(987654321000000000000)"]
      restartPolicy: Never
```
The Job runs in perpetuity, consuming resources. The Job should terminate after 5 seconds.
When the pod is terminated manually from Cockpit or the Console, the Job does not restart it as designed. This behavior differs from when the `activeDeadlineSeconds` parameter is omitted: in that scenario, the pod immediately restarts after manual termination.
Uploaded formatted yml file for ease of consumption (with proper whitespace indentations!!)
@charlesrichard just use three backticks ``` before and after your code and you'll be fine with nice formatting, see https://guides.github.com/features/mastering-markdown/ for more info :-)
@charlesrichard I was fixing a similar issue in k8s (see https://github.com/kubernetes/kubernetes/pull/31973) and I guess this might also be the root cause of what you're seeing. Generally, a Job's state is synchronized on every modification of the Job or its underlying pods (of any kind). Additionally, every 10 minutes all Jobs are resynced, and only then (assuming the Job has stabilized) can we catch that short timeout. Can you verify whether, in your case, those 10 minutes do the trick?
I am also seeing this issue with `activeDeadlineSeconds: 900`. I've left Jobs running for well over an hour without them terminating. However, they do not fail to terminate every time, only sometimes. Unfortunately I don't have any solid non-anecdotal information on this.
Was this ever resolved? I'm having a similar issue
I'm seeing the same issue. We have `activeDeadlineSeconds` set to 21600, and there's a Job that's been running for far longer than that.
@mdelaurentis I was able to work around this issue by running a liveness probe on the pod:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
However, there is still no resolution to the `activeDeadlineSeconds` issue.
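For reference, a minimal sketch of that workaround applied to the `pi` Job above. The probe command and timings are illustrative assumptions, not from the original report; it approximates the container's uptime via the mtime of `/proc/1` and fails the probe past 900 seconds, so the kubelet kills the pod even if the Job controller misses the deadline:

```yaml
# Illustrative pod template fragment (not the reporter's exact config):
# fail the exec liveness probe once the container has been up longer
# than ~900s, forcing the kubelet to kill the pod.
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(987654321000000000000)"]
        livenessProbe:
          exec:
            # /proc/1's mtime roughly tracks the start of PID 1 in the container.
            command: ["sh", "-c", "test $(( $(date +%s) - $(stat -c %Y /proc/1) )) -lt 900"]
          initialDelaySeconds: 30
          periodSeconds: 30
      restartPolicy: Never
```

With `restartPolicy: Never`, the killed pod is not restarted in place, which mirrors the terminate-on-deadline behavior the reporter wanted.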
Thanks for the tip!
I'll close this issue in favor of the upstream one, https://github.com/kubernetes/kubernetes/issues/32149, which is currently being worked on. So k8s 1.8 and origin 3.8 should get that fix.