When running a Job whose pod contents form a non-terminating loop and the object is specced with an `activeDeadlineSeconds` value, the pod does not terminate within the specified number of seconds, or ever.
```
openshift v1.3.0-alpha.3+bbeb2f3
kubernetes v1.3.0+507d3a7
```
```yaml
apiVersion: extensions/v1beta1
kind: Job
metadata:
  name: pi
spec:
  activeDeadlineSeconds: 5
  selector:
    matchLabels:
      app: pi
  parallelism: 1
  completions: 1
  template:
    metadata:
      name: pi
      labels:
        app: pi
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(987654321000000000000)"]
      restartPolicy: Never
```
The Job runs in perpetuity, consuming resources. The Job should terminate after 5 seconds.
When the pod is terminated manually from Cockpit or the Console, the Job does not restart it as designed. This behavior differs from when the `activeDeadlineSeconds` parameter is omitted: in that scenario, the pod immediately restarts after manual termination.
Uploaded formatted yml file for ease of consumption (with proper whitespace indentations!!)
@charlesrichard just use three backticks ``` before and after your code and you'll be fine with nice formatting, see https://guides.github.com/features/mastering-markdown/ for more info :-)
@charlesrichard I was fixing a similar issue in k8s (see https://github.com/kubernetes/kubernetes/pull/31973) and I guess this might also be the root cause of what you're seeing. Generally, a Job's state is synchronized on every modification of the Job or its underlying pods (of any kind). Additionally, every 10 minutes all Jobs are resynced, and only then (assuming the Job has stabilized) can we catch that short timeout. Can you verify whether, in your case, those 10 minutes do the trick?
I am also seeing this issue with `activeDeadlineSeconds: 900`. I've left Jobs running for well over an hour without them terminating. However, they do not fail to terminate every time, only sometimes. Unfortunately I don't have any solid non-anecdotal information on this.
Was this ever resolved? I'm having a similar issue
I'm seeing the same issue. We have `activeDeadlineSeconds` set to 21600, and there's a Job that's been running for far longer than that.
@mdelaurentis I was able to work around this issue by running a liveness probe on the pod:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
However, there is still no resolution to the `activeDeadlineSeconds` issue.
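For reference, a minimal sketch of that workaround applied to the `pi` Job above. The probe command and timings are illustrative assumptions, not from the original report; it approximates the container's uptime via the mtime of `/proc/1` and fails the probe past 900 seconds, so the kubelet kills the pod even if the Job controller misses the deadline:

```yaml
# Illustrative pod template fragment (not the reporter's exact config):
# fail the exec liveness probe once the container has been up longer
# than ~900s, forcing the kubelet to kill the pod.
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(987654321000000000000)"]
        livenessProbe:
          exec:
            # /proc/1's mtime roughly tracks the start of PID 1 in the container.
            command: ["sh", "-c", "test $(( $(date +%s) - $(stat -c %Y /proc/1) )) -lt 900"]
          initialDelaySeconds: 30
          periodSeconds: 30
      restartPolicy: Never
```

With `restartPolicy: Never`, the killed pod is not restarted in place, which mirrors the terminate-on-deadline behavior the reporter wanted.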
Thanks for the tip!
I'll close this issue in favor of the upstream one, https://github.com/kubernetes/kubernetes/issues/32149, which is currently being worked on. So k8s 1.8 and origin 3.8 should get that fix.