Keda: Apply scaledobject terminates all running jobs

Created on 20 Aug 2020 · 4 Comments · Source: kedacore/keda

Applying an update (typically a new container image tag from our CI/CD pipeline) to a ScaledObject with scaleType: job terminates all running jobs.

This does not seem to fit well with the run-to-completion nature of jobs, and we have to make sure deploying new code does not interrupt our long running simulations (the main reason for choosing jobs over deployments).

Expected Behavior

Already started jobs run to completion with the configuration as it was when started.
New jobs triggered (e.g. by new incoming queue messages) should run with the new configuration.

Actual Behavior

Already running jobs and associated pods are terminated and deleted.

Steps to Reproduce the Problem

Define a long-running, queue-triggered ScaledObject with scaleType: job:

apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: my-long-running-scaled-job
  namespace: default
spec:
  scaleType: job
  pollingInterval: 10   # Optional. Default: 30 seconds
  maxReplicaCount: 15  # Optional. Default: 100
  minReplicaCount: 0   # Optional. Default: 0
  cooldownPeriod:  30  # Optional. Default: 300 seconds
  jobTargetRef:
    parallelism: 1 # [max number of desired pods](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#controlling-parallelism)
    completions: 1 # [desired number of successfully finished pods](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#controlling-parallelism)
    activeDeadlineSeconds: 900 # Specifies the duration in seconds relative to the startTime that the job may be active before the system tries to terminate it; value must be positive integer
    backoffLimit: 6 # Specifies the number of retries before marking this job failed. Defaults to 6
    template:
      # describes the [job template](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/)
      metadata:
        labels:
          jobgroup: somejobgroupthing
      spec:
        containers:
          - name: busybox-looping
            image: busybox
            command: ['sh', '-c', 'x=1;while [ $x -le 100 ]; do let y=x*2; let z=x*3; let a=x*4; echo $x $y $z $a ; sleep 1; let x=x+1;done']
            env:
              - name: THE_QUEUE
                value: mytestqueuethatijustaddamessageto
              - name: STORAGE_ACCOUNT_CONNECTION_STRING
                valueFrom:
                  secretKeyRef:
                    name: my-secrets
                    key: STORAGE_ACCOUNT_CONNECTION_STRING
        restartPolicy: Never
  triggers:
    - type: azure-queue
      metadata:
        queueName: mytestqueuethatijustaddamessageto
        queueLength: '20' # Optional. Queue length target for HPA. Default: 5 messages
        connection: STORAGE_ACCOUNT_CONNECTION_STRING

Save the file and apply it to k8s with kubectl apply -f my-busybox-job-test.yaml.
Push a message to the queue.
Observe pods being created and starting to calculate.
Make a simple update to the YAML, e.g. to spec.jobTargetRef.template.spec.containers[0].image or command, and apply it again.
The running jobs/pods are terminated.

Specifications

KEDA Version: 1.4.1
Platform & Version: Azure AKS
Kubernetes Version: v1.16.9
Scaler(s): job

bug

audunsol

👍2

All 4 comments

We are seeing this as well with our long running jobs and it does not play nice with the continuous delivery nature of our code bases that are using containers being scaled by KEDA.

The alternative, of course, is to make sure that all batch jobs run via KEDA use some kind of saga pattern: if a job is interrupted and it is driven off a queue with a visibility window, the message becomes visible again, the job is kicked off again, and it can resume close to where it left off. However, this depends on the nature of the work being done and is not always possible.

nrjohnstone on 24 Aug 2020

👍1

@TsuyoshiUshio Is this behavior the same with 2.0?

tomkerkhove on 24 Aug 2020

I have upgraded to keda-2.0.0-beta on our test cluster now, and as far as I can see, this issue seems to be fixed there. Thanks!

I am happy to close this issue then, unless you would like to address this somehow for 1.x as well (behavior and/or its docs or something).
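For anyone moving the spec above from 1.x: in KEDA 2.0, job scaling is defined with a separate ScaledJob resource instead of a ScaledObject with scaleType: job. The following is only a rough sketch of how the reproduction spec might translate, assuming the 2.0 schema (apiVersion keda.sh/v1alpha1, and connectionFromEnv in place of connection for the azure-queue trigger). It has not been checked against the exact beta build mentioned here, so verify the field names against the docs for your version.

```yaml
# Sketch only: assumed KEDA 2.0 equivalent of the 1.x ScaledObject above.
apiVersion: keda.sh/v1alpha1      # assumption: 2.0 API group
kind: ScaledJob                   # replaces ScaledObject + scaleType: job
metadata:
  name: my-long-running-scaled-job
  namespace: default
spec:
  pollingInterval: 10             # seconds between queue checks
  maxReplicaCount: 15             # upper bound on jobs created per poll
  jobTargetRef:
    parallelism: 1
    completions: 1
    activeDeadlineSeconds: 900
    backoffLimit: 6
    template:
      metadata:
        labels:
          jobgroup: somejobgroupthing
      spec:
        containers:
          - name: busybox-looping
            image: busybox
            command: ['sh', '-c', 'x=1;while [ $x -le 100 ]; do echo $x; sleep 1; let x=x+1;done']
            env:
              - name: STORAGE_ACCOUNT_CONNECTION_STRING
                valueFrom:
                  secretKeyRef:
                    name: my-secrets
                    key: STORAGE_ACCOUNT_CONNECTION_STRING
        restartPolicy: Never
  triggers:
    - type: azure-queue
      metadata:
        queueName: mytestqueuethatijustaddamessageto
        queueLength: '20'
        connectionFromEnv: STORAGE_ACCOUNT_CONNECTION_STRING   # assumption: renamed from "connection" in 2.0
```

To check the behavior described in this issue, apply the spec, push a message, change the image or command, apply again, and watch whether the already running job keeps going.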

audunsol on 15 Sep 2020

❤1

Let's close this then indeed, we don't have concrete plans to ship a new 1.x version.

tomkerkhove on 15 Sep 2020
