Is this a BUG REPORT or FEATURE REQUEST?: I'm not sure
What happened:
If a workflow that has an onExit step is terminated, the onExit step is not run.
What you expected to happen:
I would like a way to force an exit handler to run even if the workflow is terminated. We create test infrastructure and would like to make sure it gets torn down whether the workflow exited normally, errored out, or was manually terminated.
If this is the expected behavior of onExit (not clear from the example doc), an onTerm variant that does run even if a workflow is terminated would be very useful.
How to reproduce it (as minimally and precisely as possible):
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: term-test-
spec:
  entrypoint: term-test
  # Always run cleanup
  onExit: cleanup
  templates:
  - name: term-test
    steps:
    - - name: sleep
        template: sleep
  - name: sleep
    container:
      image: alpine:3.9
      command: ["/bin/sh", "-c"]
      args: ["sleep 600"]
  - name: cleanup
    container:
      image: alpine:3.9
      command: ["/bin/sh", "-c"]
      args: ["sleep 60; echo bye > /tmp/bye"]
    outputs:
      artifacts:
      - name: bye
        path: /tmp/bye
$ argo submit term-test.yaml
Name: term-test-fw552
Namespace: default
ServiceAccount: default
Status: Pending
Created: Thu Jun 06 11:51:13 -0700 (now)
$ argo terminate term-test-fw552
Workflow 'term-test-fw552' terminated
$ argo watch term-test-fw552
Name: term-test-fw552
Namespace: default
ServiceAccount: default
Status: Failed (Terminated)
Message: terminated
Created: Thu Jun 06 11:51:13 -0700 (12 seconds ago)
Started: Thu Jun 06 11:51:13 -0700 (12 seconds ago)
Finished: Thu Jun 06 11:51:23 -0700 (2 seconds ago)
Duration: 10 seconds
STEP                       PODNAME                     DURATION  MESSAGE
 ✖ term-test-fw552                                               child 'term-test-fw552-1901370753' failed
 └---✖ sleep               term-test-fw552-1901370753  9s        terminated
 ✖ term-test-fw552.onExit  term-test-fw552-1268251629  1s        terminated
Anything else we need to know?:
Environment:
$ argo version
argo: v2.3.0
BuildDate: 2019-05-20T22:11:23Z
GitCommit: 88fcc70dcf6e60697e6716edc7464a403c49b27e
GitTreeState: clean
GitTag: v2.3.0
GoVersion: go1.11.5
Compiler: gc
Platform: darwin/amd64
$ kubectl version -o yaml
clientVersion:
  buildDate: "2018-12-13T19:44:19Z"
  compiler: gc
  gitCommit: eec55b9ba98609a46fee712359c7b5b365bdd920
  gitTreeState: clean
  gitVersion: v1.13.1
  goVersion: go1.11.2
  major: "1"
  minor: "13"
  platform: darwin/amd64
serverVersion:
  buildDate: "2019-04-12T22:59:24Z"
  compiler: gc
  gitCommit: 8d9b8641e72cf7c96efa61421e87f96387242ba1
  gitTreeState: clean
  gitVersion: v1.12.7-gke.10
  goVersion: go1.10.8b4
  major: "1"
  minor: 12+
  platform: linux/amd64
Other debugging information (if applicable):
$ argo get <workflowname>
$ kubectl logs <failedpodname> -c init
$ kubectl logs <failedpodname> -c wait
$ kubectl logs -n kube-system $(kubectl get pods -l app=workflow-controller -n kube-system -o name)
+1 Any updates on this one?
This makes sense. Is it useful if we have onFailure, onSuccess and onComplete (took the names from scala)?
The onExit step is behaving as expected. onExit is a special step that executes after all Step or DAG tasks have completed, regardless of whether they succeeded or failed. In the above case, the user is forcefully terminating the workflow, which means killing the process, so all remaining steps are terminated as well.
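To illustrate the "regardless of success or failure" behavior, here is a minimal sketch of an exit handler that branches on the `{{workflow.status}}` variable. The workflow and template names (`exit-status-`, `main`, `teardown`, `notify-failure`, `echo`) are made up for this example and do not come from the report above.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: exit-status-   # hypothetical name for this sketch
spec:
  entrypoint: main
  onExit: cleanup              # runs after main completes, pass or fail
  templates:
  - name: main
    container:
      image: alpine:3.9
      command: ["/bin/sh", "-c"]
      args: ["exit 1"]         # fail on purpose so the handler sees a failure
  - name: cleanup
    steps:
    - - name: teardown         # always runs inside the exit handler
        template: echo
        arguments:
          parameters:
          - name: msg
            value: "tearing down, status={{workflow.status}}"
      - name: notify-failure   # only runs when the workflow did not succeed
        template: echo
        when: "{{workflow.status}} != Succeeded"
        arguments:
          parameters:
          - name: msg
            value: "workflow did not succeed"
  - name: echo
    inputs:
      parameters:
      - name: msg
    container:
      image: alpine:3.9
      command: ["/bin/sh", "-c"]
      args: ["echo '{{inputs.parameters.msg}}'"]
```

Note that this only covers workflows that finish on their own; per the explanation above, a forcefully terminated workflow does not get to run the handler to completion.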
@sarabala1979 So should I open a separate issue to request exit handler support for terminated workflows?
@kbruner
We'd also find an onTerminate feature useful. We have some tear-down we'd like to do if a workflow is forcefully stopped in the middle of execution.
Should onTerminate be another template?
Or could there be a single onExit template for the workflow, as there is today, with onTerminate as a boolean?
onTerminate: true would trigger the workflow exit handler when the workflow is terminated.
The default would be false.
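For concreteness, the boolean variant might look roughly like the fragment below. This is only a sketch of the proposal: `onTerminate` is a hypothetical field suggested in this thread, not something the Argo Workflow spec is known to support, and the surrounding names are borrowed from the reproduction above.

```yaml
spec:
  entrypoint: term-test
  # Existing field: exit handler template to run when the workflow finishes.
  onExit: cleanup
  # HYPOTHETICAL field proposed in this thread, not part of the Argo spec.
  # true  -> also run the onExit handler when the workflow is terminated
  # false -> (proposed default) keep today's behavior
  onTerminate: true
```

The alternative from the first question would instead make `onTerminate` name a separate template, so that termination could trigger different steps than a normal exit.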
Is there any update on this issue? The onTerminate feature is critical for some specific cases.
+1 for onTerminate feature
+1 too. I'm using Argo for my continuous deployment and want to let users abort a deployment without implementing any complex logic.
What is the status of this issue? I would also be very happy with the onTerminate feature.
I've been thinking about this request for a bit this morning.
It seems from how it's implemented in the code that "Terminate" is supposed to act as a non-graceful shutdown of the Workflow: if it is triggered for whatever reason, it will shut things down immediately. It is easy to imagine scenarios where a user would want such functionality to stop execution without any regard.
It therefore seems a bit counter-intuitive (both in principle and in its would-be implementation) that new nodes be executed after a user requests an immediate shutdown. Perhaps the reason the Workflow was terminated had nothing to do with the Workflow itself, but with the cluster it was running on. In such a scenario, scheduling more nodes would defeat the purpose of executing "Terminate".
Perhaps a better solution would be a softer, more graceful shutdown that will still run the onExit handler when activated. "Stop" would be the natural verb for this.
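One way to picture the distinction is as a shutdown mode recorded on the Workflow itself. The fragment below is only an illustrative sketch: the `shutdown` field name and its `Stop` / `Terminate` values are assumptions made for this example, so check them against the API reference for your Argo version before relying on them.

```yaml
spec:
  entrypoint: term-test
  onExit: cleanup
  # ASSUMED field and values, for illustration only:
  #   Stop      -> stop scheduling new steps, but still run the onExit handler
  #   Terminate -> shut everything down immediately, skipping onExit
  shutdown: Stop
```

Under this sketch the existing "Terminate" keeps its non-graceful meaning, while teardown logic such as the `cleanup` template above would still get a chance to run on "Stop".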
For those interested in this issue, I've opened #2352
@simster7 I think this PR only partially answers the need.
IMO, terminating a workflow is not a failure/success. Therefore, we should be able to run different steps than the ones in onExit.
Let me give you my example: I am using Argo for my CICD. I'd like to give the user the possibility to use an Abort button. But when using this button, the status should not be Failed but Aborted.
Does it make sense?