Is this a BUG REPORT or FEATURE REQUEST?:
FEATURE REQUEST
What happened:
Argo workflows TTLSecondsAfterFinished offers the possibility so handle clean up of workflows that have finished execution. Which means workflows in either Succeeded, Failed or Error state. My question about the background to setting Succeeded, Failed, Errorto have the same time to live.
I am currently setting up Kubeflow which realise on Argo for what is called Kubeflow Pipelines. In order to make sure we don't end up with a lot of old finished/failed workflows we use the TTLSecondsAfterFinished (have also set up a mutating admission web hook to always default and check if users are within the allowed time window, happy to contribute this but will make a separate issue. ) However we have been in discussions in my team about if we should have different time to live depending on if the workflow fails or succeed. The main discussions point is that we would like to extend the time to live for workflows in the failed state to allow for debugging during a longer time window. How was the thought when this was set up? Would it be of interest to add a separation between Succeeded, Failed and Error?
What you expected to happen:
FEATURE REQUEST -- nothing to add here
How to reproduce it (as minimally and precisely as possible):
FEATURE REQUEST -- nothing to add here
Anything else we need to know?:
I am happy to make contributions depending on your feedback and thoughts.
Environment:
$ argo version
$ kubectl version -o yaml
Other debugging information (if applicable):
$ argo get <workflowname>
$ kubectl logs <failedpodname> -c init
$ kubectl logs <failedpodname> -c wait
$ kubectl logs -n argo $(kubectl get pods -l app=workflow-controller -n argo -o name)
@sarabala1979 Do you have some feedback on this?
It is a good feature add. we have two options
TTLSecondsAfterSuccess and TTLSecondsAfterFailed TTLSecondsAfterFinished and create new structs TTLStrategy which has these three elements TTLStrategy:
SecondsAfterComplete:
SecondsAfterSuccess:
SecondsAfterFailed:
I would like to add the same functionality in the controller level also. so if workflow doesn't have TTL element then the central configuration will keep off for those workflows. We can add the same struct in workflow-controller-configmap.
Feel free to contribute to this feature.
I will pick it up and start to work on it! Will check and comeback with feedback on which option I pursue.
I would say TTLStrategy with backwards compatibility scales better
yes . I already assigned this issue to your name.
On Fri, Dec 13, 2019 at 1:07 AM Niklas Hansson notifications@github.com
wrote:
Should I be assigned to it in order to work on it?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/argoproj/argo/issues/1827?email_source=notifications&email_token=AICWOVHYT4VEN5HSDLU5ILLQYNGFVA5CNFSM4JY2QBQ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGZLUSI#issuecomment-565361225,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AICWOVEPUUA2J726IPZXJULQYNGFVANCNFSM4JY2QBQQ
.
--
Thanks & Regards
Saravanan
After looking further in to the code base and the current solution I have some questions related to Complete vs Succeded vs Failed? As I understand it Failed and Succeeded are sub-groups of Complete. Is this a correct understanding? If so i suggest that we keep it to:
TTLStrategy:
SecondsAfterSuccess:
SecondsAfterFailed:
Since it would not make sense to also have complete since it overlaps with both success and failed.
I will follow you suggestion also implementing it in the workflow-controller-configmap but giving priority to if it is set in the WorkflowSpec. So priority wise it will be WorkflowSpec > workflow-controller-configmap > TTLSecondsAfterFinished(so that we get the backward compatibility).
yes, your understanding is correct. Succeeded and Failed are sub-group of Complete. As I mentioned above TTLSecondsAfterFinished can be deprecated v2.5 but I can be supported next few releases. Ideally, we should have below configuration in workflowspec and configmap. So the user can configure in one place instead of two places. Code and implementation will be easier if everything in one place (Until completely deprecates, we can have a wrapper method to support TTLSecondsAfterFinished and SecondsAfterComplete.
TTLStrategy:
SecondsAfterComplete:
SecondsAfterSuccess:
SecondsAfterFailed:
As Priority, you can start WorkflowSpac first. We can create another enhancement issue for config-map support.
Great @sarabala1979 I will try to fix it all at once. Will start the work tomorrow :)
@simster7 I have also looked at the possibility to add default settings to argo/docs/workflow-controller-configmap.yaml to not make it necessary to set it on each workflow. But I don't understand how the argo/docs/workflow-controller-configmap.yaml is parsed? Could you point me in some directions? Thanks for the help!
@NikeNano Openened https://github.com/argoproj/argo/issues/1923 for this. It is a more generalized version of what you're asking. Feel free to continue the discussion there
This is resolved and I will therefore close this issue.
Most helpful comment
I would say
TTLStrategywith backwards compatibility scales better