YAML by default does not allow one to externalize portions of the file to other files. But as several of these StackOverflow answers describe, some YAML interpreters, such as Symfony and PyYAML, allow one to do this at runtime. There are a couple of compelling use cases that argue for Argo's YAML interpreter supporting this in some form.
I often want to run workflows which involve several potentially complicated Python script steps. When developing these workflows, the Python code must be supplied in the script.source field in the YAML, which is just raw text. I would like to be able to write this code in the friendly confines of my IDE, rather than in the middle of the Argo script.
One way around this, of course, is to bake the Python code into my image and use a container template instead. However, every edit to the Python code would then require rebuilding the image, which makes iterating and testing far too slow.
Another option is to develop the code in a separate Python file and copy it into the workflow at runtime. But adding this extra step every time I iterate is also time-consuming, especially with multiple interacting script steps that I'm actively developing.
It would be much simpler if Argo could do the copying for me at runtime.
Some of my workflows are modular, with multiple noninteracting steps that should be composed together in series. Different runs of such a workflow could include different subsets of these steps, and I want to be able to stitch them together at runtime rather than making multiple copies of those steps in different workflows.
WorkflowTemplates offer one potential solution to this -- I could make each of the steps into a template, and my overall workflow could just call all the templates. But this would mean that every change I make to one of these steps would involve rebuilding all of the templates, which is also too slow to be practical.
Another solution would be to use a global input parameter that lists all the steps I want to run, and to add conditions like this to every step:
when: "<name of step> IN {{workflow.parameters.steps}}"
This solution is okay (it's my current workaround), but requires that the steps only happen in one particular order. It also results in long configuration files that one must scroll through to read what's happening, albeit not as long as before WorkflowTemplates.
Motivated by the PyYAML solution, Argo could interpret an !include tag as importing the named file (resolved relative to the workflow file). My script steps would then look like this:
- name: script
  script:
    image: python:3.7
    command: [python]
    source: |
      !include script.py
and composing multiple steps would look like this:
templates:
- name: plan
  steps:
  - - !include step1.yaml
  - - !include step2a.yaml
    - !include step2b.yaml
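Until Argo supports something like this natively, the proposal can be approximated client-side. The sketch below is a hypothetical preprocessor (the names `expand_includes.py`, `expand_includes` and `main` are illustrative, not part of Argo) that replaces any line consisting only of indentation, optional `- ` sequence markers, and `!include <path>` with that file's contents, re-indenting every following line to the column where `!include` stood. It works on raw text before any YAML parsing. That matters for the first example: inside a `source: |` block scalar, `!include script.py` is just literal text to a YAML parser, so a PyYAML-style tag constructor would only see it if written as `source: !include script.py`.

```python
import re
import sys
from pathlib import Path
from typing import Callable

# A line whose only content, after indentation and any YAML sequence
# markers ("- "), is "!include <path>".
INCLUDE_RE = re.compile(r"^(?P<prefix> *(?:- +)*)!include +(?P<path>\S+) *$")


def expand_includes(text: str, read: Callable[[str], str], base: str = ".") -> str:
    """Replace each "!include <path>" line with the contents of <path>.

    The first included line goes where "!include" stood; every later line is
    indented to that same column, so the result stays valid both inside block
    scalars (source: |) and inside sequence items (- - ...).
    """
    out = []
    for line in text.splitlines():
        m = INCLUDE_RE.match(line)
        if m is None:
            out.append(line)
            continue
        prefix = m.group("prefix")
        path = str(Path(base) / m.group("path"))
        # Expand nested includes relative to the included file's directory.
        body = expand_includes(read(path), read, str(Path(path).parent))
        lines = body.splitlines() or [""]
        pad = " " * len(prefix)
        out.append(prefix + lines[0])
        # Blank lines stay blank instead of carrying trailing spaces.
        out.extend(pad + rest if rest.strip() else "" for rest in lines[1:])
    return "\n".join(out) + "\n"


def main(path: str) -> None:
    src = Path(path)
    expanded = expand_includes(src.read_text(), lambda p: Path(p).read_text(), str(src.parent))
    sys.stdout.write(expanded)


# Only act as a CLI when given exactly one workflow path.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Assuming `argo submit -` reads the manifest from standard input (as the one-liner later in this thread does), usage would be `python expand_includes.py workflow.yaml | argo submit -`. There is no cycle detection, so a file that includes itself will exhaust Python's recursion limit.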
On top of the ease of modification, both of these formats would greatly improve the readability of Argo scripts.
Message from the maintainers:
If you wish to see this enhancement implemented, please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.
Perhaps do this as part of a bash script before submitting? Something such as:
cat workflow.yaml | sed 's/!include script.py/`cat script.py`/g' | argo submit -
_Note this should be treated as pseudo-code, didn't test the syntax_
I may be wrong, but I am not sure we would want to support a feature like this.
I don't think that idea would work. In particular, just using a simple sed wouldn't keep the indentation structure of the YAML, so the result of any script with multiple lines would come out like this:
- name: script
  script:
    image: python:3.7
    command: [python]
    source: |
      print("hello world")
print("goodbye world")
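The indentation problem is easy to reproduce. In the sketch below, `naive` mimics a plain text substitution like the sed idea (the one-liner as written would not actually run `cat`, since backticks inside single quotes are not expanded by the shell), while `fixed` indents every line of the included code to the block scalar's column first. The template and script contents are the hypothetical ones from the example above.

```python
import textwrap

TEMPLATE = """\
- name: script
  script:
    image: python:3.7
    command: [python]
    source: |
      !include script.py
"""

CODE = 'print("hello world")\nprint("goodbye world")\n'

# Plain substitution: only the first line of CODE inherits the indentation
# of the "!include" line, so the second line falls out of the block scalar.
naive = TEMPLATE.replace("!include script.py", CODE.rstrip("\n"))

# Indentation-aware substitution: indent every line of CODE to the column of
# the block scalar's content, then replace the whole "!include" line.
indent = " " * 6
fixed = TEMPLATE.replace(indent + "!include script.py", textwrap.indent(CODE.rstrip("\n"), indent))

print(naive)
print(fixed)
```

The only change needed is to indent every included line, not just the first, to the column of the block scalar's content.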
It would also be ideal to retain other features of the standard CLI, which this approach would lose, like these:
argo submit WF1 WF2
argo lint DIR

Instead of include, would it be better to load the source from a ConfigMap, which could be generated/updated from the source code file by e.g. Kustomize's configMapGenerator?
This is a great request. I currently use a custom transformer script that loads Python scripts into the script sections of my workflows.