Argo: Aggregating the output parameters of parallel steps

Created on 19 May 2018 · 7 comments · Source: argoproj/argo

Is this a BUG REPORT or FEATURE REQUEST?: FEATURE REQUEST

When we use the loop feature:

https://github.com/argoproj/argo/blob/master/examples/loops-param-result.yaml

or when we simply run steps in parallel, it does not seem possible to aggregate the outputs of all the steps into an array that can be passed to a subsequent step.

For instance, imagine the following case in an ML context:

The first step of the workflow outputs the parameter values for which a model must be computed: X=3, X=4 and X=5.
The workflow then starts 3 branches: one computing a model for X=3, one for X=4 and one for X=5.
The final step of each branch outputs the accuracy of its model (11 for branch 1, 22 for branch 2, 33 for branch 3) and the location of the model (gcs://11, gcs://22, gcs://33).
We would then like to execute a next step that receives as input:

[{accuracy=11, location=gcs://11}, {accuracy=22, location=gcs://22}, {accuracy=33, location=gcs://33}]

The next step can then select the most accurate model and produce this result as an output.

A final step can then deploy this model to a server that serves predictions.
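As a rough illustration of the selection step, here is a minimal Python sketch. It assumes the aggregated input arrives as a JSON list with one object per branch, and that each object carries hypothetical "accuracy" and "location" fields; neither the field names nor the JSON shape come from Argo itself. The values are the ones from the example above.

```python
import json

# Hypothetical aggregated input: one JSON object per branch. The field
# names "accuracy" and "location" are assumptions for illustration.
aggregated = json.loads("""
[
  {"accuracy": 11, "location": "gcs://11"},
  {"accuracy": 22, "location": "gcs://22"},
  {"accuracy": 33, "location": "gcs://33"}
]
""")


def select_best_model(results):
    """Return the branch result with the highest accuracy."""
    if not results:
        raise ValueError("no branch results to choose from")
    return max(results, key=lambda r: r["accuracy"])


best = select_best_model(aggregated)
# The selection step would emit this object as its own output parameter.
print(json.dumps(best))  # {"accuracy": 33, "location": "gcs://33"}
```

The point of the sketch is that once all branch outputs are available as a single list, choosing the best model is a one-line reduction; the hard part the issue asks for is getting that list from the workflow engine.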

Would it be possible to support something like this?

vicaire

Most helpful comment

I agree some sort of aggregation is needed in the DSL. We also need to support output artifacts as well as parameters (see #854). I'm thinking the controller should provide some new variables, each holding a JSON list of the output parameters/artifacts from the previous step. Consider the following step:

    steps:
    - - name: test-linux
        template: cat-os-release
        arguments:
          parameters:
          - name: image
            value: "{{item.image}}"
          - name: tag
            value: "{{item.tag}}"
        withItems:
        - { image: 'debian', tag: '9.1' }
        - { image: 'debian', tag: '8.9' }
        - { image: 'alpine', tag: '3.6' }
        - { image: 'ubuntu', tag: '17.10' }

I am proposing to make {{steps.test-linux.outputs.parameters}} a variable that holds a JSON list containing all output parameters from the expanded steps. A similar thing could be done for output artifacts (e.g. {{steps.test-linux.outputs.artifacts}}).
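The proposed behaviour can be modelled in a few lines of Python. This is a minimal sketch, not the Argo controller: run_child is a hypothetical stand-in for one expanded test-linux step, and its output parameter name "os-release" is invented for illustration. It also assumes the list keeps the withItems order, so element i corresponds to item i.

```python
import json

# The withItems list from the step above.
ITEMS = [
    {"image": "debian", "tag": "9.1"},
    {"image": "debian", "tag": "8.9"},
    {"image": "alpine", "tag": "3.6"},
    {"image": "ubuntu", "tag": "17.10"},
]


def run_child(image, tag):
    """Hypothetical stand-in for one expanded test-linux step.

    Returns that step's output parameters as a name -> value mapping.
    The parameter name "os-release" is made up for illustration.
    """
    return {"os-release": image + ":" + tag}


def aggregate_outputs(items):
    """Collect each expanded step's output parameters, in item order,
    into the JSON list proposed for {{steps.test-linux.outputs.parameters}}."""
    outputs = [run_child(item["image"], item["tag"]) for item in items]
    return json.dumps(outputs)


print(aggregate_outputs(ITEMS))
```

Preserving item order is a design choice worth making explicit: a downstream step can then pair each aggregated output with the item that produced it without needing extra bookkeeping.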

jessesuen on 22 May 2018


All 7 comments

@qimingj

vicaire on 19 May 2018

@jessesuen I really like that idea, and I think it would easily address #854 as well.

decarboxy on 22 May 2018

@jessesuen how would you provide paths for a list of artifacts? Each artifact loaded into a step requires a full path; do you think the paths could be generated using an iterator variable?

Maybe something like this?

    inputs:
      artifacts:
      - name: myfiles
        path: /tmp/file_{{item.index}}
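The path-generation idea can be sketched in Python. This assumes a hypothetical {{item.index}} placeholder, as proposed in the comment (not an existing Argo variable), and zero-based indexing; artifact_paths is an illustrative helper, not part of any Argo API.

```python
def artifact_paths(template, count):
    """Expand a path template containing the hypothetical {{item.index}}
    placeholder once per aggregated artifact (zero-based indices)."""
    placeholder = "{{item.index}}"
    if placeholder not in template:
        # Without a per-item index every artifact would map to the same
        # path and overwrite the previous one.
        raise ValueError("template must contain {{item.index}}")
    return [template.replace(placeholder, str(i)) for i in range(count)]


print(artifact_paths("/tmp/file_{{item.index}}", 3))
# ['/tmp/file_0', '/tmp/file_1', '/tmp/file_2']
```

Rejecting templates without the index is the key safeguard here: it guarantees each artifact in the aggregated list lands at a distinct path.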

javierbq on 22 May 2018

@javierbq, yes -- I think something like what you are proposing is needed. I filed #934 to handle artifacts separately, since I have a fix coming for parameters.

jessesuen on 3 Aug 2018

Fixed.

jessesuen on 3 Aug 2018

Is it somehow possible for steps within the initial loop to access the aggregated output parameters? I want to interrogate the status of the other steps in the loop at the time any given step in the loop starts.