Pipeline: Support Task looping syntax inside Pipelines

Created on 14 Feb 2020 · 26 comments · Source: tektoncd/pipeline

Desired behaviour

In some cases we are given a parameter that is an array of values, and we want to run a particular Task against each element of that array without knowing the array's length in advance.

You could also imagine a Pipeline that operates on some values, such as the source code for, and destination of, an image, and you want to be able to run the same Pipeline for one or more images, when you only know at runtime how many images you'll be operating on.

Current behaviour

To support this we currently have to dynamically write a Pipeline that has the appropriate Tasks.

In addition, there are variations that we might want to think of here like parallel vs. sequential, conditional execution, early termination, and likely more.

area/roadmap kind/feature


All 26 comments

As long as we don't have Tekton dynamically creating Pipelines I'm all for this.

I'd like to see this more like a Python "list comprehension" vs. a looping construct that allows for parts of Pipelines to execute in loops.

re: comprehension -- yes, that's where my head is at too

I also would like a feature like this but at the Task Step level where the input is an array, and for each element in the array it explodes out into n steps.

Ansible's loop syntax looks pretty nice to me. As a weak example, it could be used to run a task against multiple cluster resources.

tasks:
  - name: run-deploy
    taskRef:
      name: deploy
    params:
      - name: cluster
        value: "{{ item.name }}"
    for_each:
      - name: testing
      - name: fake_production
      - name: production
    when: '!strings.Contains(item.name, "fake")'

This type of fan-out tasks will also require syntax / logic for tasks to:

  • access the task result / output resources of any of the tasks
  • express workflow rules like runAfter against the collection of tasks

Conditions may be attached to the task. Do we run the condition N times? Ideally, if we had a way to determine whether the condition depends on the param that generates the fan-out, we could run the condition once; but in general we will need to run it once for each instance of the task.

In my use case, I don't need access to the results or outputs, I'm just relying on the success or failure of the task. So maybe there's room for a first iteration of this facility that doesn't have all the bells and whistles, but allows a simple fan out

Even though there are cases that do not require results, I think it would be good to have support as part of the general solution since it makes it consistent with the Task interface.

Another decision we have to make here is around supporting parallel vs sequential task running. It might make a lot of sense here if we adopted the semantics of map (tasks run in parallel and produce an array of 'results') and reduce (tasks run sequentially accumulating the previous with the current value to create a final result). We might also consider adding a filter (tasks run in parallel to return true/false to return a filtered version of the original array input) but that could be done later.

tasks:
  - name: my-map-task
    map:
      array: [$(params.myArray[*])]
    taskRef:
      name: someParallelTask
    params: # map params have the variables currentValue, index, and array in scope
      - name: param1
        value: $(currentValue)
      - name: param2
        value: $(index)
      - name: param3
        value: $(array)
    # results is an array of the individual task results
  - name: my-reduce-task
    reduce:
      array: [$(params.myArray[*])]
      initialValue: 0
    taskRef:
      name: someSequentialTask
    params: # reduce params have the variables accumulator, currentValue, index, and array in scope
      - name: param0
        value: $(accumulator)
      - name: param1
        value: $(currentValue)
      - name: param2
        value: $(index)
      - name: param3
        value: $(array)
    # results is the final result of the last sequentially run task

This borrows a lot from how map and reduce were specified in JavaScript, which in turn borrowed heavily from ideas in Python.
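For instance, a downstream task consuming the collected map results might look like this (the result-array reference syntax here is an assumption to make the idea concrete, not a settled design):

```yaml
tasks:
  - name: summarize
    runAfter: [my-map-task]   # would need to wait on every expanded iteration
    taskRef:
      name: summarizeResults
    params:
      - name: allResults
        # hypothetical: the map task's per-iteration results collected as an array
        value: $(tasks.my-map-task.results.out[*])
```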

A couple of questions spring to mind:

  • What does a Task with runAfter: [my-map-task] do? Does the runAfter expand into a list of all the map tasks?
  • Similarly for runAfter: [my-reduce-task], I presume this expands into runAfter: [my-reduce-task-N], where N is the final iteration's index?

What I particularly like about this is that it seems we could statically generate the entire Pipeline plan from this format. Does that sound right? If so, that in turn makes me think we could initially explore this in an experimental tool/library external to the controller. If we wrote it as a library, it could be reused in tkn too for, e.g., something like tkn plan my-pipeline-reduce -p myArray=1,3,5 #... to see the complete generated Pipeline before it executes.

Pre-generation only works if tasks are only allowed to iterate on statically-defined arrays and not on the results of previous tasks. This seems like a key decision that has to be made.

Yes that's correct re: runAfter (although it really would be nice to have a higher order framing concept)

The idea was definitely that the structure can be statically generated at "TaskRun" time however the array might come from a previous TaskRun result so the structure cannot always be created before the pipeline run. (and repeating what @GregDritschler also said!)

I believe Argo Workflow supports looping, recursion, etc. It might be useful to explore how it's done there for ideas, although I'm not sure if it satisfies everyones' use cases.

I take back my earlier comment that static Pipeline generation wouldn't allow iterating on task results. If we assume that iterating on task results means "iterate on the results of a previous task which itself was iterated", then both tasks have the same number of iterations. Therefore it would be possible to generate the pipeline tasks and appropriately map the result references.
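To make that concrete, suppose iterated task b consumes the results of iterated task a; both have the same iteration count, so a planner could statically expand the pair. A sketch of the expanded plan for a two-element array (the task names and the per-iteration result references are illustrative assumptions, not an existing Tekton feature):

```yaml
# Hypothetical expansion of "b iterates over the results of iterated task a"
# for a two-element input array: each b-i maps to its matching a-i result.
tasks:
  - name: a-0
    taskRef: {name: produce}
    params:
      - name: in
        value: x
  - name: a-1
    taskRef: {name: produce}
    params:
      - name: in
        value: y
  - name: b-0
    runAfter: [a-0]
    taskRef: {name: consume}
    params:
      - name: in
        value: $(tasks.a-0.results.out)  # result reference remapped per iteration
  - name: b-1
    runAfter: [a-1]
    taskRef: {name: consume}
    params:
      - name: in
        value: $(tasks.a-1.results.out)
```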

I came up with a starting list of functional considerations for task iteration, independently of how it may end up being implemented.

  • Where can Task iteration be specified?

    • Pipeline task
    • TaskRun ?
  • Can Tasks themselves iterate on arrays? See #2112.

  • Does the iteration syntax support either an array parameter or an inline list of strings?

  • How does the Pipeline task pass the current item (as a parameter value) to the task? Is there a special "item" variable?

  • Is iterating on a single array sufficient? What if the task needs to iterate through a "table"? Can it step through multiple arrays simultaneously? It might be possible to satisfy this by allowing array indexing.

  • Is an option required to specify sequential vs parallel execution? Should there be a concurrency limit (e.g. 1=sequential, n=throttled at n maximum, -1=no limit)?

  • Does iterating single tasks constrain the possible workflows too much?
    For example, if the user wants to clone and build multiple git repos, the workflow will be:
    clone1, clone2, ... cloneN, build1, build2, ... buildN
    whereas the user might really want:
    clone1, build1, clone2, build2, ... cloneN, buildN.
    (The former might need more resources, e.g. disk space, than the latter.)

  • Can PipelineResources be used with Task iteration? PipelineResources generate internal steps in TaskRuns. There's no way to coordinate them when a Task is iterated so they would behave unpredictably.

  • Can a Condition reference the current item? The problem is that a Condition failure causes that Task as well as any dependent tasks to be skipped. This implies doing task dependencies at the item level. The task skipping proposal in #2127 also needs to be investigated with respect to how it would work with iteration.

  • How are results from an iterated task handled? Is there an array formed from the individual task results? Is that array referenced via the normal pattern with [*] added, $(tasks.x.results.y[*])? Can another task iterate on this array?

  • Does the task timeout specify the maximum time for a single iteration or for all iterations?

  • Does the number of retries specify the number of retries for a single iteration or across all iterations?
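As a strawman tying several of these questions together, a Pipeline task might one day look like this (every field name here — iterate, over, concurrency, $(item), $(index) — is purely illustrative, not an existing Tekton API):

```yaml
tasks:
  - name: build-each
    iterate:
      over: $(params.repos[*])  # an array param, or a previous task's result array
      concurrency: 2            # 1 = sequential, n = throttled at n, -1 = unbounded
    taskRef:
      name: build
    params:
      - name: repo
        value: $(item)          # current element; $(index) could also be in scope
# Downstream, the collected results might then be addressed as
# $(tasks.build-each.results.image[*]) — again, an assumption.
```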

I believe Argo Workflow supports looping, recursion, etc. It might be useful to explore how it's done there for ideas, although I'm not sure if it satisfies everyones' use cases.

@jcmcken yes Argo has full featured looping and recursion support - @Tomcli can point to some implementation details there

I take back my earlier comment that static Pipeline generation wouldn't allow iterating on task results. If we assume that iterating on task results means "iterate on the results of a previous task which itself was iterated", then both tasks have the same number of iterations. Therefore it would be possible to generate the pipeline tasks and appropriately map the result references.

I came up with a starting list of functional considerations for task iteration, independently of how it may end up being implemented.

Thanks @GregDritschler for the detailed analysis - is this being discussed in any workgroup meeting on Tekton side?

@animeshsingh There was a discussion this past Monday, mainly around whether there are sufficient CI/CD use cases to justify doing it in Tekton. That's where it stands for now.

I wanted to write about my use case. There is a kaniko Task in the catalog, https://github.com/tektoncd/catalog/tree/v1beta1/kaniko; it's great and works fine, but only for one image. What if I have multiple images (unknown at the time of writing the static Pipeline definition) to build from one repository? Then I cannot use it, and I have to build the images in some custom way in a loop.
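For illustration only: with a hypothetical for_each field like the one sketched earlier in this thread, the kaniko Task could be fanned out per image discovered at runtime. Neither for_each nor $(item) exists in Tekton today, and the loop source assumes a hypothetical preceding list-images task:

```yaml
tasks:
  - name: build-images
    taskRef:
      name: kaniko
    for_each: $(tasks.list-images.results.images[*])  # hypothetical loop source
    params:
      - name: IMAGE
        value: $(params.registry)/$(item)  # hypothetical per-item variable
      - name: CONTEXT
        value: ./images/$(item)
```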

@GregDritschler the above is an exact example: dynamic parallel steps needed to complete a task based on the output of a previous step, as used in multiple CI/CD scenarios.

Great to know tekton could support this feature!

It's true that this is beyond what the current Tekton Pipeline can express, whether the loop parameters are defined by the user as input parameters or come from the result of another task in the pipeline.

We may need to support both parameters that are plain array lists, e.g. [a, b, c, d], and array lists with dicts inside, e.g. [{'a': 1, 'b': 2}, {'a': 10, 'b': 20}]. Then only one loop param needs to be considered to support multiple loop params.

I would like to give you examples of how this is defined in an Argo YAML file:

array list param:
https://github.com/kubeflow/pipelines/blob/master/sdk/python/tests/compiler/testdata/withparam_global.yaml

array list with dict param:
https://github.com/kubeflow/pipelines/blob/master/sdk/python/tests/compiler/testdata/withparam_global_dict.yaml

Another use case:

I've got a pipeline that uses a git repository as an input resource, reads a config file, and generates a set of Tekton results containing dependent git repositories, i.e. result-1=git-1, result-2=git-2, etc. Then I run a git-clone task to clone git-1, git-2, etc. I've also got a condition that checks that result-1 is not null before running a git-clone task. So I'd really like a loop which runs on a set of parameters which is passed to a condition and, if the condition returns true, checks out the corresponding git repository.

At the moment, I've got separate conditions for each variable and hence separate git-clone tasks for each 'true' condition. A loop would cut my YAML file down from a few hundred lines to a few.
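Sketching what's being asked for here, with entirely hypothetical loop and per-item guard syntax, the repeated conditions and git-clone tasks might collapse to something like:

```yaml
tasks:
  - name: clone-deps
    taskRef:
      name: git-clone
    for_each: $(tasks.read-config.results.gits[*])  # hypothetical: the generated result set
    when: $(item) != ""                             # hypothetical per-item condition
    params:
      - name: url
        value: $(item)
```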

Has there been any progress on this issue?

@fenglixa I think that @GregDritschler has been continuing to iterate on the design he proposed.

@ImJasonH has also been working on a mechanism to make it easier to prototype workflow features via "custom tasks", which we may want to use to prototype this.

And we've been discussing both of these in our api working group if you are interested in joining or following along :D

hey all, I'm reposting the question I asked in slack and was directed to this issue. Totally looking forward to this feature!

Curious for suggestions/techniques folks are using to solve this type of scenario: I have a statically defined Pipeline with one Task that invokes Kaniko to build an image. I instantiate the PipelineRun for it via a TriggerTemplate and it works fine. In some cases, however, I need to build N images. What's the best way to approach this? Effectively, for some PipelineRuns I need the instantiated Pipeline to have N Tasks wired up instead of just the one, yet other times I want only the single build to happen. The number of builds per invocation varies based on the webhook that ultimately invokes this, so I obviously can't pre-create all the possible variant Pipeline manifests. I guess I could create the Pipeline dynamically via resourcetemplates, but I see no way to iterate over a number to generate the N Tasks within it.

That said, in addition to the simpler scenario of just being able to iterate to generate N single Tasks within a pipeline. I also have the scenario where I would like to generate (per iteration) a "set/group" of things. (i.e. a Task followed by 2 Conditions etc)

Right now I'm doing all of this by having a statically defined single Pipeline with one Task and delegating to code/loops within that Task to achieve the N things I want to do. This works, but I'd prefer the concept of a single Task doing a single thing, rather than overloading it like this. Especially when viewing it in the dashboard, things get lost.

@fenglixa I think that @GregDritschler has been continuing to iterate on the design he proposed.

@ImJasonH has also been working on a mechanism to make it easier to prototype workflow features via "custom tasks", which we may want to use to prototype this.

And we've been discussing both of these in our api working group if you are interested in joining or following along :D

Any reason that design document is private, and not public?

Any reason that design document is private, and not public?

This is mostly a Google Docs issue (most of our corporate Google accounts do not allow us to create a fully public document... at least I never found out how). The docs are available if you are part of either the tekton-dev or tekton-users mailing list (see here). This is also why we started the TEP process.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

/remove-lifecycle stale
