Pipeline: Allow Tasks in Pipelines to express a dependency on workspaces 'from' previous Tasks

Created on 17 Aug 2020 · 6 Comments · Source: tektoncd/pipeline

Feature request

In features like conditions we distinguish between different kinds of dependencies:

1) ordering (runAfter)
2) resource dependencies (when a Task relies on a result or a PipelineResource from another Task).

Depending on a workspace that has been acted on by a previous Task is a case of (2) but since workspaces do not have a "from" or similar syntax we have no way of expressing (2) and it will always look like (1).
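For reference, a minimal sketch of how this has to be written today, reusing the task and workspace names from the examples in this issue. runAfter is the existing ordering field; nothing in the spec records that task2 actually consumes what task1 wrote to the workspace.

```yaml
tasks:
  - name: task1
    taskRef:
      name: write-to-workspace
    workspaces:
      - name: output
        workspace: pipeline-ws1
  - name: task2
    taskRef:
      name: read-from-workspace
    # Ordering only (kind 1): this does not say that task2 needs
    # what task1 wrote to pipeline-ws1.
    runAfter:
      - task1
    workspaces:
      - name: src
        workspace: pipeline-ws1
```

Expressing the same relationship as a dependency of kind (2) needs new syntax.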

The syntax for this could look like this:

  tasks:
    - name: task1
      taskRef:
        name: write-to-workspace
      workspaces:
        - name: output
          workspace: pipeline-ws1
    - name: task2
      taskRef:
        name: read-from-workspace
      workspaces:
        - name: src
          workspace: $(tasks.task1.workspaces.output)

Or we could use a from syntax like we do with PipelineResources:

  tasks:
    - name: task1
      taskRef:
        name: write-to-workspace
      workspaces:
        - name: output
          workspace: pipeline-ws1
    - name: task2
      taskRef:
        name: read-from-workspace
      workspaces:
        - name: src
          from: $(tasks.task1.workspaces.output)

Probably we'd want to use the first syntax to be consistent with the syntax for results. I prefer from because it's slightly more explicit but that's probably a discussion for another issue!

Use case

Any Pipeline where you want one Task to act on a workspace and another to do something with it. Some examples:

  1. TaskA clones from git to a workspace, TaskB builds the cloned code
  2. TaskA builds a binary - but conditionally, only if the files the binary builds from have changed. TaskB runs a test with that binary, but should not run if TaskA was skipped
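Use case 2 could be sketched roughly as follows, using the from variant above. Everything named here is hypothetical: the build-binary and run-tests Tasks, the files-changed param, and the guard on the first task (written as a when expression; a Condition would serve the same purpose). The from field is the syntax proposed in this issue, not something Tekton supports today.

```yaml
tasks:
  - name: build
    # Hypothetical guard: only build if the source files changed.
    when:
      - input: "$(params.files-changed)"
        operator: in
        values: ["true"]
    taskRef:
      name: build-binary
    workspaces:
      - name: output
        workspace: pipeline-ws1
  - name: test
    taskRef:
      name: run-tests
    workspaces:
      - name: src
        # Proposed syntax: declares a resource dependency on "build",
        # so "test" would be skipped when "build" is skipped.
        from: $(tasks.build.workspaces.output)
```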

All 6 comments

I agree from might make more sense. In the first syntax snippet the user could reasonably assume that only the name of the workspace will be interpolated. But they'd be wrong - the name of the workspace will be injected _and_ there's an implicit runAfter added to the pipelinetask. from gives a clearer signal, at least to me, that there's something else going on over just the string interpolation.

In fact, thinking about it some more I'm not totally convinced that this needs to be a variable. It isn't quite right to think of it as interpolation. Maybe it should just be a dot-notation without the $(...)? from: tasks.task1.workspaces.output. The reason being: what does it mean to interpolate a _different_ variable into from? Like $(tasks.task1.results.foo)? I don't think that would work because Tekton would have to resolve that workspace name halfway through pipeline execution. I'm not sure that any other variable interpolation can be supported in the from field so I'm not sure we should make it "look" like regular variable interpolation. wdyt?
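To make the concern concrete, here is a hypothetical pair of bindings (the result name foo is the one from the question above). Neither is existing Tekton syntax; the first is what the $(...) form would seem to invite, the second is the plain dot-notation suggested instead.

```yaml
workspaces:
  - name: src
    # Looks like ordinary interpolation, but a result's value is only
    # known after task1 has run, so the workspace could not be resolved
    # before pipeline execution starts.
    from: $(tasks.task1.results.foo)
  - name: other-src
    # Plain reference: no suggestion that arbitrary variables are allowed.
    from: tasks.task1.workspaces.output
```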

/assign

I prefer the from option as well, and agree with @sbwsg that it shouldn't be a variable but I think we don't even need the dot-notation in from. I explored some alternatives and found that we can solve this using from that references the pipelinetask that we expect to operate on that workspace before the current pipelinetask -- the syntax would look like:

  tasks:
    - name: task1
      taskRef:
        name: write-to-workspace
      workspaces:
        - name: output
          workspace: pipeline-ws1
    - name: task2
      taskRef:
        name: read-from-workspace
      workspaces:
        - name: src
          workspace: pipeline-ws1
          from: task1

I implemented a poc for this design in https://github.com/tektoncd/pipeline/commit/47d216c58a87b8c9a571cb9a380a7fcd6157a999 -- the example in the poc is a modification of this example (removed the runAfter and used from instead)

@sbwsg @bobcatfish please let me know what you think before I write up a TEP on this

      workspaces:
        - name: src
          workspace: pipeline-ws1
          from: task1

I think this approach makes sense. It simplifies the field to its core purpose of specifying the resource dependency without any of the variable-looking stuff. lgtm!

I also prefer from but I'd like to introduce a couple other thoughts in case they change your opinion @jerop @sbwsg :

  1. We chose _not_ to use from with results <-- slightly different because in that case we NEED variable interpolation and here we don't
  2. The only other place we have from is with pipelineresources - https://github.com/tektoncd/pipeline/blob/master/docs/pipelines.md#using-the-from-parameter - if we remove pipelineresources then we'd be keeping this "from" syntax ONLY for this workspace feature (maybe that's ok?)

I'm also wondering if we might end up with any potential ambiguity e.g. in the examples above:

      workspaces:
        - name: src
          from: $(tasks.task1.workspaces.output) # note that this is explicitly saying WHICH resource ('output') FROM task1

vs

      workspaces:
        - name: src
          workspace: pipeline-ws1
          from: task1 # this is saying 'pipeline-ws1' from task1, but 'pipeline-ws1' could be bound to multiple workspaces

This might not actually be a problem. I think with pipelineresources it would be, because we have an "automagic" volume that we create to share data between pipelineresources when "from" is used, so we need to know exactly which pipelineresource to get the data "from". With this workspace from feature we aren't (yet???) adding any of this kind of optimization, so maybe we don't need that level of detail?

Whether or not this is an issue probably depends on:

  1. Do we see it being useful to have more info (e.g. the exact workspace binding)?
  2. Do we know of any other features we want to add around this (I don't think so?)

I'm trying to come up with use-cases that could lean on a precise mapping from one pipeline task's workspace to another's but I'm struggling to think of any. The edge in the DAG between the two tasks remains the same regardless of which of the two tasks' WorkspacePipelineTaskBindings are connected by their intent.
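A sketch of the case in question, assuming a hypothetical second workspace named cache on task1 that is bound to the same pipeline workspace. from is the proposed field, not an existing one.

```yaml
tasks:
  - name: task1
    taskRef:
      name: write-to-workspace
    workspaces:
      - name: output
        workspace: pipeline-ws1
      - name: cache            # hypothetical second binding
        workspace: pipeline-ws1
  - name: task2
    taskRef:
      name: read-from-workspace
    workspaces:
      - name: src
        workspace: pipeline-ws1
        # Does not say whether "output" or "cache" is meant, but either
        # way the only edge added to the DAG is task1 -> task2.
        from: task1
```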
