A "volume resource" like abstraction is valuable as it lets a Task specify the requirement for a work folder as well as a description of its purpose without declaring the specifics of the volume backing it.
In the resources meeting -- https://docs.google.com/document/d/1p6HJt-QMvqegykkob9qlb7aWu-FMX7ogy7UPh9ICzaE/edit#heading=h.vf9n4tewyadq -- we discussed different approaches for supporting the concept and decided that introducing a volumes (or similarly named) field in Task.spec was the best balance: it can be present in the beta without forcing PipelineResources to also be present.
It turns out that volumes is a poor name: it conflicts with the same field in a Pod spec, and it does not capture the fact that the concept is actually closer to a volumeMount. The concept is a named folder that is (at least by default) mounted as a sub-directory of /workspace, so for now I'll use workspaces as the field name instead of volumes.
Each workspaces item minimally has just a name and an optional description. For example:
workspaces:
  - name: first
    description: The first folder this task needs
  - name: second
    description: The second folder this task needs
In every container of the resulting Pod, /workspace/first and /workspace/second would be volumeMounts assigned by the TaskRun, each pointing at a particular volume and an optional subPath within it. If a TaskRun does not provide a valid volumeMount that maps to /workspace/{workspaces.name}, it should fail validation.
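To make that concrete, here is a hypothetical sketch of what a TaskRun could end up injecting into the resulting Pod (the volume and claim names are made up for illustration):

volumeMounts:                     # added to every container
  - name: run-volume              # volume supplied by the TaskRun
    mountPath: /workspace/first
    subPath: first                # optional subPath within the volume
  - name: run-volume
    mountPath: /workspace/second
    subPath: second
volumes:                          # at the Pod level
  - name: run-volume
    persistentVolumeClaim:
      claimName: shared-pvc       # made-up, pre-provisioned claim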
Other considerations include: reserved names, optionality, alternate mountPaths, and follow-on syntax in Pipeline, PipelineRun, and TaskRun.
+1 for an overridable mountPath.
Thanks for this.
If I understand correctly, this would provide a logical name that tasks can use for a path on a pre-provisioned PVC. I think it would be helpful to see an example Task and how it would look in a complete example.
Unfortunately I missed the meeting, and I would like to contribute my 2¢.
As far as I understand, the main issue with the current solution for sharing data between tasks is that PVCs can be expensive to provision (both in the time provisioning takes and in quota limits). The benefit of the proposed volume field would be that the PVC can be pre-provisioned, which takes care of provisioning time, and that the same PVC could be partitioned and used by multiple Tasks / Pipelines, which helps with quota issues.
I think the current solution we have for the artifacts PVC is quite nice, as it's mostly transparent to end users, but it has three main limitations.
All this is to say that I believe we could build on top of the existing artifact PVC solution (where data lives under a [pipelinerun-uuid]/tekton path) to achieve something very close to what is proposed here, which would have the extra benefit of being backward compatible, i.e. it would keep the auto-provisioning feature for users that want it.
@afrittoli -- yes, I agree that as much as we can we want to build on top of the existing artifact PVC concept. One of the main goals of this proposal is to add just enough syntax to make use of the artifact PVC (or something like it) clear.
So I've taken the original proposal a little further and hopefully closer to implementable. So... to start things off I want to provide a bit more detail on the workspace types this proposal adds.
WorkspaceMount
  name [string]
  description [string] (optional)
  mountPath [string] (defaults to /workspace/{name})
  optional [boolean] (defaults to false)
WorkspaceMountBinding
  name [string]
  volumeName [string]
  volumeSubPath [string] (optional)
WorkspaceDevice
  name [string]
  description [string] (optional)
  devicePath [string] (defaults to /workspace/{name})
  optional [boolean] (defaults to false)
WorkspaceDeviceBinding
  name [string]
  volumeName [string]
WorkspaceVolume
  name [string]
  description [string] (optional)
  configMap [ConfigMapVolumeSource]
  persistentVolumeClaim [PersistentVolumeClaimVolumeSource]
  persistentVolumeClaimTemplate [PersistentVolumeClaim]
  secret [SecretVolumeSource]
  # [exclusive] configMap, persistentVolumeClaim, persistentVolumeClaimTemplate, secret
  # [default] persistentVolumeClaimTemplate populated from the current Tekton artifact PVC
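For illustration, a WorkspaceVolume entry backed by a Secret might look like the following hypothetical sketch (the secret name is made up; only one of the exclusive sources may be set):

workspaceVolumes:
  - name: credentials
    description: deploy credentials for the release step
    secret:
      secretName: deploy-credentials   # made-up Secret name; exclusive with the other sources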
-- and now modifications to existing types...
Task
  workspaceMounts [array of WorkspaceMount]
  workspaceDevices [array of WorkspaceDevice]
TaskRun
  workspaceMounts [array of WorkspaceMountBinding]
  workspaceDevices [array of WorkspaceDeviceBinding]
  workspaceVolumes [array of WorkspaceVolume]
Pipeline
  tasks[*].workspaceMounts [array of WorkspaceMountBinding]
  tasks[*].workspaceDevices [array of WorkspaceDeviceBinding]
  workspaceVolumes [array of WorkspaceVolume]
PipelineRun
  workspaceVolumes [array of WorkspaceVolume]
Here's a quick example consisting of a task that writes a message, a task that reads a message, and a pipeline that ties the two together. (Note: I have also flattened Task params.) This does not use all the bells and whistles that the above types offer, but hopefully it gets the point across.
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: write-task
spec:
  params:
    - name: message
      description: the message
  steps:
    - name: message-mount-write
      image: alpine
      env:
        - name: message
          value: $(params.message)
      command: ["/bin/sh", "-c"]
      args:
        - echo $message > /workspace/messages/message;
  workspaceMounts:
    - name: messages
      description: the folder where we write the message to
---
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: read-task
spec:
  steps:
    - name: message-mount-read
      image: alpine
      command: ["/bin/sh", "-c"]
      args:
        - cat /workspace/messages/message;
  workspaceMounts:
    - name: messages
      description: the folder where we read the message from
---
apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: pipeline
spec:
  params:
    - name: message
      description: the message
  tasks:
    - name: pipeline-write-task
      taskRef:
        name: write-task
      params:
        - name: message
          value: $(params.message)
      workspaceMounts:
        - name: messages
          volumeName: pipeline-volume
    - name: pipeline-read-task
      runAfter: [pipeline-write-task]
      taskRef:
        name: read-task
      workspaceMounts:
        - name: messages
          volumeName: pipeline-volume
          readOnly: true
  workspaceVolumes:
    - name: pipeline-volume
      description: The volume where we will write and then read the message
      # Note: no volume type is provided so we will create a PVC using Tekton defaults
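For completeness, a TaskRun could exercise the same fields directly; here is a hypothetical sketch with an explicitly provided claim (the PVC name is made up):

apiVersion: tekton.dev/v1alpha1
kind: TaskRun
metadata:
  name: write-task-run
spec:
  taskRef:
    name: write-task
  params:
    - name: message
      value: hello
  workspaceMounts:
    - name: messages
      volumeName: run-volume
  workspaceVolumes:
    - name: run-volume
      persistentVolumeClaim:
        claimName: my-preprovisioned-pvc   # made-up, pre-provisioned claim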
Thanks for the extra specification work!
This looks like a lot to support before beta - perhaps we could have an incremental plan where we aim for a subset of this at first?
Is this meant to fully replace the VolumePipelineResource?
We talked about this briefly in the Beta meeting but wanted to leave a note here as well - it looks like the shapes of WorkspaceMount and WorkspaceDevice are very similar (only the *Path param names differ). I wonder if we can de-dupe those somehow - it would be cool if a Task didn't have to care about whether a mount is a Volume or a Device.
@afrittoli yes, this is meant to fully replace VolumePipelineResource. There was concern that saying a particular PipelineResource was beta but only for type: volume, or that the type field was alpha with VolumeResource as the default type, would be confusing. The hope, though, is that at least part of the VolumePipelineResource PR will still be useful for the eventual implementation.
@sbwsg I had the same thought when spec'ing this out, but part of my reasoning is that they wrap different concepts at the Pod level. In particular, VolumeDevice is still relatively new and might gain new fields that we would want to expose.
The other idea I wondered about was deferring VolumeDevice support altogether; however, the work to prevent volumeMode: Block PVs didn't seem worth it relative to just offering support (easy to say at the spec level, I know). I'm also sensitive to MLOps use cases, where one of the most common operations is ETL and block device access would be useful.
Thanks for the detailed proposal! A couple of notes:
Thanks @dlorenc
So... what if we pared things down to really just mapping the artifact PVC into the workspace, and deferred everything else to a VolumePipelineResource for when PipelineResources are ready...
Workspace
  name [string]
  description [string] (optional)
  mountPath [string] (defaults to /workspace/{name})
WorkspaceBinding
  name [string]
  subPath [string] (optional)
-- and now modifications to existing types...
Task
  workspaces [array of Workspace]
TaskRun
  workspaces [array of WorkspaceBinding]
Pipeline
  tasks[*].workspaces [array of WorkspaceBinding]
Our example becomes...
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: write-task
spec:
  params:
    - name: message
      description: the message
  steps:
    - name: message-mount-write
      image: alpine
      env:
        - name: message
          value: $(params.message)
      command: ["/bin/sh", "-c"]
      args:
        - echo $message > /workspace/messages/message;
  workspaces:
    - name: messages
      description: the folder where we write the message to
---
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: read-task
spec:
  steps:
    - name: message-mount-read
      image: alpine
      command: ["/bin/sh", "-c"]
      args:
        - cat /workspace/messages/message;
  workspaces:
    - name: messages
      description: the folder where we read the message from
---
apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: pipeline
spec:
  params:
    - name: message
      description: the message
  tasks:
    - name: pipeline-write-task
      taskRef:
        name: write-task
      params:
        - name: message
          value: $(params.message)
      workspaces:
        - name: messages
    - name: pipeline-read-task
      runAfter: [pipeline-write-task]
      taskRef:
        name: read-task
      workspaces:
        - name: messages
          readOnly: true
To prevent unnecessary artifact PVC creation, if no Pipeline tasks[*].workspaces are specified then we don't create it. If a WorkspaceBinding is not provided, ephemeral storage is allocated (or we reuse the /workspace EmptyDir).
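As a hypothetical runtime sketch of that behaviour (the subPath value is made up), a TaskRun binding into the artifact PVC might look like:

apiVersion: tekton.dev/v1alpha1
kind: TaskRun
metadata:
  name: read-task-run
spec:
  taskRef:
    name: read-task
  workspaces:
    - name: messages
      subPath: run-1234/messages   # optional; omitting the binding entirely falls back to ephemeral storage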
I think this is not bad and probably covers enough of the most common use-cases to be sufficient for beta. I currently use a mixture of params and PodSpec.volumes to handle configMap and secret sharing and can wait for a VolumePipelineResource without forcing the issue. WDYT?
So, to be clear that I understand the proposal:
This sgtm. A few questions:
I thought I was following this pretty well until https://github.com/tektoncd/pipeline/issues/1438#issuecomment-545913358 but I think that might be b/c I was viewing this proposal maybe differently than you @skaegi @dlorenc
In my mind we don't actually _need_ the VolumeResource b/c today you can get all the functionality it provided (minus automatic PVC creation) by using:
- the volumeMount field in a step and the volume field in a Task (e.g. like this https://github.com/tektoncd/pipeline/blob/master/examples/taskruns/secret-volume.yaml)
- the podTemplate in a PipelineRun
I'm a bit confused by why in https://github.com/tektoncd/pipeline/issues/1438#issuecomment-545913358 @skaegi you want to be "mapping the artifact pvc into the workspace" - is it important that it's the artifact PVC (which I take to mean the PVC Tekton automatically creates when you use from (or cough all the time #1007 :sob: )), or is it that you want to be able to feed a PVC into a Run and wire it through your Tasks?
I would prefer a solution where users can provide their own PVC or whatever volume info they want vs. trying to surface and make available what is imo an implementation detail of output -> input linking (and folks might be using something other than PVCs; we currently allow GCS upload/download instead).
Anyway, we can talk more in the working group, but long story short I like the way https://github.com/tektoncd/pipeline/issues/1438#issuecomment-545913358 looks (and how much simpler it is than https://github.com/tektoncd/pipeline/issues/1438#issuecomment-544313339), but I'm not understanding how WorkspaceBinding can be quite so simple. I thought it would need to allow the Run to specify both Volumes and VolumeMounts? (So my suggestion would be to add Volumes and VolumeMounts to WorkspaceBinding :D )
Much clearer after your description in the working group @skaegi ! Haha I should have just waited before writing all those words, I had a feeling XD
After my exploration in #1508, @skaegi and I discussed further. This is the latest iteration of what we came up with (similar to https://github.com/tektoncd/pipeline/issues/1438#issuecomment-544313339 but without devices):
Meta-type descriptions...
-----------------------------
Workspace
  name [string]
  description [string] (optional)
  mountPath [string] (defaults to /workspace/{name})
WorkspaceBinding
  name [string]
  readOnly [boolean] (defaults to false)
  volumeName [string]
  volumeSubPath [string] (optional)
WorkspaceVolume
  name [string]
  description [string] (optional)
  persistentVolumeClaim [PersistentVolumeClaimVolumeSource]
-- and now modifications to existing types...
Task
  workspaces [array of Workspace]
TaskRun
  workspaces [array of WorkspaceBinding]
  volumes (or workspaceVolumes, TBD) [array of WorkspaceVolume]
Pipeline
  tasks[*].workspaces [array of WorkspaceBinding]
  volumes (or workspaceVolumes, TBD) [array of WorkspaceVolume] # simon: i think this would be something else, like just a list of volume names; we wouldn't know what actual volumes to provide until runtime
PipelineRun
  volumes (or workspaceVolumes, TBD) [array of WorkspaceVolume]
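To make the runtime side concrete, a hypothetical PipelineRun under this scheme (using the workspaceVolumes spelling pending the naming decision above; the claim name is made up) could look like:

apiVersion: tekton.dev/v1alpha1
kind: PipelineRun
metadata:
  name: pipeline-run
spec:
  pipelineRef:
    name: pipeline
  workspaceVolumes:
    - name: pipeline-volume              # matches the volumeName used by the Pipeline's bindings
      persistentVolumeClaim:
        claimName: my-preprovisioned-pvc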
Main differences (that I remember) from #1508:
- Pipeline has volume as well as workspace (in #1508 everything is a workspace)
@skaegi I was trying to explore your use case for providing subPath in a Pipeline, and I'm not exactly sure how you're doing it, but I came up with a contrived scenario: a Pipeline where 2 Tasks write files to subdirectories on a PVC, and a 3rd Task reads those files from the root of the PVC.
I came up with a few (buggy, typo-ridden) examples:
- subDirs are only provided at runtime (in the PipelineRun, not in the Pipeline)
- FileSet PipelineResource (#1285 - this was also in @sbwsg's design for #1076 originally)
Is it possible that (2) or (3) could meet your needs @skaegi? This would allow us to avoid specifying subPath in Pipeline, which in my mind is the most complicated part of the latest design we proposed.
I think it would look something like this (basically volumes - and their subpaths - are only specified in TaskRun or PipelineRun):
Meta-type descriptions...
-----------------------------
Workspace
  name [string]
  description [string] (optional)
  mountPath [string] (defaults to /workspace/{name})
WorkspaceBinding
  name [string]
  volumeName [string]
WorkspaceVolume
  name [string]
  description [string] (optional)
  persistentVolumeClaim [PersistentVolumeClaimVolumeSource]
  volumeSubPath [string] (optional)
PipelineDeclaredWorkspace
  name [string]
-- and now modifications to existing types...
Task
  workspaces [array of Workspace]
TaskRun
  workspaces [array of WorkspaceBinding]
  volumes (or workspaceVolumes, TBD) [array of WorkspaceVolume]
Pipeline
  tasks[*].workspaces [array of WorkspaceBinding]
  workspaces [array of PipelineDeclaredWorkspace]
PipelineRun
  workspaces [array of WorkspaceBinding]
  volumes (or workspaceVolumes, TBD) [array of WorkspaceVolume]
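Under this variant the subPath lives on the run-level volume rather than on the binding; a hypothetical PipelineRun might read (all names made up):

apiVersion: tekton.dev/v1alpha1
kind: PipelineRun
metadata:
  name: pipeline-run
spec:
  pipelineRef:
    name: pipeline
  workspaces:
    - name: messages
      volumeName: run-volume
  volumes:
    - name: run-volume
      persistentVolumeClaim:
        claimName: shared-pvc
      volumeSubPath: runs/pipeline-run-1   # the subPath is specified here, at runtime only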
The main reason I want to push back is that I think (4) is the cleanest example, and once we have the FileSet PipelineResource I'm betting you'll want to use it instead of messing with PVCs and paths directly! :crossed_fingers:
Thanks @bobcatfish -- I agree FileSets are cool, and I think they abstract away most of what subPath was doing. Let me have a go at making an example using some fancy $(tasks.{name}.workspaces.{name}) syntax, but I think this is starting to look good. One thing I'd like to try is using absolute /workspace paths in the Task instead of interpolation; I'll include that in my examples.
So I played with your examples a bit and quite liked (1), although having now seen FileSets I agree that subPath management might be optimized away. For (2) and (3), I found that having the volume subPath on the volume did not feel right. A "subPath" is a property of the volumeMount, and one of the things I really liked about our earlier design was how cleanly a Workspace and WorkspaceBinding combined to produce exactly the fields needed to create a VolumeMount in the resulting pod. For (4), I liked how a FileSet hides a number of details that in most cases are not important. In particular, it seems to me that only the resource "consumers" care about path details; the "producers" just want an arbitrary work folder.
So with that in mind, I wonder if "subPath" could be an optional field that, when not provided, uses a generated value in its workspace binding. e.g. producers wouldn't typically provide a subPath, but consumers who need the subPath can get it via interpolation: $(tasks.{producerName}.workspaces.{name}.subPath)
This example is similar to (1) but assumes a generated subPath and uses interpolation to extract the value in the Pipeline. Since the interpolation implies ordering, I removed the runAfter, as I believe this can and should be computed internally. In the Tasks I use absolute /workspaces/{name} paths instead of interpolation, and I have two mail-box workspaces similar to (4). With that said, I think there are cases where having a workspace that holds a number of output workspace folders from previous tasks is sensible.
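Here is a hypothetical fragment of what that Pipeline wiring could look like (the $(tasks.{name}.workspaces.{name}.subPath) interpolation is proposed syntax, not something that exists today; names are made up):

tasks:
  - name: write-task
    taskRef:
      name: write-task
    workspaces:
      - name: outbox
        volumeName: pipeline-volume
        # no subPath given, so a value is generated for this binding
  - name: read-task
    taskRef:
      name: read-task
    workspaces:
      - name: inbox
        volumeName: pipeline-volume
        subPath: $(tasks.write-task.workspaces.outbox.subPath)
        # the interpolation implies write-task runs first, so no runAfter is needed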
A "subPath" is a property of the volumeMount and one of the things that I really liked about our earlier design was how cleanly a Workspace and WorkspaceBinding combined to produce exactly the fields needed to create a VolumeMount in the resulting pod
That's an interesting way of looking at it - I think I've been looking at the fields we're adding less from the perspective of mapping each one perfectly to k8s concepts, and more from the perspective of who is interacting with each type at which time and what they should need to know - specifically Task and Pipeline authors vs. the people actually running the Pipelines, who provide runtime information.
In my mind the path to use on a volume is runtime information - I don't really see why anyone writing a Task or a Pipeline that uses these volumes cares about _where_ on the volume the data is, they just want the data.
The exceptional case seems to be when a Pipeline wants to get data from another Task - and the syntax you are introducing in your example seems to be recreating the from concept that PipelineResources (and soon, hopefully, output params!) use. And this has me thinking more and more that PipelineResources are really the concept that would do what you want here :S
In the Tasks I use absolute /workspaces/{name} paths instead of interpolation
Quick question: is there a specific reason why you prefer using an absolute path, or is it just for verbosity in the example? imo it's much more robust to use the interpolated value
For (4) I liked how a FileSet hides a number of the details that in most cases are not important. In particular it seems to me that it was only the resource "consumers" who care about path details and the "producers" just wanted an arbitrary work folder.
I continue to be strongly suspicious that if you had the FileSet PipelineResource we wouldn't need to add all of the functionality you are describing. @sbwsg had a great suggestion today: what if we held off on completely implementing this issue, and assuming we're able to get a working prototype (at least) of FileSet available for you to try out within a couple of weeks after KubeCon, we could let you try that out and see if it meets your needs?
Even if FileSet did work for you, I think we need to improve the way we handle volumes and volume mounts, but maybe not with all the features we've explored here. So @skaegi, what do you think of this plan:
- FileSet POC (or better!) available for you to try out ASAP
Sure, go for it. My concerns are that it means we must deliver on PipelineResources right away, and that we probably want similar resource types for ConfigMaps, Secrets, and (eventually) block devices. I can see us delivering on a static set of built-in PipelineResources, but that would mean we need to figure out how to factor out extensibility, at least short-term.
Okay sounds good, let's give it a try :D
but that would mean we need to figure out how to factor out extensibility at least short-term.
The most recent design includes a pretty sweet model for extensibility that @sbwsg came up with where you can define your own PipelineResource types :D
ok. I'll start digging into that asap and give feedback. Sorry @sbwsg but I suspect it might be worthwhile to reconvene the resource-wg next week some time...
I suggest that we keep conversation of the new resource proposal to the main WG. If we start taking up inordinate amounts of time discussing it then a separate WG makes more sense to me but I really want to keep the new proposal visible in the wider community if we can. Feel free to ping me with questions on slack though - happy to talk through the design or implementation details I'm working through now, or muse about where we could take it next.
I also think once we have some POC's to try out (for this proposal and for FileSet) it'll help with our discussions! :D
For those following along, I think the two remaining pieces here are:
Anything else outstanding in the context of this issue? I figure we can follow up with brand new features around workspaces in new GH issues.