A "volume resource" like abstraction is valuable as it lets a Task specify the requirement for a work folder as well as a description of its purpose without declaring the specifics of the volume backing it.
In the resources meeting -- https://docs.google.com/document/d/1p6HJt-QMvqegykkob9qlb7aWu-FMX7ogy7UPh9ICzaE/edit#heading=h.vf9n4tewyadq -- we discussed different approaches for supporting the concept and decided that introducing a volumes (or similarly named) field in Task.spec was the best balance: it can be present in the beta without forcing PipelineResources to also be present.
It turns out that volumes is a poor name: it conflicts with the same field in a Pod spec, and it does not capture the fact that the concept is actually closer to a volumeMount. The concept is a named folder that is (at least by default) mounted as a sub-directory of /workspace, so for now I'll use workspaces as the field name instead of volumes.
Each workspaces item minimally has just a name and an optional description. For example:
workspaces:
  - name: first
    description: The first folder this task needs
  - name: second
    description: The second folder this task needs
In every container of the resulting Pod, /workspace/first and /workspace/second would be volumeMounts assigned by the TaskRun, each pointing at a particular volume and an optional subPath within it. If a TaskRun does not provide a valid volumeMount that maps to /workspace/{workspaces.name}, it should fail validation.
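To make that concrete, here is a hypothetical sketch of what a TaskRun could end up injecting into the resulting Pod (the volume and claim names are made up for illustration):

volumeMounts:                     # added to every container
  - name: run-volume              # volume supplied by the TaskRun
    mountPath: /workspace/first
    subPath: first                # optional subPath within the volume
  - name: run-volume
    mountPath: /workspace/second
    subPath: second
volumes:                          # at the Pod level
  - name: run-volume
    persistentVolumeClaim:
      claimName: shared-pvc       # made-up, pre-provisioned claim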
Other considerations include: reserved names, optionality, alternate mountPaths, and follow-on syntax in Pipeline, PipelineRun, and TaskRun.
+1 for an overridable mountPath.
Thanks for this.
If I understand correctly, this would provide a logical name that tasks can use for a path on a pre-provisioned PVC. I think it would be helpful to see an example Task and how it would look in a complete example.
Unfortunately I missed the meeting, and I would like to contribute my 2¢.
As far as I understand, the main issue with the current solution for sharing data between tasks is that PVCs can be expensive to provision (both in the time provisioning takes and in quota limits). The benefit of the proposed volume field would be that the PVC can be pre-provisioned, which takes care of provisioning time, and that the same PVC could be partitioned and used by multiple Tasks / Pipelines, which helps with quota issues.
I think the current solution we have for the artifacts PVC is quite nice, as it's mostly transparent to end users, but it has three main limitations.
All this is to say that I believe we could build on top of the existing artifact PVC solution (where data lives under a [pipelinerun-uuid]/tekton path) to achieve something very close to what is proposed here, which would have the extra benefit of being backward compatible, i.e. it would keep the auto-provisioning feature for users that want it.
@afrittoli -- yes, I agree that as much as we can we want to build on top of the existing artifact PVC concept. One of the main goals of this proposal is to add just enough syntax to make use of the artifact PVC (or something like it) clear.
So I've taken the original proposal a little further and hopefully closer to implementable. So... to start things off I want to provide a bit more detail on the workspace types this proposal adds.
WorkspaceMount
  name [string]
  description [string] (optional)
  mountPath [string] (defaults to /workspace/{name})
  optional [boolean] (defaults to false)
WorkspaceMountBinding
  name [string]
  volumeName [string]
  volumeSubPath [string] (optional)
WorkspaceDevice
  name [string]
  description [string] (optional)
  devicePath [string] (defaults to /workspace/{name})
  optional [boolean] (defaults to false)
WorkspaceDeviceBinding
  name [string]
  volumeName [string]
WorkspaceVolume
  name [string]
  description [string] (optional)
  configMap [ConfigMapVolumeSource]
  persistentVolumeClaim [PersistentVolumeClaimVolumeSource]
  persistentVolumeClaimTemplate [PersistentVolumeClaim]
  secret [SecretVolumeSource]
  # [exclusive] configMap, persistentVolumeClaim, persistentVolumeClaimTemplate, secret
  # [default] persistentVolumeClaimTemplate populated from the current Tekton artifact PVC
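For illustration, a WorkspaceVolume entry backed by a Secret might look like the following hypothetical sketch (the secret name is made up; only one of the exclusive sources may be set):

workspaceVolumes:
  - name: credentials
    description: deploy credentials for the release step
    secret:
      secretName: deploy-credentials   # made-up Secret name; exclusive with the other sources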
-- and now modifications to existing types...
Task
  workspaceMounts [array of WorkspaceMount]
  workspaceDevices [array of WorkspaceDevice]
TaskRun
  workspaceMounts [array of WorkspaceMountBinding]
  workspaceDevices [array of WorkspaceDeviceBinding]
  workspaceVolumes [array of WorkspaceVolume]
Pipeline
  tasks[*].workspaceMounts [array of WorkspaceMountBinding]
  tasks[*].workspaceDevices [array of WorkspaceDeviceBinding]
  workspaceVolumes [array of WorkspaceVolume]
PipelineRun
  workspaceVolumes [array of WorkspaceVolume]
Here's a quick example consisting of a task that writes a message, a task that reads a message, and a pipeline that ties the two together. (Note: I have also flattened Task params.) This does not use all the bells and whistles that the above types offer, but hopefully it gets the point across.
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: write-task
spec:
  params:
    - name: message
      description: the message
  steps:
    - name: message-mount-write
      image: alpine
      env:
        - name: message
          value: $(params.message)
      command: ["/bin/sh", "-c"]
      args:
        - echo $message > /workspace/messages/message;
  workspaceMounts:
    - name: messages
      description: the folder where we write the message to
---
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: read-task
spec:
  steps:
    - name: message-mount-read
      image: alpine
      command: ["/bin/sh", "-c"]
      args:
        - cat /workspace/messages/message;
  workspaceMounts:
    - name: messages
      description: the folder where we read the message from
---
apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: pipeline
spec:
  params:
    - name: message
      description: the message
  tasks:
    - name: pipeline-write-task
      taskRef:
        name: write-task
      params:
        - name: message
          value: $(params.message)
      workspaceMounts:
        - name: messages
          volumeName: pipeline-volume
    - name: pipeline-read-task
      runAfter: [pipeline-write-task]
      taskRef:
        name: read-task
      workspaceMounts:
        - name: messages
          volumeName: pipeline-volume
          readOnly: true
  workspaceVolumes:
    - name: pipeline-volume
      description: The volume where we will write and then read the message
      # Note: no volume type is provided so we will create a PVC using Tekton defaults
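For completeness, a TaskRun could exercise the same fields directly; here is a hypothetical sketch with an explicitly provided claim (the PVC name is made up):

apiVersion: tekton.dev/v1alpha1
kind: TaskRun
metadata:
  name: write-task-run
spec:
  taskRef:
    name: write-task
  params:
    - name: message
      value: hello
  workspaceMounts:
    - name: messages
      volumeName: run-volume
  workspaceVolumes:
    - name: run-volume
      persistentVolumeClaim:
        claimName: my-preprovisioned-pvc   # made-up, pre-provisioned claim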
Thanks for the extra specification work!
This looks like a lot to support before beta - perhaps we could have an incremental plan where we aim for a subset of this at first?
Is this meant to fully replace the VolumePipelineResource?
We talked about this briefly in the Beta meeting but wanted to leave a note here as well - it looks like the shapes of WorkspaceMount and WorkspaceDevice are very similar (only the *Path param names differ). I wonder if we can de-dupe those somehow - it would be cool if a Task didn't have to care about whether a mount is a Volume or a Device.
@afrittoli yes, this is meant to fully replace VolumePipelineResource. There was concern that saying a particular PipelineResource was beta but only for type: volume, or that the type field was alpha with VolumeResource as the default type, would be confusing. The hope, though, is that at least part of the VolumePipelineResource PR will still be useful for the eventual implementation.
@sbwsg I had the same thought when spec'ing this out, but part of my reasoning is that they wrap different concepts at the Pod level. In particular, VolumeDevice is still relatively new and might gain new fields that we would want to expose.
The other idea I wondered about was deferring VolumeDevice support altogether; however, the work to prevent volumeMode: Block PVs didn't seem worth it relative to just offering support (easy to say at the spec level, I know). I'm also sensitive to MLOps use cases, where one of the most common operations is ETL and block device access would be useful.
Thanks for the detailed proposal! A couple of notes:
Thanks @dlorenc
So... what if we pared things down to really just mapping the artifact PVC into the workspace, and deferred everything else to a VolumePipelineResource for when PipelineResources are ready...
Workspace
  name [string]
  description [string] (optional)
  mountPath [string] (defaults to /workspace/{name})
WorkspaceBinding
  name [string]
  subPath [string] (optional)
-- and now modifications to existing types...
Task
  workspaces [array of Workspace]
TaskRun
  workspaces [array of WorkspaceBinding]
Pipeline
  tasks[*].workspaces [array of WorkspaceBinding]
Our example becomes...
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: write-task
spec:
  params:
    - name: message
      description: the message
  steps:
    - name: message-mount-write
      image: alpine
      env:
        - name: message
          value: $(params.message)
      command: ["/bin/sh", "-c"]
      args:
        - echo $message > /workspace/messages/message;
  workspaces:
    - name: messages
      description: the folder where we write the message to
---
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: read-task
spec:
  steps:
    - name: message-mount-read
      image: alpine
      command: ["/bin/sh", "-c"]
      args:
        - cat /workspace/messages/message;
  workspaces:
    - name: messages
      description: the folder where we read the message from
---
apiVersion: tekton.dev/v1alpha1
kind: Pipeline
metadata:
  name: pipeline
spec:
  params:
    - name: message
      description: the message
  tasks:
    - name: pipeline-write-task
      taskRef:
        name: write-task
      params:
        - name: message
          value: $(params.message)
      workspaces:
        - name: messages
    - name: pipeline-read-task
      runAfter: [pipeline-write-task]
      taskRef:
        name: read-task
      workspaces:
        - name: messages
          readOnly: true
To prevent unnecessary artifact PVC creation, if no Pipeline tasks[*].workspaces are specified then we don't create it. If a WorkspaceBinding is not provided, ephemeral storage is allocated (or we reuse the /workspace EmptyDir).
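As a hypothetical runtime sketch of that behaviour (the subPath value is made up), a TaskRun binding into the artifact PVC might look like:

apiVersion: tekton.dev/v1alpha1
kind: TaskRun
metadata:
  name: read-task-run
spec:
  taskRef:
    name: read-task
  workspaces:
    - name: messages
      subPath: run-1234/messages   # optional; omitting the binding entirely falls back to ephemeral storage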
I think this is not bad and probably covers enough of the most common use-cases to be sufficient for beta. I currently use a mixture of params and PodSpec.volumes to handle configMap and secret sharing and can wait for a VolumePipelineResource without forcing the issue. WDYT?
So, to be clear that I understand the proposal:
This sgtm. A few questions:
I thought I was following this pretty well until https://github.com/tektoncd/pipeline/issues/1438#issuecomment-545913358 but I think that might be b/c I was viewing this proposal maybe differently than you @skaegi @dlorenc
In my mind we don't actually _need_ the VolumeResource b/c today you can get all the functionality it provided (minus automatic PVC creation) by using:
- the volumeMount field in a step and the volume field in a Task (e.g. like this https://github.com/tektoncd/pipeline/blob/master/examples/taskruns/secret-volume.yaml)
- the podTemplate in a PipelineRun
I'm a bit confused by why in https://github.com/tektoncd/pipeline/issues/1438#issuecomment-545913358 @skaegi you want to be "mapping the artifact pvc into the workspace" - is it important that it's the artifact PVC (which I take to mean the PVC Tekton automatically creates when you use from (or cough all the time #1007 :sob: )), or is it that you want to be able to feed a PVC into a Run and wire it through your Tasks?
I would prefer a solution where users can provide their own PVC or whatever volume info they want vs. trying to surface and make available what is imo an implementation detail of output -> input linking (and folks might be using something other than PVCs; we currently allow GCS upload/download instead).
Anyway, we can talk more in the working group, but long story short I like the way https://github.com/tektoncd/pipeline/issues/1438#issuecomment-545913358 looks (and how much simpler it is than https://github.com/tektoncd/pipeline/issues/1438#issuecomment-544313339), but I'm not understanding how WorkspaceBinding can be quite so simple. I thought it would need to allow the Run to specify both Volumes and VolumeMounts? (So my suggestion would be to add Volumes and VolumeMounts to WorkspaceBinding :D )
Much clearer after your description in the working group @skaegi ! Haha I should have just waited before writing all those words, I had a feeling XD
After my exploration in #1508, @skaegi and I discussed further. This is the latest iteration of what we came up with (similar to https://github.com/tektoncd/pipeline/issues/1438#issuecomment-544313339 but without devices):
Meta-type descriptions...
-----------------------------
Workspace
  name [string]
  description [string] (optional)
  mountPath [string] (defaults to /workspace/{name})
WorkspaceBinding
  name [string]
  readOnly [boolean] (defaults to false)
  volumeName [string]
  volumeSubPath [string] (optional)
WorkspaceVolume
  name [string]
  description [string] (optional)
  persistentVolumeClaim [PersistentVolumeClaimVolumeSource]
-- and now modifications to existing types...
Task
  workspaces [array of Workspace]
TaskRun
  workspaces [array of WorkspaceBinding]
  volumes (or workspaceVolumes, TBD) [array of WorkspaceVolume]
Pipeline
  tasks[*].workspaces [array of WorkspaceBinding]
  volumes (or workspaceVolumes, TBD) [array of WorkspaceVolume] # simon: i think this would be something else, like just a list of volume names; we wouldn't know what actual volumes to provide until runtime
PipelineRun
  volumes (or workspaceVolumes, TBD) [array of WorkspaceVolume]
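To make the runtime side concrete, a hypothetical PipelineRun under this scheme (using the workspaceVolumes spelling pending the naming decision above; the claim name is made up) could look like:

apiVersion: tekton.dev/v1alpha1
kind: PipelineRun
metadata:
  name: pipeline-run
spec:
  pipelineRef:
    name: pipeline
  workspaceVolumes:
    - name: pipeline-volume              # matches the volumeName used by the Pipeline's bindings
      persistentVolumeClaim:
        claimName: my-preprovisioned-pvc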
Main differences (that I remember) from #1508:
- Pipeline has volume as well as workspace (in #1508 everything is a workspace)
@skaegi I was trying to explore your use case for providing subPath in a Pipeline, and I'm not exactly sure how you're doing it, but I came up with a contrived scenario: a Pipeline where 2 Tasks write files to subdirectories on a PVC, and a 3rd Task reads those files from the root of the PVC.
I came up with a few (buggy, typo-ridden) examples:
- subDirs are only provided at runtime (in the PipelineRun, not in the Pipeline)
- FileSet PipelineResource (#1285 - this was also in @sbwsg's design for #1076 originally)
Is it possible that (2) or (3) could meet your needs @skaegi? This would allow us to avoid specifying subPath in Pipeline, which in my mind is the most complicated part of the latest design we proposed.
I think it would look something like this (basically volumes - and their subpaths - are only specified in TaskRun or PipelineRun):
Meta-type descriptions...
-----------------------------
Workspace
  name [string]
  description [string] (optional)
  mountPath [string] (defaults to /workspace/{name})
WorkspaceBinding
  name [string]
  volumeName [string]
WorkspaceVolume
  name [string]
  description [string] (optional)
  persistentVolumeClaim [PersistentVolumeClaimVolumeSource]
  volumeSubPath [string] (optional)
PipelineDeclaredWorkspace
  name [string]
-- and now modifications to existing types...
Task
  workspaces [array of Workspace]
TaskRun
  workspaces [array of WorkspaceBinding]
  volumes (or workspaceVolumes, TBD) [array of WorkspaceVolume]
Pipeline
  tasks[*].workspaces [array of WorkspaceBinding]
  workspaces [array of PipelineDeclaredWorkspace]
PipelineRun
  workspaces [array of WorkspaceBinding]
  volumes (or workspaceVolumes, TBD) [array of WorkspaceVolume]
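Under this variant the subPath lives on the run-level volume rather than on the binding; a hypothetical PipelineRun might read (all names made up):

apiVersion: tekton.dev/v1alpha1
kind: PipelineRun
metadata:
  name: pipeline-run
spec:
  pipelineRef:
    name: pipeline
  workspaces:
    - name: messages
      volumeName: run-volume
  volumes:
    - name: run-volume
      persistentVolumeClaim:
        claimName: shared-pvc
      volumeSubPath: runs/pipeline-run-1   # the subPath is specified here, at runtime only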
The main reason I want to push back is that I think (4) is the cleanest example, and once we have the FileSet PipelineResource I'm betting you'll want to use it instead of messing with PVCs and paths directly! :crossed_fingers:
Thanks @bobcatfish -- I agree FileSets are cool, and I think they abstract away most of what subPath was doing. Let me have a go at making an example using some fancy $(tasks.{name}.workspaces.{name}) syntax, but I think this is starting to look good. One thing I'd like to try is using absolute /workspace paths in the Task instead of interpolation; I'll include that in my examples.
So I played with your examples a bit and quite liked (1), although having now seen FileSets I agree that subPath management might be optimized away. For (2) and (3), I found that having the volume subPath on the volume did not feel right. A "subPath" is a property of the volumeMount, and one of the things I really liked about our earlier design was how cleanly a Workspace and WorkspaceBinding combined to produce exactly the fields needed to create a VolumeMount in the resulting pod. For (4), I liked how a FileSet hides a number of details that in most cases are not important. In particular, it seems to me that only the resource "consumers" care about path details; the "producers" just want an arbitrary work folder.
So with that in mind, I wonder if "subPath" could be an optional field that, when not provided, uses a generated value in its workspace binding. e.g. producers wouldn't typically provide a subPath, but consumers who need the subPath can get it via interpolation: $(tasks.{producerName}.workspaces.{name}.subPath)
This example is similar to (1) but assumes a generated subPath and uses interpolation to extract the value in the Pipeline. Since the interpolation implies ordering, I removed the runAfter, as I believe this can and should be computed internally. In the Tasks I use absolute /workspaces/{name} paths instead of interpolation, and I have two mail-box workspaces similar to (4). With that said, I think there are cases where having a workspace that holds a number of output workspace folders from previous tasks is sensible.
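Here is a hypothetical fragment of what that Pipeline wiring could look like (the $(tasks.{name}.workspaces.{name}.subPath) interpolation is proposed syntax, not something that exists today; names are made up):

tasks:
  - name: write-task
    taskRef:
      name: write-task
    workspaces:
      - name: outbox
        volumeName: pipeline-volume
        # no subPath given, so a value is generated for this binding
  - name: read-task
    taskRef:
      name: read-task
    workspaces:
      - name: inbox
        volumeName: pipeline-volume
        subPath: $(tasks.write-task.workspaces.outbox.subPath)
        # the interpolation implies write-task runs first, so no runAfter is needed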
A "subPath" is a property of the volumeMount and one of the things that I really liked about our earlier design was how cleanly a Workspace and WorkspaceBinding combined to produce exactly the fields needed to create a VolumeMount in the resulting pod
That's an interesting way of looking at it - I think I've been looking at the fields we're adding less from the perspective of mapping each one perfectly to k8s concepts, and more from the perspective of who is interacting with each type at which time and what they should need to know - specifically Task and Pipeline authors vs. the people actually running the Pipelines, who provide runtime information.
In my mind the path to use on a volume is runtime information - I don't really see why anyone writing a Task or a Pipeline that uses these volumes cares about _where_ on the volume the data is, they just want the data.
The exceptional case seems to be when a Pipeline wants to get data from another Task - and the syntax you are introducing in your example seems to be recreating the from concept that PipelineResources (and soon, hopefully, output params!) use. And this has me thinking more and more that PipelineResources are really the concept that would do what you want here :S
In the Tasks I use absolute /workspaces/{name} paths instead of interpolation
Quick question: is there a specific reason why you prefer using an absolute path, or is it just for verbosity in the example? imo it's much more robust to use the interpolated value
For (4) I liked how a FileSet hides a number of the details that in most cases are not important. In particular it seems to me that it was only the resource "consumers" who care about path details and the "producers" just wanted an arbitrary work folder.
I continue to be strongly suspicious that if you had the FileSet PipelineResource we wouldn't need to add all of the functionality you are describing. @sbwsg had a great suggestion today: what if we held off on completely implementing this issue, and assuming we're able to get a working prototype (at least) of FileSet available for you to try out within a couple of weeks after KubeCon, we could let you try that out and see if it meets your needs?
Even if FileSet did work for you, I think we need to improve the way we handle volumes and volume mounts, but maybe not with all the features we've explored here. So @skaegi, what do you think of this plan:
- FileSet POC (or better!) available for you to try out ASAP
Sure, go for it. My concerns are that it means we must deliver on PipelineResources right away, and that we probably want similar resource types for ConfigMaps, Secrets, and (eventually) block devices. I can see us delivering on a static set of built-in PipelineResources, but that would mean we need to figure out how to factor out extensibility, at least short-term.
Okay sounds good, let's give it a try :D
but that would mean we need to figure out how to factor out extensibility at least short-term.
The most recent design includes a pretty sweet model for extensibility that @sbwsg came up with where you can define your own PipelineResource types :D
ok. I'll start digging into that asap and give feedback. Sorry @sbwsg but I suspect it might be worthwhile to reconvene the resource-wg next week some time...
I suggest that we keep conversation of the new resource proposal to the main WG. If we start taking up inordinate amounts of time discussing it then a separate WG makes more sense to me but I really want to keep the new proposal visible in the wider community if we can. Feel free to ping me with questions on slack though - happy to talk through the design or implementation details I'm working through now, or muse about where we could take it next.
I also think once we have some POC's to try out (for this proposal and for FileSet) it'll help with our discussions! :D
For those following along, I think the two remaining pieces here are:
Anything else outstanding in the context of this issue? I figure we can follow up with brand new features around workspaces in new GH issues.