Tekton should be able to limit the number of PipelineRuns running concurrently on a namespace. PipelineRuns on a namespace should be queued if they exceed the configured limit. The limit should cover all PipelineRuns on the namespace regardless of the Pipeline that they reference.
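To make the request concrete, here is a minimal sketch of the queueing behaviour described above. It is not Tekton code: `NamespaceRunQueue`, `submit`, and `finish` are hypothetical names invented for illustration, and the limit is assumed to be a single integer applied to every namespace.

```python
from collections import defaultdict, deque


class NamespaceRunQueue:
    """Hypothetical per-namespace limiter; an illustration, not a Tekton API."""

    def __init__(self, limit):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.running = defaultdict(set)    # namespace -> names of running runs
        self.pending = defaultdict(deque)  # namespace -> FIFO of queued runs

    def submit(self, namespace, run_name):
        """Start the run if a slot is free, otherwise queue it.

        Returns True if the run started, False if it was queued.
        """
        if len(self.running[namespace]) < self.limit:
            self.running[namespace].add(run_name)
            return True
        self.pending[namespace].append(run_name)
        return False

    def finish(self, namespace, run_name):
        """Release a slot and promote the oldest queued run, if any.

        Returns the name of the promoted run, or None.
        """
        self.running[namespace].discard(run_name)
        if self.pending[namespace] and len(self.running[namespace]) < self.limit:
            promoted = self.pending[namespace].popleft()
            self.running[namespace].add(promoted)
            return promoted
        return None
```

The limit is counted per namespace and ignores which Pipeline a run references, which matches the last sentence of the request. Runs queued in one namespace do not block runs in another.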
@gorkem :
If the goal is to limit the underlying resource utilization in each namespace, then isn't it better to use resource quotas instead of a limit on PipelineRuns?
Another idea would be to apply quota limits on Pods. Since the TaskRun CRD constructs a k8s Pod resource as part of reconciliation, you can also apply a limit on the number of pods (TaskRuns) in each namespace for a similar effect.
Are there any use cases which are not covered by either of those approaches? If so, could you please explain the reasoning or thought behind this request?
The main idea is to avoid starving pipelines when resource quotas are applied. I do not think Pipelines checks whether there are enough resources to run a PipelineRun through to completion, so multiple pipelines started in the same namespace can starve each other and fail.
Hey there @gorkem, you're right that Pipelines doesn't check whether there are enough available resources for a Task to execute in a namespace. We're tracking that problem in https://github.com/tektoncd/pipeline/issues/734 and I'm about 80% of the way through development. You can see my latest commit to retry pod creation in the face of ResourceQuota errors.
I'm going to close this as a duplicate of #734. Feel free to reopen if you think this issue is describing a different problem, or drop comments on the design doc or ping me on Slack (user Scott). Cheers!
One other interesting observation around this:
When a pipeline is running and a task is unable to fit on the _node_, the Pod is held in a Pending state until space is freed up on the node or the task times out. However, when a task is unable to fit due to a _resource quota_, the Pod is rejected immediately and the task fails. I find this Kubernetes behaviour slightly confusing: why does one kind of resource limit (node capacity) cause pending+retry behaviour while another (resource quota) causes immediate rejection? Anyway, working on this now.
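One way to picture closing that gap is to retry the rejected creation with backoff, so a quota rejection behaves more like a Pending pod. The sketch below is only an illustration of that idea and not the commit mentioned earlier in the thread: `QuotaExceeded` and `create_pod` are hypothetical stand-ins, and the attempt count and delays are arbitrary assumptions.

```python
import time


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a ResourceQuota rejection from the API server."""


def create_with_retry(create_pod, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call create_pod(), retrying with exponential backoff on QuotaExceeded.

    create_pod is any zero-argument callable (hypothetical here). The final
    failure is re-raised so the caller can mark the task as failed.
    """
    for attempt in range(attempts):
        try:
            return create_pod()
        except QuotaExceeded:
            if attempt == attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the function easy to exercise without real waiting. A real controller would more likely requeue the reconcile than block in a loop, so treat this purely as a picture of the behaviour.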
I have another use case for this, which is to do with cross-talk between concurrent runs. In an integration test scenario, for example, the tasks depend on an external resource. If that resource is stateful (like a database), some tasks are rebuilding the database while others might be executing tests which use the database. I'd love to be able to single-thread pipeline runs through the integration test phase.
I guess https://github.com/tektoncd/pipeline/issues/2828 is a newer version of this request.