A Task can have multiple steps with different resource requests/limits, where an earlier step (step n-1) gets higher values than the last step.
While this works with Tekton Pipelines 0.8.0 and 0.9.2, since 0.10.0 it is not possible to deploy a Task that defines an intermediate step with higher resource values than the last step.
An error message like the one below is thrown (in this case spec.containers[2].resources.limit is 512Mi but spec.containers[3].resources.limit is 128Mi):
Message
Invalid TaskSpec: Pod "resource-request-bug-mi-pod-db877" is invalid: spec.containers[2].resources.requests: Invalid value: "256Mi": must be less than or equal to memory limit
The following TaskRun is used to reproduce the problem:
```yaml
apiVersion: tekton.dev/v1alpha1
kind: TaskRun
metadata:
  name: resource-request-issue
spec:
  taskSpec:
    steps:
      - name: minimal-resources-values
        image: ubuntu
        script: |
          #!/usr/bin/env bash
          set -euxo pipefail
          echo "Hello from Bash using memory request 64Mi and limit to 128Mi!"
        resources:
          requests:
            memory: "64Mi"
            cpu: "100m"
          limits:
            memory: "128Mi"
            cpu: "200m"
      - name: maximal-resources-values
        image: ubuntu
        script: |
          #!/usr/bin/env bash
          set -euxo pipefail
          echo "Hello from Bash using memory request 256Mi and limit to 512Mi!"
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "200m"
      - name: re-minimal-resources-values
        image: ubuntu
        script: |
          #!/usr/bin/env bash
          set -euxo pipefail
          echo "Hello from Bash using memory request 64Mi and limit to 128Mi!"
        resources:
          requests:
            memory: "64Mi"
            cpu: "100m"
          limits:
            memory: "128Mi"
            cpu: "200m"
```
$ kubectl apply -f https://github.com/tektoncd/pipeline/releases/download/v0.10.0/release.yaml
$ kubectl apply -f https://gist.githubusercontent.com/mgreau/2e34b5e535134ee97e3a0aa3b1e3b248/raw/2a43ad6d023aa838a8e869b73f049cec0ef413e1/tekton-resource-request-issue.yaml
taskrun.tekton.dev/resource-request-issue created
$ tkn taskrun describe resource-request-issue
Name: resource-request-issue
Namespace: default
Status
STARTED DURATION STATUS
35 seconds ago --- Failed(CouldntGetTask)
Message
Invalid TaskSpec: Pod "resource-request-issue-pod-lmh64" is invalid: spec.containers[2].resources.requests: Invalid value: "256Mi": must be less than or equal to memory limit
Input Resources
No resources
Output Resources
No resources
Params
No params
Steps
No steps
$ kubectl delete namespace tekton-pipelines
namespace "tekton-pipelines" deleted
$ tkn taskrun delete resource-request-issue
Are you sure you want to delete taskrun "resource-request-issue" (y/n): y
TaskRun deleted: resource-request-issue
$ kubectl apply -f https://github.com/tektoncd/pipeline/releases/download/v0.9.2/release.yaml
$ kubectl apply -f https://gist.githubusercontent.com/mgreau/2e34b5e535134ee97e3a0aa3b1e3b248/raw/2a43ad6d023aa838a8e869b73f049cec0ef413e1/tekton-resource-request-issue.yaml
taskrun.tekton.dev/resource-request-issue created
$ tkn taskrun describe resource-request-issue
Name: resource-request-issue
Namespace: default
Status
STARTED DURATION STATUS
12 seconds ago 12 seconds Succeeded
Input Resources
No resources
Output Resources
No resources
Params
No params
Steps
NAME STATUS
minimal-resources-values Completed
maximal-resources-values Completed
re-minimal-resources-values Completed
$ tkn taskrun logs resource-request-issue
[minimal-resources-values] + echo 'Hello from Bash using memory request 64Mi and limit to 128Mi!'
[minimal-resources-values] Hello from Bash using memory request 64Mi and limit to 128Mi!
[maximal-resources-values] Hello from Bash using memory request 256Mi and limit to 512Mi!
[maximal-resources-values] + echo 'Hello from Bash using memory request 256Mi and limit to 512Mi!'
[re-minimal-resources-values] Hello from Bash using memory request 64Mi and limit to 128Mi!
[re-minimal-resources-values] + echo 'Hello from Bash using memory request 64Mi and limit to 128Mi!'
It might be related to this update, https://github.com/tektoncd/pipeline/pull/1655, but I'm not sure.
This issue comes from where container requests are handled for a TaskRun, in resource_request.go.
Basically, what is happening is that the last container or step for a TaskRun is set to the maximum values for ResourceCPU, ResourceMemory, and ResourceEphemeralStorage of all the containers passed in. So, in your scenario, the final container request is set to the following:
ResourceMemory: 256Mi
ResourceCPU: 100m
ResourceEphemeralStorage: 0 (This occurs since all values are initially set to 0 unless something higher is found)
Since step 3 receives a memory request (256Mi) that is higher than its limit (memory: "128Mi"), the TaskRun fails.
Because the max value is placed in the last container without regard for that container's limit, the request can exceed the limit: the implementation doesn't honor the limits of the last step.
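The behavior described above can be sketched as follows. This is a minimal illustration, not the actual Tekton code: plain int64 byte counts stand in for resource.Quantity, only memory is tracked, and the `Step`/`applyMaxToLast` names are hypothetical.

```go
package main

import "fmt"

const Mi = int64(1) << 20

// Step is a simplified stand-in for a container's memory request and
// limit (the real code works with corev1.ResourceRequirements).
type Step struct {
	Name       string
	MemRequest int64
	MemLimit   int64
}

// applyMaxToLast mimics the v0.10.0 behavior: every step's request is
// zeroed and the maximum request across all steps is placed on the
// LAST step, ignoring that step's own limit.
func applyMaxToLast(steps []Step) {
	var max int64
	for i := range steps {
		if steps[i].MemRequest > max {
			max = steps[i].MemRequest
		}
		steps[i].MemRequest = 0
	}
	steps[len(steps)-1].MemRequest = max
}

func main() {
	steps := []Step{
		{"minimal-resources-values", 64 * Mi, 128 * Mi},
		{"maximal-resources-values", 256 * Mi, 512 * Mi},
		{"re-minimal-resources-values", 64 * Mi, 128 * Mi},
	}
	applyMaxToLast(steps)
	last := steps[len(steps)-1]
	fmt.Printf("last step request=%dMi limit=%dMi valid=%v\n",
		last.MemRequest/Mi, last.MemLimit/Mi, last.MemRequest <= last.MemLimit)
}
```

Run against the three steps from the reproduction, the last step ends up with request=256Mi against limit=128Mi, which is exactly the invalid Pod spec the kubelet admission rejects.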
/kind bug
Prior to #1655, it looks like the last step index wasn't used to hold the max request. What was being done was finding the index of the highest resource request for memory, CPU, and ephemeral storage. So, instead of folding everything into a single request on the last container, it left the max resource request at the index where it originally was while zeroing out anything that was not the max.
This would make sense as then it would honor its limit. So my thought here is that the function should find the index and max value of memory, cpu, and ephemeral storage and leave those values where they are and zero out everything else.
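The proposed fix can be sketched like this, using the same simplified int64 model as above (hypothetical `Step`/`zeroAllButMax` names, memory only; the real code would repeat this per resource type and use resource.Quantity):

```go
package main

import "fmt"

const Mi = int64(1) << 20

// Step is a simplified stand-in for a container's memory request and
// limit (the real code works with corev1.ResourceRequirements).
type Step struct {
	Name       string
	MemRequest int64
	MemLimit   int64
}

// zeroAllButMax keeps the maximum memory request at the index where it
// originally was and zeroes out every other step's request, so the max
// request never migrates onto a step with a smaller limit.
func zeroAllButMax(steps []Step) {
	maxIdx := 0
	for i := range steps {
		if steps[i].MemRequest > steps[maxIdx].MemRequest {
			maxIdx = i
		}
	}
	for i := range steps {
		if i != maxIdx {
			steps[i].MemRequest = 0
		}
	}
}

func main() {
	steps := []Step{
		{"minimal-resources-values", 64 * Mi, 128 * Mi},
		{"maximal-resources-values", 256 * Mi, 512 * Mi},
		{"re-minimal-resources-values", 64 * Mi, 128 * Mi},
	}
	zeroAllButMax(steps)
	for _, s := range steps {
		fmt.Printf("%s: request=%dMi limit=%dMi ok=%v\n",
			s.Name, s.MemRequest/Mi, s.MemLimit/Mi, s.MemRequest <= s.MemLimit)
	}
}
```

With this approach the 256Mi request stays on the middle step, whose 512Mi limit covers it, and every step satisfies request <= limit.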
There is still the issue of #1045, but that I can handle separately as the container requests should factor in the minimum for the limit range for containers.
/assign