I recently went through the pipelines-tutorial in the OpenShift org and encountered the following error message:
Missing or invalid Task pipelines-tutorial/s2i-java-8: pods "petclinic-deploy-pipelinerun-qwq6k-build-hnq2g-pod-f9e7df" is forbidden: [minimum memory usage per Container is 6Mi, but request is 0., minimum memory usage per Container is 6Mi, but request is 0., minimum memory usage per Container is 6Mi, but request is 0., minimum memory usage per Container is 6Mi, but request is 0., minimum memory usage per Container is 6Mi, but request is 0., minimum memory usage per Container is 6Mi, but request is 0., minimum memory usage per Container is 6Mi, but request is 0.]
This occurred because my OpenShift project had a LimitRange with the following specifications:
kind: LimitRange
apiVersion: v1
metadata:
  name: pipelines-tutorial-core-resource-limits
  namespace: pipelines-tutorial
  selfLink: >-
    /api/v1/namespaces/pipelines-tutorial/limitranges/pipelines-tutorial-core-resource-limits
  uid: 492a15c6-9d90-11e9-ae59-0a8e513d9bd0
  resourceVersion: '1063283'
  creationTimestamp: '2019-07-03T12:44:23Z'
spec:
  limits:
    - type: Container
      max:
        memory: 6Gi
      min:
        memory: 6Mi
      default:
        cpu: 500m
        memory: 1536Mi
      defaultRequest:
        cpu: 50m
        memory: 256Mi
    - type: Pod
      max:
        memory: 12Gi
      min:
        memory: 6Mi
Since no container CPU or memory resource requests were specified for the pipeline, the resource requests were set to 0. This caused the PipelineRun from the tutorial to fail.
If no amount had been requested at all, the LimitRange's defaults would have been applied to the Tasks' containers, and I think this should be the approach moving forward. I believe the source in question can be found here.
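For reference, if the requests had simply been left unset, Kubernetes admission would have filled in container resources from the LimitRange shown above, roughly like this (an illustrative fragment based on the defaultRequest/default stanzas, not output from an actual cluster):

```yaml
# Illustrative: what LimitRange defaulting would fill in for a container
# that declares no resources (values from the LimitRange above)
resources:
  requests:
    cpu: 50m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 1536Mi
```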
In general, I think the error message should be improved to better reflect the situation. Missing or invalid Task pipelines-tutorial/s2i-java-8 is not what was wrong with the PipelineRun. The issue was that the memory resource request was 0, which was less than the LimitRange's minimum.
More on how I ran into this issue can be found here.
The error message should not give Missing or invalid Task pipelines-tutorial/s2i-java-8 as the reason the PipelineRun failed; it should focus on the LimitRange aspect of the issue.
The default value for a resource request should be the minimum of a LimitRange instead of 0.
Error message in issue description.
Set a LimitRange with a minimum greater than 0 on an OpenShift 4 cluster namespace and trigger a PipelineRun with the default resource requests. This was run on a Red Hat Product Demo System (RHPDS) OpenShift 4.1 cluster.
In the documentation it states:
The CPU, memory, and ephemeral storage resource requests will be set to zero if the container image does not have the largest resource request out of all container images in the Task. This ensures that the Pod that executes the Task will only request the resources necessary to execute any single container image in the Task, rather than requesting the sum of all of the container image's resource requests.
Obviously, if the platform has a LimitRange that prevents 0 from being a valid request, the Task will not run.
This should be fixed either by setting the request to the same value as the container image with the largest resource request, or by not setting a value at all: if there's a LimitRange defining a minimum, let the LimitRange apply it.
In the example above, none of the images have a request defined, but it seems the LimitRange has applied the value to the first container created, and hence the second is set to 0.
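The zeroing behavior quoted from the documentation can be sketched roughly as follows. This is a simplified illustration, not Tekton's actual code: it uses plain Mi integers instead of Kubernetes resource.Quantity, and the names are hypothetical.

```go
package main

import "fmt"

// step is a simplified stand-in for a Task step (a container) and its
// declared memory request in Mi. Hypothetical type, for illustration only.
type step struct {
	name      string
	requestMi int64
}

// zeroNonMaxRequests keeps the single largest memory request and zeroes the
// rest, so the Pod's summed request equals the largest step request rather
// than the sum of all step requests. This mirrors the documented behavior.
func zeroNonMaxRequests(steps []step) {
	maxIdx := 0
	for i, s := range steps {
		if s.requestMi > steps[maxIdx].requestMi {
			maxIdx = i
		}
	}
	for i := range steps {
		if i != maxIdx {
			steps[i].requestMi = 0
		}
	}
}

func main() {
	steps := []step{{"git-init", 64}, {"build", 1024}, {"push", 128}}
	zeroNonMaxRequests(steps)
	for _, s := range steps {
		fmt.Printf("%s=%dMi\n", s.name, s.requestMi)
	}
}
```

With a LimitRange minimum in place, every zeroed container is then rejected at admission, which is exactly the failure reported above.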
cc/ @siamaksade
IIRC, the reason we set the requests to 0 explicitly in the fix for #598 is that if you leave the request unset, the Pod still ends up summing the default values for the containers, and values used for limits become the values for requests (see https://github.com/tektoncd/pipeline/issues/598#issuecomment-479604163 for some discussion of this).
I think the best fix here would be to load the LimitRange for Containers somewhere in pod.go, and then change zeroNonMaxResourceRequests so that if no container sets an explicit request, either the default or minimum request values for the LimitRange are used on one of the containers (I don't think it matters which container is chosen).
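A minimal sketch of that proposal, under the assumption that the LimitRange minimum has already been loaded; the function name and the flat Mi representation are hypothetical, not Tekton's API:

```go
package main

import "fmt"

// applyLimitRangeFloor: if no container carries an explicit (non-zero)
// request, put the LimitRange minimum on one container so the Pod passes
// LimitRange admission. Which container gets it shouldn't matter.
func applyLimitRangeFloor(requestsMi []int64, limitRangeMinMi int64) {
	for _, r := range requestsMi {
		if r > 0 {
			return // an explicit request exists; nothing to do
		}
	}
	requestsMi[0] = limitRangeMinMi
}

func main() {
	reqs := []int64{0, 0, 0} // all requests were zeroed
	applyLimitRangeFloor(reqs, 6)
	fmt.Println(reqs)
}
```

Note this only handles the Container-level minimum on one container; as discussed below, per-container enforcement may require the minimum on every container.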
That would be good, as if a Task runs all the steps as concurrent containers all the memory will be added up.
I've tried manually setting the resource requests in the Tasks for every image, but what I found when I created a Pipeline and a PipelineRun is that there were more containers defined in the Pod for which I could not easily set the memory.requests.
Some of them use the image quay.io/openshift-pipeline/tektoncd-pipeline-imagedigestexporter:v0.4.0, another quay.io/openshift-pipeline/tektoncd-pipeline-git-init:v0.4.0, and another quay.io/openshift-pipeline/tektoncd-pipeline-nop:v0.4.0 (although for the last one there's no request set to 0, as there is for the other two). For those I cannot set the requests, so the pipeline can't run with my LimitRange no matter what.
I'm using the Pipeline defined in this tutorial, so it's really easy to reproduce.
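For anyone trying the same workaround: in Tekton v0.4.x a step embeds a Kubernetes container spec, so requests can be declared per step. A hypothetical fragment (the Task and step names are illustrative), bearing in mind that, as noted in this thread, Tekton will still zero every request except the largest one, and the injected helper containers cannot be configured this way:

```yaml
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: example-build    # hypothetical Task name
spec:
  steps:
    - name: build
      image: maven:3-jdk-8
      resources:
        requests:
          cpu: 50m
          memory: 6Mi    # matches the LimitRange minimum from this issue
```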
All requests you set on steps in a Task other than the largest one will be replaced with 0.
Depending on exactly how the minimum values for a LimitRange for a Container are enforced (per-container, or per-pod?), it might mean that we actually need to set the minimum on _every_ container. If we did that, then to still have #598 fixed, we'd need to change the largest request from whatever it is (say X), to X - (minimum * (number of tasks - 1)) so that we don't request more resources than necessary.
At that point this all starts looking pretty complicated, so I wonder if there is a better way to control how Pod-level resource requests are computed to fix #598 without having to modify what users have specified.
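The adjustment formula above can be sketched as follows; this is a hypothetical helper (not Tekton code), clamped so the adjusted value never drops below the minimum itself:

```go
package main

import "fmt"

// adjustedMaxRequest implements X - (min * (n - 1)): if the LimitRange
// minimum must sit on every one of n containers, the largest request is
// reduced so the Pod-level sum still equals the original X.
func adjustedMaxRequest(maxMi, minMi int64, numContainers int) int64 {
	adjusted := maxMi - minMi*int64(numContainers-1)
	if adjusted < minMi {
		return minMi // can't request less than the LimitRange minimum
	}
	return adjusted
}

func main() {
	// 6 containers, largest request 1024Mi, LimitRange minimum 10Mi:
	// the largest becomes 974Mi, and 974 + 5*10 sums back to 1024Mi.
	adjusted := adjustedMaxRequest(1024, 10, 6)
	fmt.Printf("adjusted=%dMi podSum=%dMi\n", adjusted, adjusted+5*10)
}
```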
All requests you set on steps in a Task other than the largest one will be replaced with 0.
I've verified I can set the requests on every container on every step to prevent them from being calculated and hence zeroed. Also, I've verified that the containers the framework adds (and whose requests I therefore cannot explicitly set) are all zeroed, except for the last one I described, quay.io/openshift-pipeline/tektoncd-pipeline-nop:v0.4.0, which in the final pod has no request set at all, not even a zero.
Here's the pod definition and in line 333 you can see the last container without this computed request. I guess this is a bug.
LimitRanges are enforced by cluster-administrators to make sure that the resources are not wasted. Even without discussion whether having a minimum request is a good or bad idea, the reality is that it happens and there's nothing that we can do about it other than understand this fact and compute resources for the TaskRun pod in a mindful way.
I've verified I can set the requests on every container on every step to prevent them from being calculated and hence zeroed
I think they would all be zeroed except for the one with the largest request. In your example pod definition, all requests are zeroed except for the one container which requested 6Mi of memory. Is that what you mean?
Here's the pod definition and in line 333 you can see the last container without this computed request. I guess this is a bug.
Yeah, I'm not sure why no requests are set for the nop container, or why the nop container is even present. I think I can see why that would have been broken before https://github.com/tektoncd/pipeline/commit/ace3ab670b4c6a2e78fec0c4ac1dfa0caba1e316#diff-9177f963c9c62cb8e53097a81c571619L293, since the nop container was added after the resource requests were modified, but it looks like the nop container is no longer added as of that commit, released in 0.5.0. What version of Tekton are you running?
Even without discussion whether having a minimum request is a good or bad idea, the reality is that it happens and there's nothing that we can do about it other than understand this fact and compute resources for the TaskRun pod in a mindful way.
I think we are in agreement here, if you have any thoughts on how to fix it without regressing #598, they would be welcome. Right now the only idea I have is what I mentioned in https://github.com/tektoncd/pipeline/issues/1045#issuecomment-510206550.
LimitRange is used and, although a best practice for namespaces where Tekton Tasks run is definitely not to set a minimum, we should still try to honor the request.
I agree that we could certainly get fancy wrt calculating an upper limit on request e.g. maximum - (minimum * (number of tasks - 1)) but I would tend to keep things simple and leave this at maximum at least for now.
For the lower limit, though, we should update our request zeroing to explicitly set the request to the LimitRange minimum, or 0 if not present. Leaving it blank will use either the LimitRange default or an internal implementation-specific default, which are typically higher.
I think it's a good idea to honor a LimitRange minimum before requesting the default value, but I disagree with leaving the default request as 0. I think, despite the LimitRange default typically being higher than is needed, we need to honor the default value and can't assume 0.
I think it's a better approach to have the default be empty and trust the LimitRange defaults have been set appropriately by the user. I think users will continue to encounter issues with the 0 request.
The problem is that if we set the request to the LimitRange minimum or the default request, #598 becomes a problem again, because now the default/minimum values will be combined to compute the resources required for the Pod even though the containers only execute one at a time, so the Pod requires more resources than it will ever use at a single point in time.
That might be fine if you use the minimum values, because the additional resources required will probably not be significantly different from the maximum value. For example, with that change, say we have 6 steps for a Task, the max memory request is 6Gi, and the LimitRange minimum is 10Mi; then the Pod for the Task will request 6.05Gi even though it only requires 6Gi. If you use the _default_ request values, not the minimums, then the excess could be a lot worse, since the default is probably significantly higher than the minimum.
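To make the overhead concrete, here is a small sketch comparing the summed Pod request when the non-max containers carry the LimitRange minimum versus the LimitRange default. The figures are hypothetical (6 steps, a 6144Mi largest request, a 10Mi minimum, a 256Mi default), chosen only to illustrate the difference:

```go
package main

import "fmt"

// summedPodRequestMi: the Pod-level request when one container carries the
// largest request and the remaining n-1 containers each carry floorMi.
func summedPodRequestMi(maxMi, floorMi int64, numContainers int) int64 {
	return maxMi + floorMi*int64(numContainers-1)
}

func main() {
	// Non-max containers set to the LimitRange minimum (10Mi):
	withMin := summedPodRequestMi(6144, 10, 6) // 6194Mi, roughly 6.05Gi
	// Non-max containers set to the LimitRange default (256Mi):
	withDefault := summedPodRequestMi(6144, 256, 6) // 7424Mi, 7.25Gi
	fmt.Printf("minimum: %dMi, default: %dMi\n", withMin, withDefault)
}
```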
It seems like the fundamental issue is that we are doing tricky things that are opaque to Kubernetes to make Container act kind of like InitContainer. Because of that, the default behavior for computing Pod resource requests doesn't work well for our use case. I wonder if there is any reasonable upstream change we could make so we don't have to manipulate resource requests to get the behavior we want.
My apologies. I didn't fully go through the discussions in #598 and now better understand the proposal from @skaegi.
If the goal here is a short-term fix, I think it sounds reasonable. One question is whether it makes sense to require resources to be defined for a Task; however, that seems like it would be a poor user experience. I'm not sure if there have been any discussions around that.
IMHO the user should be able to explicitly set these values on every container that Tekton adds to the pod, so they can fix the problem if it arises. A LimitRange is something that a user/developer typically doesn't control, and not being able to schedule a pipeline can be really tricky to fix. The user/developer should be able to fix this as much as possible.
I think that as long as the user can explicitly set this value instead of relying on the LimitRange in case of problems, then leaving the request as 0 or non-0 should not be a big issue, as it's solvable by the user.
/assign