Part of https://github.com/kubernetes/test-infra/issues/18551
This is a policy action item from [Policies to improve Kubernetes CI], discussed at SIG-Testing yesterday.
Checklist of release-blocking jobs: (h/t @tpepper)
Checklist of merge-blocking jobs (suggestions are based on metrics explorer; check them against resource requests too!)
For release-blocking jobs:
For merge-blocking jobs:
(Punted "decide how we're going to measure success" to https://github.com/kubernetes/test-infra/issues/18785)
How to make a guess:
How to see resources (note: this only works for jobs that are running in k8s-infra-prow-build)
Once the above has been completed, we can move on to the next step: migrating everything to a dedicated cluster.
/cc
Additionally: This should be enforced by test-infra presubmit, to prevent regressions.
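For anyone wondering what such a check boils down to, here is a minimal sketch in Python. It is not the code in `jobs_test.go`; the function names (`parse_quantity`, `qos_guaranteed_errors`) are hypothetical, and the quantity parser only covers a handful of suffixes rather than the full Kubernetes quantity grammar. It assumes the rule is the usual one for Guaranteed QoS: every container sets non-zero cpu and memory limits, with requests equal to limits.

```python
# Sketch only: a "Guaranteed QoS" check for one container's resources.
# Assumption: `resources` mirrors the `resources:` mapping of a container spec.

REQUIRED = ("cpu", "memory")

# Suffixes handled by this sketch (not the full Kubernetes quantity grammar).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(value):
    """Convert a quantity such as "4", "500m", or "6Gi" to a float."""
    text = str(value).strip()
    if text.endswith("m"):  # milli-units, e.g. "500m" CPU
        return float(text[:-1]) / 1000
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    return float(text)


def qos_guaranteed_errors(job, container, resources):
    """Return a list of human-readable problems; empty means the check passes."""
    requests = resources.get("requests") or {}
    limits = resources.get("limits") or {}
    prefix = f"{job}: container '{container}'"
    errors = []
    for name in REQUIRED:
        limit = limits.get(name)
        request = requests.get(name)
        if limit is None:
            errors.append(f"{prefix} should have resources.limits[{name}] specified")
        elif parse_quantity(limit) == 0:
            errors.append(f"{prefix} resources.limits[{name}] should be non-zero")
        if request is None:
            errors.append(f"{prefix} should have resources.requests[{name}] specified")
        elif limit is not None and parse_quantity(request) != parse_quantity(limit):
            errors.append(
                f"{prefix} resources.requests[{name}] should equal resources.limits[{name}]"
            )
    return errors
```

A container with `requests: {cpu: "4", memory: "6Gi"}` and `limits: {cpu: "4000m", memory: "6Gi"}` passes, since "4" and "4000m" are the same quantity.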
If I can find time, I will try to generate a list of jobs that need updates, along with their paths, and put it in a sheet with checkboxes so that people can claim them and avoid duplicating effort.
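A rough sketch of how such a sheet could be generated, assuming we already have (job name, config path) pairs from somewhere. The column names and the `claim_sheet` helper are made up for illustration; the output is plain CSV that can be imported into a spreadsheet.

```python
import csv
import io


def claim_sheet(jobs):
    """Build CSV text with one claimable row per (job_name, config_path) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["done", "job", "path", "claimed_by"])
    # De-duplicate and sort so the sheet is stable across runs.
    for job, path in sorted(set(jobs)):
        writer.writerow(["FALSE", job, path, ""])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical job names and paths, for illustration only.
    print(claim_sheet([
        ("example-job-b", "config/jobs/example/b.yaml"),
        ("example-job-a", "config/jobs/example/a.yaml"),
    ]))
```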
Though not yet on community-owned infra, build-master and build-master-fast both experienced timeouts this morning, likely due to resource constraints.
Example from build-master: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-build/1288690796073062402
Example from build-master-fast: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-build-fast/1288815869782134786
I added some notes to the OP on how to identify which jobs we're talking about here.
/cc
@BenTheElder I'm a bit confused here. In https://github.com/kubernetes/test-infra/issues/18159 you said that we don't really want presubmits in testgrid. Could you explain more? Thanks
@ZhiFeng1993 that's not related to this issue. Currently, because they _are_ in testgrid, testgrid is a quick way to find a lot of the jobs.
This issue is about how the jobs are configured, not about testgrid. See e.g. https://github.com/kubernetes/test-infra/pull/18471 mentioned in the original post.
Let's try to keep discussion here on-topic. Lots of people are interested in this issue, it's going to take a lot of work, and GitHub does not handle lots of comments well. :sweat_smile:
@tpepper you can subscribe to a GitHub issue by clicking "subscribe" on the right-hand side of the web UI :upside_down_face:
sig-release-master-blocking are:
config/jobs/kubernetes-sigs/kind/kind-release-blocking.yaml
config/jobs/kubernetes/sig-cli/sig-cli-config.yaml
config/jobs/kubernetes/sig-cloud-provider/gcp/gce-conformance.yaml
config/jobs/kubernetes/sig-cloud-provider/gcp/gcp-gce.yaml
config/jobs/kubernetes/sig-cloud-provider/gcp/gpu/gpu-gce.yaml
config/jobs/kubernetes/sig-network/sig-network-misc.yaml
config/jobs/kubernetes/sig-node/node-kubelet.yaml
config/jobs/kubernetes/sig-release/kubernetes-builds.yaml
config/jobs/kubernetes/sig-scalability/sig-scalability-release-blocking-jobs.yaml
config/jobs/kubernetes/sig-testing/bazel-build-test.yaml
config/jobs/kubernetes/sig-testing/conformance-e2e.yaml
config/jobs/kubernetes/sig-testing/integration.yaml
config/jobs/kubernetes/sig-testing/verify.yaml
sig-release-1.19-blocking are:
config/jobs/kubernetes/generated/generated.yaml
config/jobs/kubernetes/sig-release/release-branch-jobs/1.19.yaml
sig-release-1.18-blocking are:
config/jobs/kubernetes/generated/generated.yaml
config/jobs/kubernetes/sig-release/release-branch-jobs/1.18.yaml
sig-release-1.17-blocking are:
config/jobs/kubernetes/generated/generated.yaml
config/jobs/kubernetes/sig-release/release-branch-jobs/1.17.yaml
sig-release-1.16-blocking are:
config/jobs/kubernetes/generated/generated.yaml
config/jobs/kubernetes/sig-release/release-branch-jobs/1.16.yaml
Thanks for the lists, Tim. I tried to consolidate them in the description.
/cc
https://github.com/kubernetes/test-infra/pull/18556 enforces the policy in test form, but only logs instead of errors
On Monday I will start adding request/limit entries for the release-blocking jobs, based on the guesses added to the description. I ran out of time today.
I hope you like issues. Because I made some issues.
... AND they're already in the project board 🥇
re: from the description above
#18580 - ci-kubernetes-e2e-gci-gce-ingress (TODO: are there no release-branch variants of this job?)
config/jobs/kubernetes/generated/generated.yaml has ci-kubernetes-e2e-gce-cos-[k8sbeta|k8sstable1|k8sstable2|k8sstable3]-ingress and all of these already have requests and limits.
Based on
Seems like ci-kubernetes-gce-conformance-* may need higher limits?
Alright, we've got limits set on all of the release-blocking jobs! I'm going to flip this test to fail instead of just logging.
spiffxp@spiffxp-macbookpro:test-infra (master %)$ go test -v -count=1 ./config/tests/jobs
# ...
=== RUN TestKubernetesReleaseBlockingJobsShouldHavePodQOSGuaranteed
--- PASS: TestKubernetesReleaseBlockingJobsShouldHavePodQOSGuaranteed (0.02s)
# ...
The merge-blocking situation is still pretty incomplete. I suspect we've missed at least a few release-branch jobs.
spiffxp@spiffxp-macbookpro:test-infra (master %)$ go test -v -count=1 ./config/tests/jobs
# ...
=== RUN TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-build ([]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-build ([]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-build ([]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-build ([]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-build ([]): container '' should have resources.requests[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-build ([]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.19]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.19]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.19]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.19]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.19]): container '' should have resources.requests[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.19]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.18]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.18]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.18]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.18]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.18]): container '' should have resources.requests[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.18]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.16]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.16]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.16]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.16]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.16]): container '' should have resources.requests[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.16]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([]): container '' should have resources.requests[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.17]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.17]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.17]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.17]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.17]): container '' should have resources.requests[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.17]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-dependencies ([]): container 'main' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-dependencies ([]): container 'main' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-dependencies ([]): container 'main' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-dependencies ([]): container 'main' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-dependencies ([]): container 'main' should have resources.requests[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-dependencies ([]): container 'main' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-gce-network-proxy-grpc ([]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-gce-network-proxy-grpc ([]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-gce-network-proxy-grpc ([]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-gce-network-proxy-grpc ([]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-gce-network-proxy-grpc ([]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.18]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.18]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.18]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.18]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.19]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.19]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.19]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.19]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind-ipv6 ([]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind-ipv6 ([]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind-ipv6 ([]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind-ipv6 ([]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-files-remake ([release-1.19]): container 'main' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-files-remake ([release-1.19]): container 'main' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-files-remake ([release-1.19]): container 'main' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-files-remake ([release-1.19]): container 'main' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-files-remake ([release-1.19]): container 'main' should have resources.requests[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-files-remake ([release-1.19]): container 'main' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.17]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.17]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.17]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.17]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.17]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.19]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.19]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.19]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.19]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.19]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.16]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.16]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.16]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.16]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.16]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.18]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.18]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.18]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.18]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.18]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.17]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.17]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.17]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.17]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.17]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.19]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.19]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.19]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.19]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.19]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.16]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.16]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.16]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.16]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.16]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.18]): container '' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.18]): container '' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.18]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.18]): container '' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.18]): container '' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.19]): container 'main' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.19]): container 'main' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.19]): container 'main' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.19]): container 'main' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.19]): container 'main' should have resources.requests[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.19]): container 'main' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.17]): container 'main' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.17]): container 'main' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.17]): container 'main' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.17]): container 'main' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.17]): container 'main' should have resources.requests[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.17]): container 'main' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.18]): container 'main' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.18]): container 'main' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.18]): container 'main' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.18]): container 'main' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.18]): container 'main' should have resources.requests[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.18]): container 'main' resources.limits[memory] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.16]): container 'main' should have resources.limits[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.16]): container 'main' should have resources.requests[cpu] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.16]): container 'main' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.16]): container 'main' should have resources.limits[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.16]): container 'main' should have resources.requests[memory] specified
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.16]): container 'main' resources.limits[memory] should be non-zero
--- PASS: TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed (0.00s)
# ...
Looking at this test this morning (filtered on the "resources.limits[cpu] should be non-zero" message, to count the number of job file edits required to finish this out):
go test -v -run TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed | grep "resources.limits\[cpu\] should be non-zero"
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.18]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.19]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind-ipv6 ([]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.17]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.19]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.16]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.18]): container '' resources.limits[cpu] should be non-zero
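If eyeballing the grep output gets tedious, the lines can be grouped by job with a few lines of Go. This is only a sketch: the `branchesByJob` helper and the regular expression are my own, written against the line format shown above, and would need adjusting if the test's message format changes.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// lineRE matches lines shaped like the test output above:
//   ...: jobs_test.go:1035: <job> ([<branches>]): container ...
// Assumption: the format is inferred from the log lines, not from the test source.
var lineRE = regexp.MustCompile(`jobs_test\.go:\d+: (\S+) \(\[([^\]]*)\]\): container`)

// branchesByJob groups the branch specs seen in the output by job name,
// dropping duplicate job/branch pairs and lines that do not match.
func branchesByJob(output string) map[string][]string {
	jobs := map[string][]string{}
	seen := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		m := lineRE.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		key := m[1] + "\x00" + m[2]
		if seen[key] {
			continue
		}
		seen[key] = true
		jobs[m[1]] = append(jobs[m[1]], m[2])
	}
	for name := range jobs {
		sort.Strings(jobs[name])
	}
	return jobs
}

func main() {
	const sample = `TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.18]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([]): container '' resources.limits[cpu] should be non-zero
TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.17]): container '' resources.limits[cpu] should be non-zero`

	jobs := branchesByJob(sample)
	names := make([]string, 0, len(jobs))
	for name := range jobs {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		// %q makes the empty branch spec "[]" visible in the output.
		fmt.Printf("%s: %d variant(s) %q\n", name, len(jobs[name]), jobs[name])
	}
}
```

In practice you would read the `go test -v` output from stdin instead of a constant; the sample is embedded here just to keep the sketch self-contained.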
Will ping @spiffxp later about me doing this work.
@RobertKielty you'll need to address comments on https://github.com/kubernetes/test-infra/pull/18668 and then that should take care of the node jobs
https://github.com/kubernetes/test-infra/pull/18691 is in flight for the kind jobs
As I mentioned earlier today during the SIG Testing meeting, I suspect any of the issues that have been held open for soak time (making sure things are still running OK, etc.) can now probably be closed. @RobertKielty mentioned he was going to take a look at some. I will take a pass at some point, but it may not be until Thursday at the rate I'm going.
Anecdotally, while attempting to push some last-minute PRs through the door for patch releases, it sure seems like merge-blocking presubmits are still flaking pretty badly.
@spiffxp can you include your anecdata?
There are still some jobs hitting the error state: https://prow.k8s.io/?repo=kubernetes%2Fkubernetes&type=presubmit&state=error. We still ought to do #18226.
It's not as straightforward as jobs hitting "error" state, though there are those (I'm just as willing to chalk that up to "now that we're asking for resources, we're discovering they're not available instead of finding out the hard way").
I'll see if I can find a better way to measure/express this. But it's the fact that humans have sat on PRs hitting "/test" or "/retest" continually. Here's a quick scan of PRs that have merged recently in release-1.16, release-1.17, release-1.18 and master. Is this worse or better than before? I'm not sure. Is this sort of thing worth scripting and generating a report/metric? Maybe.
https://github.com/kubernetes/kubernetes/pull/93927
Release-1.16
3 /test comments in the past 24h (integration, verify, gce-100)
https://github.com/kubernetes/kubernetes/pull/93813
Release-1.16
One bot /retest comment in the past 24h
https://github.com/kubernetes/kubernetes/pull/93924
Release-1.17
2 /test comments in past 24h (integration)
https://github.com/kubernetes/kubernetes/pull/93696
Release-1.17
1 /test comment in past 24h (kubemark)
https://github.com/kubernetes/kubernetes/pull/93812
Release-1.17
3 /retest comments in past 24h
https://github.com/kubernetes/kubernetes/pull/93754
Release-1.18
1 /test comment in past 24h (e2e-gce)
https://github.com/kubernetes/kubernetes/pull/93695
Release-1.18
1 /retest comment in past 24h
1 /test comment in past 24h (kind)
https://github.com/kubernetes/kubernetes/pull/93811
Release-1.18
1 /retest comment in past 24h
https://github.com/kubernetes/kubernetes/pull/93929
Master
2 /retest comments in past 24h
https://github.com/kubernetes/kubernetes/pull/93829
Master
0 /retest or /test comments, woo!
https://github.com/kubernetes/kubernetes/pull/93857
Master
3 /retest comments in past 24h
https://github.com/kubernetes/kubernetes/pull/93907
Master
1 /retest comment in past 24h
https://github.com/kubernetes/kubernetes/pull/93521
Master
1 /retest comment in past 24h
1 /test comment in past 24h (kind ipv6)
https://github.com/kubernetes/kubernetes/pull/93895
Master
0 /retest or /test comments in past 24h, woo!
https://github.com/kubernetes/kubernetes/pull/93893
Master
3 /retest comments in past 24h
https://github.com/kubernetes/kubernetes/pull/93831
Master
3 /retest comments in past 24h
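On the "is this worth scripting" question, the counting part is small. A minimal sketch, assuming the comment bodies for a PR have already been fetched and filtered to the time window of interest (fetching from GitHub and the 24h cutoff are left out); the `Tally` type and `tallyCommands` helper are hypothetical names. It counts command lines rather than comments, so one comment with two `/test` lines counts twice.

```go
package main

import (
	"fmt"
	"strings"
)

// Tally holds how many /test and /retest command lines were seen.
type Tally struct {
	Test   int
	Retest int
}

// tallyCommands scans comment bodies line by line and counts lines whose
// first word is exactly "/test" or "/retest". Mentions of those words in
// the middle of a sentence are ignored.
func tallyCommands(comments []string) Tally {
	var t Tally
	for _, body := range comments {
		for _, line := range strings.Split(body, "\n") {
			fields := strings.Fields(line)
			if len(fields) == 0 {
				continue
			}
			switch fields[0] {
			case "/test":
				t.Test++
			case "/retest":
				t.Retest++
			}
		}
	}
	return t
}

func main() {
	// Made-up comment bodies, for illustration only.
	comments := []string{
		"/retest",
		"/test pull-kubernetes-integration\n/test pull-kubernetes-verify",
		"LGTM, but please do not /retest blindly",
	}
	t := tallyCommands(comments)
	fmt.Printf("/test: %d, /retest: %d\n", t.Test, t.Retest)
}
```

Whether the resulting number is a useful flakiness metric is a separate question, since it also picks up retests triggered by genuine failures.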
Test flake fixes aren't always backported to older releases, and I had some concerns about some recent CPU limits being set lower on older branches ...
Flakes on the master branch would be my greatest concern.
The first one I sampled had https://github.com/kubernetes/kubernetes/pull/93929#issuecomment-672893041
That's a failure to download things in the bazel WORKSPACE, which doesn't really retry sufficiently. It's unrelated to the work here and not new.
Alright, we've got limits set on all of the merge-blocking jobs! I'm going to flip this test to fail instead of just logging.
$ go test -v -count=1 ./config/tests/jobs/jobs_test.go
=== RUN TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed
--- PASS: TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed (0.00s)
What remains is:
A CSV report generated by https://github.com/kubernetes/test-infra/tree/master/experiment/prowjob-report, then imported into Google Sheets:
https://docs.google.com/spreadsheets/d/1dgSxfa0jdaYk76S6DqJeazl2TKVAvZcCg-7Yx3U0j-I/edit#gid=1043959272
/close
Discussed during SIG Testing meeting today, we're calling this done!
@spiffxp: Closing this issue.