Test-infra: Kubernetes CI Policy: critical jobs must be Guaranteed Pod QOS

Created on 29 Jul 2020 · 34 comments · Source: kubernetes/test-infra

Part of https://github.com/kubernetes/test-infra/issues/18551

This is a policy action item from [Policies to improve Kubernetes CI], discussed at SIG-Testing yesterday.

  • Why it’s necessary:

    • We believe jobs are getting starved of the resources they require, because the scheduler is not able to effectively place them on a node with sufficient resources

    • We believe that jobs that declare resource requests have a better chance at being placed on a node with sufficient resources for them (at _scheduling_ time)

    • We believe that jobs that declare resource limits are going to succeed because they have accurately declared resources, not because they are getting lucky and bursting above their requested resources

  • “Critical” here is release-blocking and merge-blocking for kubernetes/kubernetes

    • trying to avoid needing to tackle the long tail of O(hundreds) of job configurations that do not specify resource requests/limits today, and focus on the release/merge blocking jobs first

    • beyond Kubernetes release-blocking / presubmit-blocking tests, figuring out priority fairly would be tricky. We've already started moving these specifically to CNCF-owned GCP projects, so the tentative "priority" plan here is to just have dedicated cluster(s) for these workloads, versus general CI. If not for that, I think we should definitely consider it for those jobs at least.

  • Guaranteed QOS means
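
In Kubernetes terms, a pod gets the Guaranteed QoS class when every container declares CPU and memory limits and its requests equal those limits. A minimal sketch of that check follows. The `Resources` type and `isGuaranteed` function are hypothetical stand-ins, not test-infra or Kubernetes API types, and amounts are plain integers rather than Kubernetes quantities. The sketch also insists that requests are written out explicitly, which is stricter than Kubernetes itself (Kubernetes defaults a missing request to the limit).

```go
package main

import "fmt"

// Resources holds one container's declared amounts in base units:
// CPU in millicores, memory in bytes. Zero means "not declared".
// Hypothetical simplification of a container's resources stanza.
type Resources struct {
	CPURequestMilli int64
	CPULimitMilli   int64
	MemRequestBytes int64
	MemLimitBytes   int64
}

// isGuaranteed reports whether every container declares non-zero CPU and
// memory limits with requests equal to those limits.
func isGuaranteed(containers []Resources) bool {
	if len(containers) == 0 {
		return false
	}
	for _, c := range containers {
		if c.CPULimitMilli <= 0 || c.MemLimitBytes <= 0 {
			return false // a limit is missing or zero
		}
		if c.CPURequestMilli != c.CPULimitMilli || c.MemRequestBytes != c.MemLimitBytes {
			return false // request missing or different from the limit
		}
	}
	return true
}

func main() {
	const gi = int64(1) << 30
	guaranteed := []Resources{{CPURequestMilli: 4000, CPULimitMilli: 4000, MemRequestBytes: 6 * gi, MemLimitBytes: 6 * gi}}
	requestsOnly := []Resources{{CPURequestMilli: 4000, MemRequestBytes: 6 * gi}}
	fmt.Println(isGuaranteed(guaranteed), isGuaranteed(requestsOnly)) // prints "true false"
}
```

A job that only declares requests lands in the Burstable class instead, which is the "getting lucky and bursting" case described above.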

Checklist of release-blocking jobs: (h/t @tpepper)

Checklist of merge-blocking jobs (suggestions are based on metrics explorer, check against resource requests too!)

For release-blocking jobs:

  • [x] replace the dashboard list with actual checklist of job names, along with checkboxes for which ones have been done
  • [x] https://github.com/kubernetes/test-infra/pull/18556 - implement test that enforces this policy at presubmit time - log instead of error
  • [x] all individual release-blocking job issues listed above have been fixed (resources declared, jobs still pass after some soak time, jobs aren't declaring overly egregious resources)
  • [x] https://github.com/kubernetes/test-infra/pull/18751 - flip policy enforcement test from logging to erroring
  • [ ] declare that we have finished tuning release-blocking jobs

For merge-blocking jobs:

  • [x] decide that we are comfortable enough with the move to go1.15 to introduce the instability in CI signal that setting/tuning resource limits may introduce -
  • [x] replace the dashboard list with actual checklist of job names, along with checkboxes for which ones have been done
  • [x] https://github.com/kubernetes/test-infra/pull/18556 - implement test that enforces this policy at presubmit time - log instead of error
  • [x] all individual merge-blocking job issues listed above have been fixed (resources declared, jobs still pass after some soak time, jobs aren't declaring overly egregious resources)
  • [x] https://github.com/kubernetes/test-infra/pull/18834 - flip policy enforcement test from logging to erroring
  • [x] declare that we have finished

(Punted "decide how we're going to measure success" to https://github.com/kubernetes/test-infra/issues/18785)

How to make a guess:

  • if the job is running in k8s-infra-prow-build, go look at its historical resource usage (see "how to see resources" below)
  • if the job has resources declared already, try using that as the limit
  • if there's a suggestion, try using that
  • see if there is a similar-looking release-blocking job (name: ci-...) and compare limits
  • merge-blocking jobs are likely to take more cpu/memory than release-blocking jobs, since they need to build/compile
  • jobs that want as much cpu as possible on an 8-cpu node can't ask for 8 cpus: not all of it is allocatable, and some overhead must be budgeted for the 100m init containers that prow patches onto the pod (I see 7300m-7500m used in some places; I would start at the low end)
  • NOTE: unfortunately we have no resource utilization history available to the community for jobs running in k8s-prow-build (@spiffxp added guesses based on ~same metric-explorer approach for k8s-prow-build)
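
One way to turn the guessing steps above into a number is to take the peak of the observed usage, add some headroom, and cap the result at what a node can offer a single pod. The sketch below is only an illustration: `suggestCPULimitMilli`, the 20% headroom, and the sample values are all assumptions, and 7300m is simply the low end of the range mentioned above.

```go
package main

import "fmt"

// suggestCPULimitMilli picks a starting CPU limit (in millicores) from observed
// usage: peak usage plus a headroom percentage, rounded up to the next 100m and
// capped at nodeCapMilli. Hypothetical helper, not part of test-infra.
func suggestCPULimitMilli(usageMilli []int64, headroomPct, nodeCapMilli int64) int64 {
	var peak int64
	for _, u := range usageMilli {
		if u > peak {
			peak = u
		}
	}
	limit := peak + peak*headroomPct/100
	limit = (limit + 99) / 100 * 100 // round up to a multiple of 100m
	if limit > nodeCapMilli {
		limit = nodeCapMilli
	}
	return limit
}

func main() {
	// Hypothetical samples read off a usage graph, in millicores.
	samples := []int64{2100, 3400, 2950}
	// Cap assumed from the low end of the 7300m-7500m range for an 8-cpu node.
	fmt.Println(suggestCPULimitMilli(samples, 20, 7300)) // prints 4100
}
```

Because the policy asks for Guaranteed QoS, whatever value comes out would be used for both the request and the limit, then tuned after the job has soaked for a while.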

How to see resources (note: this only works for jobs that are running in k8s-infra-prow-build)

Once the above has been completed, we can move on to the next step: migrating everything to a dedicated cluster.

help wanted kind/bug priority/important-soon

All 34 comments

/cc

Additionally: This should be enforced by test-infra presubmit, to prevent regressions.

If I can find time I will try to generate a list of jobs that need updates and their paths, throw it up in a sheet with check boxes so that people can claim them to avoid duplicating efforts.

Though not yet on community owned infra, build-master and build-master-fast both experienced timeouts this morning, likely due to resource constraints.

Example from build-master: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-build/1288690796073062402

Example from build-master-fast: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-build-fast/1288815869782134786

I added some notes around how to identify which jobs we're talking about here to the OP.

/cc

@BenTheElder I'm a bit confused here. For this: https://github.com/kubernetes/test-infra/issues/18159 you said that we don't really want presubmit in testgrid. Could you explain more? Thanks

@ZhiFeng1993 that's not related to this issue. Currently because they _are_ in testgrid, that's a quick way to find a lot of the jobs.
This issue is about how the jobs are configured, not about testgrid. See e.g. https://github.com/kubernetes/test-infra/pull/18471 mentioned in the original post.

Let's try to keep discussion here on-topic, lots of people are interested in this issue, it's going to take a lot of work, and github does not handle lots of comments well. :sweat_smile:

@tpepper you can subscribe to a github issue by clicking "subscribe" on the righthand side of the web UI :upside_down_face:

sig-release-master-blocking are:

config/jobs/kubernetes-sigs/kind/kind-release-blocking.yaml

  • [x] "kind-master-parallel" aka ci-kubernetes-kind-e2e-parallel

    • [x] has limits

    • [x] has requests

  • [x] "kind-ipv6-master-parallel" aka ci-kubernetes-kind-ipv6-e2e-parallel

    • [x] has limits

    • [x] has requests

config/jobs/kubernetes/sig-cli/sig-cli-config.yaml

  • [x] "skew-cluster-latest-kubectl-stable1-gce" aka ci-kubernetes-e2e-gce-master-new-gci-kubectl-skew

    • [x] has limits

    • [x] has requests

config/jobs/kubernetes/sig-cloud-provider/gcp/gce-conformance.yaml

  • [x] "Conformance - GCE - master" aka ci-kubernetes-gce-conformance-latest

    • [x] has limits

    • [x] has requests

config/jobs/kubernetes/sig-cloud-provider/gcp/gcp-gce.yaml

  • [x] "gce-cos-master-default" aka ci-kubernetes-e2e-gci-gce

    • [x] has limits

    • [x] has requests

  • [x] "gce-ubuntu-master-containerd" aka ci-kubernetes-e2e-ubuntu-gce-containerd

    • [x] has limits

    • [x] has requests

  • [x] "gce-cos-master-alpha-features" aka ci-kubernetes-e2e-gci-gce-alpha-features

    • [x] has limits

    • [x] has requests

  • [x] "gce-cos-master-reboot" aka ci-kubernetes-e2e-gci-gce-reboot

    • [x] has limits

    • [x] has requests

config/jobs/kubernetes/sig-cloud-provider/gcp/gpu/gpu-gce.yaml

  • [ ] "gce-device-plugin-gpu-master" aka ci-kubernetes-e2e-gce-device-plugin-gpu

    • [ ] needs limits (adding in #18621)

    • [ ] needs requests (adding in #18621)

config/jobs/kubernetes/sig-network/sig-network-misc.yaml

  • [ ] "gci-gce-ingress" aka ci-kubernetes-e2e-gci-gce-ingress

    • [ ] needs limits (adding in #18627)

    • [ ] needs requests (adding in #18627)

config/jobs/kubernetes/sig-node/node-kubelet.yaml

  • [x] "node-kubelet-master" aka ci-kubernetes-node-kubelet

    • [x] has limits

    • [x] has requests

config/jobs/kubernetes/sig-release/kubernetes-builds.yaml

  • [ ] "build-master" aka ci-kubernetes-build

    • [x] has requests

    • [ ] needs limits (adding in #18620)

  • [x] "build-master-fast" aka ci-kubernetes-build-fast

    • [x] has requests

    • [x] has limits (added in #18619)

config/jobs/kubernetes/sig-scalability/sig-scalability-release-blocking-jobs.yaml

  • [x] "gce-cos-master-scalability-100" aka ci-kubernetes-e2e-gci-gce-scalability

    • [x] has requests

    • [x] has limits

config/jobs/kubernetes/sig-testing/bazel-build-test.yaml

  • [ ] "bazel-build-master" aka periodic-kubernetes-bazel-build-master

    • [ ] needs requests (adding in #18629)

    • [ ] needs limits (adding in #18629)

  • [ ] "bazel-test-master" aka periodic-kubernetes-bazel-test-master

    • [ ] needs requests (adding in #18630)

    • [ ] needs limits (adding in #18630)

config/jobs/kubernetes/sig-testing/conformance-e2e.yaml

  • [x] "conformance-ga-only" aka k8s-infra-prow-build

    • [x] has requests

    • [x] has limits

config/jobs/kubernetes/sig-testing/integration.yaml

  • [x] "integration-master" aka ci-kubernetes-integration-master

    • [x] has requests

    • [x] has limits

config/jobs/kubernetes/sig-testing/verify.yaml

  • [x] "verify-master" aka ci-kubernetes-verify-master

    • [x] has requests

    • [x] has limits

sig-release-1.19-blocking are:

config/jobs/kubernetes/generated/generated.yaml

  • [x] all have requests and limits

config/jobs/kubernetes/sig-release/release-branch-jobs/1.19.yaml

  • [x] have requests and limits

    • [x] "Conformance - GCE - 1.19" aka ci-kubernetes-gce-conformance-latest-1-19

    • [x] "node-kubelet-1.19" aka ci-kubernetes-node-kubelet-1-19

    • [x] "gce-cos-1.19-scalability-100" aka ci-kubernetes-e2e-gci-gce-scalability-beta

    • [x] "integration-1.19" aka ci-kubernetes-integration-beta

    • [x] "verify-1.19" aka ci-kubernetes-verify-beta

    • [x] "kind-1.19-parallel" aka ci-kubernetes-kind-e2e-parallel-1-19

    • [x] "kind-ipv6-1.19-parallel" aka ci-kubernetes-kind-ipv6-e2e-parallel-1-19

  • [ ] do NOT have requests and do NOT have limits

    • [ ] "gce-device-plugin-gpu-1.19" aka ci-kubernetes-e2e-gce-device-plugin-gpu-beta (adding in #18621)

    • [ ] "bazel-build-1.19" aka periodic-kubernetes-bazel-build-1-19 (adding in #18629)

    • [ ] "bazel-test-1.19" aka periodic-kubernetes-bazel-test-1-19 (adding in #18630)

  • [ ] have requests but do NOT have limits

    • [ ] "build-1.19" aka ci-kubernetes-build-1-19 (adding in #18620)

sig-release-1.18-blocking are:

config/jobs/kubernetes/generated/generated.yaml

  • [x] all have requests and limits

config/jobs/kubernetes/sig-release/release-branch-jobs/1.18.yaml

  • [x] have requests and limits

    • [x] "skew-cluster-latest-kubectl-beta-gce" aka ci-kubernetes-e2e-gce-master-new-gci-kubectl-skew-stable1

    • [x] "Conformance - GCE - 1.18" aka ci-kubernetes-gce-conformance-latest-1-18

    • [x] "node-kubelet-1.18" aka ci-kubernetes-node-kubelet-1-18

    • [x] "gce-cos-1.18-scalability-100" aka ci-kubernetes-e2e-gci-gce-scalability-stable1

    • [x] "integration-1.18" aka ci-kubernetes-integration-stable1

    • [x] "verify-1.18" aka ci-kubernetes-verify-stable1

    • [x] "kind-1.18-parallel" aka ci-kubernetes-kind-e2e-parallel-1-18

    • [x] "kind-ipv6-1.18-parallel" aka ci-kubernetes-kind-ipv6-e2e-parallel-1-18

  • [ ] do NOT have requests and do NOT have limits

    • [ ] "gce-device-plugin-gpu-1.18" aka ci-kubernetes-e2e-gce-device-plugin-gpu-stable1 (adding in #18621)

    • [ ] "bazel-build-1.18" aka periodic-kubernetes-bazel-build-1-18 (adding in #18629)

    • [ ] "bazel-test-1.18" aka periodic-kubernetes-bazel-test-1-18 (adding in #18630)

  • [ ] have requests but do NOT have limits

    • [ ] "build-1.18" aka ci-kubernetes-build-stable1 (adding in #18620)

sig-release-1.17-blocking are:

config/jobs/kubernetes/generated/generated.yaml

  • [x] all have requests and limits

config/jobs/kubernetes/sig-release/release-branch-jobs/1.17.yaml

  • [x] have requests and limits

    • [x] "skew-cluster-latest-kubectl-k8s-stable1-gce" aka ci-kubernetes-e2e-gce-master-new-gci-kubectl-skew-stable2

    • [x] "Conformance - GCE - 1.17" aka ci-kubernetes-gce-conformance-latest-1-17

    • [x] "node-kubelet-1.17" aka ci-kubernetes-node-kubelet-1-17

    • [x] "gce-cos-1.17-scalability-100" aka ci-kubernetes-e2e-gci-gce-scalability-stable2

    • [x] "integration-1.17" aka ci-kubernetes-integration-stable2

    • [x] "verify-1.17" aka ci-kubernetes-verify-stable2

    • [x] "kind-1.17-parallel" aka ci-kubernetes-kind-e2e-parallel-latest-1-17

    • [x] "kind-ipv6-1.17-parallel" aka ci-kubernetes-kind-ipv6-e2e-parallel-latest-1-17

  • [ ] do NOT have requests and do NOT have limits

    • [ ] "gce-device-plugin-gpu-1.17" aka ci-kubernetes-e2e-gce-device-plugin-gpu-stable2 (adding in #18621)

    • [ ] "bazel-build-1.17" aka periodic-kubernetes-bazel-build-1-17 (adding in #18629)

    • [ ] "bazel-test-1.17" aka periodic-kubernetes-bazel-test-1-17 (adding in #18630)

  • [ ] have requests but do NOT have limits

    • [ ] "build-1.17" aka ci-kubernetes-build-stable2 (adding in #18620)

sig-release-1.16-blocking are:

config/jobs/kubernetes/generated/generated.yaml

  • [x] all have requests and limits

config/jobs/kubernetes/sig-release/release-branch-jobs/1.16.yaml

  • [x] have requests and limits

    • [x] "Conformance - GCE - 1.16" aka ci-kubernetes-gce-conformance-latest-1-16

    • [x] "node-kubelet-1.16" aka ci-kubernetes-node-kubelet-1-16

    • [x] "gce-cos-1.16-scalability-100" aka ci-kubernetes-e2e-gci-gce-scalability-stable3

    • [x] "integration-1.16" aka ci-kubernetes-integration-stable3

    • [x] "verify-1.16" aka ci-kubernetes-verify-stable3

    • [x] "kind-1.16-parallel" aka ci-kubernetes-kind-e2e-parallel-latest-1-16

    • [x] "kind-ipv6-1.16-parallel" aka ci-kubernetes-kind-ipv6-e2e-parallel-latest-1-16

  • [ ] do NOT have requests and do NOT have limits

    • [ ] "gce-device-plugin-gpu-1.16" aka ci-kubernetes-e2e-gce-device-plugin-gpu-stable3 (adding in #18621)

    • [ ] "bazel-build-1.16" aka periodic-kubernetes-bazel-build-1-16 (adding in #18629)

    • [ ] "bazel-test-1.16" aka periodic-kubernetes-bazel-test-1-16 (adding in #18630)

  • [ ] have requests but do NOT have limits

    • [ ] "build-1.16" aka ci-kubernetes-build-stable3 (adding in #18620)

Thanks for the lists, Tim. I tried to consolidate them in the description.

/cc

https://github.com/kubernetes/test-infra/pull/18556 enforces the policy in test form, but only logs instead of errors
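
For readers who have not looked at the PR, the "log first, error later" idea can be sketched roughly as below. This is not the code from that PR: `containerResources`, `violations`, and `report` are hypothetical names, amounts are plain integers, and the message wording is only modeled on the test output quoted later in this thread (the "should equal" message is my own addition).

```go
package main

import "fmt"

// containerResources maps a resource name ("cpu", "memory") to a declared
// amount in base units. A missing key means the field was not specified.
// Hypothetical type for illustration only.
type containerResources struct {
	Name     string
	Requests map[string]int64
	Limits   map[string]int64
}

// violations lists every way the container falls short of Guaranteed QoS.
func violations(job string, c containerResources) []string {
	var out []string
	for _, r := range []string{"cpu", "memory"} {
		limit, hasLimit := c.Limits[r]
		request, hasRequest := c.Requests[r]
		if !hasLimit {
			out = append(out, fmt.Sprintf("%s: container '%s' should have resources.limits[%s] specified", job, c.Name, r))
		}
		if !hasRequest {
			out = append(out, fmt.Sprintf("%s: container '%s' should have resources.requests[%s] specified", job, c.Name, r))
		}
		if limit == 0 {
			out = append(out, fmt.Sprintf("%s: container '%s' resources.limits[%s] should be non-zero", job, c.Name, r))
		}
		if hasLimit && hasRequest && request != limit {
			out = append(out, fmt.Sprintf("%s: container '%s' resources.requests[%s] should equal resources.limits[%s]", job, c.Name, r, r))
		}
	}
	return out
}

// report prints each violation and returns an error only when enforce is
// true, so the same check can run in log-only mode before being flipped.
func report(vs []string, enforce bool) error {
	for _, v := range vs {
		fmt.Println(v)
	}
	if enforce && len(vs) > 0 {
		return fmt.Errorf("%d policy violation(s)", len(vs))
	}
	return nil
}

func main() {
	// Hypothetical job that declares requests but no limits.
	c := containerResources{Requests: map[string]int64{"cpu": 4000, "memory": 9 << 30}}
	vs := violations("example-job", c)
	fmt.Println(report(vs, true)) // enforcing mode returns an error
}
```

Running the check with `enforce` set to false gives job owners time to fix their configs before the presubmit starts failing.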

I will start adding request/limit entries for release-blocking jobs on Monday, based on the guesses added to the description. Ran out of time today.

I hope you like issues. Because I made some issues.

... AND they're already in the project board 🥇

re: from the description above

 #18580 - ci-kubernetes-e2e-gci-gce-ingress (TODO? are there not release-branch variants of this job ?)

config/jobs/kubernetes/generated/generated.yaml has ci-kubernetes-e2e-gce-cos-[k8sbeta|k8sstable1|k8sstable2|k8sstable3]-ingress and all of these already have requests and limits.

Alright, we've got limits set on all of the release-blocking jobs! I'm going to flip this test to fail instead of log

spiffxp@spiffxp-macbookpro:test-infra (master %)$ go test -v -count=1 ./config/tests/jobs
# ...
=== RUN   TestKubernetesReleaseBlockingJobsShouldHavePodQOSGuaranteed
--- PASS: TestKubernetesReleaseBlockingJobsShouldHavePodQOSGuaranteed (0.02s)
# ...

The merge-blocking situation is still pretty incomplete. I suspect we've at least missed a few release-branch jobs

spiffxp@spiffxp-macbookpro:test-infra (master %)$ go test -v -count=1 ./config/tests/jobs
# ...
=== RUN   TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-build ([]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-build ([]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-build ([]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-build ([]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-build ([]): container '' should have resources.requests[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-build ([]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.19]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.19]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.19]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.19]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.19]): container '' should have resources.requests[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.19]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.18]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.18]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.18]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.18]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.18]): container '' should have resources.requests[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.18]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.16]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.16]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.16]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.16]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.16]): container '' should have resources.requests[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.16]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([]): container '' should have resources.requests[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.17]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.17]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.17]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.17]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.17]): container '' should have resources.requests[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-bazel-test ([release-1.17]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-dependencies ([]): container 'main' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-dependencies ([]): container 'main' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-dependencies ([]): container 'main' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-dependencies ([]): container 'main' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-dependencies ([]): container 'main' should have resources.requests[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-dependencies ([]): container 'main' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-gce-network-proxy-grpc ([]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-gce-network-proxy-grpc ([]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-gce-network-proxy-grpc ([]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-gce-network-proxy-grpc ([]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-gce-network-proxy-grpc ([]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.18]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.18]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.18]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.18]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.19]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.19]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.19]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.19]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind-ipv6 ([]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind-ipv6 ([]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind-ipv6 ([]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind-ipv6 ([]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-files-remake ([release-1.19]): container 'main' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-files-remake ([release-1.19]): container 'main' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-files-remake ([release-1.19]): container 'main' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-files-remake ([release-1.19]): container 'main' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-files-remake ([release-1.19]): container 'main' should have resources.requests[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-files-remake ([release-1.19]): container 'main' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.17]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.17]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.17]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.17]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.17]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.19]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.19]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.19]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.19]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.19]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.16]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.16]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.16]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.16]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.16]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.18]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.18]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.18]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.18]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-kubemark-e2e-gce-big ([release-1.18]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.17]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.17]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.17]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.17]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.17]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.19]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.19]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.19]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.19]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.19]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.16]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.16]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.16]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.16]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.16]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.18]): container '' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.18]): container '' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.18]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.18]): container '' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.18]): container '' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.19]): container 'main' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.19]): container 'main' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.19]): container 'main' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.19]): container 'main' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.19]): container 'main' should have resources.requests[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.19]): container 'main' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.17]): container 'main' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.17]): container 'main' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.17]): container 'main' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.17]): container 'main' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.17]): container 'main' should have resources.requests[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.17]): container 'main' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.18]): container 'main' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.18]): container 'main' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.18]): container 'main' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.18]): container 'main' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.18]): container 'main' should have resources.requests[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.18]): container 'main' resources.limits[memory] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.16]): container 'main' should have resources.limits[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.16]): container 'main' should have resources.requests[cpu] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.16]): container 'main' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.16]): container 'main' should have resources.limits[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.16]): container 'main' should have resources.requests[memory] specified
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-typecheck ([release-1.16]): container 'main' resources.limits[memory] should be non-zero
--- PASS: TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed (0.00s)
# ...
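
For readers who haven't opened jobs_test.go, here is a rough sketch of the kind of per-container check that could produce messages like the ones above. It is not the real test: the `Resources` type and the `guaranteedQOSErrors` function are hypothetical stand-ins, quantities are plain integers instead of Kubernetes resource quantities, and the "requests should equal limits" rule is an assumption taken from the Kubernetes definition of the Guaranteed QoS class rather than from the log.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified stand-in for a container's
// resources stanza. Quantities are plain integers (millicores, bytes).
type Resources struct {
	Requests map[string]int64
	Limits   map[string]int64
}

// guaranteedQOSErrors returns one message per violated rule for a single
// container, using wording similar to the log output above.
func guaranteedQOSErrors(container string, r Resources) []string {
	var errs []string
	for _, name := range []string{"cpu", "memory"} {
		limit, hasLimit := r.Limits[name]
		request, hasRequest := r.Requests[name]
		if !hasLimit {
			errs = append(errs, fmt.Sprintf("container '%s' should have resources.limits[%s] specified", container, name))
		}
		if !hasRequest {
			errs = append(errs, fmt.Sprintf("container '%s' should have resources.requests[%s] specified", container, name))
		}
		// A missing limit reads as the zero value, so it is also reported here.
		if limit == 0 {
			errs = append(errs, fmt.Sprintf("container '%s' resources.limits[%s] should be non-zero", container, name))
		}
		// Assumption: Guaranteed QoS needs requests equal to limits.
		if hasLimit && hasRequest && limit != request {
			errs = append(errs, fmt.Sprintf("container '%s' resources.requests[%s] should equal resources.limits[%s]", container, name, name))
		}
	}
	return errs
}

func main() {
	// Only a memory request is set, similar to the node-e2e entries above.
	r := Resources{
		Requests: map[string]int64{"memory": 4 << 30},
		Limits:   map[string]int64{},
	}
	for _, e := range guaranteedQOSErrors("", r) {
		fmt.Println(e)
	}
}
```

With only a memory request set, the sketch reports five problems (three for cpu, two for memory), which is the same pattern the node-e2e jobs show in the log.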

Looking at this test this morning (filtered on the "resources.limits[cpu] should be non-zero" message, so as to count the number of job file edits required to finish this out):

go test -v -run TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed  | grep  "resources.limits\[cpu\] should be non-zero"
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.18]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind ([release-1.19]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-e2e-kind-ipv6 ([]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.17]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.19]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.16]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([]): container '' resources.limits[cpu] should be non-zero
    TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed: jobs_test.go:1035: pull-kubernetes-node-e2e ([release-1.18]): container '' resources.limits[cpu] should be non-zero

Will ping @spiffxp later about me doing this work.

@RobertKielty you'll need to address comments on https://github.com/kubernetes/test-infra/pull/18668 and then that should take care of the node jobs

https://github.com/kubernetes/test-infra/pull/18691 is in flight for the kind jobs

As I mentioned earlier today during the SIG Testing meeting, I suspect any of the issues that have been held open for soak time (making sure things are still running OK, etc.) can now probably be closed. @RobertKielty mentioned he was going to take a look at some. I will take a pass at some point, but it may not be until Thursday at the rate I'm going.

Anecdotally, while attempting to push some last-minute PRs through the door for patch releases, it sure seems like merge-blocking presubmits are still flaking pretty badly.

@spiffxp can you include your anecdata?
There are still some presubmits in the error state: https://prow.k8s.io/?repo=kubernetes%2Fkubernetes&type=presubmit&state=error. We still ought to do #18226.

It's not as straightforward as jobs hitting the "error" state, though there are those (I'm just as willing to chalk that up to "now that we're asking for resources, we're discovering they're not available instead of finding out the hard way").

I'll see if I can find a better way to measure/express this. But it's the fact that humans have sat on PRs hitting "/test" or "/retest" continually. Here's a quick scan of PRs that have merged recently in release-1.16, release-1.17, release-1.18 and master. Is this worse or better than before? I'm not sure. Is this sort of thing worth scripting and generating a report/metric? Maybe.

https://github.com/kubernetes/kubernetes/pull/93927
Release-1.16
3 /test comments in the past 24h (integration, verify, gce-100)

https://github.com/kubernetes/kubernetes/pull/93813
Release-1.16
One bot /retest comment in the past 24h

https://github.com/kubernetes/kubernetes/pull/93924
Release-1.17
2 /test comments in past 24h (integration)

https://github.com/kubernetes/kubernetes/pull/93696
Release-1.17
1 /test comment in past 24h (kubemark)

https://github.com/kubernetes/kubernetes/pull/93812
Release-1.17
3 /retest comments in past 24h

https://github.com/kubernetes/kubernetes/pull/93754
Release-1.18
1 /test comment in past 24h (e2e-gce)

https://github.com/kubernetes/kubernetes/pull/93695
Release-1.18
1 /retest comment in past 24h
1 /test comment in past 24h (kind)

https://github.com/kubernetes/kubernetes/pull/93811
Release-1.18
1 /retest comment in past 24h

https://github.com/kubernetes/kubernetes/pull/93929
Master
2 /retest comments in past 24h

https://github.com/kubernetes/kubernetes/pull/93829
Master
0 /retest or /test comments, woo!

https://github.com/kubernetes/kubernetes/pull/93857
Master
3 /retest comments in past 24h

https://github.com/kubernetes/kubernetes/pull/93907
Master
1 /retest comment in past 24h

https://github.com/kubernetes/kubernetes/pull/93521
Master
1 /retest comment in past 24h
1 /test comment in past 24h (kind ipv6)

https://github.com/kubernetes/kubernetes/pull/93895
Master
0 /retest or /test comments in past 24h, woo!

https://github.com/kubernetes/kubernetes/pull/93893
Master
3 /retest comments in past 24h

https://github.com/kubernetes/kubernetes/pull/93831
Master
3 /retest comments in past 24h
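
If the tally above were ever scripted, the counting step might look something like this sketch. It only covers counting: fetching comments from GitHub and filtering to the past 24h are left out, and the `tally` type, the `countTriggers` function and the sample comments are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// tally holds the number of prow trigger commands seen in PR comments.
type tally struct {
	Test   int // "/test <job>" commands
	Retest int // "/retest" commands
}

// countTriggers scans comment bodies line by line and counts lines whose
// first word is "/test" or "/retest". Mentions elsewhere in a line are ignored.
func countTriggers(comments []string) tally {
	var t tally
	for _, body := range comments {
		for _, line := range strings.Split(body, "\n") {
			fields := strings.Fields(line)
			if len(fields) == 0 {
				continue
			}
			switch fields[0] {
			case "/test":
				t.Test++
			case "/retest":
				t.Retest++
			}
		}
	}
	return t
}

func main() {
	// Made-up comment bodies, not taken from any of the PRs listed above.
	comments := []string{
		"/retest",
		"/test pull-kubernetes-integration",
		"LGTM\n/retest",
		"unrelated comment mentioning /test in passing",
	}
	t := countTriggers(comments)
	fmt.Printf("/test: %d, /retest: %d\n", t.Test, t.Retest)
}
```

Whether a raw count like this says anything about flakiness is a separate question, since it cannot tell a retest of a real failure from a retest of a flake.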

Test flake fixes aren't always backported to older releases, and I had some concerns about some recent CPU limits being set lower on older branches ...
Flakes on the master branch would be my greatest concern.

The first one I sampled had https://github.com/kubernetes/kubernetes/pull/93929#issuecomment-672893041
That's a failure to download things in the bazel WORKSPACE, and the download doesn't really retry sufficiently. It's unrelated to the work here and not new.

Alright, we've got limits set on all of the merge-blocking jobs! I'm going to flip this test to fail instead of log.

$ go test -v -count=1 ./config/tests/jobs/jobs_test.go
=== RUN   TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed
--- PASS: TestKubernetesMergeBlockingJobsShouldHavePodQOSGuaranteed (0.00s)
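
As an illustration of what "fail instead of log" can mean in a Go test, here is a small sketch of a helper that either logs a violation or records a failure. It is not the code in jobs_test.go: the `reporter` interface, the `recorder` type, the `enforce` flag and `reportViolation` are all hypothetical, and `recorder` exists only so the example runs outside of `go test`.

```go
package main

import "fmt"

// reporter is the subset of *testing.T that this sketch needs.
type reporter interface {
	Logf(format string, args ...interface{})
	Errorf(format string, args ...interface{})
}

// recorder is a hypothetical stand-in for *testing.T that counts failures.
type recorder struct{ failures int }

func (r *recorder) Logf(format string, args ...interface{}) {
	fmt.Printf("LOG:  "+format+"\n", args...)
}

func (r *recorder) Errorf(format string, args ...interface{}) {
	r.failures++
	fmt.Printf("FAIL: "+format+"\n", args...)
}

// reportViolation only logs in advisory mode and fails the test when enforcing.
func reportViolation(t reporter, enforce bool, format string, args ...interface{}) {
	if enforce {
		t.Errorf(format, args...)
		return
	}
	t.Logf(format, args...)
}

const msg = "%s: container '%s' should have resources.limits[cpu] specified"

func main() {
	advisory, enforcing := &recorder{}, &recorder{}
	reportViolation(advisory, false, msg, "pull-kubernetes-node-e2e", "")
	reportViolation(enforcing, true, msg, "pull-kubernetes-node-e2e", "")
	fmt.Println("advisory failures:", advisory.failures, "enforcing failures:", enforcing.failures)
}
```

In advisory mode the same violation shows up in the output without failing anything, which is roughly how the earlier runs in this thread could print violations and still end in PASS.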

What remains is:

  • verifying a few of the merge-blocking jobs are still healthy with their limits in place
  • verifying that jobs aren't declaring overly egregious resources (I have spot-checked this a little bit, and am inclined to defer doing this properly until after moving merge-blocking jobs to k8s-infra, so non-Googlers can help with this)
  • declaring victory

/close
Discussed during SIG Testing meeting today, we're calling this done!

@spiffxp: Closing this issue.

