Test-infra: Re-evaluate set of merge-blocking jobs for kubernetes/kubernetes

Created on 7 Aug 2020 · 16 Comments · Source: kubernetes/test-infra

Pulling this out of https://github.com/kubernetes/kubernetes/issues/92937#issuecomment-662178519

Related to but not explicitly part of https://github.com/kubernetes/test-infra/issues/18551

One thing that's come up in discussion of why kubernetes PRs are so hard to merge (https://github.com/kubernetes/kubernetes/issues/92937) is whether we really need so many jobs to run for every PR opened against kubernetes/kubernetes. There is a desire to see if we can trim the number of jobs without sacrificing (too much) coverage.

Reasons for this are:

if we assume each job has a non-zero chance of flaking, fewer jobs means fewer chances for a PR to encounter a flake
if we assume jobs are flaking due to resource contention, fewer jobs running means more resources available for jobs to consume
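
The first reason can be made concrete with a small calculation. This is a minimal sketch, assuming every job flakes independently with the same per-run flake rate; the 2% rate used below is a hypothetical figure for illustration, not a measured value from kubernetes/kubernetes CI.

```python
def pass_probability(flake_rate: float, num_jobs: int) -> float:
    """Chance that all jobs pass on one attempt.

    Assumes each job flakes independently with the same probability,
    which is a simplification of real CI behavior.
    """
    return (1.0 - flake_rate) ** num_jobs


# Hypothetical 2% flake rate per job, compared across different job counts.
for jobs in (18, 14, 12):
    chance = pass_probability(0.02, jobs)
    print(f"{jobs} jobs: {chance:.1%} chance of a flake-free run")
```

Under these assumptions, the chance of a clean run rises as the job count drops, which is the effect the first bullet describes.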

/priority important-soon
/area jobs
/sig testing
/sig release

FYI @BenTheElder @liggitt @kubernetes/ci-signal

area/jobs priority/important-longterm sig/release sig/testing

spiffxp

All 16 comments

Comment from @liggitt

I'd divide those into categories for improvement like this:

immediate: should probably be post-submits or periodics set to notify relevant owners:

pull-kubernetes-dependencies-canary

pull-kubernetes-e2e-gce-device-plugin-gpu (could keep a presubmit with always_run: false for manual triggering/testing if we want)

pull-kubernetes-files-remake

significantly overlapping tests ... can we move one to post-submit + notification:

pull-kubernetes-e2e-gce (cos+docker)

pull-kubernetes-e2e-gce-ubuntu-containerd (ubuntu+containerd ... already have containerd coverage via kind presubmits)

pair of performance tests (can we have just one performance presubmit?):

pull-kubernetes-kubemark-e2e-gce-big

pull-kubernetes-e2e-gce-100-performance

several tests that seem to just check ability to build... can we collapse these somehow:

pull-kubernetes-bazel-build

pull-kubernetes-typecheck

pull-kubernetes-cross

spiffxp on 7 Aug 2020

👍1

https://github.com/kubernetes/test-infra/pull/18728 - to address pull-kubernetes-e2e-gce-device-plugin-gpu

spiffxp on 7 Aug 2020

https://github.com/kubernetes/test-infra/pull/18649 - proposes demoting pull-kubernetes-e2e-gce in favor of pull-kubernetes-e2e-gce-ubuntu-containerd

spiffxp on 8 Aug 2020

https://github.com/kubernetes/test-infra/pull/18612 - moved pull-kubernetes-cross to optional and manually triggered

spiffxp on 8 Aug 2020

@mm4tt - re the scalability-related jobs. We briefly talked about that offline too, and looking at past statistics, kubemark-500 hasn't caught many things, especially recently. So it might indeed make sense to keep it only as a periodic (for faster detection compared to the 5k-node jobs).

wojtek-t on 10 Aug 2020

Yep, I agree
@spiffxp, do you need help with removing the kubemark-500 presubmit? Let me know

mm4tt on 10 Aug 2020

Opened https://github.com/kubernetes/test-infra/pull/18788, which leaves the presubmit in place but makes it optional and manually triggered. If you'd rather remove it entirely or do something else, let me know.

spiffxp on 12 Aug 2020

I think changing it to optional and manually triggered is much better - thanks!

wojtek-t on 12 Aug 2020

Still need to dedupe pull-kubernetes-e2e-gce / pull-kubernetes-e2e-gce-ubuntu-containerd; I dropped the ball on this one.

BenTheElder on 13 Aug 2020

Brief update, here's one snapshot as I try to find the right way to slice this:

From 2020-07-01 to today, we've gone from 14 to 12 merge-blocking jobs running for every PR against the main branch of kubernetes/kubernetes:

pull-kubernetes-bazel-build
pull-kubernetes-bazel-test
pull-kubernetes-conformance-kind-ga-only-parallel
pull-kubernetes-dependencies
~pull-kubernetes-e2e-gce~ (dropped by https://github.com/kubernetes/test-infra/pull/18832)
pull-kubernetes-e2e-gce-100-performance
pull-kubernetes-e2e-gce-ubuntu-containerd
pull-kubernetes-e2e-kind
pull-kubernetes-e2e-kind-ipv6 (added by https://github.com/kubernetes/test-infra/pull/18718)
~pull-kubernetes-files-remake~ (dropped by https://github.com/kubernetes/test-infra/pull/18524)
pull-kubernetes-integration
~pull-kubernetes-kubemark-e2e-gce-big~ (dropped by https://github.com/kubernetes/test-infra/pull/18788)
pull-kubernetes-node-e2e
pull-kubernetes-typecheck
pull-kubernetes-verify

spiffxp on 20 Aug 2020

From 2020-07-01 to today, we've gone from 18 to 12 always-run jobs running for every PR against the main branch of kubernetes/kubernetes:

pull-kubernetes-bazel-build
pull-kubernetes-bazel-test
pull-kubernetes-conformance-kind-ga-only-parallel
pull-kubernetes-dependencies
~pull-kubernetes-dependencies-canary~ (dropped by https://github.com/kubernetes/test-infra/pull/18421)
~pull-kubernetes-e2e-gce~ (dropped by https://github.com/kubernetes/test-infra/pull/18832)
pull-kubernetes-e2e-gce-100-performance
~pull-kubernetes-e2e-gce-device-plugin-gpu~ (dropped by https://github.com/kubernetes/test-infra/pull/18728)
pull-kubernetes-e2e-gce-ubuntu-containerd
pull-kubernetes-e2e-kind
pull-kubernetes-e2e-kind-ipv6 (added by https://github.com/kubernetes/test-infra/pull/18718)
~pull-kubernetes-files-remake~ (dropped by https://github.com/kubernetes/test-infra/pull/18524)
pull-kubernetes-integration
~pull-kubernetes-kubemark-e2e-gce-big~ (dropped by https://github.com/kubernetes/test-infra/pull/18788)
pull-kubernetes-node-e2e
~pull-kubernetes-node-e2e-containerd~ (dropped by https://github.com/kubernetes/test-infra/pull/18356)
pull-kubernetes-typecheck
pull-kubernetes-verify
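
The two snapshots above count slightly different things: merge-blocking jobs versus always-run jobs. A minimal sketch of that distinction follows, assuming a simplified model in which a presubmit has only the `always_run` and `optional` fields and blocks merge when it always runs and is not optional. The job names and field values are hypothetical, and actual Prow and Tide behavior has more cases (for example, conditionally triggered jobs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presubmit:
    """Simplified stand-in for a presubmit job entry (hypothetical model)."""
    name: str
    always_run: bool
    optional: bool = False


def always_run_jobs(jobs):
    """Jobs that run on every PR, whether or not they block merge."""
    return sorted(j.name for j in jobs if j.always_run)


def merge_blocking_jobs(jobs):
    """Jobs that run on every PR and must pass before merge (simplified)."""
    return sorted(j.name for j in jobs if j.always_run and not j.optional)


# Hypothetical job set for illustration only.
JOBS = [
    Presubmit("pull-example-unit", always_run=True),
    Presubmit("pull-example-canary", always_run=True, optional=True),
    Presubmit("pull-example-gpu", always_run=False, optional=True),
]

print("always-run:", always_run_jobs(JOBS))
print("merge-blocking:", merge_blocking_jobs(JOBS))
```

In this model, making a job optional removes it from the merge-blocking count but not from the always-run count, while making it manually triggered removes it from both.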

spiffxp on 20 Aug 2020

pull-kubernetes-dependencies could certainly be faster but is relatively cheap overall and has a pretty excellent pass rate.
pull-kubernetes-typecheck is reasonably expensive, similar to compiling though cheaper than actually cross compiling, but quick enough and very reliable.

The rest of these involve non-trivial building, at least at the beginning of the job, and tend to have a higher flake risk (actual flake rates vary quite a bit with time...).

None of them is a terribly obvious candidate for removal at the moment, IMHO.

pull-kubernetes-verify is probably a candidate to parallelize better.
EDIT: even after having had pull-kubernetes-typecheck and pull-kubernetes-dependencies split out of it, it's still rather slow.

BenTheElder on 21 Aug 2020

👀1

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 19 Nov 2020

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot on 19 Dec 2020

/remove-lifecycle rotten

BenTheElder on 6 Jan 2021

related: https://github.com/kubernetes/test-infra/issues/6380 - document what the criteria are for merge-blocking

spiffxp on 8 Jan 2021
