In particular, the soak-gci-gce-1.X job is blocking for stable2, but not for stable1 or stable3. As a result, the soak job will be blocking for only part of each release's lifecycle. We should have a consistent set of blocking jobs across all patch releases of a single branch. I think it would be more correct to define a specific set of blocking jobs for each release, rather than have rolling stable1-3 definitions.
If that's not possible, we should at least make sure the set of jobs is the same for all three (the downside being that it will then be hard to change the set of blocking jobs).
cc: @krzyzacy @BenTheElder @kubernetes/sig-release-bugs
cc @spiffxp
They are mostly the same, except that, for example, 1.12 doesn't have kubeadm upgrade jobs set up.
We can probably bring in a presubmit to enforce this.
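For illustration, here is a minimal sketch of what such a presubmit check could look like. The dashboard and job names below are hypothetical, and a real check would load the job sets from the Prow/testgrid config rather than hard-coding them:

```go
// Sketch of a presubmit-style consistency check: every release-blocking
// dashboard should define the same set of jobs as the reference set
// (after normalizing away version suffixes, which is omitted here).
package main

import (
	"fmt"
	"os"
	"sort"
)

// missingFrom returns the jobs present in want but absent from got.
func missingFrom(want, got map[string]bool) []string {
	var missing []string
	for job := range want {
		if !got[job] {
			missing = append(missing, job)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical job sets; in practice these would be parsed from config.
	dashboards := map[string]map[string]bool{
		"sig-release-1.12-blocking": {
			"gce-cos-default": true, "bazel-build": true, "gce-soak": true,
		},
		"sig-release-1.11-blocking": {
			"gce-cos-default": true, "bazel-build": true,
		},
	}
	reference := map[string]bool{
		"gce-cos-default": true, "bazel-build": true, "gce-soak": true,
	}

	failed := false
	for name, jobs := range dashboards {
		if diff := missingFrom(reference, jobs); len(diff) > 0 {
			fmt.Printf("%s is missing blocking jobs: %v\n", name, diff)
			failed = true
		}
		if diff := missingFrom(jobs, reference); len(diff) > 0 {
			fmt.Printf("%s has extra blocking jobs: %v\n", name, diff)
			failed = true
		}
	}
	if failed {
		os.Exit(1) // fail the presubmit
	}
}
```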
I'll remove the soak-stable2 job from the release-blocking dashboard.
/area jobs
We absolutely should have the same jobs. The only time I can see that differing is if we add new blocking jobs in the current release that couldn't work against older releases.
A presubmit sounds good.
I haven't done a thorough audit, but last I checked we were also missing a scalability-related stable3 job.
/assign
I'll create a doc under sig-release to define a list of release-blocking jobs, and we can use that as a source of truth in the future.
/milestone v1.13
FYI @cjwagner @jberkus
The plan is to accomplish this before 2018-10-23, aka the Enhancements Freeze of the v1.13 release cycle:
I've done some comparison of master-blocking and 1.12-blocking; here's where they don't match. Note that I'm using the actual job names below instead of the labels you see in testgrid, because it's hard to figure out which job a label refers to:
Tests that are in 1.12-blocking with no equivalent in master-blocking:
Tests that are in master-blocking with no equivalent in 1.12-blocking:
Also, note that several of the test jobs for 1.12-blocking are named "beta" instead of "1.12", which suggests that those may not be version-specific.
I unified the names on the master-blocking dashboard with 1.12 a bit.
For the jobs with no equivalent: if we take 1.12 as the source of truth, we have both postsubmit and periodic bazel jobs, so we are probably fine keeping only one of them. @neolit123 might want to add a latest-release-on-master kubeadm job for consistency?
Also, do we really want all conformance tests from cloud providers to block the release?
> Also, note that several of the test jobs for 1.12-blocking are named "beta" instead of "1.12", which suggests that those may not be version-specific.
The release channels are defined at https://github.com/kubernetes/test-infra#release-branch-jobs--image-validation-jobs, so for each new release we can rename the testgrid dashboard without remaking all the jobs.
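As a rough illustration of how those rolling channels map to concrete versions (a simplified sketch; the README linked above is the authoritative definition):

```go
// Sketch: derive the rolling stable1-3 channels from the current master
// minor version, e.g. with master at 1.13, stable1=1.12, stable2=1.11,
// stable3=1.10. Jobs keep their channel names; only the dashboard labels
// need updating each release.
package main

import "fmt"

func stableChannels(masterMinor int) map[string]string {
	channels := map[string]string{}
	for i := 1; i <= 3; i++ {
		channels[fmt.Sprintf("stable%d", i)] = fmt.Sprintf("1.%d", masterMinor-i)
	}
	return channels
}

func main() {
	fmt.Println(stableChannels(13)) // map[stable1:1.12 stable2:1.11 stable3:1.10]
}
```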
> @neolit123 might want to add a latest-release-on-master kubeadm job for consistency?
I can add kubeadm-gce-stable-on-master to sig-release-master-blocking to make it consistent with -1.12-blocking.
> Also, do we really want all conformance tests from cloud providers to block the release?
That was raised as a question last week with @spiffxp and @BenTheElder.
/milestone v1.14
The jobs aren't yet identical. I think we can close this out as we split the dashboards into -blocking/-informing, etc. ref: kubernetes/sig-release#347
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle rotten
/milestone v1.15
ref: https://github.com/kubernetes/test-infra/issues/11977
/unassign
/sig release
/milestone v1.16
/cc @jberkus
Apparently "the gce reboot job" is informing in master, and blocking on all other branches:
/lifecycle stale
AFAIK, this issue has not been resolved.
@Katharine @BenTheElder ?
/remove-lifecycle stale
I'm pretty over capacity and don't remember what we wanted here.
It's entirely reasonable, IMO, to have different sets of jobs per release.
The config forker pretty much ensures that at each release we copy the jobs from master to that release branch.
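Roughly, that forking step amounts to copying each master job and substituting the version markers. A simplified sketch (the struct and field names are made up; the real config-forker in this repo operates on full Prow job configs):

```go
// Sketch of the renaming a config forker performs when a release branches.
package main

import (
	"fmt"
	"strings"
)

type Job struct {
	Name   string
	Branch string
}

// forkJob derives a release-branch job from its master counterpart, e.g.
// "ci-kubernetes-e2e-gce-cos-master" -> "ci-kubernetes-e2e-gce-cos-1-16".
func forkJob(master Job, version string) Job {
	return Job{
		Name:   strings.ReplaceAll(master.Name, "master", strings.ReplaceAll(version, ".", "-")),
		Branch: "release-" + version,
	}
}

func main() {
	master := Job{Name: "ci-kubernetes-e2e-gce-cos-master", Branch: "master"}
	fmt.Printf("%+v\n", forkJob(master, "1.16")) // {Name:ci-kubernetes-e2e-gce-cos-1-16 Branch:release-1.16}
}
```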
/assign
I was the last to touch config-forker, so I'll try to close the loop on this.
/milestone v1.18
/area release-eng
@BenTheElder @justaugustus
For the 1.16 release, the config-forker did not copy from master to the release; it copied a different, templated set of tests based on the 1.13 test set. That may have been fixed since; if so, that's one solution to this issue.
The core problem was that the config-forker was copying a set of jobs that wasn't based on any SIG Release-determined set, but was instead historical and impossible to update.
/lifecycle stale
/remove-lifecycle stale
Where are we on this, @Katharine?
This should've been resolved long ago, and the last set of inconsistent jobs expired out.
As for this comment:
> For the 1.16 release, the config-forker did _not_ copy from master to the release; it copied a different, templated set of tests based on the 1.13 test set. That may have been fixed since; if so, that's one solution to this issue.
This is (or should be, to my knowledge) impossible, unless some awful misconfiguration happened. Why do you think that happened?
Because the set of jobs when the 1.16 branch was created was different from the set of jobs in master; that's why. And when I asked about it, that's what the test-infra folks said had happened.
In particular, the slow performance tests had been moved from blocking to informing before the branch, but were back in blocking in the 1.16 set.
If it's copying from master now, then that's all good. I just wanted to check that it was.
I scanned through the release-blocking dashboards:
Once these three are closed, I'm going to call this closed unless there are any objections.
/close
Please re-open if you think there's anything left to do here
@spiffxp: Closing this issue.
In response to this:
> /close
> Please re-open if you think there's anything left to do here
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.