Test-infra: Implement tide and replace the submit queue

Created on 2 Aug 2017  ยท  46Comments  ยท  Source: kubernetes/test-infra

Design doc here, requires you to be a member of either kubernetes-dev or kubernetes-sig-testing. Please comment there with overall design decisions rather than on this issue.

Steps:

  • [x] Implement tide's core loop for serial jobs. Start it up in dry run mode where it simply prints actions it would take. #4113
  • [x] Implement batches. #4479
  • [x] Turn it on in test-infra.
  • [x] Implement the user interface in cmd/deck.
  • [x] Use it in test-infra for a week or so. Demo with contribex and convince the world that it's a good idea to switch over the rest.
  • [x] Turn down the submit queue everywhere in favor of tide.

๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ

areprow

Most helpful comment

retest-not-required-docs-only labeling can be turned off since tide doesn't support this concept (turned off in https://github.com/kubernetes/test-infra/pull/9452)

All 46 comments

I expect plenty of ๐ŸŒŠ๐ŸŒŠ๐ŸŒŠ :smile:

@kubernetes/sig-contributor-experience-proposals @kubernetes/sig-testing-proposals

@spxtr seems like we also need to have in place the finer-grained management of PRs that the submit queue is doing today where specific status contexts are required as opposed to looking through all prowjobs, ie. all jobs that ever run for a PR.

Related: https://github.com/kubernetes/test-infra/issues/4387

If we need to do that then we can add a config option. That's also necessary if the repo is using travis or something. We will definitely ignore jobs that are set to skip reporting.

Can we make sure that this piece of configuration is global? That is, you set it in one place, but the dashboard, tide, etc. all look at the same place?

Yes, it will all live in prow/config.yaml.

Want to make sure we support https://github.com/kubernetes/sig-release/issues/26

Other blocks that came up during sig-testing meeting:

  • ~A front end for tide that shows merge requirements and history~ (that's the web ui checklist item above, natch)
  • We need to either stop reporting non-required contexts to github or make tide aware of which contexts are required.

We need to either stop reporting non-required contexts to github or make tide aware of which contexts are required.

ref: https://github.com/kubernetes/test-infra/issues/4932

The issue with tide not being aware of which contexts are required is when leftover contexts that are not required are broken or when a new blocking job is added and its context is missing from all PRs. I think fixing #4932 and making sure tide knows what jobs are required will get us what we want.

We need to either stop reporting non-required contexts to github or make tide aware of which contexts are required.

I would like to live in a world where all green checkmarks on a PR means it will merge, and any red X means it will not. #4932 moves us there :)

making sure tide knows what jobs are required will get us what we want.

One question is whether we want to build in a separate list of required contexts, or simply use always-run non-skip-reporting jobs from the config only.

One question is whether we want to build in a separate list of required contexts, or simply use always-run non-skip-reporting jobs from the config only.

I like the idea of just using the config, but we have some contexts that are not controlled by Prow, but should still be required. Notably the CLA context.

At some point I think we ought to block on run_if_changed + non-skip_report
as well (IE cross)

On Oct 24, 2017 14:43, "Joe Finney" notifications@github.com wrote:

We need to either stop reporting non-required contexts to github or make
tide aware of which contexts are required.

I would like to live in a world where all green checkmarks on a PR means it
will merge, and any red X means it will not. #4932
https://github.com/kubernetes/test-infra/issues/4932 moves us there :)

making sure tide knows what jobs are required will get us what we want.

One question is whether we want to build in a separate list of required
contexts, or simply use always-run non-skip-reporting jobs from the config
only.

โ€”
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/kubernetes/test-infra/issues/3866#issuecomment-339141854,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AA4Bq-ZwVlhYt7qjuF1ILmtli9Q_sruhks5svloCgaJpZM4OrnfG
.

simply use always-run non-skip-reporting jobs from the config only.

Should be enough but needs to be documented.

@kargakis What about contexts like the CLA context? It should still be required, but isn't in the config.

Why does tide need to care about the cla context? I thought that it was checking for the cla label.

Why does tide need to care about the cla context? I thought that it was checking for the cla label.

The CLA bot bugs and people manually mark the CLA ok? If tide only pays attention to specific contexts this isn't a problem, if it expects all contexts to be green this becomes more annoying.
I'd prefer all contexts to be green, but things like this might be worth considering.

I think support additional contexts required to merge is a useful feature. I've used it in the past to require a travis context to pass for example, and it's a super convenient integration point for external tools to have a say in the merge process.

Could we automatically require all that have skip_report: false, as well as support an additional_required_contexts field or something?

Could we automatically require all that have skip_report: false, as well as support an additional_required_contexts field or something?

I'm not sure I understand why this is needed or what this would do exactly? Right now tide blocks on all contexts on GitHub being green regardless of source, in the future we might want to have a white or blacklist that can be configured in Tide for which contexts exactly should be green. Even then though the prow jobs shouldn't need to set anything themselves and there can be more than one prow job configuration per context (eg for different branches).

Ah - that makes sense. I thought tide simply checks the prow config for jobs that have skip_report: false and merged when all of those were green. Checking the GitHub API and expecting all to be green makes a lot more sense!

Yeah, you can see where it does that here with a TODO to look at the jobs, which we still seem to be coming to consensus on how to handle :-)
https://github.com/kubernetes/test-infra/blob/70f5bea97c31be846b999a58a9828dc4fda9568b/prow/tide/tide.go#L170-L188

We discussed during sig-testing yesterday, and are planning to wait until after the 1.9 release to roll out tide for kubernetes/kubernetes. We will continue to canary it elsewhere, as it's been going fine in kubernetes/test-infra. There was some question over whether we've done adequate load testing, which openshift might be willing to help us with.

Discussed during sig-testing yesterday, we're working on making the tide status less opaque, and would like to demo tide to the community before turning on for k/k. In the meantime, I've been rolling tide out to more kubernetes repos.

The submit queue has been turned down everywhere except for kubernetes/kubernetes, #6417 tracks that effort

As I'm going through our labels it occurs to me that tide's not going to support the retest-not-required or retest-not-required-docs-only workflow that some contributors are used to https://github.com/kubernetes/test-infra/issues/5334

@spiffxp OTOH, tide only triggers retests when the branch is out-of-date, whereas the submit queue always triggered retests.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale
/milestone 1.12
This didn't happen for 1.11. Are we going to be able to make it happen for 1.12?

/milestone 1.12
... now that the milestone plugin is actually enabled for this repo

Discussed this at sig-contribex yesterday, and discussing during community meeting today. Would like to notify kubernetes-dev afterward

@cblecker is using the migratestatus tool to retire the Submit Queue status with a note to look at the tide status instead, since GitHub doesn't allow us to delete existing statuses on PR's

https://submit-queue.k8s.io/ currently timing out, can we redirect this somehow

@spiffxp we could probably make it redirect to tide, yes

The "Submit Queue" context sorted to the top, "tide" does not, maybe we should rename the tide context: https://github.com/kubernetes/test-infra/issues/9324

https://submit-queue.k8s.io now redirects to the Tide status page.

Current status:

  • kubernetes-misc-mungers is running with mungers: stale-green-ci,comment-deleter
  • kubernetes-cherrypick is running with mungers: cherrypick-must-have-milestone,cherrypick-clear-after-merge,cherrypick-queue (cherrypick.k8s.io is up)

I implied we were going to turn all of those off in the k-dev e-mail, is this something we can get to today? I'm going to survey cherrypick usage in the meantime.

Opened https://github.com/kubernetes/test-infra/issues/9336 to track turning down the remaining mungers

Noticed tide's UI could stand to display more query info: https://github.com/kubernetes/test-infra/issues/9335

This is done!

@qhuynh96 is also working on improving the query info :-) PRs are in flight

We missed the fact that a submit-queue badge was on k/k https://github.com/kubernetes/kubernetes/pull/68642

retest-not-required-docs-only labeling can be turned off since tide doesn't support this concept (turned off in https://github.com/kubernetes/test-infra/pull/9452)

We needed to update tide to block merges on needs-kind and needs-sig labels for v1.12 https://github.com/kubernetes/test-infra/pull/9364

We need to redo the dashboard that was driven by MGH stats https://github.com/kubernetes/test-infra/issues/9458

We missed removing splice as part of removing MGH https://github.com/kubernetes/test-infra/pull/9520

Was this page helpful?
0 / 5 - 0 ratings

Related issues

chaosaffe picture chaosaffe  ยท  3Comments

benmoss picture benmoss  ยท  3Comments

lavalamp picture lavalamp  ยท  3Comments

BenTheElder picture BenTheElder  ยท  4Comments

fejta picture fejta  ยท  4Comments