Test-infra: tide: do not stop merging when one PR is unmergable

Created on 2 Nov 2018  路  11Comments  路  Source: kubernetes/test-infra

test-infra stopped merging for three hours while trying to squash merge when squash merging was blocked, #9890 #9976 #9997 #9999 #10000 were all passing, but #9890 was first and unmergable

/kind bug
/area prow/tide

cc @cjwagner @stevekuznetsov

areprotide kinbug lifecyclrotten

Most helpful comment

One option -- have tide comment on the PR with an explanation of the error and /hold to pull the PR out of the queue

All 11 comments

How can we both appease this issue and continue to promise that older PRs are merged first? In the issue you described it was an invalid merge strategy configured for the PR in question, right? I feel like we should alert an admin in this case and fix the issue, since the merges are failing for a infra-level problem usually.

We can always attempt to merge the older PR first but when a serial merge fails _because the merge failed_ we can attempt to merge another ready serial PR before testing again.

We definitely need to raise this to admins, but there's not a super great mechanism for that right now.

I don't think we should promise that older PRs are merged first, it should be best effort. This same situation can occur with github API flakes or.... We should not get into a situation where this sits for 3 hours doing nothing.

Even _if_ we had alerted the prow admins, what about a EG a weekend? The prow admins don't necessarily have the same productive hours as the users, we should avoid tide stalling out for any reason.

Sounds good, let's think through the implications of this best-effort in terms of the current workflow to make sure it doesn't lead to weird states we had not considered before.

One option -- have tide comment on the PR with an explanation of the error and /hold to pull the PR out of the queue

One option -- have tide comment on the PR with an explanation of the error and /hold to pull the PR out of the queue

That seems reasonable and is a lot simpler than adding state to tide to track failed merges, but there is the down side of not retrying the merge before giving up meaning that transient failures could kick PRs out of the merge pool.
In particular if a repo requires the tide status context (which isn't recommended but does have use case) we typically fail to merge the PR on the first attempt because the status controller hasn't set the tide status to passing yet. We would need to synchronize Tide's status and sync controllers to avoid the race that causes that issue: https://github.com/kubernetes/test-infra/issues/6289

This also assumes that the hold plugin is enabled on the repo.

checkconfig will error if you require the tide context and it's considered harmful to do so, so why should we support users that do it?

Good point that this requires the hold plugin, I guess we could add a tide-specific label that it won't merge into or choose one at random from the list of invalid labels in the query ;)

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

sjenning picture sjenning  路  4Comments

cjwagner picture cjwagner  路  3Comments

spzala picture spzala  路  4Comments

chaosaffe picture chaosaffe  路  3Comments

BenTheElder picture BenTheElder  路  3Comments