Cockroach: build: stop breaking master (a.k.a. implement Bors)

Created on 8 Feb 2018  Β·  16Comments  Β·  Source: cockroachdb/cockroach

Let's stop breaking master.

The Not Rocket Science Rule Of Software Engineering (source)

automatically maintain a repository of code that always passes all the tests

Why

Breaking master is a bad thing:

  • People should be able to reliably build a fresh pull from master.
  • Rebasing onto the latest master is the right thing to do, this should be reliable.
  • Borked commits in our master history break git bisect.
  • Noise in our build notifications can hide real problems that come up.

(Also:) tiny changes should have a merge-on-green option so you don't have to sit there waiting for TC to complete.

What

Bors-NG.

  • A continuous integration bot.
  • Awaits reviewer approval on PRs, builds them as though they were merged, only merges on green.
  • Updates PR with merge status.

How

  • [x] πŸ— Setup TeamCity to build Bors' staging and trying branches
  • [x] :robot: Set up hosting for Bors

    • [x] :octocat: Create GitHub app for our self-hosted Bors

    • [x] :ship: Deploy container to GCE

    • [x] πŸ“‡ Setup DNS

    • [x] πŸ” Switch to use HTTPS

  • [x] πŸ“„ Add a bors.toml file to the each branch:

    • [x] πŸ“ƒ master branch (#24100 and #24420)

    • [x] πŸ“ release-2.0 (#24323 and #24446)

    • [x] πŸ—ƒ release-1.1 (#24324 and #24447)

    • [x] πŸ—„ release-1.0 (#24325 and #24448)

  • [x] β˜‘οΈ Validate that everything's working as expected

    • [x] πŸ€• Try to merge a broken PR (#24294)

    • [x] πŸŽ‹ Try to merge something to a target branch other than master (#24298)

    • [x] ⛔️ Try to merge a PR with the label do-not-merge (#24100)

  • [x] πŸ“– Document Bors and the new workflow (#24292)
  • [x] πŸ‘¨β€πŸ‘§β€πŸ‘¦ Add every user as a reviewer
  • [x] ℹ️ Inform the team about the new workflow
  • [x] 🚯 Disable manual merges to each branch

    • [x] 🚳 master

    • [x] πŸ”• release-2.0

    • [x] 🚫 release-1.1

    • [x] πŸ™…β€β™‚οΈ release-1.0

  • πŸ’Έ Profit

Open Questions

  • Release branches?

~On the one hand, cherry-picks to them are infrequent enough that it's not as critical. On the other hand, many of the same considerations apply as much or more to release branches, particularly right before picking a release candidate. Let's punt it for now, but it's probably a good idea to do as a next step.~

We get them out of the box, they will require two extra steps per branch: cherry-picking #24100 to each one and then disabling of manual merges - see https://github.com/cockroachdb/cockroach/issues/22499#issuecomment-367446109.

  • Flaky tests?

This won't fix those, but it surely won't make them worse. Perhaps it will even give people more of an incentive to root out any issues with flakiness.

C-cleanup

Most helpful comment

Woohoo! Bors has merged its first PR!

All 16 comments

cc @benesch @jordanlewis @knz

Nice write up! I'm all for trying this out.

One (relatively minor) downside that I think should be mentioned is that average time-to-merge may actually go up as a result of this strategy, because each PR will have to be built against the most recent master. Let's say I've submitted a small PR. The reviewer waits until CI is green to review (cycle 1), reviews, and hits the LGTM button soon after. In the intervening moments, let's say another PR got merged to master. At that point, I will have to ask the bot to rebuild the branch against master if it's changed (cycle 2).

I'm not trying to say this delay is worse than occasionally broken master, but just wanted to point it out for the record!

Could you elaborate on your first point - "Setup TeamCity to build staging and trying branches"? I'm not sure what that means, and I think it might already be done.

You've got a couple of steps out of order: you have to inform the team about the new workflow before disabling manual merges.

Also, as to timing: Don't change anything before 2.0 is released. We don't want to disrupt everyone's workflow at this point in the development cycle.

A lot of our recent master breaks have come from the fact that we don't run the full test suite in race mode on PRs. Just turning that back on will improve the situation without changing workflow (although one advantage of the move to bors would be that by moving when the CI build happens relative to the decision to merge, we don't have to be as sensitive to the time it takes to build)

@jordanlewis I believe your concern about delay is mitigated by Bors' batching strategy. All the PRs approved in the past x minutes get merged together onto the staging branch. If the batch fails, Bors bisects the various PRs to identify which ones can cleanly merge and gets those on. (I'm not sure I understand your point about waiting until CI is green to review, perhaps we should chat in person).

@bdarnell ack on the wait for next week. I had expected those last two steps would happen roughly simultaneously, but it's probably worth reordering them. Done.

Oh, and @jordanlewis yes, I expect TeamCity is already building the bors branches by default, but wanted to include it here to make sure that when I went to start checking things off I make sure to validate that's the case.

I'm not sure I understand your point about waiting until CI is green to review

Maybe I'm the only one but if I get to a review and it's got a failed build that doesn't look like a flake, I will sometimes not review it until later. But yeah I suppose I was exaggerating there - I don't think people typically wait for a full cycle to complete.

I expect TeamCity is already building the bors branches by default

If bors keeps its own branches that it expects to get built, that aren't PR branches, then we will in fact need to configure TC to do something different. Thanks for clarifying!

ack on the wait for next week.

Wait for the release of 2.0, not just the cutting of the branch.

I'm looking at this again with the hope of getting it going as soon as we can. @bdarnell what do you think about going through the first four steps of the process before the release date? I'm ready to go with those, and as far as I can tell those steps won't impact anyone else.

I bugged Ben about this at dinner tonight and it sounds like we have the go
ahead to start experimenting with Bors provided it doesn’t break anyone’s
workflow or take us away from more important 2.0 work. Apologies if I’m
misrepresenting your words, Ben.

On Tue, Mar 20, 2018 at 10:16 PM Andrew Couch notifications@github.com
wrote:

I'm looking at this again with the hope of getting it going as soon as we
can. @bdarnell https://github.com/bdarnell what do you think about
going through the first four steps of the process before the release date?
I'm ready to go there, and as far as I can tell those steps won't impact
anyone else.

β€”
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/cockroachdb/cockroach/issues/22499#issuecomment-374816758,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AA15IMCWPbRFI2oKL207C5wP6cDh4l94ks5tgbgUgaJpZM4R-iHY
.

I've noticed two open Bors issues that mildly affect usability:

  • bors-ng/bors-ng#165, which suggests supporting teams as reviewers. without it, we'll be manually adding folks to the reviewers list.
  • bors-ng/bors-ng#163, which observes that it's not sufficient for issue numbers to be in the commit message, they must be in the PR description to be closed when Bors merges the PR.

Yeah, you can go ahead and start working on this.

Woohoo! Bors has merged its first PR!

I think people understand why breaking master is bad; can you explain more about how bors prevents merge skew? (which is what usually breaks master)

Definitely excited for this if it fixes merge skew!

Bors prevents merge skew because it tests the results of the merge and not the HEAD of the PR branch. Or rather, bors prevents merge skew which results in test failures. If you have merge skew that has everything pass, nobody can detect it πŸ™ˆ But this is very rare.

Bors serializes all merges and does so with some optimizations to save on CI runs (roll-ups of multiple changes into one merge, etc).

Hey, it's all done. Hooray! :tada:

Was this page helpful?
0 / 5 - 0 ratings