Go: x/build: add "slowbots" support

Created on 24 Sep 2019  ·  32Comments  ·  Source: golang/go

We aim to make TryBots be only 5 minutes, but sometimes people want long, deeper tests over all architectures.

We've heard from the compiler/runtime team (@martisch, et al) that a more broad TryBot mode would be useful.

I thought we had a bug for this already but can't find it. (Maybe it was a comment on another bug)

The plan I thought writing about elsewhere was that when clicking "+1 TryBot", you could write some magic syntax in the comment field on Gerrit to specify which extra bots run. Some other tool can help you construct that magic comment & do the Gerrit comment for you, like git-codereview does (when you specify git-codereview mail -trybot)

Builders FeatureRequest NeedsFix

Most helpful comment

I wrote the start of some docs:
https://github.com/golang/go/wiki/SlowBots

All 32 comments

Related:

  • #9858: x/build: trybots should rebase when testing Builders
  • #12482 x/build: optional auto-submit on +2 when trybots (with rebase) pass

Related:

Also related:

  • #34297 (codegen trybots)

Heh, @bradfitz used the term “SlowBots” in https://github.com/golang/go/issues/29239#issuecomment-447438842 too.

This would be fantastic for testing changes on the slow and difficult to set up mobile devices. It would have helped avoid the 4 part fix https://go-review.googlesource.com/c/go/+/193847/

Maybe we fold this into an umbrella issue for a Commit Queue that would cover a lot of the issues above?

Let's not. They're different enough and I was actually working on this one last week.

Change https://golang.org/cl/201204 mentions this issue: maintner/maintnerd: return TRY= comment directives in Run-TryBot votes

Change https://golang.org/cl/201338 mentions this issue: cmd/coordinator: add SlowBots support, opt-in to different slow trybots

It's alive! (on https://go-review.googlesource.com/c/go/+/171822/22#message-1627ac4750a897104245e2dc87cc20f838c0b057)

Screen Shot 2019-10-15 at 11 08 22 PM

/cc @ianlancetaylor

This is really helpful!

How do we know what tokens we can list in "TRY"? Is there an "all" alias?

The details of the TRY= directive syntax is mostly undefined/undesigned at this point, so opinions welcome. I went with some quick implementation for now just so I could implement something.

The tokens can currently be any builder name, or a handful of defined terms that map to the best builder for that term. They currently always add to the default trybot step.

The CL's dashboard/builders.go defines the well-known aliases at the top, but that's subject to change.

I'm somewhat reluctant to add "all", as that's pretty large and overkill, but maybe. I'd at least want to add some "mostly-all" or "each" (one of each GOOS GOARCH) too, which would be different in that "all" would include really everything, like, linux-386-387 and linux-386-nocgo and weirdo things, but "each" wouldn't? I'm not sure.

It would be great to be able to specify GOOSes and GOARCHes and get a reasonable set of builders for those (perhaps all of the "first-class ports" that match). It's generally pretty clear what GOOSes and GOARCHes you touched, and harder to manually map those to a useful set of builders.

One problem I've run into is that specifying a broken builder appears to leave to trybots stuck for a really long time. E.g., on CL 201402 I listed netbsd-arm-bsiegert, and now that I look at the dashboard, that builder clearly just isn't up. But the trybot run has been dutifully waiting an hour and twenty minutes at this point to get a spot on that builder, which means I don't get the trybot report back on the CL. I know these are slowbots, but maybe it should give up after, say, an hour?

This is fantastic ! Yes, a few handful of predefined terms which map to different levels of builder coverage would be very helpful.

I've been using this a bunch on my non-cooperative preemption changes since they touch a lot of architecture code and it's been incredibly helpful.

A few things that have caught me by surprise (these aren't necessarily wrong, but should probably be noted in any documentation):

  1. TRY= must be specified in the same comment that sets Run-TryBot+1.
  2. If there's already a Run-TryBot+1 comment for the latest PS, adding a TRY comment, even with Run-TryBot+1 in the same comment, won't start the slowbots. I think the only way to get out of this state is to upload a new PS or clear TryBot-Result (which requires special bits). This one is probably more of a bug than a feature.
  3. If there's a TRY comment on an older PS, setting Run-TryBot+1 on a new PS without specifying TRY again will run whatever the last TRY set specified was. This is probably the right behavior, but surprised me because of surprise 1. My mental model had been "whatever TRY is specified on the Run-TryBot+1 that triggered the trybots".

I got sidetracked by other stuff, but I just made a minor update tonight:

  • For the TRY= terms, you can now use any GOARCH, GOOS, or GOOS-GOARCH value, without having to know the exact builder name, or the best builder name. It now has a mapping of what the best one is. A GOARCH by itself means linux in general. (amd64p32 might mean nacl on release branches, but I forgot to include it and my tests don't catch it because it runs go tool dist list with my tip Go)
  • It will report a bit more on what it did. It's still terrible, but it's something. There's another bug open for making TryBot result pages permanent after they complete.

I wrote the start of some docs:
https://github.com/golang/go/wiki/SlowBots

Spit-balling some additional TRY= token matching ideas: (the existing ones would continue to work)

  • TRY=all-os (at least one of each GOOS, auto pick best GOARCH for each)
  • TRY=all-arch (at least one of each GOARCH, auto pick best GOOS for each)
  • TRY=all-os-arch and/or TRY=all-ports (at least one of each GOOS/GOARCH combo, auto pick best specific builder config for each)
  • TRY=all-os-arch-variants (literally everything. But spelled annoyingly long to make not reach for it too easily which I fear they would if it were called all)
  • TRY=windows-*-* all variants of Windows
  • TRY=dragonfly-amd64-* all variants of DragonFly (including bare dragonfly-amd64; it's not technically a glob match)

Thoughts?

I like the idea of biasing toward fewer builders. (Better than my suggestion to use build constraints, at any rate.)

Maybe wildcards consistently instead of all?

  • TRY=*-386 (one of each GOOS for GOARCH=386)
  • TRY=linux-* (one of each GOARCH for GOOS=linux)
  • TRY=dragonfly-amd64-* (all configurations of the named GOOS/GOARCH)
  • TRY=*-*-race (all race builders)
  • TRY=*-*-* (literally everything; not as annoyingly long but that Shift key is a bit of a reach)

@bcmills, that's exactly what I started with but I had some hesitation that it was too cryptic, especially with the one you didn't include:

  • TRY=*-* (one of each port; a GOOS/GOARCH combo)

Unlike TRY=*-*-*, I expect *-* to be somewhat common and reasonable, so I was afraid it was too ugly and would confuse people and all-ports would be more readable

But if you're cool with it, I'm fine with it.

I'm fine with *-*-race too but that's somewhat of a special case. That's the only well-known/magic third component name.

That's the only well-known/magic third component name.

At least, until we address #29641 and #26529. 😉

I expect *-* to be somewhat common and reasonable, so I was afraid it was too ugly and would confuse people and all-ports would be more readable

I suspect that *-* will be too expensive to use much in practice. (If we used it consistently, it would likely saturate the reverse builders for a lot of platforms that don't actually need that much bot coverage.)

So a bit of ugliness there may be a feature.

@bcmills, oh, the other reservation I had about using the * globby patterns is how do you say "one of each GOOS, regardless of GOARCH?" (e.g. we only have aix for ppc64, not amd64, and nacl for amd64p32/386/arm)

TRY=* doesn't sound right. And using a placeholder like _ for "any" like TRY=*-_ is cryptic.

Change https://golang.org/cl/203421 mentions this issue: dashboard: add SlowBot alias for freebsd-arm64-dmgk

Talking to myself,

TRY=* doesn't sound right. And using a placeholder like _ for "any" like TRY=*-_ is cryptic.

I suppose ? is a better globby pattern instead of _.

So TRY=*-? would mean all OSes on any (the best relevant) architecture.

Too cryptic?

TRY=geese would be cuter.

Change https://golang.org/cl/227778 mentions this issue: cmd/coordinator: consider explicitly requested builders as SlowBots

In order to resolve #38279 and make SlowBots more predictable, I am considering changing them slightly so that whenever a builder is explicitly requested via the TRY= syntax, it is always considered a SlowBot.

Previously, if a builder was explicitly requested via TRY= but also was a part of the default trybot set for a given CL, then it wasn't considered to be a SlowBot. So it was possible to write TRY=linux-amd64 and see "TryBots are starting ..." instead of "SlowBots are starting ...".

I've sent CL 227778 for review that implements this change. Please let me know if there's any feedback.

Change https://golang.org/cl/227780 mentions this issue: cmd/coordinator: consider explicitly requested builders as SlowBots

Change https://golang.org/cl/227779 mentions this issue: cmd/coordinator, dashboard: don't skip cmd/dist tests for SlowBots

Slowbots seem to be stickier than I imagined. See CL 238077. I requested a slowbot early on in development. I'd now like to run the normal trybot set, but it seems to still be picking up my original slowbot request, even though that request is now several patch sets in the past.

@randall77 That is intended behavior, documented at https://golang.org/wiki/SlowBots:

When running TryBots again later, the most recent TRY= comment on the current patchset is used. To turn it off set TRY= with an empty string after the equals sign. If the current patchset doesn't have a TRY= comment, the most recent non-empty TRY= comment is used.

If you already know that and would like to suggest a change, please open a separate issue about it.

I'm going to close this high level tracking issue as done.

By now, SlowBots are widely available and very useful for development purposes. We intend to continue to support them (or an alternative mechanism that provides equivalent functionality) as long as they're deemed useful, and we'll continue to improve them over time together with the rest of the build infrastructure. For all remaining bugs and usability improvements related to SlowBots, please open new issues with "x/build/cmd/coordinator:" prefix and SlowBots in the title.

Thank you @bradfitz and everyone else who contributed to making SlowBots a reality!

Was this page helpful?
0 / 5 - 0 ratings