In our Bazel conversion experiment we have moved a small subset of apps, libraries, and tests over to Bazel. Running head-to-head in CI, we see nearly 2x the wall time on Bazel. Some of that we can optimize away; however, 73 of 75 tests run in under 100ms on Buck, whereas Bazel runs each of them in 1-3 seconds. The net result is 3 minutes of Buck, 3 minutes of Gradle, and 6 minutes of Bazel executing the same build set.
While we have not fully tuned all the optimization settings (some parts of the build need to use workers, and sandboxing is expensive), our investigation accounted for that: the biggest cost is test execution, which takes roughly 10x as long per test on Bazel as on Gradle or Buck in our situation.
A few relevant pieces of the puzzle include:
Implement persistent worker support for the TestRunner, to avoid the setup/teardown costs associated with invoking a new TestRunner for every test (a rough sketch of the worker side of that idea is below).
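For concreteness, here is a minimal sketch of what a persistent test-runner worker loop could look like, using Bazel's standard worker protocol (length-delimited WorkRequest/WorkResponse protos over stdin/stdout). This is not Bazel's actual TestRunner; the class name and the assumption that the test class name arrives as the first argument are made up for illustration.

```java
import com.google.devtools.build.lib.worker.WorkerProtocol.WorkRequest;
import com.google.devtools.build.lib.worker.WorkerProtocol.WorkResponse;
import org.junit.runner.JUnitCore;
import org.junit.runner.Result;

// Hypothetical persistent test runner: the JVM stays warm across requests,
// so JVM startup and runner setup are paid once instead of once per test target.
public class PersistentTestRunner {
  public static void main(String[] args) throws Exception {
    while (true) {
      // Bazel writes one length-delimited WorkRequest per test invocation.
      WorkRequest request = WorkRequest.parseDelimitedFrom(System.in);
      if (request == null) {
        break; // stdin closed: Bazel is shutting the worker down.
      }
      int exitCode;
      try {
        // Assumption for this sketch: the first argument is the test class name.
        Class<?> testClass = Class.forName(request.getArgumentsList().get(0));
        Result result = new JUnitCore().run(testClass);
        exitCode = result.wasSuccessful() ? 0 : 1;
      } catch (Throwable t) {
        exitCode = 1;
      }
      // Reply with a length-delimited WorkResponse on stdout.
      WorkResponse.newBuilder().setExitCode(exitCode).setOutput("").build()
          .writeDelimitedTo(System.out);
      System.out.flush();
    }
  }
}
```

The hard part, as the later comments note, is keeping something like this correct (static state, class loading isolation) while staying fast.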
Make a project, make a bunch of tests. I don't have a repro setup at present, but will be working one up; a pile of trivial tests like the one sketched below, each declared as its own java_test target, is the shape of it.
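For illustration only (the class name is made up), a test this trivial still pays the per-target JVM startup and runner setup cost when it runs as its own java_test:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Deliberately does almost no work; the observed 1-3s per target is
// almost entirely fixed setup/teardown overhead, not test logic.
public class Test01 {
  @Test
  public void trivial() {
    assertEquals(2, 1 + 1);
  }
}
```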
CentOS and MacOS (different numbers, same proportions)
bazel info release
INFO: Invocation ID: 329bc936-43a5-48ba-b3b3-17ea3f158122
release 0.22.0
Looking around, there have been discussions of an experimental persistent worker supporting the TestRunner, but the code seems to have been deleted, and none of the instructions work anymore.
Example (redacted) test run with 16 tests.
internal:AllTest PASSED in 2.7s
internal:01Test PASSED in 0.8s
internal:02Test PASSED in 1.4s
internal:03Test PASSED in 1.9s
internal:04Test PASSED in 1.4s
internal:05Test PASSED in 1.3s
internal:06Test PASSED in 1.0s
internal:07Test PASSED in 1.7s
internal:08Test PASSED in 0.9s
internal:09Test PASSED in 1.9s
internal:10Test PASSED in 1.7s
internal:11Test PASSED in 1.5s
internal:12Test PASSED in 1.0s
internal:13Test PASSED in 1.4s
internal:14Test PASSED in 1.9s
internal:15Test PASSED in 0.8s
internal:16Test PASSED in 1.6s
Buck equivalent (didn't run the AllTest in this codebase)
[2019-02-26 02:29:04] PASS <100ms 3 Passed 0 Skipped 0 Failed 01Test
[2019-02-26 02:29:04] PASS <100ms 8 Passed 0 Skipped 0 Failed 02Test
[2019-02-26 02:29:04] PASS <100ms 7 Passed 0 Skipped 0 Failed 03Test
[2019-02-26 02:29:04] PASS <100ms 12 Passed 0 Skipped 0 Failed 04Test
[2019-02-26 02:29:04] PASS <100ms 16 Passed 0 Skipped 0 Failed 05Test
[2019-02-26 02:29:04] PASS <100ms 10 Passed 0 Skipped 0 Failed 06Test
[2019-02-26 02:29:04] PASS 109ms 9 Passed 0 Skipped 0 Failed 07Test
[2019-02-26 02:29:04] PASS <100ms 14 Passed 0 Skipped 0 Failed 08Test
[2019-02-26 02:29:04] PASS <100ms 20 Passed 0 Skipped 0 Failed 09Test
[2019-02-26 02:29:04] PASS <100ms 10 Passed 0 Skipped 0 Failed 10Test
[2019-02-26 02:29:04] PASS <100ms 2 Passed 0 Skipped 0 Failed 11Test
[2019-02-26 02:29:04] PASS <100ms 9 Passed 0 Skipped 0 Failed 12Test
[2019-02-26 02:29:04] PASS 120ms 9 Passed 0 Skipped 0 Failed 13Test
[2019-02-26 02:29:04] PASS <100ms 18 Passed 0 Skipped 0 Failed 14Test
[2019-02-26 02:29:04] PASS <100ms 2 Passed 0 Skipped 0 Failed 15Test
[2019-02-26 02:29:04] PASS 148ms 30 Passed 0 Skipped 0 Failed 16Test
Interestingly, there was an implementation of this, but it was later deleted (0c9f2d4c15b761e3f3b863658b6d5c65bde6db22).
/cc @meisterT
Worth noting, one mitigation is auto-generating per-package or per-top-level-package "AllTest" classes that use JUnit's @Suite machinery (sketched below), and replacing the individual java_test declarations. That definitely reduces test execution time, but at the cost of merging all the dependencies together and removing any ability to do "affected test" filtering.
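For illustration, a hand-written equivalent of such a generated suite (class names made up) looks like:

```java
import org.junit.runner.RunWith;
import org.junit.runners.Suite;

// One java_test target wraps this suite, so the JVM/runner startup cost is
// paid once per package rather than once per test class.
@RunWith(Suite.class)
@Suite.SuiteClasses({
  Test01.class,
  Test02.class,
  Test03.class
  // ... every test class in the package
})
public class AllTest {}
```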
@meisterT Hmm, I was about to move this to team-Performance, but you have just done the opposite. I think I heard from you that test setup is one of the major performance penalties we have now? Also, while this sounds like a "local execution request", is it really? I mean, is there something to change in the worker code in Bazel, or do we really have to modify the workers themselves to support this? Or maybe I don't understand the goal of team-Performance properly...
Pinging this again. Now that we have thousands of tests in our corpus, generating a per-package aggregate test suite (so that all JUnit test cases in a package run in a single test target) has by itself shaved roughly 1/4-1/2 off our build time. Not having some way to avoid the extra tax of non-persistent worker invocation is a pretty big deal when you don't have a build farm.
To give it some meat, an example run on a commit from yesterday (doing full builds, not "affected test" query magic) did this:
| | aggregate suites | individual tests |
|---|---|---|
| test target count | 396 | 1826 |
| no cache | 01:02:32 | 02:00:12 (timeout) |
| with cache | 49:03 | 01:49:04 |
Now, times vary a lot, and we're working to reduce the deviation, but this is representative of build times on these machines.
Pinging @lberki, author of https://github.com/bazelbuild/bazel/commit/0c9f2d4c15b761e3f3b863658b6d5c65bde6db22 - do you have more context / background on why the experiment didn't work out?
Huh, that was a while ago... more context is on (internal) b/25006099, but the short version is that it proved to be difficult to be both correct enough and fast enough. The following issues come to mind:
If I had to try again, I would try jury-rigging something with either native-image or CRIU so that the JVM startup and the test runner are AOT compiled, then dynamically loading the actual code under test (handwave-handwave). That way, we'd get easily provable correctness without incurring (most of) the overhead.
That wouldn't help with JIT compiling potentially large amounts of code under test, though.
From 02f8da948bf3955304a4cef9399bd3907430bbc4, it seems this idea has made a comeback?
Indeed, Irina is giving it another try.
What's the status of this?
There's a working version in Blaze, but not in Bazel.
@iirina any chance this could be ported to bazel as well?
Didn't this just get released as an experimental flag?