Bazel: Bazel performance regression caused by b321444

Created on 28 Aug 2019  路  31Comments  路  Source: bazelbuild/bazel

Description of the problem / feature request:

The nightly benchmarking today showed a regression in MacOS env from commit e2a9e4a to 242acd6:
https://perf.bazel.build/bazel/2019/08/27_orig/report.html

After running the benchmarking again for the range of commits that potentially regressed Bazel performance, there's significant evidence to suggest that b321444 caused a regression of 4-7% when doing build //src:bazel:

https://perf.bazel.build/bazel/2019/08/27/report.html

(it's clearest for Mac environment).

The run on buildkite: https://buildkite.com/bazel-trusted/bazel-bench-bisect/builds/1#9f5c44f7-4955-4bd5-a8e4-704e8a34ccc3
I've extracted the result tables here:

RESULTS:
Bazel commit: e2a9e4a3fc99d018e04753f702a6498a4099d2ae, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  127.753s             127.437s               1.742s
  system:    9.230s               9.200s               0.231s
  memory:   88.500s              89.000s               1.628s
     cpu:   88.381s              88.375s               3.245s

Bazel commit: 177731fe7cee35bf86f047af4f49042ad176962a, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  128.879s ( +0.88%)   129.864s ( +1.90%)     3.043s    0.68715
  system:    9.355s ( +1.35%)     9.455s ( +2.77%)     0.323s    0.88916
  memory:   89.100s ( +0.68%)    89.000s ( +0.00%)     0.300s    0.32492
     cpu:   88.834s ( +0.51%)    88.800s ( +0.48%)     3.881s    0.02521

Bazel commit: b321444237467f8730d01db634d43c8b0ea8d91b, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  132.352s ( +2.69%)   132.454s ( +1.99%)     2.899s    0.88916
  system:    9.260s ( -1.02%)     9.225s ( -2.43%)     0.268s    0.68715
  memory:   89.000s ( -0.11%)    89.000s ( +0.00%)     0.632s    0.02521
     cpu:   88.941s ( +0.12%)    88.250s ( -0.62%)     5.643s    0.32492

Bazel commit: 242acd6c07ccfe380bddf8d5a719c1cdfd6fdfb9, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  131.656s ( -0.53%)   131.805s ( -0.49%)     1.609s    0.02521
  system:    9.445s ( +2.00%)     9.425s ( +2.17%)     0.290s    0.68715
  memory:   89.000s ( +0.00%)    89.000s ( +0.00%)     0.000s    0.02521
     cpu:   88.999s ( +0.07%)    88.865s ( +0.70%)     3.346s    0.32492
================================================================================================

macos

RESULTS:
Bazel commit: e2a9e4a3fc99d018e04753f702a6498a4099d2ae, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  309.548s             309.460s               1.494s
  system:   71.081s              71.339s               0.566s
  memory:   87.100s              87.000s               0.300s
     cpu:   82.248s              81.194s               4.033s

Bazel commit: 177731fe7cee35bf86f047af4f49042ad176962a, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  310.194s ( +0.21%)   310.045s ( +0.19%)     1.742s    0.32492
  system:   70.743s ( -0.48%)    70.582s ( -1.06%)     0.652s    0.88916
  memory:   87.000s ( -0.11%)    87.000s ( +0.00%)     0.000s    0.00000
     cpu:   80.573s ( -2.04%)    81.925s ( +0.90%)     2.718s    0.68715

Bazel commit: b321444237467f8730d01db634d43c8b0ea8d91b, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  332.208s ( +7.10%)   332.197s ( +7.14%)     1.000s    0.99998
  system:   70.697s ( -0.07%)    70.609s ( +0.04%)     0.610s    0.02521
  memory:   87.000s ( +0.00%)    87.000s ( +0.00%)     0.000s    0.00000
     cpu:   81.668s ( +1.36%)    81.578s ( -0.42%)     2.602s    0.32492

Bazel commit: 242acd6c07ccfe380bddf8d5a719c1cdfd6fdfb9, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  331.374s ( -0.25%)   331.298s ( -0.27%)     2.149s    0.68715
  system:   70.640s ( -0.08%)    70.630s ( +0.03%)     0.405s    0.32492
  memory:   87.000s ( +0.00%)    87.000s ( +0.00%)     0.000s    0.00000
     cpu:   82.442s ( +0.95%)    81.965s ( +0.47%)     4.983s    0.32492

================================================================================================

RBE (ubuntu16, JDK8)

RESULTS:
Bazel commit: e2a9e4a3fc99d018e04753f702a6498a4099d2ae, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  130.368s             129.868s               2.859s
  system:    9.331s               9.255s               0.262s
  memory:   89.100s              89.000s               0.300s
     cpu:   97.250s              97.645s               2.393s

Bazel commit: 177731fe7cee35bf86f047af4f49042ad176962a, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  129.073s ( -0.99%)   128.737s ( -0.87%)     1.623s    0.32492
  system:    9.305s ( -0.28%)     9.270s ( +0.16%)     0.433s    0.32492
  memory:   88.900s ( -0.22%)    89.000s ( +0.00%)     0.300s    0.00000
     cpu:   93.273s ( -4.09%)    92.355s ( -5.42%)     2.708s    0.96895

Bazel commit: b321444237467f8730d01db634d43c8b0ea8d91b, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  134.438s ( +4.16%)   133.944s ( +4.04%)     2.091s    0.99998
  system:    9.376s ( +0.76%)     9.250s ( -0.22%)     0.394s    0.02521
  memory:   88.800s ( -0.11%)    89.000s ( +0.00%)     0.400s    0.00000
     cpu:   95.020s ( +1.87%)    96.560s ( +4.55%)     5.693s    0.68715

Bazel commit: 242acd6c07ccfe380bddf8d5a719c1cdfd6fdfb9, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  133.231s ( -0.90%)   133.382s ( -0.42%)     1.327s    0.32492
  system:    9.533s ( +1.67%)     9.420s ( +1.84%)     0.376s    0.88916
  memory:   88.900s ( +0.11%)    89.000s ( +0.00%)     0.700s    0.02521
     cpu:   94.191s ( -0.87%)    95.625s ( -0.97%)     3.352s    0.68715

================================================================================================

ubuntu18

RESULTS:
Bazel commit: e2a9e4a3fc99d018e04753f702a6498a4099d2ae, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  136.049s             137.951s               6.202s
  system:   10.050s              10.050s               0.538s
  memory:   88.300s              89.000s               1.487s
     cpu:   96.344s              94.405s               6.372s

Bazel commit: 177731fe7cee35bf86f047af4f49042ad176962a, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  127.523s ( -6.27%)   126.273s ( -8.47%)     4.430s    0.99878
  system:    9.095s ( -9.50%)     8.865s (-11.79%)     0.580s    0.99310
  memory:   88.200s ( -0.11%)    89.000s ( +0.00%)     1.470s    0.00000
     cpu:   91.227s ( -5.31%)    89.835s ( -4.84%)     4.968s    0.88916

Bazel commit: b321444237467f8730d01db634d43c8b0ea8d91b, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  136.405s ( +6.96%)   134.654s ( +6.64%)     4.864s    0.99983
  system:    9.467s ( +4.09%)     9.320s ( +5.13%)     0.550s    0.96895
  memory:   88.800s ( +0.68%)    89.000s ( +0.00%)     0.400s    0.02521
     cpu:   92.182s ( +1.05%)    89.760s ( -0.08%)     7.173s    0.02521

Bazel commit: 242acd6c07ccfe380bddf8d5a719c1cdfd6fdfb9, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval
    wall:  134.061s ( -1.72%)   133.913s ( -0.55%)     2.584s    0.68715
  system:    9.441s ( -0.27%)     9.365s ( +0.48%)     0.317s    0.02521
  memory:   88.700s ( -0.11%)    89.000s ( +0.00%)     0.458s    0.00000
     cpu:   91.768s ( -0.45%)    91.045s ( +1.43%)     5.100s    0.32492

A high pval suggests strong statistical evidence that there's a regression. From the report seems like most of the increase in wall time comes from the Execution phase.

P1 release blocker team-Rules-Java

Most helpful comment

Thanks Irina!

It looks much better again:
tPQP9S1xTiN

All 31 comments

@iirina could you help triage this?
cc @meisterT

Thanks for the report @joeleba. I can think of either https://github.com/bazelbuild/bazel/pull/8964 or https://github.com/bazelbuild/bazel/commit/10db812ec3d911e904717f4a3f9e1cf0389ff3ff#diff-15230e5dad94ffaefa7b526c0c1e89e5 that could cause performance regressions from java_tools. How can I reproduce this? Do you think this is a release blocker for bazel 1.0?

cc @benjaminp @cushon

To reproduce, you first need to clone the benchmarking tool: https://github.com/bazelbuild/bazel-bench.
Then from the bazel-bench project's root, run the following:

bazel run :benchmark \
--  \
--bazel_source=https://github.com/bazelbuild/bazel.git \
--project_source=https://github.com/bazelbuild/bazel.git \
--collect_memory \
--data_directory=/tmp/.bazel-bench/out \
--csv_file_name=perf_data.csv \
--runs=10 \
--bazel_commits=e2a9e4a3fc99d018e04753f702a6498a4099d2ae,177731fe7cee35bf86f047af4f49042ad176962a,b321444237467f8730d01db634d43c8b0ea8d91b,242acd6c07ccfe380bddf8d5a719c1cdfd6fdfb9 \
--project_commits=756096de03e165ef95b56728ae31afa2833caf5c \
-- build //src:bazel

The tool builds a bazel binary for each bazel commit, then performs bazel build //src:bazel with those binaries.

Thanks! Is there a way I can point to my local bazel repository? And what's the difference between project_source and bazel_source?

bazel_source can either be a local bazel repo or a remote one, from which you build your bazel binaries.
project_source is a local/remote repo on which you run benchmarking. In this case, project_source=https://github.com/bazelbuild/bazel.git because I'm benchmarking the performance of build //src:bazel.

I think this is a release blocker for Bazel 1.0. cc @dslomov as the release manager to decide.

I would like to understand what exactly caused the regression before making a call on 1.0 blocking

I'm running the performance script. It takes a while. I'll post the findings when it's done.

I did look at the action times in the JSON profile (on Mac). Some actions are faster, some are slower, but the sum of differences is amounts to ~250s compared to the faster build. Divided by the number of threads (12) this is roughly the slowdown that we see.

I agree with Irina that https://github.com/bazelbuild/bazel/pull/8964 is the most likely culprit.

The script finished but it doesn't show me the results for all the commits I wanted.

Here is the command I ran:

bazel run :benchmark -- --bazel_source=/usr/local/google/home/elenairina/bazel --project_source=https://github.com/bazelbuild/bazel.git  --collect_memory --data_directory=/tmp/.bazel-bench/out --csv_file_name=perf_data.csv --runs=10 --bazel_commits=580038ec43b52c00ec704b1ff139924a6575ba20,403496540806f3f5404e20919202e28fb4b68f18,23f6295eaa724f5ce32dcce2f749978c0afd7bfd,523183e183b1e26b2e8e5c898ebd79949ae2f846 --project_commits=756096de03e165ef95b56728ae31afa2833caf5c -- build //src:bazel

I actually wanted results for the last 2 commits I specified, which are my local commits using two different java_tools versions. Is there a way I can debug and see what went wrong?

@joeleba helped me to debug the script and it doesn't work with branches other than master. I'll rebase into my local master and re-run the script. I hope we'll get results by the end of day.

I managed to get some results o/

RESULTS:
Bazel commit: e6f2f5b17c85956db9c2318788b4e715767b7aca, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval   
    wall:  286.434s             285.822s               6.220s            
  system:   19.618s              19.580s               0.222s            
  memory:   85.000s              85.000s               0.000s            
     cpu:   59.353s              59.395s               2.665s            


Bazel commit: 0bd7273c74b0385baee503a0199e4fac70df8468, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval   
    wall:  262.399s ( -8.39%)   260.001s ( -9.03%)     7.569s    0.99983 
  system:   19.155s ( -2.36%)    19.005s ( -2.94%)     0.352s    0.99310 
  memory:   85.000s ( +0.00%)    85.000s ( +0.00%)     0.000s    0.00000 
     cpu:   57.724s ( -2.74%)    58.345s ( -1.77%)     2.109s    0.32492 


Bazel commit: 79892e8a6aba1a880e68c214edce628fb63dfd13, Project commit: 756096de03e165ef95b56728ae31afa2833caf5c, Project source: https://github.com/bazelbuild/bazel.git
  metric          mean                median          stddev      pval   
    wall:  255.801s ( -2.51%)   255.778s ( -1.62%)     2.652s    0.96895 
  system:   19.110s ( -0.23%)    19.105s ( +0.53%)     0.206s    0.32492 
  memory:   85.000s ( +0.00%)    85.000s ( +0.00%)     0.000s    0.00000 
     cpu:   58.186s ( +0.80%)    58.165s ( -0.31%)     2.890s    0.02521 

0bd7273c74b0385baee503a0199e4fac70df8468 is the commit with java_tools built without Turbine update.
79892e8a6aba1a880e68c214edce628fb63dfd13 is the commit with java_tools built without G1 change.
I'm not sure the report is helpful :confused:

There's a high pval for both commits.

I think I should have compared them individually with a baseline.

The commits are pushed here https://github.com/bazelbuild/bazel/pull/9318

What is e6f2f5b17c85956db9c2318788b4e715767b7aca?

A high pval (0.99983) means there's strong evidence that there's a difference in the distribution of the wall time measured between 0bd7273c74b0385baee503a0199e4fac70df8468 and e6f2f5b17c85956db9c2318788b4e715767b7aca. In this case, 0bd7273c74b0385baee503a0199e4fac70df8468 is ~9.03% faster (median).

So you can say that removing the Turbine update made bazel build //src:bazel faster by ~9.03%.

Similarly, removing G1 and Turbine made bazel build //src:bazel faster by ~1.62% compared to just removing Turbine.

Also note that running benchmarking locally brings about a certain degree of noise to the data, as it is not an isolated environment.

e6f2f5b17c85956db9c2318788b4e715767b7aca is the commit that was at head when I started the script (e6f2f5b17c85956db9c2318788b4e715767b7aca)

cc @cushon

Thanks Joe for the details! If I didn't mess anything up when performing the benchmark, the Turbine update in 10db812ec3d911e904717f4a3f9e1cf0389ff3ff made building bazel slower. Liam, do you know of any potential reasons why that might happen?

Btw, with bazelbuild/bazel-bench#66, you can now specify pre-built bazel binaries to benchmark (as an alternative to the rebasing-your-local-change-onto-master workflow). Example:

bazel run :benchmark \
-- \
--bazel_source=/usr/local/google/home/elenairina/bazel \
--project_source=https://github.com/bazelbuild/bazel.git  \
--collect_memory \
--data_directory=/tmp/.bazel-bench/out \
--csv_file_name=perf_data.csv \
--runs=10 \
--bazel_commits=580038ec43b52c00ec704b1ff139924a6575ba20,403496540806f3f5404e20919202e28fb4b68f18 \
--bazel_binaries=/path/to/custom/binary/23f6295eaa724f5ce32dcce2f749978c0afd7bfd,/path/to/custom/binary/523183e183b1e26b2e8e5c898ebd79949ae2f846 \
--project_commits=756096de03e165ef95b56728ae31afa2833caf5c \
-- build //src:bazel

That sounds great @joeleba! That should also make the script faster. Do you still need to specify bazel_commits when bazel_binaries are set?

In general:

  • --bazel_commits is for commits available in branch master of whichever repo you specified in --bazel_source. The script takes care of building the binaries from those commits.
  • --bazel_binaries can be used for any pre-built binaries (it could very well be a WIP, doesn't have to be associated with any commit). This provides more flexibility, but requires you to build the binaries yourself beforehand.

In this case, you can use a combination of these 2 flags: --bazel_commits for commits available on master, and --bazel_binaries for binaries built from your local changes.

Note that since bazelbuild/bazel-bench#66, if a commit is not available on master (like what happened in https://github.com/bazelbuild/bazel/issues/9270#issuecomment-527347313), the benchmarking script would fail.

Nice! Thanks for making these improvements!

https://github.com/bazelbuild/bazel/commit/872c11ae8d1fe9b74aaaf8874201ab7e0f2906df is the actual culprit.

on a Mac (with 6 cores聽+ HT):

  • before: 327s
  • after: 351s (+7.3%)

on a beefy Linux workstation (with 36 cores + HT):

  • before: 69s
  • after: 70s (+1.4%, but statistically insignificant)

on my specialist workstation (restricting to 6 virtual cores):

  • before: 247s
  • after: 257s (+4.0%)

Potential release blocker for 1.0 (#8573)

Should I pick 872c11a for 1.0? (#8573)

(Oh I see it is not complete yet...)

@dslomov I'm starting a java_tools patch release. I'll let you know when the cherry pick is ready.

@dslomov I'm starting a java_tools patch release. I'll let you know when the cherry pick is ready.

thanks!

Bazel was updated with a java_tools patch release that disables G1. @meisterT @joeleba Can you please confirm the performance is improved?

Thanks Irina!

It looks much better again:
tPQP9S1xTiN

That's great, thanks for checking!

Was this page helpful?
0 / 5 - 0 ratings