macOS has a concept of thread QoS service classes. Bazel at the moment uses the default class when run from the command line. The documentation strongly recommends defining QoS classes for proper operation instead of relying on the default one. See https://developer.apple.com/library/archive/documentation/Performance/Conceptual/power_efficiency_guidelines_osx/PrioritizeWorkAtTheTaskLevel.html
We have found that the use of the default class causes two problems: first, Bazel can make the system unresponsive; and, second, Bazel can become very slow if it relies on system services that run at a lower service class (e.g. FUSE file systems).
I have been experimenting with this by explicitly declaring Bazel to run at the Utility level and this makes both problems go away. I think we should make this change, but then account for the many threads we run. In particular, it'd be nice if we defaulted to Utility but made the UI thread and the Bazel client higher priority to ensure snappier console output. (This latter detail is nice, but not a requirement in my opinion.) Of course, we must measure whether this causes a performance regression.
I also tried setting Bazel to Background level. This made the system even snappier but also prevented Bazel from fully utilizing all CPUs.
One last thing to note is that changing the QoS class of a program after it has started, even if it has just one thread, does not modify the class for the main thread. This means that any further spawned threads or subprocesses do not respect the class change and instead use whatever was set at the main thread level. The only way I've found so far of changing the class of the main thread is by using posix_spawnattr_set_qos_class_np, which means we have to start the process using posix_spawn. This makes the change slightly more complicated, but given that the Bazel client is already spawning the server, we can probably fit this in that code path.
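To illustrate the posix_spawn route described above, here is a minimal C sketch. It is not Bazel's actual code: the helper name spawn_at_utility_qos is hypothetical, and the choice of Utility simply mirrors the experiment mentioned earlier. The QoS call is guarded so the snippet still builds on non-macOS systems, where it degrades to a plain posix_spawn.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#ifdef __APPLE__
#include <sys/qos.h>
#endif

extern char **environ;

/* Hypothetical helper (not part of Bazel): spawns argv[0] and, on macOS,
 * asks for the child to start at the Utility QoS class.
 * Returns 0 on success or an error number on failure. */
int spawn_at_utility_qos(char *const argv[], pid_t *pid_out) {
    posix_spawnattr_t attr;
    int err = posix_spawnattr_init(&attr);
    if (err != 0) {
        return err;
    }
#ifdef __APPLE__
    /* The class is attached to the spawn attributes, so the child's main
     * thread starts with it instead of changing class after startup. */
    err = posix_spawnattr_set_qos_class_np(&attr, QOS_CLASS_UTILITY);
#endif
    if (err == 0) {
        err = posix_spawn(pid_out, argv[0], NULL, &attr, argv, environ);
    }
    posix_spawnattr_destroy(&attr);
    return err;
}
```

A caller would pass a NULL-terminated argv whose first element is the full path of the binary, then waitpid on the returned pid as usual.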
I think we can consider this done for now. There are a couple of things that could be looked into to make this better, but I don't think they are critical:
The changes so far only modify the QoS of the server when started in the background, but have no effect on batch mode, the client, or whatever is run via bazel run.
Set QoS levels on a per-thread basis within the Bazel server. This would likely require some JNI glue and an audit of what each thread pool does and what class it deserves. Or maybe just make sure that whatever thread is handling the UI runs at a higher priority and let everything else run at the Utility level.
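For the per-thread idea, the native side of such JNI glue might look roughly like the C sketch below. This is only an illustration under assumptions: ui_thread_main and run_ui_thread are hypothetical names, User-initiated is just one candidate class for a UI thread, and whether a thread can usefully raise itself above the class the process was spawned with would need to be verified on a real system. On non-macOS platforms the QoS call is compiled out.

```c
#include <pthread.h>
#include <stddef.h>
#ifdef __APPLE__
#include <pthread/qos.h>
#endif

/* Hypothetical thread body standing in for a UI/console thread. */
static void *ui_thread_main(void *arg) {
    int *result = arg;
#ifdef __APPLE__
    /* Affects only the calling thread; other threads keep their class. */
    *result = pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0);
#else
    *result = 0; /* QoS classes are macOS-specific; nothing to do here. */
#endif
    /* ... the thread's real work would go here ... */
    return NULL;
}

/* Starts the thread, waits for it to finish, and returns 0 if both the
 * thread management and the QoS request succeeded. */
int run_ui_thread(void) {
    pthread_t thread;
    int result = -1;
    int err = pthread_create(&thread, NULL, ui_thread_main, &result);
    if (err != 0) {
        return err;
    }
    err = pthread_join(thread, NULL);
    return err != 0 ? err : result;
}
```

The point of the design is that the class is requested from inside the thread itself, which is why an audit of each thread pool would be needed before applying it broadly.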
I had to roll back the default QoS change. I'm now adding support for this feature behind a flag and will evaluate behavior on different classes (Utility vs. User-initiated).
Too bad about the rollback. Do you think that this is likely to make it into 0.25.0?
The default change was rolled back, but I added a --macos_qos_class startup flag that you can pass to re-enable this feature and customize Bazel to whichever QoS level you want.
At this point I don't think we should change the default for the whole process given that this resulted in slower builds for some people. But you can change it based on your observations, or on whether you prefer faster builds vs. a more responsive system.
Thanks. I see that's in the master version of the command-line reference, so I'll try it out with 0.25 or just build master if I get impatient. :)
Something I'd like us to do is have a "performance tuning" page where we document the various knobs in Bazel to adjust performance characteristics -- and documenting the impact of this new flag on system responsiveness would nicely fit there.
FTR most of the trouble we observed with QoS settings was because the services Bazel was using internally (think FUSE daemons) were running at a lower class than Bazel. We fixed this by raising the priority of those services instead.
That said, as I don't foresee us doing anything else in this area right now, I'm closing this bug.
This page isn't exactly what you're referring to, but it still might be good to add a reference here - https://docs.bazel.build/versions/master/skylark/performance.html
Edit: maybe not, actually...
That page is seemingly focused on rule authors (based on the path and the contents). We need something that's user-facing. @meisterT
I take it that change didn't make it into 0.25.2? I can't seem to find that flag:
ERROR: Unrecognized option: --macos_qos_class=utility
bazel version returns:
Build label: 0.25.2
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri May 10 20:50:40 2019 (1557521440)
Build timestamp: 1557521440
Build timestamp as int: 1557521440
It appears to be in the tag: https://github.com/bazelbuild/bazel/blob/0.25.2/src/main/java/com/google/devtools/build/lib/runtime/BlazeServerStartupOptions.java#L487-L499. Note that it is a startup option, not a build option, so it has to go before the command name (bazel --macos_qos_class=utility build ...) rather than after it.
Gah. That's likely my mistake. Thanks!