Running a clean build on macOS Sierra hangs for 10-20 seconds before actually executing. Running jstack shows that it is stuck in NetUtil.findShortHostName, trying to resolve the local hostname. The following blog post shows that this is a common issue and suggests adding entries to /etc/hosts, which fixes the problem. Perhaps the result of this lookup should be cached by the blaze server?
Detailed description of this issue: https://thoeni.io/post/macos-sierra-java/
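To check whether a machine is affected independently of Bazel, a small timing program can help. This is only a sketch: it assumes the slow call is the JDK's InetAddress.getLocalHost() lookup that the linked blog post describes, and the class name HostnameLookupTimer is made up for illustration. It is not Bazel's NetUtil code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: times one local hostname lookup through the JDK. */
final class HostnameLookupTimer {

  /** Hostname plus how long the lookup took. */
  static final class Result {
    final String hostname;
    final long elapsedMillis;

    Result(String hostname, long elapsedMillis) {
      this.hostname = hostname;
      this.elapsedMillis = elapsedMillis;
    }
  }

  static Result timeLookup() {
    long start = System.nanoTime();
    String name;
    try {
      // Assumption: this is the call that stalls on affected machines.
      name = InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      name = "unknown";
    }
    long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    return new Result(name, elapsed);
  }

  public static void main(String[] args) {
    Result r = timeLookup();
    System.out.println("hostname=" + r.hostname + " lookup took " + r.elapsedMillis + " ms");
  }
}
```

If the reported time is in the range of seconds rather than milliseconds, the machine probably has the name resolution problem described above.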
Bazel pauses for ~20 seconds before printing anything:
$ ~/bazel/bin/bazel build //prpcpython:test_prpc
INFO: Found 1 target...
Target //prpcpython:test_prpc up-to-date:
bazel-bin/prpcpython/test_prpc
INFO: Elapsed time: 20.563s, Critical Path: 0.00s
The java.log showing the 20 second pause:
170818 13:54:31.431:I 1711 [com.google.devtools.build.lib.skyframe.LegacyLoadingPhaseRunner.execute] Target pattern evaluation finished
170818 13:54:31.431:I 1711 [com.google.devtools.build.lib.runtime.CacheFileDigestsModule.logStats] Accumulated cache stats before command: hit count=5, miss count=365, hit rate=0.013513513513513514, eviction count=0
170818 13:54:51.515:I 1711 [com.google.devtools.build.lib.buildtool.BuildTool.buildTargets] Configurations created
170818 13:54:51.516:I 1711 [com.google.devtools.build.lib.analysis.BuildView.update] Starting analysis
Unfortunately, I'm not entirely sure exactly what configuration triggers this problem since my co-worker has the same OS version and doesn't run into this. However for me the following things reproduce this issue:
ping computername.local and observe that it also takes > 10 seconds.
Edit /etc/hosts to add a name and observe that the problem goes away.
Bazel version (bazel info release): release 0.5.4rc3
I can confirm this happened for two coworkers and me. We resolved it by turning Screen Sharing on, but obviously that's not a good fix for the underlying problem.
+1 Interested if someone can track this down!
/cc @c-parsons can you take a look?
also cc @jmmv for perf on mac
@mhlopko looked into this a while ago and I thought fixed it via https://bazel-review.googlesource.com/c/bazel/+/5432, but there may still be something left to do. If you still remember about this, do you have any other details?
Hmm I only fixed it for Workspace status action, but checking the callers of NetUtil.findShortHostName I see that it's being called from other places, in this context only LocalSpawnRunner seems relevant. I've never touched that part of the codebase, and a quick (<5 min) search failed to reveal any per-blaze-invocation calls to findShortHostName. So I don't know what's wrong here off the bat. I bet @philwo will know :)
I think we should resolve the local hostname once on server startup and print a warning message if that takes longer than e.g. 5 seconds, to make sure that people don't blame Bazel for what is essentially a local configuration or network issue. Then we should use that hostname consistently during server lifetime.
Rats, I thought I posted the entire stack trace, but apparently not. If it is helpful I can dig it up again. FWIW: I think @philwo's suggestion sounds perfect.
As @philwo educated me, LocalSpawnRunner is instantiated for every blaze invocation, so we found the source of the problem.
Does anybody have a nicer solution than a static field in NetUtil caching the hostname (and removal of the same approach from Workspace status action)? Plus the time measuring logic and a warning in case of slow lookup.
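For discussion, the idea of caching plus a slow-lookup warning could look roughly like the sketch below. It is not the change that landed in Bazel: the class CachedHostname, its methods, and the injected resolver are hypothetical, and the 5 second threshold is just the example value mentioned earlier in this thread. Passing the resolver in (instead of a static field) keeps the cache testable.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch: resolve the hostname once, warn if the lookup was slow. */
final class CachedHostname {
  // Example threshold taken from the discussion; not a Bazel constant.
  private static final long SLOW_LOOKUP_MILLIS = 5_000;

  private final Supplier<String> resolver;
  private String cached; // guarded by "this"

  CachedHostname(Supplier<String> resolver) {
    this.resolver = resolver;
  }

  /** Factory that uses the JDK lookup; falls back to "unknown" on failure. */
  static CachedHostname usingJdkResolver() {
    return new CachedHostname(() -> {
      try {
        return InetAddress.getLocalHost().getHostName();
      } catch (UnknownHostException e) {
        return "unknown";
      }
    });
  }

  /** Returns the hostname, performing the (possibly slow) lookup only on first use. */
  synchronized String get() {
    if (cached == null) {
      long start = System.nanoTime();
      String name = resolver.get();
      long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      cached = (name == null) ? "unknown" : name;
      if (elapsed > SLOW_LOOKUP_MILLIS) {
        System.err.println("WARNING: resolving the local hostname took " + elapsed
            + " ms; check /etc/hosts or the network configuration");
      }
    }
    return cached;
  }
}
```

A long-lived server process would create one instance at startup (for example via CachedHostname.usingJdkResolver()) and hand it to every per-invocation component, so later invocations never repeat the lookup.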
Can hostname be added to CommandEnvironment?
We had this problem in Selenium, and did this:
(Effectively @philwo's suggestion of caching the lookup)
Setting the machine name in System Preferences appears to have resolved this issue for me. But only temporarily.
Great to see this fixed! What release do you think it'll make it into?
It is not in the current rc, so it will be in next month's release. With the coming change to our release process, it will be cut on 10/01 and released around the end of October (this is subject to change).
For anybody who has added an entry to /etc/hosts but still sees the hang: add an IPv6 entry as well, like this:
127.0.0.1 MacBook-Pro.local
::1 MacBook-Pro.local
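A quick way to verify that both entries are present is to scan the hosts file contents for them. The following is a minimal sketch with a hypothetical class name (HostsEntryCheck); it only does simple whitespace-based parsing and ignores `#` comments, which is an assumption about the file layout rather than a full hosts file parser.

```java
import java.util.List;

/** Hypothetical checker for the IPv4 + IPv6 loopback workaround. */
final class HostsEntryCheck {

  /** True if some line maps the given address to the given hostname. */
  static boolean hasEntry(List<String> hostsLines, String address, String hostname) {
    for (String line : hostsLines) {
      // Drop trailing comments, then split "address name [aliases...]" on whitespace.
      int hash = line.indexOf('#');
      String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
      if (content.isEmpty()) {
        continue;
      }
      String[] fields = content.split("\\s+");
      if (!fields[0].equals(address)) {
        continue;
      }
      for (int i = 1; i < fields.length; i++) {
        if (fields[i].equalsIgnoreCase(hostname)) {
          return true;
        }
      }
    }
    return false;
  }

  /** True only if the hostname is mapped to both 127.0.0.1 and ::1. */
  static boolean hasBothLoopbackEntries(List<String> hostsLines, String hostname) {
    return hasEntry(hostsLines, "127.0.0.1", hostname)
        && hasEntry(hostsLines, "::1", hostname);
  }
}
```

The lines could be read with java.nio.file.Files.readAllLines(Paths.get("/etc/hosts")) and the hostname taken from the machine's own name, for example MacBook-Pro.local in the entries above.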
There was a bug in the first commit to resolve this. Make sure you have https://github.com/bazelbuild/bazel/commit/0bc9b3e14f305706d72180371f73a98d6bfcdf35 if you're still seeing the issue.
@jgavris sorry for not making it clear. I meant the case where somebody still uses a released version that doesn't have the fix. The /etc/hosts workaround requires both IPv4 and IPv6 addresses to be added.