Bazel: Mac OS X: builds hang for 10-20 seconds due to localhost name resolution

Created on 18 Aug 2017 · 17 Comments · Source: bazelbuild/bazel

Running a clean build on Mac OS X Sierra hangs for 10-20 seconds before actually executing. Running jstack shows that it is stuck in NetUtil.findShortHostName, trying to resolve the localhost name. The following blog post shows that this is a common issue and suggests adding entries to /etc/hosts, which fixes the problem. Perhaps the result of this lookup should be cached by the blaze server?

Detailed description of this issue: https://thoeni.io/post/macos-sierra-java/

Bazel logs showing 20 seconds to do nothing

Bazel pauses for ~20 seconds before printing anything:

$ ~/bazel/bin/bazel build //prpcpython:test_prpc
INFO: Found 1 target...
Target //prpcpython:test_prpc up-to-date:
  bazel-bin/prpcpython/test_prpc
INFO: Elapsed time: 20.563s, Critical Path: 0.00s

The java.log showing the 20 second pause:

170818 13:54:31.431:I 1711 [com.google.devtools.build.lib.skyframe.LegacyLoadingPhaseRunner.execute] Target pattern evaluation finished
170818 13:54:31.431:I 1711 [com.google.devtools.build.lib.runtime.CacheFileDigestsModule.logStats] Accumulated cache stats before command: hit count=5, miss count=365, hit rate=0.013513513513513514, eviction count=0
170818 13:54:51.515:I 1711 [com.google.devtools.build.lib.buildtool.BuildTool.buildTargets] Configurations created
170818 13:54:51.516:I 1711 [com.google.devtools.build.lib.analysis.BuildView.update] Starting analysis

How to reproduce

Unfortunately, I'm not sure exactly what configuration triggers this problem, since my co-worker has the same OS version and doesn't run into it. However, for me the following steps reproduce the issue:

  1. Run a clean build of a single target, observe the elapsed time is always > 10 seconds.
  2. Run ping computername.local and observe that it also takes > 10 seconds.
  3. Add an entry in /etc/hosts to add a name and observe that the problem goes away.
  4. Turn on any of the sharing services in the system control panel and observe that the issue goes away.
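
Step 2 can also be checked from the JVM side, independent of Bazel. The following is a minimal sketch, assuming the delay comes from the JDK's local host lookup (InetAddress.getLocalHost()) as the linked blog post describes. The class name HostnameLookupTimer is made up for illustration and is not part of Bazel.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;

public class HostnameLookupTimer {

    /** Resolves the local host name and returns the elapsed time in milliseconds. */
    public static long timeLocalHostLookupMillis() {
        long start = System.nanoTime();
        try {
            // Assumption: this is the kind of lookup that stalls on affected machines.
            InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            // A failed lookup still counts; we only care how long it took.
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long elapsedMs = timeLocalHostLookupMillis();
        System.out.println("Local hostname lookup took " + elapsedMs + " ms");
    }
}
```

On an affected machine this would be expected to take several seconds on the first call; on a machine with the /etc/hosts workaround it should return almost immediately.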

Environment info

  • Operating System: Mac OS X Sierra (10.12.6)
  • Bazel version (output of bazel info release): release 0.5.4rc3
under investigation

All 17 comments

I can confirm this happened for two coworkers and me. We resolved by having Screen Sharing turned on, but obviously that's not a good fix for the underlying problem.

+1 Interested if someone can track this down!

/cc @c-parsons can you take a look?

also cc @jmmv for perf on mac

@mhlopko looked into this a while ago and I thought fixed it via https://bazel-review.googlesource.com/c/bazel/+/5432, but there may still be something left to do. If you still remember about this, do you have any other details?

Hmm I only fixed it for Workspace status action, but checking the callers of NetUtil.findShortHostName I see that it's being called from other places, in this context only LocalSpawnRunner seems relevant. I've never touched that part of the codebase, and a quick (<5 min) search failed to reveal any per-blaze-invocation calls to findShortHostName. So I don't know what's wrong here off the bat. I bet @philwo will know :)

I think we should resolve the local hostname once on server startup and print a warning message if that takes longer than e.g. 5 seconds, to make sure that people don't blame Bazel for what is essentially a local configuration or network issue. Then we should use that hostname consistently during server lifetime.

Rats, I thought I posted the entire stack trace, but apparently not. If it is helpful I can dig it up again. FWIW: I think @philwo's suggestion sounds perfect.

As @philwo educated me, LocalSpawnRunner is instantiated for every blaze invocation, so we found the source of the problem.

Does anybody have a nicer solution than a static field in NetUtil caching the hostname (and removal of the same approach from Workspace status action)? Plus the time measuring logic and a warning in case of slow lookup.
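
For discussion, here is a rough sketch of the caching idea: resolve once, time the lookup, warn if it was slow, and reuse the result afterwards. It is not the actual Bazel change. The class CachedHostName, its constructor parameters, and the warning text are all hypothetical, and the resolver is passed in as a Supplier only so the behaviour can be exercised without a real network lookup. The 5 second threshold is the figure suggested earlier in the thread.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public final class CachedHostName {
    private final Supplier<String> resolver;   // hypothetical: the slow lookup to wrap
    private final long warnThresholdMillis;
    private String cached;                     // guarded by "this"

    public CachedHostName(Supplier<String> resolver, long warnThresholdMillis) {
        this.resolver = resolver;
        this.warnThresholdMillis = warnThresholdMillis;
    }

    /** Returns the host name, resolving it only on the first call. */
    public synchronized String get() {
        if (cached == null) {
            long start = System.nanoTime();
            cached = resolver.get();
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (elapsedMs > warnThresholdMillis) {
                System.err.println("WARNING: resolving the local host name took "
                        + elapsedMs + " ms; check your network or /etc/hosts setup");
            }
        }
        return cached;
    }

    /** Default resolver; falls back to "unknown" so it never returns null. */
    public static String resolveLocalHostName() {
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        CachedHostName host = new CachedHostName(CachedHostName::resolveLocalHostName, 5000);
        System.out.println(host.get()); // may be slow on an affected machine
        System.out.println(host.get()); // served from the cache
    }
}
```

Holding one such instance for the lifetime of the server would mean only the first command pays for the lookup. Whether it lives in a static field or somewhere like CommandEnvironment is the open design question here.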

Can the hostname be added to CommandEnvironment?

We had this problem in Selenium, and did this:

https://github.com/SeleniumHQ/selenium/blob/master/java/client/src/org/openqa/selenium/WebDriverException.java#L42-L110

(Effectively @philwo's suggestion of caching the lookup)

Setting the machine name in System Preferences appears to have resolved this issue for me, but only temporarily.

Great to see this fixed! What release do you think it'll make it into?

It is not in the current rc, so it will be in next month's release. With the
coming change to our release process, it will be cut on 10/01 and released
around the end of October (this is subject to change).

For anybody who has added an entry to /etc/hosts but still sees the hang: add an IPv6 entry as well, like this:
127.0.0.1 MacBook-Pro.local
::1 MacBook-Pro.local
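
To verify that both entries are actually present, a small check along these lines may help. This is only a sketch: the class HostsFileCheck and its method are hypothetical, the parsing handles just the simple "address name..." layout with # comments, and MacBook-Pro.local is the example name from above, so pass your own machine's name as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public final class HostsFileCheck {

    /** Returns true if hostName is mapped to both 127.0.0.1 and ::1 in the given lines. */
    public static boolean hasBothLoopbackEntries(List<String> hostsLines, String hostName) {
        boolean ipv4 = false;
        boolean ipv6 = false;
        for (String line : hostsLines) {
            // Strip trailing comments and surrounding whitespace.
            int hash = line.indexOf('#');
            String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (content.isEmpty()) {
                continue;
            }
            String[] fields = content.split("\\s+");
            boolean namesHost = false;
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(hostName)) {
                    namesHost = true;
                }
            }
            if (!namesHost) {
                continue;
            }
            if (fields[0].equals("127.0.0.1")) {
                ipv4 = true;
            } else if (fields[0].equals("::1")) {
                ipv6 = true;
            }
        }
        return ipv4 && ipv6;
    }

    public static void main(String[] args) throws IOException {
        String hostName = args.length > 0 ? args[0] : "MacBook-Pro.local";
        List<String> lines = Files.readAllLines(Paths.get("/etc/hosts"));
        System.out.println(hostName + " has IPv4 and IPv6 loopback entries: "
                + hasBothLoopbackEntries(lines, hostName));
    }
}
```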

There was a bug in the first commit to resolve this...make sure you have https://github.com/bazelbuild/bazel/commit/0bc9b3e14f305706d72180371f73a98d6bfcdf35 if you're still seeing the issue.

@jgavris sorry for not making it clear. I meant this for anybody still using a released version that doesn't have the bug fixed. The /etc/hosts workaround requires both the IPv4 and IPv6 addresses to be added.
