When the remote cache times out, I would have expected the --remote_local_fallback flag to make Bazel fall back to building locally. However, that is not what I'm seeing.
My .bazelrc:
build --spawn_strategy=standalone
build --genrule_strategy=standalone
build --experimental_objc_enable_module_maps
build --features swift.no_generated_module_map
build --symlink_prefix=build/
build --xcode_version 9.4.1
build --remote_local_fallback_strategy=local
build --remote_http_cache=$URL
and I'm building by running:
time bazel build --jobs 128 //:Learning
I'm using Bazel built from GitHub as of today (4cba428ddb), but I can also reproduce this with 0.17.1.
The error I get is:
ERROR: /Users/obonilla/r/learning-ios_trunk/Pods/LISemaphoreLib/BUILD.bazel:2:1: C++ compilation of rule '//Pods/LISemaphoreLib:LISemaphoreLib_ObjC' failed: Unexpected IO error.: Exhausted retry attempts (0)
Target //:Learning failed to build
We're seeing this on CI a lot, too, and it's causing jobs to fail.
Example:
https://buildkite.com/bazel/google-bazel-presubmit/builds/9435#b624626c-2ed5-442d-9da6-e1fbaf3f7358
Couldn't build file src/bazel: Executing genrule //src:bazel-bin failed: Unexpected IO error.: Exhausted retry attempts (0)
@buchgr We might have to disable remote caching on CI if we can't come up with a fix tomorrow. :(
Ping @ola-rozenfeld - any idea what might cause this and/or how we could fix it?
Yes, this is my bad, I'm very sorry. This regression was introduced by https://github.com/bazelbuild/bazel/pull/5917. Note how in this change we rethrow all exceptions that are not cache misses, instead of warning on them: https://github.com/bazelbuild/bazel/pull/5917/files#diff-1bc490926cdfbdaf2b6d5292238dfb97R190. Will send a fix right away, sorry! (FYI @werkt)
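To illustrate the behavior being restored, here is a minimal Python sketch (not Bazel's actual Java code). The names `CacheMiss`, `CacheIOError`, and `run_action` are hypothetical and exist only for this example. It shows a cache lookup where a non-miss error produces a warning and falls through to local execution instead of being rethrown.

```python
import warnings


class CacheMiss(Exception):
    """Hypothetical: the remote cache has no entry for this action."""


class CacheIOError(Exception):
    """Hypothetical: a timeout, connection failure, or other IO error."""


def run_action(key, cache_lookup, execute_locally):
    """Return the cached result, or execute locally if the cache cannot serve it."""
    try:
        return cache_lookup(key)
    except CacheMiss:
        pass  # An ordinary miss: nothing to report.
    except CacheIOError as e:
        # Warn instead of rethrowing, so a flaky cache does not fail the build.
        warnings.warn(f"remote cache error for {key}: {e}; executing locally")
    return execute_locally(key)
```

With the regression described above, the second `except` branch would effectively re-raise, which is why a timeout surfaced as a build failure.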
Btw, local fallback flags are not relevant to local execution mode, since we always execute locally if we execute at all.
It is up to the Bazel team to decide which classes of exceptions should not be silently ignored and funnelled into cache misses. To that end, part of the decision could be delegated to the user: they could specify whether to accept timeouts, unavailable caches, auth errors, etc., all as cache misses, or only some configured set of these, and how noisily.
"silently ignored"
Not silently, it did print a warning before. I think that's still the right thing to do -- if the user sees lots of warnings, they will realize the remote cache is not working at all (they will also see it in the final status of number of cache hits). Not sure how we could allow more finely grained control.
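As a rough idea of what finer-grained control could look like, here is a Python sketch under stated assumptions. `FailureKind`, `RemoteCacheFailure`, and `lookup_or_none` are all hypothetical names, and nothing here corresponds to an existing Bazel flag. The user supplies the set of failure kinds they are willing to treat as cache misses; anything else still fails.

```python
from enum import Enum


class FailureKind(Enum):
    """Hypothetical classes of remote cache failure."""
    TIMEOUT = "timeout"
    UNAVAILABLE = "unavailable"
    AUTH = "auth"


class RemoteCacheFailure(Exception):
    """Hypothetical error carrying the kind of failure that occurred."""

    def __init__(self, kind, message=""):
        super().__init__(message or kind.value)
        self.kind = kind


def lookup_or_none(key, cache_lookup, tolerated, log=print):
    """Return the cached value, or None (a miss) for tolerated failure kinds.

    Failure kinds outside `tolerated` are re-raised and fail the caller.
    """
    try:
        return cache_lookup(key)
    except RemoteCacheFailure as e:
        if e.kind in tolerated:
            log(f"WARNING: treating {e.kind.value} for {key} as a cache miss")
            return None
        raise
```

For example, passing `tolerated={FailureKind.TIMEOUT, FailureKind.UNAVAILABLE}` would let a flaky cache degrade to local builds while an auth error still stops the build.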
What I think would help is some statistics at the end of the build that can be tracked. They should include remote cache failures. I think the number of cache hits alone doesn't really convey the right information. If I see a sudden drop in cache hits, was it because the code changed a lot? Maybe the toolchain changed? I would rather see remote cache failures explicitly listed.
I agree, let's file a separate feature request for a better unified approach to tracking the various remote failures (caching, execution, BEP). Currently, we do different things for all three.
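The end-of-build statistics discussed above could be sketched as follows in Python. `RemoteCacheStats` and the summary format are hypothetical and are not Bazel's actual output. The point is only that failures are counted separately from misses, so a drop in hits can be attributed to the right cause.

```python
from collections import Counter


class RemoteCacheStats:
    """Hypothetical per-build tally of remote cache outcomes."""

    def __init__(self):
        self.counts = Counter()

    def record(self, outcome):
        """Record one lookup outcome: 'hit', 'miss', or 'failure'."""
        self.counts[outcome] += 1

    def summary(self):
        """Return a one-line summary suitable for printing at the end of a build."""
        c = self.counts
        return (
            f"remote cache: {c['hit']} hits, "
            f"{c['miss']} misses, {c['failure']} failures"
        )
```

A build that saw many failures and few misses would point at the cache server, whereas many misses with no failures would point at changed inputs or a changed toolchain.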
@ola-rozenfeld, how is the remote cache timeout determined or set? I see that my cache server answered the GET request with "200", but I still hit "Unexpected IO error.: Exhausted retry attempts".
Are you running with my patch? Can you please run with --verbose_failures?
The timeout is set by the --remote_timeout option, the default is a minute.
It looks like a potential fix was cherrypicked into the 0.18 RCs. Is a fixed 0.17 patch release likely?