Bazel: Allow using remote cache for repository cache

Created on 11 Oct 2018 · 15 Comments · Source: bazelbuild/bazel

Description of the problem:

The --repository_cache flag saves a lot of time that would otherwise be wasted re-downloading third-party Maven jars and http_archive dependencies we have already fetched.

The problem is that on stateless build servers (like GCB) this feature doesn't really help, as the disk gets reset on each build. With many external binary dependencies coming from remote sources, just downloading everything can add expensive minutes to every build.
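
For reference, this is roughly how the disk cache is wired up today; a minimal sketch, assuming a hypothetical /var/cache/bazel/repository-cache directory that happens to survive between builds:

# .bazelrc on the build machine
build --repository_cache=/var/cache/bazel/repository-cache

On a stateless CI worker that directory starts out empty on every run, so the flag buys us nothing there.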

Feature requests:

If the execution is already using a read/write remote cache, it only makes sense to use that remote cache instead of the local disk.

Have you found anything relevant by searching the web?

See discussion here:

A different idea is to use GCS:


Several mitigations are available (see the sketch after this list):

  • If the build server allows it, enable a persistent folder shared between builds.
  • Add a step that syncs that folder down from storage before the build and back up to storage after the build.
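
A minimal sketch of that second mitigation, assuming a hypothetical GCS bucket gs://my-ci-cache and the gsutil CLI on the build machine:

# Pull the repository cache down from object storage before the build
gsutil -m rsync -r gs://my-ci-cache/repository-cache /var/cache/bazel/repository-cache || true

bazel build //... --repository_cache=/var/cache/bazel/repository-cache

# Push newly fetched archives back up after the build
gsutil -m rsync -r /var/cache/bazel/repository-cache gs://my-ci-cache/repository-cache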

Of course, ideally, if we are also using remote execution backed by the same cache, the most efficient thing would be to not download everything from the remote cache to the host environment at an early stage at all: most of the binaries are never used in that environment, only on the remote workers, which already have access to the cache.

CC: @buchgr and @aehlig

P3 area-ExternalDeps team-XProduct feature request

All 15 comments

Also quoting @buchgr

I would love for this to be implemented as part of a community contribution :-) and would be happy to work closely with anyone willing to take on this task!

@or-shachar have you made any progress on this? I know you investigated implementing it? :-)

Is anyone working on this?

Not that I am aware of! Feel free to pick it up :-).

This would be a great feature for stateless builds.

There's a Remote Repository Cache proposal from @jmillikin-stripe in https://github.com/bazelbuild/proposals, but it's still in draft state.

@jmillikin-stripe any updates on the proposal or pointers for how people could help out?

There's a draft implementation of the .proto at https://github.com/bazelbuild/bazel/pull/8782, and I'm currently awaiting review from a Bazel core maintainer before I start writing the implementation.

Our team is very keen to see this progressed.

We've been struggling with flaky CI for a while now due to "connect timed out" errors occurring randomly across many different external hosts, similar to this:

java.io.IOException: Error downloading [https://nodejs.org/dist/v10.16.0/node-v10.16.0-linux-x64.tar.xz] to /home/runner/.cache/bazel/_bazel_runner/f2e96da83c9a9bca36350376aeb4df02/external/nodejs_linux_amd64/bin/nodejs/node-v10.16.0-linux-x64.tar.xz: connect timed out

To alleviate this we've tarred up our external folder and stored it on an internal file server, which we download and extract before CI. While this reduces the flakiness, it is rather manual whenever an external repository is updated.
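
For anyone curious, a rough sketch of that workaround; the file-server URL is a placeholder:

# After a successful warm build, snapshot the fetched external repositories
tar -C "$(bazel info output_base)" -czf external-deps.tar.gz external
curl -T external-deps.tar.gz https://fileserver.internal/bazel/external-deps.tar.gz

# On CI, restore the snapshot before building
curl -o external-deps.tar.gz https://fileserver.internal/bazel/external-deps.tar.gz
tar -C "$(bazel info output_base)" -xzf external-deps.tar.gz

The snapshot has to be rebuilt by hand whenever an external repository changes, which is the manual part mentioned above.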

We are currently trying to agree on an API. Here's a proposal similar to @jmillikin-stripe's that we are currently discussing: https://docs.google.com/document/d/10ari9WtTTSv9bqB_UU-oe2gBtaAA7HyQgkpP-RFP80c/edit?disco=AAAADULntWg&ts=5d5eecc1

Any progress/updates?

https://github.com/bazelbuild/bazel/pull/10622 is a proposed implementation of the most recent proposal.

Thank you for the PR @jmillikin-stripe

PR #10622 implemented the --experimental_remote_downloader parameter.

How do I use this parameter? @jmillikin-stripe
Could you please give an example?
Does this feature depend on a gRPC cache server?


Ignore the above.
I found the answer on this page: https://github.com/buchgr/bazel-remote

Experimental Remote Asset API Support
There is (very) experimental support for a subset of the Fetch service in the Remote Asset API which can be enabled with the --experimental_remote_asset_api flag.

To use this with Bazel, specify --experimental_remote_downloader=grpc://replace-with-your.host:port.
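
Putting the two flags together, a minimal sketch of the client side; the host and port are placeholders for your own bazel-remote instance:

bazel build //... \
  --remote_cache=grpc://cache.example.com:9092 \
  --experimental_remote_downloader=grpc://cache.example.com:9092

As the transcript further down shows, the same bazel-remote endpoint can serve both roles, provided the Remote Asset API is enabled on the server (see the YAML note at the end of the thread).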

PR #10622 provides the --experimental_remote_downloader parameter.

From the doc, I think the current protocol between the Bazel client and the remote cache is gRPC.
But we have deployed multiple remote-cache servers behind a cluster of Tengine instances, which can only forward HTTP requests.
@jmillikin-stripe Could you please provide an implementation over HTTP, so we can use this feature without any changes?

Our current topology is:
DNS (global host) ---> LVS cluster ---> Tengine cluster ---> bazel-remote-cache cluster ---> backend storage (Aliyun OSS, similar to S3 or GCS)

The Tengine layer is what we need for tracing and high availability (retry 3 times). A remote-cache server can be restarted at any time without users noticing.

I'm not planning to implement an HTTP version of the remote downloader code. Getting the gRPC version into Bazel took a large amount of work, and I do not have time to do the same for HTTP.

According to https://github.com/alibaba/tengine/issues/672, Tengine supports HTTP/2. I believe you could use it to proxy gRPC, because gRPC is built directly on the HTTP/2 protocol. The Tengine changelog says gRPC is available in versions 2.3.0 and later. This would require adding gRPC handlers to your bazel-remote-cache implementation.
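
To illustrate, a rough nginx-style sketch of what the Tengine side might look like, assuming its gRPC proxying follows upstream nginx's grpc_pass directive; upstream addresses and ports are placeholders:

upstream bazel_remote_grpc {
    server 10.0.0.11:9092;
    server 10.0.0.12:9092;
}

server {
    # gRPC rides on HTTP/2, so the listener must speak HTTP/2
    listen 9092 http2;

    location / {
        grpc_pass grpc://bazel_remote_grpc;
    }
}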

Thank you for the solution @jmillikin-stripe

Now I am trying the new feature, but I got an error.

bazel-remote was built from today's source; Bazel version 3.3.1.


I tested it with a very simple demo project written in C++.

The first time I built the project with the new parameter, it worked. At that point the /home/admin/.cache/bazel/_bazel_admin/cache directory was old and already contained some files.

bazel build //...  --experimental_remote_downloader=grpc://127.0.0.1:9092  --remote_cache=grpc://127.0.0.1:9092 
INFO: Invocation ID: 5a478b83-133c-40ef-924c-f2dc03ef06e5
INFO: Analyzed 3 targets (21 packages loaded, 307 targets configured).
INFO: Found 3 targets...
INFO: Elapsed time: 0.692s, Critical Path: 0.16s
INFO: 22 processes: 22 remote cache hit.
INFO: Build completed successfully, 36 total actions

Then I deleted all the cache content from the local disk:

rm -rf /home/admin/.cache/bazel/_bazel_admin

then built again. This time I got an error:

bazel build //...  --experimental_remote_downloader=grpc://127.0.0.1:9092  --remote_cache=grpc://127.0.0.1:9092 
INFO: Invocation ID: 66db7d37-46d8-4369-93e2-e7e8a18ab74b
INFO: Repository rules_cc instantiated at:
  no stack (--record_rule_instantiation_callstack not enabled)
Repository rule http_archive defined at:
  /home/admin/.cache/bazel/_bazel_admin/c6a5c929336b1584a43833c19bad1c7a/external/bazel_tools/tools/build_defs/repo/http.bzl:336:31: in <toplevel>
ERROR: An error occurred during the fetch of repository 'rules_cc':
   java.io.IOException: io.grpc.StatusRuntimeException: UNIMPLEMENTED: unknown service build.bazel.remote.asset.v1.Fetch
ERROR: While resolving toolchains for target //:function_2_test: com.google.devtools.build.lib.packages.RepositoryFetchException: no such package '@rules_cc//cc': java.io.IOException: io.grpc.StatusRuntimeException: UNIMPLEMENTED: unknown service build.bazel.remote.asset.v1.Fetch
ERROR: Analysis of target '//:function_2_test' failed; build aborted: com.google.devtools.build.lib.packages.RepositoryFetchException: no such package '@rules_cc//cc': java.io.IOException: io.grpc.StatusRuntimeException: UNIMPLEMENTED: unknown service build.bazel.remote.asset.v1.Fetch
INFO: Elapsed time: 0.101s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 0 targets configured)
    currently loading: @bazel_tools//tools/cpp

Finally, I found the reason in the bazel-remote startup log:

experimental gRPC remote asset API: disabled

I had missed one configuration option. In the YAML config file, add this parameter:

experimental_remote_asset_api: true
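
For completeness, a minimal sketch of a bazel-remote YAML config with the asset API turned on; directory, size, and ports are placeholders, and the exact option names should be checked against the bazel-remote README for your version:

dir: /var/cache/bazel-remote
max_size: 50                          # cache size limit, in GiB
port: 8080                            # HTTP listener
grpc_port: 9092                       # gRPC listener (remote cache + Remote Asset API)
experimental_remote_asset_api: true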
