Bazel: Feature request: remote cache: Don't wait for uploads to finish

Created on 2 Mar 2018 · 2 Comments · Source: bazelbuild/bazel

Description of the problem / feature request:

It would be nice if uploads to the remote cache were non-blocking: the rest of the build would proceed while an upload was in progress, and when the build finished, the command prompt would return even if uploads were still running in the background.

Feature requests: what underlying problem are you trying to solve with this feature?

For people with slow or moderate-speed Internet connections, uploading to the Bazel remote cache can make builds dramatically slower.

For example, at home my upload speed is 11 megabits per second. If my build creates, say, a 50MB file (such as the output of @bazel_tools//tools/jdk:gen_platformclasspath), the upload takes about 36 seconds. Those 36 seconds are "wasted" for me, since the upload only serves the cache.
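
The 36-second figure follows directly from the file size and the uplink speed. A small sketch of the arithmetic, assuming decimal megabytes and ignoring protocol overhead and latency (the function name is only illustrative):

```python
def upload_seconds(size_megabytes: float, uplink_megabits_per_s: float) -> float:
    """Estimate upload time, ignoring protocol overhead and latency."""
    size_megabits = size_megabytes * 8  # 1 byte = 8 bits
    return size_megabits / uplink_megabits_per_s


# 50 MB over an 11 Mbit/s uplink: 400 / 11, roughly 36 seconds
print(round(upload_seconds(50, 11), 1))
```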

I realize this is a very tall order, so it may simply be too hard:

  • without some kind of limits in place, the local machine could quickly develop a huge backlog of data to upload
  • to deal with that, it might make sense to have the option to "drop" some uploads (since it's just a cache)
  • you would probably have to copy the local files to another location before uploading, and delete the copies afterward, to avoid having a file change during upload
  • currently, the foreground bazel, not the daemon, does the uploading, so returning to the command prompt before finishing the upload would require changing that somehow
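
The first three points (a bounded backlog, dropping uploads when it is full, and uploading from a private copy) could be sketched roughly as below. This is not how Bazel works internally; `BackgroundUploader`, `max_pending`, and the `upload` callback are hypothetical names, and the callback stands in for whatever actually sends a file to the remote cache.

```python
import itertools
import queue
import shutil
import tempfile
import threading
from pathlib import Path
from typing import Callable


class BackgroundUploader:
    """Hypothetical helper (not part of Bazel): bounded, droppable upload backlog."""

    def __init__(self, upload: Callable[[Path], None], max_pending: int = 8) -> None:
        self._upload = upload  # assumed to send one file to the remote cache
        self._pending = queue.Queue(maxsize=max_pending)  # caps the backlog
        self._staging = Path(tempfile.mkdtemp(prefix="upload-staging-"))
        self._ids = itertools.count()
        self.dropped = 0
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, artifact: Path) -> bool:
        """Queue an artifact without blocking; return False if it was dropped."""
        if self._pending.full():  # skip the copy when the backlog is already full
            self.dropped += 1
            return False
        # Snapshot the file so a later build cannot change it mid-upload.
        snapshot = self._staging / f"{next(self._ids)}-{artifact.name}"
        shutil.copy2(artifact, snapshot)
        try:
            self._pending.put_nowait(snapshot)
        except queue.Full:  # another producer filled the queue in the meantime
            snapshot.unlink()
            self.dropped += 1
            return False
        return True

    def _run(self) -> None:
        while True:
            snapshot = self._pending.get()
            try:
                self._upload(snapshot)
            except Exception:
                pass  # it is only a cache: a failed upload costs a later cache miss
            finally:
                snapshot.unlink(missing_ok=True)  # delete the staged copy
                self._pending.task_done()

    def drain(self) -> None:
        """Block until every queued upload has been attempted."""
        self._pending.join()
```

The worker is a daemon thread, so in this sketch pending uploads are lost if the process exits without calling `drain()`. That is the same problem as the last bullet: something long-lived has to own the queue.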

Probably a much more feasible solution than modifying Bazel itself is to have a proxy that sits between Bazel and the actual remote cache. Bazel would think it had uploaded successfully; the (long-lived) proxy would upload to the actual remote cache in the background and would handle the issues above. But nonetheless, I wanted to offer this as a feature suggestion.
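
To make the proxy idea concrete, here is a minimal write-behind sketch using only the Python standard library. It is an illustration under several assumptions, not a description of any existing tool: `make_proxy` and its parameters are hypothetical, clients are assumed to send a Content-Length header on PUT, request bodies are held in memory, and failed or overflowing uploads are silently dropped.

```python
import queue
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_proxy(upstream: str, port: int = 8080, max_pending: int = 64):
    """Hypothetical write-behind proxy; returns (server, pending_queue)."""
    pending = queue.Queue(maxsize=max_pending)
    # Talk to the upstream cache directly, ignoring any system proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def upload_worker() -> None:
        while True:
            path, body = pending.get()
            try:
                request = urllib.request.Request(upstream + path, data=body, method="PUT")
                opener.open(request, timeout=60).close()
            except OSError:
                pass  # failed uploads are dropped; the entry is simply not cached
            finally:
                pending.task_done()

    class Handler(BaseHTTPRequestHandler):
        def do_PUT(self) -> None:
            # Assumes the client sends Content-Length (no chunked bodies).
            body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
            try:
                pending.put_nowait((self.path, body))
            except queue.Full:
                pass  # backlog full: drop this upload
            self.send_response(200)  # acknowledge before the real upload happens
            self.send_header("Content-Length", "0")
            self.end_headers()

        def do_GET(self) -> None:
            try:
                with opener.open(upstream + self.path, timeout=60) as response:
                    data = response.read()
            except OSError:
                self.send_error(404)  # treat any upstream failure as a cache miss
                return
            self.send_response(200)
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, *args) -> None:
            pass  # keep the sketch quiet

    threading.Thread(target=upload_worker, daemon=True).start()
    return ThreadingHTTPServer(("127.0.0.1", port), Handler), pending
```

Calling `make_proxy(...)` with the real cache URL and then `serve_forever()` on the returned server would start it; Bazel's `--remote_http_cache` flag would then point at `http://127.0.0.1:8080` instead of the real cache. Because the proxy is a separate long-lived process, the build can return to the prompt while uploads continue.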

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

BUILD file:

genrule(
    name = "main",
    srcs = ["in.txt"],
    outs = ["out.txt"],
    cmd = "cp $(SRCS) $@",
)

Set up a remote cache. Then, at the command line, generate a 50MB dummy file and run Bazel, which will end up uploading a 50MB build artifact to the remote cache:

$ dd if=/dev/urandom of=in.txt count=50000 bs=1000
$ time bazel build --remote_http_cache=... --experimental_remote_spawn_cache :main

What operating system are you running Bazel on?

macOS 10.13.3

What's the output of bazel info release?

0.11.0

Have you found anything relevant by searching the web?

I searched GitHub issues and Google for "bazel upload wait".

P2 team-Remote-Exec feature request

All 2 comments

Hi, I am starting to implement this behavior in https://github.com/buchgr/bazel-remote due to popular demand.

We (Asana) have open-sourced our Bazel S3 cache which has this background-upload behavior: https://github.com/Asana/bazels3cache
