Gradle: Let multiple containers share downloaded dependencies

Created on 14 Nov 2016  ·  115 Comments  ·  Source: gradle/gradle

Hi!

Looks like Gradle is locking the global cache when running the tests. We run Gradle in Docker containers, and from what I saw in the logs, it fails to acquire the lock with:

14:09:05.981 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] The file lock is held by a different Gradle process (pid: 1, operation: ). Will attempt to ping owner at port 39422

Expected Behavior

Gradle should release the lock while it executes the tests. Other Gradle instances are failing with:

Timeout waiting to lock Plugin Resolution Cache (/root/.gradle/caches/3.2/plugin-resolution). It is currently in use by another Gradle instance.

Current Behavior

Gradle holds the lock

Context

Our CI servers are affected. Parallel builds are impossible.

Steps to Reproduce

I managed to reproduce it with a simple Docker-based environment:
https://github.com/bsideup/gradle-lock-bug

Docker and Docker Compose should be installed.

$ git clone [email protected]:bsideup/gradle-lock-bug.git
$ COMPOSE_HTTP_TIMEOUT=7200 docker-compose up --no-recreate

Your Environment

Gradle 3.2 (tried with 3.1, 3.0 and 2.12 as well)
Docker

Labels: feature, 3.2, contributor, dependency-management

Most helpful comment

Huge disk space and network usage. We will have to download the same dependencies for every Gradle job type. Right now the Gradle cache takes a few GBs, but if we don't share it, we will have to multiply that by the number of Gradle-based jobs we have, so the result will be tens or maybe even hundreds of GBs, which is not really acceptable for us.

All 115 comments

I don't quite understand the use case yet. Are you running several builds at the same time on the same working directory? That will give you many other odd problems besides just the .gradle directory being locked. Just think about what happens when one of those builds runs a clean while another is trying to compile.

If you want to do different builds on the same project at the same time, I'd recommend using separate checkouts for that.

Just a minor piece of terminology: The .gradle directory inside your project is the local cache. The global caches are in the user home by default.

@oehme
We don't; I just reused the same project source to demonstrate the issue. We run different projects inside the containers at the same time, with the .gradle folder shared across them (think of a CI environment)

Got it, thanks. The Gradle user home cannot be shared between different machines. Why do you want to share it? It'll just create contention between your builds, even if this specific issue was solved.

@oehme well, it's a bit hard to define "machine" here.
We use Docker containers, on the same host.

Having to create a separate .gradle for each project sounds a bit expensive and breaks the concept of the shared global cache.

A docker container is a machine for that matter. Its processes are isolated from the host system.

> Having to create a separate .gradle for each project sounds a bit expensive and breaks the concept of the shared global cache.

I don't really understand the use case I guess. What is the reason to run the builds in docker containers, but share the user home? If you don't trust the code, then it absolutely should not have write access to the host's user home. If you trust it, then what do the docker containers buy you?

@oehme it's not about the security.

We use Docker containers as a unified way to run different kinds of builds in our CI process, different projects might want to use different Java versions, for instance

This is a pretty common thing nowadays, I should say. Jenkins is promoting Docker builds a lot, and others are integrating Docker containers as well.

I understand the problem with "the multiple machines issue". However, this issue is more about "Why does the test executor hold a lock for so long?", because AFAIK locks in Gradle are short-lived things.

Gradle processes will hold locks if they are uncontended (to gain performance). Contention is announced through inter-process communication, which does not work when the processes are isolated in Docker containers.

Hm, back in the day it was different - Gradle tried to release the lock as soon as possible, and I really liked that strategy. What happened? :)

The cross-process caches use file-based locking, so every lock/unlock operation is an I/O operation. Since these are expensive, we try to avoid them as much as possible.
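As a rough illustration of that cost model (this uses flock(1) purely for demonstration; Gradle's actual implementation is DefaultFileLockManager and does not use flock):

# Run this in two shells: the second blocks until the first releases the lock,
# or gives up after the timeout - analogous to "Timeout waiting to lock ...".
(
  flock --wait 60 200 || { echo "Timeout waiting to lock demo cache"; exit 1; }
  echo "lock acquired; working against the shared cache"
  sleep 30   # simulate work done while holding the lock
) 200>/tmp/demo-cache.lock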

Any chance of making this configurable? I would really like to disable this optimization in our CI environments. Otherwise, we just delete the lock file manually to work around the issue while long-running tests are being executed :D

We could potentially add a system property that tells Gradle to "assume contention". There might be other issues that we haven't yet discovered though, since sharing a user home between machines is not a use case we have designed for.

I'd like to assess the alternatives first: What would be the drawback if you don't share the user home?

Huge disk space and network usage. We will have to download the same dependencies for every Gradle job type. Right now the Gradle cache takes a few GBs, but if we don't share it, we will have to multiply that by the number of Gradle-based jobs we have, so the result will be tens or maybe even hundreds of GBs, which is not really acceptable for us.

I think the best next step would be for you to implement a fix for that specific problem and try it out in your environment.

My gut feeling is that there may be other issues waiting when you try to reuse the user home. If there aren't, then we could discuss introducing a flag into Gradle to opt-in to a "docker mode" :)

@oehme ok, thanks for the link! I'll try to play around with it and will report back.

Also, there is one more option - on *nix-based systems, Gradle could use Unix domain sockets to communicate. That way it should work, since Docker allows us to mount a socket inside a container.

WDYT?

That could work as well. Let's first make sure though that the locking problem is in fact the only problem here.

@bsideup Did you fix this? I am currently facing this issue with the same kind of setup as yours...
At least, it would be nice to have an option to set the timeout.

Another use case is when a user runs multiple different builds of different projects on multiple different hosts, all using his/her account. This is typical of environments with network mounted home directories.

Gradle has to pro-actively release the lock as soon as it is done with the cache. I am willing to pay the price of an IO operation to save the build from a timeout. Please see the excellent explanation in this GRADLE-3106 comment.

Just want to explain how to reproduce this problem by posting a simple build.gradle file:

task sleep() {
    doLast {
        Thread.sleep(100000)
    }
}
Get two terminals on different hosts that mount the same home directory with the same `~/.gradle` in it, then type `gradle sleep --debug --stacktrace` in both terminals. One of them will fail to acquire the lock and die waiting. The failing one will show:

The file lock is held by a different Gradle process (pid: 64549, operation: ). Will attempt to ping owner at port 40291

Of course the other process cannot be notified, it is on another host, resulting in:

Caused by: org.gradle.cache.internal.LockTimeoutException: Timeout waiting to lock file hash cache (/home/martinda/.gradle/caches/3.5/fileHashes). It is currently in use by another Gradle instance.
Owner PID: 64549
Our PID: 25504
Owner Operation:
Our operation:
Lock file: /home/martinda/.gradle/caches/3.5/fileHashes/fileHashes.lock
Could it be as simple as adding the IP address of the process holding the lock to the lock file and add it to the pingOwner method?

My team is also encountering this issue when dealing with containerized CI builds, forcing us to keep many copies of the Gradle cache. We'd love to see an option to aggressively release the lock.

FYI:
For us, the workaround was to run the container where Gradle runs with --net=host; this way Gradle is able to communicate with the other instances.
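For example, a minimal sketch of that (the gradle:3.5 image name and mount paths are illustrative, not from the thread):

docker run --rm \
  --net=host \
  -v "$HOME/.gradle":/home/gradle/.gradle \
  gradle:3.5 gradle build    # image and paths are examples

Sharing the host's network namespace means the "will attempt to ping owner at port ..." contention protocol can reach Gradle processes running in sibling containers.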

My workaround is setting up a maven repository acting as a proxy (and cache) and loading some init.gradle in the build container.
This allows us to keep one cache instead of multiple ones, and there is no conflict.

@bsideup That sounds good, though I don't know if I'll be able to convince my CI runner to do that ... I'll try it out.

@saiimons Could you go into more detail about your workaround?

@AdrianAbraham I run a Nexus repository with a proxy configuration for the major maven repositories (jcenter, maven central, etc.; check your log for the URLs).

[screenshot: Nexus proxy repositories]

All these guys go behind a group, in order to use a single URL:

[screenshot: Nexus repository group]

Then my build container loads an init.gradle file (in /opt/gradle/init.d/, as I am using this image for CI):

allprojects {
  buildscript {
    repositories {
      mavenLocal()
      maven {
        url "http://nexus:8081/repository/global_proxy/"
      }
    }
  }
  repositories {
    mavenLocal()
    maven {
      url "http://nexus:8081/repository/global_proxy/"
    }
  }
}

@saiimons We're running a local Nexus, and our configuration is similar (no mavenCentral(), though); but Gradle still has to download the packages from Nexus into its own cache to run a build. Does your setup just avoid Internet downloads? Or does it avoid the Gradle cache itself?

Yes, Gradle downloads the packages, but as the storage is local, the latency is small and no external bandwidth is consumed.
My build sped up and the need for external resources was reduced (we were able to build when DNS servers were DDoSed in October and most of the repositories were unreachable).

We're doing something similar, but using automatically-generated docker containers for each branch (this is handled by Jenkins 2.0 pipelines and it's _somewhat_ flexible). We still have to download 2GB or so worth of jars for every build of every branch, and keep storage for all of these until Jenkins auto-cleans them a few days later.

It's not a deal breaker, but it's a serious inconvenience. I could live with an occasional contention on the folders if it made the downloads unnecessary.

Is there any way to configure the timeout for the cache?

We used to solve this issue by generating a base image that cached most of the dependencies. This base image was rebuilt every day; thus the actual difference between current and cached dependencies stayed small.
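That scheme can be as small as a Dockerfile that resolves dependencies at image build time; a sketch, where the base image, the copied files, and the warm-up task are all assumptions:

cat > Dockerfile <<'EOF'
FROM gradle:jdk8
COPY build.gradle settings.gradle /warmup/
WORKDIR /warmup
# Resolve the project's dependencies once so they end up in the image's cache
RUN gradle --no-daemon dependencies
EOF
docker build -t ci-base:nightly .    # rebuild this tag daily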

In our new project, however, we are facing the same issue as @bsideup, as we want to share the cache between containers (the above scheme works for this). At the same time we are limited by the 'concurrent builds cannot use the same cache' aspect. Actually, the real issue is not even the cache size or the network usage, but the serialized download of artifacts. We have a 10G connection between the servers (I even put the repos on a RAM disk), yet network utilization is low because the dependencies are downloaded sequentially. Probably even over different TCP connections, resulting in lots of time wasted on TCP ramp-up? This may be a result of using Maven repos. The real solution would probably be a server-side resolver framework that could return all the artifact URLs at once.

> Actually, the real issue is not even the cache size or the network usage, but the serialized download of artifacts.

Gradle 4.0-milestone-2 downloads metadata and artifacts in parallel, you might wanna give that a try.

> Gradle 4.0-milestone-2 downloads metadata and artifacts in parallel, you might wanna give that a try.

Definitely very promising. Once 4.0 comes out, I'll push for the upgrade.

Hello,
We have a similar setup: docker containers as jenkins slaves.
We're trying to have the gradle user home as a mounted volume on the docker container so all containers reuse the gradle cache and avoid downloading the common 3pp dependencies.

Using Gradle 4 RC 3 still does not fix the issue:
FAILURE: Build failed with an exception.

  • What went wrong:
    Could not create service of type FileHasher using GradleUserHomeScopeServices.createCachingFileHasher().
    > Timeout waiting to lock file hash cache (/srv/jenkins/workspace/.gradle/caches/4.0-rc-3/fileHashes). It is currently in use by another Gradle instance.
    Owner PID: 170
    Our PID: 170
    Owner Operation:
    Our operation:
    Lock file: /srv/jenkins/workspace/.gradle/caches/4.0-rc-3/fileHashes/fileHashes.lock

As of now, we need to set a unique Gradle user home per container, so each build downloads all dependencies.
We're not able to set --net=host, as we have some services in the container that are used as part of integration tests (postgres) and we would need different ports per container.

We see this happen periodically in our Jenkins build fleet as well.

The general use pattern is:

  • 1 executor per node (so there are not multiple builds executing in parallel, or multiple Gradle builds sharing the same home directory)
  • Home directory is (currently) shared across multiple builds and Gradle versions
  • All build steps are run inside of a container
  • User may run many shell commands, ./gradlew, background shell scripts, etc, inside of that container
  • The container is provisioned for as long as the user's build executes
  • A user may use the daemon inside the container if they want to
  • Container is stopped at the end of the user's build so any Gradle daemon running inside that container is also stopped

We then sometimes see the same error mentioned by @zageyiff for users' builds that execute on that node.

Could not create service of type FileHasher using GradleUserHomeScopeServices.createCachingFileHasher()

Some builds execute successfully on a node while others may not. They may fail or be manually killed. Then, a Gradle execution will finish with the error above.

I don't have an easy way to reproduce it yet, but if there is anything useful I can provide here let me know.

Is it possible to point gradle to an additional read-only local cache? If this can be achieved, a read-only volume can be mounted in a container, which avoids multiple user- and permission-related problems as well.

We observe this on our docker based Gitlab runners as well, as we also have one shared cache for all runners. We share the cache by mounting the cache folder into every docker container.

We now have a workaround in place. We no longer share the cache, but have individual caches in each docker container. This results in slower build times and uses a lot of resources (storage, bandwidth, ...).

It would be nice if the proposed new Gradle system property to "assume contention" could be implemented.

This issue is causing a lot of pain for me as well. We're trying not to rely on build node dependencies and are thus running all of our gradle tasks inside of docker containers. Given that we also have several gradle utilities that we use to execute fine-grained build and environment tasks with a high degree of parallelism, each task takes significantly longer than it should because it has to fetch all of the dependencies. It would be nice to be able to mount the gradle home from the build node each time and leverage the shared cache, similarly to how it can be done in maven.

Does anyone have an example for how they implemented pre-cached containers? Do you link a data container on gradle home that is pre-populated with the expected cache?

Similar to @RMsiemens, I'm looking at using a multi-stage Docker build on Gitlab Runners to build Java apps and a subsequent application container. I also plan to use Gitlab Runners to run CI tests on commits and merges in Docker containers. I want to do this to more easily offer multiple Java versions to our dev teams.

I would appreciate a gradle cache sharing solution allowing me to have a common cache on the Gitlab Runners that I can share to multiple Docker containers running builds or tests that may be happening at the same time. We use a remote artifact store and will have to pay for massive amounts of bandwidth to perform the download of dependencies each time a new build or merge happens and all of the dependencies are downloaded.

Another use case for this is using Gradle for automation in an HPC environment, particularly with different tasks that the user may want to run in parallel in a computing cluster.

These clusters typically have high-performance networked file systems.

Right now, wrangling Gradle to work on HPC systems is not easy.

I hope this can be reprioritized. Building in Docker containers is becoming more and more common, especially for CI, and since containers usually start from a "clean" state, this means that each and every build triggers an artifact download. The alternative is to bake the Gradle cache in the container, which is really hacky and also makes the containers really big.

The cleanest solution, by far, is Gradle being able to share the same cache without any conflicts between multiple instances. I appreciate Gradle, but c'mon folks, (conceptually) it shouldn't be that hard, Maven 2 was doing it without breaking a sweat back in 2008 😢

We are facing the exact same problem. Please let us know when we should expect a fix for this.

Hello guys,
I am facing the same issues while trying to run multiple Dockers that share the same cache. Any workaround?

A possible work-around that was communicated to us by Gradle support is:

[...] by running a remote cache container per docker host and arranging for the containers doing builds with the build tool to use that endpoint for remote caching. Gradle Enterprise also has a high performance shared remote cache backend that you could try out for free.

And:

We make a docker container available of a remote cache: https://hub.docker.com/r/gradle/build-cache-node/

But it sounds more like caching the task outputs, not caching the dependencies.

Another possible workaround would be spinning up a proxy Artifactory image in the same network as CI.

To work around this issue we let our containers (Jenkins nodes) rsync the entire Gradle cache to a common persistent volume when they are destroyed after each unique build process (rsync is very efficient at not copying existing files, etc.). During startup of the containers, the cache is reloaded from the persistent volume, again using rsync, into memory on the Jenkins node. It is a simple solution using merely shell scripting, but it is effective for us and rsync deals with most of the complexity. In combination we also use the Gradle build cache and Nexus locally within our CI/CD cluster, but those changes only made minor performance improvements compared to using a local in-memory Gradle cache.

However, it would be nice not to have files locked when only reading from the cache, so a common persistent volume could be shared across all our containers for reading only.

Another workaround, and not a good one, is to set a different local cache folder for each Gradle build - http://mrhaki.blogspot.in/2017/04/gradle-goodness-change-local-build.html. This is not optimal, but it is simple.
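In the same spirit, each build can be pointed at a completely separate Gradle user home on the command line; --gradle-user-home is a standard Gradle option, while the path and ${BUILD_NUMBER} are placeholders for whatever unique id your CI provides:

# every CI job gets its own user home, and therefore its own caches and locks
./gradlew build --gradle-user-home "/var/cache/gradle-${BUILD_NUMBER}"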


FWIW, I'm getting errors related to the GRADLE_USER_HOME even though I'm using different GRADLE_USER_HOMEs for each job. Builds will sporadically fail with the message:

Could not create service of type FileHasher using GradleUserHomeScopeServices.createCachingFileHasher()

All GRADLE_USER_HOMEs are on NFS.

It seems like Gradle's caching issues are not limited to concurrent use of the same GRADLE_USER_HOME.

@anderslauri would you mind sharing the scripts (Jenkinsfile, bash, whatever), if possible?

@marmax

It's simple, stupid simple. Below is the snippet from our OpenShift DeploymentConfiguration for our Jenkins node (OpenShift pod), which executes the following script once the pod gets killed after each unique Jenkins build (the script has a default timeout of 30 seconds - it can be increased).

  - image: nexus.kafkaint.fhm.de/stargate/jenkins-node:latest
    lifecycle:
      preStop:
        exec:
          command: ["/home/jenkins/bin/stop"]

The stop script contains the following, where the variable GRADLE_CACHE is the directory for the Gradle cache in memory on the pod, and the respective NFS variables are PVC-mounted directories on the pod.

if [ -d "${GRADLE_CACHE}" -a -d "${GRADLE_CACHE_NFS}" ]; then
  rsync --whole-file --ignore-existing --recursive "${GRADLE_CACHE}/" "${GRADLE_CACHE_NFS}"
fi

if [ -d "${SONAR_CACHE}" -a -d "${SONAR_CACHE_NFS}" ]; then
  rsync --whole-file --ignore-existing --recursive "${SONAR_CACHE}/" "${SONAR_CACHE_NFS}"
fi

When a new pod is started in OpenShift, the entrypoint script contains the snippet below. All of this works very well for us; we use a cronjob in OpenShift to clean the cache on the NFS once a month to keep the cache relevant. There have not been any performance issues that we have noticed - rsync is very efficient, and our Gradle cache is around 3-4 GB.

  # Synchronize the Gradle cache to the container.
  if [ -d "${GRADLE_CACHE_NFS}" -a -d "${GRADLE_CACHE}" ]; then
    nohup rsync --whole-file --ignore-existing --recursive "${GRADLE_CACHE_NFS}/" "${GRADLE_CACHE}" > /dev/null 2>&1 &
  fi

  # Synchronize the SonarQube cache to the container.
  if [ -d "${SONAR_CACHE_NFS}" -a -d "${SONAR_CACHE}" ]; then
    nohup rsync --whole-file --ignore-existing --recursive "${SONAR_CACHE_NFS}/" "${SONAR_CACHE}" > /dev/null 2>&1 &
  fi

Is there any stance from the Gradle team about this? Is this something that should be supported or are we supposed to go all in on the build cache and a local repository? All I see here are workarounds.

The answer depends on the problem you are trying to solve. Each of them might have different solutions.

  1. Dependencies being re-downloaded? Use Nexus/Artifactory close to your build agents (e.g. as a container on the same machine). We might also put downloaded dependencies into the Gradle build cache to serve a similar purpose.
  2. Too much disk space being used because each agent stores its own copy of the dependencies? If you use ephemeral agents that are cleaned up after each build this should not be an issue. If you use long-lived agents we could work against the cache growth by implementing something for #1085
  3. You want to share configuration like gradle.properties or init scripts? Only copy those into the agents. We might separate a configuration dir and a cache dir in the future.

Any other use cases I'm missing?

@oehme yes. The reported one :D

  1. Concurrent access to ~/.gradle folder from different Gradle daemons / instances.

Currently, they use localhost to communicate with each other. If you changed it to a file socket, for instance, it shouldn't be a problem anymore.

Current workaround for Docker users:
Run your containers with --net=host, so that Gradle instances will communicate with each other

Another possible workaround for Docker users (I haven't tried, yet): put all containers in the same network.

@gesellix AFAIR that will not work because Gradle will still use localhost :(

@bsideup ah, you're right, literally localhost.

@bsideup That's not a use case, but an implementation detail. You are doing this because you want to solve one or multiple of the use cases I listed.

Allowing concurrent access to ~/.gradle would have weird side effects. E.g. suddenly one container would try to run builds on a daemon from another container, because they both see the same daemon registry. Also, the caches in there are OS-dependent, so if you have VMs with both Linux and Windows for instance, that wouldn't work either.

So for the limited use case of "Docker containers on the same machine" we could make this work by separating the caches from other state and making that sharable. But for all other use cases this wouldn't help. Plus, it might mean losing some performance for the common case of a non-shared home.

I think a better/more general solution would be to put dependencies in the build cache and then have a build cache node on the same physical machine as the build agents.

By node you mean an actual container? I guess that could work, even though it would make things more complex... What would creating a "build cache node" involve?

> I think a better/more general solution would be to put dependencies in the build cache and then have a build cache node on the same physical machine as the build agents.

You keep repeating that, or "start a local Nexus for the Maven repository", but... Really? That's a hack/workaround, not a solution. It's even worse when multiple teams/companies share the same CI infrastructure like Jenkins.

> Allowing concurrent access to ~/.gradle would have weird side effects. E.g. suddenly one container would try to run builds on a daemon from another container, because they both see the same daemon registry.

That's not a side effect but a bug, if you ask me.

> So for the limited use case of "Docker containers on the same machine" we could make this work by separating the caches from other state and making that sharable

Sounds like a great separation of concerns which is broken right now. IMO it should be fixed not only because of #851 (this issue), but also because of any other potential issue it might cause.

I agree with @bsideup: The cache should be shareable between containers/machines with access to the same file system. It's simply what users expect. And cleanly separating the caches from all other state sounds like a very good idea in general.

We might separate a configuration dir and a cache dir in the future.

There is an issue for this already: https://github.com/gradle/gradle/issues/1319

My users login to multiple hosts. These hosts are shared between multiple users. We don't mix OSes, so it is an homogeneous network of hosts. Any given user can run any number of builds on any number of hosts at the same time. Since all users are pointing to their respective NFS homes, the ~/.gradle cache is common between those hosts.

If gradle released the cache sooner (after downloading dependencies?) it would help. Maybe this can be a temporary solution?

@bsideup suppose we separate caches and allow them to be shared between containers. How would you solve the different OS problem?

@lptr different OS?

I think we all agree that having a shared cache would be great, for different reasons. However, the fact that our cache is _by design_ limited to a single machine, and not meant to be shared over the network or between docker containers, is not an arbitrary decision made to make everybody swear. Ideally, if we can find a solution that:

  1. allows concurrent read/write access in the cache
  2. from a single machine (multiple builds executed concurrently on a single machine)
  3. fine grained (doesn't lock the cache for the lifetime of a build, like it used to, or for dependency resolution, like it also used to)
  4. supports different OS (including, yes, Windows)
  5. supports multiple concurrent hosts (aka, the docker use case here, or different CI agents)
  6. doesn't lock the cache for a single host, blocking all others
  7. is TCP/IP connection failure safe (in other words, you're not allowed to use TCP/IP to see that there are concurrent access, because docker isolates everybody)
  8. doesn't kill the performance of the local builds

Then of course, we would accept such a PR. Today, what we have supports 1 to 4.

@lptr I wonder why caching of things downloaded from the internet has to be OS dependent? Not talking about Gradle's build cache or anything like that

We could reduce the scope of the problem to just the file store (i.e. downloaded POMs and JARs), leaving all other caches private. Since the file store is written infrequently, there shouldn't be a performance problem when sharing it with a more pessimistic locking strategy. Even with a localized dependency mirror that would still be worthwhile, as the agents would save disk space.

The other caches (e.g. compiled scripts, build cache, transformed artifacts etc.) would be much harder to handle, as @melix explained. So we'd need to separate those directories using some new option.

@oehme sounds good to me. The issue was about the files downloaded from the internet (POMs, JARs, etc), everything else is not critical for us

+1 from me for scoping this to "dependencies" (jars, etc), because we would like to reduce network traffic for bigger files. Regarding the build cache: I assumed that would already be addressed with the global build cache (and disabling the local build cache)?

@bsideup just out of curiosity, why are you having to download dependencies over the Internet? Is it that …

  • You do not have an internal proxy (e.g. Artifactory)?
  • You are using a near proxy but it is still too slow?
  • You are building on a platform (e.g. Travis) that doesn't support that kind of near proxy?
  • Something else?

@gesellix @gayakwad @mkobit @zageyiff @saiimons - I'd appreciate your answers too, if you don't mind. If you prefer, you can email me directly via luke - at - gradle.com. Thanks in advance.

@ldaley,

> You do not have an internal proxy (e.g. Artifactory)?
> You are using a near proxy but it is still too slow?

No, that will only increase the complexity & cost of our build infrastructure

> You are building on a platform (e.g. Travis) that doesn't support that kind of near proxy?

Sometimes, but it's not affected by the issue I described (although you have to delete some files (some .locks) from the cache, otherwise Gradle will fail...)

> Something else?

A standard setup with Jenkins where every job is executed inside a Docker container (to avoid having to install the tools on a host), multiple containers per host with mounted ~/.gradle.

@oehme Can you guess what milestone this feature might be targeted for? Just being able to share those POMs and JARs would be a huge speed improvement for teams that build in Docker containers.

There is no plan at this point. Our recommendation for now remains using artifactory/nexus to provide fast artifact downloads both for your CI agents and your team members.

In regard to this 'issue' in a CI context:
For our setup I think I introduced a good 'workaround', but it depends on your number of executors per build node and the size of a single cache: I map a container volume to keep the Gradle caches and set the GRADLE_USER_HOME to <cache_volume_path>/${env.EXECUTOR_NUMBER} (on Jenkins-CI; I do not know for other CIs). This way I avoid any parallelism issues, still have the cache around for reuse, and the cache duplication is justifiable/feasible (for us at least).
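In a shell step, that boils down to something like the following (the volume path is an example):

# EXECUTOR_NUMBER is provided by Jenkins; one user home per executor,
# so two executors on the same node never contend for the same locks
export GRADLE_USER_HOME="/cache_volume/${EXECUTOR_NUMBER}"
./gradlew build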

@redeamer Have you run into any issues from persisting GRADLE_USER_HOME from build to build? Wondering what else other than the downloaded dependencies is carried over.

We use almost exactly the same workaround as @redeamer and haven't had any issues

Unfortunately, a workaround like the one @redeamer mentioned above won't work with Dockerized slaves. The build will always be running on executor 1, since each Docker container is treated as a new node.

@bsideup we are using Kubernetes to launch the Jenkins agents in containers using the kubernetes-plugin. We do face a similar issue when we run multiple containers which share the same NAS. This approach does help us speed up our build times.

As per the discussion above, how can we inject --net=host in the kube environment for the containers as we run them? The kubernetes-plugin does not give us the option to pass arguments when we run the docker container.

  • ./gradlew clean test --refresh-dependencies
    Starting a Gradle Daemon, 1 busy and 1 incompatible and 3 stopped Daemons could not be reused, use --status for details

FAILURE: Build failed with an exception.

  • What went wrong:
    Could not create service of type FileHasher using GradleUserHomeScopeServices.createCachingFileHasher().
    > Timeout waiting to lock file hash cache (/nas/jenkins/.gradle/caches/4.0/fileHashes). It is currently in use by another Gradle instance.
    Owner PID: 141
    Our PID: 141
    Owner Operation:
    Our operation:
    Lock file: /nas/jenkins/.gradle/caches/4.0/fileHashes/fileHashes.lock

Our cache takes about ~1.3GB, so the decision was to rsync .gradle for every docker container and then rsync updates back to the volume. But it would still be nice to have a solution to run docker straight from the volume; it would be much more convenient and take less time for "syncing".

A very real problem.

On each docker pipeline build, at the moment I'm:

  • disabling the daemon with org.gradle.daemon=false

  • doing chown for the non-root user, because all folders are mounted as the root user

  • and mounting not the whole ~/.gradle folder, but only the dependency caches:

docker run --rm --name run-my-e2e-tests \
  -v ~/.gradle/caches/modules-2/files-2.1:/home/e2e/.gradle/caches/modules-2/files-2.1 \
  -v ~/.m2/repository:/home/e2e/.m2/repository \
  my-e2e-tests

Here is our situation:

We use docker for our CI builds. Not for security; we use docker for repeatability, isolation, and ease of management. Each build runs in its own docker container. We have up to 3 build agents running on the same host, each as a separate docker container.

We recently enabled the gradle build cache, pointing to a remote HttpBuildCache. I hadn't realized that the default behavior, if you enable caching, is that you also get a local cache, at $GRADLE_USER_HOME/caches/build-cache-1. Note that we have $GRADLE_USER_HOME mapped outside of the docker container, to a directory on the actual host.

It's possible (I think; I haven't tried it yet) to just disable the local cache. But now that I think about it further, I think it's a good idea, and I'd like to keep it.

But having 3 different local build caches on the same host, when they in the long term contain largely the same contents, seems wasteful. And has now led (on two separate occasions) to filling up the local disk.

I will probably work around this by limiting the size of the local cache. Apparently you can no longer limit it by size, but I can change the default DirectoryBuildCache.removeUnusedEntriesAfterDays to something less than the default of 7. But it would be preferable to just have all of the agents share a cache.

But from this issue it sounds like that's not possible.
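For reference, the retention tweak mentioned above lives in settings.gradle; a minimal sketch that appends it from a CI script (the 2-day value is arbitrary):

cat >> settings.gradle <<'EOF'
buildCache {
    local {
        removeUnusedEntriesAfterDays = 2   // default is 7
    }
}
EOF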

@oehme Is the workaround described by bsideup (--net=host Docker option) safe? If not, would it be safe to mount only .gradle/caches/modules-2 as a Docker volume in the container? In that case each container would still have its own .gradle directory, but the modules-2 subdirectory would be mapped from the host.

Another potential workaround: would it be possible to add a command-line option to fall back to using the local Maven cache? I know Gradle already checks the local Maven cache first, but with such an option it could be instructed to write to it as well.

.gradle/caches/modules-2 cannot be shared either, as Gradle will put a lock there once it is in use. What we have done is to set up a pool of shared caches and allocate them to containers to accelerate the build.

We tried a mount to ~/.gradle/caches/modules-2/files-2.1 for just the artifacts (no build or metadata caching, which we didn't want anyway). That seemed to work without causing locking issues. Stable artifacts did not re-download, and snapshots with a 1ms changing-module timeout downloaded every time, as expected.

YMMV but in our case, even though everything was working as expected, it didn't improve our pipeline build times. We are now thinking we have some other issue with our agents file system latency. Unlike others, we're not too concerned about disk space since the Jenkins agent containers are destroyed after every build.

> YMMV but in our case, even though everything was working as expected, it didn't improve our pipeline build times. We are now thinking we have some other issue with our agents file system latency. Unlike others, we're not too concerned about disk space since the Jenkins agent containers are destroyed after every build.

Using a different CI system (gitlab-ci) I had similar results. The real cause behind slow builds was IO. We migrated our gitlab runners (jenkins agents?) to SSDs and made sure every runner VM was on a different host, so as not to have IO spikes at the same time.

We are also wanting to make use of "throwaway" Jenkins Build Agents that are simply Docker Containers, but still share the cached downloads between them to save bandwidth and time.
We will often run multiple builds in parallel, and having to download & store the exact same artefact for the different builds is wasteful.
Or we will have to use multiple volumes and then run de-duplication/syncing on a schedule; which strikes me as brittle and prone to failure (and potentially blocking builds whilst they run).
Even with a Nexus proxy, pulling artefacts can take up to 13 minutes - that simply isn't going to be viable.

Edit: The most concerning scenario is running multiple builds of the same project in parallel. Imagine two devs working on the same project:

  • Dev 1 commits & pushes - this triggers a build & test, which takes time.
  • Dev 2 commits & pushes a few seconds later - this also triggers a build & test.

In order to inform each dev as fast as possible of any failure, forcing Dev 2 to wait for Dev 1's build to complete is less than optimal. For this reason we want to run the Gradle builds in parallel but share all/part of the cache.

> We tried a mount to ~/.gradle/caches/modules-2/files-2.1 for just the artifacts (no build or metadata caching, which we didn't want anyway). That seemed to work without causing locking issues. Stable artifacts did not re-download, and snapshots with a 1ms changing-module timeout downloaded every time, as expected.

@nniesen, how did you achieve that? When I try, I see the ownership of "caches/modules-2" etc. flip from "gradle:gradle" to "root:root", and the builds then fail.

Edit: After a bit of digging: as soon as I add a mount to "~/.gradle/caches/modules-2/files-2.1", the folder path "caches/modules-2/files-2.1" becomes owned by "root", meaning that the user "gradle" has no access.

@roadSurfer: Unfortunately, I no longer have access to the environment but I believe everything was running as a build agent user on a Jenkins Kubernetes/Azure build agent. DevOps set up the Jenkins build agent configuration so that the jenkins agent's docker image had access to an external volume.

Perhaps your "throwaway" Jenkins Build Agent docker image was not properly setup to execute builds as the 'gradle' user. If the docker file doesn't create a user and switch to that user the image will execute using the root user.

In the Jenkins pipeline, before the Agent ran a Gradle command that downloaded artifacts, I did something like the following:

# In the Agents gradle user home, create the modules-2 directory.
mkdir -p ~/.gradle/caches/modules-2

# In the Agent, create the gradle user home files-2.1 link to the external directory.
# Note: /external-files-2.1 is the volume mount in the Agent that points to external location.
ln -s /external-files-2.1 ~/.gradle/caches/modules-2/files-2.1

Now when the pipeline runs ./gradlew build, Gradle builds the metadata (in the Agent's gradle user home directory (~/.gradle/)) for all the existing artifacts in /external-files-2.1 and also downloads any missing artifacts.

Gradle is usually very fast at rebuilding the metadata in the user's and project's .gradle directories, so I assume we were having latency issues accessing and computing file hashes for the artifacts on the external volume.

Note: You can kind of play around with it locally by setting the GRADLE_USER_HOME environment variable to a temporary location. Then you can arbitrarily delete files from that directory or the projects .gradle directory to see how Gradle gracefully rebuilds any missing information.

Thanks @nniesen, I was wondering if the symlink trick would work. I placed the whole lot into a script to make my life easier.
Edit: The previous version of the script below caused issues when it used links and multiple builds were running (random inability to read "module-artifact.bin", etc.). So I had to change to using rsync and copying everything in and out. This takes ~20 seconds per build, which is not really acceptable long term.

#!/bin/bash
set -e

GRADLE_CACHE_NAME=caches
GRADLE_HASHES_NAME=fileHashes
GRADLE_MODULES_NAME=modules-2
GRADLE_NATIVE_NAME=native
GRADLE_WRAPPER_NAME=wrapper

GRADLE_SOURCE=/gradle
GRADLE_CACHE_SOURCE=${GRADLE_SOURCE}/${GRADLE_CACHE_NAME}
GRADLE_VERSION_SOURCE=${GRADLE_CACHE_SOURCE}/${GRADLE_VERSION}

GRADLE_TARGET_USER=/home/gradle/.gradle
GRADLE_CACHE_USER=${GRADLE_TARGET_USER}/${GRADLE_CACHE_NAME}
GRADLE_HASHES_USER=${GRADLE_CACHE_USER}/${GRADLE_VERSION}/${GRADLE_HASHES_NAME}
GRADLE_MODULES_USER=${GRADLE_CACHE_USER}/${GRADLE_MODULES_NAME}
GRADLE_NATIVE_USER=${GRADLE_TARGET_USER}/${GRADLE_NATIVE_NAME}
GRADLE_WRAPPER_USER=${GRADLE_TARGET_USER}/${GRADLE_WRAPPER_NAME}

if [[ "${USE_CI}" == "Yes" ]]; then
    if [[ "$1" == "-u" ]]; then
        if [[ -d ${GRADLE_CACHE_USER} ]]; then
            echo "CI cache already enabled."
        else
            echo "Copying cache into Container."
            rsync -a --include /caches --include /wrapper --include /native --exclude '/*' --exclude '*.lock' ${GRADLE_SOURCE}/ ${GRADLE_TARGET_USER}
        fi
    elif [[ "$1" == "-d" ]]; then
        if [[ -d ${GRADLE_CACHE_USER} ]]; then
            if [[ ! -d ${GRADLE_VERSION_SOURCE} ]]; then
                echo "Minimal source structure did not exist - creating."
                mkdir -p ${GRADLE_VERSION_SOURCE}
            fi
            echo "Copying ${GRADLE_HASHES_USER} to ${GRADLE_VERSION_SOURCE}"
            rsync -au --exclude '*.lock' ${GRADLE_HASHES_USER} ${GRADLE_VERSION_SOURCE}
            echo "Copying ${GRADLE_MODULES_USER} to ${GRADLE_CACHE_SOURCE}"
            rsync -au --exclude '*.lock' ${GRADLE_MODULES_USER} ${GRADLE_CACHE_SOURCE}
            echo "Copying ${GRADLE_NATIVE_USER} to ${GRADLE_SOURCE}"
            rsync -au --exclude '*.lock' ${GRADLE_NATIVE_USER} ${GRADLE_SOURCE}
            if [[ -d ${GRADLE_WRAPPER_USER} ]]; then
                echo "Copying ${GRADLE_WRAPPER_USER} to ${GRADLE_SOURCE}"
                rsync -au --exclude '*.lock' ${GRADLE_WRAPPER_USER} ${GRADLE_SOURCE}
            fi
        else
            echo "CI cache was not enabled."
        fi
    else
        echo "Unknown option: '$1'"
        exit 1
    fi
else
    echo "CI cache not enabled. 'USE_CI' was: '${USE_CI}'"
fi

"USE_CI" should be set to "Yes" either in the image itself or via Container config.
"GRADLE_VERSION" should be set either in the image itself. via Container config or by getting it from Gradle.
The location "/gradle" should be defined as a volume in the Image.
This should then be mapped to a named volume pointing at a full Gradle cache.
Run as "script-name.sh -u" at the start of a Jenkins pipline (for example) and then "script-name.sh -d" at the end.

The only outstanding issue I can see with this approach is having to remember to run the script at the start and end of every pipeline. Still, it's an improvement.

I really hope the team find a proper fix for this CI issue soon.

I created this https://discuss.gradle.org/t/sharing-managing-global-project-between-docker-containers-hosts-with-gradle5/31578 issue in the forum to maybe start a discussion about the use cases. It also covers the topic of the project cache, which can be in your way too (in development rather than CI).

I really wonder what the "reason" is behind https://github.com/gradle/gradle/issues/851#issuecomment-368226538 and also how this topic relates to the "remote cache": https://docs.gradle.org/current/userguide/build_cache.html#sec:build_cache_configure_remote

I would love to have some exchange in the forum so as not to overload this issue too much, and maybe, if needed, extract the core / way forward into this or a different issue.

From a personal POV, this very topic is harming the gradle project far more than anticipated - I have already seen a team back off from gradle after a successful migration from maven, just for these very reasons - they simply went back to maven. Sharing ~/.m2 is just dead simple, and those kinds of cache folders are nowadays very well supported by CI solutions - you just define a volume, give it a name, and it gets mounted on all your build docker containers.

It is practical with maven, npm, composer, and dep (go), and AFAIR also with go modules from 1.17 - sounds like the demand is high and the reason understandable.

@oehme I would really invite you to join the discussion there too. Your suggestion of a nexus-proxy is by far not a "drop-in alternative" - we are running a nexus-proxy anyway, and it is not even close to what you get by mounting a folder. Not only the transport time (dedicated server in the DC), but the write I/O for that number of files is just not something one would suggest as "ok, that is a viable alternative". Maybe you could reconsider here?

This is not only a problem for containers, it's also a problem when used with "normal" hosts. E.g. say there are two Linux machines. You ssh into one and run gradle build. This starts a daemon on that host. When you ssh into the other one, it will not be able to proceed due to:

> Timeout waiting to lock file hash cache (/home/<user>/.gradle/caches/<version>/fileHashes). It is currently in use by another Gradle instance.

Even more problematic is that if you have a lot of hosts, you don't readily know which host the daemon is running on to kill it.

I agree with @EugenMayer's comment, this is really killing gradle as a development option.

Ah, a simple workaround for Jenkins, totally missed in earlier comments!

@redeamer commented on Mar 2, 2018

> In regard to this 'issue' in a CI context:
> For our setup I think I introduced a good 'workaround', but it depends on your number of executors per build node and the size of a single cache: I map a container volume to keep the Gradle caches and set the GRADLE_USER_HOME to <cache_volume_path>/${env.EXECUTOR_NUMBER} (on Jenkins-CI; I do not know for other CIs). This way I avoid any parallelism issues, still have the cache around for reuse, and the cache duplication is justifiable/feasible (for us at least).

Can't believe I missed this - such an obviously simple workaround! The issue only happens in Jenkins when parallel builds run and access the same cache. So having one cache per executor means a cache can never be accessed by multiple containers at the same time. We only have 2 executors, so only one extra cache - no issue for us at all.

We are using jenkins pipeline, so the simple change we made to all our scripts was simply to change docker args from:
args '-v gradleCache:/home/gradle/.gradle'

to:
args '-v gradleCache' + env.EXECUTOR_NUMBER + ':/home/gradle/.gradle'

Thanks @redeamer!

> Even more problematic is that if you have a lot of hosts, you don't readily know which host the daemon is running on to kill it.

It would make sense for Gradle to store the host name and report it when the cache lock cannot be obtained. At least we would not have to hunt down which host holds the cache.

This is clearly a much-needed feature; any ETA by any chance?

I would expect there is rather little interest in ever getting work done here, because of Gradle EE and the build cache server. I think there are commercial interests steering away from this issue - which is fine, since somehow Gradle developers have to pay the bills too.

I must say, while there are few other reasons, or none, this is the single most crushing one for which I will/would consider going back to maven.

I understand that gradle's build cache is far more complex, and thus sharing the whole folder is harder to do.

But sharing the dependencies / jars of already downloaded dependencies - no matter by which build - is a very easy challenge, without any of the thread-safety or parallel-usage concerns that come to mind. It is purely additive, and in CI, downloading the dependencies of a usually sized project takes 60% or more of the compile time.

Well, I think, in the end, we do not pay for Gradle and thus should either put work into it ourselves or just be happy that Gradle is out for free at all :)

The build cache server and Gradle EE do not have much to do with this, though. This is mostly about the dependency artifact cache, while the build cache server only caches the outputs of gradle tasks.

There is interest in fixing this and work is planned for 6.1, in whatever form it will take.

Docker's experimental RUN --mount=type=cache feature provides three sharing modes. The private or locked modes sound like they could solve the described problem:

private creates a new mount if there are multiple writers; locked pauses the second writer until the first one releases the mount.

Did someone test these options in combination with a shared gradle cache?

> Docker's experimental RUN --mount=type=cache feature provides three sharing modes. The private or locked modes sound like they could solve the described problem:
>
> private creates a new mount if there are multiple writers; locked pauses the second writer until the first one releases the mount.
>
> Did someone test these options in combination with a shared gradle cache?

It works if you use docker build instead of docker run. But most people here mount the gradle cache into the container at runtime.
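A sketch of the docker build variant (BuildKit required; the base image and cache target path are assumptions):

export DOCKER_BUILDKIT=1
cat > Dockerfile <<'EOF'
# syntax=docker/dockerfile:1
FROM gradle:6.0-jdk8
WORKDIR /app
COPY . .
# The cache mount persists between builds; sharing=locked serializes
# concurrent writers instead of tripping Gradle's lock timeout.
RUN --mount=type=cache,target=/home/gradle/.gradle,sharing=locked \
    gradle --no-daemon build
EOF
docker build .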

Thanks for gradle, and I understand it's a free project, but I wanted to add some support for getting this feature implemented.

Coming from a different (python) world, I am somewhat shocked this is an issue. Gitlab CI, drone, concourse, GitHub Actions... some of them are container-first, and running e.g. tests, a linter, and checks at once is a pretty neat way to speed up builds. But because of this, it's impossible, as individual (parallel) steps randomly fail. Not sharing the cache, on the other hand, takes an enormous amount of time compared to the task (e.g. linting takes 10 seconds in our midsize project; downloading all deps on a multi-CPU AWS machine takes a minute at least).

FYI #1338, which somewhat relates to this issue, has been updated.

And there are plans to make the situation better overall:
(copied from #1338)

Further ideas / improvements are under discussion:

  • Allow to add a read-only dependency cache layer. The goal would be to enable sharing that one between containers, while additional files not cached would still be downloaded inside the container. In addition to the network traffic reduction, this should also allow to reduce image sizes since the cache would not have to be embedded in each.
  • Provide some tooling around seeding, copying and setting up these different caches in a container environment.

Is this fixed with the changes in 6.1 or is there something else that will be released in 6.2 to resolve it?

Gradle 6.1 will support copying the dependency cache without constraint. Some form of sharing will be made available with Gradle 6.2. Expect an update here when it makes it into a nightly for early feedback.

Just another workaround from my side: if you are using a docker image for building, just include the gradle cache in there; it will be unique per build. Of course, regular updates need to happen.

@gluehbirnenkopf That's exactly what we've done so far. A shared solution would be ideal to keep image sizes small and we have a bunch of projects that use similar dependencies.

> @gluehbirnenkopf That's exactly what we've done so far. A shared solution would be ideal to keep image sizes small and we have a bunch of projects that use similar dependencies.

@rgoomar which folder exactly do you keep in your image? The complete .gradle/ folder?

Hi folks,

Gradle 6.2-rc-1 is out with an option to have a shareable, read-only dependency cache: https://docs.gradle.org/6.2-rc-1/userguide/dependency_resolution.html#sub:ephemeral-ci-cache

Please give it a try, we would appreciate any feedback on this before the final, thanks a lot!
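A minimal sketch of the seed-then-share flow from those docs (the my-build-image name and the /shared path are assumptions; note that only modules-2 is meant to be shared):

# 1. Seed a dependency cache with a throwaway Gradle user home
docker run --rm -v "$PWD/seed-home":/home/gradle/.gradle my-build-image \
  ./gradlew --no-daemon build

# 2. Publish just the dependency cache to the shared, read-only location
mkdir -p /shared/gradle-ro-cache
cp -r seed-home/caches/modules-2 /shared/gradle-ro-cache/

# 3. Later builds mount it read-only and point Gradle at it
docker run --rm \
  -v /shared/gradle-ro-cache:/gradle-ro-cache:ro \
  -e GRADLE_RO_DEP_CACHE=/gradle-ro-cache \
  my-build-image ./gradlew --no-daemon build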

Does gradle check whether the RO cache is effectively read-only, or does it trust the user on this?

$GRADLE_RO_DEP_CACHE
|-- modules-2 : the read-only dependency cache, should be mounted with read-only privileges

It trusts the user that it's read-only (the docs mention that). Gradle _shouldn't_ write anything in there, though - unless, of course, the directory you use is actually the write cache of another Gradle instance.

Are there any strategies for how re-seeding would look?

For example, let's say a team updates a dependency in one of their projects. I assume they would have to run some special build that re-seeds the cache and then copy it around?

It's up to the team to decide. Some would have a build which re-seeds every day, others on every dependency change (maybe overkill), some on demand, ...

How does Maven do it? I do not have permissions to move files into the read-only cache (which is not unusual in large organizations). So my only option is to disable caching altogether by using a specific directory for each build, making builds perform 2x slower than Maven builds.

Maven doesn't do this. It's at your own risk, basically: you can have concurrent writes.

It would be nice if Gradle supported the same, as it works perfectly well for maven. The Gradle remote cache is much slower than the disk cache. Keeping an additional read-only cache adds quite some complexity.

Maven doesn't support concurrent writes; see MNG-2802 Concurrent-safe access to local Maven repository, an open enhancement since Jan/07. In my experience, sharing .m2 between concurrent builds has always worked for me, except when .m2 was on NFS. Maybe I was unusually lucky.

> It would be nice if Gradle supported the same, as it works perfectly well for maven

As stated here, that's not true. We have horror stories about concurrent writes to .m2, especially on CI. We _won't_ provide a way to do "unlocked writes" in Gradle, because it's inherently unsafe. That's precisely the reason for the features we added: the read-only dependency cache and the ability to relocate dependency caches.

Note that it has _nothing_ to do with the build cache. Saying "the Gradle remote cache is much slower than the disk cache" is irrelevant to this conversation, because the build cache is for _task execution_, not dependencies.

I just don't understand that either:

> So my only option is to disable caching altogether

as I said, task caching has nothing to do with the conversation here.

It would help me if I understood where gradle puts different types of files, and whether those locations support concurrent access. I feel that users like me have some understanding of that, but perhaps it is incomplete. It would help me if I could find the information on a single page explaining the caches, the repositories and the other places gradle accesses for read or write, for example: .m2, .ivy2, ~/.gradle, .gradle, the build folder, /tmp, etc.

What seems to be harder to understand is whether those are designed for concurrent access or not, how and when to clean them, and the conditions under which they are locked (lock acquired and released).

I often work on multiple projects at the same time on multiple hosts simultaneously all using the same NFS mounted /home or NFS mounted build folder. It would be helpful to understand where the caches and repositories should be located (on a disk local to an execution host?, on an NFS mounted on to multiple execution hosts?). Examples would also help. For example, should I share the .m2 folder with builds running in maven and gradle at the same time accessing the same .m2 folder?

It seems maven is not dead after all; Concurrent-safe access to local Maven repository is now fixed :)
