Gradle: Make dependency caches relocatable

Created on 7 Feb 2017 · 52 comments · Source: gradle/gradle

Original issue: https://issues.gradle.org/browse/GRADLE-2690

Highly voted issue: 24

Expected Behavior

The cache can be transferred from one machine to another and can be used in offline mode.

Current Behavior

Since Gradle's cache stores native OS absolute paths (e.g. drive letters such as "C:" and backslash (\) file separators in the case of Windows), it will not work if unpacked onto a different drive letter and/or into a different directory structure. Similarly, the cache cannot be moved between Windows and *NIX environments. (Our build process supports running under both Linux and Windows.)

Without necessarily changing any of the external behaviors/interfaces it would be fantastic if the artifact cache stored paths relative to the cache base directory and normalized to always use '/' as the file separator. Please note that this request is NOT asking to change the outwardly-visible behavior of the cache; i.e. existing APIs could still return runtime-resolved absolute paths.
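To make the request concrete, here is a minimal POSIX shell sketch of the idea. It is only an illustration under stated assumptions, not how Gradle stores paths internally, and both function names are hypothetical: one strips the cache base from an absolute path and normalizes separators to '/', the other re-anchors the stored relative path at wherever the cache lives now.

```shell
# Hypothetical helpers, not Gradle code: convert between an absolute path
# inside the cache and a '/'-separated path relative to the cache base.

# to_cache_relative BASE PATH -> prints PATH relative to BASE, or fails
to_cache_relative() {
    # Normalize Windows backslashes to '/' so the prefix check works everywhere.
    cache_base=$(printf '%s' "$1" | tr '\\' '/')
    abs_path=$(printf '%s' "$2" | tr '\\' '/')
    cache_base=${cache_base%/}   # tolerate a trailing separator on BASE
    case $abs_path in
        "$cache_base"/*) printf '%s\n' "${abs_path#"$cache_base"/}" ;;
        *) return 1 ;;           # PATH is not inside the cache
    esac
}

# from_cache_relative BASE REL -> resolves REL against the cache's current base
from_cache_relative() {
    printf '%s/%s\n' "${1%/}" "$2"
}
```

For example, `to_cache_relative 'C:\Users\me\.gradle\caches' 'C:\Users\me\.gradle\caches\modules-2\a.jar'` prints `modules-2/a.jar`, and `from_cache_relative /home/ci/.gradle/caches modules-2/a.jar` turns that back into an absolute Linux path. A real implementation would also have to deal with Windows paths being case-insensitive.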

Context

My company requires that we be able to escrow our software. This requires us to build a package that includes everything necessary to build, including 3rd-party libraries, without assuming that any particular network resources are available.

To satisfy this requirement, I need to be able to package the Gradle artifact cache (e.g. the entire GRADLE_USER_HOME) and then use it on another machine (which I may not be able to control) and have the cached artifacts resolve successfully in --offline mode.

Labels: feature, contributor, dependency-management

Source

bmuschko


All 52 comments

To satisfy this requirement, I need to be able to package the Gradle artifact cache (e.g. the entire GRADLE_USER_HOME) and then use it on another machine (which I may not be able to control) and have the cached artifacts resolve successfully in --offline mode

That is one solution, but there is another that doesn't require changes to Gradle: Write a Gradle task that creates a file repository from your dependencies and package that file repository. Then have logic in your build that says "if that file repository is present, use it".
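The suggestion above is a Gradle task. As a swapped-in alternative that needs no build changes, here is a rough POSIX shell sketch that rebuilds a Maven-layout file repository from the cache's files-2.1 directory. The function name is hypothetical, and it assumes the internal layout <group>/<module>/<version>/<hash>/<file>, which Gradle does not guarantee between versions. It also relies on find's -mindepth/-maxdepth, which GNU and BSD find support but POSIX does not require.

```shell
# Hypothetical helper (not part of Gradle): rebuild a Maven-layout file
# repository from Gradle's internal files-2.1 cache directory.
# Assumes the layout <group>/<module>/<version>/<hash>/<file>.
# usage: cache_to_maven_repo FILES_2_1_DIR DEST_REPO_DIR
cache_to_maven_repo() (
    src=$1
    mkdir -p "$2" || exit 1
    dest=$(cd "$2" && pwd) || exit 1   # make DEST absolute before we cd away
    cd "$src" || exit 1
    # Files sit exactly five levels deep: group/module/version/hash/file
    find . -mindepth 5 -maxdepth 5 -type f | while IFS= read -r f; do
        rel=${f#./}
        group=${rel%%/*};    rest=${rel#*/}
        module=${rest%%/*};  rest=${rest#*/}
        version=${rest%%/*}; rest=${rest#*/}
        file=${rest#*/}                # drop the <hash> directory
        # Maven layout turns the dots in the group into directories
        target="$dest/$(printf '%s' "$group" | tr . /)/$module/$version"
        mkdir -p "$target" && cp "$f" "$target/$file"
    done
)
```

For example, run `cache_to_maven_repo ~/.gradle/caches/modules-2/files-2.1 /tmp/offline-repo` and declare `maven { url = uri('/tmp/offline-repo') }` in the build. Unlike a Gradle task that walks a project's resolved configurations, this copies every cached version, including ones the project does not use.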

oehme on 2 Mar 2017

@oehme Sure that's doable, but why would I want to basically re-invent Gradle's caching logic by wrapping it in some higher-level tasks that do the same thing?

sschuberth on 2 Mar 2017

The use case mentioned above is not about caching, but about providing a package that contains everything that was used for building that specific project. Including the whole user home doesn't sound like a great solution to that, as it will contain everything else that happened to lie around there, like dependencies of other projects and all kinds of caches that can be recreated from the project content.

oehme on 2 Mar 2017

Well, I should know about the use case as I was one of the commenters on the original JIRA ticket. To quote myself:

Same here, we want to pre-populate the cache on newly added CI nodes by copying over the cache from some other node, which potentially stores the cache in a different absolute directory.

sschuberth on 2 Mar 2017

That's a completely different use case than the one I quoted. We should not squash those into the same ticket. Pre-populating (dependency) caches on CI is indeed interesting.

oehme on 2 Mar 2017

I fail to see how that's a different use case. Could it be that you misread the initial post? The "Expected Behavior" says

The cache can be transferred from one machine to another and can be used in offline mode.

So, being able to transfer / copy the cache would both solve the problem of pre-populating CI nodes and being able to build --offline.

sschuberth on 2 Mar 2017

I was quoting from the description of this issue:

My company requires that we be able to escrow our software. This requires us to build a package that includes everything necessary to build, including 3rd-party libraries, without assuming that any particular network resources are available.

That's not something I'd solve by zipping up the user home. It can be solved much more elegantly by generating a file repository.

The escrowing use case is very different from the CI cache use case. They could both be solved with the technical solution proposed in this ticket, but that doesn't mean it's the right solution or that they need to have the same solution.

oehme on 2 Mar 2017

Ok, I was interpreting "escrow" differently in this context. If it basically just means "package all jars of all (transitive) dependencies" then you're right.

Now we need to agree on whether we need a separate ticket for the other use case, and which of the two use cases that other is.

sschuberth on 2 Mar 2017

I think we should rename this one to "Make dependency caches sharable between CI machines". The escrow use case and checked-in-dependency use cases (also mentioned in the original ticket) are already possible today and don't need a new ticket I think.

oehme on 2 Mar 2017

I'm still not convinced that Gradle's internal caches should be relocatable. The "prime my CI agents" use case can also be solved by creating a file repository, pushing that to the agents and making your build logic react to the presence of that file repository. This also works nicely for the "lots of docker build agents on the same host" use case ( #851 ) where you can share that file repository between the host and all the containers without having to worry about the locking that Gradle does on its internal caches.

What benefit would copying the whole Gradle user home have over generating a file repo that contains exactly what you need?

oehme on 2 Mar 2017

What benefit would copying the whole Gradle user home have over generating a file repo that contains exactly what you need?

The benefit would be that I wouldn't have to implement your aforementioned logic to react to the presence of that file repository for all products we're building.

Also, let me ask the opposite question: What benefit does it have for Gradle to store absolute paths in the cache? I can't see any, and that's the main issue for this use case.

sschuberth on 2 Mar 2017

The benefit is the freedom to do whatever is most performant without having to keep guarantees about relocatability.

The logic I mentioned could be a community plugin which you apply to your projects.

oehme on 2 Mar 2017

How is using absolute paths more performant than relative ones? I hope you won't start arguing about string concatenation here...

I was assuming you'd make the plugin proposal, but still it needs to be applied in all projects. Why let others solve the problem in many places when you can solve it once in a central place?

I start to feel you're very reluctant to making the cache portable, and I'm really wondering why, as I didn't read a good / valid argument about that yet.

sschuberth on 2 Mar 2017

I start to feel you're very reluctant to making the cache portable, and I'm really wondering why, as I didn't read a good / valid argument about that yet.

The Gradle User Home contains a bunch of stuff that is not meant to be shared (e.g. the daemon registry), and the cache structure is not a public contract. Making it a public contract would put a maintenance burden on us. That would be okay if there were a use case that couldn't be solved another way. But the one presented here is easy to solve with a plugin.

@bmuschko @mark-vieira What do you guys think? I don't want to monopolize this discussion :)

oehme on 2 Mar 2017

Note that I'm not asking for a public contract / API for the cache here. The exact structure of the cache can stay an internal implementation detail as long as the cache directory can be copied as-is to another machine, and the same Gradle version as was used to create the cache contents is able to re-use it.

sschuberth on 2 Mar 2017

I think there are ways to solve this use case w/o having to make the Gradle cache portable. Like @oehme mentioned, this is an opaque cache and there are potentially a bunch of consequences to packing up the entire Gradle user home directory and shipping it to a different machine.

My company requires that we be able to escrow our software. This requires us to build a package that includes everything necessary to build, including 3rd-party libraries, without assuming that any particular network resources are available.

If this is the case then wouldn't building a local Maven or Ivy repo with these libraries be sufficient? There are plugins out there that make this process fairly simple.

mark-vieira on 2 Mar 2017

Note that the escrow use-case isn't mine, but William Price's. Mine is the pre-populate CI nodes use-case only.

sschuberth on 2 Mar 2017

Mine is the pre-populate CI nodes use-case only.

This is a better use case IMO. The case here being that we want to share a single local dependency cache across many CI machines. FWIW, folks are already doing this. Not by "copying" the cache, but by sharing a single cache amongst multiple machines. There are two separate issues here, but they are somewhat related.

1. We can't copy a cache from one location to another because of absolute paths in metadata
2. We can't easily share a single cache across machines because of lock contention

I don't have the requisite knowledge to know why we can't (or won't) change #1.

As for #2, I think @adammurdoch has brought up before that this is something we can improve by simply making locking of the cache less aggressive in certain scenarios.

mark-vieira on 2 Mar 2017

Exactly. Neither option currently works (despite https://issues.gradle.org/browse/GRADLE-2795 being resolved), and solving it either way would suit my use case.

sschuberth on 2 Mar 2017

The issue you mention is only part of the problem. The main issue I'm referring to is we lock the dependency cache during resolution (?). At a certain concurrency threshold the contention becomes high enough where builds timeout waiting for the locks to release.

I'm more inclined for us to solve this problem as a solution to this use case as I think it's more common for folks to have a single shared folder or network drive that they use as a Gradle cache rather than physically copying the cache to machines.

mark-vieira on 2 Mar 2017

The only public contract for the Gradle team regarding this issue would be: Gradle Home directory is always relocatable. This would mean one can move Gradle Home directory to another location (folder / drive / machine) and just point the environment variable to a new location. It should work without downloading any artifacts again. This would cover all OFFLINE use cases (and also online cases where one relocates the folder to another drive when space is needed). For users having many OFFLINE projects using the same Gradle Cache it is really a burden to change all those Gradle build scripts... this should be solved centrally in Gradle and not on a project level...

gocursor on 3 Mar 2017

I work for SUSE and we have a requirement to build all software we ship without Internet access. We have infrastructure to build any kind of software project (Open Build Service) that automates the creation of an isolated, reproducible build environment. Currently, it has no Java-specific support; in particular, it has no support for Maven or Ivy repos accessible from build machines.

What we do for Maven projects is simply relocating Maven caches to the build environment. It would help if Gradle caches were relocatable as well.

moio on 3 Mar 2017

If I may, I would propose a new name for this issue: "Make dependency caches relocatable"

gocursor on 3 Mar 2017

it would be better for multiple Docker containers to be able to use the same Gradle cache as a volume without running into locking issues.

This is not the same use case as mine. I need it for offline builds.

gocursor on 3 Mar 2017

The locking issue is covered by #851

oehme on 3 Mar 2017

#851 seems to be limited to test execution. Is there already a ticket tracking @mark-vieira's suggestion of "making locking of the cache less aggressive"?

sschuberth on 3 Mar 2017

I changed the title, it had nothing to do with test execution in particular.

oehme on 3 Mar 2017

I also have the 'escrow' use case. Sharing the cache seemed like the best solution, however:

If this is the case then wouldn't building a local Maven or Ivy repo with these libraries be sufficient? There are plugins out there that make this process fairly simple.

I'd prefer to use a plugin that would create a repository rather than use the whole cache. I believe it would also solve this issue (sharing between CIs). However, I haven't found anything (apart from repository proxies). @mark-vieira could you please share a link to such a plugin?

Proxy would probably not do, as I rely on Gradle's AWS S3 maven integration.

MartinTeeVarga on 5 Mar 2017

What we do for Maven projects is simply relocating Maven caches to the build environment. It would help if Gradle caches were relocatable as well.

@moio I don't understand this part. Who fills the cache in the first place? If the answer is "build it once online and then share the cache", then I think a better solution is the one proposed for the 'escrow' use case: Generate a file repository and make the build use that. This reduces the waste of sharing the whole Gradle cache, which contains a lot of other things besides the downloaded artifacts.

oehme on 10 Mar 2017

I guess you are referring to:

Write a Gradle task that creates a file repository from your dependencies and package that file repository. Then have logic in your build that says "if that file repository is present, use it".

IIUC that implies patching Gradle files of whatever projects I am about to package, is that correct?

In that case sure, the solution would work, but it would not be very convenient. As a packager I try as much as possible to limit patches to upstream code, especially in the build system, otherwise I will have to maintain those downstream patches (first creating them for every packaged project, then updating them when they fail to apply due to upstream code changes).

I am aware that zipping up the whole Gradle cache is wasteful, but that is not a huge problem for my use case.

moio on 10 Mar 2017

You don't have to patch those projects in any way. You can inject logic into their build with an init script.
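A minimal sketch of that idea, assuming a Groovy-DSL init script is acceptable: the shell function below (hypothetical name) writes an init script that adds a local Maven-layout repository to every project, but only if that directory exists on the machine. The repository path is an example value and must not contain quotes or backslashes, since it is pasted verbatim into a Groovy string.

```shell
# Hypothetical helper: write a Gradle init script (Groovy DSL) that points
# every project at a local repository when that directory is present.
# usage: write_offline_init_script REPO_DIR OUT_FILE
write_offline_init_script() {
    # Unquoted EOF: $1 is expanded into the generated script; nothing else is.
    cat > "$2" <<EOF
// Generated: use the offline repository only if it exists on this machine.
def offlineRepo = new File('$1')
if (offlineRepo.directory) {
    allprojects {
        repositories {
            maven { url = offlineRepo.toURI() }
        }
    }
}
EOF
}
```

The build is then run with `gradle --init-script offline.gradle build` (or `-I offline.gradle`), leaving the project's own build files untouched.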

oehme on 10 Mar 2017

Hi, I need to export my Gradle cache to another computer that runs Linux; my development project is on Windows.
Because the Linux computer uses a different path, I cannot use the same Gradle cache.
Has this issue been solved so that I can use the Gradle cache on another computer, or is there some other solution that lets me use Gradle without downloading the libraries?
The problem is that the computer where I need to install the application is the same one where I want to compile the project, and it is a Linux server whose policies deny access to external repositories.
In short, I need to build my Gradle project on a Linux computer that cannot download the libraries from an external repository. I think the best way would be to reuse my Gradle cache and simply move the files from one computer to the other.
Has this problem been solved yet?

Yerme on 6 Jul 2017

The problem is that the computer where I need to install the application is the same one where I want to compile the project, and it is a Linux server whose policies deny access to external repositories.

Why don't you build a distribution that you just install on Windows? It doesn't sound right to build the whole program from scratch on the production environment.

I don't consider this a good use case for caching.

oehme on 7 Jul 2017

Why don't you build a distribution that you just install on Windows?

OK, I understand your point, but that's just one case where I need it.
The issue is that this server is not where the program is published; it is not the production environment for the application. It is a Jenkins server that builds the WAR and JAR files and then sends them to the server where they will be published. The Jenkins server picks up the changes and should build the files, but by policy it does not have access to the Maven repository.

I also need to send projects built with Gradle to other people. The problem is that the cache is located according to the username, which is different on each computer. The problem gets worse if I want to share the project with someone on Linux who does not have access to the Maven repository.

I really need to be able to compile the project without having to download the libraries on each switch, I've been looking for a solution to this for months.

Yerme on 7 Jul 2017

It is a Jenkins server that builds the WAR and JAR files and then sends them to the server where they will be published. The Jenkins server picks up the changes and should build the files, but by policy it does not have access to the Maven repository.

That policy sounds broken to me. It's not protecting anything if you are allowed to dump your user home into that server. It might as well be allowed to access the repo then.

I also need to send the projects built with gradle to other people,

Either:

Those other people should be able to access your repo or
you should send them a ready-built distribution or
you create a file repo that you bundle with the source code when you send it to them

I really need to be able to compile the project without having to download the libraries on each switch

If you absolutely can't change the restrictions of your environment, then the simplest thing would be to create a source bundle including all dependencies and shipping that.

oehme on 7 Jul 2017

Hi, Gradle Masters! If I copy a Gradle project, Gradle considers all output files out of date because of the absolute file paths in projectRoot\.gradle\3.5\taskHistory\taskHistory.bin. I suspect it's org.gradle.cache.internal.btree.BTreePersistentIndexedCache. I guess if you make the dependency cache relocatable, then I'd be able to copy/move my projects without the need for recompilation. If so, I upvote.

My use case is following: I want to work on another branch of a big project, while leaving the current copy unchanged. I copy the project directory to a new place and switch the branch. I expect gradle to recompile only changed files. Now it recompiles everything ("No history is available."). The same happens even if I don't switch a branch, just rename the directory.

jarekczek on 23 Dec 2017

@jarekczek What you're describing would be a better fit for the build cache: https://docs.gradle.org/current/userguide/build_cache.html

big-guy on 23 Dec 2017

Thanks for the hint, big-guy. Sounds perfect, in theory. In practice it may be different (I mean efficiency), but surely we shouldn't extend the build-cache thread here.

jarekczek on 24 Dec 2017

That is one solution, but there is another that doesn't require changes to Gradle: Write a Gradle task that creates a file repository from your dependencies and package that file repository. Then have logic in your build that says "if that file repository is present, use it".

@oehme, something like the following?

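# Machine A: add every cached JAR to the zip without its directory path (zip -j)
$ cd "${GRADLE_USER_HOME:-$HOME/.gradle}/caches/modules-2/files-2.1"
$ find . -type f -name '*.jar' -exec zip -j /tmp/deps.zip {} \;

# then on Machine B
$ mkdir -p /tmp/deps
$ unzip /tmp/deps.zip -d /tmp/deps
# replace all `jcenter()` with the following (note: flatDir ignores POM
# metadata, so transitive dependencies must be declared explicitly)
flatDir {
  dirs '/tmp/deps'
}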

If this is the case then wouldn't building a local Maven or Ivy repo with these libraries be sufficient? There are plugins out there that make this process fairly simple.

@mark-vieira, any suggestion of which plugin(s) to use?

jose on 17 Nov 2018

@jose You might want to take a look at the Ivypot plugin: https://github.com/ysb33r/ivypot-gradle-plugin

mark-vieira on 19 Nov 2018

This apparently doesn't work:

The only public contract for the Gradle team regarding this issue would be: Gradle Home directory is always relocatable.

So what is the workaround? How to use cache to build the package without downloading things?

yurivict on 28 Nov 2018

The caches feature is broken. What is the workaround?

This renders Gradle unusable, because the package builder can't download anything directly, and the cache, which is supposed to hold the dependencies, is broken (not portable). This prevents me from creating the FreeBSD port for one project that uses Gradle.

yurivict on 1 Dec 2018

@yurivict FWIW the hack I use in SUSE distros is the following:

I build on my own machine using --gradle-user-home /tmp/gradle --project-cache-dir /tmp/gradle-project (this downloads artifacts from the Internet)
I package /tmp/gradle
in the build script, I use --gradle-user-home /tmp/gradle --project-cache-dir /tmp/gradle-project --offline. This makes the build succeed even without an Internet connection.

(details in the tetra project)

Downsides: this stores more than what is strictly needed, /tmp/gradle is hardcoded.

moio on 3 Dec 2018

@moio Thank you! I used this workaround for the FreeBSD port biology/gatk.

yurivict on 3 Dec 2018

Why have an --offline parameter when it just does not work? I've transferred my cache, added --offline, and in the debug output Gradle actually finds the dependencies... but then fails with an error about not being able to download the file it just found in the cache...

I love gradle, but if you don't have an online repo, you are screwed. I just can't fathom why this issue is so hard to fix. All the files are right freaking there!

mpnewcomb on 8 Mar 2019

As far as the dependency cache is concerned, copying ${GRADLE_USER_HOME}/caches/modules-<version>/files-<filesversion> should already give a valuable benefit.

It is not sensitive to absolute paths, and thus can be relocated. Subsequent builds will still issue HEAD requests to get metadata and JAR checksums, but will then locate the files on disk instead of downloading them.

This will not solve the relocation + offline issue though, for that some modifications need to be made to the format of the files in ${GRADLE_USER_HOME}/caches/modules-<modulesversion>/metadata-<metadataversion>/*.bin.
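A minimal sketch of that partial workaround, assuming the directory names modules-2 and files-2.1 used elsewhere in this thread (the version suffixes can differ between Gradle releases). The hypothetical function copies only the file store into another Gradle user home and deliberately leaves the path-sensitive metadata behind.

```shell
# Hypothetical helper: copy only the path-independent file store
# (caches/modules-2/files-2.1) from one Gradle user home to another.
# The metadata-* directories are deliberately not copied.
# usage: copy_file_store SRC_GRADLE_USER_HOME DEST_GRADLE_USER_HOME
copy_file_store() {
    mkdir -p "$2/caches/modules-2" &&
        cp -R "$1/caches/modules-2/files-2.1" "$2/caches/modules-2/"
}
```

For example, `copy_file_store ~/.gradle /mnt/ci-node/.gradle`. As described above, the next online build still sends HEAD requests but should find the artifacts on disk; an offline build will not work with this alone.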

Is it too hard to do? Certainly not. But as with the rest, it is a question of priorities and available manpower.

ljacomet on 12 Apr 2019

So is the ability to make dependency caches relocatable resolved?

I believe I will encounter a similar issue soon, as my development machine has no connection to the Internet. So I was thinking of downloading libraries via Gradle on another machine connected to the Internet, and bringing the Gradle cache folder (containing the downloaded libraries) to my development machine.

hanct on 16 Jun 2019

Work on this has started. The first phase will make the whole caches/modules-<moduleversion>/ directory relocatable: the path leading up to it can change while the cache remains effective, and no HEAD requests will be emitted.
The relative paths stored will also be normalized so that copy between file systems that have a different separator (/ vs \) is also possible.

ljacomet on 10 Oct 2019

The first step of this work has been merged to master.

With it, you can safely move $GRADLE_USER_HOME/caches/modules-2 to a different GRADLE_USER_HOME and still benefit from the dependency cache.

This however requires both builds to use a recent nightly, at least 6.1-20191107230047+0000.

Under the hood, what was problematic was the content of metadata-* and it works as of metadata-2.90.
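With Gradle 6.1 or later (or the nightly named above) on both sides, the whole modules-2 directory can be copied. A rough sketch with a hypothetical function name; it deletes copied *.lock files on the assumption that they are runtime state of the source machine's Gradle processes and should not be carried over.

```shell
# Hypothetical helper: seed another Gradle user home with the complete,
# relocatable dependency cache (requires Gradle 6.1+ on both sides).
# usage: relocate_dependency_cache SRC_GRADLE_USER_HOME DEST_GRADLE_USER_HOME
relocate_dependency_cache() {
    mkdir -p "$2/caches" || return 1
    cp -R "$1/caches/modules-2" "$2/caches/" || return 1
    # Assumption: lock files belong to the source's Gradle processes.
    find "$2/caches/modules-2" -type f -name '*.lock' -exec rm -f {} +
}
```

The target build is then pointed at the new location with GRADLE_USER_HOME or `--gradle-user-home`; if the cache was fully seeded, `--offline` should resolve everything from it.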

We are interested in feedback on the usability improvement from this change, especially in the seeding of ephemeral CI nodes. Please try it and let the Gradle team know!

We understand this change is only one step on the path of having full support for ephemeral CI nodes as it still requires one cache copy per node.

Further ideas / improvements are under discussion:

Allow adding a read-only dependency cache layer. The goal would be to enable sharing it between containers, while additional files not in the cache would still be downloaded inside the container. In addition to reducing network traffic, this should also reduce image sizes, since the cache would not have to be embedded in each image.
Provide some tooling around seeding, copying and setting up these different caches in a container environment.

Note however that these items fit more under #851

ljacomet on 12 Nov 2019

Write a Gradle task that creates a file repository from

Do you have an example of such task? This can be useful for many people and can be used as a reference.

rumax on 11 Dec 2019

Do you have an example of such task? This can be useful for many people and can be used as a reference.

https://github.com/ysb33r/ivypot-gradle-plugin

mark-vieira on 11 Dec 2019

Hi all,

Dependency caches are now finally relocatable! Thanks to gradle-6.1 milestone.

I had previously encountered a similar issue when I tried moving the entire .gradle directory from one folder to another (to emulate moving from one machine to another). When I imported the project into Spring Tool Suite 4, the import succeeded, but STS4's Java build path settings showed that the library dependency paths still pointed to the original folder. This demonstrated that .gradle must have contained absolute paths to the original folder.

When I repeated the experiment with the latest Gradle milestone, by updating gradle-wrapper.properties to
distributionUrl=https\://services.gradle.org/distributions/gradle-6.1-milestone-3-bin.zip,
everything seemed to work as I expected: STS4's Java build path settings showed that the library dependency paths now pointed to the new folder.

The only quirk now is that when importing my Gradle project into STS4, I need to explicitly specify the Gradle user home in the wizard dialog box (I chose the wrapper option, FYI), although I had already manually added the line below to gradlew.bat:
set GRADLE_USER_HOME=./.gradle
But this quirk, of course, might have nothing to do with Gradle itself. It could be the IDE or the Gradle Buildship plugin.