Gradle: Allow matching repositories to dependencies

Created on 8 Feb 2017  ·  38 Comments  ·  Source: gradle/gradle

Original issue: https://issues.gradle.org/browse/GRADLE-1066

Highly voted issue: 20

Example use cases:

  • I need to ensure that the runtime dependencies I ship in my distribution come from a specific repository that passes some licence check. Plugins and compile-only dependencies don't have this restriction.
  • A repository was added to provide only some specific dependency, but I don't want Gradle to ask it for other dependencies. Maybe it is slower or less trustworthy than others.
Labels: feature, contributor, dependency-management

Most helpful comment

PR has been merged, will ship into 5.1!

All 38 comments

+1

We absolutely need this feature. Note that for us, "declare repository per configuration (e.g. runtime dependencies)" would already be sufficient. Maybe you can consider that as well.

+1
I figured that a way to make the situation a bit better is to think carefully about the order in which you declare your repositories. Where I work we have a private Nexus repo that is sometimes very slow to resolve dependencies. Moving it to the bottom made our life much easier.

+1

One idea:

repositories {
    maven {
        url 'specialCase'
        filter 'specialGroup:**'
    }
    maven {
        url 'repo1'
        filter '!specialGroup:**'
    }
}

Has any work been done on this?

It's a really good feature for competing repositories that may have the same dependencies with slight differences.

This is a great idea for when you want to use jitpack.io for a single dependency and not use it for anything else.

I think this would be really cool in terms of not having to poll a bunch of other repositories before getting to the right one. This way, we can just prioritise the right repository for a certain dependency.

Looking forward to this implementation! 😄

I've adjusted the title of this to refer to the actual problem. Assigning a repository to a dependency might not be the best solution. There are alternatives like remembering that a certain module was absent from a repository and not checking it again when we want to see if there is a new version.

Another alternative would be to do matching of repositories based on certain attributes.

For my usecase, I care less about the performance problem, and more about being able to assign a specific dependency to a specific repository for correctness reasons.

Example story:

  • a jar from mavencentral has a bug / needs a feature
  • I modify the jar to suit my needs
  • I publish it to our local 3rd party repo under a new version
  • at some point in the future, a new jar with the same version is published to mavencentral

I might be in the minority here, but I imagine it's a pretty common situation for a project to be "we get all our deps from mavenCentral, except this one vendor library which comes from the vendor's maven repo, and this one hacked-up library we check-in to our 'libs' directory". I guess it's a sticky issue figuring out how to resolve the transitives...

@netwigg The way I've handled your use case in the past is to deploy your "fixed/hacked" artifact under an expressly different GAV coordinate. It may just be that the version is overtly different (i.e. "1.0.1" --> "1.0.1.MYCOMPANY_PATCH").

Another way is to prepend the groupId or artifactId with yours. Back in the day, SpringSource did this with their OSGi Enterprise Bundle Repository. They took a ton of OSS libraries that were not OSGi'd, modified their MANIFEST.MF, repackaged them, and deployed them with the same exact groupId and version, but the artifactId was prepended with "com.springsource". For example, "org.apache.commons:commons-io" became "org.apache.commons:com.springsource.commons-io".

I like prepending to the group or artifact ID over changing the version, just to make it clear it is in fact a "different" artifact. It also better avoids collisions.

Publishing it under a different group is a very good approach. You can use a substitution rule to tell Gradle to replace any appearance of the original with your modified one. Then you don't rely on repository order for correctness.
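
For illustration, a substitution rule along those lines could look roughly like the following sketch (the coordinates are placeholders, borrowed from the SpringSource example above):

configurations.all {
    resolutionStrategy.dependencySubstitution {
        // Redirect every appearance of the original module to the patched one
        // published under our own group (placeholder coordinates).
        substitute module('org.apache.commons:commons-io') with module('com.mycompany:commons-io:2.5.MYCOMPANY_PATCH')
    }
}

With a rule like that, transitive references to the original module also resolve to the patched artifact, so repository order no longer decides which one wins.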

The problem with the GAV changes is that they propagate to projects which consume your project (as 4th-party dependencies), but those consumer projects might not have access to the same repositories you have. You care where the libs in your distribution build come from, but you don't care about / control where consumer projects get them from.

Another use case for this which I think is fairly common at companies, and which shows that this is not only about performance: as per company policy, 3rd-party dependencies have to be pulled from a certain "approved" repository which is basically the same as Maven Central but has some additional stuff like a virus scanner or something.

I don't understand the two points above. Can you please elaborate how the original proposal of "repositories on dependencies" would have solved them?

If you modify a lib, but don't give consumers access to the repo containing that modified version, these consumers can't work. They can't just use the unmodified version, since that's not going to work correctly with your project. But if they really want to try anyway, they can use a dependency resolution rule to adjust that dependency back to the unmodified version.

If you require dependencies to be pulled from a specific repo for policy reasons, it's much safer to validate that no other repo is used than to rely on repository ordering or trusting users to specify the repository on each dependency/configuration.

If you require dependencies to be pulled from a specific repo for policy reasons, it's much safer to validate that no other repo is used than to rely on repository ordering or trusting users to specify the repository on each dependency/configuration.

How exactly can you validate that no other repo is used today? Usually, the repo which has to be used for policy reasons does not contain all dependencies of your project. For example, because anything more would be too much maintenance effort, that repo only contains the dependencies you redistribute (runtime dependencies). However, your project might also have test dependencies, integration test dependencies, compile-only dependencies, plugins, etc., which are not available in that repository. Therefore the only solution I see today is to rely on repository ordering, which we both agree is not a safe / reliable solution. Trusting developers to specify the repository on each dependency/configuration would be a reliable solution, because then your build fails if a certain configuration can't be resolved from the repo you said it should come from.

If you modify a lib, but don't give consumers access to the repo containing that modified version, these consumers can't work.

What if the lib isn't modified, just has to be pulled from a different (private) repo for policy reasons? However, I admit that this use case is a bit far fetched, might not be as common as the "need to use repo X for policy reasons" use case. Dependency substitution on the consumer side would be a fix, but that complicates consumer projects quite a bit. With this ticket, this becomes easier since you don't need to change the GAV.

How exactly can you validate that no other repo is used today?

You can add a task to the build that goes through the repositories and fails if any non-trusted one is used. This could be part of a plugin that every project is required to use.
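
A minimal sketch of such a check, assuming a hypothetical trusted URL and task name:

// Hypothetical policy check: fail if any declared Maven repository is not on the trusted list.
def trustedUrls = ['https://nexus.mycompany.com/approved/'] // placeholder

task verifyRepositories {
    doLast {
        def untrusted = repositories.withType(MavenArtifactRepository).findAll { repo ->
            !trustedUrls.any { repo.url.toString().startsWith(it) }
        }
        if (!untrusted.isEmpty()) {
            throw new GradleException("Untrusted repositories declared: ${untrusted*.url}")
        }
    }
}

A plugin could apply the same check to every project and wire it into the build's verification lifecycle.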

Usually, the repo which has to be used for policy reasons does not contain all dependencies of your project.

Plugins, compile-only dependencies, etc. can all compromise the code that you deliver. Compromised testing libraries could help an attacker hide such issues. Checking only the jars you redistribute doesn't mean the distribution is safe to use.

Trusting developers to specify the repository on each dependency/configuration would be a reliable solution

Isn't all policy about not trusting a single party, but having an independent party cross-check? It's much easier to check that each project is using a policy-checking plugin than checking whether each project implements each policy on a line-by-line level in their build scripts.

All that being said, I can imagine having some attribute-to-repository matching that allows you to restrict Gradle's search for dependencies. I would suggest opening a separate issue for this though, because the original intent of this one is about performance, not policy.

You can add a task to the build that goes through the repositories and fails if any non-trusted one is used. This could be part of a plugin that every project is required to use.

Doesn't solve anything for the case I described since what is considered "trusted" depends on the artifact/configuration.

Checking only the jars you redistribute doesn't mean the distribution is safe to use.

Let's not discuss whether such policies make sense or not. Fact is such policies exist and developers have to deal with them.

Isn't all policy about not trusting a single party, but having an independent party cross-check? It's much easier to check that each project is using a policy-checking plugin than checking whether each project implements each policy on a line-by-line level in their build scripts.

I don't really get this point. The goal is to fail the build if certain dependencies aren't pulled from a trusted repository. The fact that the build fails serves as proof that the build implements the policy. Whether the build fails because a plugin isn't present or because something isn't configured correctly in the build script itself doesn't really matter. The issue is that with Gradle today you can only fail the build if any dependency cannot be found on any specified repository. But what is needed is to fail the build if certain dependencies cannot be found on certain specified repositories.

All that being said, I can imagine having some attribute-to-repository matching that allows you to restrict Gradle's search for dependencies. I would suggest opening a separate issue for this though, because the original intent of this one is about performance, not policy.

That would be great. I'm happy to file a separate issue for this, but reading the description and comments of this ticket, why do you think this is only about performance? The "Expected Behavior" of this ticket sounds like what I'm trying to describe here and doesn't mention performance as an issue.

The goal is to fail the build if certain dependencies aren't pulled from a trusted repository. The fact that the build fails serves as proof that the build implements the policy.

So what does a green build tell you in this case? It could be implementing the policy (and no bad dependencies were used) or it might not be implementing the policy (and it might use bad dependencies). It's green either way.

I'm happy to file a separate issue for this, but reading the description and comments of this ticket, why do you think this is only about performance? The "Expected Behavior" of this ticket sounds like what I'm trying to describe here and doesn't mention performance as an issue.

That's because it went into a technical proposal too early. The context section mentions the actual problem - I add a repo just for a single dependency and everything else gets slower. By now it's gotten hard to tell who added a +1 for "make it faster" and who added a +1 for "make policy easier".

I think it's all about what the original Expected Behavior stated:

Individual dependencies can declare the repository(ies) to use. The dependencies are then only resolved from the declared repositories. The build fails if a dependency cannot be found in the declared repositories.

Edit: this is a useless debate, and really unproductive. But both ideas seem necessary? Split the ticket if you need to; just stop debating.

I've forked out the performance aspect into its own issue and reworded this issue to not propose a technical solution, but talk about use cases instead.

Any update on this? I too have this first requirement stated at the beginning. Any ETA or version this is slated for?

Another thing to take into consideration here is if Gradle eventually provides plugins with the ability to provide custom dependency types along with custom dependency resolution/repositories (somewhat mentioned in https://github.com/gradle/gradle/issues/1400).

I don't want to dive into implementation much, but stumbling through the Gradle APIs led me to https://docs.gradle.org/current/javadoc/org/gradle/api/attributes/package-summary.html . I don't fully understand what those are used for yet, but they look like they might fit the bill of "matching things together".

@mkobit attributes are meant to select _variants_ of the same module. They have no knowledge of where the component comes from, so are not the right solution for this.

Our initial thought was to have the ability to declare, for each repository, what it _may_ or _may not_ contain. But we haven't particularly made progress on this yet.

A good example of a problem this would solve that recently affected many users https://github.com/facebook/react-native/issues/13094#issuecomment-389568018

I believe this issue is quite sensitive. We run our own repository for nightly snapshots and I'm amazed to see our access logs from very large Fortune companies aggressively trying to access hundreds of dependencies. We get to know who uses what and which versions (and so does MavenCentral and other repos), which is quite a security leak, plus it scares me that we could even serve some of those (if we were bad guys).

This issue would solve problems like 'javax.mail:mail:1.3.1': in Central (and a few other repos) there is only a POM but no JAR; the JAR is in another repo. So to make Gradle survive the JAR missing in Central, I have to put that specific repo as the first one.

So the moment I hit the same problem with a different repo, does that mean I won't be able to build my project with Gradle?

What is currently stopping work being carried out on this? We also have this problem, with public repositories even being spammed with requests for non-existent artifacts. Would it not suffice to add a whitelist/blacklist (includes/excludes) attribute to a repository block to either:

  • include resolution of all matching dependencies
  • exclude resolution of all matching dependencies

I would be willing to help out if someone would like to point me in the right direction.

Thanks @sboardwell ,

At this point we're collecting use cases for this, to make sure filtering, or matching, is the right solution for each one. We see advantages in implementing this, but we'd like to make sure the use cases are real, so starting with a list of use cases and possible solutions to them is a good start. Then we can decide if, how, and when we implement it.

Ok, thanks for the info @melix. My use case would be similar to other posters here:

  • internal artifacts pushed to an internal Nexus repository nexus.mycomp.com/my-releases
  • be able to tell gradle: please check nexus.mycomp.com/my-releases for any artifacts having the group mycomp

I also second @ar's comment. The information found in the incorrect requests being sent out can be very revealing, not to mention dangerous.

We are currently solving some of the problems with rewrite rules on the webserver serving our nexus, catching and sending back 404's for any requests known to be incorrect - but this is a really ugly solution to maintain.

Do you know when you'll be finished collecting information for use cases? Just asking since the original ticket was opened in 2010 😃

We've been working on dependency management features intensively for the past months, and this issue has been mentioned several times. We were close to implementing it but always found better ways to solve the use cases we originally thought would need it. This doesn't mean such use cases don't exist. Performance improvement is another one. It is unlikely we will do anything on this before October as we have many more features to finish before then.

Usually our customers with such issues work around them by having an internal repository which also does proxying, but we reckon it's not always that simple.

@melix One of the use cases is speeding up a build with multiple slow repositories:

repositories {
    maven { url "http://slow-repository-1.com" }
    maven { url "http://slow-repository-2.com" }
    // ...
    maven { url "http://slow-repository-n.com" }
    maven { url "http://super-slow-repository.com" }
    maven { url "http://extremely-slow-repository.com" }

    jcenter()
}

dependencies {
    compile "foo:bar:1.0" at "http://slow-repository-42.com"
    // ...
}

Hi @melix, thanks again for the quick response.

We use an internal nexus with cached/proxied remote repositories for jcenter, maven-central and Co - the default negative cache in Nexus is 1440 minutes, meaning the remote repository will only be contacted once per day for something it doesn't have.

However, if I've understood:

https://docs.gradle.org/current/userguide/introduction_dependency_management.html#sec:dependency_resolution

Once each repository has been inspected for the module, Gradle will choose the 'best' one to use.

correctly (but maybe I've got this part wrong), every configured repository will be contacted on every build with an empty gradle dependency cache (~/.gradle/caches/modules-2) (regardless of the ordering of the repositories). So, even if we declare the internal repository first and the dependency is found, the other irrelevant repositories are checked regardless. Is this correct?

@sboardwell Unless you use dynamic versions, once a version is found in a repository, it is not searched for in the other repos.

But here the comment from @melix is about declaring only your internal repository in the build script and let it handle the proxying.

Another potential use case for an improvement:

  • I want my runtime dependencies to be resolved against a blessed repository only, for security / legal reasons.
  • I have a looser constraint for my test dependencies which are allowed to be sourced from a wider set of sources.
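
For what it's worth, with the repository content filtering that eventually shipped (see the end of this thread), that split could be sketched roughly as follows; the URLs and configuration names are placeholders:

repositories {
    maven {
        // Placeholder for the "blessed" repository that runtime dependencies must come from.
        url 'https://repo.mycompany.com/blessed'
    }
    maven {
        url 'https://repo.maven.apache.org/maven2'
        content {
            // Hypothetical split: only consult this wider repository when resolving
            // test configurations; everything else comes from the blessed repo above.
            onlyForConfigurations 'testCompileClasspath', 'testRuntimeClasspath'
        }
    }
}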

Thanks @ljacomet, I'll have a look at solving it with repository groups.

Any update?

My use case is some issues I've run into on multiple occasions where jcenter hosts old or unofficial artifacts that I need to get from another repo (e.g. Firebase artifacts that should come from Google's repository but are instead resolved from jcenter, causing versioning issues; declaring google() before jcenter() fixes that issue but causes others).

Another one (that prompted me to search for this) is that companies host their own artifacts, but adding their repository slows down the build, or worse, it is somehow used to try to download all artifacts (even though it is specified last in repositories). My current issue is with Cloudflare's mobile SDK which is:

A. Slow
B. Being used to try and resolve all my artifacts

A PR is ready, if anyone is willing to build the branch and give us feedback.

PR has been merged, will ship into 5.1!
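
For readers landing here later: the feature shipped in Gradle 5.1 as repository content filtering. A minimal sketch of the idea (the URL and group are placeholders):

repositories {
    maven {
        url 'https://nexus.mycompany.com/my-releases' // placeholder internal repository
        content {
            // Only ask this repository for the internal group.
            includeGroup 'com.mycompany'
        }
    }
    maven {
        url 'https://repo.maven.apache.org/maven2'
        content {
            // Never ask this repository for the internal group.
            excludeGroup 'com.mycompany'
        }
    }
}

See the Gradle user guide section on declaring repositories for the full set of include/exclude options.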
