Hi,
I'm quite sure somehow this was already discussed but I'd like to put down my view.
Glide is flattening the dependency tree in the base vendor/ dir. From my understanding this mean that all the dependencies (also the inherited ones) are installed in the base vendor/ dir.
If a dependant package also vendors its dependencies, glide doesn't remove them and these ones take the precedence and are used for compilation instead of the one in the base project vendor/.
I'm temped to call this "soft flattening" .
It's clear that many projects (many monorepo so they also provides libraries) wants to vendor their dependencies (where their vendoring meaning is: commit the vendor directories in VCS), to avoid disappearing repos, going to the network to download etc...
This also looks like the golang stated solution.
Given this I can see two different possibilities to manage nested vendor folders that will have different impacts and behaviors:
What are your thoughts on this?
I can't see a default solution, perhaps the user should be able to force one or the other behavior per package dependency? (but this will mean that the user must explicitly define a repo and also subpackages, also if not an explicit dep but an inherited one, in glade.yml and define something like keepvendor: true/false)
Additionally, will removing just the nested vendor folder break some licenses like explained in http://engineeredweb.com/blog/2016/go-why-not-strip-unused-pkgs/ ?
This is a good and difficult question.
I don't think there's a license issue with removing vendored packages from a dependency. But, that's the least of the problems.
Here are a couple examples.
vendor/ directories from dependencies will cause those repos to become dirty. Working with them becomes significantly harder. Especially for those who work on a dependency at the same time as the parent and push the dependency repo. This can quickly get messy to manage.This makes it hard for a tool to automate things.
As a rule, libraries should not vendor/ dependencies unless they have a very good reason.
I'm not sure how to do this better than we do today for the go toolchain. There is a possible opportunity to detect if a dependency is vendored by a project and skip fetching it.
@mattfarina Thanks for your answer.
I just noticed other tools do (or want to do) "hard" flattening. For example I found some issues on godep like tools/godep#428 and tools/godep#224. In this last one @thockin talks about something similar to this issue (soft vs hard flattening) and, in the end, he prefers "hard" flattening (I apologize if I misunderstood something).
Probably there isn't an automatic solution given pros and cons of soft vs hard flattening and actually it's very difficult if not impossible to detect when a nested vendor is needed or not (since I can put anything inside a vendor folder without any vendoring tool information to detect the package version, sometime also without a source repo etc...).
Probably the unique solution is to let the user choose what's better.
What I'd like to propose is to add some options to Glide to give the user the ability to choose the preferred behavior but I'm not sure how to specify it (options? config file?. What I had in mind was:
vendor/ folder at any level)If this actually doesn't fit directly inside Glide, I'm temped to initially add this to https://github.com/sgotti/glide-vc. Something like --no-nested-vendor and --nested-vendor-keep="list of paths" in addition to the current/future option (considering I'm going to change them to opt in like proposed by you). I'll also add a note specifying that doing this will change the build and also runtime (since another package is used to compile a dependency) behaviors.
What do you think about this approach?
Returning to your questions:
What if you don't strip VCS information. Stripping the vendor/ directories from dependencies will cause those repos to become dirty. Working with them becomes significantly harder. Especially for those who work on a dependency at the same time as the parent and push the dependency repo. This can quickly get messy to manage.
Good point. Will be possible to improve the--update-vendored option to also detect if a repo is dirty and act the same (replacing the repo)?
If a nested dependency has a reason to store a specific version.
See above (explicit keeping of some nested vendor directories).
As a rule, libraries should not vendor/ dependencies unless they have a very good reason.
I can understand it, but on the other side it's difficult to define why a library should not vendor a specific version of a package if it needs to.
It's more difficult to distinguish what can be defined to be a library since many packages I use are part of a larger repo and usually there's a vendor folder at the root of the project so also if I'm using just this package (without any vendor folder in it), with glide, since I'm getting the whole repo, I get also all its parent vendor folders.
@sgotti at the moment I would prefer something like glide-vc handle stripping nested vendor directories. I may be persuaded to change my mind but I don't really want to add a bunch of complexity into Glide for that right now.
Dirty would be hard to detect for a vendored repo. If the VCS data isn't available the VCS can't easily do it for us. This is complicated. What's dirty? Missing files? Changed files? With a VCS it's pretty easy. Without it we just ignore it which is, in part, why updating vendored is opt-in.
I think stripping vendor directories should be in the hands of the developer. They know what they're building and can deal with the consequences. This is really why libraries should not vendor. This is, in part, why storing outside packages in your VCS isn't common (or even allowed) in some other toolchains.
Yet, I understand the issues k8s and others are having that leads to this.
I was running into this today, too. I want glide to see that a dep has a
Godeps file (or glide or ...) and import the deps into the root BUT NOT
save them in vendor/.
As an example: Kubernetes vendors heapster. heapster vendors a client lib
from kubernetes, which has a wide fan-out and transitively includes most of
kubernetes. I am now making a copy of most of kubernetes/... in
kubernetes/vendor/.../heapster/Godeps/_workspace/src/.../kubernetes.
Thankfully glide seems to be smart enough to not recursively vendor that
(or I would have kubernetes/vendor/.../kubernetes/... :)
Stripping out deps' vendor/ and Godeps/ and so on will save a lot of
ugliness and confusion. I don't actually know if Go will respect
repo/vendor/foo.com/bar/vendor/... or not - I hope not, but I suspect so.
On Fri, Mar 11, 2016 at 12:06 PM, Matt Farina [email protected]
wrote:
@sgotti https://github.com/sgotti at the moment I would prefer
something like glide-vc handle stripping nested vendor directories. I may
be persuaded to change my mind but I don't really want to add a bunch of
complexity into Glide for that right now.Dirty would be hard to detect for a vendored repo. If the VCS data isn't
available the VCS can't easily do it for us. This is complicated. What's
dirty? Missing files? Changed files? With a VCS it's pretty easy. Without
it we just ignore it which is, in part, why updating vendored is opt-in.I think stripping vendor directories should be in the hands of the
developer. They know what they're building and can deal with the
consequences. This is really why libraries should not vendor. This is, in
part, why storing outside packages in your VCS isn't common (or even
allowed) in some other toolchains.Yet, I understand the issues k8s and others are having that leads to this.
—
Reply to this email directly or view it on GitHub
https://github.com/Masterminds/glide/issues/303#issuecomment-195526172.
@thockin This is where things get hard.
The way go build and the other go tools work is to look in the vendor/ directory a requesting package is in to find a dependency. If it doesn't find it there it walks up the directory tree searching in each vendor/ directory for it. If not found in a vendor/ directory it goes back to the traditional methods of the GOAPTH and then GOROOT.
This is why the discussions in places like the go nuts mailing list have been to not vendor outside dependencies in your project if you're building a library for others to use.
So yeah, repo/vendor/foo.com/bar/vendor/... is respected. When I asked about it I was told that different people may want to vendor different versions or forks in their codebase. So, it's intentional and people are doing this.
Bringing this back to Glide, we've been wary to add something that alters a codebase to Glide which is what you're asking will do. The project glide-vc is working to be able to do this with numerous options. With this installed you'll be able to do things like run:
$ glide vc
We're also going to add a scripts section to Glide. Consider this in the glide.yaml:
scripts:
setup: go get -u github.com/sgotti/glide-vc
install:
- glide install --update-vendored --strip-vcs
- glide vc
Then you can run:
$ glide run setup
$ glide run install
We need the scripts for varying reasons such as installing applications used by generators. None of the scripts would be run automatically (for security reasons). It would be great to just use make but that's not always available or cross platform.
Thoughts? This is something we may want to get on a hangout to discuss in more detail.
The way go build and the other go tools work
Yeah, that all makes sense EXCEPT when you're vendoring something that vendors.
This is why the discussions in places like the go nuts mailing list have been
to not vendor outside dependencies in your project if you're building a
library for others to use.
That's great in theory. A large number of libs that we use are part of a repo
that also includes a binary. Etcd. Rkt. Heapster. Cadvisor. Docker.
Asking every repo to break itself into a "libs only" repo and a "binary only"
repo is a non-starter.
Given that, what happens once everyone uses glide (world domination, yay)?
1) heapster (the binary) uses and vendors kubernetes/pkg/client
2) glide imports all of kubernetes (because you don't want to prune files, more on that later)
3) heapster now has a vendor/k8s.io/kubernetes/vendor/... dir.
4) kubernetes vendors the heapster client lib
5) kubernetes now has a vendor/k8s.io/heapster/vendor/k8s.io/kubernetes/vendor/... directory
At this point you've gone full-recursive. There's not ACTUALLY a cyclical
dependency between Go packages (which are the true unit of import), but glide
has made a problem by claiming that a full repo is the unit of dependency.
Something has to be the turtle at the bottom. Godep respects go packages (plus
licenses and other metadata) so this is not an issue.
I think any vendoring tool HAS TO break the false cycle here, and the way to do
that is to first interrogate the user (which version do you want to keep) and
then discard the recursive deps. Perhaps one answer to "which version do you
want" is "both" in which case it leaves the recursive deps alone - does that
put enough freedom in the users' hands, while still doing something sane by
default?
Bringing this back to Glide, we've been wary to add something that alters a codebase
Well, _our_ lawyers were pretty clear that this is not an alteration in the
sense of triggering redistribution clauses of licenses. It is established
practice within the Go ecosystem. But you shouldn't trust MY lawyers - you
should consider it yourself. That said, I think stripping out
vendor/... and Godeps/_workspace/... and similar things is EVEN LESS
worrisome than pruning down to the minimum set required. I think pruning
recursive deps has to happen.
Regarding glide-vc: if that's the best we can do, we'll do it, but I'll go on
record as it being more clumsy than it needs to be. I can respect (though
disagree) with the position that pruning code out of a repo is an "alteration",
but I can't buy that pruning recursive deps is an alteration. I think the
default should be to process-and-prune recursive deps. If you want to leave
minimization to glide-vc, so as to put the users fingerprints on it, I am fine
with that (but probably won't do it).
Scripts is a cute hack, but it means that everyone's repo will be different
with different target names. It's Makefile all over again.
@thockin thanks for being patient with me while I work through my thinking on this and taking so much time for your responses.
If you get a few moments can you take a look at #339. It's a pull request I just wrote on the topic.
@mattfarina Thanks for listening and working in this. I think that the work done for glide 0.10 (strip-vcs and strip-vendor) is a good base solution for this issue.
As also explained in #349 handling some incompatible packages with same name could be done installing them as separate nested vendors dirs (some like in npm v3) but as explained this is a tricky problem (causing possible incompatibilities).
And now glide-vc can be used just to remove all unneeded packages and files (better than doing vendor management inside it) if license permits this and to preserve space when committing the vendor directory.
If you want this issue should be closed and perhaps future updates will go in new issues.
fwiw, in the new SAT solver-based engine, the intention is to flatten as an (aggressive) default - because it's ultimately the only path to a sane system - but to allow exceptions in cases where it's provably safe automatically...and then probably in other, perhaps user-chosen situations as well.
also, this pretty much describes my view:
I can respect (though disagree) with the position that pruning code out of a repo is an "alteration", but I can't buy that pruning recursive deps is an alteration.
I'm new to glide and trying to understand how nested vendor directories are handled presently.
From what I can understand, "soft-flattening" is now happening by default, but "hard-flattening" is optional. Do this sound accurate?
For example, all dependencies end up at the top level vendor, but all nested vendor directories can be removed on request by specifying --strip-vendor
Most helpful comment
@thockin This is where things get hard.
The way
go buildand the othergotools work is to look in thevendor/directory a requesting package is in to find a dependency. If it doesn't find it there it walks up the directory tree searching in eachvendor/directory for it. If not found in avendor/directory it goes back to the traditional methods of theGOAPTHand thenGOROOT.This is why the discussions in places like the go nuts mailing list have been to not vendor outside dependencies in your project if you're building a library for others to use.
So yeah,
repo/vendor/foo.com/bar/vendor/...is respected. When I asked about it I was told that different people may want to vendor different versions or forks in their codebase. So, it's intentional and people are doing this.Bringing this back to Glide, we've been wary to add something that alters a codebase to Glide which is what you're asking will do. The project glide-vc is working to be able to do this with numerous options. With this installed you'll be able to do things like run:
We're also going to add a scripts section to Glide. Consider this in the
glide.yaml:Then you can run:
We need the scripts for varying reasons such as installing applications used by generators. None of the scripts would be run automatically (for security reasons). It would be great to just use make but that's not always available or cross platform.
Thoughts? This is something we may want to get on a hangout to discuss in more detail.