NB: keep in mind that the we see vendor/ as a temporary solution, and expect to eventually move to some newer, better version of GOPATH that largely obviates the need for it.
Currently, dep naively copies dependent repositories into vendor/ in their entirety. This may produce a vendor/ directory with a lot of unnecessary files, and many projects relying on existing community tools use some technique to prune out unneeded files.
It is unlikely that we could develop a pruning solution that would be as aggressive as every user wants, while still being safe for all other users. However, we would like to establish baseline pruning behavior that is safe for everyone, so that dep can perform it unconditionally, or with a minimum of necessary user input. If nothing else, this has performance benefits: if dep does the pruning, then it's possible for us to implement optimizations to simply avoid copying files that would later be removed.
There are a few, not fully orthogonal, classes of pruning we might consider:
vendor/ directories. This is required for correct builds, and dep will always do it.lock.json - only whether they're pruned from vendor.)imported by code, or required in the manifest), and could selectively remove those that are not needed. There are sub-considerations here:ignored in the manifest be pruning or preserved?.go files? What types of files?There's some debate as to whether removing files may be a violation of some software licenses. Getting clarity on this question would be ideal; if we can't, then we'll likely have to make additional pruning behaviors optional in order to avoid having dep blacklisted by legal departments.
Finally, pruning is intimately linked to how we approach the problem of vendor verification (#121).
Note that the logic for writing out the dependency tree lives in gps, and is likely to remain there. This issue is to discuss what the behavior and options should be; implementation will happen over there.
keep in mind that the committee sees vendor/ as a temporary solution, and fully expects to eventually move to some newer, better version of GOPATH that largely obviates the need for vendor/.
This is news to to me. How do you plan to support checking in deps for those who want this?
There are a few, not fully orthogonal, classes of stripping we might consider:
In govendor a popular approach has been to specify build tags to ignore. If you don't have any ignore build tags then you can keep the repo as is for those who wants that. It also serves to ignore appengine special files or other unsupported platforms.
keep in mind that the committee sees vendor/ as a temporary solution, and fully expects to eventually move to some newer, better version of GOPATH that largely obviates the need for vendor/.
This is news to to me.
Yeah, this is a newish thing. It's something we discussed within the committee, but I'm making it explicit here after discussions with @rsc .
How do you plan to support checking in deps for those who want this?
I have another draft issue I'm working on where we can explore that. Hopefully I'll get it up today.
Let's discuss specifics like this over there.
In govendor a popular approach has been to specify build tags to ignore. If you don't have any ignore build tags then you can keep the repo as is for those who wants that. It also serves to ignore appengine special files or other unsupported platforms.
Yeah, that makes a ton of sense, and I think it's complementary to the other approaches. I'm now reminded that @freeformz has definitely brought this up in past discussions, too.
Updating the original issue to include this...
Removing by build tag. Files with a particular build (or os/arch) Will this not affect cross compilation?
It would, I don't think that level (os/arch tags) of pruning should be on by default, but pruning of unused packages should.
@jessfraz I couldn’t agree more ...
@olekukonko for sure, and that's what that's intended to mean - if we add such a feature, then the user could choose to enable pruning out dependencies entailed solely by a particular build tag.
To be clear, though, this would only affect the deps that actually get written out to vendor/. It would not have any bearing on the way that dependency _selection_ works - the plan there remains to always consider all os/arch/release/build tags, and to record the resultant set in lock.json.
So speaking of pruning.. 1 concern that I hit while trying to vendor kops out as a library (I know I am crazy.. don't ask..)
The project used go-bindata to generate a file that was clobbered by vendoring... It was a fairly big deal that the file was in the vendor directory to use the tool as a library..
Not advocating that this is necessarily a good idea/design by any means.. Mostly sharing a real experience from the wild that might come up later.. especially if we start pulling files out of vendor/
Thoughts?
@kris-nova oh, this is a great question! Though...i actually can't quite picture the exact workflow you're describing with this:
The project used go-bindata to generate a file that was clobbered by vendoring... It was a fairly big deal that the file was in the vendor directory to use the tool as a library..
i think the answer depends considerably on the details. Could you be a little more specific about how this worked, what "the project" refers to (your attempt to vendor kops, or some dep that kops had with this weird requirement), where these generated files were, etc.?
Ah sorry, so here are the steps that I observed:
Basically there was a .go file that contained the bin data that was generated at compile time. It lived in the kops source tree. The file obviously never made it into kops version control.
When using kops as a dependency, we could generate the file and it would live appropriately in vendor/k8s.io/kops/some/path/to/bin.go. The dependency then behaved as expected. My concern was around that file getting removed if we run a prune (or any dep command for that matter) because of the note above:
Removing unnecessary files. Similar to the previous, should certain types of files be pruned out of vendored dependencies?
An remembered my experience with losing the file.. so I figured I would call it out..
..on second glance it looks like there is discussion around manifests that could prevent this happening..
either imported by code, or required in the manifest
so this point very well could be moot 😄
Removing unnecessary files. Similar to the previous, should certain types of files be pruned out of vendored dependencies?
Speaking of pruning unnecessary files, a while ago I was hit by annoying issues with dependency managers (specifically GB) that blindly trim everything except *.go files: there is a high change you wipe out files used by pkg/text/template.ParseFiles()¹ just because they don't have a *.go extension, thus breaking compilation or run-time execution due to missing files.
So far, I encountered only an handful of packages that were affected by the strict trimming, therefor it's not really a high priority issue and I solved it by manually copying the missing files when necessary. The real problem here isn't the manual copy _per se_, but the fact that a strict trimming could break _automatic deployment pipelines_ too — such as Heroku's tools and a-like. You can't fix an automatic deployment pipeline by manually copy the missing files (it's an oxymoron), especially if you don't have any control over the library/package that breaks.
It would be nice for golang/dep to not fall into the "trim everything except *.go files" trap.
There's some debate as to whether removing files may be a violation of some software licenses. Getting clarity on this question would be ideal; if we can't, then we'll likely have to make additional pruning behaviors optional in order to avoid having dep blacklisted by legal departments.
Pruning unnecessary files is a nice idea, in theory. Especially if you want to guarantee reproducible compilations (which I care a lot about), so ideally an optional feature is the only viable solution at the moment — at least until there is a reasonable heuristic about what can be safely pruned and what not. When the pruning breaks stuff, let the user disable the pruning (doesn't matter if by CLI or by editing a conf file, provided the behaviour is documented).
¹- template.ParseFiles() is just an example, there is plenty of Go APIs where an unforgiving pruning can break stuff;
@Dr-Terrible govendor normally only copies *.go files and README / LICENSE style files.
However you can also specify grab everything in the repo. I think normal trimming is important. I also think being able to grab everything is an important out. Concrete uses are C interop and out of file resources (html for a web site).
IMHO, I think that what LK4D4/vndr cleans up is good enough, especially the pruning of _test.go files.
Maybe the manifest could have an option for telling what prunning policy do you want and by default just copies *.go and legal/license stuff. Since dependencies can have manifests themselves, they can take care of that. In the end, it would be the responsibility of the dependency's developers to ensure it can be correctly vendored.
TL;DR: IMHO, copying README/LICENSE/etc and all files that can be go or cgo source code (excluding tests) it's what I think should be on by default. Any other (reasonable) case should be possible to handle but in an opt-in way.
Maybe the manifest could have an option for telling what prunning policy do you want and by default just copies *.go and legal/license stuff. Since dependencies can have manifests themselves, they can take care of that. In the end, it would be the responsibility of the dependency's developers to ensure it can be correctly vendored.
I'm pretty certain it's the right thing to do to have the pruning policy set in the current project's manifest; otherwise, there's no way of knowing which subset of files to hash (assuming we do a subset). Plus, teams that commit vendor/ would be constantly annoyed at superfluous diffs in their vendor/ resulting from different people setting different pruning policies on the CLI.
I'm less sure about having each dependent project's manifest specify its own policy, though. For one, when I set the policy for my own project, my needs may be different than my dependees - different os/arch, and they probably don't need my tests. So, it might be useful to bubble policy suggestions up from deps' manifests to help the user set a policy in their own manifest, but I think directly reusing them might not work out well. It depends a lot on what constitutes a "policy."
(BTW, side note: the distinction between the powers afforded to the current/root project's manifest, vs. a dependent project's manifest, is encoded in gps.RootManifest vs. gps.Manifest).
Hi,
First of all, I would like to start by saying that I'm baffled by this note:
NB: keep in mind that the we see vendor/ as a temporary solution, and expect to eventually move to some newer, better version of GOPATH that largely obviates the need for it.
We don't have a tool that works good, out of the box with the current solution but we already deprecate the one that we are building... It's unfortunate and confusing to say the least.
Now, back on track to my issue. I've described this in Slack but here's the description of the issue again.
My colleague was using one of the popular vendoring tools. That said tool decided stripped out unused packages / files from the packages. Now that my colleague needed to use the said part of the missing dependency he was: "but why do I need to update my dependencies? I've already said it to vendor this package and all packages it contains".
From what I can see the upcoming tool will continue with this bad UX. Why is it a bad UX? Because people really expect a thing from a package manager that's very basic: copy the package contents from remote to vendor/ by default. If they either it needs a flag or reruns to have this functionality then the issue is revealed.
As such, I would highly vote/recommend/voice/shout/add opinion/etc. that the Package Manager should not do anything smart unless so instructed by the user. The user has to be in control on how this works because then the user can replicate this in any environment the tool will be used, be it a team of one or or a team of 500.
In ending of this, I would like to share a talk which made me rethink the way I approach problems when it comes to UX: https://www.youtube.com/watch?v=5tg1ONG18H8 I hope it will be of help as well.
From what I can see the upcoming tool will continue with this bad UX.
The entire purpose of this issue is to discuss what is safe to do, starting with some structured possible suggestions. If your perception of the proposal is that it's already decided we're doing all these things, then I'd encourage you to reread the proposal. In particular, this line:
It is unlikely that we could develop a pruning solution that would be as aggressive as every user wants, while still being safe for all other users.
Maybe that latter half isn't explicit enough: my own guideline is that safety comes _first_.
Personally, I'm becoming less and less inclined towards pruning at all because of the annoyances it creates for verification.
Because people really expect a thing from a package manager that's very basic: copy the package contents from remote to vendor/ by default.
I'm not sure that's the case (but I am sure that's an assertion absent evidence). For other package managers (in other langs) that include, say, build/codegen steps, the expectations are more complex than "just copying."
The user has to be in control on how this works because then the user can replicate this in any environment the tool will be used, be it a team of one or or a team of 500.
I addressed this in my comment immediately before yours: https://github.com/golang/dep/issues/120#issuecomment-275811629
We don't have a tool that works good, out of the box with the current solution but we already deprecate the one that we are building... It's unfortunate and confusing to say the least.
vendor is not a tool. It's an implementation detail that is used by tools to achieve desired outcomes.
Also, I find your objection here surprising, given that the impulse to minimize the contents of vendor is basically the sole reason anyone cares about pruning in the first place.
Could dep support features in https://github.com/sgotti/glide-vc? Removing nested vendors, stripping test, example files with sub-dependencies are necessary for large projects.
e.g.
glide update --strip-vendor
glide vc --only-code --no-tests
https://github.com/coreos/etcd/blob/master/scripts/updatedep.sh
We are evaluating dep as an alternative to glide, but dep does not seem to strip out anything, so /vendor directory size becomes too large.
Is there any large project using dep successfully that we can learn from?
Could dep support features in https://github.com/sgotti/glide-vc? Removing nested vendors, stripping test, example files with sub-dependencies are necessary for large projects.
@gyuho really, that's much of the question at hand here. We certainly _could_ do a lot of things, but there are steep tradeoffs to doing anything more than very basic removal. For correctness purposes, we have to prune out any nested vendor dirs (which we already do). Beyond that, though, I'm increasingly inclined to not do anything, and leave the responsibility to a separate tool (analogous to glide-vc).
Using such a a tool would make vendor/ verification impossible, but that might just be a cost folks will just have to bear for now.
Is there any large project using dep successfully that we can learn from?
I really hope that no one, especially big projects, is doing anything more than _testing_ dep at this point. That's why there are all those "pre-alpha" and "experimental" markers all over the README 😄
Has anyone given any more thought to the licensing issue raised by @sdboyer? Most FOSS licenses that I can think of would be fine with the pruning as long as the original license is retained, as it should fall under the "modify" part of licenses. Does anyone know of any OSI approved license (for software) that would take issue with this?
There are some licenses that require one to state "significant changes", for example Apache-2.0 and the GPL family of licenses (but not the BSDs). Removing a bunch of the original source code could conceivably trigger that requirement which means that if someone prunes dependencies but are unaware of the license requirement this might make them unintentionally violate the licenses of the projects they use. It seems undesirable to push the requirement to verify this onto users so is there some way they could be warned about that when they do use pruning?
@daenney The legal departments I'm aware of don't have a problem with it.
@kardianos Don't have a problem with it? Which it? Could you elaborate?
We use gvt for vendoring in weaveworks/cortex. Its quite a large project with a lot of dependancies. I've spent today trying to migrate it to dep - see https://github.com/weaveworks/cortex/pull/341. Problem right now is this brings in more than 3x more code than gvt, even after I've aggressively pruned out all tests and non go code - 3.2m loc vs 900k.
I like the unrolling of nested vendored directories (this is main motivation for migrating), but miss the behaviour that gvt has, where it doesn't copy in the entire repo, just the bits of it you need - and their dependancies. I think this is the main reason dep is bringing in more code.
Any chance this is something you'd consider? I'd be happy to code it up.
@tomwilkie see my earlier response - https://github.com/golang/dep/issues/120#issuecomment-284126173
I do realize that pruning is something people will need to do if they're committing vendor/, and even if it doesn't ultimately end up directly in this tool, it will need to exist, and I'd rather keep the discussion close. To keep things fully separated, I'd rather it not be e.g. dep ensure -prune; if instead, you wanted to experiment with a separate dep prune command, I think that might be a good place to start.
if instead, you wanted to experiment with a separate dep prune command, I think that might be a good place to start.
Sounds good. I'll let you know how I get on, can't promise anything soon though.
Turns out I got to it quicker than I thought...
As @gyuho mentioned, I also think that "dep" needs to have a way of cleaning up test files as well as unnecessary license files. I suppose, anyone who has touched large vendor trees would probably share the same concerns. Although the "dep prune" command already removes unused packages, it doesn't seem to clean up tests and license files. So I'm thinking about adding new options for the clean-ups.
From my understanding, however, "dep prune" could be split out into a separate tool in the future. In that case, it will be better to wait until the new prune repo could be created, right?
What's a better approach? Directly add those options to the current form of "dep prune", or wait until the new prune tool could be created?
/cc @tomwilkie
I also think that "dep" needs to have a way of cleaning up test files
+1, and other non-go files.
as well as unnecessary license files.
hmmm, I don't know about this. ~Most licenses (ie Apache 2.0, which covers lots of my world right now) need you to redistribute the license, and I wouldn't want to try and guess which licenses are safe to remove TBH.
From my understanding, however, "dep prune" could be split out into a separate tool in the future. In that case, it will be better to wait until the new prune repo could be created, right?
@sdboyer whats you're current thinking on this? I'm personally keen to see this stay as part of the dep tool, as for the time being I'm always going to be committing my dependancies - I've had too many repos disappear on me.
I'd like to see it rolled into dep ensure (as a flag, and even as a committed config file param or something) as I've already had an occurrence of someone forgetting to run prune and committing many millions of LOC of dependancies.
@sdboyer whats you're current thinking on this? I'm personally keen to see this stay as part of the dep tool, as for the time being I'm always going to be committing my dependancies - I've had too many repos disappear on me.
Ugh. I'm still pretty torn between the different guarantees I'd like us to be able to provide. But this really can't/oughtn't wait much longer, so lemme try to take a stab. Here's me, thinking through it in bullets:
dep ensure when writing out vendor/.vendor/ dirs.vendor/).I think that basic approach should more or less satisfy these competing requirements. It should allow us to have verifiable vendor dirs, while still having automatic pruning within dep ensure.
The catch will be that it won't be possible to verify a more-than-default pruned vendor/ without having the original source repositories present (possibly a problem for e.g. a network-isolated build server). But, I think that's a price we can afford to pay.
I realize this sketch is basically me drawing dots and leaving you to connect them - sorry 😄 i don't have time for a more detailed writeup just now. But it's a tentative "yes" to making pruning work as an automated part of ensure.
Thanks for the info @sdboyer; I'll go and read #121 and start a new issue to discuss plans.
@sdboyer You may be interested in govendor smarter tag ignoring logic:
https://github.com/kardianos/govendor/blob/master/context/tags_test.go
https://github.com/kardianos/govendor/blob/master/context/tags.go
I could contribute to dep if you'd like.
@kardianos oh oh oh interesting! yesyes, that work looks like the shape of what we'll ultimately need for #291. i'd be absolutely thrilled if we could incorporate what you've already done. that area is a bit particular, and i've not fully absorbed what you've written there, but this:
// However in govendor all questions are reversed. Rather then asking
// "What should be built?" we ask "What should be ingored?".
strongly suggests that we're in the right ballpark.
@sdboyer
- Pruning rules are a property for the manifest, not a CLI argument. This guarantees consistency across individual runs, and members of teams.
Does this mean we'll have the default CLI arguments only for the init command and then the settings would live in the Gopkg.toml file?
$ dep init --prune-flags=...
$ cat Gopkg.toml
[[ settings.prune ]]
tests = true
unused_pkgs = true
unnecessary_files = true
build_tag = ""
Or what's your thinking here?
For anyone interested, I'm using the following .gitignore rules to filter out unnecessary files:
# .gitignore
# Ignore everything in vendor/, except for *.go files,
# LICENSE and COPYING. Ignore Go tests.
vendor/**/*
!vendor/**/
!vendor/**/*.go
!vendor/**/LICENSE*
!vendor/**/COPYING*
vendor/**/*_test.go
Does this mean we'll have the default CLI arguments only for the init command and then the settings would live in the Gopkg.toml file?
I don't really see a need for flags at all - why add the complexity to init? Just manually add settings to Gopkg.toml (definitely something along the lines of your example), and subsequent dep ensure runs will apply it.
@sdboyer sounds good, as long as dep init writes the default values to the Gopkg.toml file, so I can tweak the settings right away without having to dig deep into golang/dep documentation.
@VojtechVitek should those settings for Gopkg.toml be working with a dep from current master?
@wedgeV sadly no, those are just a proposal. we still need someone to implement the guts, here.
Closed in favor of #944
Most helpful comment
It would, I don't think that level (os/arch tags) of pruning should be on by default, but pruning of unused packages should.