Dep: Absorb prune into ensure

Created on 4 Aug 2017  ยท  36Comments  ยท  Source: golang/dep

dep prune currently exists as a separate command. This is not optimal; we want to incorporate its behavior into dep ensure directly, obviating the need to run the extra command.

Projects relying on pruning will necessarily have to give up on (fast) vendor/ verification - #121, as that verification will need to rely on a hash of each dependency project's _entire_ tree, prior to pruning. That's just going to be a cost of doing business.

This will involve a few things:

  1. Create a set of flags (express as a bitset, e.g. type PruneOptions uint8) in gps representing the orthogonal, combinable pruning modes:
    i. Remove *_test.go files
    ii. Remove non-go files (except LICENSE, COPYING)
    iii. Remove unused packages
    iv. Remove nested vendor dirs (note that gps needs to have this as a flag, but dep unconditionally sets it - we do not give the user the option to NOT strip vendor directories)
  2. Introduce a new top-level section in Gopkg.toml that allows the user to specify each of these separately. I'm imagining something like this:
    toml [prune] non-go = true test-go = true unused-packages = true
    (All default to false)
  3. Modify gps.WriteDepTree() to take a struct with a PruneOptions bitfield. (Keep an eye on #903, this overlaps with that)
  4. Hack up the logic currently in cmd/dep/prune.go, moving it over into unexported functions in gps that gps.WriteDepTree() calls into itself to perform the pruning.

120 is absurdly long and less relevant now, so I'm going to close it in favor of this.

  • [ ] Remember to update the README when we're done with this!
enhancement help wanted Epic ensure

Most helpful comment

and we are finally, FINALLY ready.

All 36 comments

Can we keep LICENSE and COPYING files with a non-go option?

@VojtechVitek ๐Ÿ’ฏ yes, will update OP, i think it'd be needlessly creating pain not to do that

I assume you meant gps.WriteDepTree instead of gps.WriteVendorTree. ๐Ÿ˜‰

I already did 1, 2 & 3, I can send a PR (it can even be merged before 4 is completed since it doesn't impact behavior). I'm still working on 4 though.

@sdboyer Just to check, after reading the comments in #903 I decided to update the signature of gps.WriteDepTree and make it a method of gps.SourceManager. The new signature is:

func (sm SourceManager) WriteDepTree(basedir string, l Lock, options PruneOptions) error

I've moved the current implementation to SourceMgr and added a dummy implementation for the other dummy sms.

I assume you meant gps.WriteDepTree instead of gps.WriteVendorTree. ๐Ÿ˜‰

derp, yes. updated OP. once, long ago, it was called WriteVendorTree(), and i still have that stuck in my mind

I already did 1, 2 & 3 already, I can send a PR (it can even be merged before 4 is completed since it doesn't impact behavior)

๐ŸŽ‰ ๐ŸŽ‰ ๐ŸŽŠ ๐ŸŽŠ killin it!

@sdboyer The code I just uploaded to my version of dep creates a prune tree while it's verifying dependency hashes.

It might be advantageous to look at this and #121 together.

@karrick there's definitely a lot of overlap...or at least, adjacency...eventually between pruning and verification, but if the prune tree is the one we discussed in #121 (i haven't looked yet), then i think that's a separate kind of pruning from what's going on here.

now, it's possible that we might be able to join these together. however, my current plan (as noted in the OP) is that if you do the kind of pruning this issue is focused on, then you don't get vendor/ verification. that's probably a bridge we COULD cross, but i'd want that to be a separate thing, as there's a long-term goal of having the hash digests we put into Gopkg.lock be the same as the ones that end up being used for content-addressability in the "third space".

I'd like a flag to turn off pruning, for situations where speed is essential such as CI.

@DocMerlin if you care about speed:

  • vendor your dependencies
  • don't run dep on CI

i don't think pruning, at least once absorbed into ensure, is ever likely to contribute much to slowdowns. i'd need to see evidence to the contrary before i could consider adding another flag.

I'd like a flag to turn off pruning, for situations where speed is essential such as CI.

Yeah, pruning will likely be even faster, since we can avoid all the extra I/O operations.

Let's not add any more flags. This should be the default behavior imho.

Yeah, pruning will likely be even faster, since we can avoid all the extra I/O operations.

for now, at least, we generally have to write out the full tree, _then_ prune items out, so we can't realize this gain right now. but you're certainly right that we have the headroom to be able to explore avoiding writing files that will be pruned in the first place.

Can we please also have some more fine grained options for when needed.

For example when using a dependency such as https://github.com/capnproto/go-capnproto2, I would like to still vendor the https://github.com/capnproto/go-capnproto2/tree/master/capnpc-go directory (and all its content) and the https://github.com/capnproto/go-capnproto2/blob/master/std/go.capnp file. This is useful as those files are still dependencies for our builds, even though they are not used go packages in our project or go-files.

In such cases it could be great if we could add some exception filters or w/e on a per dependency basis. Perhaps this scenarios is rare. But without such a possibility I would be resorted to use some shell script on top of dep to copy those files over afterwards somehow (during my bi-weekly dependency upgrade cycle).

question about pruning / might be related, is there similar configuration as the govendor ignore build tag flag? use case is to ignore all *_test.go and appengine related code from vendor. The ignored config in the toml doesn't seem to be working as expected.

I'm not too certain what we're trying to solve with dep prune.

It seems like there's risk in pruning files that need to be around for a project dependency to build.

I'm not too certain what we're trying to solve with dep prune.

my initial take was more along these lines, but many folks consider this a hard requirement for using dep, as they want to keep the size of their vendor directory minimal. so, i relented - see #120 for the history there.

question about pruning / might be related, is there similar configuration as the govendor ignore build tag flag? use case is to ignore all *_test.go and appengine related code from vendor. The ignored config in the toml doesn't seem to be working as expected.

ah, that's a separate thing. pruning is strictly about removing excess files from vendor/. what you're after, i think, goes deeper, as it speaks to what paths we actually follow when analyzing code. it's on the roadmap, but backburnered, as it's not something people have been really coming out of the woodwork to demand - #291

Can we please also have some more fine grained options for when needed.

it's possible that we could add some more rules, but we need to be careful about this. we're going to have vendor/ verification coming pretty soon, which is going to be a _drastic_ performance improvement (and reliability as well) - we'll no longer need to rewrite all of vendor/ on each dep ensure run. that's going to be predicated on having a new hash digest value in Gopkg.lock - something like tree-digest this:

[[projects]]
  branch = "master"
  name = "github.com/foo/bar"
  packages = ["."]
  revision = "cd8b52f8269e01eb486dfeef29f8fe4d5b397e0b"
  tree-digest = "0ba2aa4b3860c5fcc1219b31afaf30c1fcf3b97cbb0a5116b498cafedbb06a1b"

that'll be a hash digest of the contents of the original, unpruned tree. awesome if you're not pruning, but useless if you are - the digests won't match. we need another hash of the tree AFTER pruning for it to be useful:

  tree-digest = "0ba2aa4b3860c5fcc1219b31afaf30c1fcf3b97cbb0a5116b498cafedbb06a1b"
  pruned-digest = "1ed417a0bec57ffe988fae1cba8f3d49934fb873394d61844e0b3c96d69573fe"

if we're going to do that, though, then we need new ways of answering the question of "are imports/Gopkg.toml in sync with Gopkg.lock?" - because if the pruning options in Gopkg.toml change, then the pruned-digest in Gopkg.lock will need to change, as will the contents of vendor. so, we can't name it pruned-digest, as that obfuscates the pruning options that went into it.

the main reason i proposed pruning be controlled via a set of boolean flags is because it would make a comprehensible way of capturing what the applied pruning options are. say V represents vendor stripping, U unused packages, T test files, N non-go files. we might then represent the tree digests like this:

  [tree-digests]
    V = "0ba2aa4b3860c5fcc1219b31afaf30c1fcf3b97cbb0a5116b498cafedbb06a1b"
    VUTN = "1ed417a0bec57ffe988fae1cba8f3d49934fb873394d61844e0b3c96d69573fe"

note that i'm not saying this will be the final form - maybe we use full words, maybe we express as the integer value of the bitfield; maybe we only include the explicit pruned value, rather than always having just the V type. lots of possibilities, i'm just sketching it out. point is, simply by virtue of what key(s) are present, we know _which_ pruning options were set in Gopkg.toml, and thus can tell when we need to sync those up.

if we provide anything other than boolean controls here, then we'd likely have to just hash the prune options. i think that would probably work, but it'd be a hash key pointing to a hash value - utterly useless for understanding what's going on.

we can certainly think about adding more boolean options - for example, we could divide "non-go" files into strictly *.go files vs. all the other types of possible files (.c, cc, .h, etc...) files. or something else ยฏ\_(ใƒ„)_/ยฏ.

now, maybe those wouldn't cover your *.capnp example. if so, you'd have to just not set the N flag for that particular dependency. but, i think it's preferable for us to walk this line of comprehensibility and stick with flags, rather than giving more arbitrary control. any individual user can always choose to not care about dep's verification&fast paths, and implement their own, more fine-grained pruner. but minimizing vendor size is an exercise with seriously diminishing returns, and i'll happily trade some small efficiency losses there for greater comprehensibility.

@GlenDC It's worth it to reaffirm that pruning (beyond removing nested vendor dirs) is opt-in and not forced. So in the worst case scenario, you can just keep all options off and now files will be deleted.

I was thinking of adding an exception list to prune options, but that would complicate vendor verification in addition to the points @sdboyer mentioned.

@pxue The ignored config is for ignoring packages by names.

As Sam said, we still don't have a mechanism to take build flags into accounts while solving dependencies. So, currently, we just consider any import in all Go source code files as a required import.

In the context of pruning, I'm not sure it's a good idea to prune files based on build tags...

will the top-level [prune] section of Gopkg.toml support for listing file types to keep, ie. in my case I'd like to vendor all *.proto files

@pkieltyka no, for the reasons in my previous comment.

if I understand your comment above correctly then - if pruning settings can be set at a high-level and then configured per package in the toml file then that'll work just fine. Thanks and looking forward to this stuff, dropping the prune cli command and making the tree verifiable is a great direction

yep! the only thing is that the properties will be settable per whole project, not per package. so, more possibility for a bit of waste there, but I think still worth the trade-off.

On September 9, 2017 2:30:02 PM EDT, Peter Kieltyka notifications@github.com wrote:

if I understand your comment above correctly then - if pruning settings
can be set at a high-level and then configured per package in the toml
file then that'll work just fine. Thanks and looking forward to this
stuff, dropping the prune cli command and making the tree verifiable is
a great direction

--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/golang/dep/issues/944#issuecomment-328295392

A bit of waste? I pruned the vendor dir of one of our project yesterday and that commit was about 2 million deletes...

context matters. "a bit of a waste" is in reference to the idea that
_some_ users may need to turn off _some_ of the pruning flags on _some_
of their dependencies because there's something actually valuable that's
being pruned.

On 09/09 12:31, Glen De Cauwsemaecker wrote:

A bit of waste? I pruned the vendor dir of one our project yesterday and that commit was about 2 million deletes...

--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/golang/dep/issues/944#issuecomment-328298726

Yes you are right @sdboyer, sorry for that. If for whatever reason it would be not feasible to have such fine grained (exception) rules, I would also be happy with @pkieltyka's proposal of being able to disable it for a specific dependency (package). As that would allow me disable pruning the capnproto/go-capnproto2 package, while all the other dependencies could still be pruned.

I guess I reacted like this as you seemed to say in your previous comment "...the properties will be settable per whole project, not per package...", which makes me think that you mean that this cannot be set on a per dependency level? But perhaps there is a miss communication? As now you seem to say "...users may need to turn off _some_ of the pruning flags on _some_ of their dependencies...", which is what I think @pkieltyka is saying. But i'm not sure if you're just repeating him, or that you're saying that that is indeed what you wish to support. Because if it's the latter than I can live with that.

ahh! yes, sorry, so - "project" is the term we use to describe an individual unit of versioning & dependency - generally, the tree of packages contained in a single repo. so, yes, you set the rules on a per dependency basis. you just can't have different rules per package within a dependency.

On September 10, 2017 2:45:58 PM EDT, Glen De Cauwsemaecker notifications@github.com wrote:

Yes you are right @sdboyer, sorry for that. If for whatever reason it
would be not feasible to have such fine grained (exception) rules, I
would also be happy with @pkieltyka's proposal of being able to disable
it for a specific dependency (package). As that would allow me disable
pruning the capnproto/go-capnproto2 package, while all the other
dependencies could still be pruned.

I guess I reacted like this as you seemed to say in your previous
comment "...the properties will be settable per whole project, not per package...", which makes me think that you mean that this cannot be
set on a per dependency level? But perhaps there is a miss
communication? As now you seem to say "...users may need to turn off _some_ of the pruning flags on _some_ of their dependencies...", which is what I think @pkieltyka is saying.
But i'm not sure if you're just repeating him, or that you're saying
that that is indeed what you wish to support. Because if it's the
latter than I can live with that.

--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/golang/dep/issues/944#issuecomment-328363105

I have just hit a case where dep prune removes files from a vendored internal dependency that prevents go install . from working from within that vendored directory. We use this dependency in go imports and as a generated code build tool. This dependency is https://github.com/Workiva/frugal.

@markcampanelli-wf mmm, troubling! could you open a separate issue for that, with some more detail?

@sdboyer Just filed https://github.com/golang/dep/issues/1350

dep prune breaks installation of vendored package

What's the status of this? It looks like some relevant PRs have been merged, but I can't tell whether this is nearing completion or not. Does it seem close? :)

@vergenzt Almost there. After #1226 gets merged, I'll push one more commit to complete #1219 and then we should have the initial implementation of this feature in dep.

Any update? Both #1226 and #1219 have merged.

they've both been merged into a temporary release branch to coordinate change sets prior to release. we're now blocking on getting the release ready, which is mostly a docs thing - #1499. we don't want to put all this pruning stuff in until just before release, as it will make tip workflows uncomfortably at odds with existing releases.

gonna be moving to a regular release schedule after this one, to hopefully make these lagging releases less of a problem.

On January 9, 2018 11:29:29 AM PST, Jeff Grafton notifications@github.com wrote:

Any update? Both #1226 and #1219 have merged.

--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/golang/dep/issues/944#issuecomment-356388664

and we are finally, FINALLY ready.

It's merged in, closing this. Release coming presently...

/cc @fejta

Was this page helpful?
0 / 5 - 0 ratings