Cockroach: build: use `vendor` directory for dependencies

Created on 6 Feb 2016 · 37Comments · Source: cockroachdb/cockroach

Once we move to go 1.6 (and the vendor "experiment" becomes official) we should consider using the vendor system for our dependencies. glock (in its current version) can be problematic for people who only use a single GOPATH since it checks out our dependency revisions into what is supposed to be a shared space (so it may conflict with other projects that use a similar pinning strategy or just go get -u).

There is an unreleased branch of glock that supports the vendor directory (robfig/glock#28), but glide appears to be emerging as the de-facto standard and we should consider switching. It looks like it has an equivalent to glock's cmd feature (which is missing from other dependency tools).

C-enhancement

Source

bdarnell

All 37 comments

It looks like it has an equivalent to glock's cmd feature

Does it? https://github.com/Masterminds/glide/issues/172

tamird on 6 Feb 2016

There's this line in the readme: "In addition to fetching packages, Glide builds the packages with go install", which I took to mean that you could list a main package in a subpackages listing and it would be installed. (into $GOPATH/bin instead of the vendor subtree, so you could still get version conflicts if multiple projects share a GOPATH, but it's something). I haven't tested this, though, and based on that issue it sounds like I'm wrong.

bdarnell on 6 Feb 2016

As it turns out, it's possible to vendor standard library bits. From https://docs.google.com/document/d/1Bz5-UB7g2uPBdOx-rw5t9MxJwkfpx90cqG9AFL0JAYo/edit:

Update, January 2016: These rules do not apply to the “C” pseudo-package, which is processed earlier than normal import processing. They do, however, apply to standard library packages. If someone wants to vendor (and therefore hide the standard library version of) “math” or even “unsafe”, they can.

This means we can vendor net/http and locally patch https://github.com/golang/go/issues/14221 so that we can experiment with multiplexing RPC/PG on a single port during the Go 1.6 cycle.

tamird on 11 Feb 2016

Wow, I'm surprised this is actually possible. Positively. Though it's a bit
scary. This also opens up funny new venues like improved deadlock
detection, ...

On Wed, Feb 10, 2016 at 9:23 PM Tamir Duberstein [email protected]
wrote:

As it turns out, it's possible to vendor standard library bits. From
https://docs.google.com/document/d/1Bz5-UB7g2uPBdOx-rw5t9MxJwkfpx90cqG9AFL0JAYo/edit
:

Update, January 2016: These rules do not apply to the “C” pseudo-package,
which is processed earlier than normal import processing. They do, however,
apply to standard library packages. If someone wants to vendor (and
therefore hide the standard library version of) “math” or even “unsafe”,
they can.

This means we can vendor net/http and locally patch golang/go#14221
https://github.com/golang/go/issues/14221 so that we can experiment
with multiplexing RPC/PG on a single port during the Go 1.6 cycle.

—
Reply to this email directly or view it on GitHub
https://github.com/cockroachdb/cockroach/issues/4182#issuecomment-182674544
.

tbg on 11 Feb 2016

@mjibson Wondering what led to trying to use govender in #4620 vs glide or some other alternative.

petermattis on 25 Feb 2016

@petermattis I didn't try glide because @tamird reported having problems with it. I've used govendor on a few projects before and really liked it. I think it's better than godep because of how it arranges code and how easy it is to manipulate the vendor manifest and dependencies.

mjibson on 25 Feb 2016

glide was just horribly slow.

tamird on 25 Feb 2016

As @mjibson pointed out on #4620, something to consider as we evaluate these approaches, in addition to how we treat a potentially shared GOPATH, is that we're currently depending on the maintainers of our dependencies to continue to distribute them (and at the revisions we depend on).

Some vendoring approaches obviously address the distribution of our dependencies directly -- particularly the "check everything directly into /vendor" -- while we could probably adapt others too as well e.g. fetch from our own fork of every dep when checking out submodules or whatever.

dt on 23 Mar 2016

Another option, especially as we move towards a more formal release process, is to keep the main repo pulling from the upstream version control, but whenever we make a release we commit all of that release's $GOPATH/src (to a separate repo). This would ensure we never lose source that is necessary to reproduce a build without adding clutter to the main repo or making it more difficult to submit PRs upstream to dependencies.

bdarnell on 23 Mar 2016

Although I've dropped off the face of the earth from roachdb, I'd like to update this issue. One option is to use govendor to help manage deps. For instance you can run govendor migrate in the cockroach folder and it will switch the revision annotations over to the vendor meta-data file. You can then run govendor sync to pull down the stated revisions to the vendor folder. Subsequent calls to `govendor sync are cheap because the hash of the files are checked so no network operations are needed, unless the files are found to be out of sync.

Updating packages can be done with govendor fetch pkg@revision/version. Ignoring the source files in the vendor folder can be done by adding the ignore rule vendor/*/.

Lastly, govendor is flexible. By default the migration process from glock makes all deps "tree" deps, in that it doesn't prune unused packages. However you can remove them and add them again without that to just select used packages: govendor remove gopkg.in/inf.v0 && govendor fetch gopkg.in/inf.v0/...@3887ee99ecf07df5b447e9b00d9c0b2adaa9f3e4 only vendor used packages. You can also choose to ignore test files or other build tags. Commands can also be vendored just fine.

kardianos on 5 Apr 2016

(I'm willing to propose a PR if interested in a govendor based solution for the vendor folder.)

kardianos on 5 Apr 2016

Hi Daniel,

have you seen https://github.com/cockroachdb/cockroach/pull/4620?

I'm personally all for vendoring (but haven't looked at it much); maybe some of the comments (that led to eventually closing that PR) can be instructive. Essentially, it was not desired to check in a snapshot of all dependencies in our main repo. I'm also not terribly interested in introducing submodules, and we've talked about eliminating the dependency on upstream vendors' repositories. Maybe a single subtree/submodule'd vendor directory (which contains github.com/cockroachdb/vendor or something like that) which we maintain is the way to go, but that's likely not a standard tooling use case.

In any case, I'm glad you're interesting in improving the state of affairs here, and I hope some of the others involved in this thread will chime in as well.

tbg on 5 Apr 2016

We (@mjibson, @dt, @paperstreet) discussed having a github.com/cockroachdb/vendor repo that is a git submodule. This may be an acceptable solution to everyone. It gets us:

no repo bloat in our main repo
reproducible builds
still go get-able because if the vendor stuff isn't there Go will just use whatever is on disk or fetch the needed packages
only a small amount of git trickery, which can be automated in the Makefile

@bdarnell @tamird do either of you object to this solution (since you objected to the previous one)?

mjibson on 5 Apr 2016

Relevant thread with @tamird in it: https://github.com/golang/go/issues/7764
It's still go-getable, but only if the respective masters of the dependencies build (supposedly they mostly would).

tbg on 5 Apr 2016

@tschottdorf Yes I have. That was before govendor grew "sync" and "fetch" sub-commands. You can now use govendor without needing to checking dep sources.

Actually, govendor would work great with including a "vendor" repo. In part because the meta-data file "vendor.json" is stored _within_ the vendor folder, it enables this setup. You wouldn't need any git trickery at all. Just ensure you have both github.com/cockroachdb/vendor and github.com/cockroachdb/cockroach` cloned and up to date before you build, but other then that, the paths are all correct.

You could run cd $GOPATH/src/github.com/cockroachdb/cockroach && govendor migrate && govendor sync && mv vendor ../ && cd ../vendor && git init && git add .

You would probably want to finesse the actual deps in the vendor folder, but that would be very workable.

kardianos on 5 Apr 2016

only a small amount of git trickery, which can be automated in the Makefile

or go generate?

@tschottdorf Yes I have. That was before govendor grew "sync" and "fetch" sub-commands. You can now use govendor without needing to checking dep sources.

Excellent, glad to hear things have improved.

tbg on 5 Apr 2016

I don't think a vendor submodule addresses my objections; it still makes it difficult to hack on packages in that submodule (we used this approach at viewfinder). The "sync" and "fetch" commands sound like a better approach for me.

I think that for reproducible builds, we should be producing an archive of all the source used for each of our releases, including the entire vendor tree (perhaps in another git repo). This keeps the main repo optimized for development and ensures that we have at least coarse-grained reproducibility. We don't get complete reproducibility of historical builds for use with tools like bisect, but I'm OK with that (the old builds are still reproducible as long as no npm-style disasters happen upstream).

bdarnell on 5 Apr 2016

Looking at it again, we don't have to use a submodule at all. We can keep the vendor deps at github.com/cockroachdb/vendor and Go will pick them up. But this is beside the point.

This may need to be discussed offline, but we may just fundamentally disagree here. One of the features you want is to be able to easily hack on packages. I'm not sure of a reasonable way to do that without those packages being in the $GOPATH and not in our vendor directory. This means that they also have to be at a specific revision, which is one of the specific things I'm trying to remove. I think all of us want to remove the glock git checkout $SHA stuff because it messes up our $GOPATH for other Go projects. Someone in the office said today they have a second $GOPATH just for cockroach because of this. I'm also considering doing that, and it stinks.

Is there another solution here? I've now tried a few times and am out of ideas.

mjibson on 6 Apr 2016

Looking at it again, we don't have to use a submodule at all. We can keep the vendor deps at github.com/cockroachdb/vendor and Go will pick them up. But this is beside the point.

But then we'd need to do something to tie each version of the main repo to a version of the vendor repo, which is basically rolling submodules by hand.

Yeah, there's a philosophical difference: i would like per-project GOPATH to become the norm just like e.g. virtualenv in python; IMHO all this effort to make different projects coexist in a single GOPATH is misdirected. But I concede the point; we shouldn't be stepping on other things in the GOPATH (this includes both git checkout on other packages and installing our versions into GOPATH/bin).

One of the features you want is to be able to easily hack on packages. I'm not sure of a reasonable way to do that without those packages being in the $GOPATH and not in our vendor directory.

That's a good point, and may render all my objections moot. I'm not very familiar with the operation of the tools when a vendor directory is used, but it looks like what I want to do won't work. @kardianos, what's the workflow for using a local modification to a dependency? Do I hack on it in GOPATH and then repeatedly govendor add it for each change? Can I suppress the version of a particular package in the vendor directory and use the GOPATH version instead?

If we use govendor sync instead of checking in a vendor tree somewhere, then the sync tool doesn't have to write things into vendor: it could (optionally) write them into GOPATH for people like me who have adapted to this workflow and prefer it to the downsides of the vendor directory.

bdarnell on 6 Apr 2016

Here's one more vendoring approach, centered around submodules: https://github.com/kovetskiy/manul

tbg on 27 Jul 2016

While figuring out how to distribute /vendor is probably the most contentious question (submodule(s), just a manifest, etc), another potentially significant issue, which affects switching to any form of vendoring that places vendor in a checked out repo, is that tools and linters that recursively currently traverse over all directories will start traversing over vendored dependencies as well.

The most obvious of these is go itself -- the go team has unfortunately decided that the expansion of ./... will _not_ exclude vendor. This is unfortunate since we frequently want to test/lint/vet/super-extra-plus-lint packages we write, and usually do so my running blah ./.... The workaround suggested by the go team is literally to run blahgo list ./...|grep -v vendor``. This has some drawbacks: it complicates many linters and tool invocations, plus, in situations where directory traversal is slow (looking at you, docker), this is _much_ (15s+) slower.

_Any vendoring solution_ will need to figure out how linters and tools that traverse over the packages we write can be excluded from inspecting /vendor. We can use the above just-in-time filtering with combinations of go list or find and filtering of vendor or potentially more ahead-of-time filtering -- a text file containing the list of non-vendor packages, or even reorganizing all cockroach-authored packages under a single subdirectory of the repo, which could be safely traversed recursively without worrying about vendor at all.

Checking out/building an entirely separate vendor directory that is a _peer_ of the repo checkout (e.g. GOPATH/src/github.com/cockroachdb/vendor) would potentially avoid this problem. Unfortunately, unlike a submodule _in the repo_, such a directory would not be automatically fetched by go get which would thus assume our dependencies were unmet and fetch them to GOPATH (i.e. go get builds would not enjoy benefits of vendoring) and we'd still need some form of version mgmt on that dep (glock?).

dt on 6 Sep 2016

(revisiting this now that master/develop split is over)

To recap:

various tools can be used to make/maintain a vendor directory that has all our deps.
a git submodule of a snapshot of all our deps can keep them out of the repo history while owning distribution / reproducibility (i.e. avoiding a leftpad).
glock can still be used just to install executable tools from their vendored sources, pending some other approach to managing binary tools that can be discussed separately.

A big open question / challenge though is linters and other tools that want to traverse all the code _we author_, but need to avoid traversing vendor. When vendor lives in the repo root, as a peer of packages we author, this is messy.

Go's suggested workaround here is to, in the shell invocation, use find or go list to list files/directories/packages then grep them, e.g. go testgo list ./... | grep -v /vendor/``, but this can be clumsy and/or slow.

This mess can be sidestepped by avoiding having the vendor directory share the longest-common directory prefix of all cockroach code, either by moving vendor or by moving our code:

Moving vendor up a level, to be a peer of the repo, would get the property that a single directory (the repo) contains all of, and only, the cockroach-authored packages, but the now external vendor directory would once again need manual version mgmt (e.g. via glock), thus losing vendoring benefits for go get and, critically, preventing having multiple _self-contained_ checkouts.
Moving all authored code into a common subdirectory of the repo root (e.g. src) would allow self-contained checkouts that have both vendor and single directory/package-prefix for authored code, but the extra layer of directory / pkg name is strange/un-Go-like.

dt on 3 Oct 2016

Just in case someone else is going to try this, I was trying to be clever:

mkdir src
cd src
ln -s ../storage ./storage
cd ..
go list ./src/... # finds nothing, unfortunately

tbg on 3 Oct 2016

How about setting up our linters so that they operate on each package individually? We almost never create new top-level packages. We could even automate this with something a la find . -type d -a -maxdepth 0 -a -not -name 'vendor' -print0 | xargs -0 go vet.

tbg on 3 Oct 2016

Some linters can't do their work without operating on all the packages at
once (e.g. unused).

On Mon, Oct 3, 2016 at 12:12 PM, Tobias Schottdorf <[email protected]

wrote:

How about setting up our linters so that they operate on each package
individually? We almost never create new top-level packages. We could even
automate this with something a la find . -type d -a -maxdepth 0 -a -not
-name 'vendor' -print0 | xargs -0 go vet.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/cockroachdb/cockroach/issues/4182#issuecomment-251150015,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABdsPJcJmXUK-cGw3ztboq3YuC0dF4zwks5qwSlRgaJpZM4HU50n
.

tamird on 3 Oct 2016

Yeah, behaviors vary across linters, some (like unused) want to see all packages at once, others can't handle more than one argument, many use go package specs, but others use directories -- my previous proposed PRs have attempted to individually do whatever it took to make each happy, using a find like yours or a loop or a go list | grep as needed. So, while its certainly possible, doing so was messy (and frequently meant traversing the file system multiple times, which is quite slow inside docker), and seems like it will impose a continuing cost on adding/maintaining linters, so it seems worth exploring alternatives before going that route again.

dt on 3 Oct 2016

Next silly idea: we check in a file .golist which contains the up-to-date output of .golist, and generate that in go generate. Perhaps that mitigates the overhead of go list ./... enough (I also don't know why it would be slow, though I certainly remember that happening in docker images - if we found out why, perhaps this would be solvable)?

tbg on 3 Oct 2016

yeah, @tamird and I discussed that (maintaining it either with a CI linter, or in the builder before starting docker), and if we do go the "make the linters happy" with approach, we'll probably do something like that, but since not all our linters can even take more than one arg, it'll still add some additional overhead to adding and maintaining linters vs just having one dir we can say is the one to lint.

dt on 3 Oct 2016

As for the root cause: "docker I/O is slow".

On Mon, Oct 3, 2016 at 1:32 PM, David Taylor [email protected]
wrote:

yeah, @tamird https://github.com/tamird and I discussed that
(maintaining it either with a CI linter, or in the builder before starting
docker), and if we do go the "make the linters happy" with approach, we'll
probably do something like that, but since not all our linters can even
take more than one arg, it'll still add some additional overhead to adding
and maintaining linters vs just having one dir we can say is the one to
lint.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/cockroachdb/cockroach/issues/4182#issuecomment-251170811,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABdsPFFbqwgwrUCmVCyLHlaXy6QBdloyks5qwTw4gaJpZM4HU50n
.

tamird on 3 Oct 2016

One more silly one: Can we write a fast replacement for go list ./...? I'm sure find . -type d isn't slow even in docker and it should be easy enough to generate our list of packages from that with some mild poking.

tbg on 3 Oct 2016

I'm sure find . -type d isn't slow even in docker

You'd think that, but docker _never_ ceases to amaze:

$ time find . -type d | wc -l
    6179

real    0m0.193s
user    0m0.029s
sys 0m0.165s
$ ./build/builder.sh bash -c "time find . -type d | wc -l"
6179

real    0m37.028s
user    0m0.140s
sys 0m1.510s

dt on 3 Oct 2016

😕2

As far as directory locations and their tool friendliness goes, looking at the options above, I guess the important questions are:

How important is _self-contained_ checkouts (i.e. having more than one, at different revisions that use different deps) / go-gettable?
How gross would nesting all cockroach-authored packages under some common subdir of the repo root be?

If the answers to those questions are "very important" and "too gross", then by elimination we're down to figuring out how to make all the linters happy, and performant (e.g. the denormalized go list output, etc). However if we can compromise on either of those, we can probably dodge most of that work, and the ongoing tax on maintaining/adding new linters it implies.

dt on 4 Oct 2016

How important is self-contained checkouts (i.e. having more than one, at different revisions that use different deps) / go-gettable?

I personally don't care about multiple checkouts, but being go-gettable would be good and having the dependencies closely pinned is too.

How gross would nesting all cockroach-authored packages under some common subdir of the repo root be?

It's kind of gross, but that seems to be the only sane choice given the insane choice of including vendor in ./.... I wouldn't be surprised if larger projects gravitated towards putting none of their code in the root of the repo because of that decision.

With Cockroach, we have the privilege that almost all code is internal, so in theory we could move everything into internal (except for the cli commands, which are in ./cmd). I don't know if that would hide everything from godocs (in which we probably want s/internal/src).

Between the peer-repo, the linter tax and the subdirectory, I think the subdirectory is by far the best.

tbg on 4 Oct 2016

I'll whip up a vendoring PR that nests all our packages under something to see how it looks in practice... name bikeshed for this subdir:src ?

dt on 4 Oct 2016

👍1

Moving everything into a subdir sounds reasonable to me. The name src is fine, but there's also some precedent in other go projects (etcd, docker) for a pkg directory.

-1 on using internal for this; even though nothing's really designed for external use I don't think we want to make this a compiler-enforced restriction.

bdarnell on 9 Oct 2016

kubernetes also uses the name pkg