Go: cmd/go: add GOMODCACHE

Created on 25 Sep 2019  ·  39 Comments  ·  Source: golang/go

Summary

Add GOMODCACHE to control where the module download cache lives. Its default can continue to be GOPATH[0]/pkg/mod, and the variable would be very similar and consistent with GOCACHE.

Description

The module download cache has lived in GOPATH[0]/pkg/mod/ since it first appeared. It's understandable why it doesn't live under GOCACHE, where the build cache is located; builds are generally fast and reliable if one has the source, but downloading a module from the internet isn't nearly as reliable.

I also understand why it was put under GOPATH; until recently, it was the only persistent directory that Go made use of. That only changed in the last release, with the addition of os.UserConfigDir()+"/go/env" for go env -w.

However, there's no way to configure where the module download cache is located. For example, this is useful in CI environments to place the build and module download caches somewhere that's persisted between builds.

The only way to store the download cache elsewhere is to move GOPATH entirely. This has several disadvantages:

1) GOPATH contains much more. For many users, it still contains code. For almost everyone, it also contains the installed binaries, unless they've set GOBIN. It's too big of a knob to just change the location of the module download cache.

2) Many environments explicitly set GOPATH, such as the golang:1.13 image, meaning that we can't simply go env -w GOPATH=/custom/path just like we could with GOCACHE. This makes the module download cache harder to deal with than the build cache, for no apparent reason.

3) GOPATH's future is uncertain; it might contain more in the future, or it might go away entirely. Relying on it to set the module download cache location is not a good long-term plan.

This idea first came up in https://github.com/golang/go/issues/31283, which was closed by its author. I left a comment there a while ago, but thought it would be better to open a new proposal.

GoCommand NeedsFix Proposal Proposal-Accepted modules

All 39 comments

One unfortunate side of this proposal is that GOCACHE seems to mean "directory holding all Go cached data", while GOMODCACHE would be separate. If we had a time machine, I'd argue that GOCACHE should be called GOBUILDCACHE instead.

An alternative idea is to eventually re-define GOPATH as a directory to store persistent data that doesn't belong in the short-lived GOCACHE nor in <config>/go/env. That would solve points 1 and 3, but not 2.

An alternative partial solution to point 2 is to stop explicitly declaring GOPATH in the official Docker images. That might break users who haven't switched over to $(go env GOPATH) to handle the default properly, but I hope that eventually the env var can be removed anyway.

Thanks for making this proposal. I briefly recall conversations in the past that we might want to move the module cache out of $GOPATH/pkg eventually, but it's helpful to have something more actionable open.

I would like the module cache not to be in GOPATH in the future. If it's viable, I think a better outcome is if we can do this without adding configuration. That may not be viable.

I think we have multiple options if moving the module download cache is an option. For example:

  • Put it under GOCACHE, since it's a cache after all - just an expensive one to rebuild. One could end up with a directory structure like GOCACHE/build and GOCACHE/mod, to separate the cheap build cache.
  • Put it alongside go-build in os.UserCacheDir, for example <go-mod>. Though for consistency, you'd probably still require GOMODCACHE.

Both of these options put the expensive and cheap caches in the same place by default, though. If we want to properly separate the module cache from the build cache, here's another idea:

  • Put it under os.UserDataDir, which in an ideal world would be a suitable and portable place to put non-config, non-temporary data.

CC @jayconrod

Adding a GOMODCACHE variable to control the cache location makes total sense to me.

Moving the default location of the module cache seems more disruptive. Should we have some logic that checks if $GOPATH[0]/pkg/mod exists, then if not, some directory in os.UserCacheDir()?
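The check asked about here could look roughly like the sketch below. It only illustrates the question; it is not behavior of the go command. The function name pickModCache and the go-mod directory name are assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// pickModCache prefers an existing GOPATH[0]/pkg/mod and otherwise falls
// back to a directory under the user cache dir. The "go-mod" name is a
// hypothetical choice for this sketch.
func pickModCache(gopath0, cacheDir string, exists func(string) bool) string {
	legacy := filepath.Join(gopath0, "pkg", "mod")
	if gopath0 != "" && exists(legacy) {
		return legacy
	}
	return filepath.Join(cacheDir, "go-mod")
}

// dirExists reports whether p is an existing directory.
func dirExists(p string) bool {
	fi, err := os.Stat(p)
	return err == nil && fi.IsDir()
}

func main() {
	cacheDir, err := os.UserCacheDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Use only the first GOPATH entry, as the module cache does today.
	gopath0 := ""
	if list := filepath.SplitList(os.Getenv("GOPATH")); len(list) > 0 {
		gopath0 = list[0]
	}
	fmt.Println(pickModCache(gopath0, cacheDir, dirExists))
}
```

Passing the existence check in as a function keeps the decision logic separate from the filesystem, so both branches are easy to exercise.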

I think if we're happy with adding an env var, we shouldn't worry about moving the default location right now. I only replied to that point to show some thoughts that might be helpful in the long term, if we do want to move it at some point.

I found myself wanting a GOMODCACHE variable recently for another use case:
If for example you have a local machine running an IDE talking to a remote server where your source tree lives mounted over NFS, Go modules are going to cause a bad time due to their use of flock, which NFS does not support.

go: failed to lock file at /Volumes/nfs/go/pkg/mod/cache/lock

Of course there is a workaround by creating a dummy GOPATH tree on the local machine and configuring the IDE to prepend that GOPATH ahead of the real one, so that it caches modules locally. That's a rather counter-intuitive thing to do though, and a direct environment variable for controlling the mod directory would be a nice feature.
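For reference, the workaround described above amounts to putting a local directory first in the GOPATH list, because the module cache lives under the first entry. A minimal sketch, where prependGOPATH and the /tmp/local-gopath path are names assumed for illustration:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// prependGOPATH returns a GOPATH list with local as the first entry, so
// that local/pkg/mod becomes the module cache location.
func prependGOPATH(local, existing string) string {
	if existing == "" {
		return local
	}
	return local + string(filepath.ListSeparator) + existing
}

func main() {
	// /tmp/local-gopath is a hypothetical directory on a local disk.
	fmt.Println("GOPATH=" + prependGOPATH("/tmp/local-gopath", os.Getenv("GOPATH")))
}
```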

Please don't move the default location of the download cache into GOCACHE.
Quoting https://groups.google.com/forum/#!msg/golang-dev/RjSj4bGSmsw/KMHhU8fmAwAJ:

I understand why this would seem surprising at first, but cached compilation results and cached downloaded source code are a bit different.

The build cache ($GOCACHE, defaulting to $HOME/.cache/go-build) is for storing recent compilation results, so that if you need to do that exact compilation again, you can just reuse the file. The build cache holds entries that are like "if you run this exact compiler on these exact inputs, this is the output you'd get." If the answer is not in the cache, your build uses a little more CPU to run the compiler instead of reusing the output. But you are guaranteed to be able to run the compiler instead, since you have the exact inputs and the compiler binary (or else you couldn't even look up the answer in the cache).

The module cache ($GOPATH/src/mod, defaulting to $HOME/go/src/mod) is for storing downloaded source code, so that every build does not redownload the same code and does not require the network or the original code to be available. The module cache holds entries that are like "if you need to download [email protected], here are the files you'd get." If the answer is not in the cache, you have to go out to the network. Maybe you don't have a network right now. Maybe the code has been deleted. It's not anywhere near guaranteed that you can redownload the sources and also get the same result. Hopefully you can, but it's not an absolute certainty like for the build cache. (The go.sum file will detect if you get a different answer on re-download, but knowing you got the wrong bits doesn't help you make progress on actually building your code. Also these paths end up in file-line information in binaries, so they show up in stack traces, and the like and feed into tools like text editors or debuggers that don't necessarily know how to trigger the right cache refresh.)

I expect there are cron jobs or other tools that clean $HOME/.cache periodically. If part of the build cache got deleted, it would be no big deal, so it's fine to store the build cache there. But if downloaded source code got deleted unasked, I think that would potentially be quite surprising and problematic in various ways. That's why we store the source code in $GOPATH/src/mod, to keep it away from more expendable data.

Adding GOMODCACHE seems fine to me, and @jayconrod said that too. @bcmills?

Just to echo @rsc and my earlier comment: adding GOMODCACHE seems fine. Let's not change the default location though.

Agreed: a GOMODCACHE variable seems useful enough.

The non-parallel naming with GOCACHE is unfortunate, but probably not unfortunate enough to warrant anything as drastic as renaming GOCACHE or relocating the default GOMODCACHE.

As a quick drive-by thought, if we want to fix the inconsistency with GOCACHE, we could deprecate it in favor of something like GOBUILDCACHE, but keep accepting GOCACHE as a fallback. That still wouldn't be perfect though, as it's not just the build cache, but also the test cache. So I agree that it's not worth the effort, and GOCACHE is good enough as it is.

Another value of GOMODCACHE is that it can be changed per project, for example to keep one cache for volatile projects and another for persistent projects.
With GOMODCACHE=./cache it could even replace ./vendor!

Why not call it GOMODPATH to avoid the unfortunate non-parallel naming? (It is not a cache like build cache or test cache, but more of a source code path like $GOPATH/src). @mvdan

@h12w because everywhere in the docs it's called module cache or module download cache. I think "cache" is an okay name, and the cost of changing it is too high - confusing a large portion of Go developers.

A module cache directory/path is a module directory/path. The word cache could be omitted because there are no "non-cache" directories for modules. Looking through go env, there are also GOTMPDIR and GOTOOLDIR, so IMHO, both GOMODDIR and GOMODPATH are okay names too. @mvdan

Since we can't use GOCACHE, I think GOMODCACHE is our best option, because we already have some things that use MODCACHE, such as go clean -modcache. I think DIR is weird (at least we have never used it) and one GOPATH is enough, please don't bring more PATHs. 😂

What @aofei said. We don't do go mod clean -modpath or go mod clean -modcachepath.

Change https://golang.org/cl/219538 mentions this issue: cmd/go: allow configuring module cache directory with GOMODCACHE

Based on @jayconrod and @bcmills's comments on the codereview, (and offline discussions I had with them) it looks like there are two unresolved issues we need to figure out before we can move forward. I'll let them correct me if I have a misunderstanding:

First, this proposal is to create a GOMODCACHE set to $GOPATH/pkg/mod by default. This leaves open the question of where the sumdb directory goes. Currently, it lives in $GOPATH/pkg/sumdb. I think we wouldn't want to put the sumdb directory in $GOMODCACHE/../sumdb: it would be surprising behavior to create a new directory side-by-side with the user-supplied directory name. Two alternatives would be (a) to start putting the sumdb in $GOMODCACHE/sumdb, and migrate the old sumdb at $GOPATH/pkg/sumdb over to the new location, or (b) to set GOMODCACHE to $GOPATH/pkg by default, and put the module cache in $GOMODCACHE/mod and the sumdb in $GOMODCACHE/sumdb. My preference is for the first option because the second option is a bit clunky.

The second issue isn't directly addressed in this proposal, but in our discussion, it seemed worth bringing up. Do we want to specify a relationship between the GOMODCACHE variable, and what we could set GOPROXY to be to proxy from the module cache? According to the proposal we have, it would be GOPROXY=file://$GOMODCACHE/cache/download.
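To make that relationship concrete, here is a minimal sketch that derives the GOPROXY value from a module cache directory. The helper name fileProxyURL is an assumption for this example, and the sketch assumes a Unix-style absolute path; Windows drive paths would need different handling.

```go
package main

import (
	"fmt"
	"os"
	"path"
	"path/filepath"
)

// fileProxyURL builds the file:// URL of the download cache inside the
// given module cache directory. modCache must be an absolute Unix-style
// path for the result to be a valid URL.
func fileProxyURL(modCache string) string {
	return "file://" + path.Join(filepath.ToSlash(modCache), "cache", "download")
}

func main() {
	modCache := os.Getenv("GOMODCACHE")
	if modCache == "" {
		fmt.Fprintln(os.Stderr, "GOMODCACHE is not set")
		os.Exit(1)
	}
	fmt.Println("GOPROXY=" + fileProxyURL(modCache))
}
```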

My preference is for the first option because the second option is a bit clunky.

I agree - as long as the migration of the old sumdb dirs is smooth.

Do we want to specify a relationship between the GOMODCACHE variable, and what we could set GOPROXY to be to proxy from the module cache?

It seems to me like that could be decided as a separate issue, as long as we don't close that door in this proposal.

Okay, @bcmills and @jayconrod, what do y'all think?

I did a bit of digging, and it looks like the relevant directories are currently:

  • GOPATH/pkg/mod contains extracted modules.
  • GOPATH/pkg/mod/cache contains the download directory and one lock file. Seems like kind of a waste of a level of directory...
  • GOPATH/pkg/mod/cache/download contains, essentially, the “local module proxy“: cached .zip, .mod, and .info files.
  • GOPATH/pkg/mod/cache/download/sumdb/sum.golang.org contains cached tile and lookup information from the checksum database.
  • GOPATH/pkg/sumdb/sum.golang.org/latest contains the latest signed tree head from the checksum server.
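The layout listed above can be written down as a small sketch. The layout type and the describe function are names invented for this example; only the paths come from the list.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// layout names the directories and files from the list above.
type layout struct {
	Extracted  string // extracted modules
	Cache      string // download directory and lock file
	Download   string // cached .zip, .mod, and .info files
	SumDBTiles string // cached tiles and lookups from the checksum database
	LatestSTH  string // latest signed tree head, outside pkg/mod
}

// describe returns the layout rooted at the first GOPATH entry.
func describe(gopath0 string) layout {
	mod := filepath.Join(gopath0, "pkg", "mod")
	download := filepath.Join(mod, "cache", "download")
	return layout{
		Extracted:  mod,
		Cache:      filepath.Join(mod, "cache"),
		Download:   download,
		SumDBTiles: filepath.Join(download, "sumdb", "sum.golang.org"),
		LatestSTH:  filepath.Join(gopath0, "pkg", "sumdb", "sum.golang.org", "latest"),
	}
}

func main() {
	// /home/u/go is a hypothetical GOPATH entry.
	fmt.Printf("%+v\n", describe("/home/u/go"))
}
```

Only LatestSTH sits outside pkg/mod, which is the entry the rest of this comment is concerned with.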

If not for that last entry, I think we would pretty clearly want to slot in GOMODCACHE as GOPATH/pkg/mod, and not worry too much about the remaining nesting. However, it seems pretty unfortunate to leave that signed tree head behind.

That path comes from here:
https://github.com/golang/go/blob/1fb7d5472e8f46faaa034fe6e16ca66a1e7c766f/src/cmd/go/internal/modfetch/sumdb.go#L194-L202

It is mentioned in this part of the checksum database design doc:
https://go.googlesource.com/proposal/+/master/design/25530-sumdb.md#command-client
Unfortunately, the design doc doesn't explain _why_ that file is where it is, so I think we'll need to follow up with @FiloSottile and @rsc to figure out whether it can be safely relocated.

I suspect that the reason it is not stored within GOPATH/pkg/mod is so that the signed tree head is not removed by go clean -modcache: that way, the signed tree head never regresses. However, I do not know how important that property is to the overall design.

I suspect that the reason it is not stored within GOPATH/pkg/mod is so that the signed tree head is not removed by go clean -modcache: that way, the signed tree head never regresses. However, I do not know how important that property is to the overall design.

That's correct. It's very important because it's what guarantees that the log is indeed append-only, and it's what lets us worry less about go.sums outside of a module, for example. Every time that's deleted there is a window for the tree to get forked.

I'd prefer it didn't end up in a cache directory. If we are clearing pkg/, maybe UserConfigDir?

I'd prefer it didn't end up in a cache directory. If we are clearing pkg/, maybe UserConfigDir?

UserConfigDir seems like a fine choice. The signed-tree-head file is small, and we already store a go/env file there.

However, note that UserConfigDir may fail if (for example) neither HOME nor XDG_CONFIG_HOME is set to a non-empty value. (We ran into trouble with that when we started requiring a build cache: see #29267.)

I think it makes sense to move the STH out of GOPATH. We should avoid requiring a GOPATH directory in module mode. Since the STH needs to be somewhat durable, it should not be part of any cache (GOMODCACHE, GOCACHE, UserCacheDir). UserConfigDir is the only other place that makes sense.

Some open questions though:

  • If $GOPATH/pkg/mod/sumdb already exists, should we keep it there or actually move it? What if it also exists in UserConfigDir, for example, because both 1.14 and 1.15 are used on a system?
  • What if there is no UserConfigDir (HOME and other environment variables are not set), but GOPATH is set (for example, in a Docker container)? Is that an error, or should we continue using $GOPATH/pkg/mod/sumdb?
  • What if neither GOPATH nor HOME is set? I think this is currently an error because there's nowhere to put the module cache.

IMO, we should move $GOPATH/pkg/mod/sumdb to somewhere in UserConfigDir if that directory is defined, but we should continue to use $GOPATH/pkg/mod/sumdb if UserConfigDir is not defined.
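The fallback suggested in this comment could be sketched as follows. This only illustrates the suggestion, not what the go command does. The function name sumdbDir and the go/sumdb subdirectory under the config dir are assumptions, and the GOPATH branch uses the GOPATH/pkg/sumdb location from the directory listing earlier in the thread.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// errNoLocation is returned when neither a config dir nor GOPATH is usable.
var errNoLocation = errors.New("no user config dir and no GOPATH")

// sumdbDir picks where the signed tree head would live: under the user
// config dir when one is available, otherwise under the first GOPATH entry.
// configDir and configErr are the results of os.UserConfigDir.
func sumdbDir(configDir string, configErr error, gopath0 string) (string, error) {
	if configErr == nil && configDir != "" {
		// "go/sumdb" is a hypothetical location for this sketch.
		return filepath.Join(configDir, "go", "sumdb"), nil
	}
	if gopath0 != "" {
		return filepath.Join(gopath0, "pkg", "sumdb"), nil
	}
	return "", errNoLocation
}

func main() {
	configDir, configErr := os.UserConfigDir()
	gopath0 := ""
	if list := filepath.SplitList(os.Getenv("GOPATH")); len(list) > 0 {
		gopath0 = list[0]
	}
	dir, err := sumdbDir(configDir, configErr, gopath0)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(dir)
}
```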

In Alpine Linux we're thinking about how to cache source archives.
For build servers it is reasonable to set up a GOPROXY, but for maintainer dev machines it is not.

Downloaded archives go into $GOPATH/pkg/mod/cache/download and currently there is no way to alter this location, except to alter GOPATH, which is suboptimal because:

  1. it contains installed binaries, sumdb and unpacked sources.
  2. files in pkg/mod are created unwritable which is great for normal go development, but in APKBUILDs we want to set -modcacherw (available in go1.14) to avoid chmod-clean
  3. changes in unpacked sources are undetected, unless you run "go mod verify"
  4. APKBUILDs can run stuff like "go clean -modcache" that wipes everything in pkg/mod.

The obvious approach for us is to keep the archive cache outside of GOPATH and outside of the GOMODCACHE proposed here, maybe with yet another environment variable.
Is this reasonable?

GOPATH/pkg/mod/cache contains the download directory and one lock file. Seems like kind of a waste of a level of directory...

There is also cache/vcs, which is created when GOPROXY=direct

@kaey What do you mean by source archives? Like the zip files that are currently in $GOPATH/pkg/mod/cache/download? Could you say more about how you're using these? For example, do they need to stay in the module cache, or are they moved and distributed somewhere else? You mentioned an archive cache. Is that intended to be used as a module proxy with a file:// URL?

What do you mean by source archives? Like the zip files that are currently in $GOPATH/pkg/mod/cache/download?

Yeah those ones.

Could you say more about how you're using these?

We build packages for a Linux distro and want to avoid redownloading sources for every rebuild. Any method will do, actually, and for servers (builders and CI) setting GOPROXY seems optimal to me. Currently, altering GOPATH to point at a shared location is usable (but see my previous message), but usually it's set inside a temporary build dir, which is cleaned when the build is done.

are they moved and distributed somewhere else

GOPROXY seems optimal for this use case, so no; it just has to be some local shared space that can be reused across different builds by go build, either directly via GOPROXY=file://... or some other way

Is that intended to be used as a module proxy with a file:// URL

Not really, but it is a possibility. If we had a way to run something like go mod download during our fetch() phase that would put zips in a specified directory without unpacking them, then we'd be able to set GOPROXY=file:// during build() and test() phases.

@kaey Thanks for explaining.

In general, I'd very much hesitate to add an option that would split the module cache into more pieces. Doing so adds complexity to the command line interface (more environment variables or flags for users to worry about). There's also complexity caused by different versions of Go interacting with each other (old versions won't respect the new options and will create redundant data). I think we should only consider this if there's a problem that affects large number of users that can't be solved another way.

It sounds like a local proxy would be a good solution to the problems you describe. It could be a file server that sets GOMODCACHE (to serve out of its own directory) and serves files, running go mod download when something is missing. Alternatively, there are proxies like Athens that can run out of a Docker container and can use shared storage.

You mentioned earlier a proxy could be used on CI but not on dev machines. Could you say more about why that is? Is it just the overhead of setting it up, or is there a technical barrier?

There's also complexity caused by different versions of Go interacting with each other (old versions won't respect the new options and will create redundant data)

This proposal already creates such an issue.

I think we should only consider this if there's a problem that affects large number of users that can't be solved another way.

It is supposed to help all users building packages in a clean environment (distribution packages are a common example)

running go mod download when something is missing

This is actually what I did originally

# SRCDEST contains the source cache
# srcdir is a temporary directory where sources are unpacked
export GOPATH="$srcdir/go"
fetch() {
    mkdir -p "$SRCDEST/go" "$srcdir/go/pkg/mod/cache"
    # Point the download cache at the persistent source cache
    ln -nfs "$SRCDEST/go" "$srcdir/go/pkg/mod/cache/download"
    go mod download
    cd "$srcdir/go/pkg/mod" || return 1
    # Make everything except cache/ writable so the build dir can be cleaned
    chmod -R u+w $(ls | grep -v '^cache$') # Though this can now be replaced with -modcacherw
}

While possible, games with symlinks are always annoying. We'd need to add some way to mark a package as Go, so that the symlink is created automatically.

You mentioned earlier a proxy could be used on CI but not on dev machines. Could you say more about why that is? Is it just the overhead of setting it up, or is there a technical barrier?

Alpine packages are built using alpine-sdk. It consists of compilers, binutils, helper utilities (like signing) and abuild tool written in shell that combines all that together. Devs download "ports" repository (https://git.alpinelinux.org/aports/tree/main), cd into directory with package they want to build and run abuild -r, which results in a repo and a built package in $REPODEST. This directory can be then served over https directly to users.
That's it; there is pretty much no configuration, no complex runtimes and no long-running servers.

Alternatively, there are proxies like Athens that can run out of a Docker container and can use shared storage.

That would require configuring such a server and managing it. Most maintainers are not Go experts and do not want to understand all of the language's quirks. Running a goproxy server for the duration of abuild seems overcomplicated.

I did have another idea of supporting GOPROXY=cmd:///path/to/binary protocol. In that case go would execute provided binary with url path as first argument and read response from stdout (or error from stderr). This doesn't require adding new envvars or keeping long running servers and allows for GOMODCACHE directory layout to be an implementation detail.

Running goproxy server for the duration of abuild seems overcomplicated.

Note that you can use a file:// URL as a valid GOPROXY. There is no need to run an explicit server if you already have all of the files cached.

But this is really getting off into a tangent. #35922 is probably a more relevant venue for this discussion.

There is no need to run an explicit server if you already have all of the files cached

I do not, and I already mentioned above what it takes to use go mod download; see the fetch() function example above.

But this is really getting off into a tangent. #35922 is probably a more relevant venue for this discussion.

My original proposal is still about expanding scope of GOMODCACHE, I only provide reasoning and potential alternative approaches.

I talked to @bcmills, @jayconrod, and @matloob (the team working on the go command) two weeks ago about this, but I forgot to summarize here. I think we all agree about the following.

We should add GOMODCACHE, defaulting to GOPATH/pkg/mod. Setting it would redirect _all_ current accesses in GOPATH/pkg/mod to GOMODCACHE instead.
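As a rough sketch of the lookup described here: an explicit GOMODCACHE wins, otherwise the first GOPATH entry is used. The function name modCacheDir is an assumption for this example. The real go command also falls back to a default GOPATH when the variable is unset, which this sketch leaves out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// modCacheDir returns the module cache directory for the given settings.
// An explicit gomodcache wins; otherwise the first GOPATH entry is used.
// It returns "" when neither is set (a simplification of this sketch).
func modCacheDir(gomodcache, gopath string) string {
	if gomodcache != "" {
		return gomodcache
	}
	list := filepath.SplitList(gopath)
	if len(list) == 0 || list[0] == "" {
		return ""
	}
	return filepath.Join(list[0], "pkg", "mod")
}

func main() {
	fmt.Println(modCacheDir(os.Getenv("GOMODCACHE"), os.Getenv("GOPATH")))
}
```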

We should not move GOPATH/pkg/sumdb, nor make it controlled by GOMODCACHE.

The module cache is just that: a _cache_. If the module cache is removed, it can safely be refilled by redownloading. During the redownloading, all the bits will be reverified using the checksum database, which is what makes it safe to start over. No one can replace the bits you used to have with different bits. Even large fragments of the checksum database are cached in GOMODCACHE. Those are okay to delete too; they'll be reverified on download as well.

The one thing that makes all of this safe is the single file GOPATH/pkg/sumdb/sum.golang.org/latest. It contains the single hash that all of this other verification relies upon. It _cannot_ be deleted without giving all that up. It therefore does not belong in the module cache next to all the deletable things. (That's why it's not.)

CI/CD systems that want to point GOMODCACHE at a different directory should not need to worry about GOPATH/pkg/sumdb. As long as the repo being built has a complete go.sum, the go command will not do anything with the checksum database. I believe -mod=readonly will keep from even attempting to use the checksum database, and that's what CI/CD systems should be using anyway.
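A CI step following this advice might be assembled as in the sketch below. The function name ciBuild and the /ci/cache/gomod path are assumptions for illustration; the sketch only constructs the command and prints it, without running it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// ciBuild prepares "go build -mod=readonly ./..." with GOMODCACHE pointing
// at a directory that the CI system persists between builds.
func ciBuild(modCache string) *exec.Cmd {
	cmd := exec.Command("go", "build", "-mod=readonly", "./...")
	// Later entries override earlier ones, so this replaces any inherited value.
	cmd.Env = append(os.Environ(), "GOMODCACHE="+modCache)
	return cmd
}

func main() {
	// /ci/cache/gomod is a hypothetical persisted cache directory.
	cmd := ciBuild("/ci/cache/gomod")
	fmt.Println(cmd.Args)
	fmt.Println(cmd.Env[len(cmd.Env)-1])
}
```

A real pipeline would call cmd.Run() and check the returned error.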

Based on the discussion above, it sounds like this is a likely accept.

No change in consensus, so accepted.

Change https://golang.org/cl/230537 mentions this issue: doc/go1.15: add notes for GOMODCACHE, modcacheunzipinplace

Change https://golang.org/cl/241275 mentions this issue: cmd/go: include GOMODCACHE in 'go help environment'
