Go: proposal: cmd/go: add .proxy endpoint to the module proxy spec

Created on 6 Nov 2019  ·  21 Comments  ·  Source: golang/go

Users would benefit from more transparency around whether or not a specific module version is being temporarily cached in a proxy or whether it is being permanently mirrored. There are a number of reasons why a proxy may choose not to mirror something forever: licensing is one notable example.

The proposal is to add an optional endpoint to the proxy spec (i.e. go help goproxy) that proxies could implement if they choose to, and that would give this information. For example, a request to
https://proxy.golang.org/golang.org/x/text/@v/v0.3.2/mirrored
would return "true" or "false" as plaintext.
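
A minimal client-side sketch of how a tool might consume such an endpoint. The /mirrored endpoint is only a proposal and no proxy is known to serve it; the function names (mirroredURL, parseMirrored, isMirrored) are made up for illustration, and escaping of uppercase letters in module paths is left out for brevity.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// mirroredURL builds the URL of the proposed (hypothetical) endpoint.
// Assumption: modulePath contains no uppercase letters, so the escaping
// that the proxy protocol requires for them is skipped here.
func mirroredURL(proxy, modulePath, version string) string {
	return strings.TrimSuffix(proxy, "/") + "/" + modulePath + "/@v/" + version + "/mirrored"
}

// parseMirrored interprets the plaintext body, which the proposal
// describes as either "true" or "false".
func parseMirrored(body string) (bool, error) {
	switch strings.TrimSpace(body) {
	case "true":
		return true, nil
	case "false":
		return false, nil
	}
	return false, fmt.Errorf("unexpected response body %q", body)
}

// isMirrored asks one proxy about one module version. Because the
// endpoint would be optional, any non-200 status is returned as an
// error rather than being treated as "false".
func isMirrored(client *http.Client, proxy, modulePath, version string) (bool, error) {
	resp, err := client.Get(mirroredURL(proxy, modulePath, version))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("proxy returned %s", resp.Status)
	}
	// The expected body is tiny, so cap how much is read.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 64))
	if err != nil {
		return false, err
	}
	return parseMirrored(string(body))
}
```

Keeping "endpoint not implemented" separate from "false" matters here, since a proxy that does not serve the endpoint has made no statement either way.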

This is something we could pair with a utility in x/mod which would accept a go.sum file and indicate which versions aren't being permanently mirrored by any of the proxies listed in GOPROXY. That might help you decide to use a different version of the module, vendor that dependency, or encourage you to file an issue against the module if you see that a suitable license is missing, for example.
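
As a rough idea of the first step such a utility would need, here is a sketch that extracts the distinct module versions from the contents of a go.sum file. No such utility exists in x/mod as far as this thread says; the names moduleVersion and modulesFromGoSum are hypothetical.

```go
package main

import (
	"bufio"
	"strings"
)

// moduleVersion identifies one module version named in go.sum.
type moduleVersion struct {
	Path    string
	Version string
}

// modulesFromGoSum returns the distinct module versions listed in the
// given go.sum contents, in order of first appearance. Each go.sum line
// has three fields: module path, version (optionally suffixed with
// "/go.mod"), and hash. Both line kinds are folded into one entry, so a
// version that appears only with a "/go.mod" hash is still reported.
func modulesFromGoSum(goSum string) []moduleVersion {
	seen := make(map[moduleVersion]bool)
	var out []moduleVersion
	sc := bufio.NewScanner(strings.NewReader(goSum))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 3 {
			continue // skip blank or malformed lines
		}
		mv := moduleVersion{
			Path:    fields[0],
			Version: strings.TrimSuffix(fields[1], "/go.mod"),
		}
		if !seen[mv] {
			seen[mv] = true
			out = append(out, mv)
		}
	}
	return out
}
```

Each returned pair could then be checked against every proxy in GOPROXY, reporting the versions that none of them promises to keep.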

/cc @jayconrod @bcmills @heschik @hyangah @rsc

Labels: Proposal, Proposal-Hold, modules

All 21 comments

The Expires headers could/should(?) say the same thing.

e.g. Expires 1 week vs Expires 100 years.

In many cases there is a difference between when the response is considered "stale", i.e. Expires, and when the underlying server may be unable to continue serving a zip. The common case would be if a proxy is using a CDN where the cached response provided by the CDN may be stale after a few hours but the proxy server intends to continue serving that zip for much longer.

Having module versions cached temporarily means non-deterministic builds for the users.

IMO availability is one of the fundamental requirements of a public goproxy. And this is true at least for gocenter.io.

Also, what does it mean for a user who also uses a local goproxy that further points to a public goproxy? Should they selectively clean up the local goproxy's cache always given this new endpoint? Maybe let users decide what they want to consume based on the metadata provided by the goproxy.

+1 to the above. Except for extenuating circumstances like DMCA takedowns, why would a module not be stored (purposefully not using the term "cache" because it implies expiration) forever on proxy.golang.org?

If there were modules that proxies/mirrors might not or did not store, then as @ankushchadha said, builds become nondeterministic. One of the major apparent benefits of proxy.golang.org right now is that it enables deterministic builds.

Edit: the proxy enables _deterministic_ builds

@arschles, a module might not be stored if the proxy maintainer is not confident that the module's license permits it to be stored.

Builds in that case do not become “nondeterministic”: they may either succeed or fail, depending on whether the needed modules are available (locally or from any configured remote source), but if they succeed they will produce the same result as any other successful build.

@bcmills understood, I agree that this feature may be useful for on-prem proxies.

I'm talking about this endpoint in the context of public, hosted proxies. It introduces the possibility that a host may cache modules, and if you get a false back, that module@version could expire at any time. As the developer of an application who relies heavily on proxy.golang.org (we use it to build github.com/gomods/athens), I would prefer that everything returns true at all times (exceptions being made for special cases)

I agree we need a way to tell users what proxy.golang.org will do about a specific module version. But I am not 100% sure about whether this endpoint belongs to the proxy protocol - at this moment, it seems too specific to proxy.golang.org.

It will sound more convincing if there are proxies other than proxy.golang.org that would utilize this new endpoint in a meaningful way.

The endpoint doesn't make much sense for enterprise and private proxies.

gocenter.io is trying to mirror everything once it decides to serve a module version. Most other public proxies I've seen haven't made any official commitment about their data retention policy. Can other public proxy owners chime in?

goproxy.io is here, but as @bcmills said, we are not confident that the module's license permits it to be stored, and space is always limited.

@arschles

> As the developer of an application who relies heavily on proxy.golang.org (we use it to build github.com/gomods/athens), I would prefer that everything returns true at all times (exceptions being made for special cases)

Since we can't do this (re: licensing), the best alternative would be to inform users if there is a genuine risk that their dependency will disappear if it's removed from the origin server. You mentioned that this endpoint would say "that module@version could expire at any time", so one alternative might be to give a timestamp for how much longer this cached copy will live, instead of true/false?

@oiooj

> we are not confident that the module's license permits it to be stored, and space is always limited.

Can you clarify? Are you saying that goproxy.io also doesn't mirror things forever, depending on the size of the module and the license? If that's the case, then users of your service may also benefit from this kind of transparency.


Thanks for everyone's comments. As @hyangah said, it's going to be difficult to justify this if it's not something other proxies would benefit from, and if that's the case, this may just be something that proxy.golang.org should do itself if users are asking for it.

@oiooj @katiehockman If gocenter.io wants to preserve rights to evict some of the module versions to alleviate storage usage pressure in the future, I expect gocenter.io to return 'false' for the proposed /mirrored endpoint for all module versions. Then I don't think this endpoint is very useful for its users either.

@hyangah, to the contrary! If some tool uses /mirrored to, say, recommend whether or not users should mirror their dependencies locally (or vendor them), then a /mirrored endpoint that always returns false could still be a useful input to such a tool.
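
To make the idea concrete, here is a sketch of the decision such a tool might make: recommend a local copy (mirror or vendor) when no configured proxy reports permanent mirroring. The function names are hypothetical, and the per-proxy check is passed in as a callback because the /mirrored endpoint is only proposed. GOPROXY handling is simplified: "direct" and "off" are just skipped.

```go
package main

import "strings"

// proxyURLs splits a GOPROXY value into proxy URLs. Entries are
// separated by commas or pipes; the keywords "direct" and "off" are not
// proxies, so they are dropped. (Simplification: the real go command
// stops at "off" and treats "," and "|" differently on errors.)
func proxyURLs(goproxy string) []string {
	isSep := func(r rune) bool { return r == ',' || r == '|' }
	var urls []string
	for _, entry := range strings.FieldsFunc(goproxy, isSep) {
		entry = strings.TrimSpace(entry)
		if entry == "" || entry == "direct" || entry == "off" {
			continue
		}
		urls = append(urls, entry)
	}
	return urls
}

// recommendLocalCopy reports whether the user should keep their own
// copy of a module version. It returns false as soon as one proxy says
// it mirrors the version permanently. Errors (for example, a proxy that
// does not implement the endpoint) count as "no promise made".
func recommendLocalCopy(proxies []string, mirrored func(proxy string) (bool, error)) bool {
	for _, p := range proxies {
		if ok, err := mirrored(p); err == nil && ok {
			return false
		}
	}
	return true
}
```

Under this sketch, a proxy that always answers false simply leads to a "keep a local copy" recommendation, which is the useful input described above.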

@bcmills Shouldn't the user of the proxy already know about the promise of the public proxy they are using? As far as I know, proxy.golang.org is the only one that may have different answers for modules/versions.

BTW if we are talking about the users who want to distribute the source code of binaries/libraries and control the dependencies, they don't know what proxy "their users" will depend on to build their source code. In this case, will they still need to vendor, or instruct their users to always use specific proxies they verified all their dependencies are mirrored in?

What's the reason for using /mirrored instead of a field in the info file?

> As far as I know, proxy.golang.org is the only one that may have different answers for modules/versions.

There is no intrinsic reason why that must be the case, and having an endpoint would make it easier for users to detect if (say) the proxy that they are using changes its policy to provide longer-term mirroring.

> if we are talking about the users who want to distribute the source code of binaries/libraries […], will they still need to vendor, or instruct their users to always use specific proxies they verified all their dependencies are [mirrored] in?

Their users (transitively) can use the same endpoint to decide what to do.

> What's the reason for using /mirrored instead of a field in the info file?

Proxies may reasonably re-serve .info files from other proxies. What presumably matters to users is the policy of the “last hop” proxy, not any intermediaries.

> What's the reason for using /mirrored instead of a field in the info file?

We've treated the .info file as an immutable object that is provided by the go command and given to clients unchanged. If we start adding custom fields to the .info file which differ between proxies, then I'm not sure what the consequences of this could be. For example, it could mean that a proxy like Athens which chooses to proxy our .info endpoints to their clients will be serving answers that are specific to our server instead of theirs.

I've also always viewed this file as "metadata about the module version" which is proxy independent, rather than "metadata about the module version as it relates to the proxy you got it from" which could change.

@jayconrod @heschik and I were discussing this the other day.

The existing convention in these URLs is to disambiguate based on the file extension not a new path element, so it would be v0.3.2.mirrored not v0.3.2/mirrored.

Beyond that, though, I wonder if maybe there will be need to send more than a single bit at some point (thus my question about .info). If we don't extend .info then it seems like we should instead define a new JSON-formatted .proxy file for proxy-specific information about the given module. It could start with just one field (Expires?) and add more as needed.
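
A sketch of what reading such a file could look like on the client side. Nothing here is specified anywhere: the .proxy extension and the Expires field are only the suggestions made in this comment, and the type and function names (proxyInfo, proxyInfoURL, parseProxyInfo, expiresBefore) are invented for illustration.

```go
package main

import (
	"encoding/json"
	"strings"
	"time"
)

// proxyInfo models the hypothetical .proxy file. Expires is a pointer
// so that "field absent" (no expiry stated) is distinguishable from a
// real timestamp. Unknown fields are ignored by encoding/json, which
// leaves room to "add more as needed".
type proxyInfo struct {
	Expires *time.Time `json:"Expires,omitempty"`
}

// proxyInfoURL follows the existing convention of disambiguating by
// file extension, e.g. .../@v/v0.3.2.proxy next to v0.3.2.info.
// Assumption: modulePath needs no escaping (no uppercase letters).
func proxyInfoURL(proxy, modulePath, version string) string {
	return strings.TrimSuffix(proxy, "/") + "/" + modulePath + "/@v/" + version + ".proxy"
}

// parseProxyInfo decodes the JSON body of a .proxy response.
// Timestamps are expected in RFC 3339 form, as time.Time requires.
func parseProxyInfo(data []byte) (proxyInfo, error) {
	var info proxyInfo
	err := json.Unmarshal(data, &info)
	return info, err
}

// expiresBefore reports whether the proxy has announced an expiry that
// falls before the given deadline. No stated expiry yields false.
func (p proxyInfo) expiresBefore(deadline time.Time) bool {
	return p.Expires != nil && p.Expires.Before(deadline)
}
```

A timestamp carries more than the true/false of /mirrored: a tool could warn only when the expiry falls inside a window the user cares about.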

Agreed that there is very likely to be a time when more than one bit should be provided. Perhaps the date of expiration, or the detected licenses, for example.

I like the idea of a generalized .proxy file similar to the .info file. I'm not sure if this belongs in the proxy protocol at this point though, especially if no other proxies will want to use this. But in general it sounds like a good approach to start with even if just proxy.golang.org serves it.

It sounds like there is general agreement to add a .proxy file with JSON.
The benefit of defining what it contains is that then cmd/go can potentially present that.

@katiehockman, would you rather:

  1. Put this proposal on hold and have proxy.golang.org start serving this file to gain some experience, or
  2. Use this proposal to define the .proxy file and get to acceptance before serving it from proxy.golang.org?

Your call. Thanks.

Thanks. Let's go with option 1 for now.

I'll go ahead and work on exposing a .proxy endpoint for proxy.golang.org that can share some extra metadata about cache expiration. We can learn from this, and if it ends up making sense to establish a more formal behavior for .proxy in the future, then we can reassess.

Putting this on hold. Katie, feel free to remove the hold label when you are ready for more discussion.
