This is a proposal to extend the vgo download API with a mechanism that lets proxies redirect vgo to a VCS. This would be useful when a proxy has only a subset of the packages listed in a go.mod file.
Currently, if someone adds two modules, "a" v1.0.0 and "b" v1.0.0, to their go.mod file and then runs GOPROXY=myprox.com vgo install, everything works as expected if the proxy has both modules at the given versions. If not, the command fails.
It would be helpful to let the proxy tell vgo to fetch one or both of the modules from the VCS when it doesn't have them in its cache. This would be useful for proxy implementations that will not or cannot cache the module in their own storage, or that can but don't yet have that module version cached. The Athens project is a present-day example of the latter: it fills its caches asynchronously.
One possible implementation is to add a $GOPROXY/a/@v/v1.0.0?go-get=1 network request that expects the same output format that existing ?go-get=1 requests define. This request could be made before starting the download protocol as it currently exists. The new mechanism would allow the proxy to choose to do one of the following for the given module and version identifier:
Update after Slack discussion with @myitcv and @zeebo:
vgo get
downloaded from VCS
GOPROXY=myprox.com vgo get will always fail with the current behavior
Another concern that was brought up is that, with GOPROXY set, the current behavior of GOPROXY=myprox.com vgo get github.com/kubernetes/kubernetes doesn't let the proxy redirect vgo to another location (e.g. a CDN); it still has to serve the metadata and the zipped source code itself.
The proxy can redirect vgo to another URL as long as that URL implements the download protocol. That's because the net/http client follows redirect status codes internally (up to 10 consecutive redirects by default).
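A minimal sketch of that behaviour using only the standard library. A hypothetical proxy answers every request with a 302 to a hypothetical CDN (both are local httptest servers, and the module path example.com/a is made up), and the client follows the redirect with no proxy-specific code.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetch performs one download-protocol GET. If the server answers with a
// redirect, http.Client follows it itself; with no CheckRedirect set it
// gives up after 10 consecutive redirects.
func fetch(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// redirectDemo starts a hypothetical CDN serving one .info file and a proxy
// that redirects every request to it, then fetches through the proxy.
func redirectDemo() (string, error) {
	cdn := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/example.com/a/@v/v1.0.0.info" {
			fmt.Fprint(w, `{"Version":"v1.0.0"}`)
			return
		}
		http.NotFound(w, r)
	}))
	defer cdn.Close()

	proxy := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// The proxy serves no content itself; it points the client elsewhere.
		http.Redirect(w, r, cdn.URL+r.URL.Path, http.StatusFound)
	}))
	defer proxy.Close()

	return fetch(proxy.Client(), proxy.URL+"/example.com/a/@v/v1.0.0.info")
}

func main() {
	info, err := redirectDemo()
	fmt.Println(info, err) // {"Version":"v1.0.0"} <nil>
}
```

A redirect like this only moves the client to another download-protocol server; it cannot hand the client off to a VCS.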
The big question here is: can GOPROXY tell vgo (during a build) to use a VCS source instead of the proxy itself? That is a little different from a simple redirect to another URL that implements the download protocol.
@rsc How do you envision the flow of the Proxy telling Vgo to switch to VCS?
I'm trying to follow the vgo code, and it seems that on an initial build, the first contact with the proxy's download protocol can be any of these endpoints: /@latest, /list, or /@v/{certainRevision}.info.
This means the proxy would need to return a consistent code (maybe a 404) on each of these endpoints to signal vgo to construct the modfetch.Repo interface from a VCS source instead of the *proxyRepo one. vgo would then have to back up a few steps and retry.
vgo could always hit /list as the first entry point to the download protocol and switch to the VCS if the list is empty. Or it could always hit @latest first and switch to the VCS if that returns a 404.
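The "hit /list first" variant could look roughly like this. chooseSource and the source type are hypothetical names. The sketch assumes the protocol's $GOPROXY/<module>/@v/list endpoint, and it assumes that any status other than 200 or 404 should be reported as an error rather than trigger a switch to the VCS.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// source says where a module should be fetched from.
type source int

const (
	fromProxy source = iota
	fromVCS
)

// chooseSource inspects the proxy's answer to GET $GOPROXY/<module>/@v/list:
// use the proxy if it lists at least one version, fall back to the VCS on a
// 404 or an empty list, and report any other status as an error.
func chooseSource(status int, body string) (source, error) {
	switch {
	case status == http.StatusNotFound:
		return fromVCS, nil
	case status != http.StatusOK:
		return 0, fmt.Errorf("proxy returned %d", status)
	case len(strings.Fields(body)) == 0:
		return fromVCS, nil
	default:
		return fromProxy, nil
	}
}

func main() {
	s, err := chooseSource(http.StatusOK, "v1.0.0\nv1.1.0\n")
	fmt.Println(s == fromProxy, err) // true <nil>
	s, err = chooseSource(http.StatusNotFound, "")
	fmt.Println(s == fromVCS, err) // true <nil>
}
```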
Another solution is for the download protocol to implement a @probe call with a couple of parameters (module path and revision), so that the proxy can tell vgo early on to go straight to the VCS source for that particular module.
For efficiency, vgo can potentially send the entire list of modules it wants to probe to the proxy.
I haven't fully understood the last two days' worth of changes to vgo, so apologies if I'm a bit off.
I don't think it makes sense for a proxy to tell the go command "go to this VCS instead". We're trying to migrate to proxy by default and while VCS will probably always be with us, I'd rather not mix the two.
I do think it would probably be OK to let GOPROXY be a preference list and to also allow some setting like GOPROXY=direct as an explicit name for the default behavior. So you could say GOPROXY=https://myproxy/,direct and just let myproxy return a 404 for the things it doesn't know about. Then the proxy isn't in charge of the actual redirect; it's only in charge of "it's not me".
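A sketch of the client side of such a preference list. It assumes a comma-separated GOPROXY value and a hypothetical fetch callback that performs the real request and treats the keyword direct as "go to the origin". Only a not-found answer, modeled here by the hypothetical sentinel errNotFound, moves the search to the next entry.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errNotFound stands for a 404 answer: "it's not me".
var errNotFound = errors.New("not found")

// parseGOPROXY splits a preference list such as "https://myproxy/,direct"
// into its entries, trimming spaces and dropping empty ones.
func parseGOPROXY(v string) []string {
	var list []string
	for _, e := range strings.Split(v, ",") {
		if e = strings.TrimSpace(e); e != "" {
			list = append(list, e)
		}
	}
	return list
}

// resolve asks each entry in order. Only errNotFound moves on to the next
// entry; success or any other error ends the search immediately.
func resolve(list []string, fetch func(entry string) (string, error)) (string, error) {
	for _, entry := range list {
		v, err := fetch(entry)
		if errors.Is(err, errNotFound) {
			continue
		}
		return v, err
	}
	return "", errNotFound
}

func main() {
	list := parseGOPROXY("https://myproxy/,direct")
	got, err := resolve(list, func(entry string) (string, error) {
		if entry == "direct" {
			return "fetched from origin", nil
		}
		return "", errNotFound // myproxy doesn't know this module
	})
	fmt.Println(got, err) // fetched from origin <nil>
}
```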
@rsc That's about what I had in mind. The proxy shouldn't say "go to this vcs"; it can just say "I don't have this module".
Does that mean the /@probe endpoint makes sense? It would let Go ask the proxy whether it can work with a specific module before it even asks for @latest or /list.
I've altered the Go code if you'd like to look at a reference of what I'm suggesting: https://github.com/marwan-at-work/go/commit/3767be88ba3740d85853a28c2b1715f365d3b3dd
@marwan-at-work I'd rather not have newProxyRepo make any network calls. It turns out to be important to delay those as long as possible. I'd rather have the existing GET paths return 404s and then have the methods be able to return some kind of recognizable "not found error" (maybe satisfying os.IsNotExist is enough) and then something at a higher level will try the next repo method down the list. I think we should wait until Go 1.12 regardless.
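One way to make a 404 recognizable, as a sketch: report it as a *fs.PathError wrapping fs.ErrNotExist, which both os.IsNotExist and errors.Is accept. The notFound and checkStatus helpers are hypothetical, not cmd/go code.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"net/http"
	"os"
)

// notFound reports a proxy 404 as a *fs.PathError wrapping fs.ErrNotExist,
// so callers can test for it without knowing about HTTP.
func notFound(modPath, version string) error {
	return &fs.PathError{Op: "proxy fetch", Path: modPath + "@" + version, Err: fs.ErrNotExist}
}

// checkStatus maps a download-protocol response status to an error.
func checkStatus(modPath, version string, status int) error {
	switch status {
	case http.StatusOK:
		return nil
	case http.StatusNotFound:
		return notFound(modPath, version)
	default:
		return fmt.Errorf("%s@%s: proxy returned %d", modPath, version, status)
	}
}

func main() {
	err := checkStatus("example.com/a", "v1.0.0", http.StatusNotFound)
	fmt.Println(os.IsNotExist(err), errors.Is(err, fs.ErrNotExist)) // true true
	fmt.Println(err) // proxy fetch example.com/a@v1.0.0: file does not exist
}
```

A custom error type that only implements an Is method would satisfy errors.Is but not os.IsNotExist, which predates errors.Is and only unwraps the os package's own wrapper types. Wrapping the error in *fs.PathError sidesteps that.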
@rsc That makes sense since cmd/go checks the cache before making network calls. I'm happy to take on this task if you'd like me to as I'll try to make it work for Athens in the near future.
Either way, I'm happy to know the proxy can dynamically delegate module fetching back to Go.
Thanks!
@rsc I've taken another pass at making this work. This time, we won't hit the network until necessary, and we won't need a @probe endpoint either. I'm hoping this change is not too invasive for 1.11.
The idea is that a *proxyRepo can take an alternative Repo that it switches to in case of a 404 (or other codes in the future), similar to how *cachingRepo works.
Feel free to take a look if you get the chance https://github.com/marwan-at-work/go/commit/5117c8c267e58db3ef1b8cc3531f2fffebe2e9c3
I see other ways of doing this, such as having a top-level Repo interface that accepts a slice of Repos and just tries them one at a time in order (cache, proxy, VCS, etc.).
So my solution above may still not be how you'd like to solve this problem, but I would love to hear your thoughts.
Thanks :)
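The slice-of-Repos alternative might be sketched as follows. Repo here is a cut-down, hypothetical stand-in for modfetch.Repo with a single method, and staticRepo is a toy implementation. The chain falls through only on errors that match fs.ErrNotExist and stops on anything else.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// Repo is a hypothetical, single-method stand-in for modfetch.Repo.
type Repo interface {
	Versions() ([]string, error)
}

// staticRepo is a toy Repo; a nil versions slice means "not found".
type staticRepo struct {
	name     string
	versions []string
}

func (r staticRepo) Versions() ([]string, error) {
	if r.versions == nil {
		return nil, fmt.Errorf("%s: %w", r.name, fs.ErrNotExist)
	}
	return r.versions, nil
}

// chainRepo tries each Repo in order (for example proxy, then VCS) and moves
// on only when the current one reports "not found".
type chainRepo []Repo

func (c chainRepo) Versions() ([]string, error) {
	var lastErr error = fs.ErrNotExist
	for _, r := range c {
		v, err := r.Versions()
		if err == nil {
			return v, nil
		}
		if !errors.Is(err, fs.ErrNotExist) {
			return nil, err // a real failure: stop rather than fall back
		}
		lastErr = err
	}
	return nil, lastErr
}

func main() {
	repo := chainRepo{
		staticRepo{name: "proxy"},                             // the proxy 404s
		staticRepo{name: "vcs", versions: []string{"v1.0.0"}}, // the VCS has it
	}
	fmt.Println(repo.Versions()) // [v1.0.0] <nil>
}
```

Building the chain does no work by itself; each underlying Repo is consulted only when a method is called, which fits the goal of delaying network calls as long as possible.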
Since this issue discusses a change to GOPROXY to allow a list of proxies or the direct method, here is a slightly different case I am thinking of: the proxy server I use by default is temporarily unreachable or unavailable (possibly in the middle of fetching all dependencies), so we can't even get 404s. Rerunning go get with GOPROXY=direct after noticing the failure is an option, but I would be happier if I could specify a set of proxy servers, or even the 'direct' fetch option, as fallbacks.
But @bcmills raised a concern about leaking private package paths if the private proxy fails and we just fall back to the next entry (direct) blindly.
@hyangah I'm concerned with blindly falling back to git for two reasons:
@marwan-at-work The code freeze date is today. Do you plan to mail in the change for review as described in the contribution guidelines? https://golang.org/doc/contribute.html#sending_a_change_github
Change https://golang.org/cl/147177 mentions this issue: cmd/go: fallback to VCS if GOPROXY 404s
https://golang.org/cl/147177 was mailed before the freeze, so I think this will make the cut.
Change https://golang.org/cl/148377 mentions this issue: cmd/go: allow comma separated GOPROXY URLs.
This didn't make 1.12 after all: needs a bit more design work to indicate (and implement) when it's ok to fall back to the origin vs. failing outright.
@hyangah notes an interesting interaction: when we are resolving the module for a given package, the go command today starts by querying for a module at the full package path, then at progressively shorter prefixes. We ideally want to do those queries in parallel.
That means that the search space has (at least) two dimensions: one across proxies, and another across paths. Probably we should exhaust all of the paths for the first proxy before we try the next one in the list, and only fall back if it returned a module or a 404 code for each path. That way, the first proxy has the opportunity to reject paths (e.g. due to licensing or vetting policy) before the go command attempts to fetch from a public mirror or the origin server.
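A sketch of that two-dimensional search, under stated assumptions. query is a hypothetical callback standing in for the HTTP request and returns a simplified three-way status. All candidate paths are queried concurrently for one proxy, and the longest path that proxy reports as found wins. The search moves to the next proxy only when every path came back 404; any other failure stops it.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// status is a simplified answer from one proxy about one candidate path.
type status int

const (
	found    status = iota // the proxy serves a module at this path
	notFound               // the proxy answered 404
	failed                 // any other answer: do not fall back
)

// candidatePaths returns the full package path followed by progressively
// shorter prefixes, e.g. a.com/b/c, a.com/b, a.com.
func candidatePaths(pkgPath string) []string {
	paths := []string{pkgPath}
	for p := pkgPath; strings.Contains(p, "/"); {
		p = p[:strings.LastIndex(p, "/")]
		paths = append(paths, p)
	}
	return paths
}

// lookup queries all candidate paths of one proxy in parallel before moving
// on, and falls back to the next proxy only if every answer was notFound.
func lookup(proxies, paths []string, query func(proxy, path string) status) (string, string, error) {
	for _, px := range proxies {
		results := make([]status, len(paths))
		var wg sync.WaitGroup
		for i, p := range paths {
			wg.Add(1)
			go func(i int, p string) {
				defer wg.Done()
				results[i] = query(px, p)
			}(i, p)
		}
		wg.Wait()
		for i, r := range results {
			if r == failed {
				return px, "", fmt.Errorf("%s: query for %s failed; not falling back", px, paths[i])
			}
		}
		for i, r := range results {
			if r == found {
				return px, paths[i], nil // paths are ordered longest first
			}
		}
	}
	return "", "", errors.New("no proxy provides a module for this package")
}

func main() {
	paths := candidatePaths("example.com/org/repo/pkg")
	px, mod, err := lookup([]string{"https://proxy1", "https://proxy2"}, paths,
		func(proxy, path string) status {
			if proxy == "https://proxy2" && path == "example.com/org/repo" {
				return found
			}
			return notFound
		})
	fmt.Println(px, mod, err) // https://proxy2 example.com/org/repo <nil>
}
```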
Speaking with @hyangah yesterday, I'm concerned about the approach discussed here.
The scenario we discussed offline yesterday is one where a company wants to provide a proxy that serves private modules.
In this scenario the company should run an internal (intermediary) proxy that contains the information about the private modules. This proxy should be configured to have an upstream proxy that it uses for public requests.
If the company wants to have high availability it can run multiple internal proxies.
The client should only be configured to use the internal proxy(ies).
The company can also provide a whitelist / blacklist for the internal proxy for what upstream packages are allowed/disallowed.
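A possible shape for that internal proxy's routing decision, as a sketch. The policy fields, action names, and module prefixes are hypothetical. Prefixes are matched on whole path elements, so a prefix like git.corp.example.com/x does not also match git.corp.example.com/xy.

```go
package main

import (
	"fmt"
	"strings"
)

// action is what a hypothetical internal proxy does with a request.
type action int

const (
	servePrivate    action = iota // module is hosted internally
	forwardUpstream               // public module: fetch via the upstream proxy
	deny                          // blocked by company policy
)

func (a action) String() string {
	return [...]string{"serve-private", "forward-upstream", "deny"}[a]
}

// hasPathPrefix reports whether p equals prefix or lies beneath it.
func hasPathPrefix(p, prefix string) bool {
	return p == prefix || strings.HasPrefix(p, prefix+"/")
}

// policy is hypothetical configuration for the internal proxy.
type policy struct {
	private []string // module path prefixes served from internal storage
	allow   []string // if non-empty, only these public prefixes are forwarded
	block   []string // public prefixes that are never forwarded
}

func (pol policy) decide(modPath string) action {
	for _, p := range pol.private {
		if hasPathPrefix(modPath, p) {
			return servePrivate
		}
	}
	for _, p := range pol.block {
		if hasPathPrefix(modPath, p) {
			return deny
		}
	}
	if len(pol.allow) == 0 {
		return forwardUpstream
	}
	for _, p := range pol.allow {
		if hasPathPrefix(modPath, p) {
			return forwardUpstream
		}
	}
	return deny
}

func main() {
	pol := policy{
		private: []string{"git.corp.example.com"},
		block:   []string{"github.com/malicious"},
	}
	for _, m := range []string{"git.corp.example.com/team/lib", "github.com/malicious/package", "golang.org/x/text"} {
		fmt.Println(m, pol.decide(m))
	}
}
```

Because clients talk only to this proxy, a deny here cannot be bypassed by a later entry in a client-side list.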
@spf13
The use case for an internal company multi-proxy setup is to switch not from one proxy to another, but from a proxy to the VCS, for cases where the proxy is not trusted with credentials but the machine running the go command has VCS credentials. For example, when the Go Module Index comes out but the company is not yet ready to roll its own internal proxy for private modules, it should be able to run GOPROXY=moduleIndex,direct go build and trust that both public and private modules are provided.
On another note, I'm also wondering how having multiple mirrors such as the Module Index (and potentially other companies' mirrors) will play out. If the user can't run GOPROXY=mirror1,mirror2 go build, how would Go be able to get a module from one of many mirrors? Will every user need their own proxy implementation that fans out to different proxies? It seems much easier to just use the comma-separated list on the client side.
With the Module Index being built as the default public mirror, I imagine all other proxy implementations must be aware of it? That is, if a module does not exist, we would need to redirect the user to the public proxy as a worst-case scenario.
Thanks!
@marwan-at-work
It will leak the company's private module, package, and repo paths to the public Go Module Index, so it's best avoided. I think it's better for the go command itself to be configured to route requests to the right proxies if running an internal proxy is hard.
This is a reasonable use case, but here I think the primary purpose is redundancy, and chaining based on the 404 HTTP status code doesn't seem right for that. (What if mirror1 is down and can't respond with a 404?)
@hyangah
The only thing that will leak is the module path and nothing else. I'm not sure how bad this is, but I understand that it should be avoided.
My thinking was that if your "first" proxy is down, it's best to stop the build. If your thinking is that we can have multiple proxies for redundancy without caring what status code is returned, that is potentially bad, because a proxy would then be unable to block a build for security reasons. For example, if the first proxy did not allow anyone to download "github.com/malicious/package", it would return a 400 Bad Request, but Go would just move on to the next proxy, which might not have the same security rule.
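The fail-closed rule argued for here reduces to a small classification function. This is a sketch with hypothetical names. Only 404 and 410 let the client try the next entry (410 is included because the CL 183845 title quoted further down this thread singles out 404/410 responses). A network error halts, and so does any other status, such as a 400 from a proxy that blocks a module.

```go
package main

import (
	"fmt"
	"net/http"
)

// next describes what the client does after asking one proxy in the list.
type next int

const (
	useResult next = iota // 200: the proxy served the module
	tryNext               // 404 or 410: "not me", ask the next entry
	halt                  // anything else, including network errors: stop
)

// classify only lets an explicit "not found" through, so a proxy that
// rejects a module cannot be bypassed by the next proxy in the list.
func classify(status int, err error) next {
	if err != nil {
		return halt // proxy unreachable: do not silently fall through
	}
	switch status {
	case http.StatusOK:
		return useResult
	case http.StatusNotFound, http.StatusGone:
		return tryNext
	default:
		return halt
	}
}

func main() {
	fmt.Println(classify(http.StatusBadRequest, nil) == halt) // true
	fmt.Println(classify(http.StatusGone, nil) == tryNext)    // true
}
```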
I am a little confused about all this discussion. I thought we were going to do, for GOPROXY=proxy1,proxy2,proxy3:
It's an ordered list, not a parallel lookup. By saying GOPROXY=proxy1,proxy2,proxy3 you are _directing_ the go command to send every import path to proxy1. If you should be splitting half your traffic to proxy1 and half to proxy2 and can't send the proxy2 paths to proxy1, then yes, you need a new proxy0 to split the traffic. But that is (1) fine and (2) not the envisioned use case.
The envisioned use case is some company has their own modules on an internal static file server that can pretend to be a Go proxy (because we made static file servers able to do that), and people use GOPROXY=
Does anyone object to implementing the above semantics for GOPROXY=proxy1,proxy2,proxy3? If so, please explain why. Thanks.
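For the static-file-server setup described above, the server only needs files at fixed paths. The sketch lists them for one module version. The module path is hypothetical. escape implements the protocol's case encoding (each upper-case letter becomes '!' plus its lower-case form) but omits the validation the real encoder performs. serve is a small wrapper around http.FileServer that main does not call because it blocks. Per the protocol, the list file holds one version per line, and the .info file holds a JSON object with Version and Time fields.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// escape applies the download protocol's case encoding (simplified: no
// validation of the path or version).
func escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// protocolFiles lists, relative to the server root, the files needed for
// GOPROXY=<root URL> to resolve mod@ver.
func protocolFiles(mod, ver string) []string {
	base := escape(mod) + "/@v/"
	v := escape(ver)
	return []string{base + "list", base + v + ".info", base + v + ".mod", base + v + ".zip"}
}

// serve exposes dir as a read-only proxy; it blocks until the server fails.
func serve(dir, addr string) error {
	return http.ListenAndServe(addr, http.FileServer(http.Dir(dir)))
}

func main() {
	for _, f := range protocolFiles("corp.example.com/Team/lib", "v1.2.0") {
		fmt.Println(f)
	}
}
```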
@rsc I'm not sure the above discussions were about splitting traffic or concurrently pinging the "proxy list". AFAIK, they were about whether GOPROXY should accept multiple URLs or only one highly available URL that takes care of proxying to other proxies when it needs to.
I'm happy either way, but the CL from above does what you suggested.
What would be the best workaround at the moment?
I have a few private modules defined in go.mod, and those are in my GOPROXY. When I run go mod download, it fails for modules that are not in GOPROXY. I can't make sure all the modules and versions are in GOPROXY at all times.
@RohitRox One workaround is to have that proxy run go mod download for anything it doesn't have yet and serve the result back to the client. This way, it can provide all the modules and won't 404.
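A sketch of that workaround on the proxy side. It assumes a go binary on the proxy host that accepts module@version arguments to go mod download and can reach the origin or an upstream proxy. fetchMissing and downloadResult are hypothetical names; the struct fields are a subset of what go mod download -json prints.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os/exec"
)

// downloadResult holds the fields of `go mod download -json` output a proxy
// needs to serve a module version from the local module cache.
type downloadResult struct {
	Path    string
	Version string
	Error   string
	Info    string // path to the .info file
	GoMod   string // path to the .mod file
	Zip     string // path to the .zip file
}

// parseDownload decodes one JSON object printed by `go mod download -json`.
func parseDownload(out []byte) (*downloadResult, error) {
	var res downloadResult
	if err := json.Unmarshal(out, &res); err != nil {
		return nil, err
	}
	if res.Error != "" {
		return nil, errors.New(res.Error)
	}
	return &res, nil
}

// fetchMissing runs the local go command for mod@ver; a proxy could call it
// on a cache miss instead of answering 404.
func fetchMissing(mod, ver string) (*downloadResult, error) {
	out, runErr := exec.Command("go", "mod", "download", "-json", mod+"@"+ver).Output()
	res, err := parseDownload(out)
	if err != nil {
		if runErr != nil {
			return nil, fmt.Errorf("go mod download %s@%s: %v (%v)", mod, ver, err, runErr)
		}
		return nil, err
	}
	return res, nil
}

func main() {
	res, err := fetchMissing("golang.org/x/text", "v0.3.0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res.Zip) // serve this file for the .zip request
}
```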
@marwan-at-work That will be a chore for developers :|
I've found this to be a big problem today when trying to set up a GOPROXY at my company.
There are many instances where there may be a mix of public and private dependencies. The problem we have is that not all of the repos on our internal GitHub instance can be made publicly accessible. If a project has any dependency that is "private", we cannot use GOPROXY at all.
Tools like Athens and JFrog Artifactory can be used to store private Go modules, and GoCenter can be used to fetch public Go modules.
@jorng, for Go 1.13 we expect to add a GONOPROXY environment variable that will let you set GOPROXY to a public proxy but avoid the proxy for modules matching a given pattern.
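GONOPROXY takes a comma-separated list of glob patterns (path.Match syntax) that are matched against module path prefixes. The function below is a simplified sketch of that matching, not the go command's implementation, and the patterns in main are made up. Each pattern is compared with as many leading path elements of the module path as the pattern itself has.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchPrefixPatterns reports whether modPath matches any glob in a
// comma-separated list, comparing each glob with the same number of
// leading path elements of modPath.
func matchPrefixPatterns(patterns, modPath string) bool {
	for _, glob := range strings.Split(patterns, ",") {
		if glob == "" {
			continue
		}
		n := strings.Count(glob, "/")
		prefix := modPath
		for i := 0; i < len(modPath); i++ {
			if modPath[i] == '/' {
				if n == 0 {
					prefix = modPath[:i]
					break
				}
				n--
			}
		}
		if n > 0 {
			continue // modPath has fewer elements than the pattern
		}
		if ok, _ := path.Match(glob, prefix); ok {
			return true
		}
	}
	return false
}

func main() {
	const gonoproxy = "*.corp.example.com,github.com/myorg"
	for _, m := range []string{"git.corp.example.com/team/lib", "github.com/myorg/repo", "github.com/other/repo"} {
		fmt.Println(m, matchPrefixPatterns(gonoproxy, m)) // true, true, false
	}
}
```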
@rsc: That will be very helpful, at least to work around the issue.
I think the GOAUTH stuff may be the best option, once implemented. I'm imagining setting up a custom proxy that can handle authentication (perhaps using our internal SSO) and gate access appropriately.
Change https://golang.org/cl/173441 mentions this issue: cmd/go: add support for GOPROXY list
@jorng Would gos help here? https://github.com/storyicon/gos
Change https://golang.org/cl/183845 mentions this issue: cmd/go/internal/modfetch: halt proxy fallback if the proxy returns a non-404/410 response for @latest
Any chance https://golang.org/cl/173441 can be backported into 1.12.x?
@mikecook The change to the GOPROXY behavior is too significant for a minor release. Minor releases are meant only for security fixes, serious issues with no workarounds, and documentation fixes. See https://golang.org/wiki/MinorReleases for more information.
You can get the new behavior by updating to Go 1.13.
No: there were a lot of interrelated changes in the fetch paths.
Besides, we don't generally backport features (only critical bug-fixes, which this was not).