Pkg.jl: Support multiple package servers

Created on 12 Apr 2021 · 6 Comments · Source: JuliaLang/Pkg.jl

It would be helpful if the JULIA_PKG_SERVER variable accepted more than one package server for use in offline, CI/CD or private environments.

Currently the JULIA_PKG_SERVER variable takes a single URL or path. It would be nice if it accepted a list of them, similar to how the Unix PATH variable works, e.g. JULIA_PKG_SERVER="http://a.b.c;http://e.f.g"
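To make the request concrete, here is a minimal sketch of how such a list might be split into an ordered set of servers. It is written in Python purely as an illustration and is not Pkg.jl code; the function name `parse_pkg_servers` is hypothetical, and the `;` separator is assumed from the example above (a `:` separator would clash with URLs).

```python
def parse_pkg_servers(value):
    """Split a PATH-like, ';'-separated string into an ordered list of servers.

    Hypothetical helper: Pkg.jl reads JULIA_PKG_SERVER as a single value.
    """
    servers = []
    for entry in value.split(";"):
        entry = entry.strip()
        # Skip empty entries (e.g. a trailing ';') and keep only the first
        # occurrence of a repeated server so precedence stays unambiguous.
        if entry and entry not in servers:
            servers.append(entry)
    return servers


servers = parse_pkg_servers("http://a.b.c;http://e.f.g")
print(servers)
```

The order of the resulting list would presumably define precedence, as it does for PATH.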

All 6 comments

I'm still against this. The point of the pkg server is that it provides a single definitive authority for the client's view of the world. If you instantiate a manifest with a pkg server, it can later be reinstantiated with the same pkg server. If there is a set of pkg servers, there are suddenly lots of hard questions to answer. What if the same registry is served by different servers in the set but they advertise different states of that registry? What if a manifest uses packages provided by different pkg servers? Is there any closure guarantee that we can make?

Suppose, for the sake of argument, that we get each package or artifact from the first pkg server that knows about it. Now, suppose that only a later server knows about some package version and we get it from there. But one of its dependencies in the manifest is known to the earlier pkg server, so we get it from there. Now, it's entirely possible that the full manifest is never requested from either pkg server, so neither of them gets a chance to fetch the complete set of resources and make sure they are persisted by the storage server.
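The scenario in the paragraph above can be modelled with a small simulation. This is only a sketch in Python under stated assumptions: the server names, package names, and the `resolve_first_match` function are all hypothetical, and real pkg servers are replaced by plain sets of known packages.

```python
def resolve_first_match(manifest, servers):
    """Fetch each manifest entry from the first server that knows it.

    `servers` is an ordered list of (name, set_of_known_packages) pairs.
    Returns a dict mapping server name -> list of packages requested from it.
    """
    requests = {name: [] for name, _ in servers}
    for pkg in manifest:
        for name, known in servers:
            if pkg in known:
                requests[name].append(pkg)
                break
        else:
            raise LookupError(f"no server knows about {pkg}")
    return requests


# Hypothetical data: "PrivatePkg" exists only on the later server, while its
# dependency "JSON" is already known to the earlier one.
servers = [
    ("public", {"JSON", "HTTP"}),
    ("private", {"PrivatePkg", "JSON"}),
]
manifest = ["PrivatePkg", "JSON", "HTTP"]

requests = resolve_first_match(manifest, servers)
for name, pkgs in requests.items():
    saw_everything = set(pkgs) == set(manifest)
    print(f"{name}: asked for {pkgs}, saw full manifest: {saw_everything}")
```

With this data, neither server is ever asked for every entry of the manifest, which is the situation described: no server gets the chance to fetch and persist the complete set of resources.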

Ultimately, someone needs to decide on a coherent view of the world and be able to store it and reproduce it. A large part of the point of the pkg protocol is that the pkg server presents a single, coherent and reproducible world view. This proposal pushes the responsibility for creating a coherent world view onto the client, which is not persistent or reproducible and which we should basically assume is ephemeral.

Regarding MuxPkgServer: it seems like that should just be PkgServer with upstream pkg servers as its backing storage servers. That would give you caching for free, which is presumably desirable. It has logic to handle the situation where multiple upstream servers serve the same registry but might be out of sync about what state of it they are serving (it serves the latest version). It also makes HEAD requests for each resource to all upstream servers the first time it fetches it, giving each upstream server a chance to have a complete world view. @fredrikekre, is there some reason that PkgServer.jl cannot be used for this?
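As a rough model of the caching and HEAD fan-out behaviour described above, consider the following Python sketch. It is not PkgServer.jl's actual code: the class names `Upstream` and `CachingFrontServer`, the resource keys, and the in-memory "servers" are all assumptions made for illustration, and no real network requests are issued.

```python
class Upstream:
    """In-memory stand-in for an upstream pkg server (no real network I/O)."""

    def __init__(self, name, resources):
        self.name = name
        self.resources = dict(resources)
        self.head_log = []  # every resource this upstream was asked about

    def head(self, key):
        self.head_log.append(key)
        return key in self.resources

    def get(self, key):
        return self.resources[key]


class CachingFrontServer:
    """Serves resources from a cache backed by several upstreams."""

    def __init__(self, upstreams):
        self.upstreams = upstreams
        self.cache = {}

    def fetch(self, key):
        if key in self.cache:
            return self.cache[key]  # cached: upstreams are not contacted again
        # First fetch: send HEAD to *all* upstreams so each one learns that
        # this resource is in use, then download from the first that has it.
        holders = [u for u in self.upstreams if u.head(key)]
        if not holders:
            raise KeyError(key)
        self.cache[key] = holders[0].get(key)
        return self.cache[key]


# Hypothetical resources split across two upstreams.
a = Upstream("a", {"X@1.0": b"x"})
b = Upstream("b", {"Y@1.0": b"y"})
front = CachingFrontServer([a, b])
for key in ["X@1.0", "Y@1.0", "X@1.0"]:
    front.fetch(key)
print(a.head_log, b.head_log)
```

In this model both upstreams hear about both resources exactly once, even though each holds only one of them, and the repeated fetch is answered from the cache.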

Muxing the pkg servers might be too risky for reproducibility. Here's an alternative that chooses the nearest pkg server at Julia startup: https://github.com/johnnychen94/PkgServerClient.jl
Basically, if you have a private pkg server set up for the General registry, you just end up there since it's the nearest one.

It probably won't fit your use case because it assumes that all pkg servers provide the "same" General registry.

> @fredrikekre, is there some reason that PkgServer.jl cannot be used for this?

No, you can definitely do that. The MuxServer is much more lightweight, though (no caching, no registry updating, etc.). I agree it might not be a good thing to configure this as a "serious package server", but it can still be useful for testing or local usage IMO.

Rolling back: the point of this, which I didn't explain well, is to use more than one pkg server, each hosting a unique registry.

This is not a request to connect to redundant package servers hosting the same content, and I 100% agree with @StefanKarpinski's views on that: it's a mess and should be prevented.

MuxPkgServer does most of what I'm looking for, but why do we need an external package for this? You can have multiple registries in the Julia world, why do they need to be co-located on the same pkg server?

I am currently supporting this use case by manually modifying the pkg server "registries" file, plus some filesystem foo to combine things. It's hackish, and I know I am not the only person looking to do this, which is exactly why packages like MuxPkgServer and JuliaOffline exist. Need more proof? Try a Google search for "Julia Offline".

I do suggest this be enforced with consistency checks, e.g. only one copy of a registry UUID is allowed across all specified pkg servers. In the event of a pkg UUID conflict, there should be an intelligent error message and a stop (maybe pkg add has to be extended to support / in this case, or throw an error and force the user to correct it manually?).
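A consistency check of the kind suggested above might look roughly like this. It is a Python sketch only, not part of Pkg.jl; the function name `check_unique_registries`, the server URLs, and the placeholder UUIDs are all hypothetical.

```python
def check_unique_registries(server_registries):
    """Raise ValueError if any registry UUID is offered by more than one server.

    `server_registries` maps server URL -> iterable of registry UUID strings.
    Returns a dict mapping registry UUID -> the single server that hosts it.
    """
    owner = {}
    for server, uuids in server_registries.items():
        for uuid in set(uuids):  # ignore repeats within one server's listing
            if uuid in owner:
                raise ValueError(
                    f"registry {uuid} is served by both {owner[uuid]} and "
                    f"{server}; remove it from one of them"
                )
            owner[uuid] = server
    return owner


# Placeholder UUIDs and URLs, for illustration only.
PUBLIC = "11111111-1111-1111-1111-111111111111"
PRIVATE = "22222222-2222-2222-2222-222222222222"

owners = check_unique_registries({
    "http://a.b.c": [PUBLIC],
    "http://e.f.g": [PRIVATE],
})
print(owners)
```

Failing early with a message that names both servers leaves the correction to the user, in line with the "error and stop" behaviour proposed above.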

If you are still vehemently opposed to this, I'll continue on my way of hacking at the filesystem to work around it and looking into MuxPkgServer more, but I would rather have a clean and reusable solution built directly into the Julia pkg system.

The issue is that even if there's zero overlap between what's served by different package servers (which we cannot assume and have to deal with in some sane way), a single manifest will still potentially have packages that it got from multiple different package servers, and then there's no entity that can reconstruct any manifests at all. Compare that with the current situation, where there must be some package server that can reconstruct all of a manifest.
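The property at stake can be phrased as a simple check: does at least one server know every entry of the manifest? The Python sketch below is an illustration under assumed data; the function name `servers_covering` and the package and server names are hypothetical.

```python
def servers_covering(manifest, servers):
    """Return the names of servers that can provide every manifest entry.

    `servers` maps server name -> collection of packages it can serve.
    """
    needed = set(manifest)
    return [name for name, known in servers.items() if needed <= set(known)]


manifest = ["PrivatePkg", "JSON"]

# Zero overlap between servers: nobody can reconstruct the manifest.
split = {"public": {"JSON"}, "private": {"PrivatePkg"}}
# A single server that serves everything the manifest needs.
single = {"combined": {"JSON", "PrivatePkg", "HTTP"}}

print(servers_covering(manifest, split))
print(servers_covering(manifest, single))
```

An empty result for the split configuration corresponds to the case where no entity can reconstruct the manifest.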
