Pkg.jl: client-side pkg server selection

Created on 22 Oct 2020 · 11 Comments · Source: JuliaLang/Pkg.jl

There is already a /meta/siblings endpoint on package servers:

$ curl -L https://pkg.julialang.org/meta/siblings
["https://us-west.pkg.julialang.org","https://us-east.pkg.julialang.org","https://us-east2.pkg.julialang.org","https://us-east-ci.pkg.julialang.org","https://eu-central.pkg.julialang.org","https://in.pkg.julialang.org","https://kr.pkg.julialang.org","https://sg.pkg.julialang.org","https://cn-southeast.pkg.julialang.org","https://cn-east.pkg.julialang.org","https://cn-northeast.pkg.julialang.org","https://au.pkg.julialang.org"]

The plan, which @staticfloat and I have discussed, has been to allow the client to probe the available package server instances and pick whichever one gets back to it fastest. This will do geographical load balancing more accurately than our current server-side redirection and will also take CPU load into account: if a server is really busy, it will be slower to respond, which will send more clients to other servers, thereby reducing the load on busier servers.

The flow that was envisioned was this:

  1. The client requests https://pkg.julialang.org/meta/siblings and gets redirected to https://us-east.pkg.julialang.org/meta/siblings, for example.

  2. The us-east server replies with the list of sibling servers.

  3. The client requests /status or something from each server in the list and chooses the server that replies the fastest and/or reports the lowest workload.

There is, however, a problem with the current plan: we would want this to work even if the user's primary pkg server is not responding at all. If us-east.pkg.julialang.org is completely broken, then anyone who is mapped there will not be able to get a list of siblings at all, so they're just stuck. This design is as flaky as the flakiest server.

Instead, I think we need to arrange for https://pkg.julialang.org/servers to reply with a list of servers as static data, not forwarded to any specific package server (like us-east.pkg.julialang.org). Then the client can use that list to proceed to step 3. Any server that doesn't reply at all will be ignored, and the client will pick the server that replies first or reports low workload/good status. This design means that any one pkg server being down will not affect clients, except that they might need to go a little further to get packages.
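
A minimal sketch of the "ignore servers that don't reply, take the first one that does" part of step 3, written in Python purely for illustration (the real client is Julia). The function name `pick_fastest`, the injected `probe` callable, and the 2-second timeout are all assumptions, not anything that exists in Pkg.jl.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout


def pick_fastest(servers, probe, timeout=2.0):
    """Return the first server whose probe succeeds, or None if none do.

    `probe(server)` is a hypothetical callable that raises on failure,
    for example a function issuing an HTTP request to a status endpoint.
    """
    if not servers:
        return None
    # One worker per server so that all probes start at the same time.
    pool = ThreadPoolExecutor(max_workers=len(servers))
    futures = {pool.submit(probe, s): s for s in servers}
    try:
        # Futures are yielded in completion order, so the first one that
        # finished without an exception belongs to the fastest server.
        for fut in as_completed(futures, timeout=timeout):
            if fut.exception() is None:
                return futures[fut]
    except FuturesTimeout:
        pass  # nobody answered in time
    finally:
        pool.shutdown(wait=False)  # do not block on slow or hung probes
    return None
```

A server that is down simply raises inside its probe and is skipped, so one broken server does not prevent the others from being chosen.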

All 11 comments

Cache the list of siblings locally?

So as long as the user is able to connect to their primary Pkg server one time, they can pull the list of siblings, and then we keep (and update) that list locally.

We could also hardcode a (not necessarily proper) subset of the siblings in the Pkg client code. And then if the sibling-list cache doesn't exist, we prepopulate it with the hardcoded list.

All possible options, but it seems more reliable to serve the list of servers as a static page. It's a bad first experience if the server happens to be down and you can't connect. Hardcoding a server list is possible, but feels kind of icky; you really want to hardcode as little as possible, and if you change from pkg.julialang.org to pkg.company.com then what?

Hmmm. What happens when https://pkg.julialang.org/servers goes down?

Even if we go with https://pkg.julialang.org/servers and we don't hardcode anything, I think that at least caching the results of https://pkg.julialang.org/servers locally would be good. At the very least, we should cache those results for the remainder of the same Julia session - no need to hit https://pkg.julialang.org/servers multiple times during a single Julia session.

And if we cache the results of https://pkg.julialang.org/servers on disk locally, then maybe we only need to hit https://pkg.julialang.org/servers one time per week to update the cache.

That would reduce strain on whichever machine serves https://pkg.julialang.org/servers.

We'd tie this cache to the value of the JULIA_PKG_SERVER environment variable. If someone changes the value of JULIA_PKG_SERVER, we'd invalidate the cache.
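
A rough sketch of such an on-disk cache, in Python for illustration only. The JSON layout, the file location, and the function names are assumptions; the one-week refresh interval and the invalidation on a changed JULIA_PKG_SERVER value come from the comments above.

```python
import json
import time
from pathlib import Path

ONE_WEEK = 7 * 24 * 60 * 60  # refresh interval in seconds


def load_cached_servers(cache_file, pkg_server, now=None, max_age=ONE_WEEK):
    """Return the cached list, or None if it is missing, stale, or invalid."""
    now = time.time() if now is None else now
    try:
        entry = json.loads(Path(cache_file).read_text())
    except (OSError, ValueError):
        return None  # no cache yet, or unreadable contents
    if not isinstance(entry, dict):
        return None
    if entry.get("pkg_server") != pkg_server:
        return None  # JULIA_PKG_SERVER changed, so the cache is invalid
    if now - entry.get("fetched_at", 0) > max_age:
        return None  # older than a week: fetch the list again
    return entry.get("servers")


def store_servers(cache_file, pkg_server, servers, now=None):
    """Write the server list together with the server value it belongs to."""
    now = time.time() if now is None else now
    entry = {"pkg_server": pkg_server, "fetched_at": now, "servers": servers}
    Path(cache_file).write_text(json.dumps(entry))
```

A caller would only hit the network when `load_cached_servers` returns None, then call `store_servers` with the fresh list.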

There are other special cases that might be nice to handle.

For example, if get(ENV, "CI", "") is "true", I would not bother fetching https://pkg.julialang.org/servers - I would just connect to the geo-selected Pkg server, which if we have done our job should be us-east-ci.
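
The CI check could look something like this, shown in Python for illustration; it mirrors the `get(ENV, "CI", "")` test above. The function name `should_probe_siblings` is made up for this sketch.

```python
import os


def should_probe_siblings(env=None):
    """Return False under CI, where the geo-selected server is used as-is."""
    env = os.environ if env is None else env
    # Same test as get(ENV, "CI", "") == "true" in Julia.
    return env.get("CI", "") != "true"
```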

Just trying to think of ways to reduce traffic to https://pkg.julialang.org/servers. If every Julia user in the world is hitting https://pkg.julialang.org/servers one time per Julia session, that's a good deal of traffic, even if it is statically serving a file.

Synthesizing all of this, it seems like this may be a viable strategy:

  • cache a list of alternatives for each pkg server value
  • have a hard-coded pre-populated list for pkg.julialang.org
  • if we have a list for a server, try all of them and pick the best one
  • if we don't have a list for a server yet, request $server/meta/siblings in order to get such a list
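
The lookup part of that strategy, sketched in Python for illustration. `KNOWN_SERVERS`, `alternatives_for`, and the injected `fetch_siblings` callable are hypothetical names; the pre-populated entry is just a subset of the siblings listed in the issue, not a proposal for the real list.

```python
# Hypothetical pre-populated entry; a real client would ship a fuller list.
KNOWN_SERVERS = {
    "https://pkg.julialang.org": [
        "https://us-east.pkg.julialang.org",
        "https://eu-central.pkg.julialang.org",
        "https://au.pkg.julialang.org",
    ],
}


def alternatives_for(server, cache, fetch_siblings):
    """Return the list of alternatives for `server`, fetching it at most once.

    `cache` maps a pkg server value to its list of alternatives.
    `fetch_siblings(server)` is a hypothetical callable that requests
    $server/meta/siblings and returns the decoded list.
    """
    if server not in cache:
        cache[server] = fetch_siblings(server)
    return cache[server]
```

Starting the cache from a copy of `KNOWN_SERVERS` means pkg.julialang.org never needs a network request to get its list, while any other server value triggers exactly one request.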

One thing that's a little bit weird about this is that if you set your package server to pkg.julialang.org then you will be redirected, so you should probably do this dance. On the other hand, if you've explicitly set your package server to au.pkg.julialang.org, should we be second-guessing that or just using that server? Perhaps the logic here should be:

  • if your pkg_server() value is in the list, then use it; if it cannot be reached, what then?
  • if your pkg_server() value is not in the list, always pick the best server from the list

That way people can set a specific sibling to use and that will be honored.
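
In code, that rule might look roughly like the following (Python, for illustration only). `choose_server` and the injected `pick_best` callable are hypothetical, and the open question of what to do when an explicitly chosen sibling cannot be reached is deliberately left unhandled.

```python
def choose_server(configured, siblings, pick_best):
    """Honor an explicitly configured sibling; otherwise pick the best one.

    `pick_best(siblings)` is a hypothetical callable that returns the best
    server from the list, for example the one that replies fastest.
    """
    if configured in siblings:
        # The user named a specific sibling, so do not second-guess it.
        # An unreachable sibling is an unresolved case and not handled here.
        return configured
    return pick_best(siblings)
```

With this rule, a value of https://pkg.julialang.org (which is not itself in the sibling list) goes through selection, while https://au.pkg.julialang.org is used directly.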

That sounds like a good plan to me!

I agree that the "you've explicitly set your package server to au.pkg.julialang.org" case is a weird case. I think the logic you outlined there makes sense.

Could this functionality be implemented as a non-standard package? That way 3rd-party pkg servers can be added to the pre-populated list in a more flexible way.

(Although I'm thinking of those storage mirrors in China, e.g., BFSU, as 3rd-party pkg servers.)

It seems like a static file on the server would work in that case, no?
