Pkg.jl: Use libCURL instead of platform engines

Created on 20 May 2020 · 13 Comments · Source: JuliaLang/Pkg.jl

_This is very speculative, so mainly for brainstorming_

Currently, HTTP downloads within Pkg are done using the PlatformEngines subsystem, which shells out to locally installed command-line tools to perform the download. This generally means the curl/wget/fetch executables on Unix and PowerShell on Windows. This code is reasonably complex and difficult to maintain (I think).

To streamline this, one thought is to replace it with HTTP.jl. However, that is a large library, and splitting out the parts Pkg needs would take some effort.

Another idea is to use libCurl.

  • libCURL is widely used and very stable. Every car, phone and TV has multiple instances.
  • It's very efficient, in CPU and memory -- it can do multiple downloads simultaneously in async mode.
  • There is now a well-maintained LibCURL_jll, and the recent JLL work in base Julia makes it easy to pull in. One important feature here is that the JLL correctly builds against the MbedTLS JLL.
  • libCURL has http2 support (and ftp support and a dozen other protocols ...)

A few concerns, not insurmountable:

  • libCURL has a low-level C API. A higher-level Julia API needs to be written, ideally using the multi API. There are Julia examples using the easy API.
  • Proxies (potentially authenticated) need to be handled.
  • Root certificates need to be handled.

cc: @StefanKarpinski @staticfloat

Labels: Discussion · help wanted · speculative

All 13 comments

Yeah, I have thought the exact same thing. It is also in the spirit of the artifact system in that it doesn't rely (too much) on the state of the user's system. I think this would be a good idea.

Tangentially, I think Cargo also uses libcurl.

I spent a while yesterday looking at the libcurl multi API and it is... complicated. We can probably get this to work but figuring out how to make it work nicely with our event loop feels non-trivial.

I don't see why "our event loop" is relevant. Wouldn't we just call the multi API from a single-threaded synchronous Julia loop (as in https://ec.haxx.se/libcurl/libcurl-drive/libcurl-drive-multi)?

int transfers_running;
do {
    curl_multi_wait(multi_handle, NULL, 0, 1000, NULL);
    curl_multi_perform(multi_handle, &transfers_running);
} while (transfers_running);

That wait call is blocking and we don't want to block all tasks.

Alright, so it is the socket system (https://ec.haxx.se/libcurl/libcurl-drive/libcurl-drive-multi-socket) that we need to use?

Yes, the multi_socket API seems like it might be the right one. I'm still processing the docs on that one but it seems like the one we'd want to use (no blocking, maximum scalability).

There is an example of using libuv with the multi api, which might be instructive? https://curl.haxx.se/libcurl/c/multi-uv.html

That example uses uv_run, which I'm hoping can be offloaded into a Task in Julia?

Nice! Yes, I don't see any obviously blocking operations in that so it may well work.

A very old version of code using curl_multi_perform already exists, with polling using sleep/yield:

https://github.com/JuliaWeb/HTTPClient.jl/blob/master/src/HTTPC.jl#L623

One of the issues with PlatformEngines is that you get different types of outputs on different platforms. Here's one from Alpine. I originally thought it was Pkg printing it. When multiple packages are being installed, it floods your screen with a lot of messages.

[Screenshot, 2020-06-27: download output on Alpine]

Yeah, it's a mess. I'm working on using libcurl instead.

Glad to see this issue tackled and I see there's already good progress being made by Stefan using LibCurl!

Should this functionality go in Base, so we can get rid of the shelling out that the download function does? It suffers from the same problem. There's an open issue over at the Julia repo discussing the future of download.
