It would be nice if the stdlibs started using the artifact system to declare what libraries they depend on and how to get them for the different platforms. That would make the stdlibs easier to move out from the julia repo and in cases where one doesn't want to bundle all stdlibs in a sysimage (e.g. in an "app") it would be clear what libraries can be excluded from bundling as well.
@staticfloat, you seem like the prime candidate for this 😁
So here's the thinking that Stefan and I have briefly discussed:
We should firm up some of the implicit laziness that the stdlibs have relied upon with respect to binary dependencies, and simultaneously use this as an opportunity to take a step towards decoupling stdlibs from the Julia build system both at build time and at run time.
MbedTLS.jl
needing to stay in strict lock-step with those shipped alongside Julia.
To represent stdlib binary dependencies through JLL packages and Artifacts, Stefan and I think the best way is to start shipping a read-only depot with Julia that gets added on to the default list of depots and contains all of our stdlibs, their JLL packages, and their Artifacts. This would clean out the _majority_ of the libraries from <prefix>/lib/julia, and would instead rely on some hoops we jump through to load them from <prefix>/share/julia/stdlib/vX.Y.Z/artifacts/. It will be a fun challenge to make this work for _everything_ including LLVM. Unsure if we can get there, but we'll give it a good shot. Once these stdlib packages are baked into the system image, we would have a list of things that the resolver shouldn't mess with, so that it doesn't accidentally install a new version of e.g. OpenBLAS_jll, which would just confuse everyone.
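A minimal sketch of how such a bundled, read-only depot could be located and added to the depot list. Both `bundled_stdlib_depot` and `with_bundled_depot` are hypothetical helpers, not existing Julia functions; the directory layout simply follows the <prefix>/share/julia/stdlib/vX.Y.Z path mentioned above, and a real implementation would probably live in Julia's startup code.

```julia
# Hypothetical helper: locate the bundled stdlib depot relative to the julia
# binary, following the <prefix>/share/julia/stdlib/vX.Y.Z layout.
function bundled_stdlib_depot(bindir::AbstractString = Sys.BINDIR,
                              version::VersionNumber = VERSION)
    vdir = "v$(version.major).$(version.minor).$(version.patch)"
    return normpath(joinpath(bindir, "..", "share", "julia", "stdlib", vdir))
end

# Hypothetical helper: append the bundled depot after the existing entries
# (depots are searched in order, so earlier entries are consulted first).
function with_bundled_depot(depots::Vector{String}, bundled::AbstractString)
    bundled in depots && return depots
    return vcat(depots, String(bundled))
end

depots = with_bundled_depot(copy(DEPOT_PATH), bundled_stdlib_depot())
```

The sketch works on a copy of `DEPOT_PATH` so that trying it out does not alter the running session.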
LinearAlgebra, HTTP, SuiteSparse pinned at these versions, etc...." and that list of packages gets downloaded, instantiated into a depot, and is what we read from in order to create our initial system image. We can of course generate the list of packages offline and just hardcode it to avoid the bootstrapping dependency. We're already doing this with Pkg; I intend to do it with _everything_.
DEPOT_PATH (even though the Julia code is embedded within the system image, things like artifacts will still need to be found). I see we already have a <prefix>/share/julia/stdlib/vX.Y directory; perhaps the only change here is to version it by the patch version as well and add artifacts to it.
JLL packages which will already be loaded (as they are part of the system image) and they'll bring their libraries along. For others, like loading LLVM, it would be great for libjulia to be able to find libLLVM from the artifacts directory, but we may continue to have some libraries that are more "special" than others.
Pkg.add time, we'll need a flexible way for Pkg to know what is baked into the system image and shouldn't be touched. Similarly to how it has a list of stdlibs right now, we'll just need to add the JLL packages and whatnot to them. This might come for free if I'm reading this code right. I am particularly interested in how this mechanism can be integrated with PackageCompilerX.
Overrides.toml that points to the <prefix>/lib/julia directory for all the artifacts that we care about. This is almost 100% doable right now, although it does require a few alterations (JLL package artifacts expect libraries to be available at <artifact_dir>/lib/libfoo.so and Overrides.toml only allows you to override artifact_dir. Additionally, Overrides.toml files only accept absolute paths, so we'd need to allow relative paths within them.) I'm a little bearish on this since I think it would be better to have a fully-flexible stdlib selection pipeline; this would significantly close the gap between what we do to make a Julia package distribution versus what we tell people to do in order to make a Julia "app", which I think is a good thing.
Thinking more about things like Julia needing to be able to find libLLVM at dynamic-link time, it will be sufficient on non-Windows platforms to bake in RPATHs to look in $ORIGIN/$(datarootdir_rel)/stdlib/vX.Y.Z/artifacts/<LLVM_jll tree hash>. The only snag here is Windows; we can bake in a call to AddDllDirectory() within init.c, but that's a little unsatisfactory. The reason why I'm thinking about this is that I'd like to make it as straightforward as possible for us to truly have Julia use JLL packages for stdlibs, such that eventual rebuilds of Julia system images with _newer_ versions of stdlibs can actually use their binaries in as natural a way as possible.
We also need a plan for dealing with from-source builds. Assuming that we still want from-source Julia builds to work, we're going to have to engage in some benign falsehoods; when we build libopenblas, we'll have to bundle it up as a "fake" OpenBLAS_jll product. It's not too hard: we basically just install openblas like we always do (we're already careful to keep the OpenBLAS recipe in Yggdrasil as close as possible to the from-source build in JuliaLang/julia) but then slap it into the share/julia/stdlib/vX.Y.Z/artifacts/<tree hash> directory, which we will know from parsing the OpenBLAS_jll/Artifacts.toml file.
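The lookup described above (parse Artifacts.toml, find the tree hash for the build platform, derive the install directory) could be sketched roughly as follows. This assumes the TOML standard library (Julia 1.6 and later); `artifact_install_dir` is a hypothetical helper, and the hashes in the sample file are placeholders rather than real OpenBLAS_jll tree hashes.

```julia
using TOML  # standard library in Julia 1.6 and later

# Illustrative Artifacts.toml contents; the hashes are placeholders.
artifacts_toml = """
[[OpenBLAS]]
arch = "x86_64"
git-tree-sha1 = "1111111111111111111111111111111111111111"
libc = "glibc"
os = "linux"

[[OpenBLAS]]
arch = "x86_64"
git-tree-sha1 = "2222222222222222222222222222222222222222"
os = "macos"
"""

# Hypothetical helper: pick the entry matching the build platform and return
# the directory a from-source build should install into.
function artifact_install_dir(toml::AbstractString, name::AbstractString,
                              stdlib_dir::AbstractString;
                              os::AbstractString, arch::AbstractString)
    entries = TOML.parse(toml)[name]
    idx = findfirst(e -> e["os"] == os && e["arch"] == arch, entries)
    idx === nothing && error("no $(name) artifact listed for $(os)/$(arch)")
    return joinpath(stdlib_dir, "artifacts", entries[idx]["git-tree-sha1"])
end

dir = artifact_install_dir(artifacts_toml, "OpenBLAS",
                           "share/julia/stdlib/vX.Y.Z";
                           os = "linux", arch = "x86_64")
```

Real Artifacts.toml files carry more platform keys than the two matched here, so a production version would need a fuller platform comparison.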
Some pros/cons of what I've considered so far:
TOML reader/writer in make/bash/python so as to avoid a Julia dependency.
TOML writer so that I can modify the Artifacts.toml files in e.g. OpenBLAS_jll. This is necessary so that when we compile a local OpenBLAS, the tree hash listed in the Artifacts.toml matches the tree hash of the files on-disk. This isn't that important right now as we don't check if the files in a folder actually match, but we might in the future, and we want to avoid the (rare but possible) case of someone explicitly trying to update their OpenBLAS, and files not getting installed because the Pkg system thinks they already exist.
load_stdlib() function) so attempts to e.g. add OpenBLAS_jll are going to immediately return, and loading OpenBLAS_jll will always return the stdlib version, rather than any version the user may have endeavored to install.
Digging into this over the past few days, I've come up with a few difficulties that may take some calm thinking to untangle properly:
First off, there's a philosophical decision to be made; do we want the actual binaries themselves to live in a $prefix/share/julia/artifacts/<tree hash>/lib/libfoo.so location, or do we want them to continue living in $prefix/lib/julia? For some binaries, it doesn't matter that much, but for others, it matters quite a bit.
Personally, I would like to push as much as possible for stdlibs and even the basic requirements for Julia (like LLVM, MPFR, GMP) to use artifacts. This is doable with enough scaffolding construction such that Julia can find things, but we need to answer if the necessary scaffolding is worthwhile:
libjulia to load at all, it's going to need an RPATH embedded within it to find libLLVM
libgit2 to find libssh2, we'll need to use the actual JLL package or something equivalent to build our own web of dynamic linker subversion. Otherwise, libgit2 sitting in one treehash directory won't find libssh2 sitting in some other treehash directory.
Pkg (or even other stdlibs, in the case of things like GMP and MPFR, since they're in Base). Either breaking their dependency on Pkg by splitting Pkg.Artifacts and Pkg.BinaryPlatforms out into something that can be loaded _very_ early on, or by generating special JLL packages that don't know anything about Artifacts and re-implementing themselves more-or-less from scratch. Personally, I think this is not that bad of an option, but it would be some extra work.
Let's remind ourselves as to why we're doing this; with this kind of a system, it makes system image building much more modular and easy to understand; the distance between binaries users install and the binaries that ship with Julia shrinks. The resolver can see that LLVM_jll already exists on the user's machine and is of a particular version; attempts to Pkg.add("OpenBLAS_jll") naturally succeed immediately, as it's a stdlib, and using it is blazingly fast, as we would expect.
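To illustrate the libgit2/libssh2 point above: when each library sits in its own treehash directory, one way to make them find each other is to load dependencies first, so the dynamic linker already has them mapped. The sketch below only computes such a load order. The dependency table is hypothetical, `load_order` is not an existing API, and no cycle detection is attempted.

```julia
# Hypothetical dependency table: each library maps to the libraries that must
# already be loaded before it can be opened from its own artifact directory.
const LIB_DEPS = Dict(
    "libgit2"    => ["libssh2"],
    "libssh2"    => ["libmbedtls"],
    "libmbedtls" => String[],
)

# Depth-first walk returning the libraries in an order where every dependency
# appears before the library that needs it.
function load_order(deps::Dict{String,Vector{String}}, root::String)
    order = String[]
    function visit(lib)
        lib in order && return nothing
        for dep in get(deps, lib, String[])
            visit(dep)
        end
        push!(order, lib)
        return nothing
    end
    visit(root)
    return order
end

order = load_order(LIB_DEPS, "libgit2")
```

A JLL-style loader would then call `Libdl.dlopen` on the full artifact path of each entry, in this order.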
I don't have a concrete solution in mind yet; this is the third time I've written out this comment because I keep on experimenting with different things and finding new problems. The good news is that I have artifact downloading implemented in Make/Python, and putting JLL packages/artifacts into the share/julia folder works; but these bootstrapping issues are thorny.
I would be fine with them living as artifacts. Making the system image more modular will allow building smaller system images for deployment, so that's the right direction, imo.
I've made great strides in this on my branch. I've converted everything that it makes sense to, excepting LLVM. LLVM is a special case that I will address after this. First, the changelog:
I've added JLL package downloading/artifact construction to the deps/ makefiles. JLL packages that are used verbatim get altered somewhat to eliminate any dependency on Pkg. This is easy since the only two things we use Pkg for (getting binaries and knowing which platform we're running on) are both known at build time. Note that this means the platform gets baked into the stdlib JLLs, which is fine, it doesn't tend to change.
If you choose to build something from source, a "fake" JLL package is generated with the same UUID, but it won't look in the artifacts directory for its binaries; it instead looks in Julia's bundled lib/julia directory. Note that @vchuravy raised good points in https://github.com/JuliaLang/Pkg.jl/issues/1704 that it would be nice if this could be done via Overrides.toml, but since it's not _quite_ flexible enough to do this across all platforms (if all platforms could create symlinks this would be easy) it's easier for us to just generate fake JLL packages. Perhaps in a future Pkg release we can work this out with a more flexible Overrides.toml syntax.
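As a rough illustration of the "fake" JLL idea, the module below resolves its library inside the bundled lib/julia directory instead of an artifact directory. The module name `FakeOpenBLAS_jll` and both functions are hypothetical and are not what the branch actually generates; the layout assumes a Unix-style install where private libraries live in <prefix>/lib/julia.

```julia
using Libdl

# Hypothetical stand-in for a generated "fake" JLL module.
module FakeOpenBLAS_jll

using Libdl

# Directory holding from-source libraries, relative to the julia executable.
private_libdir(bindir = Sys.BINDIR) =
    normpath(joinpath(bindir, "..", "lib", "julia"))

# Full path of the library this package provides.
libopenblas_path(bindir = Sys.BINDIR) =
    joinpath(private_libdir(bindir), "libopenblas." * Libdl.dlext)

end # module

path = FakeOpenBLAS_jll.libopenblas_path()
```

A real package would open this path from its `__init__` function so that dependents see the same handle as with a regular JLL.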
The Makefile system has been spruced up to have deps with dependencies be able to find their dependencies at compile-time, such that if you want to, for instance, build SuiteSparse from source but provide OpenBLAS from BB, the libraries get found properly.
Libdl has been moved into Base as Base.Libc.Libdl. This is necessary so that we can do dlopen() inside of Base.
It seems to me that we have an issue; we want to provide libLLVM alongside Julia in a JLL such that when users ask for a handle to libLLVM in a Pkg-informed way (e.g. through Pkg.add("LLVM_jll")) they are locked to the version that ships with Julia, and thereby get the same version that comes with their Julia version. However, LLVM_jll provides a lot more than what Julia itself ships with; it contains nice things like clang and opt and whatnot. I don't really think we should therefore start shipping clang with Julia, rather the opposite.
I think we should split LLVM_jll up into multiple packages; perhaps having a LibLLVM_jll and then have LLVM_jll depend on LibLLVM_jll, and only LibLLVM_jll is shipped with Julia. @maleadt and @vchuravy I am very interested in both of your thoughts on this.
> I think we should split LLVM_jll up into multiple packages; perhaps having a LibLLVM_jll and then have LLVM_jll depend on LibLLVM_jll, and only LibLLVM_jll is shipped with Julia.
Sounds good to me. The CUDA compiler really only needs libllvm, however, with the addition of some additional API calls from this source file. Maybe those should also be provided by the LibLLVM_jll?
For other LLVM-based WIP I also need the headers and binaries, but that's just to build a tool so would be fine to put in a LLVM_jll package that only gets installed as part of a build_tarballs.jl.
It's not entirely clear to me though how we would version this thing (e.g., with multiple builds of the aforementioned tool, one for each LLVM version, and I just want to install whichever one's compatible with the user provided LLVM while maintaining semver of the tool), but that's orthogonal to this refactor.
> The CUDA compiler really only needs libllvm, however, with the addition of some additional API calls from this source file. Maybe those should also be provided by the LibLLVM_jll?
Won't the symbols in the file you linked be a part of libjulia? Those symbols will then always be available, right?
> Won't the symbols in the file you linked be a part of libjulia? Those symbols will then always be available, right?
Sure, but since they are essentially an extension of libllvm's C API it might make sense to put them there?
Ah, I see what you mean; these aren't used by the rest of Julia, they're only for the benefit of LLVM.jl.
Since we need to still support users building LLVM from source, I think we should probably keep it as a part of Julia's source.
Regarding LLVM_jll, the right approach is probably to follow what Linux distros have been doing and break it up into LLVM_jll (with opt/llc/llvm-*) and Clang_jll for Cxx.jl and CxxWrap.jl.
My branch now works on Linux, MacOS support is pending a new OpenBLAS JLL (as MacOS is more sensitive to things like dylib IDs than Linux is), and then finally Windows. The great triumph is that a default build (e.g. with nothing setting any USE_BINARYBUILDER_XYZ=0 settings) has only the following libraries outside of the main package depot's artifacts directory, with the vast majority being served from artifacts:
julia> using Libdl; filter(l -> !occursin("artifacts", l), Libdl.dllist())
9-element Array{String,1}:
 "linux-vdso.so.1"
 "/home/sabae/src/julia-jllstdlibs/usr/bin/../lib/libjulia.so.1"
 "/lib/x86_64-linux-gnu/libdl.so.2"
 "/lib/x86_64-linux-gnu/librt.so.1"
 "/lib/x86_64-linux-gnu/libpthread.so.0"
 "/lib/x86_64-linux-gnu/libc.so.6"
 "/lib/x86_64-linux-gnu/libm.so.6"
 "/lib64/ld-linux-x86-64.so.2"
 "/home/sabae/src/julia-jllstdlibs/usr/lib/julia/sys.so"

julia> length(filter(l -> occursin("artifacts", l), Libdl.dllist()))
32
Can the system image eventually be served as an artifact - so that I can then have many different system images for different projects?
I think the piece that needs to be solved is getting Julia to load a project-specific sysimage. Right now you need to pass -J which isn't very user-friendly. It would be nice to have something similar to --project and JULIA_PROJECT. I'm thinking something like --project=<path> could imply -J<path>/.sysimages/sys.$(triplet).$(dlext) or something. It's a little tricky because we would need to do all of this without running Julia code. It would be a lot easier for this to happen in the context of an editor rather than the REPL, as an editor can automatically pass options and whatnot easily.
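The editor-side variant of this idea could look something like the sketch below, where a launcher computes the proposed sysimage path and adds -J only if the file exists. `project_sysimage` and `launcher_flags` are hypothetical helpers, and the triplet is passed in explicitly (for example "x86_64-linux-gnu") rather than detected.

```julia
using Libdl

# Hypothetical helper: map a project directory to the proposed sysimage path,
# <path>/.sysimages/sys.<triplet>.<dlext>.
function project_sysimage(project::AbstractString, triplet::AbstractString)
    return joinpath(project, ".sysimages", "sys.$(triplet).$(Libdl.dlext)")
end

# Hypothetical helper: build the flags a launcher (such as an editor) could
# pass to julia, omitting -J when no project-specific sysimage exists.
function launcher_flags(project::AbstractString, triplet::AbstractString)
    flags = ["--project=$(project)"]
    sysimage = project_sysimage(project, triplet)
    isfile(sysimage) && push!(flags, "-J$(sysimage)")
    return flags
end
```

Since this runs in the launcher rather than in the Julia process being started, it sidesteps the "without running Julia code" constraint, but it does not help a plain REPL invocation.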
Well then we could even do optimized system images by architecture!
We already do that; we have images by architecture (e.g. x86_64, i686, etc...) and then within an image, we compile functions multiple times such that newer processors have versions of functions with expanded instruction sets.
> I'm thinking something like --project=<path> could imply -J<path>/.sysimages/sys.$(triplet).$(dlext) or something. It's a little tricky because we would need to do all of this without running Julia code. It would be a lot easier for this to happen in the context of an editor rather than the REPL, as an editor can automatically pass options and whatnot easily.
Wouldn't it require various tools to agree on where to look at the system image? For example, you may want to use the same sysimage in your editor and in stand-alone scripts.
Maybe the UI/API in Pkg.jl or PackageCompiler.jl can include something that creates a simple text file, say <path>/.sysimages/sys.$(triplet).link, containing the path to the actual sys.$(triplet).$(dlext) file? I guess it is then easy enough to handle within libjulia? Also I guess you can use a sysimage downloaded in ~/.julia/artifacts this way. It'd be nice if relocatable sysimages with non-stdlib packages could be distributed and used in different projects.
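A small sketch of how such a link file might be resolved. `resolve_sysimage_link` is a hypothetical helper; treating relative targets as relative to the link file's directory is an assumption of this sketch, not something proposed above.

```julia
# Hypothetical helper: read <path>/.sysimages/sys.<triplet>.link and return
# the sysimage path stored in it, or `nothing` if the link file is missing or
# points at a file that does not exist.
function resolve_sysimage_link(project::AbstractString, triplet::AbstractString)
    link = joinpath(project, ".sysimages", "sys.$(triplet).link")
    isfile(link) || return nothing
    target = String(strip(read(link, String)))
    # Assumption: relative targets are resolved against the link's directory.
    isabspath(target) || (target = joinpath(dirname(link), target))
    return isfile(target) ? target : nothing
end
```

Because the file format is a single path, the same lookup would be simple to reimplement in C inside libjulia, or in any editor plugin that needs to agree on the location.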
> I'm thinking something like --project=<path> could imply -J<path>/.sysimages/sys.$(triplet).$(dlext) or something. It's a little tricky because we would need to do all of this without running Julia code. It would be a lot easier for this to happen in the context of an editor rather than the REPL, as an editor can automatically pass options and whatnot easily.
The VS Code Julia extension has been shipping with exactly something like that for more than a year: https://www.julia-vscode.org/docs/dev/userguide/compilesysimage/.
@davidanthoff See #35794 that adds it to Julia.
I don't think this is a release blocker for 1.6 so removing milestone. @staticfloat please put it back if you see fit.
It isn't going to make it for 1.6 but will be in 1.7.