Julia: mechanism for default packages

Created on 5 Oct 2016 · 34 comments · Source: JuliaLang/julia

Discussion related to https://github.com/JuliaLang/julia/issues/5155. There are a few steps needed to break up Base:

  1. Move functionality into modules; some of this work is done already.
  2. Move modules out of Base, into a location where they can be loaded via LOAD_PATH.
  3. Have a mechanism for installing, finding, and updating these "pseudo-packages".

It is this last point, the pseudo-package mechanism, which, I suspect, blocks the whole process. Considerations:

  • Base should be independent of the pseudo-packages: it should be possible to build, load, run and test all of base Julia without any pseudo-packages loaded or even installed.
  • Pseudo-packages should be largely independent of each other: it should be possible to load and test each without all the others. Should we allow any amount of interdependency between pseudo-packages?
  • It should be possible to update pseudo-packages independently of Julia, but it is acceptable for pseudo-packages to be considerably more coupled with Julia than normal packages.
  • There needs to be a mechanism for installing and updating pseudo-packages that is not normal package installation, because we want the package manager itself to be a pseudo-package.

Labels: design, excision, modules, packages, stdlib

All 34 comments

It seems clear to me that these packages should be as much like normal packages as possible. Here's a strawman proposal:

  • Add a /stdlib directory to the julia repo, with a Makefile in it.
  • Do make -C stdlib after building julia.
  • That makefile first manually clones the package manager repo, then uses the package manager to get the rest of the default packages.
  • Part I'm not clear on: either the package manager handles multiple directories, or we forget the stdlib directory and put all these packages in the normal place.
  • make install puts the default packages in the site-wide package dir.
  • Another tricky bit: are they compiled into the system image? If not, and you put usings in your juliarc, it will adversely affect startup time. If so, we won't be able to use Pkg to get them, since we won't yet have a sysimg to use to run Pkg. Or we could first build a small sysimg then a larger one, but that will take longer of course.

One of my biggest concerns with breaking up Base is testing and implementing changes to the Base language. Ideally, I would want:

  • CI testing of Julia PRs to see whether they break the "default" modules.
  • A way to submit coordinated PRs that patch both Julia and the default modules.

The former seems straightforward, but the latter seems hard within the context of Github. Or am I missing some Github feature that would allow this?

Arguably, anything that typically needs to be patched along with Julia doesn't benefit much from being a separate package, and its code could remain in this repo (maybe still moved to /stdlib though).

It would be interesting to sort the files in Base according to how many commits have touched them. The files touched by the fewest commits would be the best candidates to split.
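A rough way to do that ranking, sketched in Python for illustration. It assumes the history was exported with `git log --format=commit:%H --name-only -- base/` (one `commit:` marker line per commit, then the files that commit touched) and simply counts how many commits mention each path. The helper name `rank_by_commit_count` is hypothetical.

```python
from collections import Counter

def rank_by_commit_count(log_text):
    """Count how many commits touched each file, fewest first.

    log_text is assumed to be the output of
        git log --format=commit:%H --name-only -- base/
    where each commit contributes one "commit:<hash>" line followed by
    the paths it touched (git inserts a blank line in between).
    """
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line or line.startswith("commit:"):
            continue  # skip separators and commit markers
        counts[line] += 1
    # Fewest commits first: these are the best candidates to split out.
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))

# Made-up log excerpt, for illustration only.
sample = """commit:a1

base/linalg.jl
base/fft.jl
commit:b2

base/linalg.jl
commit:c3

base/linalg.jl
base/quadgk.jl
"""
print(rank_by_commit_count(sample))
# [('base/fft.jl', 1), ('base/quadgk.jl', 1), ('base/linalg.jl', 3)]
```

Commit counts are only a proxy (renames and mechanical sweeps inflate them), but they give a quick first cut.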

I still think there should be a way to run CI on the default packages. Maybe optionally, e.g. via @julialib runtests() or whatever, analogous to @nanosoldier.

Idea: precompile .juliarc.jl the same way we precompile modules? Can we do that?

Of course, we can check whether the normal package precompile mechanism is sufficient to get good startup time. We might also end up with REPL startup being slower, but ./julia script.jl being faster, since it won't load the REPL or other modules that script.jl doesn't need.

One option that wouldn't require adding too many new features to Pkg would be to make Pkg.init prepopulate the package directory by copying. The copies we distribute under site (which doesn't have to be system-wide) in the default LOAD_PATH wouldn't be managed by Pkg, but when Pkg.init is called with a prepopulated set, it would manage them from then on.
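A minimal sketch of that copy-then-manage step, in Python for illustration. It assumes the distribution ships one directory per package under a bundled `site` directory; `prepopulate` is a hypothetical helper, not an existing Pkg function. Packages already present in the user's directory are skipped, so the bundled copies never overwrite ones the user has since updated.

```python
import os
import shutil
import tempfile

def prepopulate(bundled_dir, user_pkg_dir):
    """Copy each bundled package into the user's package dir if it is missing.

    Hypothetical helper: packages already in user_pkg_dir are left alone.
    Returns the names of the packages that were copied.
    """
    os.makedirs(user_pkg_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(bundled_dir)):
        src = os.path.join(bundled_dir, name)
        dst = os.path.join(user_pkg_dir, name)
        if os.path.isdir(src) and not os.path.exists(dst):
            shutil.copytree(src, dst)
            copied.append(name)
    return copied

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        site = os.path.join(root, "site")   # shipped with the distribution
        user = os.path.join(root, "user")   # managed by Pkg afterwards
        for pkg in ("FFTW", "QuadGK"):
            os.makedirs(os.path.join(site, pkg, "src"))
        os.makedirs(os.path.join(user, "QuadGK"))  # user already has this one
        print(prepopulate(site, user))  # ['FFTW']
        print(prepopulate(site, user))  # [] (nothing left to copy)
```

Because the step is idempotent, it could run on every init without clobbering anything Pkg has since updated.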

I'm also not sure we really want these to be as much like normal packages as possible. It could be a good idea, but I'm not entirely convinced. As you ask, does it live in the same place? Should we do version resolution in the same way? This seems like a case where monolithic updating of all standard packages together seems like it may be preferable – otherwise you end up in a situation where each version of each standard package needs to work with not only a range of versions of Julia itself, but also with a range of versions of the other standard packages.

To get this done ASAP, it might be easiest/fastest to first split logical groups of files in Base into separate packages while leaving them in this repo, i.e. src/stdlib/ with things like LibM, LinAlg, etc., and then have a separate src/runtime for the runtime. Get a feel for how things work on master for a while. Later we can think about a more sophisticated, modular, flexible approach. (Tests should also be moved into the corresponding packages in stdlib, and similarly for the runtime.)

Yes, I think that's a good idea, but we'd still need to decide how loading works, and how to install the stdlib, and whether the stdlib modules can be precompiled. If they aren't precompiled the delays could be quite frustrating.

I think we pin the versions of packages that are included in the stdlib, and update the pinned versions frequently with a sufficient amount of testing. Same as any C dependency, we don't pull from master, and it's good to get as deterministic a build as possible so 0.6.0 built when it comes out behaves the same as 0.6.0 downloaded and built from source months later. If Pkg.init uses the bundled versions as a starting point, it can be allowed to update the copies.
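To illustrate the pinning idea, here is a small Python sketch. It assumes a hypothetical per-release pin table (package name → exact version) and plain `major.minor.patch` version strings; the package names and version numbers are made up. A fresh build reproduces exactly the pinned versions, and a later update through Pkg shows up as `"updated"` rather than as a mismatch.

```python
# Hypothetical pin table shipped with one Julia release: package -> exact version.
# Names and versions are illustrative only.
PINNED = {"FFTW": "0.1.2", "QuadGK": "0.1.0"}

def parse_version(s):
    """Turn "major.minor.patch" into a tuple of ints for comparison."""
    return tuple(int(part) for part in s.split("."))

def check_pins(pinned, installed):
    """Classify each bundled package relative to the version pinned for the release."""
    report = {}
    for name, pin in pinned.items():
        have = installed.get(name)
        if have is None:
            report[name] = "missing"
        elif parse_version(have) == parse_version(pin):
            report[name] = "pinned"          # exactly what the release shipped
        elif parse_version(have) > parse_version(pin):
            report[name] = "updated"         # updated after install
        else:
            report[name] = "older than pin"  # should not happen for a clean build
    return report

print(check_pins(PINNED, {"FFTW": "0.1.2", "QuadGK": "0.2.0"}))
# {'FFTW': 'pinned', 'QuadGK': 'updated'}
```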

I feel there should not be any difference between "pseudo" packages and "real" packages, for two reasons:

  1. Packages moved out of Base should be able to be updated outside of Base's release schedule. Doing a Pkg.update() should bring me the latest bug fixes for (e.g.) FFTW, and FFTW development should be decoupled from Base's cadence. Otherwise, what is the point of moving these things out of Base? And implementing a separate update mechanism for "pseudo" packages would add unnecessary complexity.
  2. Part of the logic for having a stdlib (#5155) is not only to remove things from Base, but to add things to the distribution. So it's not just about removing FFT from Base; it's about adding (e.g.) GLM to the standard library. And doing that should not change the cadence of GLM's development.

Ok, I'm convinced. These should be normal packages :)

We do need to account for the possibility that the Julia distribution gets installed into a read-only location, in which case default packages wouldn't be updateable in-place.

How about this: There is a user and a system package dir, where the former is searched first when a package is to be loaded. The packages included in the Julia distribution go into the system package dir (potentially read-only for the user), updates go into the user package dir (writable for the user). Options to the Pkg commands could make them write to the system package dir instead so that an administrator can update/add/remove packages for all users on the system.
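The lookup order proposed here can be sketched in a few lines of Python. The directory layout and the helper names `find_package` and `install_target` are hypothetical; the point is only that reads search the user dir before the (possibly read-only) system dir, while writes default to the user dir unless an administrator explicitly targets the system dir.

```python
import os

def find_package(name, search_path):
    """Return the first directory in search_path that contains package `name`.

    search_path is ordered: user package dir first, then the system dir
    that ships with the Julia distribution.
    """
    for root in search_path:
        candidate = os.path.join(root, name)
        if os.path.isdir(candidate):
            return candidate
    return None

def install_target(user_dir, system_dir, system_wide=False):
    """Choose where an add/update writes: the user dir unless an admin asks otherwise."""
    if system_wide:
        if not os.access(system_dir, os.W_OK):
            raise PermissionError(f"{system_dir} is not writable")
        return system_dir
    return user_dir
```

With FFTW updated in the user dir and QuadGK only in the shipped system dir, `find_package("FFTW", [user, system])` returns the user copy while `find_package("QuadGK", [user, system])` falls back to the system copy. Deleting the user dir leaves the originally shipped versions available.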

That already works with appropriate setting of LOAD_PATH, just anything outside of Pkg.dir doesn't currently participate in anything Pkg does. We don't want to require root to install a Julia-with-packages distribution, so I don't think system wide should be the default. Bundled packages should be specific to the Julia version they come bundled with, but with options to populate the user (or system wide if you have root) package directory from them.

Sorry, by "system package dir" I didn't mean a necessarily system-wide installation location. For a per-user installation of Julia, it would just be somewhere in the user's Julia install path. Still, the benefit would be that a user could delete their .julia and still have the default packages (at the originally shipped versions) available.

For 0.6 we should just install standard packages in the LOAD_PATH site directory, which we're currently not using but look in by default. The first step here is to move some of the ready-for-packaging modules in Base into the site directory. What's necessary for that and what modules?

There are multiple pieces here - the build infrastructure for putting specific versions of packages in place, and the restructuring of a few modules. Restructuring modules is rearranging files and imports if they aren't a module yet, then filtering the git history and putting them into a separate package repo.

Series of steps for any module to go through:

  1. Group code into a module within Base (and adjust imports anywhere it's used)
  2. Move the module to not be inside the Base namespace, a separate toplevel module
  3. Rearrange the source code and build system within this repo so the install mechanism is separate
  4. Move the code to a separate repo so it's versioned as a real package

Most of the listed modules can go through any subset of these steps, and each one is (mostly) a prerequisite for the next. We don't have the mechanism for step 3 yet, but once it's there, 2->3 and 3->4 would individually be smaller steps than going straight from 2->4.

A few observations:

  • packages defined under stdlib should be loaded and their exports brought into scope automatically by default. julia stdlib=no will not load any of the stdlib packages, i.e., default is julia stdlib=yes
  • I am OK with the other approach too. i.e. default of julia stdlib=no. Would want a simple way to get everything and the kitchen sink without having to define a bunch of using statements for every small program I write, especially during initial development/prototyping/exploration. The final version can load only specific modules when the code is ready for deployment. So, during development I can always start with julia stdlib=yes
  • We need to have a means of building base julia and a specific set of packages (initially with stdlib but later also including non-stdlib ones and their dependencies) into a single precompiled binary. The current model of loading code from node 1 does not scale to large numbers of workers. Folks deploying on clusters could then deploy the bundled image either on all the nodes or on a shared filesystem.

https://github.com/JuliaLang/julia/pull/18928 reminds me that quadgk would be a good first case here - small, self-contained, no ccalls, not used by anything else in base. Step 1's already even done for it.

Is this still slated for v0.6? Seemed like the push to finish it died down.

I don't see how it can make it into 0.6, because we don't have step 3.

If we move a lot of key numerical functionality out of Base without building additional infrastructure first, I'm concerned that it will have a serious impact on testing changes to the core language. It will be much harder to assess the impact on important packages, and much harder to detect performance regressions, if the numerical packages aren't updated in sync. e.g. a lot of the BaseBenchmarks functionality won't work if a PR breaks something like LinAlg and there is no way to update LinAlg in the same PR because it has been moved to a separate package.

I agree with @amitmurthy that we also need to improve the infrastructure for building "batteries included" system images.

What's wrong with the userimg.jl approach?

userimg.jl is fine as far as it goes, but it requires too much user intervention. It needs to be coupled to our distribution mechanism so that julialang.org can post "batteries included" downloads. Moreover, there is the question of how updates are handled.

Oh, that's fine; I thought there were concerns in the mechanics of how userimg.jl works that didn't allow for making a "batteries included" distro.

Is that more of a distribution issue then, an issue with sticking the right GUI on the install to make things easier, rather than requiring a new infrastructure? (Or a GUI for changing the system image?)

This made me think of a proposal which I opened up here to make it easier.

https://discourse.julialang.org/t/improved-installations-from-executables/1001/1

It is distribution (and Pkg/updating) infrastructure, but that's still new infrastructure.

It appears to me that there are two separate issues here. The first, which is not contentious, is that we need new infrastructure, which is already coming along with Pkg3/Bindeps2.

The second issue is what the list of default packages should be, whether they should continue to be in the JuliaLang distribution, and whether those packages are made available in the namespace by default. I don't think the second question is easy to settle on the 1.0 release schedule. The Pkg3 manifest file makes it easy to recreate a Julia environment, and seems like an important part of the way forward.

I am in favour of closing this issue in favour of the existing Pkg3/Bindeps2 issues, which will be addressed in 1.0, and of addressing the larger question of which packages should be default after 1.0.

@ViralBShah, there is also the issue of startup time if you want to load a bunch of external packages by default. Since that is a performance/packaging issue it doesn't need to be settled for 1.0, but it should be tracked somewhere.

We need this one way or another, so whether we close this issue or not doesn't change the actual work that needs to be done. I have some thoughts on this which I can write up.

Is this done now? What else is there to do?

It's probably done. A remaining piece could be to clone certain packages during build, so that packages in external repos can also be in stdlib, but that can probably just be done case-by-case.
