Pkg.jl: Ensuring that pre-compilation occurs as part of the add and update process

Created on 28 Dec 2017 · 16 comments · Source: JuliaLang/Pkg.jl

I recently spent some time with Julia v0.6 and Pkg2, thinking like a new user to get a sense of the usability issues. Setting aside the actual speed of packages, my general feeling is that the perceived slowness around packages comes down to three main things: (1) there is insufficient visual feedback during add/using/update operations; (2) expectations are not properly set that time is required for both the download/installation and the precompilation; (3) user steps are required between installation and pre-compilation.

For problems (2) and (3) of perceived slowness, users can become furious that after they wait out an update or add (with insufficient visual feedback that it is working; see #89), they try to use the package and suddenly need to wait again (again without enough visual feedback on what is happening in the background).

This can be easily solved by two things: (a) pre-compiling at the end of an add or update so all waiting takes place at the same time, and (b) setting expectations in the output that pre-compilation will take some time. To do this:

  • By default, precompile all packages upon add or update. You could give an option that turns the automatic pre-compilation off (e.g. Pkg.update(precompile = false)), but most starting users would want to leave it on, so the default should be to precompile.
  • With using now intended to automatically add packages, I suspect that separating these activities would already help things out.
  • As in #89, show much more output during the precompilation process. The extra output has the added benefit of grounding expectations that this might be a slow process.


All 16 comments

Yes, an option to precompile packages in batch is a nice feature to have. I wonder if it has to be a part of Pkg.update or if we could just have a "precompile all packages that need precompiling" as a separate function.

You could then easily run something like Pkg.update(); Pkg.precompile() to do the equivalent of what your Pkg.update(;precompile = true) does. The advantage is of course that you can run Pkg.precompile() by itself if you do not want to do any updating.

Would it be possible to precompile in parallel with downloading? The REPL video you linked to in @jlperla's other issue shows incremental installation progress – if it were possible to use the download time for later packages for compiling earlier ones, that would potentially result in massive improvements in both actual and perceived load time.

It might be possible to find packages "at the bottom", e.g. with no dependencies, download those first, start precompiling them, and move our way up the dependency chain... It sounds a bit tricky, and the Pkg3 download is usually fast enough that I am not sure how much time would be saved in practice.
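The bottom-up strategy described here amounts to a topological sort of the dependency graph: packages with no dependencies come first, so they can be precompiled while their dependents are still downloading. A minimal sketch of the ordering step (the dependency graph below is hypothetical, and this is not Pkg3's implementation):

```python
from graphlib import TopologicalSorter  # Python 3.9+ standard library

# Hypothetical dependency graph: package -> set of its dependencies.
deps = {
    "Plots": {"RecipesBase", "Colors"},
    "Colors": {"FixedPointNumbers"},
    "RecipesBase": set(),
    "FixedPointNumbers": set(),
}

# static_order() yields each package only after all of its dependencies,
# so dependency-free packages can start precompiling immediately while
# packages higher up the chain are still being downloaded.
order = list(TopologicalSorter(deps).static_order())
print(order)  # leaves first, dependents later
```

Overlapping the download of later packages with the compilation of earlier ones would then be a producer/consumer pipeline over this ordering.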

You could then easily run something like Pkg.update(); Pkg.precompile() to do the equivalent of what your Pkg.update(;precompile = true)

In my mind, the default behavior should be the one that gives new users the best experience with the minimum amount of deviation from tutorials. Seasoned users will know the variations on the parameters, etc. So I think the best option is for Pkg.update() to call Pkg.precompile() by default (where Pkg.update(;precompile = false) skips that behavior). There may also be parameters you want to pass along to precompile and update down the road, e.g. compiler settings to speed things up.

As a worst case, if Pkg.update() doesn't precompile, I think that all tutorials and basic user documentation should be written so new users see the Pkg.update(); Pkg.precompile() pattern, because they are unlikely to look through the documentation for settings before being frustrated.

One last thing: I think it may make sense for the interface documentation to state that the Pkg.update() which precompiles by default may return prior to a completed precompilation, and to call Pkg.precompile() separately if you need to be sure it has finished. That gives you the ability down the road to run precompilation as an asynchronous background process without breaking the post-conditions of the interface.

The manifest makes a natural unit of compilation since it is the set of packages that one expects to load when running the code in a project. In that light, it would make sense to automatically compile a manifest when updating it. It's a little unclear how that applies to named environments as compared to user projects.

There's also the question of where to keep the resulting .ji files: the easiest place is somewhere in the project directory, e.g. .jl_compiled/ or .ji/ or something like that. The alternative would be to keep them centrally under ~/.julia and use usage logging and gc to clean them up as appropriate. That would look something like this: cache any cacheable .ji file under a content-addressable path. The issue then is cleanup: knowing which .ji files should be kept and which should be deleted.

  • (a) Keep any .ji that is still valid for an existing configuration.
  • (b) Keep any .ji files that have been sufficiently recently used.
  • (c) Keep the last N .ji files that were generated for a given environment.
  • (d) Delete any .ji file whose environments are gone.

So we need to keep track of:

  1. What environments contributed to a .ji file – save this in the file.
  2. A log of when each .ji file is generated and used.

From 1 alone we can determine (a) and (d). From 2 alone we can determine (b). From 1 & 2 we can determine the most recent N .ji files used for each environment (keep those, delete all the others). From 2, we know how many .ji files have been created and we can automatically decide when we need to clean them up to make space.
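One plausible combination of the rules above can be sketched as a predicate over the two pieces of tracked metadata. Everything here — the record layout, the field names, and the age cutoff — is invented for illustration and is not Pkg.jl's actual format:

```python
import time

# Invented per-.ji metadata, corresponding to items 1 and 2 above:
# which environments contributed to the file, and when it was last used.
ji_files = [
    {"path": "A.ji", "envs": ["/home/u/proj"], "last_used": time.time()},
    {"path": "B.ji", "envs": ["/home/u/gone"], "last_used": 0.0},
]
live_envs = {"/home/u/proj"}
MAX_AGE = 30 * 24 * 3600  # "sufficiently recently used" cutoff, rule (b)

def keep(ji):
    # Rules (a)/(d): keep only if at least one contributing environment
    # still exists...
    alive = any(env in live_envs for env in ji["envs"])
    # ...and rule (b): it has been used recently enough.
    recent = time.time() - ji["last_used"] < MAX_AGE
    return alive and recent

kept = [ji["path"] for ji in ji_files if keep(ji)]
print(kept)  # ["A.ji"]
```

Rule (c), keeping the last N files per environment, would additionally group the log entries by environment and rank them by generation time.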

In my opinion, each environment should have its own compilation folder in .julia.

gc should just delete all .ji files that don't have a corresponding package entry in the manifest. It is unlikely the file will still be valid when the package comes back, so it will need recompilation anyway.

And when an environment disappears, the whole directory is deleted.

No need to keep track of anything more than we currently do.

So something like ~/.julia/compiled/guDI/ with contents that look like:

  • Env.toml containing at least

    • path = "/path/to/environment"

  • PackageA.ji
  • PackageB.ji

I was thinking that since the manifest is a natural unit of compilation, it would make sense to just have a single .ji file per environment. @vtjnash, any thoughts?

Yes, almost, except the path is based on a hash of the path to the environment, so there is no need to store it explicitly. I guess each environment could have some UUID instead, so moving an environment wouldn't require recompilation.
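The slug-from-path idea can be illustrated as hashing the environment path and encoding a few bytes of the digest in a base-62 alphabet. Pkg.jl's real slug scheme differs in hash and alphabet details; this is only a sketch:

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters

def env_slug(env_path: str, length: int = 4) -> str:
    # Derive a short, stable directory name from the environment path.
    digest = int.from_bytes(hashlib.sha256(env_path.encode()).digest()[:8], "big")
    chars = []
    for _ in range(length):
        digest, rem = divmod(digest, len(ALPHABET))
        chars.append(ALPHABET[rem])
    return "".join(chars)

slug = env_slug("/path/to/environment")
print(f"~/.julia/compiled/{slug}/")  # stable for a given path; irreversible
```

Because the hash is one-way, the mapping back to the environment path is exactly what is lost — which is the point raised in the next comment.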

I don't see an advantage of using a single .ji file. Would I need to recompile everything if I update a single package then?

the path is based on a hash from the path to the environment so there is no need to store it explicitly

But you can't go backwards from guDI to "/path/to/environment". Are you thinking that we'd look at the corresponding compiled directory when doing a pkg> gc operation, looking at every environment? It would still be helpful to be able to see, in a human-readable file, which environment something corresponds to.

I don't see an advantage of using a single .ji file. Would I need to recompile everything if I update a single package then?

With the current arrangement, you need to generate multiple .ji files for a given environment, which leads to O(n^2) compile time compared to normal execution. If we know we just need to generate a single .ji file for an entire environment, and we preload everything in it as soon as we need anything in it, then we save on both compilation time and load time.
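The O(n^2) claim can be illustrated with a toy cost model. Assume a linear chain of n packages where precompiling each package must first load all of its dependencies; the numbers are invented for the example:

```python
# Toy cost model: a linear chain of n packages, where precompiling
# package k first loads its k-1 dependencies (one unit of work each,
# plus one unit to compile package k itself).
n = 10

# Per-package .ji files: each package pays for its whole dependency
# closure, so total work is 1 + 2 + ... + n = n(n+1)/2, i.e. O(n^2).
per_package = sum(k for k in range(1, n + 1))

# A single .ji file for the environment compiles each package once: O(n).
single_file = n

print(per_package, single_file)  # 55 10
```

A real dependency graph is a DAG rather than a chain, so the constant factors differ, but the shared-closure cost is the same effect.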

But you can't go backwards from guDI to "/path/to/environment"

Why would you want to? You can't go from the package slug path back to the uuid + version either.

Are you thinking that we'd look at the corresponding compiled directory when doing a pkg> gc operation and looking at every environment?

Yes, this is exactly how the gc works for packages right now.

It would still be helpful to be able to see in a human-readable file, which environment something corresponds to.

Only if there is a reason to manually go into the precompilation directory, which there shouldn't be. Again, similar to package slug paths.

With the current arrangement, you need to generate multiple .ji files for a given environment, which leads to O(n^2) compile time compared to normal execution.

I'm not too sure about the technical details in precompilation so if there is a way to make it significantly faster, that sounds great. But the way I propose should work pretty much exactly as it works for Pkg2.

Why would you want to. You can't go from the package slug path back to the uuid + version either.

I was thinking of adding a way of recording that as well, it just wasn't necessary to get things working. This is a bit different since the compilation directory would correspond to one and only one environment, whereas installed package versions get used from many places.

Alright, but what is the use-case? You go into the precompilation folder with your terminal, and then what?

E.g. you want to know where the precompilation folder for an environment is so that you can delete it. You grep for the path in ~/.julia/compiled and you find the directory and you can go in there and selectively delete some or all .ji files. Of course, we can also build tooling for discovering this, but letting it be easy from the shell seems easier and more general.

E.g. you want to know where the precompilation folder for an environment is so that you can delete it

Hm, presumably you would only want this if you are not going to use the environment so you could just delete the environment and gc. Or remove packages and gc. Anyway, adding a file with the path back to the environment should be easy enough...

Or because Julia is confused and you know that it's invalid and want to force recompilation even though it thinks it doesn't need to. Or you just want to force recompilation for some other reason.

Since you are talking about how this issue interacts with a project environment, I wanted to make sure to write down the Docker use case as well, since that is also important. I would love to be able to create an image with all precompilation already done for my colleagues to use with clouds/clusters/etc. See #92.
