Julia: Save actual machine code in precompile files

Created on 22 Dec 2018  ·  21 Comments  ·  Source: JuliaLang/julia

Essentially, store whatever is stored in a sysimage with user packages compiled into it in the standard precompile files.

I would assume that this, in combination with #30487, would go a very long way to make the interactive REPL experience of julia competitive.

I know that the core team has been thinking about this, and I did look for an existing issue that tracks this, but couldn't find any. So I'm mainly creating the issue so that it can be assigned to a milestone and I can follow progress :) If this is a duplicate (which I had really expected) and I just didn't find the original, please close.

precompile

All 21 comments

This has been discussed in various issues. One challenge is that a lot of generated code involves not a single package but several, so it has to work differently. PackagePrecompiler is a testbed for this.

PackageCompiler is different, since the whole system is compiled in one go; it doesn't give you a way to additionally cache machine code for external packages. This feature is extremely difficult to implement.

This feature is extremely difficult to implement.

Could you please provide an explanation?

One factor to consider here is that a lot of time is actually spent re-compiling code, not just compiling it once. When you load packages that add methods to various low-level functions it can invalidate existing native code (since that code was compiled assuming those new methods don't exist).
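A minimal, self-contained sketch of this invalidation effect, using hypothetical functions `f` and `g` (not from any package):

```julia
# Compiled native code bakes in assumptions about the current method table.
f(x) = x + 1
g() = f(0x01)        # native code for g() assumes f's only method is x + 1

r1 = g()             # 2 (0x01 + 1 promotes UInt8 to Int)

# Simulate a package adding a more specific method to a low-level function;
# this invalidates the native code previously compiled for g:
f(x::UInt8) = 100

r2 = g()             # 100 — g was recompiled against the new method table
```

Loading a real package that touches `Base` functions triggers the same mechanism, just at a much larger scale.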

A lot of code also inherently involves multiple packages. For example, maybe we can compile and save some code for FixedPointNumbers and GenericLinearAlgebra, but where do we put the code for linear algebra of fixed-point matrices? Such code would not exist and not need to exist until somebody loads both packages and uses them together.

There are various mechanical difficulties to work out. For one, it's not clear which code to assign to a particular package. For example, maybe loading package A does the call Float16(1) + Int8(2) and we didn't have code for it already. All the types and functions are in Base, but is that code part of the package's code? This is just to show the kinds of cases that need to be considered and handled.

So while this is possible, we might decide it's not necessarily the best way to improve latency in terms of cost and benefit. For example, a combination of (1) using multiple cores to compile and (2) using standard tiered JIT techniques where we run more things in an interpreter first might work better. Try running julia with --compile=min if you haven't yet, to see the interpreter's effect on latency.
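The interpreter's effect on latency can be seen directly from the command line (timings are illustrative and vary by machine and Julia version):

```shell
# First-call latency with the full JIT:
julia -e '@time using Printf; @time @printf("%d\n", 1)'

# The same with most compilation replaced by interpretation:
julia --compile=min -e '@time using Printf; @time @printf("%d\n", 1)'
```

With `--compile=min`, the time-to-first-call typically drops sharply, at the cost of much slower steady-state execution.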

Have you considered my suggestion for “Context Dispatch”: dispatching based on the caller module, and storing the code in the “lowest” module down the call tree that can resolve the call?
In your example it would be the module that contains both FixedPointNumbers and GenericLinearAlgebra.

In the second example it would belong to Base because both types and the generic function + are defined there.

Maybe another option would be to move to a model where precompile happens per environment? And then machine code for everything in that environment gets stored? And whenever one makes a change to the environment, all of that gets updated (or potentially updated, if needed). So essentially say the ]precompile would become an alias for creating a custom sysimage with all packages in that environment, that then will be automatically used whenever that environment is loaded. And maybe precompile happens whenever any change is made to the environment.

That would slow down package operations, but it might help with these complicated package interaction questions?
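Something like this per-environment model can already be approximated by hand with PackageCompiler.jl. A hedged sketch (the output path is a placeholder, and building a sysimage is slow):

```julia
using Pkg, PackageCompiler

# Collect the direct dependencies of the active environment:
pkgs = collect(keys(Pkg.project().dependencies))

# Bake them all into a sysimage tied to this environment — roughly what an
# automatic ]precompile-on-change would have to do:
create_sysimage(pkgs; sysimage_path = "env_sysimage.so")

# Afterwards, start Julia with:
#   julia --sysimage env_sysimage.so --project=.
```

The difference from the proposal is that nothing here is automatic: the sysimage is not rebuilt or selected when the environment changes.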

Similar question appeared independently on Slack #helpdesk yesterday:

Hi all, I was recently showing off some Julia to some colleagues of mine and one of them had the question: "Why doesn't Julia just store the JIT compiled functions from one session, so it can use those in the next session if nothing changed". I had this question too some time ago but forgot the answer and can't really easily find info about it.

"just" :joy:

I have some trouble understanding the comment "just". Obviously (to you) it's not straightforward to reuse already compiled code, and you give some examples above ("it can invalidate existing native code").
Still, while a 100% solution might not be viable, I wonder if precompile could try to generate machine code at the function level when the types and the calls to subfunctions are somehow fixed. I'm missing terminology here (and I'm certainly no expert on how the Julia compiler works), but if a function in a module deals with an argument list of Float64, or arrays of that, and is type-stable (i.e. has only a single return type), I don't see a good story for why it would need a recompile.

Suppose module A has a single function foo(a::Vector{Float64}) = a .+ 2. While using A; foo([1.0]) will return [3.0], using A, B; foo([1.0]) can return anything, because B can redefine addition, broadcasting, any other Base primitive, or foo itself. See #265
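A runnable sketch of that scenario, with B's redefinition done via deliberate type piracy (contrived on purpose, to show why `foo`'s old code can't be trusted):

```julia
module A
    foo(a::Vector{Float64}) = a .+ 2
end

@assert A.foo([1.0]) == [3.0]

# "Package" B pirates Base addition for Float64 + Int:
module B
    Base.:+(x::Float64, y::Int) = -1.0
end

# A.foo's previously compiled native code is now invalid; Julia recompiles
# it against the new method table, and the result changes:
@assert A.foo([1.0]) == [-1.0]
```

This is exactly why a cache of `foo`'s machine code from a session without B cannot simply be reused in a session with B.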

Yeah, so it was me asking that question. The point about things possibly being redefined makes complete sense; hence the "if nothing changed" qualifier. Already, if nothing changed in a module, it doesn't re-precompile, and if things changed it does. So basically what he had in mind, and I kind of did too, is that a similar check is done on the previously compiled code: if no new methods with the same signature have been defined, nothing happens; otherwise, recompile. Now of course I can imagine that the actual implementation of that is pretty nontrivial, but to a novice like myself (especially with respect to compilers and the like), it's not obvious.

Maybe another option would be to move to a model where precompile happens per environment? --- https://github.com/JuliaLang/julia/issues/30488#issuecomment-449616973

I think the minimal change in Base that makes this possible is just the one line in #29914. If it gets merged, this idea can be experimented with in normal libraries using the Pkg and PackageCompiler APIs.

Why not open this up for discussion with the community? The core devs could share their direction of thought and listen to feedback from supporters of the language.

I addressed these problems in the "Context Dispatch" idea, where the method table of a function is determined by the calling function's scope, all the way down the call tree.

Once I have the "Context Dispatch" POC ready for Julia 1.0, I will post an issue asking for "problems" with saving JITted code, and for each MWE of a problem supply an MWE of a solution.

Why not open this up for discussion with the community? The core devs could share their direction of thought and listen to feedback from supporters of the language.

Does the Julia GitHub not count as being "open to the community"? I'm pretty sure both of us are not "core devs", yet we're still able to comment on this issue :smile:

I addressed these problems in the "Context Dispatch" idea, where the method table of a function is determined by the calling function's scope, all the way down the call tree.
Once I have the "Context Dispatch" POC ready for Julia 1.0, I will post an issue asking for "problems" with saving JITted code, and for each MWE of a problem supply an MWE of a solution.

Are you referring to the idea you had previously described here? If so, it seemed like @vtjnash was not convinced that your "Context Dispatch" approach was necessary for saving and loading generated native code. Both PackageCompiler.jl and the sysimg (sys.so) are pretty good indicators of Julia's ability to save and load native code.

I think what would help here is if someone would write a package/patch that causes all (realistically most/some) JIT'd code to be written to disk, and automatically re-loaded when the correct conditions are met in a fresh Julia session. That way we'll be able to get a feel for whether saving all of this extra generated code is beneficial at all, and additionally how difficult it might be to pull this off in general.

PackageCompiler is a different thing; it is aimed at AOT compilation and is not easily usable as part of an ongoing development process, and I say that from past experience.

What I am aiming at is the issue of reliably caching jitted code at the module level, and dynamically loading it when the module loads.

As Jeff pointed out, the problem is not the caching itself; the problem is that the cache is too easily invalidated, according to the current set of dispatch rules.

I see this effort has stopped.
Isn't it possible for Revise to keep the cached precompilations between REPL sessions, and to provide a "recompile" command for when recompilation is needed in a REPL?

This is beyond the purview of Revise.

However, some things have changed: in more recent Julia versions (and particularly the in-development Julia 1.6) there will be a lot less invalidation. So little (at least for many packages) that I don't think it's a serious obstacle anymore. The other obstacles still remain, AFAIK.

Thank you for the answer Tim!

What do you think: is it possible to list all the obstacles?
And could enough of them be solved that eventually we would only have to restart 5-10% of the time, due to edge cases that don't work yet?

This issue is about caching native code in *.ji files, which is really quite different from improving Revise. Let's not change the focus of the issue.
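For context, these are the per-package cache files in question (a sketch; the depot location depends on your setup):

```julia
# Julia already writes one .ji precompile cache per package per
# configuration under the depot; this issue is about also storing
# native code there:
depot = first(DEPOT_PATH)
cachedir = joinpath(depot, "compiled", "v$(VERSION.major).$(VERSION.minor)")
isdir(cachedir) && foreach(println, readdir(cachedir))
```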

Jeff listed the other obstacles to caching native code very nicely above.

Yeah, sorry, I didn't want to change the subject.

I misunderstood the whole thing, because from an outsider's view Revise looked like a "code cache that interactively updates with patching", which seemed so close to caching and updating native code between sessions.

On the caching issue; since invalidations will soon be in much better shape, should we talk about the remaining obstacle?

maybe we can compile and save some code for FixedPointNumbers and GenericLinearAlgebra, but where do we put the code for linear algebra of fixed-point matrices

Question: can the answer depend on circumstance? Specifically, what would happen if two different packages end up stashing the native code for the same method? Is there anything particularly bad that happens?

I can imagine two strategies:

  • examine backedges, and if a sequence leads ultimately to PkgThatDoesStuffWithBoth.foo (which lacks backedges because it was called from toplevel), stash the code there. This doesn't help if the chain is cut via runtime-dispatch, though.
  • examine the package dependencies and pick one or more places that end up loading both packages. My memory is fuzzy, but I'm pretty sure I've solved this twice now, in some form, in both SnoopCompile and an unmerged Revise branch. But the Pkg devs could probably give a much better answer.
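The first strategy can be sketched with compiler internals (unexported and version-dependent, so treat this purely as illustration):

```julia
callee(x) = 2x
caller(x) = callee(x) + 1
@assert caller(3) == 7          # force compilation so backedges exist

m = only(methods(callee))
# Each compiled specialization of `callee` records its callers in
# `backedges`; walking those edges upward toward a toplevel caller is how
# one would look for a "home" package for the cached native code.
for mi in Base.specializations(m)   # Base.specializations needs a recent Julia
    isdefined(mi, :backedges) && println(mi, " <- ", mi.backedges)
end
```

As the bullet notes, the chain stops wherever a call was made via runtime dispatch or from toplevel, so this walk alone cannot assign an owner in every case.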
