Julia: Save actual machine code in precompile files

Created on 22 Dec 2018  ·  21 Comments  ·  Source: JuliaLang/julia

Essentially, store whatever is stored in a sysimage with user packages compiled into it in the standard precompile files.

I would assume that this, in combination with #30487, would go a very long way to make the interactive REPL experience of julia competitive.

I know that the core team has been thinking about this, and I did look for an existing issue that tracks this, but couldn't find any. So I'm mainly creating the issue so that it can be assigned to a milestone and I can follow progress :) If this is a duplicate (which I had really expected) and I just didn't find the original, please close.

precompile

All 21 comments

This has been discussed in various issues. One challenge is that a lot of generated code involves not a single package but several, so it has to work differently. PackagePrecompiler is a testbed for this.

PackageCompiler is different, since the whole system is compiled in one go; it doesn't give you a way to additionally cache machine code for external packages. This feature is extremely difficult to implement.

This feature is extremely difficult to implement.

Could you please provide an explanation?

One factor to consider here is that a lot of time is actually spent re-compiling code, not just compiling it once. When you load packages that add methods to various low-level functions it can invalidate existing native code (since that code was compiled assuming those new methods don't exist).
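A minimal, self-contained sketch of this invalidation effect, using hypothetical functions `f` and `g` (not from any package):

```julia
# Compiled native code bakes in assumptions about the current method table.
f(x) = x + 1
g() = f(0x01)        # native code for g() assumes f's only method is x + 1

r1 = g()             # 2 (0x01 + 1 promotes UInt8 to Int)

# Simulate a package adding a more specific method to a low-level function;
# this invalidates the native code previously compiled for g:
f(x::UInt8) = 100

r2 = g()             # 100 — g was recompiled against the new method table
```

Loading a real package that touches `Base` functions triggers the same mechanism, just at a much larger scale.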

A lot of code also inherently involves multiple packages. For example, maybe we can compile and save some code for FixedPointNumbers and GenericLinearAlgebra, but where do we put the code for linear algebra of fixed-point matrices? Such code would not exist and not need to exist until somebody loads both packages and uses them together.

There are various mechanical difficulties to work out. For one, it's not clear which code to assign to a particular package. For example, maybe loading package A does the call Float16(1) + Int8(2) and we didn't have code for it already. All the types and functions are in Base, but is that code part of the package's code? This is just to show the kinds of cases that need to be considered and handled.

So while this is possible, we might decide it's not necessarily the best way to improve latency in terms of cost and benefit. For example, a combination of (1) using multiple cores to compile and (2) using standard tiered JIT techniques where we run more things in an interpreter first might work better. Try running julia with --compile=min if you haven't yet, to see the interpreter's effect on latency.
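The interpreter's effect on latency can be seen directly from the command line (timings are illustrative and vary by machine and Julia version):

```shell
# First-call latency with the full JIT:
julia -e '@time using Printf; @time @printf("%d\n", 1)'

# The same with most compilation replaced by interpretation:
julia --compile=min -e '@time using Printf; @time @printf("%d\n", 1)'
```

With `--compile=min`, the time-to-first-call typically drops sharply, at the cost of much slower steady-state execution.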

Have you considered my suggestion for “Context Dispatch”: dispatching based on the caller module, and storing the code in the “lowest” module down the call tree that can resolve the call?
In your example it would be the module that contains both FixedPointNumbers and GenericLinearAlgebra.

In the second example it would belong to Base because both types and the generic function + are defined there.

Maybe another option would be to move to a model where precompile happens per environment? And then machine code for everything in that environment gets stored? And whenever one makes a change to the environment, all of that gets updated (or potentially updated, if needed). So essentially say the ]precompile would become an alias for creating a custom sysimage with all packages in that environment, that then will be automatically used whenever that environment is loaded. And maybe precompile happens whenever any change is made to the environment.

That would slow down package operations, but it might help with these complicated package interaction questions?
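Something like this per-environment model can already be approximated by hand with PackageCompiler.jl. A hedged sketch (the output path is a placeholder, and building a sysimage is slow):

```julia
using Pkg, PackageCompiler

# Collect the direct dependencies of the active environment:
pkgs = collect(keys(Pkg.project().dependencies))

# Bake them all into a sysimage tied to this environment — roughly what an
# automatic ]precompile-on-change would have to do:
create_sysimage(pkgs; sysimage_path = "env_sysimage.so")

# Afterwards, start Julia with:
#   julia --sysimage env_sysimage.so --project=.
```

The difference from the proposal is that nothing here is automatic: the sysimage is not rebuilt or selected when the environment changes.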

Similar question appeared independently on Slack #helpdesk yesterday:

Hi all, I was recently showing off some Julia to some colleagues of mine and one of them had the question: "Why doesn't Julia just store the JIT compiled functions from one session, so it can use those in the next session if nothing changed". I had this question too some time ago but forgot the answer and can't really easily find info about it.

"just" :joy:

I have some trouble understanding the comment "just". Obviously (to you) it's not straightforward to reuse already compiled code, and you give some examples above ("it can invalidate existing native code").
Still, while a 100% solution might not be viable, I wonder if precompile could try to generate machine code at the function level when the types and the calls to subfunctions are somehow fixed. I'm missing terminology here (and I'm certainly no expert on how the Julia compiler works), but if a function in a module deals with an argument list of Float64, or arrays of that, and is type-stable (i.e. has only a single return type), I don't see a good story for why it would need a recompile.

Suppose module A has a single function foo(a::Vector{Float64}) = a .+ 2. While using A; foo([1.0]) will return [3.0], using A, B; foo([1.0]) can return anything, because B can redefine addition, broadcasting, any other Base primitive, or foo itself. See #265
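A runnable sketch of that scenario, with B's redefinition done via deliberate type piracy (contrived on purpose, to show why `foo`'s old code can't be trusted):

```julia
module A
    foo(a::Vector{Float64}) = a .+ 2
end

@assert A.foo([1.0]) == [3.0]

# "Package" B pirates Base addition for Float64 + Int:
module B
    Base.:+(x::Float64, y::Int) = -1.0
end

# A.foo's previously compiled native code is now invalid; Julia recompiles
# it against the new method table, and the result changes:
@assert A.foo([1.0]) == [-1.0]
```

This is exactly why a cache of `foo`'s machine code from a session without B cannot simply be reused in a session with B.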

Yeah, so it was me asking that question. The point about things possibly being redefined makes complete sense; hence the "if nothing changed" qualifier. Already, if nothing changed in a module, it doesn't re-precompile, and if things changed it does. So basically what he had in mind, and I kind of did too, is that a similar check is done on the previously compiled code: if no new methods with the same signature have been defined, nothing happens; otherwise, recompile. Now of course I can imagine that the actual implementation of that is pretty nontrivial, but to a novice like myself (especially with respect to compilers and the like), it's not obvious.

Maybe another option would be to move to a model where precompile happens per environment? --- https://github.com/JuliaLang/julia/issues/30488#issuecomment-449616973

I think the minimal change in Base that makes this possible is just the one line in #29914. If it gets merged, this idea can be experimented with in normal libraries using the Pkg and PackageCompiler APIs.

Why not open this up for discussion with the community? The core devs could share their direction of thought and listen to feedback from supporters of the language.

I addressed these problems in the "Context Dispatch" idea, where the method table of a function is determined by the calling function's scope, all the way down the call tree.

Once I have the "Context Dispatch" POC ready for Julia 1.0, I will post an issue asking for "problems" with saving JITted code, and for each MWE of a problem supply an MWE of a solution.

Why not open this up for discussion with the community? The core devs could share their direction of thought and listen to feedback from supporters of the language.

Does the Julia GitHub not count as being "open to the community"? I'm pretty sure both of us are not "core devs", yet we're still able to comment on this issue :smile:

I addressed these problems in the "Context Dispatch" idea, where the method table of a function is determined by the calling function's scope, all the way down the call tree.
Once I have the "Context Dispatch" POC ready for Julia 1.0, I will post an issue asking for "problems" with saving JITted code, and for each MWE of a problem supply an MWE of a solution.

Are you referring to the idea you had previously described here? If so, it seemed like @vtjnash was not convinced that your "Context Dispatch" approach was necessary for saving and loading generated native code. Both PackageCompiler.jl and the sysimg (sys.so) are pretty good indicators of Julia's ability to save and load native code.

I think what would help here is if someone would write a package/patch that causes all (realistically most/some) JIT'd code to be written to disk, and automatically re-loaded when the correct conditions are met in a fresh Julia session. That way we'll be able to get a feel for whether saving all of this extra generated code is beneficial at all, and additionally how difficult it might be to pull this off in general.

PackageCompiler is a different thing; it is aimed at AOT compilation and is not easily usable as part of an ongoing development process, and I say that from past experience.

What I am aiming at is the issue of reliably caching jitted code at the module level, and dynamically loading it when the module loads.

As Jeff pointed out, the problem is not the caching itself; the problem is that the cache is too easily invalidated, according to the current set of dispatch rules.

I see this effort has stopped.
Isn't it possible for Revise to keep the cached precompilations between REPL sessions, and to provide a "recompile" command for when recompilation is needed in a REPL?

This is beyond the purview of Revise.

However, some things have changed: in more recent Julia versions (and particularly the in-development Julia 1.6) there will be a lot less invalidation. So little (at least for many packages) that I don't think it's a serious obstacle anymore. The other obstacles still remain, AFAIK.

Thank you for the answer Tim!

What do you think: is it possible to list all the obstacles?
And could enough of them be solved that eventually we would only have to restart 5-10% of the time, due to edge cases that don't work yet?

This issue is about caching native code in *.ji files, which is really quite different from improving Revise. Let's not change the focus of the issue.
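For context, these are the per-package cache files in question (a sketch; the depot location depends on your setup):

```julia
# Julia already writes one .ji precompile cache per package per
# configuration under the depot; this issue is about also storing
# native code there:
depot = first(DEPOT_PATH)
cachedir = joinpath(depot, "compiled", "v$(VERSION.major).$(VERSION.minor)")
isdir(cachedir) && foreach(println, readdir(cachedir))
```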

Jeff listed the other obstacles to caching native code very nicely above.

Yeah, sorry, I didn't want to change the subject.

I misunderstood the whole thing, because from an outsider's view Revise looked like a "code cache that interactively updates with patching", which seemed so close to caching and updating native code between sessions.

On the caching issue; since invalidations will soon be in much better shape, should we talk about the remaining obstacle?

maybe we can compile and save some code for FixedPointNumbers and GenericLinearAlgebra, but where do we put the code for linear algebra of fixed-point matrices

Question: can the answer depend on circumstance? Specifically, what would happen if two different packages end up stashing the native code for the same method? Is there anything particularly bad that happens?

I can imagine two strategies:

  • examine backedges, and if a sequence leads ultimately to PkgThatDoesStuffWithBoth.foo (which lacks backedges because it was called from toplevel), stash the code there. This doesn't help if the chain is cut via runtime-dispatch, though.
  • examine the package dependencies and pick one or more places that end up loading both packages. My memory is fuzzy, but I'm pretty sure I've solved this twice now, in some form, in both SnoopCompile and an unmerged Revise branch. But the Pkg devs could probably give a much better answer.
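The first strategy can be sketched with compiler internals (unexported and version-dependent, so treat this purely as illustration):

```julia
callee(x) = 2x
caller(x) = callee(x) + 1
@assert caller(3) == 7          # force compilation so backedges exist

m = only(methods(callee))
# Each compiled specialization of `callee` records its callers in
# `backedges`; walking those edges upward toward a toplevel caller is how
# one would look for a "home" package for the cached native code.
for mi in Base.specializations(m)   # Base.specializations needs a recent Julia
    isdefined(mi, :backedges) && println(mi, " <- ", mi.backedges)
end
```

As the bullet notes, the chain stops wherever a call was made via runtime dispatch or from toplevel, so this walk alone cannot assign an owner in every case.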
