Julia: Precompiling modules outside of packages

Created on 16 Aug 2019 · 7 Comments · Source: JuliaLang/julia

As far as I can tell, there's currently no built-in way to cache a precompiled module if it's not part of a package. There are two problems here:

  1. This isn't made sufficiently clear by the documentation on module precompilation. In fact, that section of the docs says you can cache arbitrary precompiled modules by simply calling Base.compilecache. This is clearly not the case, since Base.compilecache takes a PkgId as its argument.
  2. For use cases that involve distributing a single script with a relatively long compilation time and a short run time, this leaves a lot of performance on the table. For me, a couple of times it's made the difference between Julia being a viable language for the problem or not.
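
To illustrate the first point: as far as I know, the usual way to turn a name into a PkgId is Base.identify_package, which is internal and may change between versions. A minimal sketch, assuming a hypothetical module name MyScriptModule that is not installed as a package in the active environment:

```julia
# Hypothetical name of a module that lives in a script, not in a package.
name = "MyScriptModule"

# Base.identify_package is internal API. It looks the name up in the active
# environments and returns a Base.PkgId, or nothing if the name is unknown.
pkgid = Base.identify_package(name)

if pkgid === nothing
    println("$name is not a known package, so there is no PkgId to hand to Base.compilecache")
else
    println("found package: ", pkgid)
end
```

For a module that only exists inside a script, the lookup has nothing to find, which is why the advice in the docs doesn't apply to this case.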

So is there any way to do one of the following:

  1. Cache arbitrary precompiled modules using the current infrastructure in ~/.julia/compiled, or
  2. Automatically cache modules that are not part of packages in the same directory as the file(s) containing the module, or
  3. Provide a documented way of creating .ji files programmatically for a given module from within Julia code, and of loading those .ji files if they already exist?

The documentation on module precompilation should also be fixed, but I'm not confident enough in my understanding of the relation between Base and Pkg to do so myself.

doc help wanted

All 7 comments

The line between packages and modules used to be a lot fuzzier. These days it is pretty clear: packages and only packages can be loaded with an absolute import X statement, as opposed to a relative import like import .X. Some of the documentation probably predates 1.0 when that distinction wasn't as clear as it is now. I'm not sure how feasible precompiling Julia scripts is or if it even makes sense given the existence of PackageCompiler.jl.
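
A small sketch of that distinction, using a hypothetical module name ScriptLocalGreeter defined directly in a script. The relative import resolves against the enclosing module, while the absolute import goes through package lookup and, on the versions I'm aware of, fails because no package of that name exists:

```julia
# A module defined in the script itself, not in a package.
module ScriptLocalGreeter
greet(name) = "Hello, $name"
end

# Relative import: looks for ScriptLocalGreeter inside the current module.
import .ScriptLocalGreeter
msg = ScriptLocalGreeter.greet("Julia")
println(msg)

# Absolute import: asks the package loader for a package with this name.
# @eval keeps the import at top level even though it sits inside try/catch.
absolute_import_error = try
    @eval import ScriptLocalGreeter
    nothing
catch err
    err
end
println("absolute import failed with: ", typeof(absolute_import_error))
```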

There are a couple of nice things about distributing a script rather than a compiled binary:

  • Cross-platform automatically
  • Size of code to send is smaller
  • Introspectability
    • Users can verify the program actually does what they expect
    • Users can modify the program if they so choose
    • If distributing programs based on GPL code, sending the script means you only have to make available the single script instead of both the binary and source forms

If it's infeasible, that's one thing, but I don't think the existence of PackageCompiler.jl is sufficient reason not to support arbitrary module precompilation. (That it isn't important enough to justify the time of the people who understand precompilation, on the other hand, very well could be.)

As far as documentation updates go, I see the following needs:

  1. Say explicitly that Julia only caches precompiled code for packages.
  2. Remove the mention of Base.compilecache, since obtaining a Base.PkgId doesn't seem super straightforward.
  3. Change "module" to "package" where appropriate.

As far as feasibility goes, it's relatively easy to create a .ji cache for a script containing a module using the --output-ji command-line option, e.g. julia --output-ji=myprogram.ji --sysimage=/usr/lib/julia/sys.so --output-incremental=yes --compile=all myprogram.jl. It's also possible to load that .ji file later by roughly following the steps in Base._require_from_serialized. You can't just call it directly because Base.isvalid_file_crc returns false (I'm not entirely sure what that's checking or why it's different for cache files created with --output-ji).

If that could be fixed, the only real additional need is some way of creating .ji caches from within Julia code instead of having to invoke julia with an additional flag. That may already exist, but I haven't found it yet.

EDIT: Base.compilecache creates ".ji" files by literally starting another julia process with the --output-ji flag, piping the file containing the module into that process's stdin, and then running stdin through Meta.parse and eval: https://github.com/JuliaLang/julia/blob/694d59a1a4d269e17f940dfdc4896161dd7f6738/base/loading.jl#L1172 . So calling the julia binary with the --output-ji flag currently appears to be the only way of creating ".ji" files.
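
Until something better exists, the same command can at least be assembled from within Julia. A rough sketch, assuming a hypothetical helper name output_ji_cmd and reusing the flags from the command line above. Base.julia_cmd() supplies the binary and system image of the running session, so the explicit --sysimage path is dropped. The command is only built here, not run, and I haven't checked how these flags behave across Julia versions:

```julia
# Hypothetical helper: build (but do not run) a command that asks a fresh
# julia process to write an incremental .ji cache for `script`.
function output_ji_cmd(script::AbstractString, cachefile::AbstractString)
    # Base.julia_cmd() returns the running julia binary plus the flags
    # (system image, CPU target, ...) needed to match the current session.
    julia = Base.julia_cmd()
    return `$julia --output-ji=$cachefile --output-incremental=yes --compile=all $script`
end

cmd = output_ji_cmd("myprogram.jl", "myprogram.ji")
println(cmd)
# To actually produce the cache file: run(cmd)
```

Loading the resulting file is still the open problem described above, since Base.isvalid_file_crc rejects it.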

I created a small package with a proof-of-concept for this. Let me know if I should try to upstream into Base.

https://github.com/non-Jedi/CompileModules.jl

I can confirm that this is confusing for new users. I spent months thinking that the modules I was including were getting precompiled; they're not. And I had read the documentation many times.

I'm thinking of a julia command-line switch that caches everything that can be compiled the first time a script is run, like:

julia --release my_script.jl

and then automatically uses the compiled cache on subsequent runs. Is that technically possible? The current slow startup when running quick-and-dirty scripts (which many potential users are used to from other languages) gives newcomers a sense of being "cheated by advertised benchmarks" if they don't bother to learn the new way of doing things. Could the long julia --output-ji ... command in the comment above be simplified to a single switch?
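
For what it's worth, the "reuse on next runs" half of this needs some freshness check. A minimal sketch of one, assuming hypothetical helper names cache_path and cache_is_fresh and a convention where the cache sits next to the script. It only compares modification times; a real implementation would presumably also have to account for included files, dependencies, and the Julia version:

```julia
# Hypothetical convention: the cache for "foo.jl" lives next to it as "foo.ji".
cache_path(script::AbstractString) = first(splitext(script)) * ".ji"

# The cache counts as fresh if it exists and is at least as new as the script.
function cache_is_fresh(script::AbstractString, cache::AbstractString = cache_path(script))
    return isfile(cache) && mtime(cache) >= mtime(script)
end

# Usage: decide whether a (hypothetical) compile step would be needed.
script = "my_script.jl"
if cache_is_fresh(script)
    println("would reuse ", cache_path(script))
else
    println("would (re)build ", cache_path(script))
end
```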
