Julia: ideas for reducing compiler latency via the interpreter

Created on 20 Sep 2019 · 5 comments · Source: JuliaLang/julia

This is a parallel issue to #33326, focused instead on the interpreter. The basic idea is that one of the best ways to reduce inference time is to not infer at all.

It's worth noting that JuliaInterpreter may be a useful playground for a subset of these ideas.

Labels: latency


All 5 comments

Idea 1: infer less stuff from top level. The heuristic might be that from top level, any call that (1) doesn't call a method already on the stack (i.e., no recursion) and (2) doesn't contain a backwards-pointing :goto or :gotoifnot (i.e., no loop) is unlikely to be performance sensitive and doesn't deserve inference time.
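The loop half of that heuristic amounts to scanning the lowered statements for a goto whose target precedes it. A minimal Python sketch, where a method body is stood in for by a list of tuples and the target index encoding is an assumption (Julia's real :goto/:gotoifnot nodes live in lowered IR, not this form):

```python
# Hypothetical sketch of the "no backwards branch" check from Idea 1.
# `code` stands in for a lowered method body: a list of statements where
# goto / gotoifnot entries carry the index of their target statement.

def has_loop(code):
    """Return True if any goto targets an earlier (or the same) statement,
    i.e. the body contains a backwards branch and hence a loop."""
    for pc, stmt in enumerate(code):
        if stmt[0] in ("goto", "gotoifnot") and stmt[1] <= pc:
            return True
    return False

# Straight-line body: only a forward branch, so no loop.
straight = [("call", "f"), ("gotoifnot", 3), ("call", "g"), ("return",)]
# Looped body: statement 2 jumps back to statement 0.
looped = [("call", "f"), ("call", "g"), ("goto", 0), ("return",)]
```

Under the heuristic, `straight` would be interpreted and `looped` would still be inferred and compiled; the recursion half of the check would additionally consult the call stack.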

Idea 2: split method bodies, using the interpreter for everything except basic blocks that are inside loops. This is a more extreme version of idea 1, isolating the performance-sensitive parts of methods for compilation and interpreting everything else.

Idea 3: measure run time performance and compile the hot spots. This is harder than it sounds, because (1) measuring performance costs performance, and (2) you may have to solve the "entry point" problem if you discover that you're in the middle of something that's taking forever and need to switch to compiled mode mid-stream (https://github.com/JuliaDebug/JuliaInterpreter.jl/issues/44).

Idea 4: make the interpreter faster. Relying on the interpreter for more stuff inevitably puts pressure on its performance, and right now it's not great. Some thoughts:

  • introduce local method tables to circumvent much of the cost of method dispatch (already pioneered in JuliaInterpreter)
  • more efficient :foreigncall handling (already pioneered in JuliaInterpreter, which just compiles these)
  • an efficiently-interpretable bytecode for Julia's compressed ASTs. A possible sketch can be found in https://github.com/JuliaDebug/JuliaInterpreter.jl/pull/309. (There are some things I'd definitely change, but hopefully some of the ideas are good.)
  • (edit by @JeffBezanson ) The interpreter currently looks up all globals, even constants, in the module's hash table. Removing that overhead is a big piece of low-hanging fruit.
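The local-method-table idea in the first bullet is essentially a per-call-site cache in front of the global dispatch machinery. A minimal sketch, assuming a hypothetical `full_lookup` standing in for the expensive global method search (none of these names are Julia internals):

```python
# Hypothetical sketch of a per-call-site "local method table": the first
# dispatch for a given argument-type signature does the full lookup;
# subsequent calls with the same signature hit the cache.

class CallSiteCache:
    def __init__(self, full_lookup):
        self.full_lookup = full_lookup  # expensive global method lookup
        self.cache = {}                 # arg-type signature -> method

    def dispatch(self, *args):
        sig = tuple(type(a) for a in args)
        method = self.cache.get(sig)
        if method is None:
            method = self.full_lookup(sig)  # slow path, once per signature
            self.cache[sig] = method
        return method(*args)

lookups = []
def full_lookup(sig):
    lookups.append(sig)           # count slow-path hits
    return lambda *a: sum(a)      # stand-in for the matched method

site = CallSiteCache(full_lookup)
site.dispatch(1, 2)   # slow path: full lookup for (int, int)
site.dispatch(3, 4)   # cache hit: same signature, no lookup
```

The payoff is that interpreted call sites, which are typically type-stable in practice, pay the full dispatch cost only once per signature.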

Idea 5: This is another approach to incrementally switching from the interpreter to the compiler.

Currently, if we want to compile a given method, we also need to compile all of the methods it directly calls (i.e. not inlined, and not dynamically dispatched via jl_apply_generic). Instead, we could emit a call to a stub that invokes the interpreter for the called method. If and when we decide to compile that method, we replace the stub's pointer with a pointer to the compiled code.
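The key property of the stub is that callers only ever hold an indirection cell, so swapping one pointer upgrades every caller at once. A minimal sketch of that shape, with `MethodSlot`, `interpret`, and `promote` as illustrative names rather than runtime entry points:

```python
# Hypothetical sketch of the stub indirection from Idea 5: each callee
# starts as a cell pointing at the interpreter; promoting it swaps in
# compiled code without touching (or recompiling) any caller.

class MethodSlot:
    """One indirection cell per method; callers always go through invoke()."""
    def __init__(self, interpret):
        self.impl = interpret            # initially: run the interpreter

    def invoke(self, *args):
        return self.impl(*args)

    def promote(self, compiled):
        self.impl = compiled             # callers see this on their next call

trace = []
slot = MethodSlot(lambda x: trace.append("interp") or x + 1)
slot.invoke(1)                           # runs the "interpreted" path
slot.promote(lambda x: trace.append("compiled") or x + 1)
slot.invoke(1)                           # now runs the "compiled" path
```

In the real system the cell would be the function pointer the emitted call goes through, so promotion is a single pointer store rather than a patch of the caller's code.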

Then we could try various strategies for picking methods to compile. A simple one would be to initially compile only methods containing loops (backwards branches). For other methods, keep a counter and compile them once they have been called a certain number of times or with a certain frequency.
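The call-count strategy can be sketched in a few lines. Everything here is hypothetical scaffolding: `HotCounter`, `THRESHOLD`, and the `interpret` / `compile_method` stand-ins are illustrations of the policy, not real runtime APIs:

```python
# Hypothetical sketch of the counter strategy: interpret a method until it
# has been called THRESHOLD times, then compile it and use the compiled
# version from that call onward.

THRESHOLD = 3

class HotCounter:
    def __init__(self, interpret, compile_method):
        self.interpret = interpret
        self.compile_method = compile_method
        self.counts = {}      # method name -> call count while interpreted
        self.compiled = {}    # method name -> compiled function

    def call(self, name, *args):
        fn = self.compiled.get(name)
        if fn is not None:
            return fn(*args)                       # already hot: compiled path
        self.counts[name] = self.counts.get(name, 0) + 1
        if self.counts[name] >= THRESHOLD:
            self.compiled[name] = self.compile_method(name)
            return self.compiled[name](*args)      # compile once, then use it
        return self.interpret(name, *args)         # still cold: interpret

compiles = []
rt = HotCounter(lambda name, x: x * 2,                             # "interpreter"
                lambda name: compiles.append(name) or (lambda x: x * 2))  # "compiler"
for _ in range(5):
    rt.call("double", 21)    # calls 1-2 interpreted, call 3 triggers compilation
```

A frequency-based variant would decay or timestamp the counters instead of using a fixed threshold, so rarely-but-steadily called methods stay interpreted.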

jl_compile_method_internal in gf.c generally decides whether to compile things; for instance it handles the logic for --compile=min.
