Omr: Produce an object file with Compiler/JitBuilder

Created on 26 May 2017 · 13 Comments · Source: eclipse/omr

I'd like to dump a set of functions generated by JitBuilder into an object file in ELF or COFF format so that it can be linked later using a standard linker. I wasn't able to find any way to do this.
Is it possible to implement this feature?

compiler jitbuilder question

All 13 comments

Thanks for the question, @amitin. There is no technical reason it could not be done, but there are certainly missing pieces in the current state of things.

The OMR compiler component and JitBuilder are really aimed at more dynamic compilation scenarios than standard static compilers, although the fundamental technology has certainly been used in other "static" compilation scenarios (just not this open source project).

Could you maybe expand a little on the scenarios where you'd like to use JitBuilder to generate code straight into an ELF or other linkable format and link that in directly? Presumably compile time is a pretty important concern for you, or is it just the sheer number of functions you need to compile that makes dynamic compilation undesirable?

Some background on the topic:

I'll talk about ELF in this post because there is actually some support for ELF objects in the OMR compiler, but the discussion points should be equally valid for other formats like COFF.

IMO, ELF is not a great format for compiling code for dynamic languages unless your AOT story involves locking down many dynamic aspects of the language or leaving optimization opportunities on the table, because ELF doesn't give you the facilities to validate dynamic properties of your program. For example, to my mind ELF does not easily let you talk about a particular fully resolved Java class (including its superclasses and interfaces). ELF lets you define symbols, or record undefined symbols, by name, and resolution matches definitions up with uses. But what if the class loaded in the current process isn't guaranteed to be the same class, but usually is? ELF has no way to express such a dependence: either an undefined symbol is defined by someone else or it isn't. One can build additional metadata to represent such things, but then it's no longer "just" an ELF object.
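To make the all-or-nothing nature of ELF symbol resolution concrete, here is a small C sketch for GCC or Clang on an ELF target, using the `weak` attribute. The symbol name `optional_hook` and both helper functions are hypothetical. A weak undefined reference is bound either to some definition or to address 0; nothing in the format lets you say "bind it only if the definition still has the properties this code was compiled against".

```c
#include <stdbool.h>

/* Hypothetical symbol: nothing in this file defines it. Declaring it weak
   lets the link succeed even if no object file provides a definition. */
extern int optional_hook(int) __attribute__((weak));

/* Resolution is binary: the weak reference is either bound to some
   definition or left as address 0. There is no way to attach a condition
   such as "only if it is the same class/version we compiled against". */
bool hook_is_defined(void) {
    return optional_hook != 0;
}

/* Call the hook when the linker found a definition, else use a fallback. */
int call_hook_or_default(int x, int fallback) {
    return hook_is_defined() ? optional_hook(x) : fallback;
}
```

Expressing anything richer (for example, "this inlined body assumes class X has no subclasses") needs extra metadata outside the symbol table, which is the point made above.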

With that point made, there are basically two "aspects" of the OMR compiler component that could be used as the basis of the feature you're proposing: the AOT compiler support embedded in the OMR compiler (which is really only used by the not-yet-open-sourced J9 Java JIT right now), and the support we have for annotating samples on compiled code when using the Linux perf tool.

The OMR compiler code generators understand the notion of creating "relocations" associated with the generated code. By processing the relocations, code can be bound into a particular runtime process. Binding that code into a process can involve arbitrarily complicated relocations, from just updating a particular value in the generated code to validating, in the current process, that all the conditions an inlined code body depends on currently hold, and registering assumptions in case any of those conditions subsequently change. This AOT support is currently quite Java-centric, however, and is not yet connected to the JitBuilder interface. In fact, some of this support only exists in the J9 JVM, which is currently closed source (but won't be for much longer) and which is based upon, and regularly rebases from, the OMR project.
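The simplest relocation kind mentioned above, "updating a particular value in the generated code", can be sketched in C as follows. The record layout, `AbsoluteReloc`, and `apply_relocations` are hypothetical illustrations, not OMR's actual relocation classes; the real AOT support also performs the validation and assumption registration described above, which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation record: "at byte `offset` of the generated code,
   store the absolute 64-bit address of symbol number `symbol`". */
typedef struct {
    size_t offset;
    size_t symbol;
} AbsoluteReloc;

/* Bind a code buffer into the current process by patching every
   relocation site with the address looked up in `symbol_addrs`.
   Returns 0 on success, or -1 if a relocation would write outside the
   buffer or names an unknown symbol. */
int apply_relocations(uint8_t *code, size_t code_size,
                      const AbsoluteReloc *relocs, size_t n_relocs,
                      const uint64_t *symbol_addrs, size_t n_symbols) {
    for (size_t i = 0; i < n_relocs; i++) {
        const AbsoluteReloc *r = &relocs[i];
        if (r->symbol >= n_symbols || r->offset > code_size ||
            code_size - r->offset < sizeof(uint64_t))
            return -1;
        /* memcpy avoids unaligned stores; bytes are written in host order. */
        memcpy(code + r->offset, &symbol_addrs[r->symbol], sizeof(uint64_t));
    }
    return 0;
}
```

A static linker does essentially this for an ELF `R_X86_64_64` relocation, with the symbol addresses coming from the final link rather than from a table supplied by the runtime.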

The second aspect of the OMR compiler technology that could play a role is the support for the Linux perf tool, which writes the entire code cache out (on shutdown) as an ELF shared object. But that ELF object isn't complete: it's only designed so that tools like objdump can root around in it to map the addresses in the perf data samples to symbols. The generated ELF object could be extended with more symbol information so that libdl could bind the shared object into the executable, but that would rely on everything the generated code references (or that needs to be resolved) being visible as symbols to libdl. JitBuilder would probably be the easiest way to move forward on this path, since JitBuilder already uses names at the API level to distinguish runtime elements. Modulo all the earlier comments about the (lack of) flexibility of ELF for compiling dynamic languages, this approach with JitBuilder could conceivably be prototyped reasonably quickly. It would be a pretty interesting project.
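For the dumped code cache to be linkable rather than just browsable by objdump, each compiled method needs a global function symbol in the symbol table. Below is a minimal sketch using the `Elf64_Sym` type and the `ELF64_ST_INFO` macro from glibc's `<elf.h>`. The helper `make_method_symbol` and its parameters are hypothetical; a real writer would also have to emit section headers, a string table and (for external references) relocations.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: describe one compiled method as a global function
   symbol. `name_offset` is the index of its name in the string table,
   `text_index` is the section header index of the section holding the
   code, and `code_offset`/`code_size` locate the method in that section. */
Elf64_Sym make_method_symbol(uint32_t name_offset, uint16_t text_index,
                             uint64_t code_offset, uint64_t code_size) {
    Elf64_Sym sym;
    memset(&sym, 0, sizeof sym);
    sym.st_name = name_offset;
    /* Global binding so the linker can resolve references from other objects. */
    sym.st_info = ELF64_ST_INFO(STB_GLOBAL, STT_FUNC);
    sym.st_other = STV_DEFAULT;
    sym.st_shndx = text_index;
    /* In a relocatable object, st_value is an offset within the section;
       in a shared object it would be a virtual address instead. */
    sym.st_value = code_offset;
    sym.st_size = code_size;
    return sym;
}
```

Since JitBuilder already names each MethodBuilder, those names are natural candidates for the string-table entries these symbols point to.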

@mstoodle, first of all thank you for providing such a detailed reply!

I'll provide a little more explanation of who we are and how we would use the object files, simplifying a bit because, of course, reality is always more complicated.

We work on a performance-critical, cross-platform project for the Smalltalk programming language that has a long history at this point. I'm guessing you and the J9 team are intimately familiar with that history, since it was originally built by IBM/OTI as VisualAge for Smalltalk (now rebranded as VA Smalltalk).

A large part of the system, namely the interpreter (bytecodes, primitives) and native code translation, was written long ago as assembly code generated from a Smalltalk DSL, and has gone basically unchanged for years.

A few years ago we took on the task of porting VA Smalltalk to 64-bit, but lacked the backend code generation expertise to graft 64-bit code generation for the architectures of our supported operating systems onto the existing "Smalltalk Model" ASM generator.

A couple of years ago we completed a successful port of the assembly code to C, but we were not able to convince the C compiler, even with register-pinning intrinsics, to do what a tightly controlled asm interpreter could do. The primitives were many times faster, but we lost the bytecodes/sends battle: they were about 20-30% slower.
Any tweaking to correct this, particularly on x86, became a constant battle with the compiler that we could never win, so we adapted and moved on.

We began testing various technologies that would give us 'cross-platform assembly', so we could get the control we wanted without having to do the backend code generation ourselves.
For this purpose we essentially rewrote the "Smalltalk Model" using the LLVM framework. We wrote it with LLVM's IRBuilder (C++), and the generated IR is compiled to object files, which we then link with the rest of the non-critical code written in C. All of this becomes our vm shared library. To clarify, this all happens during the build stage; we don't use LLVM for anything at runtime.

Our 64-bit vm works great, and we are pretty happy with LLVM with some caveats.

  1. LLVM is large and brings along with it a lot of stuff we didn't need
  2. For us, LLVM could not be used out of the box, we had to apply some patches to make it work for our purposes.
  3. Due to its SSA nature, LLVM IR is hard for a human to use. There are no temp variables, loops and so on, just basic blocks and transitions between them. Need a loop? Use a phi node. 'alloca' is available, but we tightly control the stack, and LLVM's mem2reg pass could not always eliminate the allocas. We started down the path of building our own API to abstract away the SSA, but it started to turn into our own DSL, which was becoming a burden.
  4. Even though we have been focused on a performant interpreter, we ran a separate effort to try using LLVM for jitting. It takes a long time to spin up native code at runtime, so we were not convinced that was the right approach, at least at that time.

We have just started getting acclimated to OMR, and JitBuilder looks like an ideal solution for implementing the kind of build-time code generation that we do, as well as a JIT. The only piece we think we need for the first of those is object file dumping.

Thank you again for taking the time. OMR is a fantastic project.

Very cool! Welcome, and thank you for your kind words! :)

In fact, I've been doing some simple LLVM comparative analysis lately and have been trying (struggling, really) to devote enough time to write a blog article with the data backing up my findings, which include some of the very points you mention above. Nice to know it wasn't just me :) .

I'll try to put in some time to think about the ELF story for JitBuilder code, since I think (after a few minutes' thought) it's the easiest path forward and sounds closer to what you really want in the end.

To drive it down one more level: you want to be able to write a JitBuilder program that requests a set of MethodBuilder objects to be compiled (like any of the code samples in jitbuilder/release/src), but, at the end, be able to write out the compiled code for those methods into an ELF object. You want to invoke that JitBuilder program as part of your build process.

Then, when your runtime initializes itself, you want to be able to load that ELF object with dlopen() and resolve those method entry points (using the same names the MethodBuilders specified) via dlsym() calls (and maybe invoke them via ThunkBuilders :) ) ?

One thing that might simplify it, as an initial prototype anyway, would be if those MethodBuilder objects generated self-contained code. That's probably not super practical for your purposes, but as a proof point/early milestone it might be possible.

While I'm asking questions, is a single ELF object acceptable, or would you need more flexibility than that?

I can't promise anything quickly, but I will definitely put some thought into it to see what I can come up with. Maybe prototype something, or come up with a plan to prototype something. Maybe after I finish that blog article, but we'll see... building prototypes is usually more entertaining than writing blogs, so you never know :) .

Hi Mark,

Thank you for your willingness to help; we at Instantiations appreciate it!

You are correct. The build-time generation of a special JitBuilder program, which we would feed with MethodBuilder instances, is exactly what we were thinking.

This program, let's call it 'ObjGen', would write out the compiled result into an object file, let's call it 'vmcore.o'. However, instead of linking it at runtime using dlopen/dlsym, we would link it statically, which is how LLVM fits into our vm today.
So the build process for the vm core shared library looks something like:

  1. Build ObjGen
  2. Run ObjGen to produce vmcore.o
  3. Compile the rest of the C source code
  4. Run the linker which statically links *.o into the vm shared library

We are absolutely OK with a self-contained initial prototype. I'm going to implement it using MethodBuilder objects, and I think we'll be able to show the power that OMR provides as a toolkit, even for use cases such as generating performant interpreters (well, performant for an interpreter anyway :) ).

A single ELF file is quite enough.
If ELF functionality gets too complicated to implement, then we would also be happy with assembly code listing, which could be compiled with gcc or ml.exe (MSVS assembler), if that somehow makes it easier.

Please reach out to me if you need any help with LLVM. Instantiations would be happy to provide you some vm examples that we use to help solidify your thinking for your blog article. You can contact me directly by e-mail (see my github profile).

Thanks again!

OK, the static-linking goal may make it a smidge simpler (avoiding GOT/PLT shenanigans to start with) and probably a touch faster, which is good for your use case. I did some research and thinking / dreaming on it over the weekend, but haven't made any significant progress. I'll continue to work on this one in my free moments, and will update back here if I get anywhere.

Thanks for motivating me to look harder at this topic (which comes up from time to time from different sources :) ).

Also, thanks for the offer of LLVM help! I'll send you an early draft of what I've done once it's ready. I think a collaboration would improve the article tremendously, since you've got real-world experience with the issues I'm just hitting via an experiment.

@amitin Sorry, this one slipped away from me for a while (even the LLVM blog got sidelined, I'm afraid, though I did manage to give a presentation via dwOpen that includes the data I collected, in case you or anyone else is interested).

Back to the issue at hand: last night, in a fit of insomnia, I finally finagled a working ELF object file as part of a prototype. I basically modified the "perfTool" option support to generate an object file rather than an executable, with an empty list of relocations. That let me reproduce the "Simple" code sample from JitBuilder (see jitbuilder/release/src/Simple.cpp) but save the code cache as an object file, which I could subsequently link against the main code of "Simple" stripped of any JitBuilder code, i.e. it just called an extern function named "increment", and it passed the same tests as the original Simple code sample.
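The C side of that experiment might look roughly like the sketch below. Only the name `increment` and its int32_t-to-int32_t shape come from the comment above; the `USE_JIT_OBJECT` macro, the stand-in definition and `check_increment` are hypothetical, added so the file also builds and runs without the dumped object, and the check assumes `increment` returns its argument plus one, as its name suggests.

```c
#include <stdint.h>

/* Defined in the object file dumped from the JIT code cache. */
extern int32_t increment(int32_t value);

#ifndef USE_JIT_OBJECT
/* Hypothetical stand-in so this file links on its own. Build with
   -DUSE_JIT_OBJECT and add the dumped object to the link line to use
   the generated code instead. */
int32_t increment(int32_t value) {
    return value + 1;
}
#endif

/* Call the function purely by name: no JitBuilder headers or runtime. */
int check_increment(int32_t input) {
    return increment(input) == input + 1;
}
```

Because the caller sees only an ordinary external symbol, the linker cannot tell whether `increment` came from a C compiler or from a JitBuilder code cache dump.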

So, I'll call that a proof of concept, I guess, but it's a pretty weak solution at the moment, so I'm not yet excited about inflicting it on anyone else. It only works for functions that take int32_t and return int32_t (because generating C++ mangled signatures from names is annoying); the generated code cannot reference any external symbols (because JitBuilder doesn't ask for names for global variables other than functions); and the resulting object file is in the neighbourhood of 16MB even though it only contains a function with 3 instructions (yay), because the default code cache size for JitBuilder is 16MB and code memory allocations can be made from "both" ends: one end for functions and the other end for trampolines. Also, the object file is currently output as a file called /tmp/perf-<pid>.jit :) .
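Why a 3-instruction function yields a ~16MB file can be seen with a toy model of a two-ended code cache: methods grow up from the low end, trampolines grow down from the high end, and dumping the whole reservation writes the empty middle too. `CodeCache` and its functions are a hypothetical sketch, not OMR's code cache implementation. One way to shrink the dump would be to emit only the two occupied ranges (adjusting any references between them).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-ended code cache: method bodies are carved from the
   low end, trampolines from the high end, and the gap between them is free. */
typedef struct {
    size_t capacity;   /* size of the whole reservation, e.g. 16 MB */
    size_t method_top; /* first free byte above the method region */
    size_t tramp_base; /* lowest byte used by the trampoline region */
} CodeCache;

void cache_init(CodeCache *c, size_t capacity) {
    c->capacity = capacity;
    c->method_top = 0;
    c->tramp_base = capacity;
}

/* Returns the offset of the new method body, or SIZE_MAX if the ends would meet. */
size_t alloc_method(CodeCache *c, size_t size) {
    if (c->tramp_base - c->method_top < size) return SIZE_MAX;
    size_t off = c->method_top;
    c->method_top += size;
    return off;
}

/* Returns the offset of the new trampoline, or SIZE_MAX if the ends would meet. */
size_t alloc_trampoline(CodeCache *c, size_t size) {
    if (c->tramp_base - c->method_top < size) return SIZE_MAX;
    c->tramp_base -= size;
    return c->tramp_base;
}

/* Bytes actually occupied: the two ends, not the whole capacity. */
size_t used_bytes(const CodeCache *c) {
    return c->method_top + (c->capacity - c->tramp_base);
}
```

With a 16MB cache holding one 12-byte method and one 8-byte trampoline, `used_bytes` reports 20 while a naive dump of `[0, capacity)` writes all 16MB.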

I'll try to make it a bit more realistic over the next little while, but wanted to give an update on my progress and let you know I hadn't (completely) dropped it.

An easy "fix" (let's call it a workaround) for the name mangling challenge is to use C linkage rather than C++. That puts the correctness onus onto the user (because the symbol is just the name of the function, so up to you to get the signature right), but means I can hook up any kind of function now.
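To see what C linkage sidesteps: with C linkage the symbol for `increment` is just `increment`, whereas GCC and Clang on Linux follow the Itanium C++ ABI and emit `_Z9incrementi` for `int32_t increment(int32_t)` (int32_t being `int` on those targets). The helper below is an illustrative sketch that only handles free functions outside any namespace whose parameters are builtin types; namespaces, classes, pointers, templates and substitutions are where it gets annoying.

```c
#include <stdio.h>
#include <string.h>

/* Builds the Itanium C++ ABI mangled name for a free function outside any
   namespace whose parameters are all builtin types, passed as ABI type
   codes ('i' = int, 'l' = long, 'd' = double). An empty parameter list is
   encoded as 'v'. The return type is not part of the encoding for
   non-template functions. Returns snprintf's result. */
int mangle_simple(char *out, size_t out_size,
                  const char *name, const char *param_codes) {
    const char *params = (param_codes[0] != '\0') ? param_codes : "v";
    return snprintf(out, out_size, "_Z%zu%s%s", strlen(name), name, params);
}
```

Under C linkage none of this is needed, which is why the onus shifts to the user to declare the correct signature on the calling side.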

Wow, that's really cool! Using C linkage is quite enough. Thanks a lot!

Update: I've made some modifications to the previous prototype that allow C code to link statically with the test_calls function from the call example, and have its references to doublesum properly relocated to a symbol provided at link time.
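Calls now work in both directions: C calls into the generated `test_calls`, and the generated code calls back out to `doublesum`, whose undefined reference in the dumped object is resolved by the static linker. The sketch below shows the C side; the signatures of both functions, the `USE_JIT_OBJECT` macro and the stand-in body of `test_calls` are assumptions for illustration, not taken from the call example.

```c
/* Called *from* the generated code: the dumped object carries an undefined
   reference to `doublesum` that the linker binds to this definition.
   The signature is an assumption. */
double doublesum(double a, double b) {
    return a + b;
}

/* Defined *in* the generated code; the signature is an assumption. */
extern double test_calls(double x);

#ifndef USE_JIT_OBJECT
/* Hypothetical stand-in mimicking generated code that calls back into C,
   so this file links without the dumped object. */
double test_calls(double x) {
    return doublesum(x, x);
}
#endif
```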

@amitin please see Luc's progress reported above :) .

@lmaisons can you please include a link to your prototype once it's clean enough for others to look at? I think being able to support calls in both directions is probably enough for @amitin to make some initial progress, even if performance won't be immediately awesome.

Provided what you've done does not break the existing support we have for the perf tool, I would be willing to merge a relatively clean prototype so we can start ratcheting this support forwards.

> once it's clean enough for others to look at

I'm not sure I'm a great judge of that. The code is here: https://github.com/lmaisons/omr/tree/elf-shared-object

Commands for building the call example statically:

    make -f run_configure.mk SPEC=linux_x86-64 OMRGLUE=./example/glue
    cd jitbuilder/
    make -j$(grep -c '^processor' /proc/cpuinfo)
    cd release/
    make call
    rm -f /tmp/perf-*.jit
    TR_Options='perfTool' ./call
    gcc -c useCall.c
    gcc -o useCall useCall.o /tmp/perf-*.jit
    ./useCall

> Provided what you've done does not break the existing support we have for the perf tool

I'm pretty sure that got hosed as part of making this, and will require some back-tracking to mend the original perf functionality.

Update: now have a version that shouldn't collide with the existing perf infrastructure at https://github.com/lmaisons/omr/tree/elf-relocatable-object

An initial prototype for this was merged as part of #1612
