Omr: Define JitBuilder client API

Created on 27 Mar 2018  路  23Comments  路  Source: eclipse/omr

One of the high level goals for the JitBuilder library is to enable different language bindings for the JitBuilder API. Right now, that effort is sabotaged a bit because the public JitBuilder API is mixed together with the JitBuilder implementation classes public APIs. We should really separate the implementation from the client API.

I have been working on this for a while under another umbrella item (earlier prototypes have been discussed a bit in the context of #1819), and will submit a PR soon. The solution I've come down on creates a parallel hierarchy of "client" objects that define the API independently of the "implementation" objects that implement the API. The current set of TR::IlBuilder class and friends basically become the "implementation" classes, with each implementation object having a pointer to its corresponding client object (and vice versa). The implementation classes also define functions that other implementation classes call.

So, clients will create a (for example) MethodBuilder client object which automatically creates an implementation TR::MethodBuilder object and cross links them so that anytime a service is called on the client object, it can be forwarded to the appropriate implementation object (and vice versa for callbacks like IlBuilder::buildIL() ). You can think of the client object as a shim to call services against the implementation object.

Although it feels a bit cumbersome to have a parallel set of classes, I have a strategy to make it more manageable :) . That's not the focus of this issue (link to be added to relevant issue discussing that strategy).

There will be some breaking fallout from this kind of change, so I wanted to open this issue so we can discuss that fallout.

JitBuilder classes are currently OMR compiler classes, which means you use the class name as mapped into the TR namespace (e.g. TR::IlBuilder, TR::MethodBuilder). After the client objects are created, clients will work with those classes which are not OMR compiler classes. That means we can put them anywhere we like (except into the TR namespace :) ).

I see a few options:

  1. Put them into the global namespace
  2. Put them into a long namespace like JitBuilder:: or jitbuilder::
  3. Put them into an abbreviated namespace like JB::

I don't think 1 is the best idea. It's easy, but it just doesn't feel responsible or like it's good naming hygiene. 2 feels most satisfying to me, but makes code look a bit more "wordy" unless you rely on "using". 3 is just another two-letter acronym so the code will look more or less similar to what it is now, but "JB" could be a lot of different things and doesn't "read" well to my eye.

If I had to decide myself, I would probably come down on JitBuilder:: or jitbuilder:: because I know not everyone likes those pesky shift keys, but I thought I'd open it up to people who are using JitBuilder to place a vote or even to suggest an alternative I haven't thought of.

Another possible change we could make is to the names of the services themselves. When I created JitBuilder, I had already recognized the dual public API problem and therefore chose capitalized service names so that clients could distinguish the services that were supposed to be using from the "internal" services that also needed to have C++ public access. But, I have to confess, capitalized instance functions look really wonky to me :( .

Once the client API is isolated into the client classes, this distinction is no longer required, and so the names could be changed to possibly more palatable uncapitalized names (like add, sub, mul, etc). It wouldn't completely work (Goto, for example, cannot be straight-forwardly converted to "goto") but for most services, the conversion would be pretty easy.

This change would be a lot more painful on existing clients, though. Despite my unhappiness with capitalized names, be default I would probably keep them as is to preserve compatibility.

Does anyone have an opinion on whether this change is worth it?

Since we're going through the motions, are there any other cosmetic changes people are looking for in the client JitBuilder API?

@jduimovich @charliegracie @Leonardo2718 @0xdaryl @DanHeidinga @amitin

jitbuilder

Most helpful comment

I have produced a prototype for how the JitBuilder client API could be defined in my fork of OMR:

This prototype is not complete, but it shows how the client API could be defined by a file (see https://github.com/mstoodle/omr/blob/jitbuilder_api/jitbuilder/client/JitBuilderAPI.cpp) and then processed to generate a set of client objects that are backed by the actual implementation objects. In this model, clients only need include a file called JitBuilder.hpp which is generated into this directory: https://github.com/mstoodle/omr/tree/jitbuilder_api/jitbuilder/release/cpp/include (for the cpp API). The full C++ JitBuilder API is built into the OMR::JitBuilder namespace as we discussed above.

One nicety: there are no longer any OMR compiler header files needed to write JitBuilder clients, which simplifies the build process quite a bit (yay!) and is a first leg up towards being able to do compiles out-of-process.

I haven't done the cmake work yet, but just typing make from the jitbuilder directory works well for me on Macos. It builds jitbuilder/release/cpp/libjitbuilder.a with jitbuilder/release/cpp/include/JitBuilder.hpp (et al.) for the C++ API bindings. I actually began initially on a C API, but haven't touched it in a while so C++ is the one to focus on for now. That C API will eventually be built into jitbuilder/release/c (with its own libjitbuilder.a and include directory) and other directories for other language bindings. It doesn't have to work this way, but it was the simplest way to keep things straight as a starting point.

To vet the proof of concept implementation, I rewrote all the C++ samples to use this new client API, which only required a few small changes for most tests. You can find them in https://github.com/mstoodle/omr/tree/jitbuilder_api/jitbuilder/release/cpp/samples and the Makefile in the .../cpp directory can build them all just like in the original directory. For example, make testall will build and run almost all of the test samples (including Mandelbrot).

I can't say I'm 100% proud of the client API generator for C++ in jitbuilder/client but it does the trick. I expect we can improve on it dramatically going forward, so please just consider this a starting point to iterate from.

I'll clean up the commits a bit, since the branch is a bit of a mess right now (I wanted to get it "published" so people could take a look at it asap), and work to get it submitted as a PR that doesn't break the existing API. Then we can work to migrate the jitbuilder based testing over to use the new client C++ API. In parallel because it's the most likely enabler for most other language API bindings, I'll revisit the C API generator (starts from jitbuilder/client/C_Binding.[ch]pp for the intrepid) and convert the samples to C to verify that can work.

All 23 comments

I would like to suggest the extra long OMR::JitBuilder::* namespace (if that wasn't already the intention).

For public methods, I support the change to from PascalCase to lowerCamelCase. It might be worth pointing out: we'll have to avoid _any_ keyword in _any_ language, not just C. It could be worthwhile to prefix method names with emit or build. (EG emitAdd, emitGoTo).

I tend to lean towards OMR::JitBuilder::* as well.

I vote for OMR::JitBuilder::*.

My vote is for OMR::JitBuilder::.

I'm also in favour of using a prefix like emit or build for public methods (I have a slight preference for emit).

One question I have: Will (or perhaps should) the IlInjector API still be accessible?

IlInjector is more tightly tied to compiler IL, so I'm imagining IlBuilder is the base class for the client API. If we need to pull stuff up, we can do that, but it cannot pull any compiler IL with it.

I confess I am not fond of "emit" on every service. First, it's 4 extra characters on everything. If everything has "emit" does it really serve a purpose? Plus, "emit" (to my mind) has a flavour of code generation (i.e. the compiler step that happens after optimization). You're supposed to be describing the operations you want to perform, not the operations you want the compiler to specifically "emit". Maybe I'm being pedantic, though :( .

I had not been thinking OMR::JitBuilder::* for the simple reason that I'm not sure the extra nesting will always make sense in other languages (pre-shadowing my webcast later today, I suppose).

For example, C has no namespaces. In Java and other languages, putting all the code into a package/module makes sense (something called JitBuilder, maybe :) ) but I'm not sure there's extra value in the "OMR" being there (OMR_JitBuilder?) other than to highlight where JitBuilder came from. Or maybe people are imagining things other than JitBuilder will show up there at some point?

Thanks for the feedback so far, please keep it coming!

-1 on the emit* or build* on the API names as it seems like overkill to me. That being said I would not object to the extra character typing if that is the consensus (lol think of the extra calories I would burn everyday :) )

I had not been thinking OMR::JitBuilder::* for the simple reason that I'm not sure the extra nesting will always make sense in other languages (pre-shadowing my webcast later today, I suppose).

I missed your webcast so please ignore if this was already covered.

It might be beneficial to work through some language examples for the extra level of nesting to see how it may work in other languages.

Language | Namespacing
------------ | -------------
C++ | OMR::JitBuilder::*
java | package omr.jitbuilder.*
c | OMRJITBuilder or maybe OMR_JITBuilder
Ruby | OMR::JitBuilder with gem name omr-jitbuilder?
Python | omr.jitbuilder in package omr with module jitbuilder?
Lua | omr.jitbuilder where omr and jitbuilder are modules

@mstoodle / others - please feel free to edit this comment to add more language examples or correct the ones here.

The OMR::JitBuilder:: namespace looks good and conceptually seems "right" (especially if everything else is using OMR:: as a starting namespace) but I expect I'll be annoyed typing it over and over. I'll either default to using OMR; or complain a lot =)

This change would be a lot more painful on existing clients, though. Despite my unhappiness with capitalized names, be default I would probably keep them as is to preserve compatibility.

I like lowercase too. Now seems like the right time to do make this change. Kind of now or never given the small set of users and the work their going to have to do once the namespace changes anyway.

I think the Java package would probably end up as org.eclipse.omr.jitbuilder which is a mouthful but shouldn't have to be typed too many times. The 'omr' does make sense to me there, so maybe I'm starting to see the light on that one...

I added some off-the-top-of-my-head thinking for Ruby and Python to Dan's comment (thanks for kicking that off, @DanHeidinga !)

I can't edit @DanHeidinga's comments so this will have to do:

| Language | Namespacing |
| ------|-------------------------------------------------------------------|
| Lua | omr.jitbuilder where omr and jitbuilder are modules |

I updated it for you, @Leonardo2718 . Thanks for the input!

I have produced a prototype for how the JitBuilder client API could be defined in my fork of OMR:

This prototype is not complete, but it shows how the client API could be defined by a file (see https://github.com/mstoodle/omr/blob/jitbuilder_api/jitbuilder/client/JitBuilderAPI.cpp) and then processed to generate a set of client objects that are backed by the actual implementation objects. In this model, clients only need include a file called JitBuilder.hpp which is generated into this directory: https://github.com/mstoodle/omr/tree/jitbuilder_api/jitbuilder/release/cpp/include (for the cpp API). The full C++ JitBuilder API is built into the OMR::JitBuilder namespace as we discussed above.

One nicety: there are no longer any OMR compiler header files needed to write JitBuilder clients, which simplifies the build process quite a bit (yay!) and is a first leg up towards being able to do compiles out-of-process.

I haven't done the cmake work yet, but just typing make from the jitbuilder directory works well for me on Macos. It builds jitbuilder/release/cpp/libjitbuilder.a with jitbuilder/release/cpp/include/JitBuilder.hpp (et al.) for the C++ API bindings. I actually began initially on a C API, but haven't touched it in a while so C++ is the one to focus on for now. That C API will eventually be built into jitbuilder/release/c (with its own libjitbuilder.a and include directory) and other directories for other language bindings. It doesn't have to work this way, but it was the simplest way to keep things straight as a starting point.

To vet the proof of concept implementation, I rewrote all the C++ samples to use this new client API, which only required a few small changes for most tests. You can find them in https://github.com/mstoodle/omr/tree/jitbuilder_api/jitbuilder/release/cpp/samples and the Makefile in the .../cpp directory can build them all just like in the original directory. For example, make testall will build and run almost all of the test samples (including Mandelbrot).

I can't say I'm 100% proud of the client API generator for C++ in jitbuilder/client but it does the trick. I expect we can improve on it dramatically going forward, so please just consider this a starting point to iterate from.

I'll clean up the commits a bit, since the branch is a bit of a mess right now (I wanted to get it "published" so people could take a look at it asap), and work to get it submitted as a PR that doesn't break the existing API. Then we can work to migrate the jitbuilder based testing over to use the new client C++ API. In parallel because it's the most likely enabler for most other language API bindings, I'll revisit the C API generator (starts from jitbuilder/client/C_Binding.[ch]pp for the intrepid) and convert the samples to C to verify that can work.

Very cool! I think this is an excellent start @mstoodle :slightly_smiling_face:

Thanks for the positive feedback, @Leonardo2718 !

Incidentally, I think the "C" namespacing listed above is misleading "OMRJITBuilder" or "OMR_JITBuilder" ... C doesn't have namespaces and I'm not really interested to add such a long string prefix to every function call.

Instead, what I was imagining (and you can see in the prototype I linked to above if you look for it) is that every function will be prefixed by a "short name" of the JitBuilder class.

So IlBuilder functions are called IB_Add, IB_Sub, etc. and take an explicit IlBuilder * parameter.

BytecodeBuilder functions are called BB_Add, BB_Sub, etc. and take an explicit BytecodeBuilder * parameter.

TypeDictionary functions are called TD_DefineStruct, etc. and take an explicit TypeDictionary * pointer.

Thoughts on the C-specific naming notions for the client API?

Let's take discussion on this particular topic to #2049 since that issue is specific to the C API for JitBuilder...

Some discussion in #2049 and on the dev list has raised an interesting possible evolution for the JitBuilder client API towards less of a "callback" approach (i.e. compileMethodBuilder(myMethodBuilder, ... ) ends up calling myMethodBuilder->buildIL() to inject the desired IL into the compilation process) and more of an "directed" approach (call services on myMethodBuilder whenever you want and, when you're done, call compileMethodBuilder(myMethodBuilder, ...) to have it compiled to native code.

Since I created the API with the first approach in mind, the first one makes more sense to me overall though I think both approaches should be work-able, accepting certain trade-offs.

An important consideration is how one imagines inlining will work. I don't think the caller of compileMethodBuilder should be expected to anticipate all the callees that should/could be inlined, so I think there will always have to be a callback like RequestFunction around to produce the IL for an inlineable method candidate. That means callbacks cannot be avoided.

I can see advantages in both approaches.

For example, if the builder services have already been called on the MethodBuilder object before it comes in, there is at least a nominal notion of the "size" of a method, where-as it's more difficult to answer that question in the "callback" based approach (other than by adding another callback asking the MethodBuilder object to predict the "size").

On the other hand, if the IL is only built when buildIL() is called, some memory and time efficiency can be gained since IL will only ever be generated when it's truly needed and no intermediate format is required prior to the compileMethodBuilder call to "remember" what IL is needed.

One of the worst down sides of changing the API would be that managing memory for the incoming IL versus the new IL allocated by the JIT during compilation potentially gets more complicated (particularly if it's actual IL being allocated ahead of the compilation). In the "best" case, I think there is another intermediate format used to record JitBuilder calls that are then replayed during compileMethodBuilder to create actual IL, which brings both memory and CPU overheads. Memory management for the JIT is already a more complicated subject than anyone wants to know, and nobody seems to appreciate the JIT taking more time than absolutely necessary.

There is another drawback in that IL tends to be as good as the most current information; it's conceivable that pre-built IL would not be as good / appropriate as built-on-demand IL, though I think I can see some ways around this kind of issue too.

Are there other pro's / con's that I haven't thought of yet?

In theory, it's possible (and possibly defensible because C is procedural) for the C API to have a different "model" than other languages. But since C tends to be the easiest language to use to build other language bindings to a native library like JitBuilder, I expect whatever model is used for C will end up becoming the model that other language bindings will most naturally inherit. So I think it's an important question to consider.

Thoughts? @charliegracie @Leonardo2718 @jduimovich @0xdaryl @xliang6 @ymanton @rwy0717 @youngar @amitin @dibyendumajumdar

Hi @mstoodle

Perhaps both options can coexist. Correct me if I am wrong here:

It seems that what happens before buildIL() is some prep stuff. Maybe that can be done in the MethodBuilder constructor (if an optional parameter is set?) and a flag set to say its done so that it isn't repeated. Then client can simply use MethodBuilder (which is an ILBuilder anyway) to add instructions. Perhaps at the end in the compileMethodBuilder call an extra argument can be supplied to say 'we are all done - don't bother calling buildIL()'. Or a method can be provided by MethodBuilder to say we are done generating IL. Or the client can simply return true in buildIL().

I think that this can be the basis of a much nicer C api - and dare I say, nicer api for other languages too.

The other aspect of the api is to not expose any structures - just use opaque pointers that are cast to/from internal C++ pointers. This will give you maximum flexibility in future as you evolve internal structures.

p.s. I should add that I feel C+ clients should have the benefit of the full api in all its gory details. That is consider them "power users".

In an effort to move this forward, I have taken the work started by @mstoodle, improved on it, and opened #2915.

@mstoodle

Given that #2968 and #2915 have now been merged, should we consider closing this issue? We can create new issues for any further discussions about the evolution of Client APIs.

Yes, I think that's reasonable. Thanks for all your effort, @Leonardo2718 !

Was this page helpful?
0 / 5 - 0 ratings

Related issues

0xdaryl picture 0xdaryl  路  5Comments

0xdaryl picture 0xdaryl  路  6Comments

aviansie-ben picture aviansie-ben  路  6Comments

0xdaryl picture 0xdaryl  路  5Comments

0xdaryl picture 0xdaryl  路  5Comments