The following is a proposed template for the information a contributor must provide when proposing creating a new Eclipse OMR compiler IL opcode.
Ultimately, I would like to create a GitHub "issue template" for creating a new compiler IL opcode that a contributor can select when creating an OMR issue that prompts them to provide the minimally required information. However, before I create an issue template I'd like to be sure we're in agreement on what the contents of the request form should be. Ideally, we'd have some form of TR IL specification in place to have some standard to measure against, but that's not likely to appear for a few weeks yet.
I'd appreciate your input. At this point I'm more interested in content rather than how it's presented.
FYI: @vijaysun-omr @andrewcraik @jdmpapin @mstoodle @Leonardo2718 @fjeremic @ymanton @gita-omr @mpirvu @joransiu
Provide the name for a single IL opcode, or list each opcode in a family of IL opcodes
Describe the semantics, structure, and operation of the new IL opcode. You must include:
Will this opcode be useful in all projects that consume OMR? If not, please justify.
Will this opcode be useful on all code generator backends in OMR? If not, please justify.
Describe any restrictions on the use of this IL opcode. For example, it must only appear as a treetop, it must not be commoned, etc.
Describe any implementation dependencies this IL opcode incurs. For example, does this IL opcode:
Describe how this proposed IL opcode will be tested in a project-independent manner that can be automated in Eclipse OMR CI builds. If Tril unit tests cannot be provided, please justify and propose an alternative.
Describe the checks that will be added to the IL validator to test this new IL opcode.
In the "Scope" section we may also want to ask a question such as:
"How will this opcode be handled by the optimizer ? i.e. is there a reasonable enough chance of optimization occurring to warrant creating an IL opcode rather than use some kind of intrinsic ?"
Maybe any "node flags" usage for the new IL opcode should also be called out explicitly in the "Operation" section since that is such a tricky area generally.
"How will this opcode be handled by the optimizer ? i.e. is there a reasonable enough chance of optimization occurring to warrant creating an IL opcode rather than use some kind of intrinsic ?"
I was trying to poke at that another way with the "require special support to be implemented in one or more optimizations before it can be used" question. I think it can be said clearer though and incorporate more of your point.
Maybe any "node flags" usage for the new IL opcode should also be called out explicitly in the "Operation" section since that is such a tricky area generally.
Agreed.
In operation we may want to specifically address anchoring semantics - eg is it a treetop, is it like a call or is it free floating.
Thanks for pulling this together, @0xdaryl ! What you have is a very good start!
"justify" sounds a bit, I don't know, confrontational? Maybe the more wordy "Please explain why a new opcode is appropriate".
Maybe have a section describing other options that were considered and why a new opcode is better than those other options? It doesn't have to be a full academic paper, but at least outlining the options and pros/cons in point form generally helps to seed the discussion around a new opcode (though can also frame it).
Other properties to explicitly call out might be "does it introduce control flow" , "does it end a basic block", "how many results does it produce (hopefully one)".
How about a provocative: "Does this opcode have the same basic feel as other compiler opcodes, or does it behave differently in any significant ways?"
Yes Mark, I also thought that a "most similar opcode" like question would be useful.
Thanks @0xdaryl for the initial take on this! Loving the initiative.
Describe any restrictions on the use of this IL opcode. For example, it must only appear as a treetop, it must not be commoned, etc.
@vijaysun-omr @andrewcraik is "must not be commoned" even a notion? I can't off top my head think of a result-producing IL which cannot be commoned. Do we have precedent for this?
Other things to possibly consider adding:
@fjeremic "must not be commoned" as a notion is an interesting question.
Obviously there are opcodes that must not be commoned in our existing set, e.g. calls (under most circumstances when we are not dealing with a "pure" call) and new allocation opcodes are both result producing but cannot be commoned since they have side-effects. These are not commoned by local CSE and PRE (our two optimization passes that do most of our local and global commoning respectively). There are other examples such as commoning of internal pointers across a GC point in a language where GC cannot deal with such pointers.
Having said that, apart from side effecting IL opcodes, I do feel that a majority of the time when someone talks about forbidding commoning at the IL opcode level, it is a bit of a red flag that there may be something not quite right with the design. Commoning a node is a very fundamental operation that enables other optimizations to keep their implementations simpler by relying on node identity rather than doing some kind of more complex match to determine if two nodes are the same (many of our local opts rely on this for optimal performance, not functional correctness).
@fjeremic, to clarify just a bit further, the main restrictions @vijaysun-omr mentioned above have to do with cases where a second identical subexpression isn't necessarily redundant. Our terminology is a bit loose, however, and we often (_commonly_ ;) say that any node with multiple parents is "commoned" into those parents. In this broader sense, any node with a result can _be_ commoned, but there are certain ways in which it might not be allowed to _get_ commoned. For example, localCSE also copy propagates within extended blocks, which is allowed to result in commoning of even call or allocation nodes with no issue. It might be clearer if we rephrase this restriction, e.g. "distinct nodes with this opcode must not be commoned with each other," or perhaps "this opcode is not idempotent in general."
A few thoughts:
In the Operation section:
- data type and semantics of each child
I'm a bit confused by the "semantics of each child" part. Is "semantics" referring to the of operations children are allowed to perform, the properties or other restrictions that children must follow, or properties that are imposed on the children by virtue of having the opcode in question as parent (or something else)?
In the Scope section:
- Will this opcode be useful in all projects that consume OMR? If not, please justify.
I can imagine cases where an opcode is useful to many projects that comsume OMR, but not all. Can the language be changed to be a little less absolute, for example: will this opcode be generally useful to projects that consume OMR? Or, how will this opcode be useful to projects that consume OMR?
For the IL Validation section, an undocumented goal of the validator is for it to require minimal special casing. Instead, it tries to rely on data tables to get information about what is considered valid IL. There are exceptions, such as the GlRegDeps opcode that behaves very differently from any other opcode. As such, making modifications to the validator should only be a consideration when a property or characteristic that can be validated is added or changed. Otherwise, modifying the validator to handle a new opcode should only be required in very exceptional cases and should be an indication that very careful consideration should be given to the addition of the opcode.
As a last remark, I think there should be a section stating the purpose of adding or the need for the new opcode, roughly answering the question: this opcode didn't exist before, why is it needed now?
I think that by "semantics of each child" it means that the meaning and use of each child operand should be described so we know what each child is for, along with any assumptions or requirements on the children.
Big +1 on the "why now?" point :) .
Most helpful comment
Maybe any "node flags" usage for the new IL opcode should also be called out explicitly in the "Operation" section since that is such a tricky area generally.