There are a couple of potential problems in this section of the X/codegen's bitpermuteEvaluator:
https://github.com/eclipse/omr/blob/b9d1250dea50a28faf03024394ff36ed99c4c2d1/compiler/x/codegen/OMRTreeEvaluator.cpp#L4866-L4906
shiftDependencies->addPreCondition(indexReg, TR::RealRegister::ecx, cg); --> is a preCondition required?valueReg and tmpReg are used inside the ICF, but are not part of TR::RegisterDependencyConditions* deps. Tagging @0dvictor and @fjeremic for further input.
The pre condition should not be necessary. We can save some space, compile time, and code path by avoiding this.
This is an actual bug. If register pressure is high enough around this internal control flow (ICF) region tmpReg and/or valueReg may be spilled onto the stack when first encoutnered. Since their first occurrence during register assignment is within the ICF region we would have a spill there and as such if at runtime we were to branch around the ICF region we would have inconsistent register state. All occurrences of virtual registers should appear on a post register dependency of any ICF region merge label.
Both are common practices since long time ago in X CodeGen. @andrewcraik and @0xdaryl may know better.
Both are common practices since long time ago in X CodeGen.
If so then there are likely multiple hidden bugs waiting to occur. In my opinion just because there is precedence for specific idioms in the code it doesn't mean they're bug-free nor that we should continue replicating them. We should try and be forward thinking in improving such areas of the code base.
@fjeremic is correct on both accounts.
1) It does describe the state the register must be in just prior to executing the instruction, but the precondition isn't necessary based on the way the current x86 local register assigner works.
2) Unless there is an outer register dependency that will already force assignment at a merge point (and there's no guarantee of one in the code in question) then this code is subtly incorrect. I'm not sure what is meant by "common practices" because this kind of code has never been correct since the introduction of internal control flow many years ago. If there are other examples of this in the code base then they should be found and fixed because they are subtle bugs waiting to happen (I suggest an issue to track this, and for good measure it should encompass all code generators).
Thanks for asking these questions @dchopra001 !
@dchopra001 given your recent experience in the area of the TR::xbitpermute evaluators do you want to take a crack at fixing this one?
That's a good idea! I can work on this.
Using registers not part of ICF's dependencies is seen in multiple places across OMR and OpenJ9. If it is believed incorrect, I'll create an Issue to track those.
For pre-condition, doesn't it mean a register has to satisfy such conditions before the instruction? In the case of SHLRegCL GPR0, GPR1, the instruction reads the value from GPR1, then GPR1 should be set to CL before the instruction. Shouldn't a pre-condition be required?
For pre-condition, doesn't it mean a register has to satisfy such conditions before the instruction? In the case of SHLRegCL GPR0, GPR1, the instruction reads the value from GPR1, then GPR1 should be set to CL before the instruction. Shouldn't a pre-condition be required?
When a register is assigned as part of an instruction it is marked as blocked:
for the remainder of the assignment of the current instruction. This means a pre-codition is not required assuming a backward register assignment pass which all codegens are doing.
Pre-conditions are typically only used for call linkage. The register assigner will obey the pre-codition but it will not update the register state. For calls, typically the arguments will be listed as pre-conditions. The arguments are also typically volatile registers so they will get killed by the call, which means the same volatile registers will also appear as dummy conditions on the post-conditions. So things look like this (in pseudo-codegen):
PRE: {VR_001 : R1}, {VR_002 : R2}
icall <foo>
PRE: {VR_001 : R1}, {VR_002 : R2}, ...
It is common however for the same arguments to be passed to multiple consecutive calls, such as:
PRE: {VR_001 : R1}, {VR_002 : R2}
icall <bar>
POST: {VR_001 : Dummy}, {VR_002 : Dummy}, ... {all volatile registers}
...
PRE: {VR_001 : R1}, {VR_002 : R2}
icall <foo>
POST: {VR_001 : Dummy}, {VR_002 : Dummy}, ... {all volatile registers}
If pre-conditions were treated equally as post-conditions on the previous instruction you would needlessly end up shuffling from preserved to volatile registers at each call! That is, the post-dependencies at icall <foo> would shuffle the argument registers to preserved register, then the pre-dependencies would shuffle them back to volatile register, then post-dependencies at icall <bar> would shuffle them again to preserved registers, etc.
The reason pre-conditions behave differently is because of this special fact. Pre-conditions will not update the register state but will still emit the shuffle instruction, i.e. satisfying the pre-condition at icall <foo> will emit a load to shuffle VR_001 into R1 but it will not update the register state. This means VR_001 would still be associated with a preserved register (since the dummy register dependnecies in the post condition have kicked anything out of the volatile registers). This means when we get to the post-condition of icall <bar> we no longer have to shuffle from a volatile to preserved register thus avoiding unnecessary register shuffling.
Most helpful comment
Using registers not part of ICF's dependencies is seen in multiple places across OMR and OpenJ9. If it is believed incorrect, I'll create an Issue to track those.