While running an internal test a JUnit exception is thrown with an assertion that a NullPointerException was not thrown when it was expected to be. I've reduced the problem down to a simple unit test as follows which exposes the bug on Z and Power:
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.spi.AbstractSelectionKey;
class Test
{
private static class FakeSelectionKey extends AbstractSelectionKey
{
FakeSelectionKey()
{
super();
}
public SelectableChannel channel()
{
return null;
}
public Selector selector()
{
return null;
}
public int interestOps()
{
return 0;
}
public SelectionKey interestOps(int arg0)
{
return null;
}
public int readyOps()
{
return 0;
}
}
public static void main(String[] args)
{
for (int i = 0; i < Integer.parseInt(args[0]); ++i)
{
foo();
}
}
public static void foo()
{
FakeSelectionKey fakeKey = new FakeSelectionKey();
try
{
fakeKey.cancel();
System.out.println("Error: Did not throw NullPointerException");
System.exit(1);
}
catch (NullPointerException e)
{
// Passed!
}
}
}
You can run the unit test via:
java -Xcompressedrefs -Xjit:'count=1,disableAsyncCompilation,limit={java/nio/channels/spi/AbstractSelectionKey.cancel()V}' -Xgcpolicy:gencon Test 1000000000
However it will not reproduce with only the command above. The reason is because we need nextGenHCR [1] to kick in but it is currently disabled at startup because of performance issues. Because this is a unit test we end up defaulting back to the old HCR which has an extra call in the cold path which prevents the bug from being exposed. To fix this do either of the following two:
export TR_DisableHCR=1 then re-run the test[1] https://github.com/eclipse/openj9/blob/master/doc/compiler/hcr/NextGenHCR.md
[2] https://github.com/eclipse/openj9/blob/28dd8748bf66a51c030949ea93286e3794478662/runtime/compiler/control/CompilationThread.cpp#L7376
@fjeremic @r30shah We checked the log files and found out that the first child of checkcastAndNullCHK would be optimized to aconst NULL in some cases. And then we traced the checkcastAndNullCHK evaluator to the function calculateInstanceOfOrCheckCastSequences which decides if the NullTest should be performed. We found that for the aconst NULL case, the code doesn’t decide to do the NullTest. In that case, we added two lines of codes to add the NullTest if the current op code is checkcastAndNullCHK and the first child is “Null” (the first child is “Null” when the isNull() returns true). After the code change, it will throw the NullPointerException in the NullTest when the first child is Null.
Following is the trace log for checkcastAndNULLCHK before the code change, where the first child is aconst NULL and the instructions below don't do the NULL check, therefore not throwing the NullPointerException.
============================================================
; Live regs: GPR=0 FPR=0 VRF=0 {}
------------------------------
n43n ( 0) checkcastAndNULLCHK [#84] [ 0x3ff71637220] bci=[-1,23,89] rc=0 vc=903 vn=- li=- udi=- nc=2
n167n ( 1) aconst NULL (X==0 X>=0 sharedMemory ) [ 0x3ff716b43f0] bci=[-1,20,89] rc=1 vc=903 vn=- li=- udi=- nc=0 flg=0x102
n42n ( 1) loadaddr java/nio/channels/spi/AbstractSelector[#366 Static] [flags 0x18307 0x0 ] [ 0x3ff716371d0] bci=[-1,23,89] rc=1 vc=903 vn=- li=- udi=- nc=0
------------------------------
checkcastAndNULLCHK:Maximum Profiled Classes = 3
Outline Super Class Test: 0
------------------------------
n43n ( 0) checkcastAndNULLCHK [#84] [ 0x3ff71637220] bci=[-1,23,89] rc=0 vc=903 vn=- li=- udi=- nc=2
n167n ( 0) aconst NULL (in GPR_0242) (X==0 X>=0 sharedMemory ) [ 0x3ff716b43f0] bci=[-1,20,89] rc=0 vc=903 vn=- li=- udi=- nc=0 flg=0x102
n42n ( 0) loadaddr java/nio/channels/spi/AbstractSelector[#366 Static] [flags 0x18307 0x0 ] [ 0x3ff716371d0] bci=[-1,23,89] rc=0 vc=903 vn=- li=- udi=- nc=0
------------------------------
[ 0x3ff71887b80] LGHI GPR_0242,0x0
[ 0x3ff71888400] ASSOCREGS
[ 0x3ff71887e30] Label L0064:
POST:
{AssignAny:GPR_0242:R}
After the code change, we add the code (930adf0627b3b38697b029047ec69c25510280d1) to take the NullTest in this case. Following is the trace log after code change:
============================================================
; Live regs: GPR=0 FPR=0 VRF=0 {}
------------------------------
n43n ( 0) checkcastAndNULLCHK [#84] [ 0x3ffb4337220] bci=[-1,23,89] rc=0 vc=903 vn=- li=- udi=- nc=2
n167n ( 1) aconst NULL (X==0 X>=0 sharedMemory ) [ 0x3ffb43b43f0] bci=[-1,20,89] rc=1 vc=903 vn=- li=- udi=- nc=0 flg=0x102
n42n ( 1) loadaddr java/nio/channels/spi/AbstractSelector[#366 Static] [flags 0x18307 0x0 ] [ 0x3ffb43371d0] bci=[-1,23,89] rc=1 vc=903 vn=- li=- udi=- nc=0
------------------------------
checkcastAndNULLCHK:Maximum Profiled Classes = 3
Outline Super Class Test: 0
checkcastAndNULLCHK: Emitting NullTest
------------------------------
n43n ( 0) checkcastAndNULLCHK [#84] [ 0x3ffb4337220] bci=[-1,23,89] rc=0 vc=903 vn=- li=- udi=- nc=2
n167n ( 0) aconst NULL (in GPR_0242) (X==0 X>=0 sharedMemory ) [ 0x3ffb43b43f0] bci=[-1,20,89] rc=0 vc=903 vn=- li=- udi=- nc=0 flg=0x102
n42n ( 0) loadaddr java/nio/channels/spi/AbstractSelector[#366 Static] [flags 0x18307 0x0 ] [ 0x3ffb43371d0] bci=[-1,23,89] rc=0 vc=903 vn=- li=- udi=- nc=0
------------------------------
[ 0x3ffb4587b80] LGHI GPR_0242,0x0
[ 0x3ffb4587c80] CGIT GPR_0242,0,BH(mask=0x8),
[ 0x3ffb4588510] ASSOCREGS
[ 0x3ffb4587f40] Label L0064:
POST:
{AssignAny:GPR_0242:R}
@0xdaryl @Leonardo2718 FYI another use case for Tril on OpenJ9. This would have definitely been caught in primitive IL tests where we pass constants to evaluators. In this case an aconst NULL child of a checkcastAndNULLCHK
Most helpful comment
@fjeremic @r30shah We checked the log files and found out that the first child of
checkcastAndNullCHKwould be optimized toaconst NULLin some cases. And then we traced thecheckcastAndNullCHKevaluator to the functioncalculateInstanceOfOrCheckCastSequenceswhich decides if the NullTest should be performed. We found that for theaconst NULLcase, the code doesn’t decide to do the NullTest. In that case, we added two lines of codes to add the NullTest if the current op code ischeckcastAndNullCHKand the first child is “Null” (the first child is “Null” when the isNull() returnstrue). After the code change, it will throw the NullPointerException in the NullTest when the first child is Null.Following is the trace log for
checkcastAndNULLCHKbefore the code change, where the first child isaconst NULLand the instructions below don't do the NULL check, therefore not throwing the NullPointerException.After the code change, we add the code (930adf0627b3b38697b029047ec69c25510280d1) to take the NullTest in this case. Following is the trace log after code change: