https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_sanity.functional_x86-32_windows_OMR_testList_0/105
There is a core file in the artifact.
May be the same problem as https://github.com/eclipse/openj9/issues/10490
09:48:02 [javac] Compiling 20 source files to C:\Users\****\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_OMR_testList_0\openjdk-tests\functional\cmdLineTests\shareClassTests\SCHelperCompatTests\bin\BallSports
09:48:03 [javac] Unhandled exception
09:48:03 [javac] Type=Segmentation error vmState=0x00000000
09:48:03 [javac] J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=719FD174 ContextFlags=0001007f
09:48:03 [javac] Handler1=72209F60 Handler2=72135120 InaccessibleReadAddress=FFFFFFF7
09:48:03 [javac] EDI=21899818 ESI=24CEF718 EAX=FFFFFFFB EBX=00000009
09:48:03 [javac] ECX=112500DB EDX=21F63E54
09:48:03 [javac] EIP=719FD174 ESP=00C4F06C EBP=FFFFFFFB EFLAGS=00210202
09:48:03 [javac] GS=002B FS=0053 ES=002B DS=002B
09:48:03 [javac] Module=C:\Users\****\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_OMR_testList_0\openjdkbinary\j2sdk-image\jre\bin\default\j9jit29.dll
09:48:03 [javac] Module_base_address=718B0000 Offset_in_DLL=0014d174
09:48:03 [javac] Target=2_90_20200916_721 (Windows Server 2012 R2 6.3 build 9600)
09:48:03 [javac] CPU=x86 (8 logical CPUs) (0x1ffb9c000 RAM)
09:48:03 [javac] ----------- Stack Backtrace -----------
09:48:03 [javac] Java_java_lang_invoke_MutableCallSite_invalidate+0xc8054 (0x719FD174 [j9jit29+0x14d174])
09:48:03 [javac] J9VMDllMain+0x4a6b (0x7190AAFB [j9jit29+0x5aafb])
09:48:03 [javac] ---------------------------------------
@andrewcraik fyi
that backtrace looks very dubious - esp since it is windows - the offset is insane....
yup, but there is a core file to look at, available for a short time.
Another "similar" crash, adding to the 0.23 milestone plan.
MauveSingleInvocationLoadTest_special_22
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_special.system_x86-64_windows_Nightly_mauveLoadTest/134
LT 23:12:45.733 - Completed 80.1%. Number of tests started=2256
LT stderr Unhandled exception
LT stderr Type=Segmentation error vmState=0x00000000
LT stderr Windows_ExceptionCode=c0000005 J9Generic_Signal=00000004 ExceptionAddress=00007FFA4ED40052 ContextFlags=0010005f
LT stderr Handler1=00007FFA5024FD00 Handler2=00007FFA50168C50 InaccessibleReadAddress=FFFFFFFFFFFFFFFF
LT stderr RDI=00007FFA3BBE85C3 RSI=00007FFA3BBE85C8 RAX=FFBC1C10FFBC1C00 RBX=0000000001DA3100
LT stderr RCX=00000000FFBC1900 RDX=00007FFA3BBE85C8 R8=0000000000000000 R9=0000000001DA3500
LT stderr R10=00000000013EFFF0 R11=00000000FFFF0000 R12=00000000FFFD13E0 R13=0000000000000010
LT stderr R14=0000000000000000 R15=00000000FFFD1390
LT stderr RIP=00007FFA4ED40052 RSP=0000000001A0C660 RBP=0000000001A05000 GS=002B
LT stderr FS=0053 ES=002B DS=002B
LT stderr XMM0 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr XMM1 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr XMM2 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr XMM3 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr XMM4 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr XMM5 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr XMM6 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr XMM7 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr XMM8 3fdfd535dd2acfe8 (f: 3710570496.000000, d: 4.973883e-001)
LT stderr XMM9 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr XMM10 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr XMM11 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr XMM12 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr XMM13 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr XMM14 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr XMM15 0000000000000000 (f: 0.000000, d: 0.000000e+000)
LT stderr Module=C:\Users\jenkins\workspace\Test_openjdk8_j9_special.system_x86-64_windows_Nightly_mauveLoadTest\openjdkbinary\j2sdk-image\jre\bin\compressedrefs\j9jit29.dll
LT stderr Module_base_address=00007FFA4EBB0000 Offset_in_DLL=0000000000190052
LT stderr Target=2_90_20200921_521 (Windows Server 2012 R2 6.3 build 9600)
LT stderr CPU=amd64 (8 logical CPUs) (0x1ffb9c000 RAM)
LT stderr ----------- Stack Backtrace -----------
LT stderr Java_java_lang_invoke_MutableCallSite_invalidate+0xf6ad2 (0x00007FFA4ED40052 [j9jit29+0x190052])
LT stderr (0x00000000FFBC1900)
LT stderr (0x00000000FFBC1900)
LT stderr (0x00000000FFFD0040)
LT stderr (0x00000000FFFD0060)
LT stderr Java_java_lang_invoke_MutableCallSite_invalidate+0x5df750 (0x00007FFA4F228CD0 [j9jit29+0x678cd0])
LT stderr (0x00007FFA3BBE85C8)
LT stderr (0x00007FFA3BB39A14)
LT stderr (0x00007FFA3BB39A14)
LT stderr (0x00000000FFBC1798)
LT stderr ---------------------------------------
@rpshukla Could you do a grinder to get the failure rate?
Not really sure what's going on. I took a look at the core from the original crash this issue was opened against. Even with symbols, windbg wasn't able to print out the backtrace properly. However, looking at it manually:
[javac] EDI=21899818 ESI=24CEF718 EAX=FFFFFFFB EBX=00000009
[javac] ECX=112500DB EDX=21F63E54
[javac] EIP=719FD174 ESP=00C4F06C EBP=FFFFFFFB EFLAGS=00210202
Crashing instruction 0x719FD174:
j9jit29!J9::Recompilation::getJittedBodyInfoFromPC:
719fd170 8b442404 mov eax, dword ptr [esp+4]
719fd174 f640fc30 test byte ptr [eax-4], 30h
719fd178 7404 je j9jit29!J9::Recompilation::getJittedBodyInfoFromPC+0xe (719fd17e)
719fd17a 8b40f8 mov eax, dword ptr [eax-8]
719fd17d c3 ret
Stack:
00000000`00C4F06C 7190AAFB FFFFFFFB 2189BD80 00000000 00000003 21899500 00000009 00C4F338
00000000`00C4F08C 2189C4D8 01730918 21899500 2189BD80 2189C4D8 00000001 716ED859 719026EB
00000000`00C4F0AC 21899500 00C4F338 00000000 00000002 00000000 00C4F332 71902792 00000000
Method before calling getJittedBodyInfoFromPC (i.e. near 0x7190AAFB):
7190aabe 8b6a0c mov ebp, dword ptr [edx+0Ch]
7190aac1 83fdfd cmp ebp, 0FFFFFFFDh
7190aac4 0f8472050000 je j9jit29!DLTLogic+0x6cc (7190b03c)
7190aaca 83e0f0 and eax, 0FFFFFFF0h
7190aacd 8b00 mov eax, dword ptr [eax]
7190aacf f7400c00000004 test dword ptr [eax+0Ch], 4000000h
7190aad6 0f8560050000 jne j9jit29!DLTLogic+0x6cc (7190b03c)
7190aadc 8b842428010000 mov eax, dword ptr [esp+128h]
7190aae3 85c0 test eax, eax
7190aae5 0f8e51050000 jle j9jit29!DLTLogic+0x6cc (7190b03c)
7190aaeb 8a4a0c mov cl, byte ptr [edx+0Ch]
7190aaee f6d1 not cl
7190aaf0 f6c101 test cl, 1
7190aaf3 742c je j9jit29!DLTLogic+0x1b1 (7190ab21)
7190aaf5 55 push ebp
7190aaf6 e875260f00 call j9jit29!J9::Recompilation::getJittedBodyInfoFromPC (719fd170)
7190aafb 83c404 add esp, 4
Note, ebp comes from
7190aabe 8b6a0c mov ebp, dword ptr [edx+0Ch]
and is pushed onto the stack before the call to getJittedBodyInfoFromPC. It is visible in the stack dump as FFFFFFFB, the word right after the return address 7190AAFB. The crashing instruction reads [eax-4], which is why the crash happens at inaccessible addr 0xFFFFFFF7: FFFFFFFB - 4 = FFFFFFF7.
However, note also that edx does not change between the assignment of ebp (at instr 0x7190aabe) and the crash in getJittedBodyInfoFromPC. So, dumping the memory at edx (21F63E54 in the register values printed at the crash):
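The address arithmetic can be checked with a few lines of C++. This is only a sketch: `faultingReadAddress` is a hypothetical helper name, and the constants are copied from the crash report and stack dump above.

```cpp
#include <cstdint>

// Hypothetical helper: the address touched by `test byte ptr [eax-4], 30h`
// when eax holds the argument passed to getJittedBodyInfoFromPC.
// Unsigned 32-bit arithmetic mirrors address computation on x86-32.
constexpr std::uint32_t faultingReadAddress(std::uint32_t argument)
   {
   return argument - 4u;
   }

// Argument pushed from ebp versus the InaccessibleReadAddress in the report.
static_assert(faultingReadAddress(0xFFFFFFFBu) == 0xFFFFFFF7u,
              "matches InaccessibleReadAddress=FFFFFFF7");
```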
00000000`21F63E54 24CEF72C 21F64430 00000018 23166A24 24CEF834 21F64430 00000006 00000011
00000000`21F63E74 24CEF878 21F64430 00000006 000007D1 24CEF8C0 21F64430 00000018 23164BE4
00000000`21F63E94 24CEFA74 21F64430 00000018 230EAA84 24CEFBA0 21F64430 00000006 000007D1
So the dword at [edx+0Ch] should be 0x23166A24. Furthermore, if [edx+0Ch] were indeed 0xFFFFFFFB, then the test at instr 0x7190aaf0
7190aaeb 8a4a0c mov cl, byte ptr [edx+0Ch]
7190aaee f6d1 not cl
7190aaf0 f6c101 test cl, 1
7190aaf3 742c je j9jit29!DLTLogic+0x1b1 (7190ab21)
would have resulted in the thread jumping over the call to getJittedBodyInfoFromPC. I have no idea how ebp got the value of 0xFFFFFFFB.
Heh, actually, thinking on it a bit more, I believe I figured out what's going on. It's a race condition caused by the compiler (Visual Studio) privatizing a field but not consistently using the privatized copy.
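To make the low-bit test concrete, here is a minimal C++ sketch of what the four instructions above compute. `callIsReached` is a hypothetical name; the two constants are the values discussed above (the one found in ebp and the one found at [edx+0Ch]).

```cpp
#include <cstdint>

// Mirrors `mov cl, [edx+0Ch]; not cl; test cl, 1; je skip`:
// the je is taken when bit 0 of the inverted low byte is 0, so the call
// is reached only when bit 0 of the original value is clear.
constexpr bool callIsReached(std::uint32_t extra)
   {
   return (static_cast<std::uint8_t>(~extra) & 1u) != 0;
   }

static_assert(!callIsReached(0xFFFFFFFBu), "low bit set: je taken, call skipped");
static_assert(callIsReached(0x23166A24u), "low bit clear: falls through to the call");
```

So the value that passed the test cannot have been the value that was pushed as the argument.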
At the time ebp is assigned, walkState.method->extra == J9_JIT_QUEUED_FOR_COMPILATION == 0xFFFFFFFB. At the time the isCompiled call is made, walkState.method->extra == 0x23166A24. edx == walkState.method.
The compiler chose to use edx+0x0Ch when it inlined the isCompiled test, but ebp (the privatized field) as the parm for getJittedBodyInfoFromPC.
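The interleaving can be replayed deterministically in a single thread. The sketch below is not the OpenJ9 source: `MethodModel`, `looksCompiled` and `replayGeneratedCode` are hypothetical stand-ins, and the low-bit rule for "compiled" is a simplification taken from the disassembly above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for J9Method; only the `extra` field matters here.
struct MethodModel
   {
   std::uintptr_t extra;
   };

// Values taken from the crash analysis above.
constexpr std::uintptr_t kQueuedForCompilation = 0xFFFFFFFBu; // J9_JIT_QUEUED_FOR_COMPILATION
constexpr std::uintptr_t kStartPC = 0x23166A24u;              // dword found at [edx+0Ch]

// Simplified model of the inlined isCompiled test: a start PC has bit 0 clear.
inline bool looksCompiled(std::uintptr_t extra)
   {
   return (extra & 1u) == 0;
   }

// Replays the two loads in the order the generated code performs them.
// `otherThread` stands for whatever another thread does between the loads.
// Returns the value that would be passed to getJittedBodyInfoFromPC,
// or 0 if the call would be skipped.
template <typename OtherThread>
std::uintptr_t replayGeneratedCode(MethodModel &method, OtherThread otherThread)
   {
   const std::uintptr_t privatized = method.extra; // mov ebp, [edx+0Ch]
   otherThread(method);                            // compilation completes here
   if (!looksCompiled(method.extra))               // mov cl, [edx+0Ch]; not cl; test cl, 1
      return 0;
   return privatized;                              // push ebp; call getJittedBodyInfoFromPC
   }

// The method is queued at the first load and compiled by the second.
inline void checkCrashScenario()
   {
   MethodModel method{kQueuedForCompilation};
   const std::uintptr_t argument =
      replayGeneratedCode(method, [](MethodModel &m) { m.extra = kStartPC; });
   assert(argument == kQueuedForCompilation); // the stale value reaches the call
   assert(!looksCompiled(argument));          // although it would fail the test itself
   }
```

If nothing changes between the two loads, the same replay either skips the call or passes a valid start PC, which matches how rarely the crash shows up.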
Not 100% sure what a "fix" should be here... @andrewcraik do you have any suggestions? The relevant code is https://github.com/eclipse/openj9/blob/352f71b11badb3a9fd137c84237810224c36855a/runtime/compiler/control/HookedByTheJit.cpp#L867-L869
> Not 100% sure what a "fix" should be here...
The reason for stating this is the C++ code is technically correct here because:
- if TR::CompilationInfo::isCompiled(walkState.method) returns false, then even if the extra were to change, it doesn't matter.
- if TR::CompilationInfo::isCompiled(walkState.method) returns true, then walkState.method->extra is necessarily a valid startPC.
Leo mentioned the issue might be happening because of a lack of volatile somewhere. Perhaps one solution is to explicitly privatize walkState.method, something like:
volatile J9Method* method = walkState.method;
My first reaction was that we are missing a volatile. If the problem is that we are not using a single canonical definition of the extra field while others could be making changes, then IMO we need to mark extra volatile, to stop the compiler from privatizing it and from doing other things that avoid reads or writes to memory.
10x grinder for MauveSingleInvocationLoadTest_special_22 passed:
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3951/
running more grinders:
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3952/
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3953/
@rpshukla Thanks for the runs. The second crash is a different issue: it is a crash in populateVPicSlotCall. I created #10665 to track it.
> Leo mentioned the issue might be happening because of a lack of volatile somewhere. Perhaps one solution is to explicitly privatize walkState.method, something like volatile J9Method* method = walkState.method;
I am strongly against this - it's a hack which fixes exactly one place. Does this even have any real meaning in the specification? We had to do this in a few places in the interpreter to fix ZOS XLC, but the correct solution would have been to make the field volatile. It also results in the compiler doing very foolish things (even though it correctly read the field and stored it to the stack, it would go to the stack every time instead of registerizing the value, which makes no sense).
Is it possible for j9method->extra to move from "startPC" (i.e. last bit 0) to something else? If so, then even with volatile we have the original problem: we do the test, find the method compiled, then read extra again to find the jitted body info, but by then it has changed to something else and we crash.
Yes, the field would need to be volatile and also read only once into a local.
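A minimal sketch of that combination (volatile field, one read into a local, test and use of the same local) could look like the following. It is an illustration under assumptions, not a proposed patch: `VolatileMethodModel`, `holdsStartPC` and `startPCOrZero` are hypothetical names, and the low-bit rule is the simplified one from the disassembly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for J9Method with the field marked volatile,
// so each source-level read is one load the compiler may not duplicate or drop.
struct VolatileMethodModel
   {
   volatile std::uintptr_t extra;
   };

// Simplified "is compiled" rule: a start PC has bit 0 clear.
inline bool holdsStartPC(std::uintptr_t extra)
   {
   return (extra & 1u) == 0;
   }

// Reads the field exactly once; the test and the returned value both come
// from that single snapshot, so they cannot disagree.
// Returns 0 when the method does not currently hold a start PC.
inline std::uintptr_t startPCOrZero(const VolatileMethodModel &method)
   {
   const std::uintptr_t snapshot = method.extra; // the only load of the field
   return holdsStartPC(snapshot) ? snapshot : 0;
   }

inline void checkSingleRead()
   {
   VolatileMethodModel method{0xFFFFFFFBu};      // queued for compilation
   assert(startPCOrZero(method) == 0);
   method.extra = 0x23166A24u;                   // compilation finished
   assert(startPCOrZero(method) == 0x23166A24u);
   }
```

The caller would pass the returned snapshot on to the body-info lookup instead of re-reading the field. Note that standard C++ gives volatile no atomicity or inter-thread ordering guarantees, so this relies on platform behaviour; std::atomic would be the portable way to express the same single load.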
Another similar crash.
https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_extended.system_x86-32_windows_Nightly/496
MT2 stderr Type=Segmentation error vmState=0x00000000
MT2 stderr J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=714FE254 ContextFlags=0001007f
MT2 stderr Handler1=71D1A070 Handler2=71C45120 InaccessibleReadAddress=FFFFFFF7
MT2 stderr EDI=2AFDFD18 ESI=35629A0C EAX=FFFFFFFB EBX=00000001
MT2 stderr ECX=1124005B EDX=3543FAB4
MT2 stderr EIP=714FE254 ESP=324DF710 EBP=FFFFFFFB EFLAGS=00010202
MT2 stderr GS=002B FS=0053 ES=002B DS=002B
MT2 stderr Module=C:\Users\jenkins\workspace\Test_openjdk8_j9_extended.system_x86-32_windows_Nightly\openjdkbinary\j2sdk-image\jre\bin\default\j9jit29.dll
MT2 stderr Module_base_address=713B0000 Offset_in_DLL=0014e254
MT2 stderr Target=2_90_20200922_517 (Windows Server 2012 R2 6.3 build 9600)
MT2 stderr CPU=x86 (8 logical CPUs) (0x1ffb9c000 RAM)
MT2 stderr ----------- Stack Backtrace -----------
STF 22:47:26.463 - Found dump at: C:\Users\jenkins\workspace\Test_openjdk8_j9_extended.system_x86-32_windows_Nightly\openjdk-tests\TKG\test_output_16008298786808\SharedClasses.SCM23.MultiThread_0\20200922-224406-SharedClasses\results\core.20200922.224726.3384.0001.dmp
MT2 stderr Java_java_lang_invoke_MutableCallSite_invalidate+0xc9574 (0x714FE254 [j9jit29+0x14e254])
MT2 stderr J9VMDllMain+0x4a6b (0x7140ABAB [j9jit29+0x5abab])
MT2 stderr ---------------------------------------