openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.7+10)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.20.0, JRE 11 Linux amd64-64-Bit Compressed References 20200416_574 (JIT enabled, AOT enabled)
OpenJ9 - 05fa2d361
OMR - d4365f371
JCL - 838028fc9d based on jdk-11.0.7+10)
Segmentation fault:
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007F9E1F2FDED0 Handler2=00007F9E1EBEF9F0 InaccessibleAddress=0000000000000008
RDI=0000000001CDAC68 RSI=0000000000000008 RAX=0000000000000850 RBX=00007F9DBE5DF187
RCX=0000000001CDABA0 RDX=0000000000000008 R8=0000000000000000 R9=00000000FD970CA8
R10=00000000FD970CA8 R11=00000000000000FF R12=0000000001CDAB98 R13=0000000002C8F4D0
R14=00007F9DBE5DF18C R15=00007F9DF83049D0
RIP=00007F9E1F285ADB GS=0000 FS=0000 RSP=00007F9DF8304650
EFlags=0000000000010202 CS=0033 RBP=0000000000000001 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000008
xmm0 0000000000000006 (f: 6.000000, d: 2.964394e-323)
xmm1 00000000fd9765f8 (f: 4254557696.000000, d: 2.102031e-314)
xmm2 00007f9df8304a00 (f: 4163914240.000000, d: 6.932554e-310)
xmm3 00007f9df8304a00 (f: 4163914240.000000, d: 6.932554e-310)
xmm4 43e0000000000000 (f: 0.000000, d: 9.223372e+18)
xmm5 000000003d01e9e2 (f: 1023535616.000000, d: 5.056938e-315)
xmm6 000000004aef5a72 (f: 1257200256.000000, d: 6.211394e-315)
xmm7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm8 0000000048123dc0 (f: 1209155072.000000, d: 5.974019e-315)
xmm9 000000003f8dd600 (f: 1066259968.000000, d: 5.268024e-315)
xmm10 00000000000025c9 (f: 9673.000000, d: 4.779097e-320)
xmm11 00000000411a9820 (f: 1092261888.000000, d: 5.396491e-315)
xmm12 000000004246d6ae (f: 1111938688.000000, d: 5.493707e-315)
xmm13 0000000044de0000 (f: 1155399680.000000, d: 5.708433e-315)
xmm14 0000000045310000 (f: 1160839168.000000, d: 5.735308e-315)
xmm15 414f79ffb262a5ad (f: 2992809472.000000, d: 4.125695e+06)
Module=/opt/java/openjdk/lib/compressedrefs/libj9vm29.so
Module_base_address=00007F9E1F26B000
Target=2_90_20200416_574 (Linux 4.15.0-1082-azure)
CPU=amd64 (4 logical CPUs) (0x3e92f0000 RAM)
----------- Stack Backtrace -----------
(0x00007F9E1F285ADB [libj9vm29.so+0x1aadb])
(0x00007F9E1F27DB30 [libj9vm29.so+0x12b30])
(0x00007F9E1F339B02 [libj9vm29.so+0xceb02])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2020/05/27 14:48:26 - please wait.
JVMDUMP032I JVM requested System dump using '//core.20200527.144826.6.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c %d %P %E" specifies that the core dump is to be piped to an external program. Attempting to rename either core or core.230.
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007F9E1F2FDED0 Handler2=00007F9E1EBEF9F0 InaccessibleAddress=0000000000000008
RDI=0000000000C08618 RSI=0000000000000008 RAX=0000000000000850 RBX=00007F9DBE5DF187
RCX=0000000000C08550 RDX=0000000000000008 R8=0000000000000000 R9=00000000FD921428
R10=00000000FD921428 R11=00000000000000FF R12=0000000000C08548 R13=0000000002C8F4D0
R14=00007F9DBE5DF18C R15=00007F9DF97B29D0
RIP=00007F9E1F285ADB GS=0000 FS=0000 RSP=00007F9DF97B2650
EFlags=0000000000010202 CS=0033 RBP=0000000000000001 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000008
xmm0 0000000000000006 (f: 6.000000, d: 2.964394e-323)
xmm1 00000000fdd32bb8 (f: 4258475008.000000, d: 2.103966e-314)
xmm2 00007f9df97b2a00 (f: 4185598464.000000, d: 6.932555e-310)
xmm3 00007f9df97b2a00 (f: 4185598464.000000, d: 6.932555e-310)
xmm4 43e0000000000000 (f: 0.000000, d: 9.223372e+18)
xmm5 000000003d01e9e2 (f: 1023535616.000000, d: 5.056938e-315)
xmm6 000000004aef5a72 (f: 1257200256.000000, d: 6.211394e-315)
xmm7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm8 0000000047e38180 (f: 1206092160.000000, d: 5.958887e-315)
xmm9 000000003f6451c0 (f: 1063539136.000000, d: 5.254582e-315)
xmm10 42b255f50e94e148 (f: 244638016.000000, d: 2.016039e+13)
xmm11 00000000411567aa (f: 1091921792.000000, d: 5.394811e-315)
xmm12 000000004252ece6 (f: 1112730880.000000, d: 5.497621e-315)
xmm13 3f847ae147ae147b (f: 1202590848.000000, d: 1.000000e-02)
xmm14 4014000000000000 (f: 0.000000, d: 5.000000e+00)
xmm15 414f79ffb262a5ad (f: 2992809472.000000, d: 4.125695e+06)
Module=/opt/java/openjdk/lib/compressedrefs/libj9vm29.so
Module_base_address=00007F9E1F26B000
Target=2_90_20200416_574 (Linux 4.15.0-1082-azure)
CPU=amd64 (4 logical CPUs) (0x3e92f0000 RAM)
----------- Stack Backtrace -----------
(0x00007F9E1F285ADB [libj9vm29.so+0x1aadb])
(0x00007F9E1F27DB30 [libj9vm29.so+0x12b30])
(0x00007F9E1F339B02 [libj9vm29.so+0xceb02])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2020/05/27 14:48:31 - please wait.
JVMDUMP012E Error in System dump: The core file created by child process with pid = 230 was not found. Expected to find core file with name "//core"
JVMDUMP032I JVM requested Java dump using '//javacore.20200527.144826.6.0003.txt' in response to an event
JVMDUMP032I JVM requested System dump using '//core.20200527.144831.6.0002.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c %d %P %E" specifies that the core dump is to be piped to an external program. Attempting to rename either core or core.231.
JVMDUMP010I Java dump written to //javacore.20200527.144826.6.0003.txt
JVMDUMP012E Error in System dump: The core file created by child process with pid = 231 was not found. Expected to find core file with name "//core"
JVMDUMP032I JVM requested Snap dump using '//Snap.20200527.144826.6.0004.trc' in response to an event
JVMDUMP032I JVM requested Java dump using '//javacore.20200527.144831.6.0005.txt' in response to an event
JVMDUMP010I Snap dump written to //Snap.20200527.144826.6.0004.trc
JVMDUMP007I JVM Requesting JIT dump using '//jitdump.20200527.144826.6.0006.dmp'
JVMDUMP010I JIT dump written to //jitdump.20200527.144826.6.0006.dmp
JVMDUMP013I Processed dump event "gpf", detail "".
JAVA_OPTS:
-Xtune:virtualized -Xshareclasses:cacheDir=/jvmcache,nonfatal,name=core-openj9-020 -Xaot:loadExclude={akka/stream/impl/GraphStageIsland.onIslandReady*} -Xscmx512M -Xjit:enableSelfTuningScratchMemoryUsageBeforeCompile
We have seen this when the cache had been previously propulated with a different version of some of the classes. Specifically, the SCC had classes for Akka 2.5.26 that conflicted with the classes from Akka 2.6.5 that the crashing JVM needed.
My expectation would be that this problem shouldn't generate a crash. One of two things could happen:
Access to the (jextract'ed) core file would help progress this, but I see there was an error creating it. Running java -Xcheck:dump may help determine why.
AOT is mentioned, does the problem still occur when -Xnoaot is specified?
It does not. -Xnoaot fixes the problem, as well as specifying a different SCC name to avoid the class conflicts.
@hangshao0 @dsouzai pls take a look.
@edrevo could you run with
-Xaot:traceRelocatableDataDetailsRT,traceRelocatableDataRT,log=log,aotrtDebugLevel=30,rtLog=rtLog
on both runs (ie populate SCC run and load SCC run)? Btw, you'll have to merge all your AOT options into one -Xaot string as the compiler only looks at the rightmost one.
Also, providing the corefile is probably essential.
@edrevo do you have a reproducible test case we can use? it would help to get this resolved.
Hi @DanHeidinga. After 4 hard hours trying to get the necessary conditions to repro this bug again, I have been able to get the segfault once again!
I have run the JVMs with the aot options @dsouzai mentioned. This is a k8s environment, so k8s is constantly rebooting the failing pods, which is why there are so many logs (appologies for that). Logs are quite large, so here's a link so you can download the file: https://1drv.ms/u/s!AvHYjMzLfKM5g8Vaoel_h6nPEYlMBA?e=4VEzoN
The core dump continues to fail because of apport interaction with cgroups. This is a common issue that spans both docker (https://github.com/kubernetes/kubernetes/issues/48787) and k8s (https://github.com/kubernetes/kubernetes/issues/48787). If there's any JVM option that would allow the dump to bypass the normal kernel handling of crash dumps, I would be able to get a core dump.
I have a core dump! In case anyone stumbles on Google on this thread: if you need a core dump of OpenJ9 on a container or a Kubernetes pod, it will fail to due the apport configuration in the host. In order to solve the problem, you can run your pod/container as privileged and configure the JVM with -Xdump:directory=<host mount> so you can retrieve the dump after the container restarts (or access the host, if that is an option).
Anyway, here's the dump!
baikal2.zip
Thanks @edrevo for getting the logs and core!
@dsouzai can you jump on this? Logs and core are now available
@edrevo baikal2.zip doesn't have a coredump; it only has the javacore, jitdump, and Snapfile. The jitdump is a badly named file (.dmp) because it's not a coredump, but a tracelog.
馃槶
are the logs I attached enough to investigate the issue?
Unfortunately no :/. Those logs would've been helpful in conjuction with the coredump, because (and this is me hoping how things would go heh) I would somehow find what was causing the seg fault, use that to find what class caused that wrong thing to happen, and then see if we relocated/validated that class in the logs you sent, and then kind of go from there.
Labeling as JIT since the problem doesn't occur with -Xnoaot.
@dsouzai what's the next step here? This is still tagged for 0.21 so are we going to try and fix it or do we need to move it to 0.22?
I don't have enough information right now; without a coredump in conjunction with those relo tracing logs, there's not much I can do really.
Given the lack of diagnostics I think we have to defer to 0.22 - FYI @pshipton in case you feel strongly otherwise, but there isn't much to go on and nothing to investigate with so I don't think this can be fixed for M2.
OK, I think I do have a core dump now (core*.dmp), but it is quite large, so here's a link to my OneDrive. Let me know if you have any problems accessing it: https://1drv.ms/u/s!AvHYjMzLfKM5g8YmEFqoYs542t6Lwg?e=b1T3Pe
Sorry for the long delay and many many thanks for being on top of this issue.
Thanks for the coredumps @edrevo. I don't know what's wrong with them, but they seem to be missing information. For example, according to javacore.20200702.132859.6.0003.txt
2XHREGISTER RIP: 00007FC49BA89ADB
but in gdb:
(gdb) x/i 0x00007FC49BA89ADB
0x7fc49ba89adb: Cannot access memory at address 0x7fc49ba89adb
I also can't view the addresses listed out in the native memory stack:
3XMTHREADINFO3 Native callstack:
4XENATIVESTACK (0x00007FC49B422D32 [libj9prt29.so+0x49d32])
4XENATIVESTACK (0x00007FC49B3F4753 [libj9prt29.so+0x1b753])
4XENATIVESTACK (0x00007FC49B422DAE [libj9prt29.so+0x49dae])
4XENATIVESTACK (0x00007FC49B422EA4 [libj9prt29.so+0x49ea4])
4XENATIVESTACK (0x00007FC49B3F4753 [libj9prt29.so+0x1b753])
4XENATIVESTACK (0x00007FC49B422C0B [libj9prt29.so+0x49c0b])
...
I think the coredumps might be truncated; I'm guessing this based on the fact that according to the javacore:
1MEMUSER JRE: 1,563,106,856 bytes / 40491 allocations
but the coredump is only 300M and 500M respectively; given that the heap is ~300M I would've expected the coredump to be much bigger. Also the second javacore is truncated.
Probably worth verifying your ulimits (both host and container) as well as ensuring you have enough disk space for the coredumps.
Oh also, the output from -Xaot:traceRelocatableDataDetailsRT,traceRelocatableDataRT,log=/path/to/log,aotrtDebugLevel=30,rtLog=/path/to/rtLog (namely the log.* and rLog.* files generated) is likely going to be necessary (in conjunction with the coredump) to see why a stale class would cause a problem.
Try this one: https://1drv.ms/u/s!AvHYjMzLfKM5g8Yn9afn9uUVyidH3Q?e=7zEoYd
I realized that my previous link didn't have the aot logs. Here's one I got (from a different run). Tomorrow, if you need me to, I can get the rtLogs, coredump, aotlogs and all the other debug artifacts from a single run. Sorry about that.
I'll take a look at the corefile, but yeah having the rtlogs+coredump from the same process is going to be ideal because the J9Class (the C struct that represents a Java class in the JVM) pointers that get printed out in the rtlog can then be followed in the coredump.
Actually, it might also be valuable to have the SCC file; you can find that file in the cacheDir specified on the -Xshareclasses cmd line, though this is more of a "nice to have" than absolutely necessary (...I hope :P)
I believe it's crashing here: https://github.com/eclipse/openj9/blob/8c8ce1e50962a7e8f1678d678cd69561a1a7c8bb/runtime/vm/BytecodeInterpreter.hpp#L6776
I can't automatically get the backtrace, but the corefile does encapsulate the entire address space, so I'm at least able to manaully find information. The stackslot walker complains about a corrupt stack when performing the walk, but it is able to print out some useful info:
> !stackslots 0x1d48d00
<1d48d00> *** BEGIN STACK WALK, flags = 00400001 walkThread = 0x0000000001D48D00 ***
<1d48d00> ITERATE_O_SLOTS
<1d48d00> RECORD_BYTECODE_PC_OFFSET
<1d48d00> Initial values: walkSP = 0x00000000025EFE40, PC = 0x00007F525948E4B7, literals = 0x00000000027F12D0, A0 = 0x00000000025EFEE8, j2iFrame = 0x00000000025F0190, ELS = 0x00007F523FF7ABA0, decomp = 0x0000000000000000
<1d48d00> Bytecode frame: bp = 0x00000000025EFE68, sp = 0x00000000025EFE40, pc = 0x00007F525948E4B7, cp = 0x00000000027F0D00, arg0EA = 0x00000000025EFEE8, flags = 0x0000000000000000
<1d48d00> Method: akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3$$anon$4.onPush()V !j9method 0x00000000027F12D0
<1d48d00> Bytecode index = 163
<1d48d00> Using local mapper
<1d48d00> Locals starting at 0x00000000025EFEE8 for 0x0000000000000010 slots
<1d48d00> O-Slot: a0[0x00000000025EFEE8] = 0x00000000FFD60A10
<1d48d00> I-Slot: t1[0x00000000025EFEE0] = 0x00000000FFD72BD0
<1d48d00> O-Slot: t2[0x00000000025EFED8] = 0x00000000FFD72B90
<1d48d00> I-Slot: t3[0x00000000025EFED0] = 0x00000000FFD72BD0
<1d48d00> I-Slot: t4[0x00000000025EFEC8] = 0x00000000FFD72BC0
<1d48d00> I-Slot: t5[0x00000000025EFEC0] = 0x00000000F06013D0
<1d48d00> I-Slot: t6[0x00000000025EFEB8] = 0x00000000F06A6DD0
<1d48d00> O-Slot: t7[0x00000000025EFEB0] = 0x00000000F06013D0
<1d48d00> I-Slot: t8[0x00000000025EFEA8] = 0x00000000F06A6DD0
<1d48d00> O-Slot: t9[0x00000000025EFEA0] = 0x00000000FFD72BE0
<1d48d00> I-Slot: t10[0x00000000025EFE98] = 0x000000000000000A
<1d48d00> I-Slot: t11[0x00000000025EFE90] = 0x00000000FFD72B80
<1d48d00> I-Slot: t12[0x00000000025EFE88] = 0x00000000FFD72B40
<1d48d00> I-Slot: t13[0x00000000025EFE80] = 0x00000000F06013D0
<1d48d00> I-Slot: t14[0x00000000025EFE78] = 0x00000000025EFF10
<1d48d00> I-Slot: t15[0x00000000025EFE70] = 0x00007F51F86E805C
Jul 02, 2020 2:31:34 PM com.ibm.j9ddr.vm29.events.DefaultEventListener corruptData
WARNING: CorruptData encountered iterating o-slots. walkThread = 0x0000000001D48D00
com.ibm.j9ddr.CorruptDataException: Operand stack underflow in StackMap
...
Looking at the bytecodes for !j9method 0x00000000027F12D0:
> !bytecodes 0x27f12d0
...
156 getfield 7 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3$$anon$4.$outer Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;
159 aload0getfield
160 getfield 7 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3$$anon$4.$outer Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;
163 invokevirtual 31 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue;
...
we're crashing on 163 invokevirtual based on the Bytecode index = 163 & pc = 0x00007F525948E4B7 printed in the stackslot above.
From the javacore:
1XHREGISTERS Registers:
...
2XHREGISTER RBX: 00007F525948E4B7
...
2XHREGISTER RDX: 0000000000000008
...
2XHREGISTER R12: 00000000025EFE40
...
2XHREGISTER RIP: 00007F52BA8D0ADB
...
2XHREGISTER RSP: 00007F523FF7A650
...
from gdb & objdump:
(gdb) x/10i 0x7F52BA8D0AAF
0x7f52ba8d0aaf <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32623>: mov QWORD PTR [rsp+0x140],rax
0x7f52ba8d0ab7 <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32631>: mov rax,QWORD PTR [rsp+0x140]
0x7f52ba8d0abf <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32639>: mov rdx,QWORD PTR [rsp+0x140]
0x7f52ba8d0ac7 <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32647>: shr rax,0x8
0x7f52ba8d0acb <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32651>: movzx edx,dl
0x7f52ba8d0ace <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32654>: mov rdx,QWORD PTR [r12+rdx*8]
0x7f52ba8d0ad2 <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32658>: test rdx,rdx
0x7f52ba8d0ad5 <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32661>:
je 0x7f52ba8ddb97 <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+86103>
0x7f52ba8d0adb <_ZN22VM_BytecodeInterpreter3runEP10J9VMThread+32667>: mov ebx,DWORD PTR [rdx]
Where the last instruction is where the crash happens. The associated code in the interpreter is
UDATA volatile methodIndexAndArgCount = ramMethodRef->methodIndexAndArgCount;
UDATA methodIndex = methodIndexAndArgCount >> 8;
j9object_t receiver = ((j9object_t*)_sp)[methodIndexAndArgCount & 0xFF];
if (J9_UNEXPECTED(NULL == receiver)) {
...
} else {
J9Class *receiverClass = J9OBJECT_CLAZZ(_currentThread, receiver);
...
}
After cross-referencing with the core, javacore, and disassembly, it looks like r12 is _sp and rdx at the time of the crash is supposed to be an object reference. However, it ends up having the value 0x8 which is an invalid object reference, hence the crash.
Given that the stackwalker choked after 0x00000000025EFE70, I figured it was worth looking at the stack:
(gdb) x/20gx 0x25efe40
0x25efe40: 0x00000000ffd72be0 0x0000000000000008
0x25efe50: 0x00000000ffd60908 0x000000000113dfb8
0x25efe60: 0x00007f51f86e805c 0x00000000025eff10
0x25efe70: 0x00007f51f86e805c 0x00000000025eff10
0x25efe80: 0x00000000f06013d0 0x00000000ffd72b40
0x25efe90: 0x00000000ffd72b80 0x000000000000000a
0x25efea0: 0x00000000ffd72be0 0x00000000f06a6dd0
0x25efeb0: 0x00000000f06013d0 0x00000000f06a6dd0
0x25efec0: 0x00000000f06013d0 0x00000000ffd72bc0
0x25efed0: 0x00000000ffd72bd0 0x00000000ffd72b90
It looks like the stackwalker choked because 0x25efe60 and 0x25efe68 are exactly the same as 0x25efe70 & 0x25efe78, which I suppose is unexpected. I imagine this is probably relevant to the issue but I don't know right now.
I'll have to go deeper, maybe I'll chase after the ramMethodRef and see where that takes me.
From the code:
U_16 index = *(U_16*)(_pc + 1);
J9ConstantPool *ramConstantPool = J9_CP_FROM_METHOD(_literals);
J9RAMVirtualMethodRef *ramMethodRef = ((J9RAMVirtualMethodRef*)ramConstantPool) + index;
From the stackslots, _pc is 0x00007F525948E4B7. and the constant pool of the method at the top of the stack is 0x00000000027F0D00, so:
(gdb) x/hx 0x00007F525948E4B7+8
0x7f525948e4bf: 0x22b6
(gdb) x/2gx 0x00000000027F0D0+0x22b6
0x281386: 0x0000000000000000 0x0000000000000000
So it looks like ramMethodRef has NULL for both methodIndexAndArgCount and method. @DanHeidinga does this mean the methodref is unresolved, or does NULL mean uninitailized?
The method at the top of the stack is 0x00000000027F12D0 which is akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3$$anon$4.onPush()V. At bytecode 163, it's trying to call
akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue;
Searching for *openTimeouts()Lscala/collection/immutable/Queue; in the coredump yields only one method:
> !j9method 0x00000000025885b0
J9Method at 0x25885b0 {
Fields for J9Method:
0x0: U8* bytecodes = !j9x 0x00007F5259311A78
0x8: struct J9ConstantPool* constantPool = !j9constantpool 0x0000000002588150 (flags = 0x0)
0x10: void* methodRunAddress = !j9x 0x0000000000000018
0x18: void* extra = !j9x 0x00007F529E297900
}
Signature: akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue; !bytecodes 0x00000000025885B0
ROM Method: !j9rommethod 0x00007F5259311A64
Next Method: !j9method 0x00000000025885D0
Now, while openTimeouts() is compiled, what looks to be wrong here is the constant pool of the class of onPush() (which is interpreted), which is
> !j9class 0x00000000027F1100
...
Class name: akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3$$anon$4
So i guess now it's about why the constant pool entry has NULL in it.
From the code:
U_16 index = *(U_16*)(_pc + 1); J9ConstantPool *ramConstantPool = J9_CP_FROM_METHOD(_literals); J9RAMVirtualMethodRef *ramMethodRef = ((J9RAMVirtualMethodRef*)ramConstantPool) + index;From the stackslots,
_pcis0x00007F525948E4B7. and the constant pool of the method at the top of the stack is0x00000000027F0D00, so:(gdb) x/hx 0x00007F525948E4B7+8 0x7f525948e4bf: 0x22b6 (gdb) x/2gx 0x00000000027F0D0+0x22b6 0x281386: 0x0000000000000000 0x0000000000000000
Can you confirm that offset is correct? Given the line of code:
((J9RAMVirtualMethodRef*)ramConstantPool) + index;
The constantpool entry should be 0x00000000027F0D0 + (0x10 * 0x22b6) as each J9RAMVirtualMethodRef is two pointers worth of data.
Can you dump the memory at 0x2A1C30 instead (assuming I've done the math right)?
So it looks like
ramMethodRefhas NULL for bothmethodIndexAndArgCountandmethod. @DanHeidinga does this mean the methodref is unresolved, or does NULL mean uninitailized?
In general, any initialized class should have the argCount portion of the methodIndexAndArgCount initialized. The method slot being null would mean unresolved.
Ok, hopefully this time I got all the required info.
Dumps: https://1drv.ms/u/s!AvHYjMzLfKM5g8YqsiZtYY9rMVC6BQ?e=8WIxiz
Stdout log: logs.log
Can you confirm that offset is correct? Given the line of code:
((J9RAMVirtualMethodRef*)ramConstantPool) + index;The constantpool entry should be
0x00000000027F0D0 + (0x10 * 0x22b6)as eachJ9RAMVirtualMethodRefis two pointers worth of data.Can you dump the memory at
0x2A1C30instead (assuming I've done the math right)?
Ah yes, missed the pointer cast.
(gdb) x/2gx 0x00000000027F0D0+0x22b60
0x2a1c30: 0x0000000000000240 0x0000000000000248
Does that look more reasonable to you @DanHeidinga ? Seems like the struct J9Method *volatile method field doesn't look right.
Messed up the pointer being printed out in gdb, the constant pool should've been 0x00000000027F0D00 (missed an extra 0 at the end). This is what the J9RAMVirtualMethodRef looks like:
(gdb) x/2gx 0x00000000027F0D00+0x22b60
0x2813860: 0x0000000000017800 0x00007f52bac37f00
Because 0x0000000000017800 & 0xFF still yields 0x0,
j9object_t receiver = ((j9object_t*)_sp)[methodIndexAndArgCount & 0xFF];
will still cause the receiver to be 0x8.
How are you walking the java stack? Using !stackslots <J9VMThread>? For a crash in the interpreter, we're not guaranteed that the values in the J9VMThread are up to date. We only store them back when we might GC, etc. Most of the time they are live in registers / locals.
It's a pain but we need to look at the disassembly and map the vm state to the right registers (SP & PC) and the right locals
How are you walking the java stack? Using
!stackslots <J9VMThread>? For a crash in the interpreter, we're not guaranteed that the values in the J9VMThread are up to date. We only store them back when we might GC, etc. Most of the time they are live in registers / locals.
I was using the !stackslots cmd from jdmpview. However, given your comment I took a look at the assembly again assuming that we can't trust _pc and _sp. It turns out that I kind of got lucky; the actual pc is 0x00007F525948E4BC which is at bytecode 168, which is also an invokevirtual.
Also, the data printed out in https://github.com/eclipse/openj9/issues/9710#issuecomment-653595122 is wrong because the index is wrong, because in https://github.com/eclipse/openj9/issues/9710#issuecomment-653222047 I didn't realize that _pc is a reference to a U_8 *. So, to get the right J9RAMVirtualMethodRef (finally):
(gdb) x/hx 0x00007F525948E4BC+1
0x7f525948e4bd: 0x0021
(gdb) x/2gx 0x00000000027f0d00+0x210
0x27f0f10: 0x0000000000085001 0x00007f52bac37f00
This makes more sense; what I said above was incorrect, namely:
Because
0x0000000000017800 & 0xFFstill yields0x0,j9object_t receiver = ((j9object_t*)_sp)[methodIndexAndArgCount & 0xFF];will still cause the receiver to be
0x8.
methodIndexAndArgCount (0x0000000000085001 & 0xFF) has to be 0x1 in order to get the bad receiver value of 0x8. Also, 0x00007f52bac37f00 is definitely not a j9method.
bytecode 168 is
168 invokevirtual 33 scala/collection/immutable/Queue.enqueue(Ljava/lang/Object;)Lscala/collection/immutable/Queue;
searching for it:
> !methodforname scala/collection/immutable/Queue.enqueue*
Searching for methods named 'scala/collection/immutable/Queue.enqueue*' in VM=0x00007F52B4016A50...
!j9method 0x0000000000E259F0 --> scala/collection/immutable/Queue.enqueue(Ljava/lang/Object;)Lscala/collection/immutable/Queue;
!j9method 0x0000000000E25A10 --> scala/collection/immutable/Queue.enqueue(Lscala/collection/immutable/Iterable;)Lscala/collection/immutable/Queue;
Found 2 method(s) named scala/collection/immutable/Queue.enqueue*
> !j9method 0x0000000000E259F0
J9Method at 0xe259f0 {
Fields for J9Method:
0x0: U8* bytecodes = !j9x 0x00007F525948B16C
0x8: struct J9ConstantPool* constantPool = !j9constantpool 0x0000000000E24AA0 (flags = 0x0)
0x10: void* methodRunAddress = !j9x 0x0000000000000006
0x18: void* extra = !j9x 0x0000000000001753
}
Signature: scala/collection/immutable/Queue.enqueue(Ljava/lang/Object;)Lscala/collection/immutable/Queue; !bytecodes 0x0000000000E259F0
ROM Method: !j9rommethod 0x00007F525948B158
Next Method: !j9method 0x0000000000E25A10
0x0000000000E259F0 looks legitimate and there's only one method enqueue(Ljava/lang/Object;)Lscala/collection/immutable/Queue; so there aren't any overrides. So I guess the question is still why the J9RAMVirtualMethodRef in the constant pool of the top method isn't right.
Is 0x00007f52bac37f00 equal to J9JavaVM.initialMethods.initialVirtualMethod? I gave you bad info earlier when I said null meant unresolved. We initialize the method to point to an initialMethod that does the resolution so we don't have to check if it's resolved on every invoke
Heh, looks like it's an initialSpecialMethod:
> !j9initializermethods 0x00007F52B401B6B0
J9InitializerMethods at 0x7f52b401b6b0 {
Fields for J9InitializerMethods:
0x0: struct J9Method* initialStaticMethod = !j9method 0x00007F52BAC3CB20 // <FAULT>
0x8: struct J9Method* initialSpecialMethod = !j9method 0x00007F52BAC37F00 // <FAULT>
0x10: struct J9Method* initialVirtualMethod = !j9method 0x00007F52BAC37EE0 // <FAULT>
0x18: struct J9Method* invokePrivateMethod = !j9method 0x00007F52BAC37EC0 // <FAULT>
}
That makes sense. The cpEnty can be shared by an invokevirtual and an invokespecial or an invokestatic. So the layout is:
[vtableOffset | argCount]
[initialSpecialMethod or initialStaticMethod]
And we have a special slot in the vtable to hold the initialvirtualmethod so initialize the vtableOffset to point to that vtable slot. (I need to write this down in detail sometime)
So 0x0000000000085001 == [vtableOffset | argCount]. To get the method from the vtable, take vtableOffset, 0x0000000000085001 >> 8 and add it to the J9Class from the receiver.
So (UDATA)receiverClass + 0x850 will be slot with the resolved method
Ah ok I see, thanks @DanHeidinga. Therefore then the question is why the object ref is bad. From the bytecodes:
163 invokevirtual 31 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue;
166 aload 9
168 invokevirtual 33 scala/collection/immutable/Queue.enqueue(Ljava/lang/Object;)Lscala/collection/immutable/Queue;
the value returned by 163 is the receiver for the invokevirtual in 168. Looks like there's only one version of it:
> !methodforname *openTimeouts
Searching for methods named '*openTimeouts' in VM=0x00007F52B4016A50...
!j9method 0x00000000025885B0 --> akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue;
Found 1 method(s) named *openTimeouts
> !j9method 0x00000000025885b0
J9Method at 0x25885b0 {
Fields for J9Method:
0x0: U8* bytecodes = !j9x 0x00007F5259311A78
0x8: struct J9ConstantPool* constantPool = !j9constantpool 0x0000000002588150 (flags = 0x0)
0x10: void* methodRunAddress = !j9x 0x0000000000000018
0x18: void* extra = !j9x 0x00007F529E297900
}
Signature: akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue; !bytecodes 0x00000000025885B0
ROM Method: !j9rommethod 0x00007F5259311A64
Next Method: !j9method 0x00000000025885D0
This method is JIT'd and likely AOT'd (given that the JVM was run with -Xtune:virtualized). This method is likely the one that's causing the problem. I guess I shall continue this next week and see where it goes.
One possibly relevant detail though is that the J9RAMVirtualMethodRef for 163 is still unresolved:
163 invokevirtual 31 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue;
index = 31 = 0x1F
(gdb) x/2gx 0x00000000027f0d00+0x1F0
0x27f0ef0: 0x0000000000042800 0x00007f52bac37f00
Considering that we already executed 163, I would've thought that the methodref would've been resolved by the time of the crash.
One possibly relevant detail though is that the
J9RAMVirtualMethodReffor163is still unresolved:163 invokevirtual 31 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue;index = 31 = 0x1F
(gdb) x/2gx 0x00000000027f0d00+0x1F0 0x27f0ef0: 0x0000000000042800 0x00007f52bac37f00Considering that we already executed 163, I would've thought that the methodref would've been resolved by the time of the crash.
It's an invokevirtual so the first slot in the cp is [vtableOffset | argCount] == 0x0000000000042800 and vtableOffset == 0x428. So definitely resolved.
So definitely resolved.
Interesting; I was thinking it wasn't because the method is 0x00007f52bac37f00 which is
0x8: struct J9Method* initialSpecialMethod = !j9method 0x00007F52BAC37F00 // <FAULT>
Each bytecode has unique resolution status, even though there are only two slots that can be filled in. The whole picture involves splitting some slots during classload to resolve conflicts
The bytecodes of openTimeouts:
0 aload0getfield
1 getfield 8 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts Lscala/collection/immutable/Queue;
4 return1
The J9RAMFieldRef for the getfield 8:
(gdb) x/2gx 0x25881D0
0x25881d0: 0x000000000000003c 0xffffffffc8020002
The JIT'd code for openTimeouts:
0x7f529e297900 +0 488b442408 mov rax, qword ptr [rsp+8]
0x7f529e297905 +5 483b6550 cmp rsp, qword ptr [rbp+0x50] J9VMThread.stackOverflowMark
0x7f529e297909 +9 7611 jbe 0x7f529e29791c C>> +28
0x7f529e29790b +11 4883ec08 sub rsp, 8 <<< ^+38
0x7f529e29790f +15 4889442410 mov qword ptr [rsp+0x10], rax
0x7f529e297914 +20 8b403c mov eax, dword ptr [rax+0x3c]
0x7f529e297917 +23 4883c408 add rsp, 8
0x7f529e29791b +27 c3 ret
The JIT'd code looks right; I guess the object the field is being loaded from is bad.
Ok I take it back, the AOT code is incorrect. I thought because mov eax, dword ptr [rax+0x3c] is loading from offset 0x3c and the fieldref says the offset is 0x3c it was correct. However, an object does have some data before the field at "offset 0".
These are the bytecodes around the call to the AOT'd method & the invocation where we crash:
159 aload0getfield
160 getfield 7 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3$$anon$4.$outer Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;
163 invokevirtual 31 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts()Lscala/collection/immutable/Queue;
166 aload 9
168 invokevirtual 33 scala/collection/immutable/Queue.enqueue(Ljava/lang/Object;)Lscala/collection/immutable/Queue;
Regarding 159&160, we're loading fieldref 7 from the this pointer, which is arg0EA == 0x00000000FFD60A10:
> !fj9object 0x00000000FFD60A10
!J9Object 0x00000000FFD60A10 {
struct J9Class* clazz = !j9class 0x27F1100 // akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3$$anon$4
Object flags = 0x00000000;
I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3; $outer = !fj9object 0xffd60908 (offset = 4) (akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3$$anon$4)
}
Sanity check: the class matches the class of the j9method currently being executed.
So the object at the top of the stack before the call at 163 is 0xffd60908:
> !fj9object 0xffd60908
!J9Object 0x00000000FFD60908 {
struct J9Class* clazz = !j9class 0x2587D00 // akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3
Object flags = 0x00000000;
I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
I inCount = 0x00000002 (offset = 48) (akka/stream/stage/GraphStageLogic)
I outCount = 0x00000002 (offset = 52) (akka/stream/stage/GraphStageLogic)
I stageId = 0x00000008 (offset = 56) (akka/stream/stage/GraphStageLogic)
Lakka/stream/Attributes; attributes = !fj9object 0xffd608d8 (offset = 12) (akka/stream/stage/GraphStageLogic)
Lakka/stream/stage/GraphStageWithMaterializedValue; originalStage = !fj9object 0xf0558918 (offset = 16) (akka/stream/stage/GraphStageLogic)
[Ljava/lang/Object; handlers = !fj9object 0xffd60958 (offset = 20) (akka/stream/stage/GraphStageLogic)
[Lakka/stream/impl/fusing/GraphInterpreter$Connection; portToConn = !fj9object 0xffd60970 (offset = 24) (akka/stream/stage/GraphStageLogic)
Lakka/stream/impl/fusing/GraphInterpreter; _interpreter = !fj9object 0xff989158 (offset = 28) (akka/stream/stage/GraphStageLogic)
Ljava/lang/Throwable; lastCancellationCause = !fj9object 0x0 (offset = 32) (akka/stream/stage/GraphStageLogic)
Lscala/collection/immutable/List; callbacksWaitingForInterpreter = !fj9object 0xeddd8138 (offset = 36) (akka/stream/stage/GraphStageLogic)
Ljava/util/concurrent/atomic/AtomicReference; akka$stream$stage$GraphStageLogic$$asyncCallbacksInProgress = !fj9object 0xffd60988 (offset = 40) (akka/stream/stage/GraphStageLogic)
Lakka/stream/stage/GraphStageLogic$StageActor; _stageActor = !fj9object 0x0 (offset = 44) (akka/stream/stage/GraphStageLogic)
J asyncCleanupCounter = 0x0000000000000000 (offset = 4) (akka/stream/stage/GraphStageLogic)
Lscala/collection/immutable/Queue; akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts = !fj9object 0xffd60998 (offset = 60) (akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3)
Lakka/stream/stage/AsyncCallback; akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$callback = !fj9object 0xffd609b8 (offset = 64) (akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3)
Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport; $outer = !fj9object 0xf0558918 (offset = 68) (akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3)
}
According to bytecodes in https://github.com/eclipse/openj9/issues/9710#issuecomment-654969941, the field that the JIT'd code should return is at field offset 0x3c == 60 which is
Lscala/collection/immutable/Queue; akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts = !fj9object 0xffd60998 (offset = 60) (akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3)
However, from the stackslots from way above, the object on the java stack that gets passed to the call at 168 (ie the receiver) is 0x00000008. If you look at 0xffd60908 carefully, you'll see that theres this field:
I stageId = 0x00000008 (offset = 56) (akka/stream/stage/GraphStageLogic)
which is what you get if you just do 0xffd60908+0x3c. The field load is off by 4 bytes.
I believe I've reached the limit of what I can do with this corefile. I shall have to take a look at the second core @edrevo provided, since I can take a look at the rtlog and see how we passed validation. It's obvious the shape of the class is different from what the AOT'd method expects, so now will have to determine what validation we missed.
I took a look latest set of cores.
bytecodes of the method that was AOT'd:
> !bytecodes 0x000000000272deb0
Name: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts
Signature: ()Lscala/collection/immutable/Queue;
Access Flags (3050201): public
Max Stack: 1
Argument Count: 1
Temp Count: 0
0 aload0getfield
1 getfield 8 akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3.akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts Lscala/collection/immutable/Queue;
4 return1
...
When I run printAllStats on the SCC @edrevo provided:
...
1: 0x00007FF040D26290 ROMCLASS: akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3 at 0x00007FF02531D538.!STALE!
Index 36 in classpath 0x00007FF041591C04
ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts Signature: ()Lscala/collection/immutable/Queue; Address: 0x00007FF02531D96C
ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts_$eq Signature: (Lscala/collection/immutable/Queue;)V Address: 0x00007FF02531D990
ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$callback Signature: ()Lakka/stream/stage/AsyncCallback; Address: 0x00007FF02531D9BC
ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$$outer Signature: ()Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport; Address: 0x00007FF02531D9E0
ROMMETHOD: $anonfun$callback$2 Signature: (Lakka/http/scaladsl/TimeoutAccess;Lakka/http/impl/engine/server/HttpServerBluePrint$TimeoutAccessImpl;)Z Address: 0x00007FF02531DA00
ROMMETHOD: $anonfun$callback$3 Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;)V Address: 0x00007FF02531DA40
ROMMETHOD: $anonfun$callback$1 Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;Lscala/Tuple2;)V Address: 0x00007FF02531DA68
ROMMETHOD: <init> Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport;)V Address: 0x00007FF02531DB2C
ROMMETHOD: $anonfun$callback$2$adapted Signature: (Lakka/http/scaladsl/TimeoutAccess;Lakka/http/impl/engine/server/HttpServerBluePrint$TimeoutAccessImpl;)Ljava/lang/Object; Address: 0x00007FF02531DBC4
ROMMETHOD: $anonfun$callback$1$adapted Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;Lscala/Tuple2;)Ljava/lang/Object; Address: 0x00007FF02531DBF8
ROMMETHOD: $deserializeLambda$ Signature: (Ljava/lang/invoke/SerializedLambda;)Ljava/lang/Object; Address: 0x00007FF02531DC2C
...
7: 0x00007FF03E6F6A14 ROMCLASS: akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3 at 0x00007FF02531D538.!STALE!
Index 79 in classpath 0x00007FF03E707088
ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts Signature: ()Lscala/collection/immutable/Queue; Address: 0x00007FF02531D96C
ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts_$eq Signature: (Lscala/collection/immutable/Queue;)V Address: 0x00007FF02531D990
ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$callback Signature: ()Lakka/stream/stage/AsyncCallback; Address: 0x00007FF02531D9BC
ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$$outer Signature: ()Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport; Address: 0x00007FF02531D9E0
ROMMETHOD: $anonfun$callback$2 Signature: (Lakka/http/scaladsl/TimeoutAccess;Lakka/http/impl/engine/server/HttpServerBluePrint$TimeoutAccessImpl;)Z Address: 0x00007FF02531DA00
ROMMETHOD: $anonfun$callback$3 Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;)V Address: 0x00007FF02531DA40
ROMMETHOD: $anonfun$callback$1 Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;Lscala/Tuple2;)V Address: 0x00007FF02531DA68
ROMMETHOD: <init> Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport;)V Address: 0x00007FF02531DB2C
ROMMETHOD: $anonfun$callback$2$adapted Signature: (Lakka/http/scaladsl/TimeoutAccess;Lakka/http/impl/engine/server/HttpServerBluePrint$TimeoutAccessImpl;)Ljava/lang/Object; Address: 0x00007FF02531DBC4
ROMMETHOD: $anonfun$callback$1$adapted Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;Lscala/Tuple2;)Ljava/lang/Object; Address: 0x00007FF02531DBF8
ROMMETHOD: $deserializeLambda$ Signature: (Ljava/lang/invoke/SerializedLambda;)Ljava/lang/Object; Address: 0x00007FF02531DC2C
...
1: 0x00007FF03C061484 AOT: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts Signature: ()Lscala/collection/immutable/Queue; Address: 0x00007FF02531D96C
for ROMClass akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3 at 0x00007FF02531D538.
...
8: 0x00007FF03AFD229C ROMCLASS: akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3 at 0x00007FF02531D538.
Index 68 in classpath 0x00007FF03B1EFA28
ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts Signature: ()Lscala/collection/immutable/Queue; Address: 0x00007FF02531D96C
ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts_$eq Signature: (Lscala/collection/immutable/Queue;)V Address: 0x00007FF02531D990
ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$callback Signature: ()Lakka/stream/stage/AsyncCallback; Address: 0x00007FF02531D9BC
ROMMETHOD: akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$$outer Signature: ()Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport; Address: 0x00007FF02531D9E0
ROMMETHOD: $anonfun$callback$2 Signature: (Lakka/http/scaladsl/TimeoutAccess;Lakka/http/impl/engine/server/HttpServerBluePrint$TimeoutAccessImpl;)Z Address: 0x00007FF02531DA00
ROMMETHOD: $anonfun$callback$3 Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;)V Address: 0x00007FF02531DA40
ROMMETHOD: $anonfun$callback$1 Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;Lscala/Tuple2;)V Address: 0x00007FF02531DA68
ROMMETHOD: <init> Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport;)V Address: 0x00007FF02531DB2C
ROMMETHOD: $anonfun$callback$2$adapted Signature: (Lakka/http/scaladsl/TimeoutAccess;Lakka/http/impl/engine/server/HttpServerBluePrint$TimeoutAccessImpl;)Ljava/lang/Object; Address: 0x00007FF02531DBC4
ROMMETHOD: $anonfun$callback$1$adapted Signature: (Lakka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3;Lscala/Tuple2;)Ljava/lang/Object; Address: 0x00007FF02531DBF8
ROMMETHOD: $deserializeLambda$ Signature: (Ljava/lang/invoke/SerializedLambda;)Ljava/lang/Object; Address: 0x00007FF02531DC2C
...
The ROMMethod of akka$http$impl$engine$server$HttpServerBluePrint$RequestTimeoutSupport$$anon$$openTimeouts Signature: ()Lscala/collection/immutable/Queue; always has the same address. I would have thought that if a class in the SCC became stale, the new class would have a new ROMClass address (and therefore all of its methods would have new J9ROMMethod addresses). How come the J9ROMMethod address stays the same? @hangshao0 or @pshipton do you guys know?
The other point of concern is that the problem here is that we're loading from the constant pool of the receiver (i.e. the class of the AOT'd method). I was under the impression that we validate the class of the AOT'd method being loaded, but it doesn't look like we do; I think the assumption has always been that if vmThread->javaVM->sharedClassConfig->findCompiledMethodEx1 does not return NULL, then the shape of the ROMClass of the AOT'd method being loaded hasn't changed. I guess depending on the answer to the previous question, we may have to store the class chain of the class of the AOT'd method. However, to me it seems more intuitive that all AOT bodies of a class should be marked as stale if the class is marked as stale.
A class becomes stale when the classpath changes. This forces the class to be loaded from the classpath (jar) again (which may result in storing new metadata into the cache for the new classpath, if there is room), but if the ROM class loaded is the same as the one already stored in the shared cache, the same ROM class in the cache continues to be used. Classpath changes can change the RAM class, while continuing to use the same ROM class.
Hm, in this case, once the class is stale, the AOT code ends up being off by 4 bytes when trying to load the field from an object. That sounds like the layout of the romclass changed doesn't it?
Could one of the superclasses have affected that? The ROM class only records the name of the superclass, and it's own fields, it doesn't control the final layout of the Object. The layout is controlled by the RAM class hierarchy. i.e. perhaps one of the superclass ROM classes changed.
Yeah you're right:
> !j9class 0x000000000272D600
J9Class at 0x272d600 {
Fields for J9Class:
0x0: UDATA eyecatcher = 0x0000000099669966 (2573637990)
0x8: struct J9ROMClass* romClass = !j9romclass 0x00007F18A131D538
0x10: void** superclasses = !j9x 0x000000000272D290
0x18: UDATA classDepthAndFlags = 0x00000000020E0002 (34471938)
...
Class name: akka/http/impl/engine/server/HttpServerBluePrint$RequestTimeoutSupport$$anon$3
(gdb) x/2gx 0x000000000272D290
0x272d290: 0x000000000003fd00 0x00000000010b1e00
md5-6a5eda82f9f20f8774a3c601a5965d0c
> !j9class 0x00000000010b1e00
J9Class at 0x10b1e00 {
Fields for J9Class:
0x0: UDATA eyecatcher = 0x0000000099669966 (2573637990)
0x8: struct J9ROMClass* romClass = !j9romclass 0x00007F18A40B9A68
...
Class name: akka/stream/stage/GraphStageLogic
md5-6a5eda82f9f20f8774a3c601a5965d0c
> !shrc findclass akka/stream/stage/GraphStageLogic
!j9sharedclassconfig 0x00007F18FC183A90
Looking for class "akka/stream/stage/GraphStageLogic"
Meta data region to be used: 0x00007F18B6E74D54..0x00007F18BD711FFC
1: 0x00007F18BCD28C90 ORPHAN: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210
1: 0x00007F18BCD28C64 ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210 !STALE!
Index 117 in !shrc classpath 0x00007F18BD591C04
2: 0x00007F18BCB4F90C ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210 !STALE!
Index 157 in !shrc classpath 0x00007F18BCB81C40
3: 0x00007F18BC87638C ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210 !STALE!
Index 90 in !shrc classpath 0x00007F18BC910D5C
4: 0x00007F18BC39F940 ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210 !STALE!
Index 157 in !shrc classpath 0x00007F18BC3A0FBC
5: 0x00007F18BC279F04 ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210 !STALE!
Index 240 in !shrc classpath 0x00007F18BC2F25BC
6: 0x00007F18BB06451C ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210 !STALE!
Index 161 in !shrc classpath 0x00007F18BB0652C4
7: 0x00007F18BA75F6C4 ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A12C4210 !STALE!
Index 177 in !shrc classpath 0x00007F18BA7A4A4C
8: 0x00007F18B7139EF8 ORPHAN: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A40B9A68
8: 0x00007F18B7139ECC ROMCLASS: akka/stream/stage/GraphStageLogic at !j9romclass 0x00007F18A40B9A68
Index 188 in !shrc classpath 0x00007F18B71EFA28
Opened https://github.com/eclipse/openj9/pull/10159 to fix this.
Thanks @edrevo for all the diagnostic info. Also thanks @DanHeidinga & @pshipton for the VM help.
Many thanks for jumping on the issue and fixing it! Do you know which version of Openj9 will contain the fix?
It's missed the 0.21.0 release for July, the next release for Java 11 is the 0.23.0 release in Oct.
Once the fix is merged, it will be in the nightly builds at https://adoptopenjdk.net/nightly.html?variant=openjdk11&jvmVariant=openj9