openjdk version "15.0.1" 2020-10-20
OpenJDK Runtime Environment AdoptOpenJDK (build 15.0.1+9)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.23.0, JRE 15 Linux amd64-64-Bit Compressed References 20201022_81 (JIT enabled, AOT enabled)
OpenJ9 - 0394ef754
OMR - 582366ae5
JCL - ad583de3b5 based on jdk-15.0.1+9)
I tried to use the same class cache using across different jenkins builds. Unfortunately, it crashed the JVM. Without using the class cache the build works.
I can send an email of the core dump or via Slack.
[javac] warning: [options] bootstrap class path not set in conjunction with -source 7
[javac] warning: [options] source value 7 is obsolete and will be removed in a future release
[javac] warning: [options] target value 7 is obsolete and will be removed in a future release
[javac] warning: [options] To suppress warnings about obsolete options, use -Xlint:-options.
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
Handler1=00007F6DA31E6800 Handler2=00007F6DA2ACAF90 InaccessibleAddress=0000000099669AC0
RDI=00007F6D7C45F9D3 RSI=00007F6D7C45F9D8 RAX=0000000000000050 RBX=0000000000480300
RCX=0000000099669900 RDX=00007F6D7C45F9D8 R8=00000000FFA7B030 R9=00000000FFA7B030
R10=0000000099669900 R11=00000000FFA7B030 R12=0000000089D49498 R13=00000000FFA7B050
R14=0000000089D42208 R15=0000000000000011
RIP=00007F6DA13224CA GS=0000 FS=0000 RSP=00000000001098F0
EFlags=0000000000010207 CS=0033 RBP=0000000000014E00 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000099669AC0
xmm0 0000000000000000 (f: 0,000000, d: 0,000000e+00)
xmm1 0000000048f696c0 (f: 1224120064,000000, d: 6,047956e-315)
xmm2 00007f6da4127690 (f: 2752673536,000000, d: 6,922298e-310)
xmm3 00007f6da4127690 (f: 2752673536,000000, d: 6,922298e-310)
xmm4 43e0000000000000 (f: 0,000000, d: 9,223372e+18)
xmm5 000000003dd34d0c (f: 1037257984,000000, d: 5,124735e-315)
xmm6 0000000049a54880 (f: 1235568768,000000, d: 6,104521e-315)
xmm7 000000003bd80000 (f: 1004011520,000000, d: 4,960476e-315)
xmm8 0000000000000000 (f: 0,000000, d: 0,000000e+00)
xmm9 000000003e17cee7 (f: 1041747712,000000, d: 5,146917e-315)
xmm10 3f847ae147ae147b (f: 1202590848,000000, d: 1,000000e-02)
xmm11 3fefae147ae147ae (f: 2061584256,000000, d: 9,900000e-01)
xmm12 3ff0000000000000 (f: 0,000000, d: 1,000000e+00)
xmm13 410075c000000000 (f: 0,000000, d: 1,348400e+05)
xmm14 000000004984f158 (f: 1233449344,000000, d: 6,094049e-315)
xmm15 0000000047d4b560 (f: 1205122432,000000, d: 5,954096e-315)
Module=/opt/jdk-15/lib/compressedrefs/libj9jit29.so
Module_base_address=00007F6DA0FC6000
Target=2_90_20201022_81 (Linux 3.10.0-1127.19.1.el7.x86_64)
CPU=amd64 (2 logical CPUs) (0x1d8e2a000 RAM)
----------- Stack Backtrace -----------
(0x00007F6DA2AFDFD2 [libj9prt29.so+0x57fd2])
(0x00007F6DA2ACBCF3 [libj9prt29.so+0x25cf3])
(0x00007F6DA2AFE04E [libj9prt29.so+0x5804e])
(0x00007F6DA2AFE144 [libj9prt29.so+0x58144])
(0x00007F6DA2ACBCF3 [libj9prt29.so+0x25cf3])
(0x00007F6DA2AFDEAB [libj9prt29.so+0x57eab])
(0x00007F6DA31E6344 [libj9vm29.so+0x96344])
(0x00007F6DA2ACBCF3 [libj9prt29.so+0x25cf3])
(0x00007F6DA31E65B5 [libj9vm29.so+0x965b5])
(0x00007F6DA3213D65 [libj9vm29.so+0xc3d65])
--------------------------------------
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at com/sun/tools/javac/parser/JavacParser.parseCompilationUnit(JavacParser.java:3521)
4XESTACKTRACE at com/sun/tools/javac/main/JavaCompiler.parse(JavaCompiler.java:639)
4XESTACKTRACE at com/sun/tools/javac/main/JavaCompiler.parse(JavaCompiler.java:676)
4XESTACKTRACE at com/sun/tools/javac/main/JavaCompiler.parseFiles(JavaCompiler.java:1033)
4XESTACKTRACE at com/sun/tools/javac/main/JavaCompiler.parseFiles(JavaCompiler.java:1020)
4XESTACKTRACE at com/sun/tools/javac/main/JavaCompiler.compile(JavaCompiler.java:938)
4XESTACKTRACE at com/sun/tools/javac/main/Main.compile(Main.java:317)
4XESTACKTRACE at com/sun/tools/javac/main/Main.compile(Main.java:176)
4XESTACKTRACE at com/sun/tools/javac/Main.compile(Main.java:59)
I tried to use the same class cache using across different builds. Unfortunately, it crashed the JVM.
Sounds like an AOT issue. The build is quite recent so it should contain the CPU detection AOT changes. Subscribing @harryyu1994 and @dsouzai for next steps on how to investigate this one.
Note it refers to different jenkins jobs, not JVM builds.
From Slack: one build for project X, another build for project Y. Note that both project X and Y are very similar, mainly the changes are our source code, and the build basically invokes the Javac
InaccessibleAddress=0000000099669AC0
This is a bit suspect since the J9Class eyecatcher is 0x99669966. This is likely somehow related to the problem. I think we'll need to corresponding system core dump to load under gdb and figure out where exactly this crashed.
Could you upload the diagnostic files (ie the coredump, and javacore)? I could take a look and then could suggest other diagnostic options depending on what the coredump looks like.
Note the core file should be compressed first. They can be quite large, but usually compress quite a bit. It can also be shared privately if desired.
@fjeremic @dsouzai I've shared the core coredump.zip in a Box folder "Issue 11179", you should both have an invitation.
Sorry, it wasn't quite finished uploading, but it has finished now.
If the shared class cache was not built inside Docker then Portable Shared Cache is not enabled by default.
So the chances that this is related to Portable Shared Cache is small.
It's not using Docker. I'm using OPENJ9_JAVA_OPTIONS="-Xshareclasses:name=builds -Xscmx2g"
If possible, could you please try ./java -Xshareclasses:name=builds,printStats, it should give us the processor feature information.
If possible, could you please try ./java -Xshareclasses:name=builds,printStats, it should give us the processor feature information.
Seems this data should be added to the shareclasses section in the javacore.
If possible, could you please try ./java -Xshareclasses:name=builds,printStats, it should give us the processor feature information.
Seems this data should be added to the shareclasses section in the javacore.
Okay I created an issue for this, https://github.com/eclipse/openj9/issues/11184. I will address it when I get the chance.
Processor Features = fpu cx8 cmov mmx sse
sse2 ssse3 fma sse4_1
popcnt aesni osxsave
avx
I did not see anything suspicious about processor detection. And we can also rule out portable shared cache since this is not the set of features used by -XX:+PortableSharedCache.
The stack trace was a bit messed up. So manually looking at the stack:
> !jitstack 0x0000000000014E00,0x8c82b8,0x00007fa29edec240
<14e00> *** BEGIN STACK WALK, flags = 01000001 walkThread = 0x0000000000014E00 ***
<14e00> START_AT_JIT_FRAME
<14e00> RECORD_BYTECODE_PC_OFFSET
<14e00> Initial values: walkSP = 0x00000000008C82B8, PC = 0x00007FA29EDEC240, literals = 0x0000000000000000, A0 = 0x0000000000000000, j2iFrame = 0x00000000008C8A08, ELS = 0x00007FA2C4E80840, decomp = 0x0000000000000000
<14e00> JIT inline frame: bp = 0x00000000008C8350, pc = 0x00007FA29EDEC240, unwindSP = 0x00000000008C82B8, cp = 0x00000000005D15F0, arg0EA = 0x0000000000000000, jitInfo = 0x00007FA1F3396338
<14e00> Method: com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V !j9method 0x00000000005D1778
<14e00> JIT inline frame: bp = 0x00000000008C8350, pc = 0x00007FA29EDEC240, unwindSP = 0x00000000008C82B8, cp = 0x00000000005CD8B0, arg0EA = 0x0000000000000000, jitInfo = 0x00007FA1F3396338
<14e00> Method: com/sun/tools/javac/parser/JavacParser.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V !j9method 0x00000000005C2DB8
<14e00> JIT frame: bp = 0x00000000008C8350, pc = 0x00007FA29EDEC240, unwindSP = 0x00000000008C82B8, cp = 0x00000000005CD8B0, arg0EA = 0x00000000008C8360, jitInfo = 0x00007FA1F3396338
<14e00> Method: com/sun/tools/javac/parser/JavacParser.modifiersOpt(Lcom/sun/tools/javac/tree/JCTree$JCModifiers;)Lcom/sun/tools/javac/tree/JCTree$JCModifiers; !j9method 0x00000000005C3838
<14e00> JIT frame: bp = 0x00000000008C83F0, pc = 0x00007FA29EDEB0AA, unwindSP = 0x00000000008C8358, cp = 0x00000000005CD8B0, arg0EA = 0x00000000008C8400, jitInfo = 0x00007FA1F3396338
<14e00> Method: com/sun/tools/javac/parser/JavacParser.modifiersOpt(Lcom/sun/tools/javac/tree/JCTree$JCModifiers;)Lcom/sun/tools/javac/tree/JCTree$JCModifiers; !j9method 0x00000000005C3838
Nov. 13, 2020 1:45:38 P.M. com.ibm.j9ddr.vm29.events.DefaultEventListener corruptData
WARNING: CorruptDataException thrown walking stack. walkThread = 0x0000000000014E00
com.ibm.j9ddr.AddressedCorruptDataException: Invalid JIT return address
The top most method that is not inlined is com/sun/tools/javac/parser/JavacParser.modifiersOpt(Lcom/sun/tools/javac/tree/JCTree$JCModifiers;)Lcom/sun/tools/javac/tree/JCTree$JCModifiers
(kca) x/10i 0x00007fa29edec240-25
0x7fa29edec227 {com/.../JavacParser.modifiersOpt} +4487 90 nop
0x7fa29edec228 {com/.../JavacParser.modifiersOpt} +4488 48bf0099669900000000 movabs rdi, 0x99669900
0x7fa29edec232 {com/.../JavacParser.modifiersOpt} +4498 483bf9 cmp rdi, rcx
0x7fa29edec235 {com/.../JavacParser.modifiersOpt} +4501 0f8595150000 jne 0x7fa29eded7d0 C>> +10032
0x7fa29edec23b {com/.../JavacParser.modifiersOpt} +4507 e85002d318 call 0x7fa2b7b1c490
the call to 0x7fa2b7b1c490 is:
(gdb) x/30i 0x7fa2b7b1c490
0x7fa2b7b1c490 <populateVPicSlotCall>: push rdi
0x7fa2b7b1c491 <populateVPicSlotCall+1>: push rax
0x7fa2b7b1c492 <populateVPicSlotCall+2>: push rsi
0x7fa2b7b1c493 <populateVPicSlotCall+3>: push rdx
0x7fa2b7b1c494 <populateVPicSlotCall+4>: push rcx
0x7fa2b7b1c495 <populateVPicSlotCall+5>: mov rdx,QWORD PTR [rsp+0x28]
0x7fa2b7b1c49a <populateVPicSlotCall+10>: cmp BYTE PTR [rdx-0xa],0x85
0x7fa2b7b1c49e <populateVPicSlotCall+14>: je 0x7fa2b7b1c56e <populateLastVPicSlot>
0x7fa2b7b1c4a4 <populateVPicSlotCall+20>: movzx rdi,BYTE PTR [rdx+0x1]
0x7fa2b7b1c4a9 <populateVPicSlotCall+25>: lea rdi,[rdi+rdx*1-0x3]
0x7fa2b7b1c4ae <mergeVPicSlotCall>: movsxd rsi,DWORD PTR [rdi-0x4]
0x7fa2b7b1c4b2 <mergeVPicSlotCall+4>: mov ecx,DWORD PTR [rax]
0x7fa2b7b1c4b4 <mergeVPicSlotCall+6>: and ecx,0xffffff00
0x7fa2b7b1c4ba <mergeVPicSlotCall+12>: lea rax,[rdi+rsi*1-0x20]
0x7fa2b7b1c4bf <mergeVPicSlotCall+17>: mov rsi,rdx
0x7fa2b7b1c4c2 <mergeVPicSlotCall+20>: call 0x7fa2b812de90 <jitResolveVirtualMethod>
0x7fa2b7b1c4c7 <mergeVPicSlotCall+25>: neg rax
0x7fa2b7b1c4ca <mergeVPicSlotCall+28>: mov rax,QWORD PTR [rcx+rax*1+0x170] <== crashing instruction
According to the javacore, RIP: 00007F6DA13224CA; however this is an inaccessible address for some reason... Thankfully an internal debugger was able to get the register context.
It seems like the problem is
0x7fa29edec228 {com/.../JavacParser.modifiersOpt} +4488 48bf0099669900000000 movabs rdi, 0x99669900
0x99669900 is not a class pointer, it looks like the eyecatcher of a j9class with the low bits masked out. Since this is hardcoded into the code, it is either missing a relocation or the relocation was applied incorrectly.
@Thihup I'm guessing you havent' deleted the SCC right? Could you run with -Xaot:aotrtDebugLevel=30,rtLog=<path to log dir>/rtLog -XcompilationThreads1. You'd have to upload the generated rtLog file(s) as well as the new coredump and javacore.
Thanks!
dumps.zip is uploaded to the folder.
So same situation:
1XHREGISTERS Registers:
...
2XHREGISTER RAX: 0000000000000050
...
2XHREGISTER RCX: 0000000099669900
...
2XHREGISTER RIP: 00007F8A3CFF74CA
> !jitstackslots 0x00014e00,0x8e40c8,0x00007f8a2515c430
<14e00> *** BEGIN STACK WALK, flags = 01400001 walkThread = 0x0000000000014E00 ***
<14e00> START_AT_JIT_FRAME
<14e00> ITERATE_O_SLOTS
<14e00> RECORD_BYTECODE_PC_OFFSET
<14e00> Initial values: walkSP = 0x00000000008E40C8, PC = 0x00007F8A2515C430, literals = 0x0000000000000000, A0 = 0x0000000000000000, j2iFrame = 0x00000000008E4690, ELS = 0x00007F8A4B03E840, decomp = 0x0000000000000000
<14e00> JIT inline frame: bp = 0x00000000008E4150, pc = 0x00007F8A2515C430, unwindSP = 0x00000000008E40C8, cp = 0x00000000005B9BF0, arg0EA = 0x0000000000000000, jitInfo = 0x00007F898CDE53B8
<14e00> Method: com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V !j9method 0x00000000005B9D78
<14e00> Bytecode index = 26, inlineDepth = 2, PC offset = 0x00007F8A2515C407
<14e00> JIT inline frame: bp = 0x00000000008E4150, pc = 0x00007F8A2515C430, unwindSP = 0x00000000008E40C8, cp = 0x00000000005ABA90, arg0EA = 0x0000000000000000, jitInfo = 0x00007F898CDE53B8
<14e00> Method: com/sun/tools/javac/parser/JavacParser.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V !j9method 0x00000000005B7598
<14e00> Bytecode index = 6, inlineDepth = 1, PC offset = 0x0000000000000F88
<14e00> JIT frame: bp = 0x00000000008E4150, pc = 0x00007F8A2515C430, unwindSP = 0x00000000008E40C8, cp = 0x00000000005ABA90, arg0EA = 0x00000000008E4168, jitInfo = 0x00007F898CDE53B8
<14e00> Method: com/sun/tools/javac/parser/JavacParser.literal(Lcom/sun/tools/javac/util/Name;I)Lcom/sun/tools/javac/tree/JCTree$JCExpression; !j9method 0x00000000005B76B8
md5-4966dfa72d5c285c6e72950569bcb846
0x7f8a2515c418 {com/.../JavacParser.literal} +1456 ~ 49bb0099669900000000 movabs r11, 0x99669900
0x7f8a2515c422 {com/.../JavacParser.literal} +1466 ~ 4c3bd9 cmp r11, rcx
0x7f8a2515c425 {com/.../JavacParser.literal} +1469 ~ 0f8565270000 jne 0x7f8a2515eb90 C>> +11560
0x7f8a2515c42b {com/.../JavacParser.literal} +1475 ~ e860b0e917 call 0x7f8a3cff7490
md5-4966dfa72d5c285c6e72950569bcb846
(gdb) x/30i 0x7f8a3cff7490
0x7f8a3cff7490 <populateVPicSlotCall>: push rdi
0x7f8a3cff7491 <populateVPicSlotCall+1>: push rax
0x7f8a3cff7492 <populateVPicSlotCall+2>: push rsi
0x7f8a3cff7493 <populateVPicSlotCall+3>: push rdx
0x7f8a3cff7494 <populateVPicSlotCall+4>: push rcx
0x7f8a3cff7495 <populateVPicSlotCall+5>: mov rdx,QWORD PTR [rsp+0x28]
0x7f8a3cff749a <populateVPicSlotCall+10>: cmp BYTE PTR [rdx-0xa],0x85
0x7f8a3cff749e <populateVPicSlotCall+14>: je 0x7f8a3cff756e <populateLastVPicSlot>
0x7f8a3cff74a4 <populateVPicSlotCall+20>: movzx rdi,BYTE PTR [rdx+0x1]
0x7f8a3cff74a9 <populateVPicSlotCall+25>: lea rdi,[rdi+rdx*1-0x3]
0x7f8a3cff74ae <mergeVPicSlotCall>: movsxd rsi,DWORD PTR [rdi-0x4]
0x7f8a3cff74b2 <mergeVPicSlotCall+4>: mov ecx,DWORD PTR [rax]
0x7f8a3cff74b4 <mergeVPicSlotCall+6>: and ecx,0xffffff00
0x7f8a3cff74ba <mergeVPicSlotCall+12>: lea rax,[rdi+rsi*1-0x20]
0x7f8a3cff74bf <mergeVPicSlotCall+17>: mov rsi,rdx
0x7f8a3cff74c2 <mergeVPicSlotCall+20>: call 0x7f8a3d608e90 <jitResolveVirtualMethod>
0x7f8a3cff74c7 <mergeVPicSlotCall+25>: neg rax
0x7f8a3cff74ca <mergeVPicSlotCall+28>: mov rax,QWORD PTR [rcx+rax*1+0x170] <== crashing instruction
However, from the rtLog, it looks like we failed during relocation of com/sun/tools/javac/parser/JavacParser.literal(Lcom/sun/tools/javac/util/Name;I)Lcom/sun/tools/javac/tree/JCTree$JCExpression; !j9method 0x00000000005B76B8. So that means that the crashing method is a JIT'd method - when we fail a relocation, we retry it as a JIT compilation. I guess I had just assumed that it was an AOT method 🤷♂️ .
Will have to hand it over to someone on the x86 codegen. @andrewcraik, looks like we're hardcoding 0x99669900 as the class pointer in the VPIC slot. I'm not exactly sure why this doesn't happen without the SCC; could be a timing related issue. I don't think bad AOT code somewhere else could result in us hardcoding a bad class pointer into the code.
Maybe related? https://github.com/eclipse/openj9/issues/10126
@liqunl could you please take a look?
Marius commented in Slack "It's possible that the crash with shareclasses (SCC) is due to AOT in which case you could continue to use the SCC, but just for storing classes. To achieve this, leave the SCC options as is and add -Xnoaot to the command line.
Thihup: "Running with -Xnoaot I got"
Assertion failed at ../omr/compiler/codegen/OMRTreeEvaluator.cpp:591: false
VMState: 0x0005ff04
badILOp BadILOp cannot be evaluated
compiling java/io/BufferedReader.read([CII)I at level: very-hot (profiling)
#0: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x8aa345) [0x7f0a21545345]
#1: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x8b4d90) [0x7f0a2154fd90]
#2: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x5718ce) [0x7f0a2120c8ce]
#3: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x57307d) [0x7f0a2120e07d]
#4: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x5040a7) [0x7f0a2119f0a7]
#5: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x4f325f) [0x7f0a2118e25f]
#6: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x13381e) [0x7f0a20dce81e]
#7: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x500f72) [0x7f0a2119bf72]
#8: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x4ff8a2) [0x7f0a2119a8a2]
#9: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x4fcee3) [0x7f0a21197ee3]
#10: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x5125a9) [0x7f0a211ad5a9]
#11: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x176504) [0x7f0a20e11504]
#12: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x1774ba) [0x7f0a20e124ba]
#13: /opt/jdk-15/lib/compressedrefs/libj9prt29.so(+0x25cf3) [0x7f0a2d79acf3]
#14: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x1794b5) [0x7f0a20e144b5]
#15: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x179a68) [0x7f0a20e14a68]
#16: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x1751a3) [0x7f0a20e101a3]
#17: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x175662) [0x7f0a20e10662]
#18: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x17570a) [0x7f0a20e1070a]
#19: /opt/jdk-15/lib/compressedrefs/libj9prt29.so(+0x25cf3) [0x7f0a2d79acf3]
#20: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x175b5d) [0x7f0a20e10b5d]
#21: /opt/jdk-15/lib/compressedrefs/libj9thr29.so(+0xe2f2) [0x7f0a2dc122f2]
#22: /lib64/libpthread.so.0(+0x7ea5) [0x7f0a2f8e9ea5]
#23: function clone+0x6d [0x7f0a2f40e8dd]
Added new_dumps.zip to the Box folder.
This is a profiling compile. IIRC @r30shah I think we've seen these BadILOp failures I think it was fixed in eclipse/omr#5644. Rahil could you take a quick peek and find out if this is another instance of that problem? Note that the user may have been running with an older JDK not containing the OMR fix referenced.
Failure in https://github.com/eclipse/openj9/issues/11179#issuecomment-730025374 seems like the issue fixed by https://github.com/eclipse/omr/pull/5644 and looking at the Java version in the description, if it is the driver in which the failure is found, it does not contains the fix. @pshipton Can you please provide access to the box folder so I can verify that?
@r30shah added you, you should have received the invite. It is the same JVM.
1CIJ9VMTAG openj9-0.23.0
@Thihup this means you could try a recent nighty build, which contains the fix for the BadILOp problem.
https://adoptopenjdk.net/nightly.html?variant=openjdk15&jvmVariant=openj9
Looking at the jitdump from the new_dumps.zip , I can verify that it is same issue where a regLoad appears under passThrough node for which we do not have regStore as it should already be in register and it generates a node with type of PassThrough which causes badILOp assert, latest nightly build should not see this failure.
Using the latest nightly build the crash didn't happen 🎉
Should I give it a try and run without -Xnoaot? Is there some flag that maybe helps in the investigation?
Looks like using the nightly build the SCC cache was reset, but maybe I've messed something up before using it.
Any time the JVM is switched the cache will be reset. Nobody has mentioned the original problem being fixed, so it's unlikely to work without -Xnoaot, but you can try if you like.
Looked at core file in dumps.zip, the one investigated in https://github.com/eclipse/openj9/issues/11179#issuecomment-727015219
The problem is in the jit code of {com/sun/tools/javac/parser/JavacParser.literal(Lcom/sun/tools/javac/util/Name;I)Lcom/sun/tools/javac/tree/JCTree$JCExpression;}
We inlined com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.storeEnd but the object is com/sun/tools/javac/parser/JavacParser$EmptyEndPosTable. Then we read endPosMap field out of the object, which doesn't exist in com/sun/tools/javac/parser/JavacParser$EmptyEndPosTable. We read out of object bound and get a j9class, and pass it to the pic.
I don't see any guard here, nor can I find the reference of the inlined class in the disassembly.
-1:683 invokevirtual 87 {com/sun/tools/javac/parser/JavacParser.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V}
{ //Inlined #63 0x00000000005b7598 {com/sun/tools/javac/parser/JavacParser.storeEnd}
63:0 aload0getfield
63:1 getfield 45 com/sun/tools/javac/parser/JavacParser.endPosTable (Lcom/sun/tools/javac/parser/JavacParser$AbstractEndPosTable;)
63:4 aload1
63:5 iload2
63:6 invokevirtual 125 {com/sun/tools/javac/parser/JavacParser$AbstractEndPosTable.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V}
{ //Inlined #64 0x00000000005b9d78 {com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.storeEnd}
64:0 aload0getfield
64:1 getfield 5 com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.endPosMap (Lcom/sun/tools/javac/util/IntHashTable;)
64:4 aload1
64:5 aload0getfield
64:6 getfield 7 com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.errorEndPos (I)
64:9 iload2
64:10 ificmple C>> +20
64:13 aload0getfield
64:14 getfield 7 com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.errorEndPos (I)
64:17 goto U>> +21
64:20 iload2 <<< +10
64:21 aload0getfield <<< +17
64:22 getfield 5 com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.endPosMap (Lcom/sun/tools/javac/util/IntHashTable;)
64:25 aload1
64:26 invokevirtual 8 {com/sun/tools/javac/util/IntHashTable.lookup(Ljava/lang/Object;)I}
64:29 invokevirtual 9 {com/sun/tools/javac/util/IntHashTable.putAtIndex(Ljava/lang/Object;II)I}
64:32 pop
64:33 return0
}
63:9 return0
}
@Thihup Could you run with -Xjit:{com/sun/tools/javac/parser/JavacParser.literal(Lcom/sun/tools/javac/util/Name;I)Lcom/sun/tools/javac/tree/JCTree*JCExpression;}(traceFull,log=<path to log dir>/trace) -XcompilationThreads1, and upload the trace log, core dump and javacore?
If the application is started from command line, you may need to enclose the JIT option in single quotes.
@liqunl added jit_dumps.zip to the Box folder.
The core dump from https://github.com/eclipse/openj9/issues/11179#issuecomment-731391177 is a crash in different method failed in the same way.
{com/sun/tools/javac/parser/JavacParser.merge(Lcom/sun/tools/javac/util/ListBuffer;Lcom/sun/tools/javac/util/ListBuffer;)Z}
@Thihup I notice that you changed to use nightly build, can you switch back to the original build? And try -Xjit:{com/sun/tools/javac/parser/JavacParser.literal(Lcom/sun/tools/javac/util/Name;I)Lcom/sun/tools/javac/tree/JCTree*JCExpression;}(traceFull,log=<path to log dir>/trace),disableAsyncCompilation -XcompilationThreads1
@liqunl added dumps_jvm.zip to the Box folder.
Unfortunately the crash is still in a different method. There are 16 methods that can potentially have the same problematic inlining, they are
1 {com/sun/tools/javac/parser/JavacParser.syntaxError}
2 {com/sun/tools/javac/parser/JavacParser.literal}
3 {com/sun/tools/javac/parser/JavacParser.foldStrings}
4 {com/sun/tools/javac/parser/JavacParser.merge}
5 {com/sun/tools/javac/parser/JavacParser.localVariableDeclarations}
6 {com/sun/tools/javac/parser/JavacParser.switchBlockStatementGroup}
7 {com/sun/tools/javac/parser/JavacParser.switchBlockStatementGroup}
8 {com/sun/tools/javac/parser/JavacParser.modifiersOpt}
9 {com/sun/tools/javac/parser/JavacParser.annotation}
10 {com/sun/tools/javac/parser/JavacParser.variableDeclaratorsRest}
11 {com/sun/tools/javac/parser/JavacParser.resources}
12 {com/sun/tools/javac/parser/JavacParser.parseCompilationUnit}
13 {com/sun/tools/javac/parser/JavacParser.enumeratorDeclaration}
14 {com/sun/tools/javac/parser/JavacParser.classOrInterfaceOrRecordBodyDeclaration}
15 {com/sun/tools/javac/parser/JavacParser.classOrInterfaceOrRecordBodyDeclaration}
16 {com/sun/tools/javac/parser/JavacParser.insertAnnotationsToMostInner}
@Thihup Could you help us try a few experiments?
exclude.txt, and we can use it with the following java option-Xoptionsfile=/path/to/exclude.txttraceMethods.txt and use it with the following java option-Xoptionsfile=/path/to/traceMethods.txtPlease change /path/to to the path of the files on your computer.
Inside the two files, I have options to collect verbose log and/or trace log, the location is /tmp, you can change them to the place you want.
Unfortunately, I'm unable to test these options in our Jenkins machine.
Fortunately, I could simulate the crash in my machine. I hope now the iteration can be faster. So core dump most certainly will differ from the other machine but I'm using the same JVM.
PS: the crash simulation is not consistent, sometimes it crashes in the first compilation and sometimes it crashes only at the end of a series of compilations.
I've sent the new dumps to @pshipton
@Thihup Sorry, I thought you were running on it locally. If you want to try it on your Jenkins machine, you can take the options in the file and append it to command line options. Notice that \ separates one option into multiple lines. The option will be very long though, that's why I initially put them in an option file.
If you're sending the core dumps on OpenJ9 slack, you can find me (Liqun Liu) there.
@liqunl Ok, I'll send over the Slack
@liqunl added dumps2.zip (sent to me as dumps.zip) to the Box folder.
From an offline chat with Thihup, excluding the 16 methods stop the crash.
I got trace log of the failing method.
{com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.storeEnd} is inlined with a profiled guard with vft test, but VP then converts into abstract guard because VP proved that the object class, which is an abstract class, has only one implementer of the method. This contradicts to the inliner decision, as inliner first tries to find single implementer target, if that fails, then profiled target. And before we commit the compiled body to VM, we check that the assumption is correct.
Either VP's transformation is wrong (or something's wrong in CHTable), or we failed to patch the guard. We need to get details of inlining and VP, as well as the runtime assumptions.
@Thihup To continue the investigation, could you help us try the following option file using -Xoptionsfile=?
traceAndVerbose.txt
Thanks to help from @Thihup, I'm able to find the problem.
The object class indeed has only one implementer
object class: com/sun/tools/javac/parser/JavacParser$AbstractEndPosTable (abstract class)
the call: com/sun/tools/javac/tree/EndPosTable.storeEnd (interface method)
Abstract class JavacParser$AbstractEndPosTable implements interface class EndPosTable but doesn't provide implementation for storeEnd, so even though the bytecode of the calls is
invokevirtual 125 {com/sun/tools/javac/parser/JavacParser$AbstractEndPosTable.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V}
the JIT generated calls to the interface method in our IL. Because inliner didn't get receiver type, so it used class of method, which is the interface class as the receiver type. Then it inlined the call with profiled guard, and didn't try to find single abstract implementer because the receiver class it has is not abstract.
However, the hottest profiled class, JavacParser$SimpleEndPosTable is loaded but not initialized, it's storeEnd has never been executed, I don't know how it end up in the profiling data.
Because JavacParser$SimpleEndPosTable is not initialized, it's not considered as an extend for JavacParser$AbstractEndPosTable, so JavacParser$AbstractEndPosTable has only one implementer, which is JavacParser$EmptyEndPosTable.
VP didn't check if inliner inlined the right method, and just changed the guard type.
I think two changes are needed.
Awesome find @liqunl! May I suggest we try to come up with a unit test for this and check it in as a deterministic JIT unit test to run as part of functional sanity?
However, the hottest profiled class, JavacParser$SimpleEndPosTable is loaded but not initialized, it's storeEnd has never been executed, I don't know how it end up in the profiling data.
Hm, it could be because we store IProfiler data into the SCC...which might explain why this only happens with SCC enabled
Inliner to not inline a method whose class hasn't been initialized
Might need to add some code to the relo infra to also check for this; I can help you figure out where.
@dsouzai Yes, I think it's because of SCC, as this crash only happens with SCC.
Now I'm not sure if I need change 2, VP may just assert if this happens (might be risky in release). We should try to fix it at the source of the problem
Awesome find @liqunl! May I suggest we try to come up with a unit test for this and check it in as a deterministic JIT unit test to run as part of functional sanity?
Off my head, I don't have any idea how to unit test this. It requires a bad iprofiler data, which rarely happen, and the origin of the bad data might be hard to find. Do you have any advice?
I should probably check the initialize status in profiler instead of where it's used
Maybe sharing my SCC can help?
Maybe sharing my SCC can help?
Thanks for offering and all the help you've been giving, we don't need SCC now.
@mpirvu I have a question regarding to iprofiler data in SCC. It seems to me that we're reusing iprofiler data from last run, but how do know we can trust it? In this case, the hottest profiled class (with frequency 100%), is not even initialized, its method is never executed. Even if it's loaded and its method is executed, do we verify it's still the hottest? If not, do we allow new iprofiler data to be collected, and maybe fuse with or overwrite the old iprofiler data?
IProfiler data is collected and stored in the SCC so that we can turn off IProfiler collection during start-up phase in the subsequent runs to improve start-up time of applications. Note that IProfiler will be turned ON after start-up phase. This means that we may have two sources of IProfiler data: current run and SCC. When that happens, the JIT picks the entry with the most number of samples.
There is fusing of the two sources of IProfiler data in the following way: when asking about a particular bytecode, if we pick the SCC as the source of the IProfiler data, we take that IProfiler entry and put it into our runtime IProfiler hashtable. This bytecode will continue to receive samples (until the method gets compiled , of course) and its entry will be updated. However, the corresponding IProfiler entry in the SCC will not be overwritten/updated.
how do know we can trust it
The assumption is that the stored profiling information will continue to be relevant in the subsequent runs because the application hasn't changed (or it's very similar). Of course, that may not always be the case. We don't try to do any validation and the issue you discovered is an interesting one. Checking that the profiled class has been initialized makes a lot of sense.
@liqunl you'll have to open another PR for the 0.24 branch
@Thihup the fix for this will be in the 0.24 release.
Most helpful comment
Thanks to help from @Thihup, I'm able to find the problem.
The object class indeed has only one implementer
object class:
com/sun/tools/javac/parser/JavacParser$AbstractEndPosTable(abstract class)the call:
com/sun/tools/javac/tree/EndPosTable.storeEnd(interface method)Abstract class
JavacParser$AbstractEndPosTableimplements interface classEndPosTablebut doesn't provide implementation forstoreEnd, so even though the bytecode of the calls isthe JIT generated calls to the interface method in our IL. Because inliner didn't get receiver type, so it used class of method, which is the interface class as the receiver type. Then it inlined the call with profiled guard, and didn't try to find single abstract implementer because the receiver class it has is not abstract.
However, the hottest profiled class,
JavacParser$SimpleEndPosTableis loaded but not initialized, it'sstoreEndhas never been executed, I don't know how it end up in the profiling data.Because
JavacParser$SimpleEndPosTableis not initialized, it's not considered as an extend forJavacParser$AbstractEndPosTable, soJavacParser$AbstractEndPosTablehas only one implementer, which isJavacParser$EmptyEndPosTable.VP didn't check if inliner inlined the right method, and just changed the guard type.
I think two changes are needed.