Openj9: Crash during compilation using shared classes

Created on 13 Nov 2020  ·  51Comments  ·  Source: eclipse/openj9

Java -version output

openjdk version "15.0.1" 2020-10-20
OpenJDK Runtime Environment AdoptOpenJDK (build 15.0.1+9)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.23.0, JRE 15 Linux amd64-64-Bit Compressed References 20201022_81 (JIT enabled, AOT enabled)
OpenJ9 - 0394ef754
OMR - 582366ae5
JCL - ad583de3b5 based on jdk-15.0.1+9)

Summary of problem

I tried to use the same class cache using across different jenkins builds. Unfortunately, it crashed the JVM. Without using the class cache the build works.

Diagnostic files

I can send an email of the core dump or via Slack.

[javac] warning: [options] bootstrap class path not set in conjunction with -source 7
    [javac] warning: [options] source value 7 is obsolete and will be removed in a future release
    [javac] warning: [options] target value 7 is obsolete and will be removed in a future release
    [javac] warning: [options] To suppress warnings about obsolete options, use -Xlint:-options.
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000002
Handler1=00007F6DA31E6800 Handler2=00007F6DA2ACAF90 InaccessibleAddress=0000000099669AC0
RDI=00007F6D7C45F9D3 RSI=00007F6D7C45F9D8 RAX=0000000000000050 RBX=0000000000480300
RCX=0000000099669900 RDX=00007F6D7C45F9D8 R8=00000000FFA7B030 R9=00000000FFA7B030
R10=0000000099669900 R11=00000000FFA7B030 R12=0000000089D49498 R13=00000000FFA7B050
R14=0000000089D42208 R15=0000000000000011
RIP=00007F6DA13224CA GS=0000 FS=0000 RSP=00000000001098F0
EFlags=0000000000010207 CS=0033 RBP=0000000000014E00 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000099669AC0
xmm0 0000000000000000 (f: 0,000000, d: 0,000000e+00)
xmm1 0000000048f696c0 (f: 1224120064,000000, d: 6,047956e-315)
xmm2 00007f6da4127690 (f: 2752673536,000000, d: 6,922298e-310)
xmm3 00007f6da4127690 (f: 2752673536,000000, d: 6,922298e-310)
xmm4 43e0000000000000 (f: 0,000000, d: 9,223372e+18)
xmm5 000000003dd34d0c (f: 1037257984,000000, d: 5,124735e-315)
xmm6 0000000049a54880 (f: 1235568768,000000, d: 6,104521e-315)
xmm7 000000003bd80000 (f: 1004011520,000000, d: 4,960476e-315)
xmm8 0000000000000000 (f: 0,000000, d: 0,000000e+00)
xmm9 000000003e17cee7 (f: 1041747712,000000, d: 5,146917e-315)
xmm10 3f847ae147ae147b (f: 1202590848,000000, d: 1,000000e-02)
xmm11 3fefae147ae147ae (f: 2061584256,000000, d: 9,900000e-01)
xmm12 3ff0000000000000 (f: 0,000000, d: 1,000000e+00)
xmm13 410075c000000000 (f: 0,000000, d: 1,348400e+05)
xmm14 000000004984f158 (f: 1233449344,000000, d: 6,094049e-315)
xmm15 0000000047d4b560 (f: 1205122432,000000, d: 5,954096e-315)
Module=/opt/jdk-15/lib/compressedrefs/libj9jit29.so
Module_base_address=00007F6DA0FC6000
Target=2_90_20201022_81 (Linux 3.10.0-1127.19.1.el7.x86_64)
CPU=amd64 (2 logical CPUs) (0x1d8e2a000 RAM)
----------- Stack Backtrace -----------
(0x00007F6DA2AFDFD2 [libj9prt29.so+0x57fd2])
(0x00007F6DA2ACBCF3 [libj9prt29.so+0x25cf3])
(0x00007F6DA2AFE04E [libj9prt29.so+0x5804e])
(0x00007F6DA2AFE144 [libj9prt29.so+0x58144])
(0x00007F6DA2ACBCF3 [libj9prt29.so+0x25cf3])
(0x00007F6DA2AFDEAB [libj9prt29.so+0x57eab])
(0x00007F6DA31E6344 [libj9vm29.so+0x96344])
(0x00007F6DA2ACBCF3 [libj9prt29.so+0x25cf3])
(0x00007F6DA31E65B5 [libj9vm29.so+0x965b5])
(0x00007F6DA3213D65 [libj9vm29.so+0xc3d65])
--------------------------------------
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at com/sun/tools/javac/parser/JavacParser.parseCompilationUnit(JavacParser.java:3521)
4XESTACKTRACE                at com/sun/tools/javac/main/JavaCompiler.parse(JavaCompiler.java:639)
4XESTACKTRACE                at com/sun/tools/javac/main/JavaCompiler.parse(JavaCompiler.java:676)
4XESTACKTRACE                at com/sun/tools/javac/main/JavaCompiler.parseFiles(JavaCompiler.java:1033)
4XESTACKTRACE                at com/sun/tools/javac/main/JavaCompiler.parseFiles(JavaCompiler.java:1020)
4XESTACKTRACE                at com/sun/tools/javac/main/JavaCompiler.compile(JavaCompiler.java:938)
4XESTACKTRACE                at com/sun/tools/javac/main/Main.compile(Main.java:317)
4XESTACKTRACE                at com/sun/tools/javac/main/Main.compile(Main.java:176)
4XESTACKTRACE                at com/sun/tools/javac/Main.compile(Main.java:59)
compaot segfault userRaised

Most helpful comment

Thanks to help from @Thihup, I'm able to find the problem.

The object class indeed has only one implementer
object class: com/sun/tools/javac/parser/JavacParser$AbstractEndPosTable (abstract class)
the call: com/sun/tools/javac/tree/EndPosTable.storeEnd (interface method)

Abstract class JavacParser$AbstractEndPosTable implements interface class EndPosTable but doesn't provide implementation for storeEnd, so even though the bytecode of the calls is

invokevirtual 125 {com/sun/tools/javac/parser/JavacParser$AbstractEndPosTable.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V}

the JIT generated calls to the interface method in our IL. Because inliner didn't get receiver type, so it used class of method, which is the interface class as the receiver type. Then it inlined the call with profiled guard, and didn't try to find single abstract implementer because the receiver class it has is not abstract.

However, the hottest profiled class, JavacParser$SimpleEndPosTable is loaded but not initialized, it's storeEnd has never been executed, I don't know how it end up in the profiling data.

Because JavacParser$SimpleEndPosTable is not initialized, it's not considered as an extend for JavacParser$AbstractEndPosTable, so JavacParser$AbstractEndPosTable has only one implementer, which is JavacParser$EmptyEndPosTable.

VP didn't check if inliner inlined the right method, and just changed the guard type.

I think two changes are needed.

  1. Inliner to not inline a method whose class hasn't been initialized
  2. VP to check the inlined method is single implementer before converting the guard. If the inlined method doesn't match the single implementer, convert the guard to a unconditional jump to slow path.

All 51 comments

I tried to use the same class cache using across different builds. Unfortunately, it crashed the JVM.

Sounds like an AOT issue. The build is quite recent so it should contain the CPU detection AOT changes. Subscribing @harryyu1994 and @dsouzai for next steps on how to investigate this one.

Note it refers to different jenkins jobs, not JVM builds.

From Slack: one build for project X, another build for project Y. Note that both project X and Y are very similar, mainly the changes are our source code, and the build basically invokes the Javac

InaccessibleAddress=0000000099669AC0

This is a bit suspect since the J9Class eyecatcher is 0x99669966. This is likely somehow related to the problem. I think we'll need to corresponding system core dump to load under gdb and figure out where exactly this crashed.

Could you upload the diagnostic files (ie the coredump, and javacore)? I could take a look and then could suggest other diagnostic options depending on what the coredump looks like.

Note the core file should be compressed first. They can be quite large, but usually compress quite a bit. It can also be shared privately if desired.

@fjeremic @dsouzai I've shared the core coredump.zip in a Box folder "Issue 11179", you should both have an invitation.

Sorry, it wasn't quite finished uploading, but it has finished now.

If the shared class cache was not built inside Docker then Portable Shared Cache is not enabled by default.
So the chances that this is related to Portable Shared Cache is small.

It's not using Docker. I'm using OPENJ9_JAVA_OPTIONS="-Xshareclasses:name=builds -Xscmx2g"

If possible, could you please try ./java -Xshareclasses:name=builds,printStats, it should give us the processor feature information.

If possible, could you please try ./java -Xshareclasses:name=builds,printStats, it should give us the processor feature information.

Seems this data should be added to the shareclasses section in the javacore.

If possible, could you please try ./java -Xshareclasses:name=builds,printStats, it should give us the processor feature information.

Seems this data should be added to the shareclasses section in the javacore.

Okay I created an issue for this, https://github.com/eclipse/openj9/issues/11184. I will address it when I get the chance.

Processor Features           = fpu cx8 cmov mmx sse
                           sse2 ssse3 fma sse4_1
                           popcnt aesni osxsave
                           avx

I did not see anything suspicious about processor detection. And we can also rule out portable shared cache since this is not the set of features used by -XX:+PortableSharedCache.

The stack trace was a bit messed up. So manually looking at the stack:

> !jitstack 0x0000000000014E00,0x8c82b8,0x00007fa29edec240                            
<14e00> *** BEGIN STACK WALK, flags = 01000001 walkThread = 0x0000000000014E00 ***
<14e00>     START_AT_JIT_FRAME
<14e00>     RECORD_BYTECODE_PC_OFFSET
<14e00> Initial values: walkSP = 0x00000000008C82B8, PC = 0x00007FA29EDEC240, literals = 0x0000000000000000, A0 = 0x0000000000000000, j2iFrame = 0x00000000008C8A08, ELS = 0x00007FA2C4E80840, decomp = 0x0000000000000000
<14e00> JIT inline frame: bp = 0x00000000008C8350, pc = 0x00007FA29EDEC240, unwindSP = 0x00000000008C82B8, cp = 0x00000000005D15F0, arg0EA = 0x0000000000000000, jitInfo = 0x00007FA1F3396338
<14e00>     Method: com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V !j9method 0x00000000005D1778
<14e00> JIT inline frame: bp = 0x00000000008C8350, pc = 0x00007FA29EDEC240, unwindSP = 0x00000000008C82B8, cp = 0x00000000005CD8B0, arg0EA = 0x0000000000000000, jitInfo = 0x00007FA1F3396338
<14e00>     Method: com/sun/tools/javac/parser/JavacParser.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V !j9method 0x00000000005C2DB8
<14e00> JIT frame: bp = 0x00000000008C8350, pc = 0x00007FA29EDEC240, unwindSP = 0x00000000008C82B8, cp = 0x00000000005CD8B0, arg0EA = 0x00000000008C8360, jitInfo = 0x00007FA1F3396338
<14e00>     Method: com/sun/tools/javac/parser/JavacParser.modifiersOpt(Lcom/sun/tools/javac/tree/JCTree$JCModifiers;)Lcom/sun/tools/javac/tree/JCTree$JCModifiers; !j9method 0x00000000005C3838
<14e00> JIT frame: bp = 0x00000000008C83F0, pc = 0x00007FA29EDEB0AA, unwindSP = 0x00000000008C8358, cp = 0x00000000005CD8B0, arg0EA = 0x00000000008C8400, jitInfo = 0x00007FA1F3396338
<14e00>     Method: com/sun/tools/javac/parser/JavacParser.modifiersOpt(Lcom/sun/tools/javac/tree/JCTree$JCModifiers;)Lcom/sun/tools/javac/tree/JCTree$JCModifiers; !j9method 0x00000000005C3838
Nov. 13, 2020 1:45:38 P.M. com.ibm.j9ddr.vm29.events.DefaultEventListener corruptData
WARNING: CorruptDataException thrown walking stack. walkThread = 0x0000000000014E00
com.ibm.j9ddr.AddressedCorruptDataException: Invalid JIT return address

The top most method that is not inlined is com/sun/tools/javac/parser/JavacParser.modifiersOpt(Lcom/sun/tools/javac/tree/JCTree$JCModifiers;)Lcom/sun/tools/javac/tree/JCTree$JCModifiers

(kca) x/10i 0x00007fa29edec240-25
0x7fa29edec227 {com/.../JavacParser.modifiersOpt} +4487      90                   nop       
0x7fa29edec228 {com/.../JavacParser.modifiersOpt} +4488      48bf0099669900000000 movabs    rdi, 0x99669900 
0x7fa29edec232 {com/.../JavacParser.modifiersOpt} +4498      483bf9               cmp       rdi, rcx
0x7fa29edec235 {com/.../JavacParser.modifiersOpt} +4501      0f8595150000         jne       0x7fa29eded7d0 C>> +10032 
0x7fa29edec23b {com/.../JavacParser.modifiersOpt} +4507      e85002d318           call      0x7fa2b7b1c490

the call to 0x7fa2b7b1c490 is:

(gdb) x/30i 0x7fa2b7b1c490
   0x7fa2b7b1c490 <populateVPicSlotCall>:   push   rdi
   0x7fa2b7b1c491 <populateVPicSlotCall+1>: push   rax
   0x7fa2b7b1c492 <populateVPicSlotCall+2>: push   rsi
   0x7fa2b7b1c493 <populateVPicSlotCall+3>: push   rdx
   0x7fa2b7b1c494 <populateVPicSlotCall+4>: push   rcx
   0x7fa2b7b1c495 <populateVPicSlotCall+5>: mov    rdx,QWORD PTR [rsp+0x28]
   0x7fa2b7b1c49a <populateVPicSlotCall+10>:    cmp    BYTE PTR [rdx-0xa],0x85
   0x7fa2b7b1c49e <populateVPicSlotCall+14>:    je     0x7fa2b7b1c56e <populateLastVPicSlot>
   0x7fa2b7b1c4a4 <populateVPicSlotCall+20>:    movzx  rdi,BYTE PTR [rdx+0x1]
   0x7fa2b7b1c4a9 <populateVPicSlotCall+25>:    lea    rdi,[rdi+rdx*1-0x3]
   0x7fa2b7b1c4ae <mergeVPicSlotCall>:  movsxd rsi,DWORD PTR [rdi-0x4]
   0x7fa2b7b1c4b2 <mergeVPicSlotCall+4>:    mov    ecx,DWORD PTR [rax]
   0x7fa2b7b1c4b4 <mergeVPicSlotCall+6>:    and    ecx,0xffffff00
   0x7fa2b7b1c4ba <mergeVPicSlotCall+12>:   lea    rax,[rdi+rsi*1-0x20]
   0x7fa2b7b1c4bf <mergeVPicSlotCall+17>:   mov    rsi,rdx
   0x7fa2b7b1c4c2 <mergeVPicSlotCall+20>:   call   0x7fa2b812de90 <jitResolveVirtualMethod>
   0x7fa2b7b1c4c7 <mergeVPicSlotCall+25>:   neg    rax
   0x7fa2b7b1c4ca <mergeVPicSlotCall+28>:   mov    rax,QWORD PTR [rcx+rax*1+0x170] <== crashing instruction

According to the javacore, RIP: 00007F6DA13224CA; however this is an inaccessible address for some reason... Thankfully an internal debugger was able to get the register context.

It seems like the problem is

0x7fa29edec228 {com/.../JavacParser.modifiersOpt} +4488      48bf0099669900000000 movabs    rdi, 0x99669900 

0x99669900 is not a class pointer, it looks like the eyecatcher of a j9class with the low bits masked out. Since this is hardcoded into the code, it is either missing a relocation or the relocation was applied incorrectly.

@Thihup I'm guessing you havent' deleted the SCC right? Could you run with -Xaot:aotrtDebugLevel=30,rtLog=<path to log dir>/rtLog -XcompilationThreads1. You'd have to upload the generated rtLog file(s) as well as the new coredump and javacore.

Thanks!

dumps.zip is uploaded to the folder.

So same situation:

1XHREGISTERS   Registers:
...
2XHREGISTER      RAX: 0000000000000050
...
2XHREGISTER      RCX: 0000000099669900
...
2XHREGISTER      RIP: 00007F8A3CFF74CA
> !jitstackslots 0x00014e00,0x8e40c8,0x00007f8a2515c430
<14e00> *** BEGIN STACK WALK, flags = 01400001 walkThread = 0x0000000000014E00 ***
<14e00>     START_AT_JIT_FRAME
<14e00>     ITERATE_O_SLOTS
<14e00>     RECORD_BYTECODE_PC_OFFSET
<14e00> Initial values: walkSP = 0x00000000008E40C8, PC = 0x00007F8A2515C430, literals = 0x0000000000000000, A0 = 0x0000000000000000, j2iFrame = 0x00000000008E4690, ELS = 0x00007F8A4B03E840, decomp = 0x0000000000000000
<14e00> JIT inline frame: bp = 0x00000000008E4150, pc = 0x00007F8A2515C430, unwindSP = 0x00000000008E40C8, cp = 0x00000000005B9BF0, arg0EA = 0x0000000000000000, jitInfo = 0x00007F898CDE53B8
<14e00>     Method: com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V !j9method 0x00000000005B9D78
<14e00>     Bytecode index = 26, inlineDepth = 2, PC offset = 0x00007F8A2515C407
<14e00> JIT inline frame: bp = 0x00000000008E4150, pc = 0x00007F8A2515C430, unwindSP = 0x00000000008E40C8, cp = 0x00000000005ABA90, arg0EA = 0x0000000000000000, jitInfo = 0x00007F898CDE53B8
<14e00>     Method: com/sun/tools/javac/parser/JavacParser.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V !j9method 0x00000000005B7598
<14e00>     Bytecode index = 6, inlineDepth = 1, PC offset = 0x0000000000000F88
<14e00> JIT frame: bp = 0x00000000008E4150, pc = 0x00007F8A2515C430, unwindSP = 0x00000000008E40C8, cp = 0x00000000005ABA90, arg0EA = 0x00000000008E4168, jitInfo = 0x00007F898CDE53B8
<14e00>     Method: com/sun/tools/javac/parser/JavacParser.literal(Lcom/sun/tools/javac/util/Name;I)Lcom/sun/tools/javac/tree/JCTree$JCExpression; !j9method 0x00000000005B76B8



md5-4966dfa72d5c285c6e72950569bcb846



0x7f8a2515c418 {com/.../JavacParser.literal} +1456 ~    49bb0099669900000000 movabs    r11, 0x99669900
0x7f8a2515c422 {com/.../JavacParser.literal} +1466 ~    4c3bd9               cmp       r11, rcx
0x7f8a2515c425 {com/.../JavacParser.literal} +1469 ~    0f8565270000         jne       0x7f8a2515eb90 C>> +11560
0x7f8a2515c42b {com/.../JavacParser.literal} +1475 ~    e860b0e917           call      0x7f8a3cff7490



md5-4966dfa72d5c285c6e72950569bcb846



(gdb) x/30i 0x7f8a3cff7490
   0x7f8a3cff7490 <populateVPicSlotCall>:   push   rdi
   0x7f8a3cff7491 <populateVPicSlotCall+1>: push   rax
   0x7f8a3cff7492 <populateVPicSlotCall+2>: push   rsi
   0x7f8a3cff7493 <populateVPicSlotCall+3>: push   rdx
   0x7f8a3cff7494 <populateVPicSlotCall+4>: push   rcx
   0x7f8a3cff7495 <populateVPicSlotCall+5>: mov    rdx,QWORD PTR [rsp+0x28]
   0x7f8a3cff749a <populateVPicSlotCall+10>:    cmp    BYTE PTR [rdx-0xa],0x85
   0x7f8a3cff749e <populateVPicSlotCall+14>:    je     0x7f8a3cff756e <populateLastVPicSlot>
   0x7f8a3cff74a4 <populateVPicSlotCall+20>:    movzx  rdi,BYTE PTR [rdx+0x1]
   0x7f8a3cff74a9 <populateVPicSlotCall+25>:    lea    rdi,[rdi+rdx*1-0x3]
   0x7f8a3cff74ae <mergeVPicSlotCall>:  movsxd rsi,DWORD PTR [rdi-0x4]
   0x7f8a3cff74b2 <mergeVPicSlotCall+4>:    mov    ecx,DWORD PTR [rax]
   0x7f8a3cff74b4 <mergeVPicSlotCall+6>:    and    ecx,0xffffff00
   0x7f8a3cff74ba <mergeVPicSlotCall+12>:   lea    rax,[rdi+rsi*1-0x20]
   0x7f8a3cff74bf <mergeVPicSlotCall+17>:   mov    rsi,rdx
   0x7f8a3cff74c2 <mergeVPicSlotCall+20>:   call   0x7f8a3d608e90 <jitResolveVirtualMethod>
   0x7f8a3cff74c7 <mergeVPicSlotCall+25>:   neg    rax
   0x7f8a3cff74ca <mergeVPicSlotCall+28>:   mov    rax,QWORD PTR [rcx+rax*1+0x170] <== crashing instruction

However, from the rtLog, it looks like we failed during relocation of com/sun/tools/javac/parser/JavacParser.literal(Lcom/sun/tools/javac/util/Name;I)Lcom/sun/tools/javac/tree/JCTree$JCExpression; !j9method 0x00000000005B76B8. So that means that the crashing method is a JIT'd method - when we fail a relocation, we retry it as a JIT compilation. I guess I had just assumed that it was an AOT method 🤷‍♂️ .

Will have to hand it over to someone on the x86 codegen. @andrewcraik, looks like we're hardcoding 0x99669900 as the class pointer in the VPIC slot. I'm not exactly sure why this doesn't happen without the SCC; could be a timing related issue. I don't think bad AOT code somewhere else could result in us hardcoding a bad class pointer into the code.

@liqunl could you please take a look?

Marius commented in Slack "It's possible that the crash with shareclasses (SCC) is due to AOT in which case you could continue to use the SCC, but just for storing classes. To achieve this, leave the SCC options as is and add -Xnoaot to the command line.

Thihup: "Running with -Xnoaot I got"

Assertion failed at ../omr/compiler/codegen/OMRTreeEvaluator.cpp:591: false
VMState: 0x0005ff04
    badILOp BadILOp cannot be evaluated
compiling java/io/BufferedReader.read([CII)I at level: very-hot (profiling)
#0: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x8aa345) [0x7f0a21545345]
#1: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x8b4d90) [0x7f0a2154fd90]
#2: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x5718ce) [0x7f0a2120c8ce]
#3: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x57307d) [0x7f0a2120e07d]
#4: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x5040a7) [0x7f0a2119f0a7]
#5: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x4f325f) [0x7f0a2118e25f]
#6: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x13381e) [0x7f0a20dce81e]
#7: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x500f72) [0x7f0a2119bf72]
#8: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x4ff8a2) [0x7f0a2119a8a2]
#9: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x4fcee3) [0x7f0a21197ee3]
#10: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x5125a9) [0x7f0a211ad5a9]
#11: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x176504) [0x7f0a20e11504]
#12: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x1774ba) [0x7f0a20e124ba]
#13: /opt/jdk-15/lib/compressedrefs/libj9prt29.so(+0x25cf3) [0x7f0a2d79acf3]
#14: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x1794b5) [0x7f0a20e144b5]
#15: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x179a68) [0x7f0a20e14a68]
#16: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x1751a3) [0x7f0a20e101a3]
#17: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x175662) [0x7f0a20e10662]
#18: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x17570a) [0x7f0a20e1070a]
#19: /opt/jdk-15/lib/compressedrefs/libj9prt29.so(+0x25cf3) [0x7f0a2d79acf3]
#20: /opt/jdk-15/lib/compressedrefs/libj9jit29.so(+0x175b5d) [0x7f0a20e10b5d]
#21: /opt/jdk-15/lib/compressedrefs/libj9thr29.so(+0xe2f2) [0x7f0a2dc122f2]
#22: /lib64/libpthread.so.0(+0x7ea5) [0x7f0a2f8e9ea5]
#23: function clone+0x6d [0x7f0a2f40e8dd]

Added new_dumps.zip to the Box folder.

This is a profiling compile. IIRC @r30shah I think we've seen these BadILOp failures I think it was fixed in eclipse/omr#5644. Rahil could you take a quick peek and find out if this is another instance of that problem? Note that the user may have been running with an older JDK not containing the OMR fix referenced.

Failure in https://github.com/eclipse/openj9/issues/11179#issuecomment-730025374 seems like the issue fixed by https://github.com/eclipse/omr/pull/5644 and looking at the Java version in the description, if it is the driver in which the failure is found, it does not contains the fix. @pshipton Can you please provide access to the box folder so I can verify that?

@r30shah added you, you should have received the invite. It is the same JVM.
1CIJ9VMTAG openj9-0.23.0

@Thihup this means you could try a recent nighty build, which contains the fix for the BadILOp problem.
https://adoptopenjdk.net/nightly.html?variant=openjdk15&jvmVariant=openj9

Looking at the jitdump from the new_dumps.zip , I can verify that it is same issue where a regLoad appears under passThrough node for which we do not have regStore as it should already be in register and it generates a node with type of PassThrough which causes badILOp assert, latest nightly build should not see this failure.

Using the latest nightly build the crash didn't happen 🎉
Should I give it a try and run without -Xnoaot? Is there some flag that maybe helps in the investigation?
Looks like using the nightly build the SCC cache was reset, but maybe I've messed something up before using it.

Any time the JVM is switched the cache will be reset. Nobody has mentioned the original problem being fixed, so it's unlikely to work without -Xnoaot, but you can try if you like.

Looked at core file in dumps.zip, the one investigated in https://github.com/eclipse/openj9/issues/11179#issuecomment-727015219

The problem is in the jit code of {com/sun/tools/javac/parser/JavacParser.literal(Lcom/sun/tools/javac/util/Name;I)Lcom/sun/tools/javac/tree/JCTree$JCExpression;}

We inlined com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.storeEnd but the object is com/sun/tools/javac/parser/JavacParser$EmptyEndPosTable. Then we read endPosMap field out of the object, which doesn't exist in com/sun/tools/javac/parser/JavacParser$EmptyEndPosTable. We read out of object bound and get a j9class, and pass it to the pic.

I don't see any guard here, nor can I find the reference of the inlined class in the disassembly.

-1:683   invokevirtual 87 {com/sun/tools/javac/parser/JavacParser.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V}
{  //Inlined #63 0x00000000005b7598 {com/sun/tools/javac/parser/JavacParser.storeEnd}
   63:0     aload0getfield 
   63:1     getfield 45 com/sun/tools/javac/parser/JavacParser.endPosTable (Lcom/sun/tools/javac/parser/JavacParser$AbstractEndPosTable;)
   63:4     aload1 
   63:5     iload2 
   63:6     invokevirtual 125 {com/sun/tools/javac/parser/JavacParser$AbstractEndPosTable.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V}
   {  //Inlined #64 0x00000000005b9d78 {com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.storeEnd}
      64:0     aload0getfield 
      64:1     getfield 5 com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.endPosMap (Lcom/sun/tools/javac/util/IntHashTable;)
      64:4     aload1 
      64:5     aload0getfield 
      64:6     getfield 7 com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.errorEndPos (I)
      64:9     iload2 
      64:10    ificmple C>> +20
      64:13    aload0getfield 
      64:14    getfield 7 com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.errorEndPos (I)
      64:17    goto U>> +21
      64:20    iload2  <<< +10
      64:21    aload0getfield  <<< +17
      64:22    getfield 5 com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.endPosMap (Lcom/sun/tools/javac/util/IntHashTable;)
      64:25    aload1 
      64:26    invokevirtual 8 {com/sun/tools/javac/util/IntHashTable.lookup(Ljava/lang/Object;)I}
      64:29    invokevirtual 9 {com/sun/tools/javac/util/IntHashTable.putAtIndex(Ljava/lang/Object;II)I}
      64:32    pop  
      64:33    return0 
   }
   63:9     return0 
}

@Thihup Could you run with -Xjit:{com/sun/tools/javac/parser/JavacParser.literal(Lcom/sun/tools/javac/util/Name;I)Lcom/sun/tools/javac/tree/JCTree*JCExpression;}(traceFull,log=<path to log dir>/trace) -XcompilationThreads1, and upload the trace log, core dump and javacore?

If the application is started from command line, you may need to enclose the JIT option in single quotes.

@liqunl added jit_dumps.zip to the Box folder.

The core dump from https://github.com/eclipse/openj9/issues/11179#issuecomment-731391177 is a crash in different method failed in the same way.
{com/sun/tools/javac/parser/JavacParser.merge(Lcom/sun/tools/javac/util/ListBuffer;Lcom/sun/tools/javac/util/ListBuffer;)Z}

@Thihup I notice that you changed to use nightly build, can you switch back to the original build? And try -Xjit:{com/sun/tools/javac/parser/JavacParser.literal(Lcom/sun/tools/javac/util/Name;I)Lcom/sun/tools/javac/tree/JCTree*JCExpression;}(traceFull,log=<path to log dir>/trace),disableAsyncCompilation -XcompilationThreads1

@liqunl added dumps_jvm.zip to the Box folder.

Unfortunately the crash is still in a different method. There are 16 methods that can potentially have the same problematic inlining, they are

  1 {com/sun/tools/javac/parser/JavacParser.syntaxError}
  2 {com/sun/tools/javac/parser/JavacParser.literal}
  3 {com/sun/tools/javac/parser/JavacParser.foldStrings}
  4 {com/sun/tools/javac/parser/JavacParser.merge}
  5 {com/sun/tools/javac/parser/JavacParser.localVariableDeclarations}
  6 {com/sun/tools/javac/parser/JavacParser.switchBlockStatementGroup}
  7 {com/sun/tools/javac/parser/JavacParser.switchBlockStatementGroup}
  8 {com/sun/tools/javac/parser/JavacParser.modifiersOpt}
  9 {com/sun/tools/javac/parser/JavacParser.annotation}
 10 {com/sun/tools/javac/parser/JavacParser.variableDeclaratorsRest}
 11 {com/sun/tools/javac/parser/JavacParser.resources}
 12 {com/sun/tools/javac/parser/JavacParser.parseCompilationUnit}
 13 {com/sun/tools/javac/parser/JavacParser.enumeratorDeclaration}
 14 {com/sun/tools/javac/parser/JavacParser.classOrInterfaceOrRecordBodyDeclaration}
 15 {com/sun/tools/javac/parser/JavacParser.classOrInterfaceOrRecordBodyDeclaration}
 16 {com/sun/tools/javac/parser/JavacParser.insertAnnotationsToMostInner}

@Thihup Could you help us try a few experiments?

  1. Exclude compilation of the 16 methods and see if it prevents the crash. I've put the JIT options in exclude.txt, and we can use it with the following java option
    -Xoptionsfile=/path/to/exclude.txt
    exclude.txt
  2. If 1 works, we want to get trace logs of the 16 methods from a crash. To do this, download traceMethods.txt and use it with the following java option
    -Xoptionsfile=/path/to/traceMethods.txt
    traceMethods.txt

Please change /path/to to the path of the files on your computer.
Inside the two files, I have options to collect verbose log and/or trace log, the location is /tmp, you can change them to the place you want.

Unfortunately, I'm unable to test these options in our Jenkins machine.
Fortunately, I could simulate the crash in my machine. I hope now the iteration can be faster. So core dump most certainly will differ from the other machine but I'm using the same JVM.

PS: the crash simulation is not consistent, sometimes it crashes in the first compilation and sometimes it crashes only at the end of a series of compilations.

I've sent the new dumps to @pshipton

@Thihup Sorry, I thought you were running on it locally. If you want to try it on your Jenkins machine, you can take the options in the file and append it to command line options. Notice that \ separates one option into multiple lines. The option will be very long though, that's why I initially put them in an option file.

If you're sending the core dumps on OpenJ9 slack, you can find me (Liqun Liu) there.

@liqunl Ok, I'll send over the Slack

@liqunl added dumps2.zip (sent to me as dumps.zip) to the Box folder.

From an offline chat with Thihup, excluding the 16 methods stop the crash.
I got trace log of the failing method.
{com/sun/tools/javac/parser/JavacParser$SimpleEndPosTable.storeEnd} is inlined with a profiled guard with vft test, but VP then converts into abstract guard because VP proved that the object class, which is an abstract class, has only one implementer of the method. This contradicts to the inliner decision, as inliner first tries to find single implementer target, if that fails, then profiled target. And before we commit the compiled body to VM, we check that the assumption is correct.

Either VP's transformation is wrong (or something's wrong in CHTable), or we failed to patch the guard. We need to get details of inlining and VP, as well as the runtime assumptions.

@Thihup To continue the investigation, could you help us try the following option file using -Xoptionsfile=?
traceAndVerbose.txt

Thanks to help from @Thihup, I'm able to find the problem.

The object class indeed has only one implementer
object class: com/sun/tools/javac/parser/JavacParser$AbstractEndPosTable (abstract class)
the call: com/sun/tools/javac/tree/EndPosTable.storeEnd (interface method)

Abstract class JavacParser$AbstractEndPosTable implements interface class EndPosTable but doesn't provide implementation for storeEnd, so even though the bytecode of the calls is

invokevirtual 125 {com/sun/tools/javac/parser/JavacParser$AbstractEndPosTable.storeEnd(Lcom/sun/tools/javac/tree/JCTree;I)V}

the JIT generated calls to the interface method in our IL. Because inliner didn't get receiver type, so it used class of method, which is the interface class as the receiver type. Then it inlined the call with profiled guard, and didn't try to find single abstract implementer because the receiver class it has is not abstract.

However, the hottest profiled class, JavacParser$SimpleEndPosTable is loaded but not initialized, it's storeEnd has never been executed, I don't know how it end up in the profiling data.

Because JavacParser$SimpleEndPosTable is not initialized, it's not considered as an extend for JavacParser$AbstractEndPosTable, so JavacParser$AbstractEndPosTable has only one implementer, which is JavacParser$EmptyEndPosTable.

VP didn't check if inliner inlined the right method, and just changed the guard type.

I think two changes are needed.

  1. Inliner to not inline a method whose class hasn't been initialized
  2. VP to check the inlined method is single implementer before converting the guard. If the inlined method doesn't match the single implementer, convert the guard to a unconditional jump to slow path.

Awesome find @liqunl! May I suggest we try to come up with a unit test for this and check it in as a deterministic JIT unit test to run as part of functional sanity?

However, the hottest profiled class, JavacParser$SimpleEndPosTable is loaded but not initialized, it's storeEnd has never been executed, I don't know how it end up in the profiling data.

Hm, it could be because we store IProfiler data into the SCC...which might explain why this only happens with SCC enabled

Inliner to not inline a method whose class hasn't been initialized

Might need to add some code to the relo infra to also check for this; I can help you figure out where.

@dsouzai Yes, I think it's because of SCC, as this crash only happens with SCC.

Now I'm not sure if I need change 2, VP may just assert if this happens (might be risky in release). We should try to fix it at the source of the problem

Awesome find @liqunl! May I suggest we try to come up with a unit test for this and check it in as a deterministic JIT unit test to run as part of functional sanity?

Off my head, I don't have any idea how to unit test this. It requires a bad iprofiler data, which rarely happen, and the origin of the bad data might be hard to find. Do you have any advice?

I should probably check the initialize status in profiler instead of where it's used

Maybe sharing my SCC can help?

Maybe sharing my SCC can help?

Thanks for offering and all the help you've been giving, we don't need SCC now.

@mpirvu I have a question regarding to iprofiler data in SCC. It seems to me that we're reusing iprofiler data from last run, but how do know we can trust it? In this case, the hottest profiled class (with frequency 100%), is not even initialized, its method is never executed. Even if it's loaded and its method is executed, do we verify it's still the hottest? If not, do we allow new iprofiler data to be collected, and maybe fuse with or overwrite the old iprofiler data?

IProfiler data is collected and stored in the SCC so that we can turn off IProfiler collection during start-up phase in the subsequent runs to improve start-up time of applications. Note that IProfiler will be turned ON after start-up phase. This means that we may have two sources of IProfiler data: current run and SCC. When that happens, the JIT picks the entry with the most number of samples.
There is fusing of the two sources of IProfiler data in the following way: when asking about a particular bytecode, if we pick the SCC as the source of the IProfiler data, we take that IProfiler entry and put it into our runtime IProfiler hashtable. This bytecode will continue to receive samples (until the method gets compiled , of course) and its entry will be updated. However, the corresponding IProfiler entry in the SCC will not be overwritten/updated.

how do know we can trust it

The assumption is that the stored profiling information will continue to be relevant in the subsequent runs because the application hasn't changed (or it's very similar). Of course, that may not always be the case. We don't try to do any validation and the issue you discovered is an interesting one. Checking that the profiled class has been initialized makes a lot of sense.

@liqunl you'll have to open another PR for the 0.24 branch

@Thihup the fix for this will be in the 0.24 release.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

AdamBrousseau picture AdamBrousseau  ·  6Comments

mikezhang1234567890 picture mikezhang1234567890  ·  5Comments

Mesbah-Alam picture Mesbah-Alam  ·  5Comments

xliang6 picture xliang6  ·  3Comments

JasonFengJ9 picture JasonFengJ9  ·  3Comments