openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-b10)
Eclipse OpenJ9 VM (build openj9-0.15.1, JRE 1.8.0 Linux s390x-64-Bit Compressed References 20190717_371 (JIT enabled, AOT enabled)
OpenJ9 - 0f66c6431
OMR - ec782f26
JCL - f147086df1 based on jdk8u222-b10)
JVM crashes due to SIGSEGV in libj9vm29.so:
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=0000020032448B58 Handler2=00000200326A7CE0 InaccessibleAddress=0000000000000000
gpr0=0000000000000000 gpr1=0000000000000004 gpr2=000000000206E700 gpr3=0000000000000000
gpr4=0000000000000000 gpr5=0000020034030320 gpr6=000000000223A300 gpr7=0000000BF29FA5A0
gpr8=000000000206E700 gpr9=0000000000000000 gpr10=0000000000000000 gpr11=000000000206EA08
gpr12=0000000002238B00 gpr13=0000020032511D68 gpr14=000002003239B2F4 gpr15=0000020151266EE8
psw=00000200324445CC mask=0705000180000000 fpc=0008ff00 bea=000002003239B2EE
fpr0 3f400000b2741fa9 (f: 2993954816.000000, d: 4.882816e-04)
fpr1 3ffa425d8d88711c (f: 2374529280.000000, d: 1.641203e+00)
fpr2 3fa1dc4f74616443 (f: 1952539776.000000, d: 3.488396e-02)
fpr3 3ffa4e4685db6e19 (f: 2245750272.000000, d: 1.644110e+00)
fpr4 bfd27eff4d51c6e0 (f: 1297204992.000000, d: -2.890013e-01)
fpr5 a000c000a000c000 (f: 2684403712.000000, d: -1.561591e-154)
fpr6 3fcee6893971c859 (f: 963758144.000000, d: 2.414104e-01)
fpr7 8000800080008000 (f: 2147516416.000000, d: -6.953462e-310)
fpr8 0000000000041000 (f: 266240.000000, d: 1.315400e-318)
fpr9 000003fff027e7a0 (f: 4029147136.000000, d: 2.172792e-311)
fpr10 0000020151228000 (f: 1361215488.000000, d: 1.089256e-311)
fpr11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
Module_base_address=0000020032380000
Target=2_90_20190717_371 (Linux 4.15.0-62-generic)
CPU=s390x (16 logical CPUs) (0xfb268a000 RAM)
----------- Stack Backtrace -----------
(0x00000200324445CC [libj9vm29.so+0xc45cc])
(0x000002003239B2F4 [libj9vm29.so+0x1b2f4])
(0x000002003239ABE4 [libj9vm29.so+0x1abe4])
(0x000002003248A7E4 [libj9vm29.so+0x10a7e4])
---------------------------------------
It seems that the JVM crashed during a stack walk for exception handling:
(gdb) where
#0 0x00000200276445cc in walkStackForExceptionThrow () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
#1 0x000002002759b2f4 in VM_BytecodeInterpreter::run(J9VMThread*) () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
#2 0x000002002759abe4 in bytecodeLoop () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
#3 0x000002002768a7e4 in c_cInterpreter () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
#4 0x000002002762e97e in runJavaThread () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
#5 0x000002002768a07e in javaProtectedThreadProc () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
#6 0x00000200278a92f8 in omrsig_protect () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9prt29.so
#7 0x000002002768619c in javaThreadProc () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
#8 0x00000200277857f0 in thread_wrapper () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9thr29.so
#9 0x0000020026e07aa8 in start_thread (arg=0x201be716910) at pthread_create.c:463
#10 0x000002002707a936 in thread_start () at ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
(gdb) disas 0x00000200276445cc
Dump of assembler code for function walkStackForExceptionThrow:
0x0000020027644580 <+0>: stmg %r11,%r15,88(%r15)
0x0000020027644586 <+6>: larl %r1,0x2002775e730
0x000002002764458c <+12>: ltgr %r4,%r4
0x0000020027644590 <+16>: lay %r15,-160(%r15)
0x0000020027644596 <+22>: lg %r11,600(%r2)
0x000002002764459c <+28>: lg %r1,0(%r1)
0x00000200276445a2 <+34>: stg %r1,192(%r11)
0x00000200276445a8 <+40>: lghi %r1,0
0x00000200276445ac <+44>: stg %r1,136(%r11)
0x00000200276445b2 <+50>: jne 0x20027644622 <walkStackForExceptionThrow+162>
0x00000200276445b6 <+54>: stg %r1,152(%r11)
0x00000200276445bc <+60>: stg %r1,160(%r11)
0x00000200276445c2 <+66>: lghi %r1,4
0x00000200276445c6 <+70>: stg %r1,168(%r11)
=> 0x00000200276445cc <+76>: llgf %r1,0(%r3)
0x00000200276445d2 <+82>: stg %r3,232(%r11)
0x00000200276445d8 <+88>: stg %r2,8(%r11)
0x00000200276445de <+94>: llilh %r3,42028
0x00000200276445e2 <+98>: stg %r3,16(%r11)
0x00000200276445e8 <+104>: nill %r1,65280
0x00000200276445ec <+108>: stg %r1,176(%r11)
0x00000200276445f2 <+114>: lghi %r1,0
0x00000200276445f6 <+118>: stg %r1,56(%r2)
0x00000200276445fc <+124>: lg %r1,8(%r2)
0x0000020027644602 <+130>: lgr %r3,%r11
0x0000020027644606 <+134>: lg %r1,8872(%r1)
0x000002002764460c <+140>: basr %r14,%r1
0x000002002764460e <+142>: lg %r4,272(%r15)
0x0000020027644614 <+148>: lg %r2,232(%r11)
0x000002002764461a <+154>: lmg %r11,%r15,248(%r15)
0x0000020027644620 <+160>: br %r4
0x0000020027644622 <+162>: lghi %r1,1
0x0000020027644626 <+166>: stg %r1,152(%r11)
0x000002002764462c <+172>: lghi %r1,0
0x0000020027644630 <+176>: stg %r1,160(%r11)
0x0000020027644636 <+182>: lghi %r1,4
0x000002002764463a <+186>: stg %r1,168(%r11)
0x0000020027644640 <+192>: llgf %r1,0(%r3)
0x0000020027644646 <+198>: stg %r3,232(%r11)
0x000002002764464c <+204>: stg %r2,8(%r11)
0x0000020027644652 <+210>: llilh %r3,1060
0x0000020027644656 <+214>: stg %r3,16(%r11)
0x000002002764465c <+220>: nill %r1,65280
0x0000020027644660 <+224>: stg %r1,176(%r11)
0x0000020027644666 <+230>: j 0x200276445f2 <walkStackForExceptionThrow+114>
End of assembler dump.
(gdb) info registers
pswm 0x705000180000000 505810539591499776
pswa 0x200276445cc 2199684138444
r0 0x0 0
r1 0x4 4
r2 0x207ed00 34073856
r3 0x0 0
r4 0x0 0
r5 0x20028030320 2199694541600
r6 0x22a5200 36327936
r7 0xbf78c17d0 51397793744
r8 0x207ed00 34073856
r9 0x0 0
r10 0x0 0
r11 0x207f008 34074632
r12 0x229b900 36288768
r13 0x20027711d68 2199684980072
r14 0x2002759b2f4 2199683445492
r15 0x201be714ee8 2206513319656
acr0 0x201 513
acr1 0xbe716910 3195103504
acr2 0x0 0
acr3 0x0 0
acr4 0x0 0
acr5 0x0 0
acr6 0x0 0
acr7 0x0 0
acr8 0x0 0
acr9 0x0 0
acr10 0x0 0
acr11 0x0 0
acr12 0x0 0
acr13 0x0 0
acr14 0x0 0
acr15 0x0 0
fpc 0x8ff00 589568
orig_r2 0xc1 193
last_break 0x2002759b2ee 0x2002759b2ee <VM_BytecodeInterpreter::run(J9VMThread*)+1782>
system_call 0x0 0
tdb0 <unavailable>
tac <unavailable>
tct <unavailable>
atia <unavailable>
tr0 <unavailable>
tr1 <unavailable>
tr2 <unavailable>
tr3 <unavailable>
tr4 <unavailable>
tr5 <unavailable>
tr6 <unavailable>
tr7 <unavailable>
tr8 <unavailable>
tr9 <unavailable>
tr10 <unavailable>
tr11 <unavailable>
tr12 <unavailable>
tr13 <unavailable>
tr14 <unavailable>
tr15 <unavailable>
gsd 0x400 1024
gssm 0x0 0
gsepla 0x207f760 0x207f760
bc_gsd <unavailable>
bc_gssm <unavailable>
bc_gsepla <unavailable>
pc 0x200276445cc 0x200276445cc <walkStackForExceptionThrow+76>
cc 0x0 0
(gdb)
Does R3 == 0 mean a NULL exception object is passed to the function?
https://github.com/eclipse/openj9/blob/60342c2c6a73813a5efe6022c2c7f26b5edf314d/runtime/vm/exceptionsupport.c#L793
Does R3 == 0 mean a NULL exception object is passed to the function?
That's right.
Can you make the core available?
@gacholio Thank you for your immediate response. Unfortunately we cannot provide a core since the target workload is under a non-disclosure agreement.
You can run DDR on the cores yourself to look at the stack of the crashing thread. It's very possible the top few frames will be system classes, and could be disclosed here.
To run DDR, run bin/jdmpview -core corefile. Once in, do a !gpinfo which will tell you the crashing thread (hex value following !j9vmthread, then run !stackslots thread to get the full stack dump.
@gacholio Thank you for your help. The top frame is shown below. The next fame seems to be customer code:
> !stackslots 0x22fde00
<22fde00> *** BEGIN STACK WALK, flags = 00400001 walkThread = 0x00000000022FDE00 ***
<22fde00> ITERATE_O_SLOTS
<22fde00> RECORD_BYTECODE_PC_OFFSET
<22fde00> Initial values: walkSP = 0x0000000002303C18, PC = 0x0000000000000005, literals = 0x0000000000000000, A0 = 0x0000000002303C38, j2iFrame = 0x0000000002303E60, ELS = 0x000002012CAE7A10, decomp = 0x0000000000000000
<22fde00> JIT resolve frame: bp = 0x0000000002303C38, sp = 0x0000000002303C18, pc = 0x0000000000000005, cp = 0x0000000000000000, arg0EA = 0x0000000002303C38, flags = 0x0000000000080000
<22fde00> Object push (savedJITException)
<22fde00> O-Slot[0x0000000002303C18] = 0x0000000000000000
<22fde00> Generic resolve
<22fde00> unwindSP initialized to 0x0000000002303C40
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE7660] = UDATA(0x0000000000000010) (jit_r0)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE7668] = UDATA(0x0000000000000000) (jit_r1)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE7670] = UDATA(0x0000000000000000) (jit_r2)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE7678] = UDATA(0x0000000910D4A770) (jit_r3)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE7680] = UDATA(0x000000000000003C) (jit_r4)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE7688] = UDATA(0x0000000002303C40) (jit_r5)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE7690] = UDATA(0x00000000021E9800) (jit_r6)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE7698] = UDATA(0x00000000021E9800) (jit_r7)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE76A0] = UDATA(0x000000091350F1E0) (jit_r8)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE76A8] = UDATA(0x00000000009B5600) (jit_r9)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE76B0] = UDATA(0x000000091350C760) (jit_r10)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE76B8] = UDATA(0x0000000907B2D480) (jit_r11)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE76C0] = UDATA(0x0000000907B2D480) (jit_r12)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE76C8] = UDATA(0x00000000022FDE00) (jit_r13)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE76D0] = UDATA(0x00000200411F1F78) (jit_r14)
<22fde00> JIT-Resolve-RegisterMap[0x000002012CAE76D8] = UDATA(0x0000000000000000) (jit_r15)
Not enough to say anything definitive, but this implies the NULL exception is being thrown from jitted code.
@gacholio Thank you very much for your suggestion. It turned out that excluding the corresponding method from JIT compilation resolves the issue. Also, the following compiation error has been observed:
genIL() returned 1
detected catch block_12 with real predecessors
Compilation Failed Because: Catch blocks have real predecessors
How can I dump the offending catch block into a log file?
@andrewcraik This looks to be moving into more your area...
Ok - will wait for the rest of the analysis - it sounds familiar. Does it occur with head OpenJ9 (eg the latest nightly)? I ask because we had some weird problems like this due to some bad OSR metadata which was fixed by https://github.com/eclipse/openj9/pull/7531 and this symptom looks a lot like some of the weird things we were seeing due to that...
The offending method, which is generated by bytecode instrumentatin, has the following control flow and exception handlers:
121, JBreturn1
122, JBaload1
123, JBifnull 8, 131,
126, JBaload1
127, JBiconst2
128, JBinvokevirtual 62
131, JBathrow
ExceptionRange: start [ 2d] end [ 79] handler [ 83] type [ 35]
ExceptionRange: start [ 2d] end [ 79] handler [ 83] type [ 36]
ExceptionRange: start [ 2d] end [ 79] handler [ 7a] type [ 0]
There is a control flow from the bytecode indices 123 JBifnull and 128 JBinvokevirtual of the last catch block (starting from the bytecode index 122 JBaload1) to the bytecode index 131 JBathrow of the first catch block. Compilation of the offending method fails at genIL() due to the existence of real predecessors for the last catch block. However, when compiling another caller method, inlining the offending method somehow does not fail, and this might cause incorrect dataflow analysis and code generation of the caller method.
@cathyzhyi could you take a look at this when you get a moment?
Compilation of the offending method fails at genIL() due to the existence of real predecessors for the last catch block. However, when compiling another caller method, inlining the offending method somehow does not fail
Hi @inatatsu, does the crash happens when another caller method inline the offending method successfully? If it is such a case, is it possible that you share the tracing log of the caller method by adding -Xjit:{SignatureOfTheCallerMethod}(tracefull,traceinlining,log=log)?
Also, could you post here more frames from output of !stackslots 0x22fde00?
@cathyzhyi any update?
@andrewcraik I think there is not enough data to proceed with the investigation. The offending method is detected to have a catch with real predecessors. JIT detected the case and aborted the compilation so it's still not clear why the failure occurred.
@cathyzhyi Sorry for my late response. I guess the offending method itself was not JIT compiled, but it was incorrectly inlined into another JIT compiled caller method. As explained above, I still cannot post the trace due to a non-disclosure agreement, but will discuss in our team how to send a trace for further analysis.
This failure should be fixed by https://github.com/eclipse/omr/pull/4691
@cathyzhyi Thank you very much. How can I try the fixed compiler?
The OMR change has merged - the fix will propagate to an AdoptOpenJDK OpenJ9 nightly build within the next couple of days all being well. That will be the easiest way to try the fix unless you want to do your open build.
An improvement of the original fix is in:
https://github.com/eclipse/omr/pull/4696
https://github.com/eclipse/openj9/pull/8218
https://github.com/eclipse/omr/pull/4697
@andrewcraik @cathyzhyi Thank you very much! How can I confirm the commit level of a nightly build of AdoptOpenJDK OpenJ9?
the commit sha is shown on the java -version output
@andrewcraik Does the OMR fix for this need to be cherrypicked into 0.18?
@DanHeidinga a good thought - this is low risk so we should get it in to 0.18 - @cathyzhyi can you open the appropriate pull request on openj9-omr's 0.18 branch?
@DanHeidinga a good thought - this is low risk so we should get it in to 0.18 - @cathyzhyi can you open the appropriate pull request on openj9-omr's 0.18 branch?
@DanHeidinga added PR here https://github.com/eclipse/openj9-omr/pull/45
The fix is merged to master and the 0.18.0 branch, I'll go ahead and close this.
@andrewcraik Which image tag should I use to try the latest nightly build? I was using adoptopenjdk/openjdk8-openj9 on s390x and it seems to be not yet updated:
https://hub.docker.com/layers/adoptopenjdk/openjdk8-openj9/latest/images/sha256-c66c260157a5949bed1ea25259d8e0dd2bf338b03f2ddb06d838a24910823a28
It is convenient that the image is based on the same distribution (ubuntu, debian, etc) as the latest since the application image is built on top of it.
The target application had successfully run by using the updated libj9jit29.so 馃帀 Thank you very much everyone!