Openj9: SIGSEGV in walkStackForExceptionThrow()

Created on 5 Nov 2019  路  26Comments  路  Source: eclipse/openj9

Java -version output

openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-b10)
Eclipse OpenJ9 VM (build openj9-0.15.1, JRE 1.8.0 Linux s390x-64-Bit Compressed References 20190717_371 (JIT enabled, AOT enabled)
OpenJ9   - 0f66c6431
OMR      - ec782f26
JCL      - f147086df1 based on jdk8u222-b10)

Summary of problem

JVM crashes due to SIGSEGV in libj9vm29.so:

Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=0000020032448B58 Handler2=00000200326A7CE0 InaccessibleAddress=0000000000000000
gpr0=0000000000000000 gpr1=0000000000000004 gpr2=000000000206E700 gpr3=0000000000000000
gpr4=0000000000000000 gpr5=0000020034030320 gpr6=000000000223A300 gpr7=0000000BF29FA5A0
gpr8=000000000206E700 gpr9=0000000000000000 gpr10=0000000000000000 gpr11=000000000206EA08
gpr12=0000000002238B00 gpr13=0000020032511D68 gpr14=000002003239B2F4 gpr15=0000020151266EE8
psw=00000200324445CC mask=0705000180000000 fpc=0008ff00 bea=000002003239B2EE
fpr0 3f400000b2741fa9 (f: 2993954816.000000, d: 4.882816e-04)
fpr1 3ffa425d8d88711c (f: 2374529280.000000, d: 1.641203e+00)
fpr2 3fa1dc4f74616443 (f: 1952539776.000000, d: 3.488396e-02)
fpr3 3ffa4e4685db6e19 (f: 2245750272.000000, d: 1.644110e+00)
fpr4 bfd27eff4d51c6e0 (f: 1297204992.000000, d: -2.890013e-01)
fpr5 a000c000a000c000 (f: 2684403712.000000, d: -1.561591e-154)
fpr6 3fcee6893971c859 (f: 963758144.000000, d: 2.414104e-01)
fpr7 8000800080008000 (f: 2147516416.000000, d: -6.953462e-310)
fpr8 0000000000041000 (f: 266240.000000, d: 1.315400e-318)
fpr9 000003fff027e7a0 (f: 4029147136.000000, d: 2.172792e-311)
fpr10 0000020151228000 (f: 1361215488.000000, d: 1.089256e-311)
fpr11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
Module_base_address=0000020032380000
Target=2_90_20190717_371 (Linux 4.15.0-62-generic)
CPU=s390x (16 logical CPUs) (0xfb268a000 RAM)
----------- Stack Backtrace -----------
(0x00000200324445CC [libj9vm29.so+0xc45cc])
(0x000002003239B2F4 [libj9vm29.so+0x1b2f4])
(0x000002003239ABE4 [libj9vm29.so+0x1abe4])
(0x000002003248A7E4 [libj9vm29.so+0x10a7e4])
---------------------------------------

It seems that the JVM crashed during a stack walk for exception handling:

(gdb) where
#0  0x00000200276445cc in walkStackForExceptionThrow () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
#1  0x000002002759b2f4 in VM_BytecodeInterpreter::run(J9VMThread*) () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
#2  0x000002002759abe4 in bytecodeLoop () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
#3  0x000002002768a7e4 in c_cInterpreter () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
#4  0x000002002762e97e in runJavaThread () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
#5  0x000002002768a07e in javaProtectedThreadProc () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
#6  0x00000200278a92f8 in omrsig_protect () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9prt29.so
#7  0x000002002768619c in javaThreadProc () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9vm29.so
#8  0x00000200277857f0 in thread_wrapper () from /opt/java/openjdk/jre/lib/s390x/compressedrefs/libj9thr29.so
#9  0x0000020026e07aa8 in start_thread (arg=0x201be716910) at pthread_create.c:463
#10 0x000002002707a936 in thread_start () at ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
(gdb) disas 0x00000200276445cc
Dump of assembler code for function walkStackForExceptionThrow:
   0x0000020027644580 <+0>:     stmg    %r11,%r15,88(%r15)
   0x0000020027644586 <+6>:     larl    %r1,0x2002775e730
   0x000002002764458c <+12>:    ltgr    %r4,%r4
   0x0000020027644590 <+16>:    lay     %r15,-160(%r15)
   0x0000020027644596 <+22>:    lg      %r11,600(%r2)
   0x000002002764459c <+28>:    lg      %r1,0(%r1)
   0x00000200276445a2 <+34>:    stg     %r1,192(%r11)
   0x00000200276445a8 <+40>:    lghi    %r1,0
   0x00000200276445ac <+44>:    stg     %r1,136(%r11)
   0x00000200276445b2 <+50>:    jne     0x20027644622 <walkStackForExceptionThrow+162>
   0x00000200276445b6 <+54>:    stg     %r1,152(%r11)
   0x00000200276445bc <+60>:    stg     %r1,160(%r11)
   0x00000200276445c2 <+66>:    lghi    %r1,4
   0x00000200276445c6 <+70>:    stg     %r1,168(%r11)
=> 0x00000200276445cc <+76>:    llgf    %r1,0(%r3)
   0x00000200276445d2 <+82>:    stg     %r3,232(%r11)
   0x00000200276445d8 <+88>:    stg     %r2,8(%r11)
   0x00000200276445de <+94>:    llilh   %r3,42028
   0x00000200276445e2 <+98>:    stg     %r3,16(%r11)
   0x00000200276445e8 <+104>:   nill    %r1,65280
   0x00000200276445ec <+108>:   stg     %r1,176(%r11)
   0x00000200276445f2 <+114>:   lghi    %r1,0
   0x00000200276445f6 <+118>:   stg     %r1,56(%r2)
   0x00000200276445fc <+124>:   lg      %r1,8(%r2)
   0x0000020027644602 <+130>:   lgr     %r3,%r11
   0x0000020027644606 <+134>:   lg      %r1,8872(%r1)
   0x000002002764460c <+140>:   basr    %r14,%r1
   0x000002002764460e <+142>:   lg      %r4,272(%r15)
   0x0000020027644614 <+148>:   lg      %r2,232(%r11)
   0x000002002764461a <+154>:   lmg     %r11,%r15,248(%r15)
   0x0000020027644620 <+160>:   br      %r4
   0x0000020027644622 <+162>:   lghi    %r1,1
   0x0000020027644626 <+166>:   stg     %r1,152(%r11)
   0x000002002764462c <+172>:   lghi    %r1,0
   0x0000020027644630 <+176>:   stg     %r1,160(%r11)
   0x0000020027644636 <+182>:   lghi    %r1,4
   0x000002002764463a <+186>:   stg     %r1,168(%r11)
   0x0000020027644640 <+192>:   llgf    %r1,0(%r3)
   0x0000020027644646 <+198>:   stg     %r3,232(%r11)
   0x000002002764464c <+204>:   stg     %r2,8(%r11)
   0x0000020027644652 <+210>:   llilh   %r3,1060
   0x0000020027644656 <+214>:   stg     %r3,16(%r11)
   0x000002002764465c <+220>:   nill    %r1,65280
   0x0000020027644660 <+224>:   stg     %r1,176(%r11)
   0x0000020027644666 <+230>:   j       0x200276445f2 <walkStackForExceptionThrow+114>
End of assembler dump.
(gdb) info registers
pswm           0x705000180000000        505810539591499776
pswa           0x200276445cc    2199684138444
r0             0x0      0
r1             0x4      4
r2             0x207ed00        34073856
r3             0x0      0
r4             0x0      0
r5             0x20028030320    2199694541600
r6             0x22a5200        36327936
r7             0xbf78c17d0      51397793744
r8             0x207ed00        34073856
r9             0x0      0
r10            0x0      0
r11            0x207f008        34074632
r12            0x229b900        36288768
r13            0x20027711d68    2199684980072
r14            0x2002759b2f4    2199683445492
r15            0x201be714ee8    2206513319656
acr0           0x201    513
acr1           0xbe716910       3195103504
acr2           0x0      0
acr3           0x0      0
acr4           0x0      0
acr5           0x0      0
acr6           0x0      0
acr7           0x0      0
acr8           0x0      0
acr9           0x0      0
acr10          0x0      0
acr11          0x0      0
acr12          0x0      0
acr13          0x0      0
acr14          0x0      0
acr15          0x0      0
fpc            0x8ff00  589568
orig_r2        0xc1     193
last_break     0x2002759b2ee    0x2002759b2ee <VM_BytecodeInterpreter::run(J9VMThread*)+1782>
system_call    0x0      0
tdb0           <unavailable>
tac            <unavailable>
tct            <unavailable>
atia           <unavailable>
tr0            <unavailable>
tr1            <unavailable>
tr2            <unavailable>
tr3            <unavailable>
tr4            <unavailable>
tr5            <unavailable>
tr6            <unavailable>
tr7            <unavailable>
tr8            <unavailable>
tr9            <unavailable>
tr10           <unavailable>
tr11           <unavailable>
tr12           <unavailable>
tr13           <unavailable>
tr14           <unavailable>
tr15           <unavailable>
gsd            0x400    1024
gssm           0x0      0
gsepla         0x207f760        0x207f760
bc_gsd         <unavailable>
bc_gssm        <unavailable>
bc_gsepla      <unavailable>
pc             0x200276445cc    0x200276445cc <walkStackForExceptionThrow+76>
cc             0x0      0
(gdb)

Does R3 == 0 mean a NULL exception object is passed to the function?
https://github.com/eclipse/openj9/blob/60342c2c6a73813a5efe6022c2c7f26b5edf314d/runtime/vm/exceptionsupport.c#L793

z jit userRaised

All 26 comments

Does R3 == 0 mean a NULL exception object is passed to the function?

That's right.

Can you make the core available?

@gacholio Thank you for your immediate response. Unfortunately we cannot provide a core since the target workload is under a non-disclosure agreement.

You can run DDR on the cores yourself to look at the stack of the crashing thread. It's very possible the top few frames will be system classes, and could be disclosed here.

To run DDR, run bin/jdmpview -core corefile. Once in, do a !gpinfo which will tell you the crashing thread (hex value following !j9vmthread, then run !stackslots thread to get the full stack dump.

@gacholio Thank you for your help. The top frame is shown below. The next fame seems to be customer code:

> !stackslots 0x22fde00
<22fde00> *** BEGIN STACK WALK, flags = 00400001 walkThread = 0x00000000022FDE00 ***
<22fde00>       ITERATE_O_SLOTS
<22fde00>       RECORD_BYTECODE_PC_OFFSET
<22fde00> Initial values: walkSP = 0x0000000002303C18, PC = 0x0000000000000005, literals = 0x0000000000000000, A0 = 0x0000000002303C38, j2iFrame = 0x0000000002303E60, ELS = 0x000002012CAE7A10, decomp = 0x0000000000000000
<22fde00> JIT resolve frame: bp = 0x0000000002303C38, sp = 0x0000000002303C18, pc = 0x0000000000000005, cp = 0x0000000000000000, arg0EA = 0x0000000002303C38, flags = 0x0000000000080000
<22fde00>       Object push (savedJITException)
<22fde00>               O-Slot[0x0000000002303C18] = 0x0000000000000000
<22fde00>       Generic resolve
<22fde00>       unwindSP initialized to 0x0000000002303C40
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE7660] = UDATA(0x0000000000000010) (jit_r0)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE7668] = UDATA(0x0000000000000000) (jit_r1)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE7670] = UDATA(0x0000000000000000) (jit_r2)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE7678] = UDATA(0x0000000910D4A770) (jit_r3)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE7680] = UDATA(0x000000000000003C) (jit_r4)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE7688] = UDATA(0x0000000002303C40) (jit_r5)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE7690] = UDATA(0x00000000021E9800) (jit_r6)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE7698] = UDATA(0x00000000021E9800) (jit_r7)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE76A0] = UDATA(0x000000091350F1E0) (jit_r8)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE76A8] = UDATA(0x00000000009B5600) (jit_r9)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE76B0] = UDATA(0x000000091350C760) (jit_r10)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE76B8] = UDATA(0x0000000907B2D480) (jit_r11)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE76C0] = UDATA(0x0000000907B2D480) (jit_r12)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE76C8] = UDATA(0x00000000022FDE00) (jit_r13)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE76D0] = UDATA(0x00000200411F1F78) (jit_r14)
<22fde00>       JIT-Resolve-RegisterMap[0x000002012CAE76D8] = UDATA(0x0000000000000000) (jit_r15)

Not enough to say anything definitive, but this implies the NULL exception is being thrown from jitted code.

@gacholio Thank you very much for your suggestion. It turned out that excluding the corresponding method from JIT compilation resolves the issue. Also, the following compiation error has been observed:

genIL() returned 1
         detected catch block_12 with real predecessors
Compilation Failed Because: Catch blocks have real predecessors

How can I dump the offending catch block into a log file?

@andrewcraik This looks to be moving into more your area...

Ok - will wait for the rest of the analysis - it sounds familiar. Does it occur with head OpenJ9 (eg the latest nightly)? I ask because we had some weird problems like this due to some bad OSR metadata which was fixed by https://github.com/eclipse/openj9/pull/7531 and this symptom looks a lot like some of the weird things we were seeing due to that...

The offending method, which is generated by bytecode instrumentatin, has the following control flow and exception handlers:

      121, JBreturn1
      122, JBaload1
      123, JBifnull                 8,  131,
      126, JBaload1
      127, JBiconst2
      128, JBinvokevirtual         62
      131, JBathrow

ExceptionRange: start [      2d] end [      79] handler [      83] type [      35]
ExceptionRange: start [      2d] end [      79] handler [      83] type [      36]
ExceptionRange: start [      2d] end [      79] handler [      7a] type [       0]

There is a control flow from the bytecode indices 123 JBifnull and 128 JBinvokevirtual of the last catch block (starting from the bytecode index 122 JBaload1) to the bytecode index 131 JBathrow of the first catch block. Compilation of the offending method fails at genIL() due to the existence of real predecessors for the last catch block. However, when compiling another caller method, inlining the offending method somehow does not fail, and this might cause incorrect dataflow analysis and code generation of the caller method.

@cathyzhyi could you take a look at this when you get a moment?

Compilation of the offending method fails at genIL() due to the existence of real predecessors for the last catch block. However, when compiling another caller method, inlining the offending method somehow does not fail

Hi @inatatsu, does the crash happens when another caller method inline the offending method successfully? If it is such a case, is it possible that you share the tracing log of the caller method by adding -Xjit:{SignatureOfTheCallerMethod}(tracefull,traceinlining,log=log)?

Also, could you post here more frames from output of !stackslots 0x22fde00?

@cathyzhyi any update?

@andrewcraik I think there is not enough data to proceed with the investigation. The offending method is detected to have a catch with real predecessors. JIT detected the case and aborted the compilation so it's still not clear why the failure occurred.

@cathyzhyi Sorry for my late response. I guess the offending method itself was not JIT compiled, but it was incorrectly inlined into another JIT compiled caller method. As explained above, I still cannot post the trace due to a non-disclosure agreement, but will discuss in our team how to send a trace for further analysis.

This failure should be fixed by https://github.com/eclipse/omr/pull/4691

@cathyzhyi Thank you very much. How can I try the fixed compiler?

The OMR change has merged - the fix will propagate to an AdoptOpenJDK OpenJ9 nightly build within the next couple of days all being well. That will be the easiest way to try the fix unless you want to do your open build.

@andrewcraik @cathyzhyi Thank you very much! How can I confirm the commit level of a nightly build of AdoptOpenJDK OpenJ9?

the commit sha is shown on the java -version output

@andrewcraik Does the OMR fix for this need to be cherrypicked into 0.18?

@DanHeidinga a good thought - this is low risk so we should get it in to 0.18 - @cathyzhyi can you open the appropriate pull request on openj9-omr's 0.18 branch?

@DanHeidinga a good thought - this is low risk so we should get it in to 0.18 - @cathyzhyi can you open the appropriate pull request on openj9-omr's 0.18 branch?

@DanHeidinga added PR here https://github.com/eclipse/openj9-omr/pull/45

The fix is merged to master and the 0.18.0 branch, I'll go ahead and close this.

@andrewcraik Which image tag should I use to try the latest nightly build? I was using adoptopenjdk/openjdk8-openj9 on s390x and it seems to be not yet updated:
https://hub.docker.com/layers/adoptopenjdk/openjdk8-openj9/latest/images/sha256-c66c260157a5949bed1ea25259d8e0dd2bf338b03f2ddb06d838a24910823a28
It is convenient that the image is based on the same distribution (ubuntu, debian, etc) as the latest since the application image is built on top of it.

The target application had successfully run by using the updated libj9jit29.so 馃帀 Thank you very much everyone!

Was this page helpful?
0 / 5 - 0 ratings