Hi,
We are an IBM internal team, we have been seeing a JVM crash issue and we were able to reproduce it with the following 2 Java versions:
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
Eclipse OpenJ9 VM (build openj9-0.17.0, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20191017_442 (JIT enabled, AOT enabled)
OpenJ9 - 77c1cf708
OMR - 20db4fbc
JCL - 97b5ec8f383 based on jdk8u232-b09)
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-b08)
Eclipse OpenJ9 VM (build openj9-0.18.1, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20200122_511 (JIT enabled, AOT enabled)
OpenJ9 - 51a5857d2
OMR - 7a1b0239a
JCL - 8cf8a30581 based on jdk8u242-b08)
JVM crashed with a "Segmentation error" while running one of our applications inside a docker container with the latest stable release of the OpenJ9 (1.8.0_242). The application itself does not log any error to stdout, the JVM simply crashes. (please check the attached stderr logs).
This happened for multiple times when we triggered a series of ElasticSearch index refresh actions in our application, but there wasn't a clear pattern when it happens, it's not consistently happening.
We suspect this is a bug in the OpenJ9 implementation of the JDK as we have tested the same scenario of same application with "hotspot" implementation in our docker image and cannot reproduce the crash.
Runtime Environment:
Docker containers with Ubuntu 16.04.6 LTS (Xenial Xerus) image, managed by Mesos + Marathon, running on Ubuntu 16.04.6 LTS dedicated VMs on IBM Cloud.
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007FB2491A2710 Handler2=00007FB248A87E80 InaccessibleAddress=0000000000000000
RDI=00007FB21C028E40 RSI=0000000755BF86C5 RAX=00007FB213F710C0 RBX=00007FB21C028E40
RCX=0000000000000050 RDX=0000000755BF8870 R8=0000000000000050 R9=0000000000000050
R10=0000000002631688 R11=00007FB225B63210 R12=0000000000000000 R13=00007FB22400283C
R14=0000000755BF8870 R15=0000000755BF86C5
RIP=0000000000000000 GS=0000 FS=0000 RSP=00007FB224002778
EFlags=0000000000010246 CS=0033 RBP=00007FB22400283C ERR=0000000000000014
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000000
xmm0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm2 9268d30e0eb0cb3f (f: 246467392.000000, d: -5.494049e-220)
xmm3 e8d0b5910aa77679 (f: 178747008.000000, d: -7.806457e+196)
xmm4 11b8f9eda0d29930 (f: 2698156288.000000, d: 2.699051e-223)
xmm5 abbaf804ae675d4b (f: 2926009600.000000, d: -4.932004e-98)
xmm6 d0e91942b8da06fc (f: 3101296384.000000, d: -5.951955e+81)
xmm7 eb5ae7d55107b208 (f: 1359458816.000000, d: -1.382101e+209)
xmm8 71df3366f3c07d28 (f: 4089478400.000000, d: 3.250739e+240)
xmm9 0000000000000006 (f: 6.000000, d: 2.964394e-323)
xmm10 4b2657808bdeacf4 (f: 2346626304.000000, d: 1.069960e+54)
xmm11 c4d4f46def1b08e6 (f: 4011526400.000000, d: -3.958250e+23)
xmm12 bcd6e00000000000 (f: 0.000000, d: -1.269818e-15)
xmm13 bfa01346f6ef8000 (f: 4142891008.000000, d: -3.139707e-02)
xmm14 40357cd0e7026640 (f: 3875694080.000000, d: 2.148756e+01)
xmm15 3c330a683871ff68 (f: 946995072.000000, d: 1.032196e-18)
Target=2_90_20191017_442 (Linux 4.4.0-171-generic)
CPU=amd64 (8 logical CPUs) (0x7daada000 RAM)
----------- Stack Backtrace -----------
(0x00007FB248AB6062 [libj9prt29.so+0x4e062])
(0x00007FB248A88BE3 [libj9prt29.so+0x20be3])
(0x00007FB248AB60DE [libj9prt29.so+0x4e0de])
(0x00007FB248AB61D4 [libj9prt29.so+0x4e1d4])
(0x00007FB248A88BE3 [libj9prt29.so+0x20be3])
(0x00007FB248AB5F3B [libj9prt29.so+0x4df3b])
(0x00007FB2491A2214 [libj9vm29.so+0x9f214])
(0x00007FB248A88BE3 [libj9prt29.so+0x20be3])
(0x00007FB2491A2485 [libj9vm29.so+0x9f485])
(0x00007FB248A880AA [libj9prt29.so+0x200aa])
(0x00007FB24AE18390 [libpthread.so.0+0x11390])
---------------------------------------
The above info was from an earlier occurrence of the issue, it might not contain enough info for troubleshooting. As suggested by folks in java-at-ibm Slack channel, we followed the Mustgather guide in https://www.ibm.com/support/pages/node/344411 and got the java core dump, snap trace, etc during further tests. We can provide these files and part of the stdout of our application via a Box folder or a separate email to the assignee of this issue.
We firstly raised this issue in java-at-ibm Slack channel and then were routed to creating a Salesforce case (case ID: TS003386378), and then we got from IBM Support that it's better to raise an issue in Github openj9-openjdk-jdk8 repo and @jerrylui803 might be able to help, this is the issue we opened there https://github.com/ibmruntimes/openj9-openjdk-jdk8/issues/378. Then we got from that openj9-openjdk-jdk8 issue that we need to open an issue here. We appreciate the help from anyone works in this area or who has met the similar issue before. Thanks!
Yang
@yangzhang-ibm-au can you provide the diagnosis file in a box file (https://www.box.com) or another public accessible location?
Or you could send it to me directly via slack ID U88PN76TY.
In addition, is there a way to reproduce it in a standalone testcase?
@JasonFengJ9 assigning to you initially as you volunteered to get access to the crash data.
The system core file is truncated and not workable. This is what I found from the javacore file.
1XMTHDINFO Thread Details
NULL
3XMTHREADINFO "Finalizer thread" J9VMThread:0x0000000002302000, omrthread_t:0x00007F2A63586AB0, java/lang/Thread:0x0000000500057DB0, state:R, prio=5
3XMJAVALTHREAD (java/lang/Thread getId:0x18, isDaemon:true)
3XMTHREADINFO1 (native thread ID:0x24, native priority:0x5, native policy:UNKNOWN, vmstate:R, vm thread flags:0x404000a1)
3XMTHREADINFO2 (native stack address range from:0x00007F2A632EE000, to:0x00007F2A6332F000, size:0x41000)
3XMCPUTIME CPU usage total: 0.511893061 secs, current category="Application"
3XMHEAPALLOC Heap bytes allocated since last GC cycle=11960 (0x2EB8)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at jdk/crypto/jniprovider/NativeCrypto.CBCUpdate(Native Method)
4XESTACKTRACE at com/sun/crypto/provider/NativeCipherBlockChaining.encrypt(NativeCipherBlockChaining.java:242(Compiled Code))
4XESTACKTRACE at com/sun/crypto/provider/CipherCore.update(CipherCore.java:852(Compiled Code))
4XESTACKTRACE at com/sun/crypto/provider/AESCipher.engineUpdate(AESCipher.java:410(Compiled Code))
4XESTACKTRACE at javax/crypto/Cipher.update(Cipher.java:1944(Compiled Code))
4XESTACKTRACE at sun/security/ssl/CipherBox.encrypt(CipherBox.java:323(Compiled Code))
4XESTACKTRACE at sun/security/ssl/OutputRecord.encrypt(OutputRecord.java:264(Compiled Code))
4XESTACKTRACE at sun/security/ssl/SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:862(Compiled Code))
4XESTACKTRACE at sun/security/ssl/SSLSocketImpl.writeRecord(SSLSocketImpl.java:850(Compiled Code))
4XESTACKTRACE at sun/security/ssl/SSLSocketImpl.writeRecord(SSLSocketImpl.java:720(Compiled Code))
4XESTACKTRACE at sun/security/ssl/SSLSocketImpl.sendAlert(SSLSocketImpl.java:2066(Compiled Code))
4XESTACKTRACE at sun/security/ssl/SSLSocketImpl.warning(SSLSocketImpl.java:1893(Compiled Code))
4XESTACKTRACE at sun/security/ssl/SSLSocketImpl.closeInternal(SSLSocketImpl.java:1656(Compiled Code))
5XESTACKTRACE (entered lock: sun/security/ssl/SSLSocketImpl@0x00000007E910EA00, entry count: 1)
4XESTACKTRACE at sun/security/ssl/SSLSocketImpl.close(SSLSocketImpl.java:1594)
4XESTACKTRACE at org/apache/http/impl/BHttpConnectionBase.close(BHttpConnectionBase.java:332)
4XESTACKTRACE at org/apache/http/impl/conn/LoggingManagedHttpClientConnection.close(LoggingManagedHttpClientConnection.java:81)
4XESTACKTRACE at org/apache/http/impl/conn/CPoolEntry.closeConnection(CPoolEntry.java:70)
4XESTACKTRACE at org/apache/http/impl/conn/CPoolEntry.close(CPoolEntry.java:96)
4XESTACKTRACE at org/apache/http/pool/AbstractConnPool.shutdown(AbstractConnPool.java:149)
4XESTACKTRACE at org/apache/http/impl/conn/PoolingHttpClientConnectionManager.shutdown(PoolingHttpClientConnectionManager.java:411)
4XESTACKTRACE at org/apache/http/impl/conn/PoolingHttpClientConnectionManager.finalize(PoolingHttpClientConnectionManager.java:206)
4XESTACKTRACE at java/lang/J9VMInternals.runFinalize(J9VMInternals.java:285(Compiled Code))
3XMTHREADINFO3 Native callstack:
4XENATIVESTACK (0x00007F2A87DDF062 [libj9prt29.so+0x4e062])
4XENATIVESTACK (0x00007F2A87DB1BE3 [libj9prt29.so+0x20be3])
4XENATIVESTACK (0x00007F2A87DDF0DE [libj9prt29.so+0x4e0de])
4XENATIVESTACK (0x00007F2A87DDF1D4 [libj9prt29.so+0x4e1d4])
4XENATIVESTACK (0x00007F2A87DB1BE3 [libj9prt29.so+0x20be3])
4XENATIVESTACK (0x00007F2A87DDEF3B [libj9prt29.so+0x4df3b])
4XENATIVESTACK (0x00007F2A87DDB572 [libj9prt29.so+0x4a572])
4XENATIVESTACK (0x00007F2A87DDC334 [libj9prt29.so+0x4b334])
4XENATIVESTACK (0x00007F2A87DB1BE3 [libj9prt29.so+0x20be3])
4XENATIVESTACK (0x00007F2A87960B96 [libj9dmp29.so+0x1ab96])
4XENATIVESTACK (0x00007F2A87960D2D [libj9dmp29.so+0x1ad2d])
4XENATIVESTACK (0x00007F2A87DB1BE3 [libj9prt29.so+0x20be3])
4XENATIVESTACK (0x00007F2A8795D41D [libj9dmp29.so+0x1741d])
4XENATIVESTACK (0x00007F2A879586FD [libj9dmp29.so+0x126fd])
4XENATIVESTACK (0x00007F2A87DB1BE3 [libj9prt29.so+0x20be3])
4XENATIVESTACK (0x00007F2A87959AA0 [libj9dmp29.so+0x13aa0])
4XENATIVESTACK (0x00007F2A8796313C [libj9dmp29.so+0x1d13c])
4XENATIVESTACK (0x00007F2A8794ADAD [libj9dmp29.so+0x4dad])
4XENATIVESTACK (0x00007F2A8794A3A5 [libj9dmp29.so+0x43a5])
4XENATIVESTACK (0x00007F2A87DB1BE3 [libj9prt29.so+0x20be3])
4XENATIVESTACK (0x00007F2A8794DA4B [libj9dmp29.so+0x7a4b])
4XENATIVESTACK (0x00007F2A8794DBDF [libj9dmp29.so+0x7bdf])
4XENATIVESTACK (0x00007F2A87964E4B [libj9dmp29.so+0x1ee4b])
4XENATIVESTACK (0x00007F2A8794FDCE [libj9dmp29.so+0x9dce])
4XENATIVESTACK (0x00007F2A8A141390 [libpthread.so.0+0x11390])
4XENATIVESTACK gsignal+0x38 (0x00007F2A8997F428 [libc.so.6+0x35428])
4XENATIVESTACK abort+0x16a (0x00007F2A8998102A [libc.so.6+0x3702a])
4XENATIVESTACK (0x00007F2A64FA345F [libcrypto.so.1.0.0+0x6245f])
4XENATIVESTACK EVP_DecryptUpdate+0x158 (0x00007F2A650611C8 [libcrypto.so.1.0.0+0x1201c8])
4XENATIVESTACK Java_jdk_crypto_jniprovider_NativeCrypto_CBCUpdate+0x88 (0x00007F2A6538D298 [libjncrypto.so+0x2298])
4XENATIVESTACK (0x00007F2A665AEE72 [<unknown>+0x0])
@yangzhang-ibm-au can you give a try with -Djdk.nativeCBC=false?
Looks like a repeat of https://github.com/eclipse/openj9/issues/8181
@ashbm5
It appears openj9-0.18.1 doesn't have https://github.com/ibmruntimes/openj9-openjdk-jdk8/pull/356.
@yangzhang-ibm-au please verify if the problem can be reproduced with latest nightly build [1].
Thanks for the kindly help from everyone!
@JasonFengJ9 Thanks for the help! I have invited you to a Box folder which contains the core dump, java core, snap trace files, etc.
Do you want me to try the latest nightly build with-Djdk.nativeCBC=false? And is it a general practice to use that option for all our applications regardless of this issue?
-Djdk.nativeCBC=false turns off the feature which is causing this problem. The problem should already fixed in the nightly builds, and for the next OpenJ9 Java 8 release (0.20.0 in April), but the fix isn't in the 0.18 release which you have tested.
@pshipton Thanks for the info! Glad to know that it's already fixed. I've started the testing with -Djdk.nativeCBC=false. And we will try the latest nightly build if necessary.
Note that if you are using cryptography, -Djdk.nativeCBC=false may have an affect on performance.
https://www.eclipse.org/openj9/docs/djdknativecbc/
@pshipton Thanks for the reminder!
We did some tests with the -Djdk.nativeCBC=false option and haven't seen any JVM crash yet. We will do more tests to see if there is any significant impact on performance.
I have invited you to a Box folder which contains the core dump, java core, snap trace files, etc.
It appears I didn't get any email notification.
Since the workaround has been identified, the diagnostics files might not be required.
@yangzhang-ibm-au did you have chance to check if latest nightly build works with your application?
@JasonFengJ9 Are you using this email [email protected]? I added that email to the Box folder but not sure why you didn't get the invitation. If that's the right email address I can email you the diagnostics files directly.
We haven't tried the latest nightly build as we thought there might be some potential risk of using a non-stable release.
@JasonFengJ9 We wanted to use the -Djdk.nativeCBC=false option until next stable release in Apr unless that option causes some significant impact to performance or any other issue. Do you think the performance impact of that option can be very obvious?
@ashbm5 can you make a statement about the performance impact of using -Djdk.nativeCBC=false
@yangzhang-ibm-au that's the right email address. Please send me the box link directly instead.
The reason I asked about the latest nightly build is that passing without -Djdk.nativeCBC=false can confirm this issue has been fixed in current release. I would appreciate your time to run the test and determine whether latest builds pass your tests.
@JasonFengJ9 Glad to know that the Box folder is now accessible to you via the link I shared in email.
Yes, it makes sense to confirm the issue is fixed in latest nightly build. We will try to verify it once our environment is ready do that.
@yangzhang-ibm-au The system core attached seems truncated, it is only ~3M size while javacore shows JRE: 13,859,984,648 bytes / 24235 allocations. The stacktrace does look very close to the known issue https://github.com/eclipse/openj9/issues/8181 which was the base of early investigation.
It is expected that this issue is already fixed in current nightly builds.
@JasonFengJ9 Thanks for the update! I did the changes to our docker image to use the nightly build you provided (OpenJDK8U-jdk_x64_linux_openj9_2020-02-24-12-06.tar.gz) and it's currently pending approval. Hopefully I can get it tested on Monday or Tuesday, will let you know how it goes.
@JasonFengJ9 We have done some tests with the nightly build (OpenJDK8U-jdk_x64_linux_openj9_2020-02-24-12-06.tar.gz) and the issue never happened again. So we are going to upgrade to the next stable release once it's available (hopefully in April), before that, we will use the current stable release (1.8.0_242-b08) plus the -Djdk.nativeCBC=false option as a workaround.
Thanks again to everyone for the help!