With RC15, which defaults to -XX:+UseJVMCINativeLibrary, we're seeing crashes when using -XX:+UseG1GC on some of our OSX machines, including Mojave and High Sierra.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000000010ee4dc26, pid=24193, tid=0x0000000000005103
#
# JRE version: OpenJDK Runtime Environment (8.0_202-b08) (build 1.8.0_202-20190206132754.buildslave.jdk8u-src-tar--b08)
# Java VM: OpenJDK GraalVM CE 1.0.0-rc15 (25.202-b08-jvmci-0.58 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# V [libjvm.dylib+0x24dc26]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/.../hs_err_pid24193.log
Compiled method (JVMCI) 18800 11827 4 org.graalvm.compiler.truffle.runtime.OptimizedCallTarget::callRoot (68 bytes)
total in heap [0x0000000115d79ed0,0x0000000115d7aa08] = 2872
relocation [0x0000000115d7a000,0x0000000115d7a048] = 72
main code [0x0000000115d7a060,0x0000000115d7a340] = 736
stub code [0x0000000115d7a340,0x0000000115d7a360] = 32
oops [0x0000000115d7a360,0x0000000115d7a3e8] = 136
metadata [0x0000000115d7a3e8,0x0000000115d7a690] = 680
scopes data [0x0000000115d7a690,0x0000000115d7a818] = 392
scopes pcs [0x0000000115d7a818,0x0000000115d7a8e8] = 208
dependencies [0x0000000115d7a8e8,0x0000000115d7a990] = 168
handler table [0x0000000115d7a990,0x0000000115d7a9c0] = 48
nul chk table [0x0000000115d7a9c0,0x0000000115d7a9e0] = 32
JVMCI data [0x0000000115d7a9e0,0x0000000115d7aa08] = 40
These crashes no longer happen with -XX:-UseJVMCINativeLibrary.
Would you be able to attach hs_err_pid24193.log to this issue? That may offer some clues as to what is going wrong. Even more helpful would be steps to reproduce the crash.
Using the default collector should be a workaround until this is resolved.
@dougxc Unfortunately I no longer have that exact file, but I've attached another one from a crash that occurred during a test I was running.
hs_err_pid21035_redacted.log
Here is another one, which occurred while running a local version of our server in debug mode, accompanied by this on the console:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000000011004dc26, pid=30930, tid=0x0000000000005403
#
# JRE version: OpenJDK Runtime Environment (8.0_202-b08) (build 1.8.0_202-20190206132754.buildslave.jdk8u-src-tar--b08)
# Java VM: OpenJDK GraalVM CE 1.0.0-rc15 (25.202-b08-jvmci-0.58 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# V [libjvm.dylib+0x24dc26] G1ParScanThreadState::copy_to_survivor_space(InCSetState, oopDesc*, markOopDesc*)+0x44
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/...tomcat/bin/hs_err_pid30930.log
Compiled method (JVMCI) 210906 20315 4 org.graalvm.compiler.truffle.runtime.OptimizedCallTarget::callRoot (68 bytes)
total in heap [0x0000000119cd49d0,0x0000000119da25e8] = 842776
relocation [0x0000000119cd4b00,0x0000000119cd6ea0] = 9120
main code [0x0000000119cd6ea0,0x0000000119d040c0] = 184864
stub code [0x0000000119d040c0,0x0000000119d04880] = 1984
oops [0x0000000119d04880,0x0000000119d05958] = 4312
metadata [0x0000000119d05958,0x0000000119d07c10] = 8888
scopes data [0x0000000119d07c10,0x0000000119d9a880] = 601200
scopes pcs [0x0000000119d9a880,0x0000000119da0070] = 22512
dependencies [0x0000000119da0070,0x0000000119da0c58] = 3048
handler table [0x0000000119da0c58,0x0000000119da18b8] = 3168
nul chk table [0x0000000119da18b8,0x0000000119da25b8] = 3328
JVMCI data [0x0000000119da25b8,0x0000000119da25e8] = 48
Do you know if the crashes still occur without libgraal (i.e., -XX:-UseJVMCINativeLibrary)?
These crashes only occur when using libgraal.
Sorry, I just saw that you stated that earlier. Do you only get the crashes with G1, and not with the parallel collector (the default on JDK 8)?
Correct. We see the crashes with G1, but not with the default collector.
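For anyone trying to confirm the same combination on their own machine, here is a minimal sketch (not from the original report) that prints which collectors a running JVM is using and the value of the UseJVMCINativeLibrary flag. It assumes a HotSpot-based VM that exposes HotSpotDiagnosticMXBean. The class name GcConfigCheck and the "<not available>" placeholder are made up for illustration, and whether the flag is visible through this bean on a given GraalVM build is an assumption.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcConfigCheck {

    /** Returns true if any collector name starts with "G1", e.g. "G1 Young Generation". */
    public static boolean usesG1(List<String> collectorNames) {
        for (String name : collectorNames) {
            if (name.startsWith("G1")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Reads the current value of a HotSpot flag.
     * Returns "<not available>" (a hypothetical placeholder) if this VM does not know the flag.
     */
    public static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when no VM option with this name exists.
            return "<not available>";
        }
    }

    public static void main(String[] args) {
        // Collect the names of the active garbage collectors.
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        System.out.println("Collectors: " + names);
        System.out.println("G1 in use: " + usesG1(names));
        System.out.println("UseJVMCINativeLibrary: " + vmOption("UseJVMCINativeLibrary"));
    }
}
```

Running it once with -XX:+UseG1GC and once without should show whether a given launch configuration matches the crashing setup (G1 plus libgraal) or the workaround.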
@hashtag-smashtag where is this info coming from:
Compiled method (JVMCI) 210906 20315 4 org.graalvm.compiler.truffle.runtime.OptimizedCallTarget::callRoot (68 bytes)
I don't see it in the hs_err files.
To get more details, it would be great if you can add these options:
-XX:+UnlockDiagnosticVMOptions -XX:+VerifyBeforeGC -XX:+VerifyAfterGC -Dgraal.PrintCompilation=true -Dgraal.TraceTruffleCompilation=true
We need to find out which compilation is not producing the right GC info/barriers and why it only happens with libgraal and not JavaGraal. Note that these flags will also slow down execution considerably.
When I run the server, it often doesn't crash in the same way. Here are two hs_err files from runs with the options you specified:
There should also be a bunch of console output from the -Dgraal.PrintCompilation=true -Dgraal.TraceTruffleCompilation=true options. Is it possible for you to attach a (redacted) copy of that?
At this point, we're going to have to try and reproduce locally. In the meantime, I would suggest not using G1 with GraalVM for your workload.
It was easier to repro the crash in an integrated test (with the VM options you provided), so I've redacted and attached the console output and logs from that run:
console for hs_err_pid78925.log
Hope those help.
This should be fixed by https://github.com/graalvm/graal-jvmci-8/commit/a53131efce4e1c7d4e5aa3eefef504fee750e9f3, which did not make it into GraalVM RC16. It will be in the subsequent GraalVM release.
Please re-open if this is not fixed in the subsequent GraalVM release.