Temurin-build: guarantee(d != NULL) failed: Null dominator info

Created on 23 Sep 2019 · 20 Comments · Source: adoptium/temurin-build

I am using the docker image adoptopenjdk/openjdk13:jre-13_33-alpine on an AWS EC2 t3a.small (x86_64).

Three of my Kubernetes pods crashed with this error:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (loopnode.hpp:979), pid=1, tid=30
#  guarantee(d != NULL) failed: Null dominator info.
#
# JRE version: OpenJDK Runtime Environment (13.0+33) (build 13+33)
# Java VM: OpenJDK 64-Bit Server VM (13+33, mixed mode, sharing, tiered, compressed oops, shenandoah gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xb140a4]  PhaseIdealLoop::dom_depth(Node*) const [clone .isra.122]+0x94
#
# Core dump will be written. Default location: /opt/core
#
# An error report file with more information is saved as:
# /opt/hs_err_pid1.log
#
# Compiler replay data is saved as:
# /opt/replay_pid1.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/AdoptOpenJDK/openjdk-build/issues
#

[error occurred during error reporting (), id 0xb, SIGSEGV (0xb) at pc=0x00007efe29bab884]

Unfortunately, the log files referenced above were lost, and I don't know how to reproduce this issue (we are running non-public software).

We are testing Shenandoah GC, and we detected some negative GC times before the crash.

[2 images: GC time graphs]

Edit: startup command

java -server -Djdk.tls.rejectClientInitiatedRenegotiation=true -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:+AlwaysPreTouch -XX:+UseTransparentHugePages -XX:InitialRAMPercentage=50 -XX:MaxRAMPercentage=50

Reported to OpenJDK / JBS bug

All 20 comments

Without the logs, it's going to be challenging. A couple of things you can try:

1.) Try the adoptopenjdk.net/upstream.html JDK - it's the RI as built by Red Hat
2.) Try running the app outside of a container (let's see if this is a container-related thing)
3.) Try running with G1 GC as opposed to Shenandoah (narrow down where the fault lies).
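
As a rough illustration of suggestion 3, the sketch below prints the startup command from the report with only the collector flags swapped, so that the GC is the single variable between runs. The helper names `gc_flags` and `build_cmd` are hypothetical, and any arguments passed after the collector name stand in for whatever main class or jar the real command ends with, which the report does not show. `-XX:+UseG1GC` is the standard HotSpot flag for selecting G1.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any JDK tooling): print the reported
# startup command with a chosen collector, leaving all other flags unchanged.

# Map a collector name to its HotSpot selection flags.
gc_flags() {
  case "$1" in
    shenandoah) echo "-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC" ;;
    g1)         echo "-XX:+UseG1GC" ;;
    default)    echo "" ;;  # no GC flag: let the JVM pick its default collector
    *)          echo "unknown collector: $1" >&2; return 1 ;;
  esac
}

# Print the full command line. Arguments after the collector name are appended
# verbatim (placeholder for the application's jar or main class).
build_cmd() {
  flags=$(gc_flags "$1") || return 1
  shift
  # $flags is intentionally unquoted so an empty value leaves no gap.
  echo java -server $flags \
    -Djdk.tls.rejectClientInitiatedRenegotiation=true \
    -XX:+AlwaysPreTouch -XX:+UseTransparentHugePages \
    -XX:InitialRAMPercentage=50 -XX:MaxRAMPercentage=50 "$@"
}
```

For example, `build_cmd g1` prints the G1 variant and `build_cmd shenandoah` prints the original configuration; running each under the same load would show whether the crash follows the collector.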

I made more checks.

Please ignore the graph; the negative value was caused by a pod reboot, so it is OK.

I replaced the GC with the default one and with G1; neither of these caused a crash, so the options -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC are required to trigger this issue.

I am attaching a log; I really hope it helps.
hs_err_pid27.log.gz

This issue never occurred on the latest version of AdoptOpenJDK 12-hotspot or on previous versions.

OK, I'll get a Shenandoah dev to report this upstream.

@fvasco Does this reproduce outside containers as well?

@jerboaa

Does this reproduce outside containers as well?

I did not test it.

Is there a reproducer one can use to debug this? The crash log is not very helpful in itself.

Hi @shipilev,
No, there is not, I am sorry.

The issue occurs in our production environment after a connection spike.

The server uses a lot of things: thread pools, synchronized blocks, locks, Kotlin coroutines and so on. Unfortunately, I have not been able to pinpoint which one is involved.

I am attaching the replay log;
please contact me for further information.

replay_pid27.log.gz

Thank you

Trying with a fastdebug build, if possible, would be good, to get extended diagnostics.

@fvasco: See JDK-8231405: we've got a reproducer from internal tests, so there is no need to spend time coming up with another one. Thanks for the bug report!

Will this issue be fixed in 13.0.1?

Looking at the issue, it appears that @shipilev is kindly working on that, yes.

It is not solid until 13u maintainers agree.

So far the fix is known to be in 13.0.2 (release circa Jan 2020) and Red Hat 11.0.5 downstream (mid-Oct 2019). I asked Rob McKenna (current 13u maintainer) to pick it up for 13.0.1 (mid-Oct 2019), but I am still not sure whether that will happen. If you are curious, you might try the nightly 13u builds: https://builds.shipilev.net/openjdk-jdk13/ or Docker image shipilev/openjdk:13.

I'm wondering if this fix made it into 13.0.1; it does not look like it did.
It seems that with AdoptOpenJDK 13.0.1+9 we still consistently get a similar crash after 20 minutes of the app running, but we don't see it with recent nightly builds.

The fix did not make it into upstream 13.0.1. However, there are a few distributions that include it regardless. For example, Fedora includes it in 13.0.1.9, and Liberica JDK includes it in 13.0.1.

Opening this again, we have to do a re-release here

Need to verify if this has made it into our builds or whether it needs to be moved again to March

The fix is in upstream 13.0.2 and 14, so it should be in your builds. (It is also in 8u242 and 11.0.5 Red Hat downstreams, so all current versions in all Shenandoah-enabled JDK releases should be good.)

Thanks - seems reasonable to assume that it's in all of ours now, so I'll close
