Openj9: NPE when launching Elasticsearch

Created on 13 Nov 2019  路  13Comments  路  Source: eclipse/openj9

Java -version output

[dsouzai@localhost temp]$ /home/dsouzai/temp/jdk8u232-b09-openj9/bin/java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
Eclipse OpenJ9 VM (build openj9-0.17.0, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20191017_442 (JIT disabled, AOT disabled)
OpenJ9   - 77c1cf708
OMR      - 20db4fbc
JCL      - 97b5ec8f383 based on jdk8u232-b09)

Summary of problem

When trying to start up Elasticsearch (downloaded from [1]), a NPE is thrown. I narrowed it down to the java command that's called in elasticsearch-7.4.2/bin/elasticsearch-env; I added -Xint and it still fails:

[dsouzai@localhost temp]$ /home/dsouzai/temp/jdk8u232-b09-openj9/bin/java -Xint -cp '/home/dsouzai/temp/elasticsearch-7.4.2/lib/*' org.elasticsearch.tools.launchers.JvmOptionsParser /home/dsouzai/temp/elasticsearch-7.4.2/config/jvm.options
Exception in thread "main" java.lang.NullPointerException
    at org.elasticsearch.tools.launchers.JvmErgonomics.extractHeapSize(JvmErgonomics.java:126)
    at org.elasticsearch.tools.launchers.JvmErgonomics.choose(JvmErgonomics.java:58)
    at org.elasticsearch.tools.launchers.JvmOptionsParser.main(JvmOptionsParser.java:89)

The relevant java code can be found at [2].

If I run with Hotspot, there's no NPE that's thrown:

[dsouzai@localhost temp]$ /home/dsouzai/temp/jdk8u232-b09-hotspot/bin/java -cp '/home/dsouzai/temp/elasticsearch-7.4.2/lib/*' org.elasticsearch.tools.launchers.JvmOptionsParser /home/dsouzai/temp/elasticsearch-7.4.2/config/jvm.options
-Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.io.tmpdir=${ES_TMPDIR} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:logs/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=32 -XX:GCLogFileSize=64m -Dio.netty.allocator.type=unpooled -XX:MaxDirectMemorySize=536870912

[1] https://www.elastic.co/downloads/elasticsearch
[2] https://github.com/elastic/elasticsearch/blob/b8d85cf6a8db848a0ec2a9680c02a95fb0d9b25a/distribution/tools/launchers/src/main/java/org/elasticsearch/tools/launchers/JvmErgonomics.java#L115-L117

vm userRaised

Most helpful comment

Just to fully close the loop here: I just verified this problem has gone away and elasticsearch starts with the latest 2020-06-08 nightly build from AdoptOpenJDK.

All 13 comments

This looks to be looking for a very specific JVM option. The method right below in [2] seems to look for MaxDirectMemorySize which is present on the HotSpot run. This seems like an Elastic problem to me, at first glance since they should be querying an API to extract the maximum heap size rather than looking for it in an option.

It's using -XX:+PrintFlagsFinal which OpenJ9 doesn't support at this time.

I've stuck this into the 0.19 milestone, but no guarantees when we'll have time to implement the -XX:+PrintFlagsFinal option.

This looks to be looking for a very specific JVM option. The method right below in [2] seems to look for MaxDirectMemorySize which is present on the HotSpot run.

Sure, but the failing code is checking for MaxHeapSize which isn't present in the java options. So at the very least we shouldn't be NPEing.

It's using -XX:+PrintFlagsFinal which OpenJ9 doesn't support at this time.

@pshipton where did you see that option; I'm not finding it in the jvm.options file.

Sure, but the failing code is checking for MaxHeapSize which isn't present in the java options. So at the very least we shouldn't be NPEing.

Right because finalJvmOptions.get("MaxHeapSize") returns null because the option does not exist in OpenJ9 and we call .get() on the result which ends up being an NPE. Elastic is assuming the JVM will always have that option available when it prints the options via -XX:+PrintFlagsFinal. This is not valid beause there is an implicit assumption that the option will actually be printed, which is not the case with OpenJ9 as this option does not exist in the documentation.

Even if -XX:+PrintFlagsFinal is implemented Elastic will still throw the NPE because the MaxHeapSize option will not be printed for OpenJ9. Unless we already somehow map -Xmx to MaxHeapSize?

I hit this recently trying the latest 7.5.2 version of ES. I agree with the comments here, just adding some of my findings in case it helps in any way when working on this issue.

I verified the conflict with the JVM 'flavour' building my own distribution:

  1. Cloned ES github repository
  2. Built a distribution without JDK: ./gradlew :distribution:archives:no-jdk-darwin-tar:assemble --parallel
  3. Set the JAVA_HOME to point to an OpenJ9 JVM. My latest test with Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.15.1, JRE 12 Mac OS X amd64-64-Bit Compressed References 20190915_201 (JIT enabled, AOT enabled), although I have tried with other v11 with same result.

At this point the described NPE is thrown if I try to run the built distribution, even though I'd say this should be a supported configuration: https://www.elastic.co/support/matrix#matrix_jvm

I also got to this line of code https://github.com/eclipse/openj9/issues/7764#issuecomment-553621527 and it feels to me it's launching another jvm with something like java -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/tmp -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintFlagsFinal -version

Anyway, I built the distribution again modifying that line of code to try to pass that point and then I hit a problem related with https://github.com/eclipse/openj9/issues/8195
Es has in config/jvm.options these options: -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m

Getting back to the support statement, regardless of https://github.com/eclipse/openj9/issues/7764#issuecomment-553615023 which sounds like a fix to this problem; I wonder if there is something to change in ElasticSearch itself, given its changes in the code that broke the compatiblity (I haven't had time to verify it but this seems to be caused by this change introduced in 7.2.0: https://github.com/elastic/elasticsearch/pull/41962)

Sadly, I am still seeing this problem even with the fix for #9567 in place, at least with the binary distribution of elasticsearch. I'm rebuilding version 7.7 to debug it, but it's clear this issue should be reopened :( .

The problem is the output stream. When Elasticsearch runs the JVM with -XX:+PrintFlagsFinal, all the output seems to be produced in the error stream (read via ProcessBuilder.getErrorStream() ) rather than on the standard output stream (read via ProcessBuilder.getInputStream() ).

In #9567, I used j9tty_printf() to output the print flags final output, which I would have thought would go to standard output but have since learned all VM output goes by default to stderr. I'll open a PR tomorrow with the change to print to stdout.

I also saw the -Xlog:gc error, though it seems to power through it.

Finally, I hit this problem, which is the first time I've seen it:

ElasticsearchSecurityException[failed to load SSL configuration [xpack.security.http.ssl]]; nested: ElasticsearchException[failed to initialize a TrustManagerFactory]; nested: IllegalStateException[failed to find a X509ExtendedTrustManager];
Likely root cause: java.lang.IllegalStateException: failed to find a X509ExtendedTrustManager

I opened #9725 to change the output stream, which resolves the specific problem referenced in this issue (and marked #9725 as fixing this issue).

I opened #9726 for the new issue and will document progress on this issue there.

Just to fully close the loop here: I just verified this problem has gone away and elasticsearch starts with the latest 2020-06-08 nightly build from AdoptOpenJDK.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

PowerUser1234 picture PowerUser1234  路  3Comments

AdamBrousseau picture AdamBrousseau  路  6Comments

mikezhang1234567890 picture mikezhang1234567890  路  5Comments

AdamBrousseau picture AdamBrousseau  路  6Comments

xliang6 picture xliang6  路  3Comments