Graal: [native image] crash with SIGSEGV

Created on 29 Nov 2019 · 21 comments · Source: oracle/graal

Hi guys,

I'm trying to build a simple "hello world" Groovy app and compile it with native-image (see https://github.com/mojo2012/groovy-native). Unfortunately, the native-image tool crashes the JVM with a SIGSEGV and exit code 134:

[io.spotnext.groovynative.main:64539]    classlist:   7,661.82 ms
[io.spotnext.groovynative.main:64539]        (cap):   2,763.46 ms
[io.spotnext.groovynative.main:64539]        setup:   5,836.19 ms
[io.spotnext.groovynative.main:64539]   (typeflow):  40,512.11 ms
[io.spotnext.groovynative.main:64539]    (objects):  28,188.16 ms
[io.spotnext.groovynative.main:64539]   (features):   1,639.57 ms
[io.spotnext.groovynative.main:64539]     analysis:  71,823.00 ms
[io.spotnext.groovynative.main:64539]     (clinit):     878.60 ms
[io.spotnext.groovynative.main:64539]     universe:   2,814.27 ms
[io.spotnext.groovynative.main:64539]      (parse):   8,091.91 ms
[io.spotnext.groovynative.main:64539]     (inline):   7,152.93 ms
[io.spotnext.groovynative.main:64539]    (compile):  50,854.53 ms
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000010b2a5f3c, pid=64539, tid=42507
#
# JRE version: OpenJDK Runtime Environment (11.0.5+10) (build 11.0.5+10-jvmci-19.3-b05-LTS)
# Java VM: OpenJDK 64-Bit GraalVM CE 19.3.0 (11.0.5+10-jvmci-19.3-b05-LTS, mixed mode, sharing, tiered, jvmci, jvmci compiler, compressed oops, g1 gc, bsd-amd64)
# Problematic frame:
# V  [libjvm.dylib+0x6a5f3c]  Symbol::as_unicode(int&) const+0x10
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/matthias.fuchs.lokal/Projekte/privat/groovy-native/target/hs_err_pid64539.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Error: Image build request failed with exit status 134
com.oracle.svm.driver.NativeImage$NativeImageError: Image build request failed with exit status 134
    at com.oracle.svm.driver.NativeImage.showError(NativeImage.java:1482)
    at com.oracle.svm.driver.NativeImage.build(NativeImage.java:1260)
    at com.oracle.svm.driver.NativeImage.performBuild(NativeImage.java:1222)
    at com.oracle.svm.driver.NativeImage.main(NativeImage.java:1181)
    at com.oracle.svm.driver.NativeImage$JDK9Plus.main(NativeImage.java:1665)

Here are the logs:
maven.log
hs_err_pid64539.log

Hope you can help me :-)

By the way, the proposed fix I found in this repo doesn't work either: https://github.com/wololock/graalvm-groovy-examples/tree/master/hello-world.
(It was linked in another issue.)

cheers matthias

bug native-image

All 21 comments

Thank you for the report. We had a similar bug report recently and will investigate asap.

Yep, I reported a similar error a week ago. In my case it looks like a regression in the Java 8 based version: https://github.com/oracle/graal/issues/1863

@thomaswue could you please consider creating some kind of community build for a variety of OSS projects that use native-image, to detect regressions before the release cut?

Testing on a group of OSS projects is a good idea and something we should work towards.

Note that in this case the crash happens on JDK 11, which was not available in previous GraalVM releases, so testing against OSS projects probably would not have caught it.

We already test a number of OSS projects against master in our CI pipeline. We did catch some early issues and submitted patches to, e.g., the Quarkus and Netty projects, but of course we could expand the coverage. Additionally, the Micronaut project runs a set of integration tests against GraalVM master.

Just a quick update. I've been able to reproduce it reliably, but only with the libjvm.so from the official build. I've tried building from the same sources on both macOS and Linux, and for some unknown reason it doesn't reproduce with those bits. When it occurs, the method involved is always MethodHandle.invokeBasic, and we seem to get a garbage value back from Method::name(), which pulls a Symbol* from the ConstantPool. When I inspect the crash in the debugger and perform the equivalent dereferences, I get the right value for the Symbol*. So I'm still unclear why the crash is happening, but I'm investigating.

@tkrodriguez I had a couple of SIGSEGV errors with libjvm.so when running benchmarks. Should they be reported separately?

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fd62a32339a, pid=5623, tid=5640
#
# JRE version: Java(TM) SE Runtime Environment GraalVM LIBGRAAL_EE_BASH 19.3.0 (11.0.5+10) (build 11.0.5+10-jvmci-19.3-b05-LTS)
# Java VM: Java HotSpot(TM) 64-Bit Server VM GraalVM LIBGRAAL_EE_BASH 19.3.0 (11.0.5+10-jvmci-19.3-b05-LTS, mixed mode, sharing, tiered, jvmci, jvmci compiler, compressed oops, parallel gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0xcc339a]  oopDesc* PSPromotionManager::copy_to_survivor_space<false>(oopDesc*)+0x2da

Yes please.

This appears to be a long-standing bug in HotSpot's java_lang_StackTraceElement logic, which reads the version field from a ConstantPool that hasn't been redefined. The CDS (Class Data Sharing) support and the JVMTI class file versioning logic share this field, so when CDS is in use it can hold an unexpected value. This makes a round trip from a Method* through an InstanceKlass* fail, and the bogus result causes the crash. The workaround is to run with -J-XX:-UseSharedSpaces or to delete the classes.jsa file from your build.
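A small sketch for checking the state this workaround targets. It reads the UseSharedSpaces flag of the JVM it runs in through the HotSpotDiagnosticMXBean, and prints where classes.jsa would sit under java.home, assuming the lib/server/classes.jsa layout shown for GraalVM CE java11 19.3.0 later in this thread. The class name CdsCheck is hypothetical. It only inspects its own JVM: with native-image, the -J prefix is what passes -XX:-UseSharedSpaces to the image builder's JVM, so this is a separate sanity check, not part of the build.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper: reports the CDS flag and the expected classes.jsa location. */
public final class CdsCheck {

    /** Value of UseSharedSpaces ("true"/"false"), or empty if this JVM does not expose it. */
    static Optional<String> useSharedSpaces() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return Optional.empty(); // no HotSpot diagnostic interface in this VM
        }
        try {
            return Optional.of(bean.getVMOption("UseSharedSpaces").getValue());
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // flag unknown to this VM version
        }
    }

    /** Assumed archive location: <java.home>/lib/server/classes.jsa. */
    static Path classesJsa(String javaHome) {
        return Paths.get(javaHome, "lib", "server", "classes.jsa");
    }

    public static void main(String[] args) {
        System.out.println("UseSharedSpaces = " + useSharedSpaces().orElse("(unknown)"));
        Path jsa = classesJsa(System.getProperty("java.home"));
        System.out.println(jsa + (Files.exists(jsa) ? " exists" : " not found"));
    }
}
```

Running `java -XX:-UseSharedSpaces CdsCheck` should report `false` for the flag; without it, the value depends on the JDK and whether an archive is present.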

When I use the flag -J-XX:-UseSharedSpaces I get the following error message:

Undefined symbols for architecture x86_64:
  "_Java_java_lang_invoke_MethodHandle_invokeBasic", referenced from:
      ___svm_cglobaldata_base in io.spotnext.groovynative.main.o
      ___svm_version_info in io.spotnext.groovynative.main.o
  "_jio_fprintf", referenced from:
      _findJavaTZ_md in libjava.a(TimeZone_md.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

See the full log for more info: log.log

EDIT:
I also tried removing the classes.jsa file from my installation (graalvm-ce-java11-19.3.0/Contents/Home/lib/server/classes.jsa), without success.

That's a separate issue, which I believe is a regression in the support for -H:+ReportUnsupportedElementsAtRuntime related to static linking. I'm not sure whether there's a GitHub issue for it at the moment. Maybe @vjovanov knows?

@christianwimmer @olpaw could this ___svm_cglobaldata_base problem be caused by the lambda used in com.oracle.svm.core.c.CGlobalDataFactory.createCString(String, String)?

Because of the com.oracle.svm.hosted.phases.IntrinsifyMethodHandlesInvocationPlugin that we use for SubstrateVM, there should be no lambdas or method handles left. AFAIK, we reduce them all to plain invocations. If that is not possible, we either report an error message (_... Invoke with MethodHandle argument could not be reduced to at most a single call ..._) at build time or, if ReportUnsupportedElementsAtRuntime is used, replace the method handle invoke with a simple call to a stub that reports the error at run time.

If this is Java 11 specific, maybe our plugin fails to reduce some form of MethodHandle that did not exist previously.
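A minimal sketch, in plain Java and runnable on any JVM, of the two kinds of method handle call sites discussed above. All names are hypothetical. The assumption, not confirmed in this thread, is that a handle held in a static final field of a class initialized at image build time (e.g. via --initialize-at-build-time) is the kind of constant IntrinsifyMethodHandlesInvocationPlugin can reduce to a plain call to greet, while a handle looked up from a run-time string generally cannot be reduced and would take the error path described in the previous comment.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

/** Hypothetical example contrasting a constant and a dynamically chosen method handle. */
public final class MethodHandleShapes {

    private static final MethodType STRING_TO_STRING =
            MethodType.methodType(String.class, String.class);

    // Shape 1: once the class is initialized, this handle always targets greet(String).
    private static final MethodHandle GREET = lookupStatic("greet");

    static String greet(String name) {
        return "Hello, " + name;
    }

    /** Looks up a static (String)String method of this class by name. */
    private static MethodHandle lookupStatic(String methodName) {
        try {
            return MethodHandles.lookup()
                    .findStatic(MethodHandleShapes.class, methodName, STRING_TO_STRING);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no static method " + methodName, e);
        }
    }

    /** Invokes a (String)String handle, rethrowing checked Throwables unchecked. */
    private static String invoke(MethodHandle handle, String arg) {
        try {
            return (String) handle.invokeExact(arg);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    static String viaConstantHandle(String name) {
        return invoke(GREET, name);
    }

    // Shape 2: the target depends on a run-time string, so a static analysis
    // cannot in general tell which method will be called.
    static String viaDynamicHandle(String methodName, String name) {
        return invoke(lookupStatic(methodName), name);
    }

    public static void main(String[] args) {
        System.out.println(viaConstantHandle("Graal"));
        System.out.println(viaDynamicHandle("greet", "Groovy"));
    }
}
```

On HotSpot, main prints "Hello, Graal" and then "Hello, Groovy"; the difference between the two shapes only matters to the image builder's static analysis.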

Ok, maybe the bigger question is where io.spotnext.groovynative.main.o is coming from. If it's not produced by Native Image, then it's very weird that it has references to Native Image symbols.

Might it come from the Apple clang bundled with Xcode 11?

@dougxc, io.spotnext.groovynative.main.o is the output of the image builder before it is linked into the final executable (com.oracle.svm.hosted.image.LIRNativeImageCodeCache#getCCInputFiles).

Ok, that strongly implies there's an issue with IntrinsifyMethodHandlesInvocationPlugin then.

CGlobalDataFactory can only be used at image build time, not at run time: you cannot add new elements to the data section of a running application.
The bug is that CGlobalDataFactory is not properly annotated as HOSTED_ONLY. I will fix that; image generation will then fail with a proper message stating that this code must not be reachable at run time.

Update: there was actually already a HOSTED_ONLY annotation on an implementation class for global data, so I don't think the originally reported linker problem is related to CGlobalDataFactory.

@mojo2012 the _Java_java_lang_invoke_MethodHandle_invokeBasic issue should be fixed by https://github.com/oracle/graal/commit/bdf8b8ff38d16aa4927d30ded57969cda61bec16, or at least you should get a better error message telling you why that method is reached.

I am facing a similar issue while building the native image in AWS CodeBuild. I upgraded the aws-java-sdk version from 1.x.x to 2.10.56. The native image builds with 1.x.x; however, the build fails with 2.10.56. I am using GraalVM 19.0.0 and micronaut-bom 1.1.1.

The CloudWatch logs are as below:

Warning: class initialization of class io.netty.channel.epoll.Native failed with exception java.lang.NoClassDefFoundError: Could not initialize class io.netty.channel.epoll.Native. This class will be initialized at run time because either option --report-unsupported-elements-at-runtime or option --allow-incomplete-classpath is used for image building. Use the option --delay-class-initialization-to-runtime=io.netty.channel.epoll.Native to explicitly request delayed initialization of this class.

#  A fatal error has been detected by the Java Runtime Environment:
# SIGSEGV (0xb) at pc=0x00007f8a435218e0, pid=71, tid=0x00007f89f79c0700
# JRE version: OpenJDK Runtime Environment (8.0_202-b08) (build 1.8.0_202-20190206132807.buildslave.jdk8u-src-tar--b08)
# Java VM: OpenJDK GraalVM CE 1.0.0-rc15 (25.202-b08-jvmci-0.58 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C 0x00007f8a435218e0
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
# An error report file with more information is saved as:
# /codebuild/output/src666903570/src/hs_err_pid71.log
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
Error: Image build request failed with exit status 134
[Container] 2020/02/07 06:45:52 Command did not exit successfully ./build-native-image.sh exit status 1
[Container] 2020/02/07 06:45:52 Phase complete: BUILD State: FAILED
[Container] 2020/02/07 06:45:52 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: ./build-native-image.sh. Reason: exit status 1
[Container] 2020/02/07 06:45:52 Entering phase POST_BUILD
[Container] 2020/02/07 06:45:52 Phase complete: POST_BUILD State: SUCCEEDED
[Container] 2020/02/07 06:45:52 Phase context status code: Message:
[Container] 2020/02/07 06:45:53 Expanding base directory path: .
[Container] 2020/02/07 06:45:53 Assembling file list
[Container] 2020/02/07 06:45:53 Expanding .
[Container] 2020/02/07 06:45:53 Expanding file paths for base directory .
[Container] 2020/02/07 06:45:53 Assembling file list
[Container] 2020/02/07 06:45:53 Expanding ./test-function-*.zip
[Container] 2020/02/07 06:45:53 Phase complete: UPLOAD_ARTIFACTS State: FAILED
[Container] 2020/02/07 06:45:53 Phase context status code: CLIENT_ERROR Message: no matching artifact paths found

P.S. If I upgrade micronaut-bom to 1.2.10 and GraalVM to 19.2.1, the native image is created successfully, but when I access an application endpoint it fails with the following exception: error failed: org.apache.commons.logging.LogFactoryjava.lang.NoClassDefFoundError: org.apache.commons.logging.LogFactory, followed by a series of exceptions, specifically while creating the S3 client.

The exception also included a failure at the following point:
failed: Could not initialize class software.amazon.awssdk.http.apache.internal.conn.SdkTlsSocketFactoryjava.lang.NoClassDefFoundError: Could not initialize class software.amazon.awssdk.http.apache.internal.conn.SdkTlsSocketFactory

Any help is appreciated!

Based on what's in your comment, I think your issue is likely unrelated to this one. Please create a new issue with your hs_err file if you'd like me to look at it. I'm closing this issue as the reported problem has been resolved.
