Openj9: Intermittent failures compiling JDK10 on AIX

Created on 27 Mar 2018  路  43Comments  路  Source: eclipse/openj9

JDK8,JDK9 appear to be stable on AIX.

We've had 2/15 Failures on JDK10 on AIX

Build jobs: (can be launched manually for testing purposes)
https://ci.eclipse.org/openj9/view/Build/job/Build-JDK10-aix_ppc-64_cmprssptrs/
Or launch via a PR ...compile aix jdk10

Failed examples:
https://ci.eclipse.org/openj9/view/Build/job/Build-JDK10-aix_ppc-64_cmprssptrs/13/

01:43:48.187 Creating jre jimage
01:43:48.187 Creating jdk jimage
01:43:52.869 WARNING: Using incubator modules: jdk.incubator.httpclient
01:43:53.714 WARNING: Using incubator modules: jdk.incubator.httpclient
01:44:00.621 Error: java.io.IOException: sun/nio/ch/FileKey.class not found: /home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/build/aix-ppc64-normal-server-release/images/jmods/java.base.jmod
01:44:00.621 java.io.UncheckedIOException: java.io.IOException: sun/nio/ch/FileKey.class not found: /home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/build/aix-ppc64-normal-server-release/images/jmods/java.base.jmod
01:44:00.621    at jdk.tools.jlink.internal.ArchiveEntryResourcePoolEntry.content(jdk.jlink/ArchiveEntryResourcePoolEntry.java:57)
01:44:00.621    at jdk.tools.jlink.plugin.ResourcePoolEntry.contentBytes(jdk.jlink/ResourcePoolEntry.java:126)
01:44:00.621    at jdk.tools.jlink.plugin.ResourcePoolEntry.write(jdk.jlink/ResourcePoolEntry.java:140)
01:44:00.621    at jdk.tools.jlink.internal.ImageFileCreator.lambda$generateJImage$4(jdk.jlink/ImageFileCreator.java:239)
01:44:00.621    at jdk.tools.jlink.internal.ImageFileCreator$$Lambda$235.00000000213EAA10.accept(Unknown Source)
01:44:00.621    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(java.base/ArrayList.java:1492)
01:44:00.621    at java.util.stream.ReferencePipeline$Head.forEach(java.base/ReferencePipeline.java:658)
01:44:00.621    at jdk.tools.jlink.internal.ImageFileCreator.generateJImage(jdk.jlink/ImageFileCreator.java:238)
01:44:00.621    at jdk.tools.jlink.internal.ImageFileCreator.writeImage(jdk.jlink/ImageFileCreator.java:161)
01:44:00.621    at jdk.tools.jlink.internal.ImageFileCreator.create(jdk.jlink/ImageFileCreator.java:100)
01:44:00.621    at jdk.tools.jlink.internal.JlinkTask$ImageHelper.retrieve(jdk.jlink/JlinkTask.java:854)
01:44:00.621    at jdk.tools.jlink.internal.ImagePluginStack.operate(jdk.jlink/ImagePluginStack.java:206)
01:44:00.621    at jdk.tools.jlink.internal.JlinkTask.createImage(jdk.jlink/JlinkTask.java:409)
01:44:00.621    at jdk.tools.jlink.internal.JlinkTask.run(jdk.jlink/JlinkTask.java:273)
01:44:00.621    at jdk.tools.jlink.internal.Main.run(jdk.jlink/Main.java:54)
01:44:00.621    at jdk.tools.jlink.internal.Main.main(jdk.jlink/Main.java:33)
01:44:00.621 Caused by: java.io.IOException: sun/nio/ch/FileKey.class not found: /home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/build/aix-ppc64-normal-server-release/images/jmods/java.base.jmod
01:44:00.621    at jdk.internal.jmod.JmodFile.getInputStream(java.base/JmodFile.java:212)
01:44:00.621    at jdk.tools.jlink.internal.JmodArchive$JmodEntry.stream(jdk.jlink/JmodArchive.java:66)
01:44:00.621    at jdk.tools.jlink.internal.ArchiveEntryResourcePoolEntry.content(jdk.jlink/ArchiveEntryResourcePoolEntry.java:55)
01:44:00.621    ... 15 more
01:44:00.621 Images.gmk:135: gmake[3]: *** [/recipe for targehome/u0020236/wot '/home/u002023rkspace/Build-JD6/workspace/BuilK10-aix_ppc-64_cd-JDK10-aix_ppc-mprssptrs/build/64_cmprssptrs/buaix-ppc64-normalild/aix-ppc64-no-server-release/rmal-server-releimages/jdk/bin/jase/images/jdk/bava] Error 1
01:44:00.621 in/java' failed
01:44:00.621 make/Main.gmk:34gmake[2]: *** [j5: recipe for tadk-image] Error rget 'jdk-image'2
01:44:00.621  failed
01:44:00.621 gmake[2]: *** Waiting for unfinished jobs....
01:44:06.543 
01:44:06.543 ERROR: Build failed for target 'all' in configuration 'aix-ppc64-normal-server-release' (exit code 2) 
01:44:06.543 Stopping sjavac server
01:44:06.543 
01:44:06.543 No indication of failed target found.
01:44:06.543 Hint: Try searching the build log for '] Error'.
01:44:06.543 Hint: See doc/building.html#troubleshooting for assistance.
01:44:06.543 
01:44:06.543 /home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/make/Init.gmk:291: recipe for target 'main' failed
01:44:06.543 make[1]: *** [main] Error 2
01:44:06.543 /home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/make/Init.gmk:186: recipe for target 'all' failed
01:44:06.543 make: *** [all] Error 2

https://ci.eclipse.org/openj9/view/Pull%20Requests/job/PullRequest-Compile-JDK10-aix_ppc-64_cmprssptrs-OpenJ9/2/

00:44:23.499 Creating jre jimage
00:44:23.499 Creating jdk jimage
00:44:30.287 WARNING: Using incubator modules: jdk.incubator.httpclient
00:44:30.287 WARNING: Using incubator modules: jdk.incubator.httpclient
00:44:33.955 java.io.UncheckedIOException: java.io.IOException: org/ietf/jgss/Oid.class not found: /home/u0020236/workspace/PullRequest-Compile-JDK10-aix_ppc-64_cmprssptrs-OpenJ9/build/aix-ppc64-normal-server-release/images/jmods/java.security.jgss.jmod
00:44:33.955    at jdk.tools.jlink.internal.ArchiveEntryResourcePoolEntry.content(jdk.jlink/ArchiveEntryResourcePoolEntry.java:57)
00:44:33.955    at jdk.tools.jlink.plugin.ResourcePoolEntry.contentBytes(jdk.jlink/ResourcePoolEntry.java:126)
00:44:33.955    at jdk.tools.jlink.internal.plugins.StripDebugPlugin.lambda$transform$1(jdk.jlink/StripDebugPlugin.java:72)
00:44:33.955    at jdk.tools.jlink.internal.plugins.StripDebugPlugin$$Lambda$206.0000000022152DF0.apply(Unknown Source)
00:44:33.955    at jdk.tools.jlink.plugin.ResourcePool.lambda$transformAndCopy$0(jdk.jlink/ResourcePool.java:113)
00:44:33.955    at jdk.tools.jlink.plugin.ResourcePool$$Lambda$157.0000000022101070.accept(jdk.jlink/Unknown Source)
00:44:33.955    at java.util.Iterator.forEachRemaining(java.base/Iterator.java:133)
00:44:33.955    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(java.base/Spliterators.java:1801)
00:44:33.955    at java.util.stream.ReferencePipeline$Head.forEach(java.base/ReferencePipeline.java:658)
00:44:33.955    at jdk.tools.jlink.plugin.ResourcePool.transformAndCopy(jdk.jlink/ResourcePool.java:112)
00:44:33.955    at jdk.tools.jlink.internal.plugins.StripDebugPlugin.transform(jdk.jlink/StripDebugPlugin.java:64)
00:44:33.955    at jdk.tools.jlink.internal.ImagePluginStack.visitResources(jdk.jlink/ImagePluginStack.java:274)
00:44:33.955    at jdk.tools.jlink.internal.ImageFileCreator.generateJImage(jdk.jlink/ImageFileCreator.java:182)
00:44:33.955    at jdk.tools.jlink.internal.ImageFileCreator.writeImage(jdk.jlink/ImageFileCreator.java:161)
00:44:33.955    at jdk.tools.jlink.internal.ImageFileCreator.create(jdk.jlink/ImageFileCreator.java:100)
00:44:33.955    at jdk.tools.jlink.internal.JlinkTask$ImageHelper.retrieve(jdk.jlink/JlinkTask.java:854)
00:44:33.955    at jdk.tools.jlink.internal.ImagePluginStack.operate(jdk.jlink/ImagePluginStack.java:206)
00:44:33.955    at jdk.tools.jlink.internal.JlinkTask.createImage(jdk.jlink/JlinkTask.java:409)
00:44:33.955    at jdk.tools.jlink.internal.JlinkTask.run(jdk.jlink/JlinkTask.java:273)
00:44:33.955    at jdk.tools.jlink.internal.Main.run(jdk.jlink/Main.java:54)
00:44:33.955    at jdk.tools.jlink.internal.Main.main(jdk.jlink/Main.java:33)
00:44:33.955 Caused by: java.io.IOException: org/ietf/jgss/Oid.class not found: /home/u0020236/workspace/PullRequest-Compile-JDK10-aix_ppc-64_cmprssptrs-OpenJ9/build/aix-ppc64-normal-server-release/images/jmods/java.security.jgss.jmod
00:44:33.955    at jdk.internal.jmod.JmodFile.getInputStream(java.base/JmodFile.java:212)
00:44:33.955    at jdk.tools.jlink.internal.JmodArchive$JmodEntry.stream(jdk.jlink/JmodArchive.java:66)
00:44:33.955    at jdk.tools.jlink.internal.ArchiveEntryResourcePoolEntry.content(jdk.jlink/ArchiveEntryResourcePoolEntry.java:55)
00:44:33.955    ... 20 more
00:44:33.955 Error: java.io.UncheckedIOException: java.io.IOException: org/ietf/jgss/Oid.class not found: /home/u0020236/workspace/PullRequest-Compile-JDK10-aix_ppc-64_cmprssptrs-OpenJ9/build/aix-ppc64-normal-server-release/images/jmods/java.security.jgss.jmod
00:44:33.955 java.io.IOException: java.io.UncheckedIOException: java.io.IOException: org/ietf/jgss/Oid.class not found: /home/u0020236/workspace/PullRequest-Compile-JDK10-aix_ppc-64_cmprssptrs-OpenJ9/build/aix-ppc64-normal-server-release/images/jmods/java.security.jgss.jmod
00:44:33.955    at jdk.tools.jlink.internal.ImageFileCreator.generateJImage(jdk.jlink/ImageFileCreator.java:192)
00:44:33.955    at jdk.tools.jlink.internal.ImageFileCreator.writeImage(jdk.jlink/ImageFileCreator.java:161)
00:44:33.955    at jdk.tools.jlink.internal.ImageFileCreator.create(jdk.jlink/ImageFileCreator.java:100)
00:44:33.955    at jdk.tools.jlink.internal.JlinkTask$ImageHelper.retrieve(jdk.jlink/JlinkTask.java:854)
00:44:33.955    at jdk.tools.jlink.internal.ImagePluginStack.operate(jdk.jlink/ImagePluginStack.java:206)
00:44:33.955    at jdk.tools.jlink.internal.JlinkTask.createImage(jdk.jlink/JlinkTask.java:409)
00:44:33.955    at jdk.tools.jlink.internal.JlinkTask.run(jdk.jlink/JlinkTask.java:273)
00:44:33.955    at jdk.tools.jlink.internal.Main.run(jdk.jlink/Main.java:54)
00:44:33.955    at jdk.tools.jlink.internal.Main.main(jdk.jlink/Main.java:33)
00:44:33.955 Caused by: java.io.UncheckedIOException: java.io.IOException: org/ietf/jgss/Oid.class not found: /home/u0020236/workspace/PullRequest-Compile-JDK10-aix_ppc-64_cmprssptrs-OpenJ9/build/aix-ppc64-normal-server-release/images/jmods/java.security.jgss.jmod
00:44:33.955    at jdk.tools.jlink.internal.ArchiveEntryResourcePoolEntry.content(jdk.jlink/ArchiveEntryResourcePoolEntry.java:57)
00:44:33.955    at jdk.tools.jlink.plugin.ResourcePoolEntry.contentBytes(jdk.jlink/ResourcePoolEntry.java:126)
00:44:33.955    at jdk.tools.jlink.internal.plugins.StripDebugPlugin.lambda$transform$1(jdk.jlink/StripDebugPlugin.java:72)
00:44:33.955    at jdk.tools.jlink.internal.plugins.StripDebugPlugin$$Lambda$206.0000000022152DF0.apply(Unknown Source)
00:44:33.955    at jdk.tools.jlink.plugin.ResourcePool.lambda$transformAndCopy$0(jdk.jlink/ResourcePool.java:113)
00:44:33.956    at jdk.tools.jlink.plugin.ResourcePool$$Lambda$157.0000000022101070.accept(jdk.jlink/Unknown Source)
00:44:33.956    at java.util.Iterator.forEachRemaining(java.base/Iterator.java:133)
00:44:33.956    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(java.base/Spliterators.java:1801)
00:44:33.956    at java.util.stream.ReferencePipeline$Head.forEach(java.base/ReferencePipeline.java:658)
00:44:33.956    at jdk.tools.jlink.plugin.ResourcePool.transformAndCopy(jdk.jlink/ResourcePool.java:112)
00:44:33.956    at jdk.tools.jlink.internal.plugins.StripDebugPlugin.transform(jdk.jlink/StripDebugPlugin.java:64)
00:44:33.956    at jdk.tools.jlink.internal.ImagePluginStack.visitResources(jdk.jlink/ImagePluginStack.java:274)
00:44:33.956    at jdk.tools.jlink.internal.ImageFileCreator.generateJImage(jdk.jlink/ImageFileCreator.java:182)
00:44:33.956    ... 8 more
00:44:33.956 Caused by: java.io.IOException: org/ietf/jgss/Oid.class not found: /home/u0020236/workspace/PullRequest-Compile-JDK10-aix_ppc-64_cmprssptrs-OpenJ9/build/aix-ppc64-normal-server-release/images/jmods/java.security.jgss.jmod
00:44:33.956    at jdk.internal.jmod.JmodFile.getInputStream(java.base/JmodFile.java:212)
00:44:33.956    at jdk.tools.jlink.internal.JmodArchive$JmodEntry.stream(jdk.jlink/JmodArchive.java:66)
00:44:33.956    at jdk.tools.jlink.internal.ArchiveEntryResourcePoolEntry.content(jdk.jlink/ArchiveEntryResourcePoolEntry.java:55)
00:44:33.956    ... 20 more
00:44:33.956 Images.gmk:144: recipe for target '/home/u002023gmake[3]: *** [/6/workspace/Pullhome/u0020236/woRequest-Compile-rkspace/PullRequJDK10-aix_ppc-64est-Compile-JDK10-aix_ppc-64_cmprssptrs-OpenJ9/b_cmprssptrs-Openuild/aix-ppc64-nJ9/build/aix-ppcormal-server-rel64-normal-serverease/images/jre/-release/images/bin/java] Error jre/bin/java' fa1
00:44:33.956 iled
00:44:33.956 gmake[2]: *** [jre-image] Error 2
00:44:33.956 make/Main.gmk:34gmake[2]: *** Wa8: recipe for taiting for unfinirget 'jre-image'shed jobs....
00:44:33.956  failed
00:44:37.430 Error: java.io.IOException: sun/net/NetHooks.class not found: /home/u0020236/workspace/PullRequest-Compile-JDK10-aix_ppc-64_cmprssptrs-OpenJ9/build/aix-ppc64-normal-server-release/images/jmods/java.base.jmod
00:44:37.430 java.io.UncheckedIOException: java.io.IOException: sun/net/NetHooks.class not found: /home/u0020236/workspace/PullRequest-Compile-JDK10-aix_ppc-64_cmprssptrs-OpenJ9/build/aix-ppc64-normal-server-release/images/jmods/java.base.jmod
00:44:37.430    at jdk.tools.jlink.internal.ArchiveEntryResourcePoolEntry.content(jdk.jlink/ArchiveEntryResourcePoolEntry.java:57)
00:44:37.430    at jdk.tools.jlink.plugin.ResourcePoolEntry.contentBytes(jdk.jlink/ResourcePoolEntry.java:126)
00:44:37.430    at jdk.tools.jlink.plugin.ResourcePoolEntry.write(jdk.jlink/ResourcePoolEntry.java:140)
00:44:37.430    at jdk.tools.jlink.internal.ImageFileCreator.lambda$generateJImage$4(jdk.jlink/ImageFileCreator.java:239)
00:44:37.430    at jdk.tools.jlink.internal.ImageFileCreator$$Lambda$235.0000000021F1DA10.accept(Unknown Source)
00:44:37.430    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(java.base/ArrayList.java:1492)
00:44:37.430    at java.util.stream.ReferencePipeline$Head.forEach(java.base/ReferencePipeline.java:658)
00:44:37.430    at jdk.tools.jlink.internal.ImageFileCreator.generateJImage(jdk.jlink/ImageFileCreator.java:238)
00:44:37.430    at jdk.tools.jlink.internal.ImageFileCreator.writeImage(jdk.jlink/ImageFileCreator.java:161)
00:44:37.430    at jdk.tools.jlink.internal.ImageFileCreator.create(jdk.jlink/ImageFileCreator.java:100)
00:44:37.430    at jdk.tools.jlink.internal.JlinkTask$ImageHelper.retrieve(jdk.jlink/JlinkTask.java:854)
00:44:37.430    at jdk.tools.jlink.internal.ImagePluginStack.operate(jdk.jlink/ImagePluginStack.java:206)
00:44:37.430    at jdk.tools.jlink.internal.JlinkTask.createImage(jdk.jlink/JlinkTask.java:409)
00:44:37.430    at jdk.tools.jlink.internal.JlinkTask.run(jdk.jlink/JlinkTask.java:273)
00:44:37.430    at jdk.tools.jlink.internal.Main.run(jdk.jlink/Main.java:54)
00:44:37.430    at jdk.tools.jlink.internal.Main.main(jdk.jlink/Main.java:33)
00:44:37.430 Caused by: java.io.IOException: sun/net/NetHooks.class not found: /home/u0020236/workspace/PullRequest-Compile-JDK10-aix_ppc-64_cmprssptrs-OpenJ9/build/aix-ppc64-normal-server-release/images/jmods/java.base.jmod
00:44:37.430    at jdk.internal.jmod.JmodFile.getInputStream(java.base/JmodFile.java:212)
00:44:37.430    at jdk.tools.jlink.internal.JmodArchive$JmodEntry.stream(jdk.jlink/JmodArchive.java:66)
00:44:37.430    at jdk.tools.jlink.internal.ArchiveEntryResourcePoolEntry.content(jdk.jlink/ArchiveEntryResourcePoolEntry.java:55)
00:44:37.430    ... 15 more
00:44:37.430 gmake[3]: *** [/Images.gmk:135: home/u0020236/worecipe for targerkspace/PullRequt '/home/u0020236/workspace/Pullest-Compile-JDK1Request-Compile-0-aix_ppc-64_cmpJDK10-aix_ppc-64rssptrs-OpenJ9/b_cmprssptrs-Openuild/aix-ppc64-nJ9/build/aix-ppcormal-server-rel64-normal-serverease/images/jdk/-release/images/bin/java] Error jdk/bin/java' fa1
00:44:37.430 iled
00:44:37.430 make/Main.gmk:345: recipe for target 'jdk-image' failed
00:44:37.430 gmake[2]: *** [jdk-image] Error 2
00:44:37.430 
00:44:37.430 ERROR: Build failed for target 'all' in configuration 'aix-ppc64-normal-server-release' (exit code 2) 
00:44:37.430 Stopping sjavac server
00:44:37.430 
00:44:37.430 === Make failed targets repeated here ===
00:44:37.430 make/Main.gmk:345: recipe for target 'jdk-image' failed
00:44:37.430 === End of repeated output ===
00:44:37.430 
00:44:37.430 Hint: Try searching the build log for the name of the first failed target.
00:44:37.430 Hint: See doc/building.html#troubleshooting for assistance.
00:44:37.430 
00:44:37.430 /home/u0020236/workspace/PullRequest-Compile-JDK10-aix_ppc-64_cmprssptrs-OpenJ9/make/Init.gmk:291: recipe for target 'main' failed
00:44:37.430 make[1]: *** [main] Error 2
00:44:37.430 /home/u0020236/workspace/PullRequest-Compile-JDK10-aix_ppc-64_cmprssptrs-OpenJ9/make/Init.gmk:186: recipe for target 'all' failed
00:44:37.430 make: *** [all] Error 2
build jdk10 high

All 43 comments

@JasonFengJ9 Can you take a look at this?

Yup, looking at it.

It appears the AIX Java 10 can be built consistently in a local farm machine.
Considering the intermittent nature of this failure, the issue might present in the Jenkins machine(s).

@AdamBrousseau any chance I can reserve such a machine?

Sure, I will directly message you with the details

Update about the investigation:

This intermittent error occurs at Java 10 jlink stage to combine jmod files into a single jimage file. The error could happen in various jmod/class, and the frequency varies as well, i.e., sometimes it doesn't failure in 20 runs, other time could failure more than 10/20.
Reducing the number of jmod files for jimage generation seems not affect the result even when only one single module java.base is used.
AIX syslog message doesn't show related error info.
On the other hand, Java 9 build seems not affected at all.

Next action: identify the difference between Java 10 and Java 9 particularly in zip file access.

Update:

Identified the following code snippet within java.util.zip.ZipFile.java:

byte[] bname = zc.getBytes(name);  --> this is a call to java.util.zip.ZipCoder.getBytes(String)

sometimes bname returned is a byte array with bname.length is longer than name.length, the number of extra bytes is ranged from 1 to 6 (from current observation), and these extra bytes are all 0.
Note: it is not necessary that String.length equals its byte array length due to encoding. However in this specific case, the string is a path to a class file within a jmod file, they are ASCII hence String.length equals its byte array length for those successful class retrieval.

Usually this failure rate is around 20% to 50% for a 20x run, with -Xint, 20x run all passed.
Running a larger grinder now.

x500 grinder all passed with -Xint.

@andrewcraik @fjeremic Could JIT folks take a look this issue?
Please ping me for machine access and problem reproducing.

x500 grinder all passed with -Xint.

@andrewcraik @fjeremic Could JIT folks take a look this issue?
Please ping me for machine access and problem reproducing.

@gita-omr could we get someone from the Power side to take a look at this AIX intermittent failure?

Hmm give the getBytes nature of this and the fact it is probably running string encoding/decoding running a grinder with -Xjit:disableIdiomRecognition might be one option to see if that opt is causing an issue. It has a history of being a source of bugs related to encodings. If the failure goes away the question is then whether the idiom recognition is wrong or if the codegen implementation of the array translate it would be likely to produce is wrong.

Running a 200x grinder with -Xjit:disableIdiomRecognition.

The grinder with -Xjit:disableIdiomRecognition still failed, it seems it didn't make a difference about the failing rate either.

@gita-omr do you need any info for machine access and the steps to reproduce the failure?

@JasonFengJ9: yes, please provide machine access and steps to reproduce the failure. Thanks.

Provided the info to @gita-omr in Slack.

@gita-omr Have you had a chance to look at this? Its intermittently causing jenkins builds to fail on AIX.

We are looking at it. Hopefully we'll have some update today.

Was trying to assign to @r-santhir but could not somehow...

The machine lease has expired and can't be login now.
Will need find another machine and reproduce the failure.

Was trying to assign to @R-Santhir but could not somehow...

Only OpenJ9 committers can be assigned.

Set as high priority. This problem is intermittently blocking the OMR acceptance builds. We could remove the AIX criteria from the acceptance builds, but this is not the right answer, although it may be necessary until this problem can be resolved.

Ok, we will wait for Jason to find another machine.

Joe is working on AIX base image to make build stable.

Since Joe's AIX machines are not available, can I reserve and run the test in an Eclipse jenkins AIX machine instead?
@DanHeidinga @pshipton please approve to allow Adam provide the machine access.

We're not set up to allow temporary access to the CI machines.

@gita-omr IBM Runtimes has a large number of AIX machines. Are you able to use one of those machines to test with?

Since the problem is intermittent, it would be much more preferable to try and reproduce it in the original environment. Nevertheless, @R-Santhir is talking to Jason to get all the commands to reproduce on some other machine.

Joe's base image seems stabled.
Reproduced the failure (-Xint can finish successfully), passed to Ryan.

@R-Santhir / @gita-omr Any update on this? The failing AIX builds are blocking OMR-integration.

As suggested by @pshipton above:

We could remove the AIX criteria from the acceptance builds, but this is not the right answer, although it may be necessary until this problem can be resolved.

I think we should disable AIX in the OMR acceptance builds if there isn't progress on this by the end of this week.

Didn't get a chance to look into it yet, as the machines were shut down over the weekend. I'll spend some time today and provide an update

Thanks @R-Santhir! Much appreciated.

I haven't been able to figure out a cause for the failure yet due to its intermittent nature, I'm going to continue to investigate it.

Any update on this?

I have had success with getting a consistent failure, I am now looking further into the failing method.

Looks to be an issue with Partial Redundancy Elimination, @andrewcraik can we get someone to help with this?

A summary of the problem will help make sure the right people look at it - could you summarize your findings?

I was able to find the compilation of a specific method and an opt transformation index that was causing the failure to popup:

java/lang/String.value()[B with a lastOptTransformationIndex=84

That index corresponded to Partial redundancy elimination, here's the tree before and after the transformation

Before PRE

n223n     BBStart <block_19> (freq 10000)
n71n      compressedRefs
n69n        aloadi  java/lang/String.value [B[#619  final Shadow +4] [flags 0x400a0607 0x0 ]
n68n          aload  this<'this' parm Ljava/lang/String;>[#614  Parm] [flags 0x40000107 0x0
n70n        lconst 0 (highWordZero X==0 X>=0 X<=0 )
n72n      areturn
n69n        ==>aloadi
n10n      BBEnd </block_19> =====

After PRE:

n223n     BBStart <block_19> (freq 10000)
n71n      compressedRefs
n69n        aload  <temp slot 9>[#680  Auto] [flags 0x7 0x0 ] (X!=0 sharedMemory )
n70n        lconst 0 (highWordZero X==0 X>=0 X<=0 )
n72n      areturn
n69n        ==>aload
n10n      BBEnd </block_19> =====

I'm also attaching the traceFiles, I collected with and without the failure.

passing-opt83.txt
failing-opt84.txt

So I believe this problem has already been fixed by changes from @fjeremic who refactored the implementation of String.value().

The failure here is not a general optimization failure - it is a failure of the Java code of the String implementation to take the optimizer into account. The basic sequence we have in the original code:
```byte[] value() {
int validLen;
if (enableCompression) {
if ((null == compressionFlag) || (count >= 0)) {
validLen = count;
} else {
validLen = (count & ~uncompressedBit) * 2;
}
} else {
validLen = count * 2;
}

    if (validLen < value.length) {
        helpers.putObjectInObject(this, UnsafeHelpers.valueFieldOffset, java.util.Arrays.copyOf(value, validLen));
    }
    return value;

```

The problem is that the optimizer treats the value field of String in a special way - we know that the String's value will not change and so we tell the optimizer the field is 'final'. This means that the optimizer does not thing the call to putObjectInObject could modify the value field. The failing transformation noted by @R-Santhir occurs when the optimizer eliminates the read of value at the return and replaces it with the value read when we passed it into Arrays.copyOf. This means we return the wrong array in cases where the String's array was too long. Recent changes have eliminated all this code - the String's array is always just the right length and the problem is resolved.

There is no bug in the optimizer to fix - the right thing would have been to return the copied array directly to avoid the optimizer getting confused in this very special bit of Java code.

@AdamBrousseau - Can you verify that this is fixed and close the issue if so. (I am trying to see what's done and what not and what's mustfix for 0.9.0)

I will re-enable the AIX JDK10 build in the Nightly and OMR Acceptance build and watch for problems. If it appears to be resolved we will close this issue.

Since the nightly builds only run once a day, and OMR Acceptance has its own set of problems, it could be some time before we are confident the problem is resolved. I think given Andrew's statement [1] we should assume its fixed and close this issue. We can re-open if it appears again in the re-enabled builds.

[1] https://github.com/eclipse/openj9/issues/1550#issuecomment-390324779

AIX builds for JDK10 are still not working reliably, although the failures look different now.

May 23 job passed
https://ci.eclipse.org/openj9/job/Build-JDK10-aix_ppc-64_cmprssptrs/103/ - aix71-p8-2

May 25 job passed
https://ci.eclipse.org/openj9/job/Build-JDK10-aix_ppc-64_cmprssptrs/108/ - aix71-p8-2

OMR acceptance May 28
https://ci.eclipse.org/openj9/job/Build-JDK10-aix_ppc-64_cmprssptrs/111/ - aix71-p8-2

14:41:50 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/AccessController.c", line 20.4: 1506-356 (W) Compilation unit is empty.
14:41:50 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/Class.c", line 20.4: 1506-356 (W) Compilation unit is empty.
14:41:50 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/ClassLoader.c", line 20.4: 1506-356 (W) Compilation unit is empty.
14:41:50 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/Object.c", line 20.4: 1506-356 (W) Compilation unit is empty.
14:42:01 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/StackStreamFactory.c", line 20.4: 1506-356 (W) Compilation unit is empty.
14:42:01 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/StackTraceElement.c", line 20.4: 1506-356 (W) Compilation unit is empty.
14:42:01 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/System.c", line 20.4: 1506-356 (W) Compilation unit is empty.
14:42:01 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/Thread.c", line 20.4: 1506-356 (W) Compilation unit is empty.
14:42:01 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/Throwable.c", line 20.4: 1506-356 (W) Compilation unit is empty.
14:42:01 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/verify_stub.c", line 20.4: 1506-356 (W) Compilation unit is empty.
14:42:01 /opt/freeware/bin/bash: cannot make pipe for process substitution: A file or directory in the path name does not exist.
14:42:01 /opt/freeware/bin/bash: /tmp//sh-np.Ans7ab: ambiguous redirect
14:42:01 cp: /home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/build/aix-ppc64-normal-server-release/support/native/java.base/libzip/Adler32.o.log: A file or directory in the path name does not exist.
14:42:01 CoreLibraries.gmk:220: recipe for target '/home/u0020236/workspagmake[3]: *** [/ce/Build-JDK10-ahome/u0020236/woix_ppc-64_cmprssrkspace/Build-JDptrs/build/aix-pK10-aix_ppc-64_cpc64-normal-servmprssptrs/build/er-release/suppoaix-ppc64-normalrt/native/java.b-server-release/ase/libzip/Adlersupport/native/j32.o' failed
14:42:01 ava.base/libzip/Adler32.o] Error 1
14:42:01 gmake[3]: *** Waiting for unfinished jobs....
14:42:01 gmake[2]: *** [java.base-libs] Error 2
14:42:01 make/Main.gmk:228: recipe for target 'java.base-libs' failed
14:42:01 
14:42:01 ERROR: Build failed for target 'all' in configuration 'aix-ppc64-normal-server-release' (exit code 2) 

OMR acceptance May 29 job passed
https://ci.eclipse.org/openj9/job/Build-JDK10-aix_ppc-64_cmprssptrs/113/ - aix71-p8-2

OMR acceptance May 29 job passed OMR promoted
https://ci.eclipse.org/openj9/job/Build-JDK10-aix_ppc-64_cmprssptrs/116/ - aix71-p8-3

OMR acceptance May 29
https://ci.eclipse.org/openj9/job/Build-JDK10-aix_ppc-64_cmprssptrs/117/ - aix71-p8-2

11:40:20 OpenJ9 compile complete
11:40:20 Creating support/modules_libs/java.base/*/libjvm.so from J9 sources
11:40:21 gmake[4]: Entering directory '/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs'
11:40:22 gmake[4]: Leaving directory '/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs'
11:40:22 
11:40:22 ERROR: Build failed for target 'all' in configuration 'aix-ppc64-normal-server-release' (exit code 2) 
11:40:22 
11:40:22 === Make failed targets repeated here ===
11:40:22 CompileInterimLangtools.gmk:86: recipe for target '/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/build/aix-ppc64-normal-server-release/buildtools/interim_langtools_modules/java.compiler.interim/_the.BUILD_java.compiler.interim_batch' failed
11:40:22 make/Main.gmk:78: recipe for target 'interim-langtools' failed
11:40:22 === End of repeated output ===
11:40:22 
11:40:22 Hint: Try searching the build log for the name of the first failed target.
11:40:22 Hint: See doc/building.html#troubleshooting for assistance.

OMR acceptance May 29 job passed
https://ci.eclipse.org/openj9/job/Build-JDK10-aix_ppc-64_cmprssptrs/119 - aix71-p8-3

OMR acceptance May 30
https://ci.eclipse.org/openj9/job/Build-JDK10-aix_ppc-64_cmprssptrs/120/ - aix71-p8-2

00:42:28 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/AccessController.c", line 20.4: 1506-356 (W) Compilation unit is empty.
00:42:28 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/ClassLoader.c", line 20.4: 1506-356 (W) Compilation unit is empty.
00:42:28 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/Class.c", line 20.4: 1506-356 (W) Compilation unit is empty.
00:42:29 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/Object.c", line 20.4: 1506-356 (W) Compilation unit is empty.
00:42:30 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/StackStreamFactory.c", line 20.4: 1506-356 (W) Compilation unit is empty.
00:42:30 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/StackTraceElement.c", line 20.4: 1506-356 (W) Compilation unit is empty.
00:42:30 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/System.c", line 20.4: 1506-356 (W) Compilation unit is empty.
00:42:30 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/Thread.c", line 20.4: 1506-356 (W) Compilation unit is empty.
00:42:30 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/Throwable.c", line 20.4: 1506-356 (W) Compilation unit is empty.
00:42:31 "/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/closed/src/java.base/share/native/libjava/verify_stub.c", line 20.4: 1506-356 (W) Compilation unit is empty.
00:42:36 /opt/freeware/bin/bash: cannot make pipe for process substitution: A file or directory in the path name does not exist.
00:42:36 /opt/freeware/bin/bash: /tmp//sh-np.HBmMab: ambiguous redirect
00:42:36 cp: /home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/build/aix-ppc64-normal-server-release/support/native/java.base/libnet/proxy_util.o.log: A file or directory in the path name does not exist.
00:42:36 NetworkingLibraries.gmk:28: recipe for target '/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/build/aix-ppc64-normal-server-release/support/native/java.base/libnet/gmake[3]: *** [/proxy_util.o' fahome/u0020236/woiled
00:42:36 rkspace/Build-JDK10-aix_ppc-64_cmprssptrs/build/aix-ppc64-normal-server-release/support/native/java.base/libnet/proxy_util.o] Error 1
00:42:36 gmake[3]: *** Waiting for unfinished jobs....
00:42:37 make/Main.gmk:22gmake[2]: *** [j8: recipe for taava.base-libs] Erget 'java.base-rror 2
00:42:37 libs' failed
00:42:37 
00:42:37 ERROR: Build failed for target 'all' in configuration 'aix-ppc64-normal-server-release' (exit code 2) 

Nightly build May 30
https://ci.eclipse.org/openj9/job/Build-JDK10-aix_ppc-64_cmprssptrs/121/ - aix71-p8-2

01:11:54 OpenJ9 compile complete
01:11:54 Creating support/modules_libs/java.base/*/libjvm.so from J9 sources
01:11:54 gmake[4]: Entering directory '/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs'
01:11:56 gmake[4]: Leaving directory '/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs'
01:11:56 
01:11:56 ERROR: Build failed for target 'all' in configuration 'aix-ppc64-normal-server-release' (exit code 2) 
01:11:56 
01:11:56 === Make failed targets repeated here ===
01:11:56 CompileJavaModules.gmk:633: recipe for target '/home/u0020236/workspace/Build-JDK10-aix_ppc-64_cmprssptrs/build/aix-ppc64-normal-server-release/jdk/modules/java.transaction/_the.java.transaction_batch' failed
01:11:56 === End of repeated output ===
01:11:56 
01:11:56 Hint: Try searching the build log for the name of the first failed target.
01:11:56 Hint: See doc/building.html#troubleshooting for assistance.

Its interesting that all the failures are all on the same machine aix71-p8-2, although there was one job that passed on this machine.

I've disabled aix71-p8-2 for now, we'll see if we get any of these failures an another machine.

aix71-p8-2 is re-enabled and #2023 is opened for the problem. Closing this issue again.

Was this page helpful?
0 / 5 - 0 ratings