Openj9: JDK15 s390x jdk_math_j9_0_FAILED : Failure in BigIntegerTest & PrimeTest

Created on 23 Apr 2020 · 46 Comments · Source: eclipse/openj9

Failure link

From an internal test Test_openjdknext_j9_sanity.openjdk_s390x_linux_xl_Nightly/54:

00:05:48  openjdk version "15-internal" 2020-09-15
00:05:48  OpenJDK Runtime Environment (build 15-internal+0-adhoc.jenkins.BuildJDKnexts390xlinuxxlNightly)
00:05:48  Eclipse OpenJ9 VM (build ibm_sdk-3ee59d21c7, JRE 15 Linux s390x-64-Bit 20200421_60 (JIT enabled, AOT enabled)
00:05:48  OpenJ9   - 3ee59d21c7
00:05:48  OMR      - 634ac4a8b
00:05:48  JCL      - 76d75207e94 based on jdk-15+18)

Optional info

Failure output (captured from console output)

00:18:45  STDERR:
00:18:45  Constructor: Passed
00:18:45  Mersenne prime 5 failed.
00:18:45  Mersenne prime 6 failed.
00:18:45  Prime: Failed(2)
00:18:45  nextProbablePrime: Passed

00:18:45  java.lang.RuntimeException: Failure in BigIntegerTest.
00:18:45    at BigIntegerTest.main(BigIntegerTest.java:1278)
00:18:45    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
00:18:45    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
00:18:45    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
00:18:45    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
00:18:45    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
00:18:45    at java.base/java.lang.Thread.run(Thread.java:835)

00:19:50  Checking first 19 Mersenne primes
00:19:50  Mersenne test result: FAILURE
00:19:50  STDERR:
00:19:50  Mp with p = 521 not classified as prime
00:19:50  Mp with p = 607 not classified as prime
00:19:50  Mp with p = 1279 not classified as prime
00:19:50  Mp with p = 2203 not classified as prime
00:19:50  Mp with p = 2281 not classified as prime
00:19:50  Mp with p = 3217 not classified as prime
00:19:50  Mp with p = 4253 not classified as prime
00:19:50  java.lang.Exception: PrimeTest FAILED!
00:19:50    at PrimeTest.main(PrimeTest.java:86)
00:19:50    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
00:19:50    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
00:19:50    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
00:19:50    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
00:19:50    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
00:19:50    at java.base/java.lang.Thread.run(Thread.java:835)

jdk_math_j9_0_FAILED

For example, to rebuild the failed tests in https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder, use the following link: https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder/parambuild/?JDK_VERSION=15&JDK_IMPL=openj9&BUILD_LIST=openjdk&PLATFORM=s390x_linux_xl&TARGET=jdk_math_j9_0

jit jdk15 test failure

All 46 comments

This isn't a new failure, nor is it specific to the xl build.

The PrimeTest failure seems new. s390x can be removed if it occurs on another platform.

There was an earlier issue about a BigIntegerTest failure, https://github.com/eclipse/openj9/issues/6468, but that was a while ago and on JDK11.

@fjeremic fyi

I've started an open build https://ci.eclipse.org/openj9/job/Pipeline-Release-Build/204/

job/Test_openjdknext_j9_sanity.openjdk_s390x_linux_Nightly/179
job/Test_openjdknext_j9_sanity.openjdk_s390x_linux_xl_Nightly/81
java/math/BigInteger/PrimeTest.java

STDOUT:
Upper bound = 1299709
Certainty = 100
Parallel = true
Created 100001 primes
Prime test result: SUCCESS
Seed from RandomFactory = -5395300775752289288L
Non-prime test result: SUCCESS
Checking first 19 Mersenne primes
Mersenne test result: FAILURE

STDERR:
Mp with p = 521 not classified as prime
Mp with p = 607 not classified as prime
Mp with p = 1279 not classified as prime
Mp with p = 2203 not classified as prime
Mp with p = 2281 not classified as prime
Mp with p = 3217 not classified as prime
Mp with p = 4253 not classified as prime
java.lang.Exception: PrimeTest FAILED!
    at PrimeTest.main(PrimeTest.java:86)

java/math/BigInteger/BigIntegerTest.java
com/sun/crypto/provider/Cipher/RSA/TestOAEPPadding.java
Agent 1 timed out with a timeout of 960 seconds

23:33:35  "AgentVMThread" prio=5 Id=46 RUNNABLE
23:33:35    at java.base@15-internal/java.math.BigInteger.subN(BigInteger.java:3145)
23:33:35    at java.base@15-internal/java.math.BigInteger.montReduce(BigInteger.java:3111)
23:33:35    at java.base@15-internal/java.math.BigInteger.implMontgomerySquare(BigInteger.java:2849)
23:33:35    at java.base@15-internal/java.math.BigInteger.montgomerySquare(BigInteger.java:2805)
23:33:35    at java.base@15-internal/java.math.BigInteger.oddModPow(BigInteger.java:3075)
23:33:35    at java.base@15-internal/java.math.BigInteger.modPow(BigInteger.java:2736)
23:33:35    at java.base@15-internal/java.math.BigInteger.passesMillerRabin(BigInteger.java:1104)
23:33:35    at java.base@15-internal/java.math.BigInteger.primeToCertainty(BigInteger.java:960)
23:33:35    at java.base@15-internal/java.math.BitSieve.retrieve(BitSieve.java:203)
23:33:35    at java.base@15-internal/java.math.BigInteger.largePrime(BigInteger.java:837)
23:33:35    at java.base@15-internal/java.math.BigInteger.probablePrime(BigInteger.java:767)
23:33:35    at app//BigIntegerTest.prime(BigIntegerTest.java:1060)
23:33:35    at app//BigIntegerTest.main(BigIntegerTest.java:1232)
22:11:06  "AgentVMThread" prio=5 Id=46 RUNNABLE
22:11:06    at java.base@15-internal/java.math.BigInteger.montReduce(BigInteger.java:3111)
22:11:06    at java.base@15-internal/java.math.BigInteger.implMontgomerySquare(BigInteger.java:2849)
22:11:06    at java.base@15-internal/java.math.BigInteger.montgomerySquare(BigInteger.java:2805)
22:11:06    at java.base@15-internal/java.math.BigInteger.oddModPow(BigInteger.java:3075)
22:11:06    at java.base@15-internal/java.math.BigInteger.modPow(BigInteger.java:2736)
22:11:06    at java.base@15-internal/java.math.BigInteger.passesMillerRabin(BigInteger.java:1104)
22:11:06    at java.base@15-internal/java.math.BigInteger.primeToCertainty(BigInteger.java:960)
22:11:06    at java.base@15-internal/java.math.BitSieve.retrieve(BitSieve.java:203)
22:11:06    at java.base@15-internal/java.math.BigInteger.largePrime(BigInteger.java:837)
22:11:06    at java.base@15-internal/java.math.BigInteger.probablePrime(BigInteger.java:767)
22:11:06    at app//BigIntegerTest.prime(BigIntegerTest.java:1060)
22:11:06    at app//BigIntegerTest.main(BigIntegerTest.java:1232)



00:10:23  "AgentVMThread" prio=5 Id=104 RUNNABLE
00:10:23    at java.base@15-internal/java.math.BigInteger.mulAdd(BigInteger.java:3153)
00:10:23    at java.base@15-internal/java.math.BigInteger.montReduce(BigInteger.java:3102)
00:10:23    at java.base@15-internal/java.math.BigInteger.implMontgomerySquare(BigInteger.java:2849)
00:10:23    at java.base@15-internal/java.math.BigInteger.montgomerySquare(BigInteger.java:2805)
00:10:23    at java.base@15-internal/java.math.BigInteger.oddModPow(BigInteger.java:3075)
00:10:23    at java.base@15-internal/java.math.BigInteger.modPow(BigInteger.java:2736)
00:10:23    at java.base@15-internal/java.math.BigInteger.passesMillerRabin(BigInteger.java:1104)
00:10:23    at java.base@15-internal/java.math.BigInteger.primeToCertainty(BigInteger.java:960)
00:10:23    at java.base@15-internal/java.math.BitSieve.retrieve(BitSieve.java:203)
00:10:23    at java.base@15-internal/java.math.BigInteger.largePrime(BigInteger.java:837)
00:10:23    at java.base@15-internal/java.math.BigInteger.probablePrime(BigInteger.java:767)
00:10:23    at java.base@15-internal/sun.security.rsa.RSAKeyPairGenerator.generateKeyPair(RSAKeyPairGenerator.java:144)
00:10:23    at java.base@15-internal/java.security.KeyPairGenerator$Delegate.generateKeyPair(KeyPairGenerator.java:722)
00:10:23    at app//TestOAEPPadding.main(TestOAEPPadding.java:60)

@fjeremic I'm adding a JIT label. I ran the test with -Xint and it passed: https://ci.eclipse.org/openj9/view/Test/job/Grinder/984
I also reran it without -Xint and it still fails the same way.

Rerunning Peter's grinder with 20 iterations to find out the reproduction rate with JIT enabled:
https://ci.eclipse.org/openj9/view/Test/job/Grinder/1012/

@VermaSh could you help take a look at this one please? Let's wait for the grinder to finish and observe the failure rate. If it is high enough we can try to reproduce the issue locally or if that doesn't work we can hop on the farm machine and reproduce the problem there to begin investigation. Let's focus first on the PrimeTest failure.

Here is the test in question that fails:
https://github.com/ibmruntimes/openj9-openjdk-jdk15/blob/c15a7aef2ed53987877c59bcffe70273909b56ee/test/jdk/java/math/BigInteger/PrimeTest.java#L211-L232

We can start with a limitFile search if the reproduction rate is good enough.

Any update on this? We're about a week away from the code split for the 0.22 release.

The BigInteger created in checkMersennePrimes seems to be the same for both failing and passing runs. Based on that, the failure most likely originates from isProbablePrime. I am investigating that method right now and will post another update soon.
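
For reference, the suspected call can be exercised in isolation with a minimal sketch like the one below (the class name and the choice of exponent 521, taken from the failure output above, are only illustrative; as later comments show, the miscompile needs implSquareToLen to be JIT-compiled at very-hot/scorching, so a single cold call like this should print true):

import java.math.BigInteger;

// Minimal standalone check of the suspected method; 521 is one of the
// Mersenne exponents reported as failing above. On a correct build (or a
// cold/interpreted run) this should print true.
public class MersenneProbe {
    public static void main(String[] args) {
        BigInteger mp = BigInteger.ONE.shiftLeft(521).subtract(BigInteger.ONE); // 2^521 - 1
        System.out.println(mp.isProbablePrime(100));
    }
}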

The method causing the failures is java/math/BigInteger.implSquareToLen([II[II)[I (profiled very-hot) when compiled at opt level very-hot or scorching. Looking into the method now.

Quick update:
The failure looks to be caused by the SPMDKernelParallelization optimization. This optimization runs only when the method is compiled at opt level scorching or very-hot.

<optimization id=125 name=SPMDKernelParallelization method=java/math/BigInteger.implSquareToLen([II[II)[I>
Performing 125: SPMDKernelParallelization
            (Building use/def info)
         PREPARTITION VN   (Building value number info)
Checking data locality in loop 45 piv = 494
SPMD DEPENDENCE ANALYSIS: def 000003FF9C44D000, use 000003FF9C44D5A0. Constant distance dependence of -4 bytes
checking loop iteration pattern on loop 45 
Loop 45 and piv = 494 collected for Auto-Vectorization
SPMD PRE-CHECK FAILURE: found disallowed treetop opcode BNDCHK at node 000003FF9C7508A0 in loop 109
Checking data locality in loop 16 piv = 448
SPMD PRE-CHECK FAILURE: found disallowed treetop opcode treetop at node 000003FF9C3023A0 in loop 4
Checking data locality in loop 4 piv = 380
[  4294] 125.1    O^O AUTO SIMD:  Simdizing loop 45  branch block = 45 inc = 1 piv = 494
   Checking Loop ==== (iter = -1, unroll = 16, vectorSize = 4) and unroller._spillLoopRequired 1 
              (Invalidating structure)
         BLOCK CLONER: Newly created block_465 is a clone of original block_45
         BLOCK CLONER: Newly created block_466 is a clone of original block_45
         BLOCK CLONER: Newly created block_467 is a clone of original block_45
         BLOCK CLONER: Newly created block_468 is a clone of original block_45
         O^O GENERAL LOOP UNROLLER: changed entry node of region 45 [9d10b1a0] to 466
Reducing the number of iterations of the loop 466 at storeNode [000003FF9D16E870] by vector length 16 
Reducing the number of iterations of the loop 466 at storeNode [000003FF9D03DE10] by vector length 16 
Reducing the number of iterations of the loop 466 at storeNode [000003FF9D03D410] by vector length 16 
Reducing the number of iterations of the loop 466 at storeNode [000003FF9C44C970] by vector length 16 
              (Invalidating value number info)
              (Invalidating use/def info)
              (Invalidating alias info)

@VermaSh what does the loop look like before and after the transformation? Can you post a log of with/without this particular transformation. Tagging @gita-omr and @mnalam-p as the resident experts in this area for your awareness. We'll try to get as far as we can into the investigation, but @VermaSh may need help eventually.

@fjeremic Below is the loop before and after the transformation. The transformation itself seems to be correct (thanks to @r30shah for helping me verify). Looking into the instructions generated next.

Here is the log file with the transformation: scorching_lastOpt_125_trace.txt.zip. As for a trace file without the transformation: the method java/math/BigInteger.implSquareToLen([II[II)[I doesn't get compiled when the SPMDKernelParallelization opt is disabled. That was an error on my part; both trace files are attached here: https://github.com/eclipse/openj9/issues/9331#issuecomment-673533625

This is the java code for the loop: https://github.com/ibmruntimes/openj9-openjdk-jdk15/blob/master/src/java.base/share/classes/java/math/BigInteger.java#L3359-L3365
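
For context, the linked loop (which, per the later comments, is the loop in shiftLeftImplWorker) is roughly of the following shape; this is a paraphrase rather than a verbatim copy of the linked source, using the parameter names that appear in the discussion below:

// Rough paraphrase of the linked loop (see the link above for the exact source).
// Inlined with newArr == oldArr, newIdx starting at 0 like oldIdx, and
// shiftCount == 1 (as noted later in this issue), it becomes the
// z[oldIdx] = (z[oldIdx] * 2) | (z[oldIdx + 1] >>> 31) loop discussed below.
private static void shiftLeftImplWorker(int[] newArr, int[] oldArr, int newIdx,
                                        int shiftCount, int numIter) {
    int shiftCountRight = 32 - shiftCount;
    int oldIdx = 0;
    while (oldIdx < numIter) {
        newArr[newIdx++] = (oldArr[oldIdx++] << shiftCount) | (oldArr[oldIdx] >>> shiftCountRight);
    }
}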

Loop before transformation:

n1129n    BBStart <block_45> (freq 2479)                                                      [     0x3ff9c4430a0] bci=[6,9,3362] rc=0 vc=6969 vn=- li=-1 udi=- nc=0
n1668n    asynccheck  jitCheckAsyncMessages[#23  helper Method] [flags 0x400 0x0 ]            [     0x3ff9c44d910] bci=[6,16,3363] rc=0 vc=6969 vn=- li=-1 udi=- nc=0
n1618n    istore  oldIdx<auto slot 6>[#494  Auto] [flags 0x3 0x0 ]                            [     0x3ff9c44c970] bci=[6,24,3363] rc=0 vc=6969 vn=- li=-1 udi=29 nc=1
n1619n      isub                                                                              [     0x3ff9c44c9c0] bci=[6,24,3363] rc=3 vc=6969 vn=- li=- udi=- nc=2
n1614n        iload  oldIdx<auto slot 6>[#494  Auto] [flags 0x3 0x0 ] (cannotOverflow )       [     0x3ff9c44c830] bci=[6,22,3363] rc=3 vc=6969 vn=- li=- udi=222 nc=0 flg=0x1000
n1621n        iconst -1 (X!=0 X<=0 )                                                          [     0x3ff9c44ca60] bci=[6,24,3363] rc=1 vc=6969 vn=- li=- udi=- nc=0 flg=0x204
n1639n    istorei  <array-shadow>[#248  Shadow] [flags 0x80000603 0x0 ]                       [     0x3ff9c44d000] bci=[6,38,3363] rc=0 vc=6969 vn=- li=-1 udi=- nc=2
n1640n      aladd (X>=0 internalPtr sharedMemory )                                            [     0x3ff9c44d050] bci=[6,38,3363] rc=1 vc=6969 vn=- li=- udi=- nc=2 flg=0x8100
n1632n        aload  z<parm 2 [I>[#377  Parm] [flags 0x40000107 0x0 ] (X!=0 sharedMemory )    [     0x3ff9c44cdd0] bci=[6,16,3363] rc=3 vc=6969 vn=- li=- udi=223 nc=0 flg=0x4
n1641n        lsub (highWordZero X>=0 cannotOverflow )                                        [     0x3ff9c44d0a0] bci=[6,38,3363] rc=1 vc=6969 vn=- li=- udi=- nc=2 flg=0x5100
n1642n          lmul (X>=0 cannotOverflow )                                                   [     0x3ff9c44d0f0] bci=[6,38,3363] rc=1 vc=6969 vn=- li=- udi=- nc=2 flg=0x1100
n1643n            i2l (highWordZero X>=0 )                                                    [     0x3ff9c44d140] bci=[6,38,3363] rc=1 vc=6969 vn=- li=- udi=- nc=1 flg=0x4100
n1614n              ==>iload
n1644n            lconst 4 (highWordZero X!=0 X>=0 )                                          [     0x3ff9c44d190] bci=[6,38,3363] rc=3 vc=6969 vn=- li=- udi=- nc=0 flg=0x4104
n1645n          lconst -8 (X!=0 X<=0 )                                                        [     0x3ff9c44d1e0] bci=[6,38,3363] rc=3 vc=6969 vn=- li=- udi=- nc=0 flg=0x204
n1646n      ior                                                                               [     0x3ff9c44d230] bci=[6,37,3363] rc=1 vc=6969 vn=- li=- udi=- nc=2
n1647n        imul                                                                            [     0x3ff9c44d280] bci=[6,29,3363] rc=1 vc=6969 vn=- li=- udi=- nc=2
n1648n          iloadi  <array-shadow>[#248  Shadow] [flags 0x80000603 0x0 ] (cannotOverflow )  [     0x3ff9c44d2d0] bci=[6,27,3363] rc=1 vc=6969 vn=- li=- udi=- nc=1 flg=0x1000
n1649n            aladd (X>=0 internalPtr sharedMemory )                                      [     0x3ff9c44d320] bci=[6,27,3363] rc=1 vc=6969 vn=- li=- udi=- nc=2 flg=0x8100
n1632n              ==>aload
n1650n              lsub (highWordZero X>=0 cannotOverflow )                                  [     0x3ff9c44d370] bci=[6,27,3363] rc=1 vc=6969 vn=- li=- udi=- nc=2 flg=0x5100
n1651n                lmul (X>=0 cannotOverflow )                                             [     0x3ff9c44d3c0] bci=[6,27,3363] rc=1 vc=6969 vn=- li=- udi=- nc=2 flg=0x1100
n1652n                  i2l (highWordZero X>=0 )                                              [     0x3ff9c44d410] bci=[6,27,3363] rc=1 vc=6969 vn=- li=- udi=- nc=1 flg=0x4100
n1614n                    ==>iload
n1644n                  ==>lconst 4
n1645n                ==>lconst -8
n1655n          iconst 2 (X!=0 X>=0 )                                                         [     0x3ff9c44d500] bci=[-1,133,2191] rc=1 vc=6969 vn=- li=- udi=- nc=0 flg=0x104
n1656n        iushr (X>=0 cannotOverflow )                                                    [     0x3ff9c44d550] bci=[6,36,3363] rc=1 vc=6969 vn=- li=- udi=- nc=2 flg=0x1100
n1657n          iloadi  <array-shadow>[#248  Shadow] [flags 0x80000603 0x0 ] (cannotOverflow )  [     0x3ff9c44d5a0] bci=[6,33,3363] rc=1 vc=6969 vn=- li=- udi=- nc=1 flg=0x1000
n1658n            aladd (X>=0 internalPtr sharedMemory )                                      [     0x3ff9c44d5f0] bci=[6,33,3363] rc=1 vc=6969 vn=- li=- udi=- nc=2 flg=0x8100
n1632n              ==>aload
n1659n              lsub (highWordZero X>=0 cannotOverflow )                                  [     0x3ff9c44d640] bci=[6,33,3363] rc=1 vc=6969 vn=- li=- udi=- nc=2 flg=0x5100
n1660n                lmul (X>=0 cannotOverflow )                                             [     0x3ff9c44d690] bci=[6,33,3363] rc=1 vc=6969 vn=- li=- udi=- nc=2 flg=0x1100
n1661n                  i2l (highWordZero X>=0 )                                              [     0x3ff9c44d6e0] bci=[6,33,3363] rc=1 vc=6969 vn=- li=- udi=- nc=1 flg=0x4100
n1619n                    ==>isub
n1644n                  ==>lconst 4
n1645n                ==>lconst -8
n1664n          iconst 31 (X!=0 X>=0 cannotOverflow )                                         [     0x3ff9c44d7d0] bci=[6,34,3363] rc=1 vc=6969 vn=- li=- udi=- nc=0 flg=0x1104
n1569n    ificmplt --> block_45 BBStart at n1129n ()                                          [     0x3ff9c44ba20] bci=[6,13,3362] rc=0 vc=6969 vn=- li=-1 udi=- nc=2 flg=0x20
n1619n      ==>isub
n1571n      iload  <temp slot 9>[#497  Auto] [flags 0x20000003 0x0 ] (X>=0 cannotOverflow )   [     0x3ff9c44bac0] bci=[6,11,3362] rc=1 vc=6969 vn=- li=- udi=224 nc=0 flg=0x1100
n1568n    BBEnd </block_45> =====                                                             [     0x3ff9c44b9d0] bci=[6,9,3362] rc=0 vc=6969 vn=- li=-1 udi=- nc=0

Loop after transformation:

n1129n    BBStart <block_45> (freq 2479) (in loop 466)                                        [     0x3ff9c4430a0] bci=[6,9,3362] rc=0 vc=7085 vn=- li=- udi=- nc=0
n1668n    asynccheck  jitCheckAsyncMessages[#23  helper Method] [flags 0x400 0x0 ]            [     0x3ff9c44d910] bci=[6,16,3363] rc=0 vc=7192 vn=- li=- udi=- nc=0
n1618n    istore  oldIdx<auto slot 6>[#494  Auto] [flags 0x3 0x0 ]                            [     0x3ff9c44c970] bci=[6,24,3363] rc=0 vc=7192 vn=- li=52 udi=29 nc=1
n1619n      isub                                                                              [     0x3ff9c44c9c0] bci=[6,24,3363] rc=3 vc=7192 vn=- li=- udi=- nc=2
n1614n        iload  oldIdx<auto slot 6>[#494  Auto] [flags 0x3 0x0 ] (cannotOverflow )       [     0x3ff9c44c830] bci=[6,22,3363] rc=3 vc=7192 vn=- li=- udi=240 nc=0 flg=0x1000
n7511n        iconst -4 (X!=0 X<=0 )                                                          [     0x3ff9d16fb80] bci=[6,24,3363] rc=1 vc=7192 vn=- li=- udi=- nc=0 flg=0x204
n1639n    vstorei  <array-shadow>[#255  Shadow] [flags 0x8000060a 0x0 ]                       [     0x3ff9c44d000] bci=[6,38,3363] rc=0 vc=7192 vn=- li=- udi=- nc=2
n1640n      aladd (X>=0 internalPtr sharedMemory )                                            [     0x3ff9c44d050] bci=[6,38,3363] rc=1 vc=7192 vn=- li=- udi=- nc=2 flg=0x8100
n1632n        aload  z<parm 2 [I>[#377  Parm] [flags 0x40000107 0x0 ] (X!=0 sharedMemory )    [     0x3ff9c44cdd0] bci=[6,16,3363] rc=3 vc=7192 vn=- li=- udi=241 nc=0 flg=0x4
n1641n        lsub (highWordZero X>=0 cannotOverflow )                                        [     0x3ff9c44d0a0] bci=[6,38,3363] rc=1 vc=7192 vn=- li=- udi=- nc=2 flg=0x5100
n1642n          lmul (X>=0 cannotOverflow )                                                   [     0x3ff9c44d0f0] bci=[6,38,3363] rc=1 vc=7192 vn=- li=- udi=- nc=2 flg=0x1100
n1643n            i2l (highWordZero X>=0 )                                                    [     0x3ff9c44d140] bci=[6,38,3363] rc=1 vc=7192 vn=- li=- udi=- nc=1 flg=0x4100
n1614n              ==>iload
n1644n            lconst 4 (highWordZero X!=0 X>=0 )                                          [     0x3ff9c44d190] bci=[6,38,3363] rc=3 vc=7192 vn=- li=- udi=- nc=0 flg=0x4104
n1645n          lconst -8 (X!=0 X<=0 )                                                        [     0x3ff9c44d1e0] bci=[6,38,3363] rc=3 vc=7192 vn=- li=- udi=- nc=0 flg=0x204
n1646n      vor (VectorInt32)                                                                 [     0x3ff9c44d230] bci=[6,37,3363] rc=1 vc=7192 vn=- li=- udi=- nc=2
n1647n        vmul (VectorInt32)                                                              [     0x3ff9c44d280] bci=[6,29,3363] rc=1 vc=7192 vn=- li=- udi=- nc=2
n1648n          vloadi  <array-shadow>[#255  Shadow] [flags 0x8000060a 0x0 ] (cannotOverflow )  [     0x3ff9c44d2d0] bci=[6,27,3363] rc=1 vc=7192 vn=- li=- udi=- nc=1 flg=0x1000
n1649n            aladd (X>=0 internalPtr sharedMemory )                                      [     0x3ff9c44d320] bci=[6,27,3363] rc=1 vc=7192 vn=- li=- udi=- nc=2 flg=0x8100
n1632n              ==>aload
n1650n              lsub (highWordZero X>=0 cannotOverflow )                                  [     0x3ff9c44d370] bci=[6,27,3363] rc=1 vc=7192 vn=- li=- udi=- nc=2 flg=0x5100
n1651n                lmul (X>=0 cannotOverflow )                                             [     0x3ff9c44d3c0] bci=[6,27,3363] rc=1 vc=7192 vn=- li=- udi=- nc=2 flg=0x1100
n1652n                  i2l (highWordZero X>=0 )                                              [     0x3ff9c44d410] bci=[6,27,3363] rc=1 vc=7192 vn=- li=- udi=- nc=1 flg=0x4100
n1614n                    ==>iload
n1644n                  ==>lconst 4
n1645n                ==>lconst -8
n7545n          vsplats (VectorInt32)                                                         [     0x3ff9d170620] bci=[-1,133,2191] rc=1 vc=0 vn=- li=- udi=- nc=1
n7544n            iconst 2 (X!=0 X>=0 )                                                       [     0x3ff9d1705d0] bci=[-1,133,2191] rc=1 vc=7192 vn=- li=- udi=- nc=0 flg=0x104
n1656n        vushr (VectorInt32) (X>=0 cannotOverflow )                                      [     0x3ff9c44d550] bci=[6,36,3363] rc=1 vc=7192 vn=- li=- udi=- nc=2 flg=0x1100
n1657n          vloadi  <array-shadow>[#255  Shadow] [flags 0x8000060a 0x0 ] (cannotOverflow )  [     0x3ff9c44d5a0] bci=[6,33,3363] rc=1 vc=7192 vn=- li=- udi=- nc=1 flg=0x1000
n1658n            aladd (X>=0 internalPtr sharedMemory )                                      [     0x3ff9c44d5f0] bci=[6,33,3363] rc=1 vc=7192 vn=- li=- udi=- nc=2 flg=0x8100
n1632n              ==>aload
n1659n              lsub (highWordZero X>=0 cannotOverflow )                                  [     0x3ff9c44d640] bci=[6,33,3363] rc=1 vc=7192 vn=- li=- udi=- nc=2 flg=0x5100
n1660n                lmul (X>=0 cannotOverflow )                                             [     0x3ff9c44d690] bci=[6,33,3363] rc=1 vc=7192 vn=- li=- udi=- nc=2 flg=0x1100
n1661n                  i2l (highWordZero X>=0 )                                              [     0x3ff9c44d6e0] bci=[6,33,3363] rc=1 vc=7192 vn=- li=- udi=- nc=1 flg=0x4100
n1619n                    ==>isub
n1644n                  ==>lconst 4
n1645n                ==>lconst -8
n7549n          vsplats (VectorInt32)                                                         [     0x3ff9d170760] bci=[6,34,3363] rc=1 vc=0 vn=- li=- udi=- nc=1
n7548n            iconst 31 (X!=0 X>=0 cannotOverflow )                                       [     0x3ff9d170710] bci=[6,34,3363] rc=1 vc=7192 vn=- li=- udi=- nc=0 flg=0x1104
n1569n    ificmplt --> block_466 BBStart at n7383n ()                                         [     0x3ff9c44ba20] bci=[6,13,3362] rc=0 vc=7192 vn=- li=- udi=- nc=2 flg=0x20
n1619n      ==>isub
n7486n      iload  <temp slot 22>[#514  Auto] [flags 0x3 0x0 ]                                [     0x3ff9d16f3b0] bci=[6,13,3362] rc=1 vc=7192 vn=- li=- udi=- nc=0
n1568n    BBEnd </block_45> =====                                                             [     0x3ff9c44b9d0] bci=[6,9,3362] rc=0 vc=7047 vn=- li=- udi=- nc=0

@gita-omr Can you please give it a look as well in case I missed something?

Here are the instructions generated for the loop:

 [     0x3ff9d2f1c10]                          LGFR    GPR_0777,GPR_0773
 [     0x3ff9d2f1cf0]                          SLLG    GPR_0777,GPR_0777,2
 [     0x3ff9d2f1f30]                          LG      &GPR_0778, Parm[z<parm 2 [I>] ?+16(GPR5)
 [     0x3ff9d2f2010]                          VL      VRF_0776, Shadow[<array-shadow>] 8(GPR_0777,&GPR_0778)
 [     0x3ff9d2f21b0]                          VREPIF  VRF_0780,0x2
 [     0x3ff9d2f22a0]                          VMLF    VRF_0776,VRF_0776,VRF_0780
 [     0x3ff9d2f24f0]                          LGFR    GPR_0782,GPR_0774
 [     0x3ff9d2f25d0]                          SLLG    GPR_0782,GPR_0782,2
 [     0x3ff9d2f26d0]                          VL      VRF_0781, Shadow[<array-shadow>] 8(GPR_0782,&GPR_0778)
 [     0x3ff9d2f2920]                          VREPIF  VRF_0784,0x1f
 [     0x3ff9d2f2a10]                          VESRLVF VRF_0783,VRF_0781,VRF_0784
 [     0x3ff9d2f2b00]                          VO      VRF_0776,VRF_0776,VRF_0783
 [     0x3ff9d2f2ce0]                          LGFR    GPR_0773,GPR_0773
 [     0x3ff9d2f2dc0]                          SLLG    GPR_0773,GPR_0773,2
 [     0x3ff9d2f2ea0]                          VST     VRF_0776, Shadow[<array-shadow>] 8(GPR_0773,&GPR_0778)

I hand-checked them and they appear to match the trees and to be doing the right thing. I also looked at the trees and compared them to the Java method, and they look OK as well. There must be something more subtle going on.

@fjeremic @VermaSh I saw this error message in the trace file while working on the AutoSIMD bug. Not sure if they are related, but maybe experienced eyes can interpret them better?

That isn't something to be concerned about. The memory reference logic artificially bumps node reference counts to prevent nodes from getting decremented past 0. It is just warning us this is happening and this is quite frequent in pretty much all trace files.

@VermaSh any updates on this one?

I worked with @r30shah yesterday to verify that the opt was indeed the one causing the failures, and the failures do seem to be triggered only when the opt is enabled. We ran into some issues with gdb yesterday where it kept throwing a segmentation fault (not related to the PrimeTest failures). Got those fixed today and am looking at the instructions in gdb right now. So far what I have found is that it enters BigInteger.shiftLeftImp 1296 times before failing.

Here are logs, both with and without the opt, in case I missed something: logs.zip

Command that reproduces the failure:

OpenJDK15-jdk_s390x_linux_openj9_2020-07-29-22-58/jdk-15+33/bin/java \
-Xjit:"verbose={compil*|inlining},vlog=SIMD_vlog,limit={java/math/BigInteger.implSquareToLen([II[II)[I},\
{java/math/BigInteger.implSquareToLen([II[II)[I}{hot}\
(traceFull,traceCG,log=SIMD_trace.log,lastOptIndex=131)" \
Test

Command for successful runs:

OpenJDK15-jdk_s390x_linux_openj9_2020-07-29-22-58/jdk-15+33/bin/java \
-Xjit:"verbose={compil*|inlining},vlog=no_SIMD_vlog,limit={java/math/BigInteger.implSquareToLen([II[II)[I},\
{java/math/BigInteger.implSquareToLen([II[II)[I}{hot}\
(traceFull,traceCG,log=no_SIMD_trace.log,lastOptIndex=131),disableAutoSIMD" \
Test

Unit Test:
to compile: OpenJDK15-jdk_s390x_linux_openj9_2020-07-29-22-58/jdk-15+33/bin/javac Test.java

import java.math.BigInteger;
public class Test {
    public static void main(String[] args) throws Exception {
        if (!checkMersennePrimes(100)) {
            throw new Exception("PrimeTest FAILED!");
        }
        System.out.println("PrimeTest succeeded!");
    }

    /**
     * Verifies whether a specified subset of Mersenne primes are correctly
     * identified as being prime. See
     * <a href="https://en.wikipedia.org/wiki/Mersenne_prime">Mersenne prime</a>
     * for more information.
     *
     * @return true if and only if the test succeeds
     */
    private static boolean checkMersennePrimes(int certainty) {
        int[] MERSENNE_EXPONENTS = {
            4253, // uncomment remaining array elements to make this test run a long time
            /* 4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497,
            86243, 110503, 132049, 216091, 756839, 859433, 1257787, 1398269,
            2976221, 3021377, 6972593, 13466917, 20996011, 24036583, 25964951,
            30402457, 32582657, 37156667, 42643801, 43112609, 57885161 */
        };
        System.out.println("Checking first "+MERSENNE_EXPONENTS.length+" Mersenne primes");

        boolean result = true;
        for (int n : MERSENNE_EXPONENTS) {
            BigInteger mp = BigInteger.ONE.shiftLeft(n).subtract(BigInteger.ONE);
            if (!mp.isProbablePrime(certainty)) {
                System.err.println("Mp with p = "+n+" not classified as prime");
                result = false;
            }
        }

        return result;
    }
}

Sorry, somehow missed all the notifications. I will take a look at the trees. @IBMJimmyk could you please take a look as well?

"Here is the log file with the transformation.scorching_lastOpt_125_trace.txt.zip As for a trace file without the transformation. The method java/math/BigInteger.implSquareToLen([II[II)[I doesn't get compiled when SPMDKernelParallelization opt is disabled."

Does not this mean that the opt might not be at fault?

That was an error on my part. Both trace files are attached here https://github.com/eclipse/openj9/issues/9331#issuecomment-673533625

Thanks. Does the test case fail when the number of array elements is 1, as in the unit test above?

Actually, might not be as relevant since this is not the loop we vectorize.

I found the problem.

The node that increments the induction variable is getting commoned and it is screwing up the calculation.

One way or another the optimizer has figured out that newArr and oldArr are the same, that newIdx always has the same value as oldIdx (same initial value, incremented at the same time by the same amount), and that shiftCount is 1.

So block_45 before SPMD looks like this:

while (oldIdx < numIter) {
    z[oldIdx] = (z[oldIdx] * 2) | ( z[oldIdx + 1] >>> 31);
    oldIdx = oldIdx + 1;
}

After SPMD, block_466, block_467, block_468 and block_45 look like this:

while (oldIdx < numIter) {
    // 3 more unrolled iterations go here
    z[oldIdx] = (z[oldIdx] * 2) | (z[oldIdx + 4] >>> 31); //but with vector instructions that do 4 iterations at a time
    oldIdx = oldIdx + 4;
}

Notice the z[oldIdx + 4]? The + 4 is wrong. The original oldIdx + 1 isub node got commoned. After vectorization the induction variable was changed to be incremented by 4 since 4 iterations were happening per vectorized iteration. However, since the node was commoned this was propagated inside the calculation as well and it shouldn't have been.

Off the top of my head, I believe this can be fixed by checking if the isub node that increments the induction variable has been commoned and duplicating it before changing it to increment by 4. I think @mnalam-p has looked at this specific area of code more recently than I have, so he might have a better suggestion as well.
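
To make the effect described above concrete, here is a small self-contained sketch (plain scalar Java, illustrative only; the class and method names are made up) that runs the original z[oldIdx + 1] calculation next to a variant that reads z[oldIdx + 4] the way the mis-vectorized loop does, and shows the results diverge:

import java.util.Arrays;

// Illustration of why carrying the "+4" into the address calculation changes
// the answer. Both loops below are plain scalar Java; the real failure is of
// course in the vectorized code, not here.
public class CommonedIndexDemo {
    // the calculation as written: read the next element (oldIdx + 1)
    static int[] reference(int[] src, int numIter) {
        int[] z = src.clone();
        for (int oldIdx = 0; oldIdx < numIter; oldIdx++)
            z[oldIdx] = (z[oldIdx] * 2) | (z[oldIdx + 1] >>> 31);
        return z;
    }

    // the calculation with the commoned "+4" increment leaking in
    static int[] broken(int[] src, int numIter) {
        int[] z = src.clone();
        for (int oldIdx = 0; oldIdx < numIter; oldIdx++)
            z[oldIdx] = (z[oldIdx] * 2) | (z[oldIdx + 4] >>> 31); // wrong neighbour
        return z;
    }

    public static void main(String[] args) {
        int[] src = {0x80000000, 0x7fffffff, 0x80000000, 0x7fffffff,
                     0x80000000, 0, 0, 0, 0};
        System.out.println(Arrays.equals(reference(src, 4), broken(src, 4))); // prints false
    }
}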

@IBMJimmyk I am still trying to set up the machine for z. I can see what you mean and agree that uncommoning and duplicating is one way to fix it. However, we have to be careful: as @gita-omr pointed out while working on the last AutoSIMD fix, we can't guarantee that it is 'The Node' that needs to be uncommoned. A generalized fix seems to be a little difficult. I will continue to investigate this issue. Thank you, btw.

Yeah, I actually thought about it a bit more since my previous comment and uncommoning the node won't be as simple as I thought it would be.

In https://github.com/eclipse/openj9/pull/9947 we disabled AutoSIMD for the induction variable. My understanding was that we disabled it in general because, whether the issue is with the trees or the codegen, it will need careful work to make this opt correct. With that in mind, for the trees seen in this loop, should this opt work, or should SIMD be disabled in this case as well?

This is slightly different than the issue seen in #9947. In #9947 a node was being converted to a vector version of that operation and, due to commoning, this change was propagating to where the induction variable was being incremented when it shouldn't. #9947 disabled that behavior.

This particular case is sort of the reverse. After vectorization, the increment of the induction variable (for int arrays) is changed from "+1" to "+4". This change is propagating into the actual calculations used to determine which array location to access, and it shouldn't be.

It might be possible to make AutoSIMD work for the trees in this case but I would need to take a closer look. It might be better to disable AutoSIMD in this case as well.

Thanks a lot @IBMJimmyk for finding the problem. I also think it's different from our previous problem since, in this case, the induction variable is only used inside the address expressions (array indices), apart from the actual increment. So this is not related to vectorizing the induction variable itself.

@IBMJimmyk The problematic loop you mentioned is invoked from here. I don't see how the new and old arrays can be the same.

The trees definitely think the two arrays are the same. Under the aladd node the first node is the same for all three loads. I also looked at the tracefile and at the point of inlining it thinks all three array accesses are for the same array.

@mnalam-p If you take a look at the log, the failing method is https://github.com/openjdk/jdk/blob/3fb8f4364d32c5172a02a7775b67bfb394224a57/src/java.base/share/classes/java/math/BigInteger.java#L2137 and it contains a call to primitiveShiftLeft (https://github.com/openjdk/jdk/blob/8e4a4cdbe0d0e2270e0806d10e48ff678820bd9d/src/java.base/share/classes/java/math/BigInteger.java#L2631-L2637). In that method, take a look at the argument passed to shiftLeftImplWorker (https://github.com/openjdk/jdk/blob/8e4a4cdbe0d0e2270e0806d10e48ff678820bd9d/src/java.base/share/classes/java/math/BigInteger.java#L2635).
So the optimizer knows that newArr and oldArr are the same, and that newIdx is 0.

@mnalam-p any updates on this?

@VermaSh I did reproduce the issue in JDK15 and implemented an initial fix, which generated correct IL and passed individual test runs. However, the fix is not enough: it was still getting commoned with the loop reduction calculation, yielding a wrong result in longer runs. I am currently investigating how to approach this.

@mnalam-p

I did reproduce the issue in JDK15 and implemented an initial fix, which generated correct IL and passed individual test runs.

that's great!

However, the fix is not enough: it was still getting commoned with the loop reduction calculation, yielding a wrong result in longer runs. I am currently investigating how to approach this.

Will you be able to make the September 15 deadline for 0.22.0 Java 15 release?

@VermaSh Based on my current understanding I should be able to make it, unless something more subtle emerges.

Is this stop ship? Will this close in the next couple of days? Or does it need to be moved out of the release?

The Milestone 2 build will be based on the Aug 30 code from the 0.22 branch. If this fix isn't in by that date, it becomes very hard to get it in after that.

@DanHeidinga I should be able to make it before Aug 30.

The root cause of the issue is not the commoning of the node, but rather the order of the IV increment and the opcodes that need to be vectorized.
In general, a loop

for (int i=0; i<n; i++) A[i] = B[i]

generates something like the following:

vstore A[i], B[i]
istore i, i + 4
ificmplt i, n

Note that the i used in the address calculation is incremented after the vector operation. However, in this particular code the IL is generated as follows:

istore i, i + 4
vstore A[i], B[i]   <-- note that even if "i" was not commoned, this would yield a wrong result after vectorization
ificmplt i, n

This looks like the following:

for(; i<n;) { i++; A[i] = B[i];}

Now this ordering creates a problem for the vectorization process, and it is not trivial because a) the node can be commoned with the compare (where we need to use i + 4) and b) with the address calculation (where we need to use the old value of i).
After discussing with @gita-omr we came to the conclusion that for now we will disable this specific case until we finalize a proper solution.
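
As a hand-written illustration of the point above (a strip-mined sketch of the for(; i<n;) { i++; A[i] = B[i]; } shape, not what the JIT actually emits, and with made-up class/method names): the body has to keep addressing relative to the pre-increment value of i while the loop control advances by the full vector width, which is exactly what the reordered/commoned increment breaks.

// Hand strip-mined, scalar-Java sketch of: for (; i < n;) { i++; A[i] = B[i]; }
// The addresses in the body lag the loop-control update by design.
public class StripMineSketch {
    static int copyByFour(int[] A, int[] B, int i, int n) {
        while (i + 4 <= n) {      // room for 4 scalar iterations; guard uses the old i
            A[i + 1] = B[i + 1];  // addresses are still based on the old i ...
            A[i + 2] = B[i + 2];
            A[i + 3] = B[i + 3];
            A[i + 4] = B[i + 4];
            i += 4;               // ... while the loop control is bumped by 4 only here
        }
        while (i < n) {           // residual iterations, original scalar form
            i++;
            A[i] = B[i];
        }
        return i;
    }
}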

Would it be possible to check in a deterministic JIT unit test for this as part of the fix? I can help to teach whomever how to do that. We have a bunch of existing unit tests already written where we specify pretty specific JIT options to make it 100% deterministic. This prevents future regressions in these areas and can catch other bugs related to the original change. There is an easy to follow example in #8402.

@fjeremic That would be really helpful :)

I did add the SIMD opt test in https://github.com/eclipse/openj9/pull/9908. It can be used as a guide for adding the test (maybe we can add another unit test under the same test umbrella?).

Kindly review the WIP PR.

I don't quite follow why commoning isn't a problem.

Say you have this:

for (int i=0; i<n; i++)
   A[i] = B[i+1]

Before vectorization you have something like this:

istore A[i], B[i+1]
istore i, i+1
ificmplt i, n

If i+1 gets commoned, after vectorization you might get this:

vstore A[i], B[i+4]
istore i, i+4
ificmplt i, n

The i+4 inside the index for B is wrong.

Maybe I was not clear; I meant that the root cause is not the commoning (it may happen even without commoning). The commoning seems to add another level of complexity to the issue. Thanks for the insight. I think you might be right. I will try to write up a test case and see how it goes.

@IBMJimmyk I completely agree that the problem you described in https://github.com/eclipse/openj9/issues/9331#issuecomment-682105215 exists and needs to be fixed. But I still believe the problem @mnalam-p described is a separate problem; it definitely exists even without commoning. I think we can fix those two problems in separate PRs, especially since, so far, we have not been able to reproduce the commoning problem (it can happen theoretically, but luckily some optimization behaviour prevents it).

@mnalam-p could you please open an issue for the commoning problem that @IBMJimmyk described?
