Openj9: JTReg test failure - zLinux : sun/nio/cs/TestStringCoding.java

Created on 22 Mar 2019  路  21Comments  路  Source: eclipse/openj9

Intermittent failure ...

Testing(sm=true) US-ASCII....
STDERR:
java.lang.RuntimeException: getBytes(csn) failed  -> US-ASCII
    at TestStringCoding.testGetBytes(TestStringCoding.java:164)
    at TestStringCoding.main(TestStringCoding.java:82)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:567)
    at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
    at java.base/java.lang.Thread.run(Thread.java:825)

JavaTest Message: Test threw exception: java.lang.RuntimeException: getBytes(csn) failed  -> US-ASCII

... of test https://github.com/ibmruntimes/openj9-openjdk-jdk12/blob/openj9/test/jdk/sun/nio/cs/TestStringCoding.java observed on zLinux.

10x run showing 20% failure rate : consoleText.txt (external)
10x run showing 0% failure rate : consoleText.txt (internal)
100x run showing 3% failure rate : consoleText.txt (internal)
100x run showing 3% failure rate : consoleText.txt (internal)

100x run with -Xint (internal) is in progress - will update this comment with results

z jit linux test failure

All 21 comments

@dchopra001 can you take a look at this guy? I'm not aware of any recent changes in this area. The test is not too informative. i.e. it does not tell us what the byte arrays contained. Thankfully the test looks easy to modify so you can alter it to print out the failing arrays and that may give us a hint of what went wrong.

The 100x run with -Xint (internal) is progressing extremely slow in comparison to the previous runs - 30+ minutes for each iteration - console log thus far : consoleText.txt
At this rate, we'll likely only get 20 iterations in before the 10 hour job time limit kicks in.
The first 4 iterations have passed but I'm not confident that tells us anything yet, given the observed failure rate without -Xint.

18 complete iterations without any failure from the requested 100x -Xint run (internal) in 10 hours. It was then aborted per the Jenkins job timeout. Final console log : consoleText.txt

@fjeremic @pshipton Let me know if there is anything else you need me to try.

I've managed to reproduce this at a 100% rate with a local/modified version of the StringCoding test. Here's a sample output of what the two byte arrays look like in a failing run:

Testing(sm=true) US-ASCII.... 
baSC(firstArray):
0 1 2 3 4 5 6 7 8 9 a b c d 3f 10 11 12 13 14 15 16 17 18 19 1a 1b
baNIO(secondArray):
0 1 2 3 4 5 3f 7 8 9 a b c d 3f 10 11 12 13 14 15 16 17 18 19 1a 1b
Exception in thread "main" java.lang.RuntimeException: getBytes(csn) failed  -> US-ASCII
    at TestStringCoding.testGetBytes(TestStringCoding.java:182)
    at TestStringCoding.main(TestStringCoding.java:85)

In the second array we have a 3f instead of 6. A similar pattern is seen among other failures. Instead of an actual failures we see 3f.

I'm trying to narrow the problem down further right now.

Any update here @dchopra001 ? Do we have any better understanding of the source of the mysterious 3f ?

Status Update

I have the problem narrowed down to the sub optimization that is causing the issue. I am still not sure if this is an optimization or codegen issue. Right now I am leaning towards an edge case being missed in the codegen. More details about the problem below.

During Value Propagation, we identify the icall node inside this block:

34822 n149n     BBStart <block_16>                                                                  [     0x3ff66aefe50] bci=[-1,142,194] rc=0 vc=702 vn=- li=- udi=- nc=0
 34823 n161n     treetop                                                                             [     0x3ff66af0210] bci=[-1,155,194] rc=0 vc=702 vn=- li=- udi=- nc=1
 34824 n160n       icall  sun/nio/cs/US_ASCII$Encoder.encodeASCII([CII[BI)I[#477  final special Method] [flags 0x20500 0x0 ] ()  [     0x3ff66af01c0] bci=[-1,155,194] rc=2 vc=702 vn=- li=- udi=- nc=6 flg=0x20
 34825 n152n         aload  this<'this' parm Lsun/nio/cs/US_ASCII$Encoder;>[#442  Parm] [flags 0x40000107 0x0 ] (X!=0 sharedMemory )  [     0x3ff66aeff40] bci=[-1,142,194] rc=1 vc=702 vn=- li=- udi=- nc=0 flg=0x4
 34826 n153n         aload  sa<auto slot 3>[#447  Auto] [flags 0x7 0x0 ]                             [     0x3ff66aeff90] bci=[-1,143,194] rc=1 vc=702 vn=- li=- udi=- nc=0
 34827 n154n         iload  sp<auto slot 4>[#452  Auto] [flags 0x3 0x0 ] (cannotOverflow )           [     0x3ff66aeffe0] bci=[-1,144,194] rc=1 vc=702 vn=- li=- udi=- nc=0 flg=0x1000
 34828 n157n         isub                                                                            [     0x3ff66af00d0] bci=[-1,150,194] rc=1 vc=702 vn=- li=- udi=- nc=2
 34829 n155n           iload  sl<auto slot 5>[#455  Auto] [flags 0x3 0x0 ] (cannotOverflow )         [     0x3ff66af0030] bci=[-1,146,194] rc=1 vc=702 vn=- li=- udi=- nc=0 flg=0x1000
 34830 n156n           iload  sp<auto slot 4>[#452  Auto] [flags 0x3 0x0 ] (cannotOverflow )         [     0x3ff66af0080] bci=[-1,148,194] rc=1 vc=702 vn=- li=- udi=- nc=0 flg=0x1000
 34831 n158n         aload  da<auto slot 6>[#462  Auto] [flags 0x7 0x0 ]                             [     0x3ff66af0120] bci=[-1,151,194] rc=1 vc=702 vn=- li=- udi=- nc=0
 34832 n159n         iload  dp<auto slot 7>[#467  Auto] [flags 0x3 0x0 ] (cannotOverflow )           [     0x3ff66af0170] bci=[-1,153,194] rc=1 vc=702 vn=- li=- udi=- nc=0 flg=0x1000
 34833 n162n     istore  n<auto slot 9>[#478  Auto] [flags 0x3 0x0 ]                                 [     0x3ff66af0260] bci=[-1,158,194] rc=0 vc=702 vn=- li=- udi=- nc=1
 34834 n160n       ==>icall
 34835 n166n     istore  sp<auto slot 4>[#452  Auto] [flags 0x3 0x0 ]                                [     0x3ff66af03a0] bci=[-1,165,195] rc=0 vc=702 vn=- li=- udi=- nc=1
 34836 n165n       iadd                                                                              [     0x3ff66af0350] bci=[-1,164,195] rc=1 vc=702 vn=- li=- udi=- nc=2
 34837 n163n         iload  sp<auto slot 4>[#452  Auto] [flags 0x3 0x0 ] (cannotOverflow )           [     0x3ff66af02b0] bci=[-1,160,195] rc=1 vc=702 vn=- li=- udi=- nc=0 flg=0x1000
 34838 n164n         iload  n<auto slot 9>[#478  Auto] [flags 0x3 0x0 ] (cannotOverflow )            [     0x3ff66af0300] bci=[-1,162,195] rc=1 vc=702 vn=- li=- udi=- nc=0 flg=0x1000
 34839 n170n     istore  dp<auto slot 7>[#467  Auto] [flags 0x3 0x0 ]                                [     0x3ff66af04e0] bci=[-1,172,196] rc=0 vc=702 vn=- li=- udi=- nc=1
 34840 n169n       iadd                                                                              [     0x3ff66af0490] bci=[-1,171,196] rc=1 vc=702 vn=- li=- udi=- nc=2
 34841 n167n         iload  dp<auto slot 7>[#467  Auto] [flags 0x3 0x0 ] (cannotOverflow )           [     0x3ff66af03f0] bci=[-1,167,196] rc=1 vc=702 vn=- li=- udi=- nc=0 flg=0x1000
 34842 n168n         iload  n<auto slot 9>[#478  Auto] [flags 0x3 0x0 ] (cannotOverflow )            [     0x3ff66af0440] bci=[-1,169,196] rc=1 vc=702 vn=- li=- udi=- nc=0 flg=0x1000
 34843 n175n     ificmpge --> block_22 BBStart at n15n ()                                            [     0x3ff66af0670] bci=[-1,178,198] rc=0 vc=702 vn=- li=- udi=- nc=2 flg=0x20
 34844 n171n       iload  sp<auto slot 4>[#452  Auto] [flags 0x3 0x0 ] (cannotOverflow )             [     0x3ff66af0530] bci=[-1,174,198] rc=1 vc=702 vn=- li=- udi=- nc=0 flg=0x1000
 34845 n172n       iload  sl<auto slot 5>[#455  Auto] [flags 0x3 0x0 ] (cannotOverflow )             [     0x3ff66af0580] bci=[-1,176,198] rc=1 vc=702 vn=- li=- udi=- nc=0 flg=0x1000
 34846 n150n     BBEnd </block_16> =====                                                             [     0x3ff66aefea0] bci=[-1,178,198] rc=0 vc=701 vn=- li=- udi=- nc=0

And inline it into an arraytranslate node. After tracing this optimization the trees show the following new arraytranslate post inlining (the tracing doesn't say much else about the inlining):

 38718 n4178n    BBStart <block_301>                                                                 [     0x3ff66efe9b0] bci=[-1,155,194] rc=0 vc=0 vn=- li=- udi=- nc=0
 38719 n4127n    treetop                                                                             [     0x3ff66d2d9b0] bci=[-1,155,194] rc=0 vc=702 vn=- li=- udi=- nc=1
 38720 n4153n      arraytranslate  <arraytranslate>[#381  helper Method] [flags 0x400 0x0 ] (char2byteXlate byteArrayXlate tableBackedByRawStorage )  [     0x3ff66efe1e0] bci=[-1,155,194] rc=2 vc=0 vn=- li=- udi=- nc=6 flg=0xa020
 38721 n4157n        aladd                                                                           [     0x3ff66efe320] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=2
 38722 n4144n          aload  <temp slot 53>[#750  Auto] [flags 0x7 0x0 ]                            [     0x3ff66d2df00] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=0
 38723 n4156n          ladd                                                                          [     0x3ff66efe2d0] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=2
 38724 n4155n            lmul                                                                        [     0x3ff66efe280] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=2
 38725 n4146n              i2l                                                                       [     0x3ff66d2dfa0] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=1
 38726 n4145n                iload  <temp slot 52>[#749  Auto] [flags 0x10003 0x0 ]                  [     0x3ff66d2df50] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=0
 38727 n4154n              lconst 2                                                                  [     0x3ff66efe230] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=0
 38728 n4152n            lconst 8                                                                    [     0x3ff66efe190] bci=[-1,155,194] rc=2 vc=0 vn=- li=- udi=- nc=0
 38729 n4159n        aladd                                                                           [     0x3ff66efe3c0] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=2
 38730 n4149n          aload  da<auto slot 6>[#462  Auto] [flags 0x7 0x0 ]                           [     0x3ff66efe0a0] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=0
 38731 n4158n          ladd                                                                          [     0x3ff66efe370] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=2
 38732 n4151n            i2l                                                                         [     0x3ff66efe140] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=1
 38733 n4150n              iload  dp<auto slot 7>[#467  Auto] [flags 0x3 0x0 ]                       [     0x3ff66efe0f0] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=0
 38734 n4152n            ==>lconst 8
 38735 n4160n        aconst NULL                                                                     [     0x3ff66efe410] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=0
 38736 n4161n        iconst 255                                                                      [     0x3ff66efe460] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=0
 38737 n4148n        i2l                                                                             [     0x3ff66efe050] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=1
 38738 n4147n          iload  <temp slot 51>[#748  Auto] [flags 0x10003 0x0 ]                        [     0x3ff66efe000] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=0
 38739 n4162n        iconst 127                                                                      [     0x3ff66efe4b0] bci=[-1,155,194] rc=1 vc=0 vn=- li=- udi=- nc=0
 38740 n4188n    istore  <temp slot 55>[#752  Auto] [flags 0x3 0x0 ]                                 [     0x3ff66efecd0] bci=[-1,155,194] rc=0 vc=0 vn=- li=- udi=- nc=1
 38741 n4153n      ==>arraytranslate
 38742 n4190n    asynccheck  jitCheckAsyncMessages[#22  helper Method] [flags 0x400 0x0 ]            [     0x3ff66efed70] bci=[-1,155,194] rc=0 vc=0 vn=- li=- udi=- nc=0
 38743 n4179n    BBEnd </block_301> =====                                                            [     0x3ff66efea00] bci=[-1,155,194] rc=0 vc=0 vn=- li=- udi=- nc=0

I'm looking into the instructions generated right now to see if we perhaps missed an edge case.

@andrewcraik FYI - I don't think this is related to optimizer but tagging you just in case you notice something in the trees above or have any other insight regarding the opt.

@dchopra001 are you able to hop in the debugger and validate what the JIT is doing at the codegen level? We will evaluate arraytranslate into some form of TRTO (translate) instruction or similar which uses a precomputed static table to do the translation. It's possible we have a codegen bug there assuming the optimizer transformed the trees correctly.

@fjeremic Yes, I'm trying to see what the JIT is doing incorrectly in a debugger.

This isn't related to TROT instructions though because we end up taking this branch and generating vector instructions to handle this workload instead:
https://github.com/eclipse/omr/blob/18846d220b2b0417d75e171b6baf5819ed8555ff/compiler/z/codegen/OMRTreeEvaluator.cpp#L10586-L10590

So the issue should be in the arraytranslateEncodeSIMDEvaluator.

Note: Currently a quick workaround to this problem is to specify the -Xjit:disableSIMDArrayTranslate option. (arraytranslateEvaluator is the only evaluator that is affected by this option in the z codegen)

For documentation purposes, here is how to reproduce the problem.

This problem can be reproduced 100% with the following Java program:

public class TestStringCoding {
    public static void helper() throws Throwable
    {
    Charset cs = Charset.availableCharsets().get("US-ASCII");
    System.out.println("temp: " + cs.name());
        int[] candidate  = {  5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 55296};  // This input creates the failure
        StringBuilder sb = new StringBuilder();
        StringBuilder sb2 = new StringBuilder();
        for (int ii = 0; ii < candidate.length; ii++)
           {
           sb.append((char)candidate[ii]);
           }
           testGetBytes(cs, sb.toString());
    }
    public static void main(String[] args) throws Throwable {
        helper();
    }

    static byte[] testGetBytes(Charset cs, String str) throws Throwable {
        byte[] baSC = str.getBytes(cs.name()); // correct
        //byte[] baSC = {0x5, 0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x3f}; // For the failing case, the above getBytes() routine will return this array.

        CharsetEncoder enc = cs.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer bf = enc.reset().encode(CharBuffer.wrap(str.toCharArray()));
        byte[] baNIO = new byte[bf.limit()];
        bf.get(baNIO, 0, baNIO.length);

        if (!Arrays.equals(baSC, baNIO)) {
            System.out.println();
            System.out.println("- Expected Output (baSC):");
            for (int i = 0; i < baSC.length; i++)
               {
               System.out.print(Integer.toHexString(baSC[i]));
               System.out.print(", ");
               }
            System.out.println();
            System.out.println("Actual Output (baNIO)");
            for (int i = 0; i < baNIO.length; i++)
               {
               System.out.print(Integer.toHexString(baNIO[i]));
               System.out.print(", ");
               }
            System.out.println();
            throw new RuntimeException("getBytes(csn) failed  -> " + cs.name());
        }
        return baSC;
    }
}

The file can be compiled with the following command:

jdk-12+33/bin/javac --add-exports java.base/sun.nio.cs=ALL-UNNAMED -cp "sun/nio/" sun/nio/cs/TestStringCoding.java;

(When run from the openj9-openjdk-jdk12/test/jdk/) folder.

The command to run the test, reproduce the failure and collect log files is:

jdk-12+33/bin/java -Xcompressedrefs -Xmx512m -Xjit:verbose,vlog=tempVlog,disableSuffixLogs,{"*Encoder.encodeArrayLoop*"}\(lastOptIndex=24,traceFull,traceCG,log=trace.log,lastOptSubIndex=42,traceLocalVP,optLevel=scorching,count=0,disableAsyncCompilation\) TestStringCoding

The jdk version I'm using is:

openjdk version "12" 2019-03-19
OpenJDK Runtime Environment AdoptOpenJDK (build 12+33)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.13.0, JRE 12 Linux s390x-64-Bit Compressed References 20190320_31 (JIT enabled, AOT enabled)
OpenJ9   - caeb51f87
OMR      - 33a33ff2
JCL      - 419f571019 based on jdk-12+33)

@pshipton I don't think this needs to be rushed into 0.14.0. This bug has existed for ~3 years now and is not limited for JDK12.

Below is some background on how this problem was exposed on Java:

The error happens when we inline sun/nio/cs/US_ASCII$Encoder.encodeASCII([CII[BI)I[ into an arraytranslate node.

This routine essentially translates an array of 2-byte chars into an array of bytes. If during the translation the function encounters a value that is saturated (i.e. val > 127), then it stops translation and returns the number of chars it was able to translate. The caller of this routine then normalizes the saturated value, and then calls this function again to translate the rest of the array. This process repeats until there are no more remaining chars left to translate.

For example, consider the following int array of length=26:
{5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 55296, 29};

(Note: this array must be cast to a char array first).

To translate this array the caller will first provide a reference to the beginning of the array where the value is 5. The encodeASCII routine will be able to translate all values up to and including index 23 (it stops at 24 because array[24] = 55296 which is a saturated value). The function will return 24 as it was able to translate 24 values. At this point, the caller will normalize 55296 to 3f (which is the 25th value, and then call encodeASCII again but with the starting reference to the 26th value (array[25]=29). This process continues until we have translated all chars in the array.

To accelerate this workload, we inline this encodeASCII call into an arraytranslate node and generate vector instructions. These are generated inside: OMR::Z::TreeEvaluator::arraytranslateEncodeSIMDEvaluator.

For the above input array, the evaluator will load the first 16 chars into 2 input vector registers: v1 and v2 (Note: Each char is 2-bytes in length and so each vector can only hold up to 8 chars at a time). It will then use the VSTRC instruction available on z13 and above architectures to determine if any of the 2 vectors contain a saturated value. For the above scenario, there are no saturated values in the first 16 chars, so we then move on to process the remaining 10 chars. This is done by again loading the first 8 chars in v1 and the last two chars in v2. We then use VSTRC to detect if v1 or v2 have any saturated chars. Since v2 does have a saturated char, VSTRC will indicate the position of this character in another vector register, v0, and branch to an internal control flow that we generate here:
https://github.com/eclipse/omr/blob/8355cf729167f73259eb113f944e808e6c1a2c8c/compiler/z/codegen/OMRTreeEvaluator.cpp#L15225-L15254
(for convenience I'm pasting a copy of the code here)

// ----------------- Incoming branch -----------------

   // Case 2: The 2nd vector register contained a saturated char
   generateS390LabelInstruction(cg, TR::InstOpCode::LABEL, node, processSaturatedInput2);

   // Extract the index of the first saturated char in the 2nd vector register
   generateVRScInstruction(cg, TR::InstOpCode::VLGV, node, inputLen, vSaturated, generateS390MemoryReference(7, cg), 0);

   // Divide by 2 as the unit length of each element in the vector register is 2 bytes
   generateRSInstruction(cg, TR::InstOpCode::getShiftRightLogicalSingleOpCode(), node, inputLen, inputLen, 1);

   // VLL and VSTL work on indices so we must subtract 1 (add 8 as saturation happened in the second input vector)
   generateRIEInstruction(cg, TR::InstOpCode::getAddLogicalRegRegImmediateOpCode(), node, inputLenMinus1, inputLen, 7);

As can be seen above, we use VLGV and SRLG (via getShiftRightLogicalSingleOpCode) to calculate the position of the saturated char in the second vector. In our case, this would make the tegister inputLen=0 to indicate the saturated character is in the first halfword of v2.

After a few instructions we generate an instruction to add the value in inputLen to the total number of translated chars:
generateRRInstruction (cg, TR::InstOpCode::getAddRegOpCode(), node, translated, inputLen);

This translated register represents the return value of the function. As is evident, we have an incorrect return value in the register. Based on the above example, translated is 16 when it should be 24 (We update translated to 16 when we process the first 16 chars). As a consequence of this, the caller will assume the 17th value in the array is a saturated value and normalize it to 3f. In our above example, the 17th value (at index 16) is not a saturated value as it is less than 127. This leads to incorrect behaviour and similar output to the one seen in this issue.

The problem then is that inputLen is being used as a total count for the number of characters we translated in the residue. However that is not what it represents when the saturated char is in the second input vector v2. In the scenario described above, inputLen didn't account for the 8 characters that were successfully translated in the first input vector v1. When such a scenario happens, we must add 8 to either inputLen or the translated register to indicate this.

I've opened up a PR to fix this here:
https://github.com/eclipse/omr/pull/3719

I don't think this needs to be rushed into 0.14.0

ok, removing the milestone target.

Is it worth adding the test case into the test suite to avoid future regressions?

@dchopra001 wow, well done! Excellent analysis and write up of the problem. Reads take note. Thanks!

Is it worth adding the test case into the test suite to avoid future regressions?

Yes I would agree. We should open up an issue to write up some unit tests for this to cover such corner cases.

I've merged the fix. @dchopra001 can you please see the linked PRs which excluded the tests and submit a change to re-enable them? We can close this issue once the tests are re-enabled.

Test reincluded: https://github.com/AdoptOpenJDK/openjdk-tests/pull/1069 (thanks @dchopra001 !)

Issue to write tril tests for the arraytranslate node:
https://github.com/eclipse/omr/issues/3729

Excellent. Thank you very much! I think we're done here now.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

PowerUser1234 picture PowerUser1234  路  3Comments

mikezhang1234567890 picture mikezhang1234567890  路  5Comments

AdamBrousseau picture AdamBrousseau  路  6Comments

hrstoyanov picture hrstoyanov  路  4Comments

xliang6 picture xliang6  路  3Comments