Graal: GraalVM generates wrong result due to speculative optimization

Created on 22 May 2020  ยท  16Comments  ยท  Source: oracle/graal

Describe the issue
GraalVM generates wrong result due to speculative optimization.

Steps to reproduce the issue
Please include both build steps as well as run steps

  1. Run this code
    If you cannot reproduce it, you can simply increase the int size = 1000000 to a lager value in main() function to make the repro work.
import java.util.HashMap;

class InputRow {
  public long[] row;
  public int len;

  public InputRow(int size, long initValue) {
    len = size;
    row = new long[size];
    for (int i = 0; i < size; i++) {
      row[i] = initValue;
    }
  }

  public long getLong(int idx) {
    assert (idx < len);
    return row[idx];
  }
}

class SimpleInput {
  public InputRow[] rows;
  public int len;
  public int idx;

  public SimpleInput(int size) {
    len = size;
    rows = new InputRow[size];
    idx = 0;
    for (int i = 0; i < size - 3; i = i + 3) {
      rows[i] = new InputRow(10, 50);
      rows[i + 1] = new InputRow(10, -1);
      rows[i + 2] = new InputRow(10, 100);
    }
    rows[size - 1] = new InputRow(10, 10);
  }

  public boolean hasNext() {
    return idx < len;
  }

  public InputRow next() {
    InputRow ret = rows[idx];
    idx++;
    return ret;
  }
}

class OutputElement {
  public long value1;
  public String value2;

  public OutputElement() {
    value1 = 0;
    value2 = "init";
  }

  public void write(int idx, long value) {
    assert (idx == 0);
    value1 = value;
  }

  public void write(int idx, String value) {
    assert (idx == 1);
    value2 = value;
  }
}

class SimpleOutput {
  public OutputElement[] output;
  public int len;
  public int idx;

  public SimpleOutput(int size) {
    output = new OutputElement[size];
    len = size;
    idx = 0;
  }

  public void append(OutputElement element) {
    assert (idx < len);
    output[idx] = element;
    idx++;
  }

  public void print() {
    HashMap<String, Integer> map = new HashMap<String, Integer>();
    for (int i = 0; i < len; i++) {
      long v1 = output[i].value1;
      String v2 = output[i].value2;
      if (map.containsKey(v2)) {
        int tmp = map.get(v2);
        map.put(v2, (tmp + 1));
      } else {
        map.put(v2, 1);
      }
    }
    System.out.print(map);
  }
}

public class Main {
  static void processNext(SimpleInput input, SimpleOutput output) {
    while (input.hasNext()) {
      InputRow orc_scan_row = (InputRow) input.next();
      long orc_scan_value = orc_scan_row.getLong(0);
      String project_value1 = null;
      boolean project_value2 = false;
      if (orc_scan_value >= 0L) {
        project_value2 = orc_scan_value <= 35L;
      }
      if (project_value2) {
        project_value1 = "35";
      } else {
        if (orc_scan_value > 35L) {
          project_value1 = "90";
        } else {
          project_value1 = "Unknown";
        }
      }
      OutputElement element = new OutputElement();
      element.write(0, orc_scan_value);
      element.write(1, project_value1);
      output.append(element);
    }
  }

  public static void main(String[] args) {
    System.out.println("Hello World!");
    int size = 1000000;
    SimpleInput input = new SimpleInput(size);
    SimpleOutput output = new SimpleOutput(size);
    processNext(input, output);
    output.print();
  }
}
  1. Output of GraalVM
Hello World!
{Unknown=333334, 90=666666}
  1. Output of OpenJDK
Hello World!
{35=1, Unknown=333333, 90=666666}

Describe GraalVM and your environment:

  • GraalVM version (latest snapshot builds can be found here), or commit id if built from source: graalvm-ce-java8-19.3.1
  • JDK major version: [8]
  • OS: [macOS Catalina]
  • Architecture: [AMD64]

More details
-Dgraal.RemoveNeverExecutedCode=false could generate the correct result.

bug

Most helpful comment

I've investigated a bit more and suspect it's a problem in the code that converts a signed range test into an unsigned compare but am not yet 100% sure.
In any case, I can reliably reproduce now with:

java -XX:+UseJVMCICompiler -XX:-TieredCompilation -Dgraal.GraalCompileOnly=processNext -Xbatch Main

All 16 comments

cc @christhalinger

The bug could be reproduced on CentOS 7 as well.

Thanks for the report @helloguo . This looks like a bad compilation of Main.processNext since the bad result is always produced by java -Dgraal.GraalCompileOnly=processNext Main. We will investigate.

@dougxc @davleopo Thanks for looking at this issue.

One thing may worth mentioning is that the bug cannot be reproduced when I add -Dgraal.Dump -Dgraal.MethodFilter=*.processNext to dump the graphs. In this case, I have to increase int size = 1000000 to a lager value (e.g. 10000000) in main() function to make the reproduce case work. Now I have two sets of graphs: one is from the run which generates the correct result while the other one is from the run which generates the wrong result. When I compare the two sets of graphs, I'm not able to find the difference with IdealGraphVisualizer. Maybe I missed something there. Anyway, just share it here if it helps the investigation a bit.

While we are trying to narrow down the root cause, do you think -Dgraal.RemoveNeverExecutedCode=false is a proper short-term fix? The bug disappeared when I disabled RemoveNeverExecutedCode.

I've investigated a bit more and suspect it's a problem in the code that converts a signed range test into an unsigned compare but am not yet 100% sure.
In any case, I can reliably reproduce now with:

java -XX:+UseJVMCICompiler -XX:-TieredCompilation -Dgraal.GraalCompileOnly=processNext -Xbatch Main

BTW, I don't think -Dgraal.RemoveNeverExecutedCode=false is a reliable workaround.

The problem seems relate to "reordering two if statements"

_Before CanonicalizerPhase:_
Screen Shot 2020-05-23 at 7 28 40 PM

checkForUnsignedCompare() created unsigned-compare "382 if" node and looks correct. But "382 if" node's simplify() call triggered reordering putting "48 if" node before "382 if". So "382 if"'s trueSuccessor could never be reached.

_After checkForUnsignedCompare():_
Screen Shot 2020-05-23 at 11 36 55 PM

_After reordering:_
Screen Shot 2020-05-23 at 11 37 21 PM

_After CanonicalizerPhase:_
Screen Shot 2020-05-23 at 7 29 07 PM

Comment out line 304 - 338 will get the correct result.

It seems like prepareForSwap(tool, condition(), nextIf.condition()) doesn't handle signed comparison and unsigned comparison correctly.

compareA is node "381||<|": condition=BT, X=27|Invoke#Direct#InputRow.getLong, Y=38|Constant(36, i64)
compareB is node "39|<": condition=LT, X=27|Invoke#Direct#InputRow.getLong, Y=38|Constant(36, i64)
Can swap disjoint coditions on same values: BT and LT

But they are not disjoint condition because conditionA is unsigned.

To resolve the issue, we could either fix join, or skip comparing signed and unsigned in prepareForSwap.

@dougxc Please, also, revise merging of range check compare operations to work for arbitrary range constants.

As example b >= '0' && b <= '9' can be replaced by b + 2147483600 <= -2147483639.

Please, bellow see the code generated by GraalVM EE.

b >= '0' && b <= '9':

[info]   2.46%  โ”‚โ”‚โ”‚โ”‚โ”‚  โ”‚ โ”‚โ”‚    0x00007f039833640f: cmp    $0x30,%eax
[info]          โ”‚โ”‚โ”‚โ”‚โ”‚โ•ญ โ”‚ โ”‚โ”‚    0x00007f0398336412: jl     0x00007f0398336473
[info]   3.68%  โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚ โ”‚ โ”‚โ”‚    0x00007f0398336418: nopl   0x0(%rax,%rax,1)
[info]   2.50%  โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚ โ”‚ โ”‚โ”‚    0x00007f0398336420: cmp    $0x3a,%eax
[info]          โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ•ญโ”‚ โ”‚โ”‚    0x00007f0398336423: jae    0x00007f0398336473  ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
[info]          โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚ โ”‚โ”‚                                                  ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::parseLong@173 (line 1318)
[info]          โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚ โ”‚โ”‚                                                  ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::readLong@2 (line 376)
[info]          โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚ โ”‚โ”‚                                                  ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::d0@57 (line 63)
[info]          โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚ โ”‚โ”‚                                                  ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@3 (line 63)
[info]          โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚ โ”‚โ”‚                                                  ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@6 (line 63)
[info]          โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚ โ”‚โ”‚                                                  ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::read@46 (line 631)
[info]          โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚ โ”‚โ”‚                                                  ; - com.github.plokhotnyuk.jsoniter_scala.core.package$::readFromArray@15 (line 100)
[info]          โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚ โ”‚โ”‚                                                  ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.ArrayOfLongsReading::jsoniterScala@19 (line 36)
[info]          โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚โ”‚ โ”‚โ”‚                                                  ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.generated.ArrayOfLongsReading_jsoniterScala_jmhTest::jsoniterScala_thrpt_jmhStub@1

b + 2147483600 <= -2147483639:

[info]   0.29%  โ”‚โ”‚     โ”‚   0x00007f149c3342fa: lea    0x7fffffd0(%rsi),%edx  ;*iadd {reexecute=0 rethrow=0 return_oop=0}
[info]          โ”‚โ”‚     โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::parseLong@159 (line 1318)
[info]          โ”‚โ”‚     โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::readLong@2 (line 376)
[info]          โ”‚โ”‚     โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::d0@57 (line 63)
[info]          โ”‚โ”‚     โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@3 (line 63)
[info]          โ”‚โ”‚     โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@6 (line 63)
[info]          โ”‚โ”‚     โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::read@46 (line 631)
[info]          โ”‚โ”‚     โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.package$::readFromArray@15 (line 100)
[info]          โ”‚โ”‚     โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.ArrayOfLongsReading::jsoniterScala@19 (line 36)
[info]          โ”‚โ”‚     โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.generated.ArrayOfLongsReading_jsoniterScala_jmhTest::jsoniterScala_thrpt_jmhStub@17 (line 119)
[info]   0.44%  โ”‚โ”‚     โ”‚   0x00007f149c334300: cmp    $0x8000000a,%edx
[info]          โ”‚โ”‚โ•ญ    โ”‚   0x00007f149c334306: jge    0x00007f149c334465  ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
[info]          โ”‚โ”‚โ”‚    โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::parseLong@171 (line 1318)
[info]          โ”‚โ”‚โ”‚    โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::readLong@2 (line 376)
[info]          โ”‚โ”‚โ”‚    โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::d0@57 (line 63)
[info]          โ”‚โ”‚โ”‚    โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@3 (line 63)
[info]          โ”‚โ”‚โ”‚    โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@6 (line 63)
[info]          โ”‚โ”‚โ”‚    โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::read@46 (line 631)
[info]          โ”‚โ”‚โ”‚    โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.package$::readFromArray@15 (line 100)
[info]          โ”‚โ”‚โ”‚    โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.ArrayOfLongsReading::jsoniterScala@19 (line 36)
[info]          โ”‚โ”‚โ”‚    โ”‚                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.generated.ArrayOfLongsReading_jsoniterScala_jmhTest::jsoniterScala_thrpt_jmhStub@17 (line 119)

Tracked internally with GR-23729.

@plokhotnyuk what are you comparing with Before and After in your example?

@dougxc I have replaced captions to corresponding expressions in the comment above.

Applying this kind of optimization manually here I got up to 10% increase of throughput for parsing from the JSON to an array of long values on GraalVM CE/EE Java 8/11.

Thanks. We will investigate further what it takes to apply this optimization.

@dougxc Thank you a lot!

HotSpot JIT compiler already do this kind of optimization since JDK 8, but using mov + add operations instead of lea:

[info]   0.06%    โ”‚     โ”‚ โ”‚โ”‚โ”‚โ”‚  0x00007fea202ce92d: mov    %ecx,%edi
[info]   1.61%    โ”‚     โ”‚ โ”‚โ”‚โ”‚โ”‚  0x00007fea202ce92f: add    $0xffffffd0,%edi
[info]   0.90%    โ”‚     โ”‚ โ”‚โ”‚โ”‚โ”‚  0x00007fea202ce932: cmp    $0xa,%edi
[info]            โ”‚     โ”‚ โ•ฐโ”‚โ”‚โ”‚  0x00007fea202ce935: jae    0x00007fea202ce6a9  ;*if_icmpgt
[info]            โ”‚     โ”‚  โ”‚โ”‚โ”‚                                                ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::parseLong@165 (line 1318)
[info]            โ”‚     โ”‚  โ”‚โ”‚โ”‚                                                ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::readLong@2 (line 376)
[info]            โ”‚     โ”‚  โ”‚โ”‚โ”‚                                                ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::d0@57 (line 63)

@dougxc @tkrodriguez canonicalizeConditionalCascade also uses "join". But as commented, it intentionally makes unsigned and signed return null for "join", so the logic is correct in canonicalizeConditionalCascade.

I couldn't find any other usage of "join".

Yes the other cases are explicitly safe since they treat the null as meaning unknown. Fixed in 2dfa6ab92a02f4c018376190ced992be9dd066e3

Was this page helpful?
0 / 5 - 0 ratings