Graal: GraalVM generates wrong result due to speculative optimization

Created on 22 May 2020 · 16Comments · Source: oracle/graal

Describe the issue
GraalVM generates wrong result due to speculative optimization.

Steps to reproduce the issue
Please include both build steps as well as run steps

Run this code
If you cannot reproduce it, you can simply increase the int size = 1000000 to a lager value in main() function to make the repro work.

import java.util.HashMap;

class InputRow {
  public long[] row;
  public int len;

  public InputRow(int size, long initValue) {
    len = size;
    row = new long[size];
    for (int i = 0; i < size; i++) {
      row[i] = initValue;
    }
  }

  public long getLong(int idx) {
    assert (idx < len);
    return row[idx];
  }
}

class SimpleInput {
  public InputRow[] rows;
  public int len;
  public int idx;

  public SimpleInput(int size) {
    len = size;
    rows = new InputRow[size];
    idx = 0;
    for (int i = 0; i < size - 3; i = i + 3) {
      rows[i] = new InputRow(10, 50);
      rows[i + 1] = new InputRow(10, -1);
      rows[i + 2] = new InputRow(10, 100);
    }
    rows[size - 1] = new InputRow(10, 10);
  }

  public boolean hasNext() {
    return idx < len;
  }

  public InputRow next() {
    InputRow ret = rows[idx];
    idx++;
    return ret;
  }
}

class OutputElement {
  public long value1;
  public String value2;

  public OutputElement() {
    value1 = 0;
    value2 = "init";
  }

  public void write(int idx, long value) {
    assert (idx == 0);
    value1 = value;
  }

  public void write(int idx, String value) {
    assert (idx == 1);
    value2 = value;
  }
}

class SimpleOutput {
  public OutputElement[] output;
  public int len;
  public int idx;

  public SimpleOutput(int size) {
    output = new OutputElement[size];
    len = size;
    idx = 0;
  }

  public void append(OutputElement element) {
    assert (idx < len);
    output[idx] = element;
    idx++;
  }

  public void print() {
    HashMap<String, Integer> map = new HashMap<String, Integer>();
    for (int i = 0; i < len; i++) {
      long v1 = output[i].value1;
      String v2 = output[i].value2;
      if (map.containsKey(v2)) {
        int tmp = map.get(v2);
        map.put(v2, (tmp + 1));
      } else {
        map.put(v2, 1);
      }
    }
    System.out.print(map);
  }
}

public class Main {
  static void processNext(SimpleInput input, SimpleOutput output) {
    while (input.hasNext()) {
      InputRow orc_scan_row = (InputRow) input.next();
      long orc_scan_value = orc_scan_row.getLong(0);
      String project_value1 = null;
      boolean project_value2 = false;
      if (orc_scan_value >= 0L) {
        project_value2 = orc_scan_value <= 35L;
      }
      if (project_value2) {
        project_value1 = "35";
      } else {
        if (orc_scan_value > 35L) {
          project_value1 = "90";
        } else {
          project_value1 = "Unknown";
        }
      }
      OutputElement element = new OutputElement();
      element.write(0, orc_scan_value);
      element.write(1, project_value1);
      output.append(element);
    }
  }

  public static void main(String[] args) {
    System.out.println("Hello World!");
    int size = 1000000;
    SimpleInput input = new SimpleInput(size);
    SimpleOutput output = new SimpleOutput(size);
    processNext(input, output);
    output.print();
  }
}

Output of GraalVM

Hello World!
{Unknown=333334, 90=666666}

Output of OpenJDK

Hello World!
{35=1, Unknown=333333, 90=666666}

Describe GraalVM and your environment:

GraalVM version (latest snapshot builds can be found here), or commit id if built from source: graalvm-ce-java8-19.3.1
JDK major version: [8]
OS: [macOS Catalina]
Architecture: [AMD64]

More details
-Dgraal.RemoveNeverExecutedCode=false could generate the correct result.

bug

Source

helloguo

👍2

Most helpful comment

I've investigated a bit more and suspect it's a problem in the code that converts a signed range test into an unsigned compare but am not yet 100% sure.
In any case, I can reliably reproduce now with:

java -XX:+UseJVMCICompiler -XX:-TieredCompilation -Dgraal.GraalCompileOnly=processNext -Xbatch Main

dougxc on 23 May 2020

👍2

All 16 comments

cc @christhalinger

helloguo on 22 May 2020

The bug could be reproduced on CentOS 7 as well.

helloguo on 22 May 2020

Thanks for the report @helloguo . This looks like a bad compilation of Main.processNext since the bad result is always produced by java -Dgraal.GraalCompileOnly=processNext Main. We will investigate.

dougxc on 23 May 2020

@dougxc @davleopo Thanks for looking at this issue.

One thing may worth mentioning is that the bug cannot be reproduced when I add -Dgraal.Dump -Dgraal.MethodFilter=*.processNext to dump the graphs. In this case, I have to increase int size = 1000000 to a lager value (e.g. 10000000) in main() function to make the reproduce case work. Now I have two sets of graphs: one is from the run which generates the correct result while the other one is from the run which generates the wrong result. When I compare the two sets of graphs, I'm not able to find the difference with IdealGraphVisualizer. Maybe I missed something there. Anyway, just share it here if it helps the investigation a bit.

While we are trying to narrow down the root cause, do you think -Dgraal.RemoveNeverExecutedCode=false is a proper short-term fix? The bug disappeared when I disabled RemoveNeverExecutedCode.

helloguo on 23 May 2020

java -XX:+UseJVMCICompiler -XX:-TieredCompilation -Dgraal.GraalCompileOnly=processNext -Xbatch Main

dougxc on 23 May 2020

👍2

BTW, I don't think -Dgraal.RemoveNeverExecutedCode=false is a reliable workaround.

dougxc on 23 May 2020

👍1

The problem seems relate to "reordering two if statements"

_Before CanonicalizerPhase:_
Screen Shot 2020-05-23 at 7 28 40 PM

checkForUnsignedCompare() created unsigned-compare "382 if" node and looks correct. But "382 if" node's simplify() call triggered reordering putting "48 if" node before "382 if". So "382 if"'s trueSuccessor could never be reached.

_After checkForUnsignedCompare():_
Screen Shot 2020-05-23 at 11 36 55 PM

_After reordering:_
Screen Shot 2020-05-23 at 11 37 21 PM

_After CanonicalizerPhase:_
Screen Shot 2020-05-23 at 7 29 07 PM

Comment out line 304 - 338 will get the correct result.

lic9 on 24 May 2020

👍1

It seems like prepareForSwap(tool, condition(), nextIf.condition()) doesn't handle signed comparison and unsigned comparison correctly.

compareA is node "381||<|": condition=BT, X=27|Invoke#Direct#InputRow.getLong, Y=38|Constant(36, i64)
compareB is node "39|<": condition=LT, X=27|Invoke#Direct#InputRow.getLong, Y=38|Constant(36, i64)
Can swap disjoint coditions on same values: BT and LT

But they are not disjoint condition because conditionA is unsigned.

To resolve the issue, we could either fix join, or skip comparing signed and unsigned in prepareForSwap.

lic9 on 24 May 2020

@dougxc Please, also, revise merging of range check compare operations to work for arbitrary range constants.

As example b >= '0' && b <= '9' can be replaced by b + 2147483600 <= -2147483639.

Please, bellow see the code generated by GraalVM EE.

`b >= '0' && b <= '9'`:

[info]   2.46%  │││││  │ ││    0x00007f039833640f: cmp    $0x30,%eax
[info]          │││││╭ │ ││    0x00007f0398336412: jl     0x00007f0398336473
[info]   3.68%  ││││││ │ ││    0x00007f0398336418: nopl   0x0(%rax,%rax,1)
[info]   2.50%  ││││││ │ ││    0x00007f0398336420: cmp    $0x3a,%eax
[info]          ││││││╭│ ││    0x00007f0398336423: jae    0x00007f0398336473  ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
[info]          ││││││││ ││                                                  ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::parseLong@173 (line 1318)
[info]          ││││││││ ││                                                  ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::readLong@2 (line 376)
[info]          ││││││││ ││                                                  ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::d0@57 (line 63)
[info]          ││││││││ ││                                                  ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@3 (line 63)
[info]          ││││││││ ││                                                  ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@6 (line 63)
[info]          ││││││││ ││                                                  ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::read@46 (line 631)
[info]          ││││││││ ││                                                  ; - com.github.plokhotnyuk.jsoniter_scala.core.package$::readFromArray@15 (line 100)
[info]          ││││││││ ││                                                  ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.ArrayOfLongsReading::jsoniterScala@19 (line 36)
[info]          ││││││││ ││                                                  ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.generated.ArrayOfLongsReading_jsoniterScala_jmhTest::jsoniterScala_thrpt_jmhStub@1

`b + 2147483600 <= -2147483639`:

[info]   0.29%  ││     │   0x00007f149c3342fa: lea    0x7fffffd0(%rsi),%edx  ;*iadd {reexecute=0 rethrow=0 return_oop=0}
[info]          ││     │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::parseLong@159 (line 1318)
[info]          ││     │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::readLong@2 (line 376)
[info]          ││     │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::d0@57 (line 63)
[info]          ││     │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@3 (line 63)
[info]          ││     │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@6 (line 63)
[info]          ││     │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::read@46 (line 631)
[info]          ││     │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.package$::readFromArray@15 (line 100)
[info]          ││     │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.ArrayOfLongsReading::jsoniterScala@19 (line 36)
[info]          ││     │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.generated.ArrayOfLongsReading_jsoniterScala_jmhTest::jsoniterScala_thrpt_jmhStub@17 (line 119)
[info]   0.44%  ││     │   0x00007f149c334300: cmp    $0x8000000a,%edx
[info]          ││╭    │   0x00007f149c334306: jge    0x00007f149c334465  ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
[info]          │││    │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::parseLong@171 (line 1318)
[info]          │││    │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::readLong@2 (line 376)
[info]          │││    │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::d0@57 (line 63)
[info]          │││    │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@3 (line 63)
[info]          │││    │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@6 (line 63)
[info]          │││    │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::read@46 (line 631)
[info]          │││    │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.core.package$::readFromArray@15 (line 100)
[info]          │││    │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.ArrayOfLongsReading::jsoniterScala@19 (line 36)
[info]          │││    │                                                 ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.generated.ArrayOfLongsReading_jsoniterScala_jmhTest::jsoniterScala_thrpt_jmhStub@17 (line 119)

plokhotnyuk on 24 May 2020

Tracked internally with GR-23729.

dougxc on 24 May 2020

@plokhotnyuk what are you comparing with Before and After in your example?

dougxc on 24 May 2020

@dougxc I have replaced captions to corresponding expressions in the comment above.

Applying this kind of optimization manually here I got up to 10% increase of throughput for parsing from the JSON to an array of long values on GraalVM CE/EE Java 8/11.

plokhotnyuk on 24 May 2020

Thanks. We will investigate further what it takes to apply this optimization.

dougxc on 24 May 2020

@dougxc Thank you a lot!

HotSpot JIT compiler already do this kind of optimization since JDK 8, but using mov + add operations instead of lea:

[info]   0.06%    │     │ ││││  0x00007fea202ce92d: mov    %ecx,%edi
[info]   1.61%    │     │ ││││  0x00007fea202ce92f: add    $0xffffffd0,%edi
[info]   0.90%    │     │ ││││  0x00007fea202ce932: cmp    $0xa,%edi
[info]            │     │ ╰│││  0x00007fea202ce935: jae    0x00007fea202ce6a9  ;*if_icmpgt
[info]            │     │  │││                                                ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::parseLong@165 (line 1318)
[info]            │     │  │││                                                ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::readLong@2 (line 376)
[info]            │     │  │││                                                ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::d0@57 (line 63)

plokhotnyuk on 24 May 2020

@dougxc @tkrodriguez canonicalizeConditionalCascade also uses "join". But as commented, it intentionally makes unsigned and signed return null for "join", so the logic is correct in canonicalizeConditionalCascade.

I couldn't find any other usage of "join".

lic9 on 24 May 2020

Yes the other cases are explicitly safe since they treat the null as meaning unknown. Fixed in 2dfa6ab92a02f4c018376190ced992be9dd066e3

tkrodriguez on 29 May 2020

👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Extending native applications with Java and JS

guaporocco · 3Comments

Error: Already registered: java.lang.reflect.Method.getDefaultValue()

gersonimo · 3Comments

Unused GraalOptions

helloguo · 3Comments

Build fails by "Error: No instances of java.io.FilePermission are allowed in the image heap as this class should be initialized at image runtime."

oroppas · 3Comments

graalvm 0.31 and R language

himanshumps · 3Comments

Graal: GraalVM generates wrong result due to speculative optimization

Most helpful comment

All 16 comments

b >= '0' && b <= '9':

b + 2147483600 <= -2147483639:

Related issues

`b >= '0' && b <= '9'`:

`b + 2147483600 <= -2147483639`: