Describe the issue
GraalVM generates wrong result due to speculative optimization.
Steps to reproduce the issue
Please include both build steps as well as run steps
int size = 1000000 to a lager value in main() function to make the repro work.import java.util.HashMap;
class InputRow {
public long[] row;
public int len;
public InputRow(int size, long initValue) {
len = size;
row = new long[size];
for (int i = 0; i < size; i++) {
row[i] = initValue;
}
}
public long getLong(int idx) {
assert (idx < len);
return row[idx];
}
}
class SimpleInput {
public InputRow[] rows;
public int len;
public int idx;
public SimpleInput(int size) {
len = size;
rows = new InputRow[size];
idx = 0;
for (int i = 0; i < size - 3; i = i + 3) {
rows[i] = new InputRow(10, 50);
rows[i + 1] = new InputRow(10, -1);
rows[i + 2] = new InputRow(10, 100);
}
rows[size - 1] = new InputRow(10, 10);
}
public boolean hasNext() {
return idx < len;
}
public InputRow next() {
InputRow ret = rows[idx];
idx++;
return ret;
}
}
class OutputElement {
public long value1;
public String value2;
public OutputElement() {
value1 = 0;
value2 = "init";
}
public void write(int idx, long value) {
assert (idx == 0);
value1 = value;
}
public void write(int idx, String value) {
assert (idx == 1);
value2 = value;
}
}
class SimpleOutput {
public OutputElement[] output;
public int len;
public int idx;
public SimpleOutput(int size) {
output = new OutputElement[size];
len = size;
idx = 0;
}
public void append(OutputElement element) {
assert (idx < len);
output[idx] = element;
idx++;
}
public void print() {
HashMap<String, Integer> map = new HashMap<String, Integer>();
for (int i = 0; i < len; i++) {
long v1 = output[i].value1;
String v2 = output[i].value2;
if (map.containsKey(v2)) {
int tmp = map.get(v2);
map.put(v2, (tmp + 1));
} else {
map.put(v2, 1);
}
}
System.out.print(map);
}
}
public class Main {
static void processNext(SimpleInput input, SimpleOutput output) {
while (input.hasNext()) {
InputRow orc_scan_row = (InputRow) input.next();
long orc_scan_value = orc_scan_row.getLong(0);
String project_value1 = null;
boolean project_value2 = false;
if (orc_scan_value >= 0L) {
project_value2 = orc_scan_value <= 35L;
}
if (project_value2) {
project_value1 = "35";
} else {
if (orc_scan_value > 35L) {
project_value1 = "90";
} else {
project_value1 = "Unknown";
}
}
OutputElement element = new OutputElement();
element.write(0, orc_scan_value);
element.write(1, project_value1);
output.append(element);
}
}
public static void main(String[] args) {
System.out.println("Hello World!");
int size = 1000000;
SimpleInput input = new SimpleInput(size);
SimpleOutput output = new SimpleOutput(size);
processNext(input, output);
output.print();
}
}
Hello World!
{Unknown=333334, 90=666666}
Hello World!
{35=1, Unknown=333333, 90=666666}
Describe GraalVM and your environment:
More details
-Dgraal.RemoveNeverExecutedCode=false could generate the correct result.
cc @christhalinger
The bug could be reproduced on CentOS 7 as well.
Thanks for the report @helloguo . This looks like a bad compilation of Main.processNext since the bad result is always produced by java -Dgraal.GraalCompileOnly=processNext Main. We will investigate.
@dougxc @davleopo Thanks for looking at this issue.
One thing may worth mentioning is that the bug cannot be reproduced when I add -Dgraal.Dump -Dgraal.MethodFilter=*.processNext to dump the graphs. In this case, I have to increase int size = 1000000 to a lager value (e.g. 10000000) in main() function to make the reproduce case work. Now I have two sets of graphs: one is from the run which generates the correct result while the other one is from the run which generates the wrong result. When I compare the two sets of graphs, I'm not able to find the difference with IdealGraphVisualizer. Maybe I missed something there. Anyway, just share it here if it helps the investigation a bit.
While we are trying to narrow down the root cause, do you think -Dgraal.RemoveNeverExecutedCode=false is a proper short-term fix? The bug disappeared when I disabled RemoveNeverExecutedCode.
I've investigated a bit more and suspect it's a problem in the code that converts a signed range test into an unsigned compare but am not yet 100% sure.
In any case, I can reliably reproduce now with:
java -XX:+UseJVMCICompiler -XX:-TieredCompilation -Dgraal.GraalCompileOnly=processNext -Xbatch Main
BTW, I don't think -Dgraal.RemoveNeverExecutedCode=false is a reliable workaround.
The problem seems relate to "reordering two if statements"
_Before CanonicalizerPhase:_

checkForUnsignedCompare() created unsigned-compare "382 if" node and looks correct. But "382 if" node's simplify() call triggered reordering putting "48 if" node before "382 if". So "382 if"'s trueSuccessor could never be reached.
_After checkForUnsignedCompare():_

_After reordering:_

_After CanonicalizerPhase:_

Comment out line 304 - 338 will get the correct result.
It seems like prepareForSwap(tool, condition(), nextIf.condition()) doesn't handle signed comparison and unsigned comparison correctly.
compareA is node "381||<|": condition=BT, X=27|Invoke#Direct#InputRow.getLong, Y=38|Constant(36, i64)
compareB is node "39|<": condition=LT, X=27|Invoke#Direct#InputRow.getLong, Y=38|Constant(36, i64)
Can swap disjoint coditions on same values: BT and LT
But they are not disjoint condition because conditionA is unsigned.
To resolve the issue, we could either fix join, or skip comparing signed and unsigned in prepareForSwap.
@dougxc Please, also, revise merging of range check compare operations to work for arbitrary range constants.
As example b >= '0' && b <= '9' can be replaced by b + 2147483600 <= -2147483639.
Please, bellow see the code generated by GraalVM EE.
b >= '0' && b <= '9':[info] 2.46% โโโโโ โ โโ 0x00007f039833640f: cmp $0x30,%eax
[info] โโโโโโญ โ โโ 0x00007f0398336412: jl 0x00007f0398336473
[info] 3.68% โโโโโโ โ โโ 0x00007f0398336418: nopl 0x0(%rax,%rax,1)
[info] 2.50% โโโโโโ โ โโ 0x00007f0398336420: cmp $0x3a,%eax
[info] โโโโโโโญโ โโ 0x00007f0398336423: jae 0x00007f0398336473 ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
[info] โโโโโโโโ โโ ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::parseLong@173 (line 1318)
[info] โโโโโโโโ โโ ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::readLong@2 (line 376)
[info] โโโโโโโโ โโ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::d0@57 (line 63)
[info] โโโโโโโโ โโ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@3 (line 63)
[info] โโโโโโโโ โโ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@6 (line 63)
[info] โโโโโโโโ โโ ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::read@46 (line 631)
[info] โโโโโโโโ โโ ; - com.github.plokhotnyuk.jsoniter_scala.core.package$::readFromArray@15 (line 100)
[info] โโโโโโโโ โโ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.ArrayOfLongsReading::jsoniterScala@19 (line 36)
[info] โโโโโโโโ โโ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.generated.ArrayOfLongsReading_jsoniterScala_jmhTest::jsoniterScala_thrpt_jmhStub@1
b + 2147483600 <= -2147483639:[info] 0.29% โโ โ 0x00007f149c3342fa: lea 0x7fffffd0(%rsi),%edx ;*iadd {reexecute=0 rethrow=0 return_oop=0}
[info] โโ โ ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::parseLong@159 (line 1318)
[info] โโ โ ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::readLong@2 (line 376)
[info] โโ โ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::d0@57 (line 63)
[info] โโ โ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@3 (line 63)
[info] โโ โ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@6 (line 63)
[info] โโ โ ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::read@46 (line 631)
[info] โโ โ ; - com.github.plokhotnyuk.jsoniter_scala.core.package$::readFromArray@15 (line 100)
[info] โโ โ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.ArrayOfLongsReading::jsoniterScala@19 (line 36)
[info] โโ โ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.generated.ArrayOfLongsReading_jsoniterScala_jmhTest::jsoniterScala_thrpt_jmhStub@17 (line 119)
[info] 0.44% โโ โ 0x00007f149c334300: cmp $0x8000000a,%edx
[info] โโโญ โ 0x00007f149c334306: jge 0x00007f149c334465 ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
[info] โโโ โ ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::parseLong@171 (line 1318)
[info] โโโ โ ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::readLong@2 (line 376)
[info] โโโ โ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::d0@57 (line 63)
[info] โโโ โ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@3 (line 63)
[info] โโโ โ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::decodeValue@6 (line 63)
[info] โโโ โ ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::read@46 (line 631)
[info] โโโ โ ; - com.github.plokhotnyuk.jsoniter_scala.core.package$::readFromArray@15 (line 100)
[info] โโโ โ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.ArrayOfLongsReading::jsoniterScala@19 (line 36)
[info] โโโ โ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.generated.ArrayOfLongsReading_jsoniterScala_jmhTest::jsoniterScala_thrpt_jmhStub@17 (line 119)
Tracked internally with GR-23729.
@plokhotnyuk what are you comparing with Before and After in your example?
@dougxc I have replaced captions to corresponding expressions in the comment above.
Applying this kind of optimization manually here I got up to 10% increase of throughput for parsing from the JSON to an array of long values on GraalVM CE/EE Java 8/11.
Thanks. We will investigate further what it takes to apply this optimization.
@dougxc Thank you a lot!
HotSpot JIT compiler already do this kind of optimization since JDK 8, but using mov + add operations instead of lea:
[info] 0.06% โ โ โโโโ 0x00007fea202ce92d: mov %ecx,%edi
[info] 1.61% โ โ โโโโ 0x00007fea202ce92f: add $0xffffffd0,%edi
[info] 0.90% โ โ โโโโ 0x00007fea202ce932: cmp $0xa,%edi
[info] โ โ โฐโโโ 0x00007fea202ce935: jae 0x00007fea202ce6a9 ;*if_icmpgt
[info] โ โ โโโ ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::parseLong@165 (line 1318)
[info] โ โ โโโ ; - com.github.plokhotnyuk.jsoniter_scala.core.JsonReader::readLong@2 (line 376)
[info] โ โ โโโ ; - com.github.plokhotnyuk.jsoniter_scala.benchmark.JsoniterScalaCodecs$$anon$27::d0@57 (line 63)
@dougxc @tkrodriguez canonicalizeConditionalCascade also uses "join". But as commented, it intentionally makes unsigned and signed return null for "join", so the logic is correct in canonicalizeConditionalCascade.
I couldn't find any other usage of "join".
Yes the other cases are explicitly safe since they treat the null as meaning unknown. Fixed in 2dfa6ab92a02f4c018376190ced992be9dd066e3
Most helpful comment
I've investigated a bit more and suspect it's a problem in the code that converts a signed range test into an unsigned compare but am not yet 100% sure.
In any case, I can reliably reproduce now with: