**Describe GraalVM and your environment:**
java -Xinternalversion:
**Have you verified this issue still happens when using the latest snapshot?**
No.
**Describe the issue**
An unexpected drop in throughput of more than 3x when running some benchmarks on GraalVM EE compared with GraalVM CE.
**Code snippet or code repository that reproduces the issue**
https://github.com/plokhotnyuk/jsoniter-scala
**Steps to reproduce the issue**
1. [Optional] Download and install the `sbt` tool using the following link:
https://www.scala-sbt.org/download.html
2. Clone the repo and change the working directory:
git clone --depth 1 https://github.com/plokhotnyuk/jsoniter-scala
cd jsoniter-scala
3. Run the parsing benchmark on GraalVM EE:
sbt -java-home /usr/lib/jvm/graalvm-ee-java8 'jsoniter-scala-benchmark/jmh:run -jvmArgsAppend "-Dgraal.UseBranchesWithin32ByteBoundary=true" -p size=128 -wi 10 -i 10 ArrayOfEnumADTsReading.jsoniterScala'
4. Run the serialization benchmark on GraalVM EE:
sbt -java-home /usr/lib/jvm/graalvm-ee-java8 'jsoniter-scala-benchmark/jmh:run -jvmArgsAppend "-Dgraal.UseBranchesWithin32ByteBoundary=true" -p size=128 -wi 10 -i 10 StringOfEscapedCharsWriting.jsoniterJava'
5. Run the benchmarks from steps 3 and 4 on GraalVM CE by changing the value of the `-java-home` option.
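Steps 3 to 5 amount to a 2x2 matrix of runs (two GraalVM editions, two benchmarks). The following is a minimal Python sketch that only prints the four command lines. The EE path is taken from the steps above; the CE path and the helper names `sbt_command` and `all_commands` are assumptions made for illustration and should be adapted to the local install.

```python
import shlex

# The EE path comes from the steps above. The CE path is a hypothetical
# placeholder: point it at wherever GraalVM CE is installed locally.
JAVA_HOMES = {
    "EE": "/usr/lib/jvm/graalvm-ee-java8",
    "CE": "/usr/lib/jvm/graalvm-ce-java8",  # assumed location
}

BENCHMARKS = [
    "ArrayOfEnumADTsReading.jsoniterScala",
    "StringOfEscapedCharsWriting.jsoniterJava",
]


def sbt_command(java_home, benchmark):
    """Return the argv list for one benchmark run on one JVM."""
    jmh_task = (
        "jsoniter-scala-benchmark/jmh:run "
        '-jvmArgsAppend "-Dgraal.UseBranchesWithin32ByteBoundary=true" '
        f"-p size=128 -wi 10 -i 10 {benchmark}"
    )
    return ["sbt", "-java-home", java_home, jmh_task]


def all_commands():
    """Return (edition, benchmark, argv) for every edition/benchmark pair."""
    return [
        (edition, benchmark, sbt_command(home, benchmark))
        for edition, home in JAVA_HOMES.items()
        for benchmark in BENCHMARKS
    ]


for edition, benchmark, argv in all_commands():
    print(f"# {edition}: {benchmark}")
    # shlex.join quotes the sbt task so it can be pasted into a shell.
    print(shlex.join(argv))
```

Printing rather than executing keeps the sketch free of side effects; each printed line can be pasted into a shell from the `jsoniter-scala` working directory.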
**Expected behavior**
Throughput scores for EE are expected to be greater than those for CE.
**Additional context**
Below are the results of both benchmarks on both GraalVM editions, collected with the perf profiler via the `-prof perfnorm` option.
## ArrayOfEnumADTsReading.jsoniterScala with GraalVM EE
[info] Benchmark (size) Mode Cnt Score Error Units
[info] ArrayOfEnumADTsReading.jsoniterScala 128 thrpt 10 122827.665 ± 17880.191 ops/s
[info] ArrayOfEnumADTsReading.jsoniterScala:CPI 128 thrpt 0.404 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-load-misses 128 thrpt 965.144 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-loads 128 thrpt 45268.994 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-stores 128 thrpt 17143.531 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-icache-load-misses 128 thrpt 597.267 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-load-misses 128 thrpt 10.637 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-loads 128 thrpt 187.110 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-store-misses 128 thrpt 10.974 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-stores 128 thrpt 30.594 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:branch-misses 128 thrpt 233.420 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:branches 128 thrpt 34185.243 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:cycles 128 thrpt 62726.759 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-load-misses 128 thrpt 6.092 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-loads 128 thrpt 45250.767 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-store-misses 128 thrpt 0.734 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-stores 128 thrpt 17196.921 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:iTLB-load-misses 128 thrpt 8.717 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:iTLB-loads 128 thrpt 26.634 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:instructions 128 thrpt 155360.063 #/op
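Result rows like the ones above follow a fixed column order (benchmark, size, mode, optional count, score, optional error, units), so they can be turned into records for later comparison. The following is a rough Python sketch, not part of JMH or sbt; the `JmhRow` type and `parse_row` helper are hypothetical names, and the pattern assumes whitespace-separated columns exactly as pasted here.

```python
import re
from typing import NamedTuple, Optional


class JmhRow(NamedTuple):
    name: str
    size: int
    mode: str
    count: Optional[int]    # missing on secondary perfnorm rows
    score: float
    error: Optional[float]  # missing on secondary perfnorm rows
    units: str


# Columns: Benchmark (size) Mode [Cnt] Score [<sign> Error] Units.
# The sign between score and error is matched as any non-space token.
ROW = re.compile(
    r"^\[info\]\s+(\S+)\s+(\d+)\s+(\w+)\s+"
    r"(?:(\d+)\s+)?([\d.]+)\s+"
    r"(?:\S+\s+([\d.]+)\s+)?(\S+)$"
)


def parse_row(line: str) -> Optional[JmhRow]:
    """Parse one result row; return None for headers and other log lines."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    name, size, mode, count, score, error, units = match.groups()
    return JmhRow(
        name,
        int(size),
        mode,
        int(count) if count else None,
        float(score),
        float(error) if error else None,
        units,
    )


row = parse_row(
    "[info] ArrayOfEnumADTsReading.jsoniterScala:cycles 128 thrpt 62726.759 #/op"
)
print(row)
```

Header rows and unrelated log lines do not match the pattern and come back as `None`, so a whole pasted log can be filtered with a single pass.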
## ArrayOfEnumADTsReading.jsoniterScala with GraalVM CE
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark (size) Mode Cnt Score Error Units
[info] ArrayOfEnumADTsReading.jsoniterScala 128 thrpt 10 391932.863 ± 696.596 ops/s
[info] ArrayOfEnumADTsReading.jsoniterScala:CPI 128 thrpt 0.223 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-load-misses 128 thrpt 19.162 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-loads 128 thrpt 6327.540 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-stores 128 thrpt 3016.906 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-icache-load-misses 128 thrpt 0.560 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-load-misses 128 thrpt 0.276 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-loads 128 thrpt 0.629 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-store-misses 128 thrpt 1.891 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-stores 128 thrpt 4.031 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:branch-misses 128 thrpt 8.236 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:branches 128 thrpt 10309.307 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:cycles 128 thrpt 8712.587 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-load-misses 128 thrpt 0.271 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-loads 128 thrpt 6345.154 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-store-misses 128 thrpt 0.005 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-stores 128 thrpt 3036.552 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:iTLB-load-misses 128 thrpt 0.021 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:iTLB-loads 128 thrpt 0.040 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:instructions 128 thrpt 39150.551 #/op
## StringOfEscapedCharsWriting.jsoniterJava with GraalVM EE
[info] Benchmark (size) Mode Cnt Score Error Units
[info] StringOfEscapedCharsWriting.jsoniterJava 128 thrpt 10 194519.800 ± 2292.614 ops/s
[info] StringOfEscapedCharsWriting.jsoniterJava:CPI 128 thrpt 0.375 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-dcache-load-misses 128 thrpt 367.522 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-dcache-loads 128 thrpt 11625.797 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-dcache-stores 128 thrpt 9626.260 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-icache-load-misses 128 thrpt 488.652 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-load-misses 128 thrpt 0.382 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-loads 128 thrpt 4.465 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-store-misses 128 thrpt 10.801 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-stores 128 thrpt 19.697 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:branch-misses 128 thrpt 4.329 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:branches 128 thrpt 7911.434 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:cycles 128 thrpt 17585.816 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-load-misses 128 thrpt 0.384 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-loads 128 thrpt 11615.281 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-store-misses 128 thrpt 0.053 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-stores 128 thrpt 9634.336 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:iTLB-load-misses 128 thrpt 0.063 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:iTLB-loads 128 thrpt 34.478 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:instructions 128 thrpt 46901.538 #/op
## StringOfEscapedCharsWriting.jsoniterJava with GraalVM CE
[info] Benchmark (size) Mode Cnt Score Error Units
[info] StringOfEscapedCharsWriting.jsoniterJava 128 thrpt 10 1662262.711 ± 6159.823 ops/s
[info] StringOfEscapedCharsWriting.jsoniterJava:CPI 128 thrpt 0.227 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-dcache-load-misses 128 thrpt 12.818 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-dcache-loads 128 thrpt 1269.910 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-dcache-stores 128 thrpt 1694.485 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-icache-load-misses 128 thrpt 0.110 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-load-misses 128 thrpt 0.062 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-loads 128 thrpt 0.131 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-store-misses 128 thrpt 3.746 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-stores 128 thrpt 7.867 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:branch-misses 128 thrpt 2.048 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:branches 128 thrpt 1472.143 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:cycles 128 thrpt 2051.614 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-load-misses 128 thrpt 0.208 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-loads 128 thrpt 1266.724 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-store-misses 128 thrpt 0.002 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-stores 128 thrpt 1691.303 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:iTLB-load-misses 128 thrpt 0.003 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:iTLB-loads 128 thrpt 0.005 #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:instructions 128 thrpt 9055.684 #/op
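To put the four result sets side by side, the throughput and instruction counts can be reduced to EE/CE ratios. The sketch below only does arithmetic on values copied from the tables above; the `RESULTS` layout and the `ee_vs_ce` helper are illustrative names, not anything produced by JMH.

```python
# Throughput scores (ops/s) and instruction counts (#/op) copied from the
# four result tables above; nothing here is measured anew.
RESULTS = {
    "ArrayOfEnumADTsReading.jsoniterScala": {
        "EE": {"score": 122827.665, "instructions": 155360.063},
        "CE": {"score": 391932.863, "instructions": 39150.551},
    },
    "StringOfEscapedCharsWriting.jsoniterJava": {
        "EE": {"score": 194519.800, "instructions": 46901.538},
        "CE": {"score": 1662262.711, "instructions": 9055.684},
    },
}


def ee_vs_ce(benchmark):
    """Return (throughput slowdown of EE, EE/CE instructions-per-op ratio)."""
    ee = RESULTS[benchmark]["EE"]
    ce = RESULTS[benchmark]["CE"]
    slowdown = ce["score"] / ee["score"]
    instruction_ratio = ee["instructions"] / ce["instructions"]
    return slowdown, instruction_ratio


for name in RESULTS:
    slowdown, instruction_ratio = ee_vs_ce(name)
    print(
        f"{name}: EE is {slowdown:.2f}x slower and executes "
        f"{instruction_ratio:.2f}x the instructions per op"
    )
```

On these numbers the parsing benchmark is roughly 3.2x slower on EE and the serialization benchmark roughly 8.5x slower, with EE executing about 4x and 5x as many instructions per operation respectively.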
These all sound like array-processing-heavy benchmarks. @plokhotnyuk, would it be possible for you to check whether any of the following flags affect the results:
- `-Dgraal.Vectorization=false`
- `-Dgraal.OptDuplication=false`

`-Dgraal.Vectorization=false` helps in the 1st case (parsing), making the GraalVM EE results better than those from GraalVM CE:
[info] Benchmark (size) Mode Cnt Score Error Units
[info] ArrayOfEnumADTsReading.jsoniterScala 128 thrpt 10 459862.241 ± 3209.798 ops/s
[info] ArrayOfEnumADTsReading.jsoniterScala:CPI 128 thrpt 0.228 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-load-misses 128 thrpt 18.001 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-loads 128 thrpt 5936.236 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-stores 128 thrpt 2778.169 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-icache-load-misses 128 thrpt 0.555 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-load-misses 128 thrpt 0.072 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-loads 128 thrpt 0.364 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-store-misses 128 thrpt 1.917 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-stores 128 thrpt 4.181 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:branch-misses 128 thrpt 8.400 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:branches 128 thrpt 7588.057 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:cycles 128 thrpt 7420.989 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-load-misses 128 thrpt 0.278 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-loads 128 thrpt 5942.073 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-store-misses 128 thrpt 0.003 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-stores 128 thrpt 2777.664 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:iTLB-load-misses 128 thrpt 0.004 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:iTLB-loads 128 thrpt 0.015 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:instructions 128 thrpt 32498.391 #/op
Adding `-Dgraal.OptDuplication=false`, alone or together with `-Dgraal.Vectorization=false`, doesn't change the results noticeably.
OK, thanks for checking. We will start with the vectorization issue.
Thank you for the reproducer, @plokhotnyuk. I can reproduce the issue and am working on fixing it.
Thank you again for this report and the very valuable benchmarks. The fix is making its way through our system and will be included in the 20.1.0 release. For reference, here are performance numbers from my system with the fix applied:
$ sbt -java-home /home/gergo/work/graalvm-ce 'jsoniter-scala-benchmark/jmh:run -jvmArgsAppend "-Dgraal.UseBranchesWithin32ByteBoundary=true" -p size=128 -wi 10 -i 10 -r 3 -f 3 ArrayOfEnumADTsReading.jsoniterScala'
[info] Benchmark (size) Mode Cnt Score Error Units
[info] ArrayOfEnumADTsReading.jsoniterScala 128 thrpt 30 446250.223 ± 4596.980 ops/s
$ sbt -java-home /home/gergo/work/graalvm-ee 'jsoniter-scala-benchmark/jmh:run -jvmArgsAppend "-Dgraal.UseBranchesWithin32ByteBoundary=true" -p size=128 -wi 10 -i 10 -r 3 -f 3 ArrayOfEnumADTsReading.jsoniterScala'
[info] Benchmark (size) Mode Cnt Score Error Units
[info] ArrayOfEnumADTsReading.jsoniterScala 128 thrpt 30 580531.893 ± 7324.860 ops/s
EE is now 30% faster than CE
$ sbt -java-home /home/gergo/work/graalvm-ce 'jsoniter-scala-benchmark/jmh:run -jvmArgsAppend "-Dgraal.UseBranchesWithin32ByteBoundary=true" -p size=128 -wi 10 -i 10 -r 3 -f 3 StringOfEscapedCharsWriting.jsoniterJava'
[info] Benchmark (size) Mode Cnt Score Error Units
[info] StringOfEscapedCharsWriting.jsoniterJava 128 thrpt 30 1914490.120 ± 29781.498 ops/s
$ sbt -java-home /home/gergo/work/graalvm-ee 'jsoniter-scala-benchmark/jmh:run -jvmArgsAppend "-Dgraal.UseBranchesWithin32ByteBoundary=true" -p size=128 -wi 10 -i 10 -r 3 -f 3 StringOfEscapedCharsWriting.jsoniterJava'
[info] Benchmark (size) Mode Cnt Score Error Units
[info] StringOfEscapedCharsWriting.jsoniterJava 128 thrpt 30 2278332.519 ± 21766.991 ops/s
EE is now 19% faster than CE
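The two percentages follow directly from the scores above, and the error columns give a quick sanity check on whether the gaps exceed run-to-run noise. The Python sketch below reproduces that arithmetic; it assumes the Error column is the half-width of JMH's default confidence interval around the score, and the helper names are made up for illustration.

```python
# (score, error) pairs in ops/s, copied from the post-fix runs above.
RUNS = {
    "ArrayOfEnumADTsReading.jsoniterScala": {
        "CE": (446250.223, 4596.980),
        "EE": (580531.893, 7324.860),
    },
    "StringOfEscapedCharsWriting.jsoniterJava": {
        "CE": (1914490.120, 29781.498),
        "EE": (2278332.519, 21766.991),
    },
}


def speedup_percent(ee_score, ce_score):
    """Relative throughput gain of EE over CE, in percent."""
    return (ee_score / ce_score - 1.0) * 100.0


def intervals_overlap(score_a, err_a, score_b, err_b):
    """True if [score - err, score + err] of the two runs intersect."""
    return score_a - err_a <= score_b + err_b and score_b - err_b <= score_a + err_a


for name, run in RUNS.items():
    (ce_score, ce_err), (ee_score, ee_err) = run["CE"], run["EE"]
    gain = speedup_percent(ee_score, ce_score)
    overlap = intervals_overlap(ee_score, ee_err, ce_score, ce_err)
    print(f"{name}: EE is {gain:.1f}% faster, intervals overlap: {overlap}")
```

For both benchmarks the EE and CE intervals are disjoint, so under the stated assumption the reported 30% and 19% gains are well outside the measurement error.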
I will close this issue now; please reopen it if you test these benchmarks again on the 20.1.0 release and observe further problems.