Graal: Performance issues with GraalVM EE

Created on 2 Mar 2020 · 5 Comments · Source: oracle/graal

Describe GraalVM and your environment:

  • GraalVM version or commit id if built from source: 20.0
  • CE or EE: EE
  • JDK version: JDK8
  • OS and OS Version: macOS & Linux
  • Architecture: amd64
  • The output of java -Xinternalversion:
    ```
    Java HotSpot(TM) 64-Bit Server VM GraalVM EE 20.0.0 (25.241-b07-jvmci-20.0-b02) for linux-amd64 JRE (8u241), built on Jan 20 2020 20:45:37 by "buildslave" with gcc 7.3.0

**Have you verified this issue still happens when using the latest snapshot?**
No.

**Describe the issue**
Throughput unexpectedly drops by more than 3x when running some benchmarks on GraalVM EE.

**Code snippet or code repository that reproduces the issue**

https://github.com/plokhotnyuk/jsoniter-scala


**Steps to reproduce the issue**
1. [Optional] Download and install the `sbt` tool using the following link:
https://www.scala-sbt.org/download.html
2. Clone the repo and change the working directory:

git clone --depth 1 https://github.com/plokhotnyuk/jsoniter-scala
cd jsoniter-scala

3. Run the parsing benchmark on GraalVM EE:

sbt -java-home /usr/lib/jvm/graalvm-ee-java8 'jsoniter-scala-benchmark/jmh:run -jvmArgsAppend "-Dgraal.UseBranchesWithin32ByteBoundary=true" -p size=128 -wi 10 -i 10 ArrayOfEnumADTsReading.jsoniterScala'

4. Run the serialization benchmark on GraalVM EE:

sbt -java-home /usr/lib/jvm/graalvm-ee-java8 'jsoniter-scala-benchmark/jmh:run -jvmArgsAppend "-Dgraal.UseBranchesWithin32ByteBoundary=true" -p size=128 -wi 10 -i 10 StringOfEscapedCharsWriting.jsoniterJava'

5. Run the benchmarks from steps 3 and 4 on GraalVM CE by changing the value of the `-java-home` option.

**Expected behavior**
Throughput scores for EE are expected to be higher than for CE.

**Additional context**
Below are the results of both benchmarks on GraalVM EE and GraalVM CE, collected with the perf profiler via the `-prof perfnorm` option.

## ArrayOfEnumADTsReading.jsoniterScala with GraalVM EE

[info] Benchmark                                                   (size)   Mode  Cnt       Score       Error  Units
[info] ArrayOfEnumADTsReading.jsoniterScala                           128  thrpt   10  122827.665 ± 17880.191  ops/s
[info] ArrayOfEnumADTsReading.jsoniterScala:CPI                       128  thrpt            0.404               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-load-misses     128  thrpt          965.144               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-loads           128  thrpt        45268.994               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-stores          128  thrpt        17143.531               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-icache-load-misses     128  thrpt          597.267               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-load-misses           128  thrpt           10.637               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-loads                 128  thrpt          187.110               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-store-misses          128  thrpt           10.974               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-stores                128  thrpt           30.594               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:branch-misses             128  thrpt          233.420               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:branches                  128  thrpt        34185.243               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:cycles                    128  thrpt        62726.759               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-load-misses          128  thrpt            6.092               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-loads                128  thrpt        45250.767               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-store-misses         128  thrpt            0.734               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-stores               128  thrpt        17196.921               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:iTLB-load-misses          128  thrpt            8.717               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:iTLB-loads                128  thrpt           26.634               #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:instructions              128  thrpt       155360.063               #/op


## ArrayOfEnumADTsReading.jsoniterScala with GraalVM CE

[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                                                   (size)   Mode  Cnt       Score      Error  Units
[info] ArrayOfEnumADTsReading.jsoniterScala                           128  thrpt   10  391932.863 ±  696.596  ops/s
[info] ArrayOfEnumADTsReading.jsoniterScala:CPI                       128  thrpt            0.223              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-load-misses     128  thrpt           19.162              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-loads           128  thrpt         6327.540              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-stores          128  thrpt         3016.906              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-icache-load-misses     128  thrpt            0.560              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-load-misses           128  thrpt            0.276              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-loads                 128  thrpt            0.629              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-store-misses          128  thrpt            1.891              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-stores                128  thrpt            4.031              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:branch-misses             128  thrpt            8.236              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:branches                  128  thrpt        10309.307              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:cycles                    128  thrpt         8712.587              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-load-misses          128  thrpt            0.271              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-loads                128  thrpt         6345.154              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-store-misses         128  thrpt            0.005              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-stores               128  thrpt         3036.552              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:iTLB-load-misses          128  thrpt            0.021              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:iTLB-loads                128  thrpt            0.040              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:instructions              128  thrpt        39150.551              #/op


## StringOfEscapedCharsWriting.jsoniterJava with GraalVM EE

[info] Benchmark                                                       (size)   Mode  Cnt       Score      Error  Units
[info] StringOfEscapedCharsWriting.jsoniterJava                           128  thrpt   10  194519.800 ± 2292.614  ops/s
[info] StringOfEscapedCharsWriting.jsoniterJava:CPI                       128  thrpt            0.375              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-dcache-load-misses     128  thrpt          367.522              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-dcache-loads           128  thrpt        11625.797              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-dcache-stores          128  thrpt         9626.260              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-icache-load-misses     128  thrpt          488.652              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-load-misses           128  thrpt            0.382              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-loads                 128  thrpt            4.465              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-store-misses          128  thrpt           10.801              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-stores                128  thrpt           19.697              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:branch-misses             128  thrpt            4.329              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:branches                  128  thrpt         7911.434              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:cycles                    128  thrpt        17585.816              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-load-misses          128  thrpt            0.384              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-loads                128  thrpt        11615.281              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-store-misses         128  thrpt            0.053              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-stores               128  thrpt         9634.336              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:iTLB-load-misses          128  thrpt            0.063              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:iTLB-loads                128  thrpt           34.478              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:instructions              128  thrpt        46901.538              #/op


## StringOfEscapedCharsWriting.jsoniterJava with GraalVM CE

[info] Benchmark                                                       (size)   Mode  Cnt        Score      Error  Units
[info] StringOfEscapedCharsWriting.jsoniterJava                           128  thrpt   10  1662262.711 ± 6159.823  ops/s
[info] StringOfEscapedCharsWriting.jsoniterJava:CPI                       128  thrpt             0.227              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-dcache-load-misses     128  thrpt            12.818              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-dcache-loads           128  thrpt          1269.910              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-dcache-stores          128  thrpt          1694.485              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:L1-icache-load-misses     128  thrpt             0.110              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-load-misses           128  thrpt             0.062              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-loads                 128  thrpt             0.131              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-store-misses          128  thrpt             3.746              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:LLC-stores                128  thrpt             7.867              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:branch-misses             128  thrpt             2.048              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:branches                  128  thrpt          1472.143              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:cycles                    128  thrpt          2051.614              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-load-misses          128  thrpt             0.208              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-loads                128  thrpt          1266.724              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-store-misses         128  thrpt             0.002              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:dTLB-stores               128  thrpt          1691.303              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:iTLB-load-misses          128  thrpt             0.003              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:iTLB-loads                128  thrpt             0.005              #/op
[info] StringOfEscapedCharsWriting.jsoniterJava:instructions              128  thrpt          9055.684              #/op
```
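To see which hardware counters diverge most between the two editions, the `perfnorm` lines can be compared counter by counter. The following is a minimal Python sketch, not part of the original report: the helper name `parse_counters` and the regular expression are illustrative assumptions, and only three counters from the `ArrayOfEnumADTsReading.jsoniterScala` runs above are pasted in as sample input.

```python
import re

# Matches one perfnorm counter line, for example:
# "[info] Bench.name:cycles 128 thrpt 62726.759 #/op"
# The primary score line has no ":counter" suffix, so it is skipped.
LINE = re.compile(r"^\[info\]\s+(\S+?):(\S+)\s+\d+\s+thrpt\s+([\d.]+)\s+#/op$")


def parse_counters(text):
    """Return {counter_name: events_per_op} for each perfnorm line in text."""
    counters = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            counters[match.group(2)] = float(match.group(3))
    return counters


# Sample lines copied from the EE and CE results in this report.
ee = parse_counters("""
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-icache-load-misses 128 thrpt 597.267 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:cycles 128 thrpt 62726.759 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:instructions 128 thrpt 155360.063 #/op
""")
ce = parse_counters("""
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-icache-load-misses 128 thrpt 0.560 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:cycles 128 thrpt 8712.587 #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:instructions 128 thrpt 39150.551 #/op
""")

# Ratio of EE to CE events per operation, largest divergence first.
ratios = {name: ee[name] / ce[name] for name in ee if name in ce}
for name, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{name:24s} EE/CE = {ratio:8.1f}x")
```

Pasting the complete tables into the two strings would rank all counters the same way; a large ratio only points at where to look and does not by itself identify the cause.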

bug compiler

All 5 comments

These all sound like array-processing-heavy benchmarks. @plokhotnyuk, would it be possible for you to check whether any of the following flags affect the results:

  • -Dgraal.Vectorization=false
  • -Dgraal.OptDuplication=false
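
One way to try the flags systematically is to generate one command line per combination. The sketch below is a hypothetical Python helper, not something from the repository: the names `sbt_command` and `flag_combinations` are illustrative, the command template is the one from the reproduction steps, and it assumes that several JVM options can be passed space-separated inside a single quoted `-jvmArgsAppend` value.

```python
import itertools

# Taken from the reproduction steps in this issue.
EE_HOME = "/usr/lib/jvm/graalvm-ee-java8"
BENCH = "ArrayOfEnumADTsReading.jsoniterScala"
BASE_ARG = "-Dgraal.UseBranchesWithin32ByteBoundary=true"
# The two flags suggested for testing.
CANDIDATE_FLAGS = ["-Dgraal.Vectorization=false", "-Dgraal.OptDuplication=false"]


def sbt_command(java_home, benchmark, extra_jvm_args=()):
    """Build the sbt/JMH command line with extra JVM options appended."""
    # Assumption: all options go into one space-separated -jvmArgsAppend string.
    jvm_args = " ".join([BASE_ARG, *extra_jvm_args])
    jmh = (
        f'jsoniter-scala-benchmark/jmh:run -jvmArgsAppend "{jvm_args}" '
        f"-p size=128 -wi 10 -i 10 {benchmark}"
    )
    return f"sbt -java-home {java_home} '{jmh}'"


def flag_combinations(flags):
    """Yield every subset of flags: none, each one alone, then all together."""
    for size in range(len(flags) + 1):
        yield from itertools.combinations(flags, size)


for combo in flag_combinations(CANDIDATE_FLAGS):
    print(sbt_command(EE_HOME, BENCH, combo))
```

The first printed command is the unmodified baseline, which gives a reference score from the same session for the flagged runs.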

`-Dgraal.Vectorization=false` helps in the first case (parsing): with it, GraalVM EE scores higher than GraalVM CE:

[info] Benchmark                                                   (size)   Mode  Cnt       Score      Error  Units
[info] ArrayOfEnumADTsReading.jsoniterScala                           128  thrpt   10  459862.241 ± 3209.798  ops/s
[info] ArrayOfEnumADTsReading.jsoniterScala:CPI                       128  thrpt            0.228              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-load-misses     128  thrpt           18.001              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-loads           128  thrpt         5936.236              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-dcache-stores          128  thrpt         2778.169              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:L1-icache-load-misses     128  thrpt            0.555              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-load-misses           128  thrpt            0.072              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-loads                 128  thrpt            0.364              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-store-misses          128  thrpt            1.917              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:LLC-stores                128  thrpt            4.181              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:branch-misses             128  thrpt            8.400              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:branches                  128  thrpt         7588.057              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:cycles                    128  thrpt         7420.989              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-load-misses          128  thrpt            0.278              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-loads                128  thrpt         5942.073              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-store-misses         128  thrpt            0.003              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:dTLB-stores               128  thrpt         2777.664              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:iTLB-load-misses          128  thrpt            0.004              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:iTLB-loads                128  thrpt            0.015              #/op
[info] ArrayOfEnumADTsReading.jsoniterScala:instructions              128  thrpt        32498.391              #/op

Adding `-Dgraal.OptDuplication=false`, alone or together with `-Dgraal.Vectorization=false`, doesn't change the results noticeably.

Ok, thanks for checking. We will start with the vectorization issue first.

Thank you for the reproducer, @plokhotnyuk. I can reproduce the issue and am working on fixing it.

Thank you again for this report and the very valuable benchmarks. The fix is making its way through our system and will be included in the 20.1.0 release. For reference, here are performance numbers from my system with the fix applied:

$ sbt -java-home /home/gergo/work/graalvm-ce 'jsoniter-scala-benchmark/jmh:run -jvmArgsAppend "-Dgraal.UseBranchesWithin32ByteBoundary=true" -p size=128 -wi 10 -i 10 -r 3 -f 3 ArrayOfEnumADTsReading.jsoniterScala'
[info] Benchmark                             (size)   Mode  Cnt       Score      Error  Units
[info] ArrayOfEnumADTsReading.jsoniterScala     128  thrpt   30  446250.223 ± 4596.980  ops/s

$ sbt -java-home /home/gergo/work/graalvm-ee 'jsoniter-scala-benchmark/jmh:run -jvmArgsAppend "-Dgraal.UseBranchesWithin32ByteBoundary=true" -p size=128 -wi 10 -i 10 -r 3 -f 3 ArrayOfEnumADTsReading.jsoniterScala'
[info] Benchmark                             (size)   Mode  Cnt       Score      Error  Units
[info] ArrayOfEnumADTsReading.jsoniterScala     128  thrpt   30  580531.893 ± 7324.860  ops/s

EE is now 30% faster than CE

$ sbt -java-home /home/gergo/work/graalvm-ce 'jsoniter-scala-benchmark/jmh:run -jvmArgsAppend "-Dgraal.UseBranchesWithin32ByteBoundary=true" -p size=128 -wi 10 -i 10 -r 3 -f 3 StringOfEscapedCharsWriting.jsoniterJava'
[info] Benchmark                                 (size)   Mode  Cnt        Score       Error  Units
[info] StringOfEscapedCharsWriting.jsoniterJava     128  thrpt   30  1914490.120 ± 29781.498  ops/s

$ sbt -java-home /home/gergo/work/graalvm-ee 'jsoniter-scala-benchmark/jmh:run -jvmArgsAppend "-Dgraal.UseBranchesWithin32ByteBoundary=true" -p size=128 -wi 10 -i 10 -r 3 -f 3 StringOfEscapedCharsWriting.jsoniterJava'
[info] Benchmark                                 (size)   Mode  Cnt        Score       Error  Units
[info] StringOfEscapedCharsWriting.jsoniterJava     128  thrpt   30  2278332.519 ± 21766.991  ops/s

EE is now 19% faster than CE
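
The percentages quoted in this comment follow directly from the throughput scores. As a minimal sketch, the Python snippet below recomputes them and also checks whether the reported error intervals overlap; the function names are illustrative, and treating the JMH `Error` column as a symmetric interval around the score is an assumption made here for a rough sanity check only.

```python
def speedup_percent(baseline_ops, candidate_ops):
    """Relative throughput gain of candidate over baseline, in percent."""
    return (candidate_ops / baseline_ops - 1.0) * 100.0


def intervals_overlap(score_a, err_a, score_b, err_b):
    """True if [score - err, score + err] of the two runs intersect."""
    return score_a - err_a <= score_b + err_b and score_b - err_b <= score_a + err_a


# (score, error) pairs in ops/s, copied from the results in this comment:
# first the CE run, then the EE run.
results = {
    "ArrayOfEnumADTsReading.jsoniterScala": (
        (446250.223, 4596.980),
        (580531.893, 7324.860),
    ),
    "StringOfEscapedCharsWriting.jsoniterJava": (
        (1914490.120, 29781.498),
        (2278332.519, 21766.991),
    ),
}

for name, ((ce, ce_err), (ee, ee_err)) in results.items():
    gain = speedup_percent(ce, ee)
    overlap = intervals_overlap(ce, ce_err, ee, ee_err)
    print(f"{name}: EE vs CE {gain:+.1f}%, intervals overlap: {overlap}")
```
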

I will close this issue now; please reopen it if you test these benchmarks again on the 20.1.0 release and observe further problems.
