Hi guys,
I did a small main (comparable to just calling a CLI help command, no real logic) and the execution is always slower with graalvm than with oracle. Whereas with the rc14 it was slower but close, the rc15 is really noticeably slower. Here a capture of the execution time to illustrate that statement - using time command on the exact same command line (java -cp "li/*.jar" my.Main):
Oracle 1.8u191
real 0m0,127s
user 0m0,145s
sys 0m0,028s
Graal 14
real 0m0,119s
user 0m0,184s
sys 0m0,049s
Graal 15
real 0m0,477s
user 0m0,271s
sys 0m0,027s
Since Graal rc15 uses libjvm by default I expected it to be the opposite, did I miss anything or is this something to address?
Romain
@rmannibucau please try with -XX:-UseJVMCINativeLibrary option for RC15
Hi @plokhotnyuk,
thanks for the fast answer, here is what I get - does not look as fast as expected to me:
Oracle 1.8u191
real 0m0,175s
user 0m0,176s
sys 0m0,042s
Graal 14
real 0m0,228s
user 0m0,391s
sys 0m0,025s
Graal 15
real 0m0,262s
user 0m0,410s
sys 0m0,048s
Oracle 1.8u191
real 0m0,126s
user 0m0,161s
sys 0m0,019s
Graal 14
real 0m0,234s
user 0m0,377s
sys 0m0,035s
Graal 15
real 0m0,130s
user 0m0,241s
sys 0m0,020s
Oracle 1.8u191
real 0m0,137s
user 0m0,168s
sys 0m0,024s
Graal 14
real 0m0,217s
user 0m0,332s
sys 0m0,048s
Graal 15
real 0m0,218s
user 0m0,364s
sys 0m0,017s
Oracle 1.8u191
real 0m0,127s
user 0m0,149s
sys 0m0,029s
Graal 14
real 0m0,256s
user 0m0,403s
sys 0m0,038s
Graal 15
real 0m0,146s
user 0m0,248s
sys 0m0,029s
Oracle 1.8u191
real 0m0,151s
user 0m0,135s
sys 0m0,048s
Graal 14
real 0m0,249s
user 0m0,394s
sys 0m0,034s
Graal 15
real 0m0,228s
user 0m0,354s
sys 0m0,042s
Oracle 1.8u191
real 0m0,130s
user 0m0,129s
sys 0m0,056s
Graal 14
real 0m0,220s
user 0m0,352s
sys 0m0,032s
Graal 15
real 0m0,208s
user 0m0,342s
sys 0m0,026s
Oracle 1.8u191
real 0m0,121s
user 0m0,150s
sys 0m0,021s
Graal 14
real 0m0,226s
user 0m0,348s
sys 0m0,048s
Graal 15
real 0m0,125s
user 0m0,219s
sys 0m0,026s
Oracle 1.8u191
real 0m0,130s
user 0m0,165s
sys 0m0,017s
Graal 14
real 0m0,238s
user 0m0,376s
sys 0m0,029s
Graal 15
real 0m0,277s
user 0m0,423s
sys 0m0,034s
Oracle 1.8u191
real 0m0,154s
user 0m0,185s
sys 0m0,025s
Graal 14
real 0m0,313s
user 0m0,474s
sys 0m0,047s
Graal 15
real 0m0,142s
user 0m0,234s
sys 0m0,023s
Oracle 1.8u191
real 0m0,139s
user 0m0,167s
sys 0m0,029s
Graal 14
real 0m0,280s
user 0m0,425s
sys 0m0,033s
Graal 15
real 0m0,214s
user 0m0,335s
sys 0m0,043s
Here is the script i'm using:
#! /bin/bash
for i in {1..10}; do
echo "Oracle 1.8u191"
time ~/softwares/jdk1.8.0_191/bin/java -cp "dependency/*" com.github.rmannibucau.docker.compose.cli.Launcher &> /dev/null
echo
echo
echo "Graal 14"
time ~/softwares/graalvm-ce-1.0.0-rc14/bin/java -cp "dependency/*" com.github.rmannibucau.docker.compose.cli.Launcher &> /dev/null
echo
echo
echo "Graal 15"
time ~/softwares/graalvm-ce-1.0.0-rc15/bin/java -XX:-UseJVMCINativeLibrary -cp "dependency/*" com.github.rmannibucau.docker.compose.cli.Launcher &> /dev/null
echo
echo
done
@rmannibucau have you tried OpenJDK and GraavVM EE for a full picture?
@plokhotnyuk here the same run with zulu - as openjdk representant (I had it locally):
Oracle 1.8u191
real 0m0,130s
user 0m0,156s
sys 0m0,022s
Zulu 1.8u202
real 0m0,126s
user 0m0,169s
sys 0m0,021s
Graal 14
real 0m0,113s
user 0m0,206s
sys 0m0,019s
Graal 15
real 0m0,251s
user 0m0,405s
sys 0m0,037s
Oracle 1.8u191
real 0m0,130s
user 0m0,163s
sys 0m0,026s
Zulu 1.8u202
real 0m0,135s
user 0m0,191s
sys 0m0,019s
Graal 14
real 0m0,127s
user 0m0,223s
sys 0m0,024s
Graal 15
real 0m0,114s
user 0m0,197s
sys 0m0,028s
I didn't try with graalvm EE - don't think it is publicly and freely available, is it?
@rmannibucau AFAIK it is not free for usage in _production_ only
@plokhotnyuk here are the results - very close to CE:
Oracle 1.8u191
real 0m0,202s
user 0m0,218s
sys 0m0,026s
Zulu 1.8u202
real 0m0,155s
user 0m0,188s
sys 0m0,046s
Graal 14
real 0m0,140s
user 0m0,261s
sys 0m0,016s
Graal CE 15
real 0m0,292s
user 0m0,458s
sys 0m0,042s
Graal EE 15
real 0m0,165s
user 0m0,271s
sys 0m0,036s
Oracle 1.8u191
real 0m0,152s
user 0m0,185s
sys 0m0,023s
Zulu 1.8u202
real 0m0,150s
user 0m0,207s
sys 0m0,017s
Graal 14
real 0m0,178s
user 0m0,321s
sys 0m0,033s
Graal CE 15
real 0m0,159s
user 0m0,285s
sys 0m0,028s
Graal EE 15
real 0m0,155s
user 0m0,259s
sys 0m0,035s
Oracle 1.8u191
real 0m0,162s
user 0m0,200s
sys 0m0,027s
Zulu 1.8u202
real 0m0,178s
user 0m0,212s
sys 0m0,028s
Graal 14
real 0m0,274s
user 0m0,334s
sys 0m0,034s
Graal CE 15
real 0m0,158s
user 0m0,288s
sys 0m0,012s
Graal EE 15
real 0m0,216s
user 0m0,320s
sys 0m0,041s
Oracle 1.8u191
real 0m0,140s
user 0m0,153s
sys 0m0,045s
Zulu 1.8u202
real 0m0,167s
user 0m0,204s
sys 0m0,030s
Graal 14
real 0m0,141s
user 0m0,227s
sys 0m0,036s
Graal CE 15
real 0m0,270s
user 0m0,423s
sys 0m0,036s
Graal EE 15
real 0m0,337s
user 0m0,542s
sys 0m0,037s
Oracle 1.8u191
real 0m0,133s
user 0m0,179s
sys 0m0,008s
Zulu 1.8u202
real 0m0,155s
user 0m0,205s
sys 0m0,024s
Graal 14
real 0m0,343s
user 0m0,537s
sys 0m0,055s
Graal CE 15
real 0m0,276s
user 0m0,430s
sys 0m0,035s
Graal EE 15
real 0m0,172s
user 0m0,261s
sys 0m0,043s
Oracle 1.8u191
real 0m0,148s
user 0m0,196s
sys 0m0,012s
Zulu 1.8u202
real 0m0,142s
user 0m0,178s
sys 0m0,032s
Graal 14
real 0m0,134s
user 0m0,252s
sys 0m0,016s
Graal CE 15
real 0m0,253s
user 0m0,424s
sys 0m0,026s
Graal EE 15
real 0m0,307s
user 0m0,489s
sys 0m0,037s
Oracle 1.8u191
real 0m0,150s
user 0m0,192s
sys 0m0,018s
Zulu 1.8u202
real 0m0,164s
user 0m0,210s
sys 0m0,025s
Graal 14
real 0m0,347s
user 0m0,526s
sys 0m0,056s
Graal CE 15
real 0m0,153s
user 0m0,262s
sys 0m0,023s
Graal EE 15
real 0m0,329s
user 0m0,535s
sys 0m0,025s
Oracle 1.8u191
real 0m0,139s
user 0m0,173s
sys 0m0,025s
Zulu 1.8u202
real 0m0,148s
user 0m0,194s
sys 0m0,022s
Graal 14
real 0m0,264s
user 0m0,429s
sys 0m0,028s
Graal CE 15
real 0m0,131s
user 0m0,221s
sys 0m0,038s
Graal EE 15
real 0m0,334s
user 0m0,507s
sys 0m0,063s
Oracle 1.8u191
real 0m0,134s
user 0m0,170s
sys 0m0,023s
Zulu 1.8u202
real 0m0,177s
user 0m0,223s
sys 0m0,027s
Graal 14
real 0m0,427s
user 0m0,594s
sys 0m0,056s
Graal CE 15
real 0m0,323s
user 0m0,478s
sys 0m0,041s
Graal EE 15
real 0m0,153s
user 0m0,269s
sys 0m0,017s
Oracle 1.8u191
real 0m0,171s
user 0m0,224s
sys 0m0,016s
Zulu 1.8u202
real 0m0,173s
user 0m0,250s
sys 0m0,005s
Graal 14
real 0m0,139s
user 0m0,245s
sys 0m0,028s
Graal CE 15
real 0m0,138s
user 0m0,246s
sys 0m0,031s
Graal EE 15
real 0m0,320s
user 0m0,511s
sys 0m0,060s
Used command:
time ~/softwares/graalvm-ee-1.0.0-rc15/bin/java -XX:-UseJVMCINativeLibrary -cp "dependency/*" com.github.rmannibucau.docker.compose.cli.Launcher &> /dev/null
@rmannibucau do you have a possibility to create a native image for your tool?
That can be a game changer, especially with the profile guided optimization...
@rmannibucau do you have a possibility to create a native image for your tool?
That can be a game changer, especially with the profile guided optimization...
Will try this and respond asap!
@plokhotnyuk here are the results - very close to CE:
Used command:
time ~/softwares/graalvm-ee-1.0.0-rc15/bin/java -XX:-UseJVMCINativeLibrary -cp "dependency/*" com.github.rmannibucau.docker.compose.cli.Launcher &> /dev/null
Great work with comparing various JDKs with GraalVM - do you have an automated way to do it? I mean scripts and the likes? Would love to try them out for my own apps.
@neomatrix369 I used https://github.com/rmannibucau/docosh/blob/master/graalcomparison.sh (read the small warning at the beginning, tried to not put .jar in the git repo)
@plokhotnyuk i was generating a native image, it works but it is super slow and loose portability of java which is still a nice feature. My goal with graal rc15 was to see if libjvm was a game changer which could enable to not require to native-image applications.
What is the OS/machine you are running on? Generally, the new libgraal setup should be significantly faster in startup in the first few seconds. But it could be that you are dominated by loading the binary. Can you share the program so we can test on different machines?
Generally, for CLI apps, the native image configuration is indeed the recommended one. It gives you an order of magnitude better startup/footprint.
My OS is an ubuntu:
Linux RMANNIBUCAU 4.18.0-16-generic #17-Ubuntu SMP Fri Feb 8 00:06:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
The app is available in master of previous link.
@plokhotnyuk do you think pgo was not used in libjvm generation and could optimize it more? Goal is really to reduce the need to go native making the jvm itself pre-optimized. Can also imply to hack the default classloaders to use pre-indexed classes or so maybe?
@rmannibucau pgo wasn't used for this RC
BTW, have you tried to profile your app with async-profiler? It can build flame graphs with Java/Native/Hotspot/GC stacktraces like here.
We have identified the issue. It is related to "shutdown hooks" delaying an exit of the VM process when a compilation is currently running. We are working on a fix that should be ready for the next release candidate. Thanks for the report!
@rmannibucau yet another option to speed up the native image is using CPU features option, like:
-H:CPUFeatures=SSE4_1,POPCNT,AVX,AVX2,AES,CLMUL,MMX,SSE3,SSE4_2,TSC,CMOV
AFAIK There are different lists for different platforms. For AMD64 they are listed here
@thomaswue BTW, which options were used for building libgraal for RC15?
Fix shipped in RC16.
Hi @thomaswue ,
rc16 is significatively better but still slower than oraclejdk or openjdk:
Oracle 1.8u191
real 0m0,141s
user 0m0,154s
sys 0m0,038s
Zulu 1.8u202
real 0m0,116s
user 0m0,163s
sys 0m0,013s
Graal 14
real 0m0,198s
user 0m0,314s
sys 0m0,034s
Graal CE 15
real 0m0,200s
user 0m0,335s
sys 0m0,027s
Graal EE 15
real 0m0,117s
user 0m0,185s
sys 0m0,041s
Graal CE 16
real 0m0,130s
user 0m0,248s
sys 0m0,021s
The measurements you show have a high level of noise - i.e., running them multiple times will give you very different results. Can you do these measurements 100 times for each config and create some average/medium metric (potentially removing outliers)?
@thomaswue I did and pasted an iteration representing well the executions. real and sys times are always close - excluding singular points - but the user time is also always ~+80ms.
It seems like your metrics indicate that we are at least same level for "real" time:
Oracle 1.8u191 real 0m0,141s
Graal CE 16 real 0m0,130s
The "user" time could depend on how many compiler threads are started at which point in time. It is the less important metric. I will check if I can confirm higher user times on my machine.
@thomaswue yes, rc16 is definitively better than rc15. It is still a bit disappointing to not benefit from libjvm in terms of startup times but at least not being worth would be ok.
For substantially improved start-up time for your user application, we recommend creating a native image. Libgraal is only creating such AOT-compiled image for the compiler itself. This can improve start-up if your application benefits from faster compilation times; and it substantially improves start-up versus vs our previous versions where the compiler had to dynamically warm up in parallel to the application.
@thomaswue as mentionned earlier going native is not an option for the app and has a lot of pitfalls currently (not being portable anymore and requiring to cross compile vs just relying on prebuilt JVM, not supporting classpath extensibility, not supporting several libraries which rely on classpath etc...). So my goal is really to get the best of both world having an optimized boot for the JVM classes and a "normal" mode for user code.
Most helpful comment
We have identified the issue. It is related to "shutdown hooks" delaying an exit of the VM process when a compilation is currently running. We are working on a fix that should be ready for the next release candidate. Thanks for the report!