Describe the issue
Windows console output is garbled when the program is built with native-image, but looks fine when it runs on the JVM.
Steps to reproduce the issue
public class Main {
    public static void main(String[] args) {
        System.out.println("UTF-8 text with some international chars:");
        System.out.println("äëïöü áéíóú àèiòù Ññ Çç €");
    }
}
C:\tmp>java -version
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment GraalVM CE 20.1.0 (build 11.0.7+10-jvmci-20.1-b02)
OpenJDK 64-Bit Server VM GraalVM CE 20.1.0 (build 11.0.7+10-jvmci-20.1-b02, mixed mode, sharing)
C:\tmp>javac -encoding utf8 Main.java
C:\tmp>java Main
UTF-8 text with some international chars:
äëïöü áéíóú àèiòù Ññ Çç ?
C:\tmp>native-image Main
[main:7016] classlist: 1,402.34 ms, 0.96 GB
[main:7016] (cap): 2,629.87 ms, 0.96 GB
[main:7016] setup: 4,151.39 ms, 0.96 GB
[main:7016] (clinit): 131.62 ms, 1.21 GB
[main:7016] (typeflow): 4,459.82 ms, 1.21 GB
[main:7016] (objects): 3,230.77 ms, 1.21 GB
[main:7016] (features): 237.91 ms, 1.21 GB
[main:7016] analysis: 8,221.80 ms, 1.21 GB
[main:7016] universe: 261.23 ms, 1.21 GB
[main:7016] (parse): 940.52 ms, 1.21 GB
[main:7016] (inline): 902.42 ms, 1.66 GB
[main:7016] (compile): 5,910.52 ms, 2.27 GB
[main:7016] compile: 8,136.16 ms, 2.27 GB
[main:7016] image: 821.51 ms, 2.27 GB
[main:7016] write: 319.91 ms, 2.27 GB
[main:7016] [total]: 23,531.82 ms, 2.27 GB
C:\tmp>main.exe
UTF-8 text with some international chars:
õÙ´÷³ ßÚݾ· ÓÞi‗¨ б Ãþ Ç
Describe GraalVM and your environment:
chcp command)
More details
-H:+AddAllCharsets option.
This appears to be caused by building the native image in a non-UTF-8 JVM; setting the encoding explicitly on the build-time JVM solves the problem:
-J-Dfile.encoding=UTF-8
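To illustrate how a charset mismatch produces garbage of this kind, here is a minimal sketch that encodes text with one charset and decodes the bytes with another. The class name MojibakeDemo and the helper recode are hypothetical. The charset pair (windows-1252 for the writer, IBM850 for the console) is an assumption that is not stated anywhere in this thread; it is used because it happens to turn "äëïöü" into the "õÙ´÷³" seen in the report. The sketch also assumes the JDK ships both charsets.

```java
import java.nio.charset.Charset;

class MojibakeDemo {
    // Encode the text with the charset the program writes in, then decode the
    // same bytes with the charset the console assumes. Both charset names are
    // assumptions chosen for illustration.
    static String recode(String text, String writerCharset, String consoleCharset) {
        byte[] bytes = text.getBytes(Charset.forName(writerCharset));
        return new String(bytes, Charset.forName(consoleCharset));
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of the compiler's
        // -encoding setting; the escapes spell the first word of the test string.
        String original = "\u00e4\u00eb\u00ef\u00f6\u00fc";
        System.out.println(recode(original, "windows-1252", "IBM850"));
    }
}
```

If the result matches the garbled line from the report, that suggests the native executable writes bytes in one single-byte code page while the console interprets them in another.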
From memory, the JVM default charset on Windows is _not_ UTF-8.
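One way to check this on a given machine is to print what the JVM reports. This is a minimal sketch, and the class name DefaultCharsetCheck and the method report are hypothetical. It only shows the values seen by the JVM that runs it, which is not necessarily what ends up in a native image.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Collect the default charset and the file.encoding system property
    // as seen by the JVM that runs this class.
    static String report() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it once as java DefaultCharsetCheck and once as java -Dfile.encoding=UTF-8 DefaultCharsetCheck should show whether the explicit setting changes the default on that JDK.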
@jaikiran @galderz @dmlloyd should we make it the default? Or make it more visible/easier to configure?
There has been a conscious effort in the JDK team itself to move towards UTF-8 as the default charset to avoid issues like these. They are being tracked as part of the (currently draft status) JEP https://openjdk.java.net/jeps/8187041 (and corresponding JBS issue https://bugs.openjdk.java.net/browse/JDK-8187041).
So yes, (in Quarkus) I think we should default this to a consistent UTF-8 value. I don't think (in Quarkus) we should provide an "easy" config to override this. If, for whatever reason, someone wants to override it, they can already do so using the quarkus.native.additional.build.args property. The reason I say we should avoid an additional, specific config is that, as per the linked JEP, the file.encoding system property is unsupported and could be removed whenever this JEP gets implemented. So, IMO, there is no point exposing that as a config when quarkus.native.additional.build.args can achieve the same.
Thinking a bit more, I guess we can't avoid a new config, _if_ we want users to be able to override this, since without a config they can't "remove" the -J-Dfile.encoding=UTF-8 we would be adding by default during the native image build. Perhaps, for now set it by default to UTF-8 and then wait and watch on adding the new config till someone says they want to override this value?
I agree; set the default to UTF-8 and wait for a complaint.
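As a rough sketch of one way the default could coexist with user-supplied arguments, the build step could add the UTF-8 flag only when the additional build args do not already set file.encoding. This is not Quarkus's actual implementation: the class NativeImageArgs and the method buildArgs are hypothetical, and treating a user-supplied -J-Dfile.encoding=... as an override is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class NativeImageArgs {
    static final String ENCODING_PREFIX = "-J-Dfile.encoding=";

    // Hypothetical helper: prepend the UTF-8 default unless the user's
    // additional build args already contain a file.encoding setting.
    static List<String> buildArgs(List<String> additionalBuildArgs) {
        List<String> args = new ArrayList<>();
        boolean userSetEncoding = additionalBuildArgs.stream()
                .anyMatch(arg -> arg.startsWith(ENCODING_PREFIX));
        if (!userSetEncoding) {
            args.add(ENCODING_PREFIX + "UTF-8");
        }
        args.addAll(additionalBuildArgs);
        return args;
    }

    public static void main(String[] args) {
        List<String> userArgs = new ArrayList<>();
        userArgs.add("-H:+AddAllCharsets");
        System.out.println(buildArgs(userArgs));
    }
}
```

With this approach no new config property is needed for the override case, at the cost of inspecting the user's raw arguments.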
The root cause should be the same as in #2398 and has been fixed.