Graal: native-image on Windows shows garbage on the console when printing UTF-8 text

Created on 22 May 2020  ·  6 Comments  ·  Source: oracle/graal

Describe the issue
Windows console output is garbled when the program runs as a native image, but looks fine when it runs on the JVM.

Steps to reproduce the issue

  1. Create a simple Java class containing some international characters and save it using UTF-8 encoding
public class Main {
    public static void main(String[] args) {
        System.out.println("UTF-8 text with some international chars:");
        System.out.println("äëïöü áéíóú àèiòù Ññ Çç €");
    }
}
  1. Compile and run the code using GraalVM. The output looks fine.
C:\tmp>java -version
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment GraalVM CE 20.1.0 (build 11.0.7+10-jvmci-20.1-b02)
OpenJDK 64-Bit Server VM GraalVM CE 20.1.0 (build 11.0.7+10-jvmci-20.1-b02, mixed mode, sharing)

C:\tmp>javac -encoding utf8 Main.java

C:\tmp>java Main
UTF-8 text with some international chars:
äëïöü áéíóú àèiòù Ññ Çç ?
  1. Build native-image
C:\tmp>native-image Main
[main:7016]    classlist:   1,402.34 ms,  0.96 GB
[main:7016]        (cap):   2,629.87 ms,  0.96 GB
[main:7016]        setup:   4,151.39 ms,  0.96 GB
[main:7016]     (clinit):     131.62 ms,  1.21 GB
[main:7016]   (typeflow):   4,459.82 ms,  1.21 GB
[main:7016]    (objects):   3,230.77 ms,  1.21 GB
[main:7016]   (features):     237.91 ms,  1.21 GB
[main:7016]     analysis:   8,221.80 ms,  1.21 GB
[main:7016]     universe:     261.23 ms,  1.21 GB
[main:7016]      (parse):     940.52 ms,  1.21 GB
[main:7016]     (inline):     902.42 ms,  1.66 GB
[main:7016]    (compile):   5,910.52 ms,  2.27 GB
[main:7016]      compile:   8,136.16 ms,  2.27 GB
[main:7016]        image:     821.51 ms,  2.27 GB
[main:7016]        write:     319.91 ms,  2.27 GB
[main:7016]      [total]:  23,531.82 ms,  2.27 GB
  1. Run "main.exe". The output is garbage.
C:\tmp>main.exe
UTF-8 text with some international chars:
õÙ´÷³ ßÚݾ· ÓÞi‗¨ б Ãþ Ç

Describe GraalVM and your environment:

  • GraalVM CE 20.1.0 (build 11.0.7+10-jvmci-20.1-b02)
  • JDK major version: 11
  • OS: Windows 10
  • Architecture: AMD64
  • Windows console codepage: 850 (chcp command)
  • Windows SDK: Microsoft (R) C/C++ Optimizing Compiler Version 19.25.28614 for x64

More details

  • Same behaviour with GraalVM CE 20.0.0, and when building the native image with the -H:+AddAllCharsets option.
  • Note the missing "€" in the JVM output. I think this is because MS-DOS CP 850 has no € (Euro symbol), so the JVM replaces it with the "?" char.
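
The two outputs above can be reproduced with a small sketch. It encodes text with one charset and decodes the bytes with another, which is what happens when a program and a console disagree about the encoding. The class name MojibakeDemo and the method reinterpret are hypothetical, and the idea that the native executable wrote windows-1252 bytes is an assumption inferred from the garbled line, not something confirmed in this thread. It also assumes the JDK in use provides the IBM850 charset.

```java
import java.nio.charset.Charset;

public class MojibakeDemo {
    // Encode text with one charset, then decode the same bytes with another.
    // This mimics a program and a console that disagree about the encoding.
    static String reinterpret(String text, String writtenAs, String shownAs) {
        byte[] bytes = text.getBytes(Charset.forName(writtenAs));
        return new String(bytes, Charset.forName(shownAs));
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file independent of the source encoding.
        String umlauts = "\u00E4\u00EB\u00EF\u00F6\u00FC"; // a, e, i, o, u with diaeresis
        String euro = "\u20AC"; // euro sign

        // Assumption: windows-1252 bytes shown on a code page 850 console.
        System.out.println(reinterpret(umlauts, "windows-1252", "IBM850"));
        System.out.println(reinterpret(euro, "windows-1252", "IBM850"));

        // Encoding directly to code page 850: the euro sign has no mapping there,
        // so getBytes substitutes the charset's replacement byte.
        System.out.println(reinterpret(euro, "IBM850", "IBM850"));
    }
}
```

Under that assumption, reinterpret returns õÙ´÷³ for the umlauts and Ç for the euro sign, which matches the start and end of the garbled line from main.exe. The last call returns "?", which matches the JVM output. What the println calls actually display still depends on the console's own code page.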
bug native-image

All 6 comments

This appears to be caused by building the native image in a non-UTF-8 JVM. Setting the encoding explicitly on the build-time JVM solves the problem:

-J-Dfile.encoding=UTF-8

From memory, the JVM default charset on Windows is _not_ UTF-8.
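
A quick way to check this on a given machine is to print what the runtime reports. This is a minimal sketch; the class name CharsetReport is hypothetical. Running it on the JVM and again as a native executable, built with and without -J-Dfile.encoding=UTF-8, should show whether the two disagree.

```java
import java.nio.charset.Charset;

public class CharsetReport {
    // Summarise the charset settings visible to the running program.
    static String describe() {
        return "default charset: " + Charset.defaultCharset().name()
                + ", file.encoding: " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The report only shows what Java uses as its default. The console code page (the chcp value) is a separate setting and is not visible through these calls.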

@jaikiran @galderz @dmlloyd should we make it the default? Or make it more visible/easier to configure?

> @jaikiran @galderz @dmlloyd should we make it the default?

There has been a conscious effort in the JDK team itself to move towards UTF-8 as the default charset to avoid issues like these. The work is tracked in a JEP (currently in draft status), https://openjdk.java.net/jeps/8187041, and in the corresponding JBS issue, https://bugs.openjdk.java.net/browse/JDK-8187041.

So yes, (in Quarkus) I think we should default this to a consistent UTF-8 value. I don't think (in Quarkus) we should provide an "easy" config to override this. If at all for whatever reason someone wants to override this, they can do so even now using the quarkus.native.additional.build.args property. I'd avoid an additional, dedicated config because, per the linked JEP, the file.encoding system property is unsupported and could be removed once the JEP is implemented. So, IMO, there is no point exposing it as a config when quarkus.native.additional.build.args can achieve the same.

> So yes, (in Quarkus) I think we should default this to a consistent UTF-8 value. I don't think (in Quarkus) we should provide an "easy" config to override this. If at all for whatever reason someone wants to override this, they can do so even now using the quarkus.native.additional.build.args property.

Thinking a bit more, I guess we can't avoid a new config _if_ we want users to be able to override this, since without one they can't "remove" the -J-Dfile.encoding=UTF-8 that we would be adding by default during the native image build. Perhaps, for now, set it to UTF-8 by default and hold off on adding the new config until someone says they want to override this value?

I agree; set the default to UTF-8 and wait for a complaint.

The root cause should be the same as in #2398 and has been fixed.
