Godot: Compilation error in file test_string.cpp with japanese locale

Created on 11 May 2019 · 8 Comments · Source: godotengine/godot

Godot version:
Reproduced on 3.1, 3.1.1, 3.1-stable and 3.1.1-stable

OS/device including version:
Windows 10, Japanese locale, Python 3.7.3, Visual Studio Express 2019

Issue description:
Compilation error in file test_string.cpp

E:\godot>scons -j4 p=windows vsproj=yes
scons: Reading SConscript files ...
Configuring for Windows: target=debug, bits=default
Found MSVC compiler: amd64
Compiled program architecture will be a 64 bit executable (forcing bits=64).
YASM is necessary for WebM SIMD optimizations.
WebM SIMD optimizations are disabled. Check if your CPU architecture, CPU bits or platform are supported!
Checking for C header file mntent.h... (cached) no
scons: done reading SConscript files.
scons: Building targets ...
[Initial build] Compiling ==> main\tests\test_string.cpp
[Initial build] test_string.cpp
[Initial build] main\tests\test_string.cpp(951): warning C4819: The file contains a character that cannot be represented in the current code page (932). Save the file in Unicode format to prevent data loss
[Initial build] main\tests\test_string.cpp(1060): error C2001: newline in constant
[Initial build] string.cpp(1062): error C2146: syntax error: missing ';' before identifier 'String'
[Initial build] Compiling ==> thirdparty\libvpx\vp9\common\vp9_scale.c
[Initial build] Compiling ==> thirdparty\libvpx\vp9\common\vp9_scan.c
[Initial build] scons: * [main\tests\test_string.windows.tools.64.obj] Error 2
Compiling ==> thirdparty\libvpx\vp9\common\vp9_seg_common.c
vp9_scale.c
vp9_scan.c
vp9_seg_common.c
scons: building terminated because of errors.

Steps to reproduce:
Follow the compilation instructions from the official docs on a system with the Japanese locale

Minimal reproduction project:
Follow the compilation instructions from the official docs on a system with the Japanese locale

bug windows buildsystem

All 8 comments

I've been able to compile successfully by adding a space at the end of both test strings (lines 1059 and 1060 in tag 3.1.1-stable):
String upper = L"АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ ";
String lower = L"абвгдеёжзийклмнопрстуфхцчшщъыьэюя ";
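One plausible explanation for why a trailing space helps (an assumption, not something confirmed in this thread): when MSVC reads a BOM-less UTF-8 file as code page 932, bytes in the ranges 0x81-0x9F and 0xE0-0xFC count as the first half of a two-byte character. If the last byte before the closing quote lands in that position, the quote itself may be swallowed, which would match "newline in constant". The sketch below only models that byte pairing; the helper names are made up for illustration, and the sample bytes are the UTF-8 encoding of the last two lowercase letters (U+044E, U+044F).

```cpp
#include <cstddef>

// CP932 (Shift-JIS) lead-byte ranges: 0x81-0x9F and 0xE0-0xFC.
constexpr bool is_cp932_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Walks the bytes the way a CP932 reader would pair them and reports whether
// the final byte is a lead byte with no partner left inside the literal.
// In that case the next source byte (the closing quote) would be paired with it.
inline bool ends_with_dangling_lead_byte(const unsigned char *bytes, std::size_t len) {
    std::size_t i = 0;
    while (i < len) {
        if (is_cp932_lead_byte(bytes[i])) {
            if (i + 1 == len) {
                return true; // nothing left to pair with
            }
            i += 2; // lead byte plus its trail byte
        } else {
            i += 1; // single-byte character
        }
    }
    return false;
}

// UTF-8 bytes of U+044E U+044F, without and with a trailing space (0x20).
const unsigned char kLowerTail[] = {0xD1, 0x8E, 0xD1, 0x8F};
const unsigned char kLowerTailWithSpace[] = {0xD1, 0x8E, 0xD1, 0x8F, 0x20};
```

Under this model `ends_with_dangling_lead_byte(kLowerTail, 4)` is true while the version with the extra space is false, because the space becomes the partner byte instead of the quote. This is a simplified model and does not claim to reproduce MSVC's actual lexer.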

I added this test as part of #26760. It seems that defining wchar_t literals this way is not very portable; perhaps the file encoding needs to be changed (it is UTF-8 by default).

@RicardRC check the file encoding for test_string.cpp and try resaving it with UTF-8/UTF-16 encoding and see if that works.

Note: storing the test string with u prefix results in this:

main\tests\test_string.cpp(1071): error C2440: 'initializing': cannot convert from 'const char16_t [34]' to 'String'

So most likely the file encoding should be fixed instead. I don't know how Git checks out files on systems with the Japanese locale, but something tells me it defaults to UTF-16, which leads to this problem.

Note: test_string.cpp compiles fine with UTF-16 LE encoding on my system.
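For anyone who wants to check what encoding marker a checked-out file actually carries, here is a minimal sketch that classifies the first bytes of a file's contents. The `Bom` enum and `detect_bom` function are hypothetical names for this illustration, not part of Godot. It only looks at the byte order mark; a file without a BOM may still be valid UTF-8, and UTF-32 is deliberately ignored.

```cpp
#include <cstddef>
#include <string>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Classifies the leading bytes of a file that was read in binary mode.
// Note: a UTF-32 LE BOM (FF FE 00 00) would be reported as Utf16LE here.
inline Bom detect_bom(const std::string &bytes) {
    const auto at = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF) {
        return Bom::Utf8;
    }
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE) {
        return Bom::Utf16LE;
    }
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF) {
        return Bom::Utf16BE;
    }
    return Bom::None;
}
```

To use it, read the file with `std::ifstream` opened with `std::ios::binary` into a `std::string` and pass that in. Without a BOM, MSVC falls back to the current code page unless told otherwise, for example with its `/utf-8` option.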

I use the Japanese locale as a secondary language, so it only kicks in sometimes. Anyway, I'll try what you propose and report the results when I find a minute, within 48 hours.

Windows 10, Python 3.6.4, Visual Studio Community 2017

I can't seem to reproduce the issue. At some point I noticed that, after compiling Godot under the Japanese locale, Test 32: lstrip and rstrip would fail (godot --test string); that test covers some other Unicode characters. However, I'm unable to reproduce that again either...

I reported only the error that prevents compilation. I do see another warning, on test 31, that states something related to your proposed solution.

1>main\tests\test_string.cpp(951): warning C4819: The file contains a character that cannot be represented in the current code page (932). Save the file in Unicode format to prevent data loss

UTF-8, MSVSC2019

UTF-8 with BOM results:
Just warnings on test 32, the same that you seem to have reproduced sporadically
But at least no errors, so the engine compiles and works

1>main\tests\test_string.cpp(1010): warning C4566: character represented by universal-character-name '\u00BF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1010): warning C4566: character represented by universal-character-name '\u00B5' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1010): warning C4566: character represented by universal-character-name '\u00FF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1011): warning C4566: character represented by universal-character-name '\u00BF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1011): warning C4566: character represented by universal-character-name '\u00B5' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1011): warning C4566: character represented by universal-character-name '\u00FF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1012): warning C4566: character represented by universal-character-name '\u00B5' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1012): warning C4566: character represented by universal-character-name '\u00BF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1012): warning C4566: character represented by universal-character-name '\u00FF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1013): warning C4566: character represented by universal-character-name '\u00B5' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1013): warning C4566: character represented by universal-character-name '\u00BF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1013): warning C4566: character represented by universal-character-name '\u00FF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1039): warning C4566: character represented by universal-character-name '\u00BF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1039): warning C4566: character represented by universal-character-name '\u00B5' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1039): warning C4566: character represented by universal-character-name '\u00FF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1040): warning C4566: character represented by universal-character-name '\u00BF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1040): warning C4566: character represented by universal-character-name '\u00B5' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1040): warning C4566: character represented by universal-character-name '\u00FF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1041): warning C4566: character represented by universal-character-name '\u00B5' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1041): warning C4566: character represented by universal-character-name '\u00BF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1041): warning C4566: character represented by universal-character-name '\u00FF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1042): warning C4566: character represented by universal-character-name '\u00B5' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1042): warning C4566: character represented by universal-character-name '\u00BF' cannot be represented in the current code page (932)
1>main\tests\test_string.cpp(1042): warning C4566: character represented by universal-character-name '\u00FF' cannot be represented in the current code page (932)

Thanks for reporting. The universal fix would be to replace those Unicode characters with escape sequences (such as \u00B5), but that would make the code less readable, so I'm not sure about this.

Found related discussion in some other repository with similar configuration: abseil/abseil-cpp#85

So it makes sense to replace those characters with escape sequences in order to stay portable across compilers.
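As a rough sketch of what that could look like for the two Cyrillic alphabets, here they are as wide literals built only from universal character names, so the source file stays pure ASCII. The names `kUpper`, `kLower` and the helper are made up for this example and are not the identifiers used in test_string.cpp. Whether this silences every diagnostic depends on the compiler and on the literal type: MSVC's C4566 is about characters that do not fit the current code page, which typically concerns narrow literals.

```cpp
#include <cwchar>

// The 33 uppercase Cyrillic letters, U+0410..U+042F with U+0401 after U+0415.
static const wchar_t *const kUpper =
    L"\u0410\u0411\u0412\u0413\u0414\u0415\u0401\u0416\u0417\u0418\u0419"
    L"\u041A\u041B\u041C\u041D\u041E\u041F\u0420\u0421\u0422\u0423\u0424"
    L"\u0425\u0426\u0427\u0428\u0429\u042A\u042B\u042C\u042D\u042E\u042F";

// The 33 lowercase Cyrillic letters, U+0430..U+044F with U+0451 after U+0435.
static const wchar_t *const kLower =
    L"\u0430\u0431\u0432\u0433\u0434\u0435\u0451\u0436\u0437\u0438\u0439"
    L"\u043A\u043B\u043C\u043D\u043E\u043F\u0440\u0441\u0442\u0443\u0444"
    L"\u0445\u0446\u0447\u0448\u0449\u044A\u044B\u044C\u044D\u044E\u044F";

// Sanity check: both alphabets should contain the same number of letters.
inline bool alphabets_have_equal_length() {
    return std::wcslen(kUpper) == std::wcslen(kLower);
}
```

The readability cost mentioned above is real; a comment with the intended letters next to each literal is one way to soften it, as long as the file encoding issue does not affect comments on the compiler in question.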

It does not solve it. It looks the same as before:

warning: "character represented by universal-character-name '\u00BF' cannot be represented in the current code page (932)"

AND

error: "error reading end of line" in string literal at the end of the Upper cyrillic character literal string

I've been researching a little and found this (which is frustrating...)
https://stackoverflow.com/questions/688760/how-to-create-a-utf-8-string-literal-in-visual-c-2008

The only way for me to compile is still to encode the file as UTF-8 with BOM, which gets rid of the Cyrillic error but not the warnings, both in the previous version and with your proposed commit. But at least it compiles. I'm not advocating changing the file encoding for everyone; we should find a true solution. Until then, at least there's a workaround.
