OS: Windows 10
Version: 0.1.3
Commit/Build: d0ad94b
There are hundreds of custom-made OpenRCT2 scenarios in the Korean community, and many of them contain Korean text in fields such as the scenario description.
But it is impossible to load such scenarios, or even to hover the mouse over them in the scenario list. The game just crashes.
When I run them using openrct2.com, the console prints this:
WARNING[c:\projects\openrct2-ject9\src\openrct2\localisation\localisation.cpp:1155 (format_string)]: Truncating formatted string "ํฆ์ฑ?์
ใฒใ๋ณ๋ผฏ ์ฑ ํฌํ์ฑํทํต์ฑํํ์ฑํซ์ง์ฑํถ??์ฑํํจ์ฑ?๋ฟ๋ต ์ฑํํฑ ์ฑํทํ์ฑํกํฌ ์ฑํ?์ฑํถ?์ฆใข๋
๊ถ์ฑํดํจ์ฑํถํ ์ฑํํฃ์ฑํซํต์ฑํกํฌ์ฑ?ใ
๊ฑ?๋บ๋ผฑ!, ์ฑ์งธ?์ฑ์งธ์งค ์ฑํํ์ฑํกํฌ ์ฑ?๋ณย์ฆ?์๋ฌ?์ชย์ฆใ๋ณ๋ผฏ ์ฑํทํฑ ์ฑ ํฎ??์ฑํํฑ ์ฑํทํ์ฑํต?๋ใ๋ท๋ ใ๊ฒญํฃ" to 256 bytes.
Note that 0.1.2 stable works fine, and default scenarios work fine, too.
Steps to reproduce:
Dump file
Screenshots / Video:
The three scenarios at the very bottom of the list are custom scenarios made by users.
Their names were "์จ๊ฒจ์ง ์ง์ค", "ํ๊ต๋ ๋ํ๊ต", "ํ์์์ฌ ์ ์ฃผ", which are Korean text.

And this is the error message I got when I double-clicked the sc6 file directly:

Save game:
Until recently, OpenRCT2 would save newly created scenarios in the wrong encoding. But since it also loaded scenario names in the wrong encoding, this was never obvious.
In any case, it should not crash because of encoding going wrong.
Adding a UTF-8 string check to GetScenarioInfo solves this problem (do not convert to UTF-8 if the string is already UTF-8).
But it would be better if the encoding were checked by a library like libicu.
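A minimal sketch of that kind of check, in Python rather than the project's C++, and not the actual OpenRCT2 change. `is_valid_utf8` and `to_utf8` are hypothetical helper names, and Python's plain `cp1252` codec stands in for RCT2's modified code page:

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(raw: bytes, legacy_codec: str = "cp1252") -> str:
    """Hypothetical helper: leave UTF-8 input alone, convert anything else.

    legacy_codec is an assumption; RCT2's real code page is a modified
    variant that Python does not ship.
    """
    if is_valid_utf8(raw):
        return raw.decode("utf-8")
    # Bytes that are undefined in the stand-in codec become U+FFFD.
    return raw.decode(legacy_codec, errors="replace")


print(to_utf8("한국어 이름".encode("utf-8")))  # already UTF-8, kept as-is
print(to_utf8(b"Caf\xe9"))                      # not valid UTF-8, converted from cp1252
```

A validity check like this is only a heuristic: a legacy-encoded string can happen to be valid UTF-8, which is one reason to prefer a detection library.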
http://rctsc.telk.kr
There are literally about 600 custom scenarios there, and most of them are unplayable because of this bug (they have Korean in their names).
#7414 fixes the scenario list, but not the scenario description. (=> My misunderstanding)
There is also a rendering bug, as you can see below (no pause button in the main menu, and sometimes windows are rendered incorrectly):

I can reproduce the rendering issue using English, too. For me, the pause button does appear, but chunks of the menus themselves are missing, including the background. This notably happens on the save menu, for me. I'm guessing this is unrelated to the character set changes, but I have not had time to bisect the issue.
@telk5093 The names are not fixed. Try deleting the scenarios.idx file; the scenario names will break again.
@telk5093 In the original Korean version of RCT2, is it possible to save scenario descriptions in Korean?
@Gymnasiast Yes, it is stored in modified CP-1252.
Can you provide me with a scenario with Korean descriptions, created in the original Korean version of RCT2?
@Gymnasiast Please wait a couple of hours.
Korean scenario.zip
Scenario name: 한국어 이름
Park name: 한국어 이름
Scenario description: 한국어 설명
Generated from vanilla RCT2 (not the Steam edition; the Steam edition does not support Korean).
So, this is still not solved as of today.
Is it hard to solve, or is nobody interested in it?
@Gymnasiast @AaronVanGeffen
@telk5093 Making a converter and converting all the wrongly encoded scenario files might be better than converting inside OpenRCT2.
There's already a solution for this (2e7d6e58d0daef41c2216081314ad9551049446d), but that commit is not complete (it does not check the name, and it needs a rebase). Also, some short Korean strings might be parsed as UTF-8 anyway, since the check only depends on the string containing colour codes, and short Korean strings might not have any.
@Lastorder-DC That, plus I would also want to add support for the encoding that RCT2 used, if possible.
By the way, could someone provide me with the Korean exe of RCT2, so I can test a few more things?
@Gymnasiast Here you are: rct2.zip
hang.zip
You may also need this hang.dat file, which is similar to kanji.dat and is stored in the Data folder.
@Gymnasiast
I have tried to work around this issue like below:
But it seems that all Korean scenario names/descriptions have been broken again since the Danish translation was added.
I think LANGUAGE_KOREAN was shifted by one in src/openrct2/localisation/Language.h, so my workaround no longer works. Do you think I am right?
Yes, I think so.
Is it right to put Danish in the middle of the enum, instead of adding it at the end?
Yes, because it's needed to keep another list sorted alphabetically.
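To illustrate why inserting an entry in the middle of an enum breaks anything that relies on the old numeric value, here is a small Python sketch. The language lists are hypothetical and shortened; they are not the real contents of Language.h:

```python
from enum import IntEnum

# Hypothetical, shortened language lists kept in alphabetical order.
Before = IntEnum("Before", ["CZECH", "GERMAN", "KOREAN"], start=0)
After = IntEnum("After", ["CZECH", "DANISH", "GERMAN", "KOREAN"], start=0)

stored_id = int(Before.KOREAN)   # a numeric ID hard-coded before the change

print(Before(stored_id).name)    # KOREAN
print(After(stored_id).name)     # GERMAN: the same number now means another language
print(int(After.KOREAN))         # 3: Korean moved up by one
```

Code that refers to the enumerator by name keeps working after the insertion; only hard-coded or stored numbers go stale.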
I think scenarios recently saved in OpenRCT2 have their Korean encoding mangled.
I just made a scenario in OpenRCT2 (3ccad7c) with the name 시나리오이름, but when I decode the SC6 file and look at its hex data, the name is stored as 20 C2 9C 20 20 20 20 20 20 20 C6 A4 20 C7 B4 20 20 20
I know that 20 means a blank, but there are no blanks in the name I made.
(There is a possibility that I decoded my sc6 file incorrectly, but I read 64 bytes starting at 0x48 in decoded chunk 1, as this code says.)
Originally, 시나리오이름 = BD C3 B3 AA B8 AE BF C0 C0 CC B8 A7 in EUC-KR encoding. (BD C3 = 시 / B3 AA = 나 / ...)
Vanilla would store 시나리오이름 as FF BD C3 FF B3 AA FF B8 AE FF BF C0 FF C0 CC FF B8 A7. Note that there is an FF in front of each two-byte pair.
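The byte sequences quoted above can be reproduced with a short Python sketch. The function names are hypothetical, and passing single-byte (ASCII) characters through without a prefix is an assumption, since the thread only shows the two-byte case:

```python
def to_vanilla_multibyte(text: str, codec: str = "euc-kr") -> bytes:
    """Encode text and put 0xFF in front of every two-byte character."""
    out = bytearray()
    for ch in text:
        encoded = ch.encode(codec)
        if len(encoded) == 2:
            out.append(0xFF)      # marker byte described in the comment above
        out += encoded            # assumption: one-byte characters are left as-is
    return bytes(out)


def from_vanilla_multibyte(raw: bytes, codec: str = "euc-kr") -> str:
    """Strip the 0xFF markers again and decode the remaining bytes."""
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] == 0xFF and i + 2 < len(raw):
            out += raw[i + 1:i + 3]   # the two bytes that follow the marker
            i += 3
        else:
            out.append(raw[i])
            i += 1
    return out.decode(codec, errors="replace")


name = "시나리오이름"
print(name.encode("euc-kr").hex(" ").upper())
print(to_vanilla_multibyte(name).hex(" ").upper())
print(from_vanilla_multibyte(to_vanilla_multibyte(name)))
```

The first two printed lines should match the plain EUC-KR and the FF-prefixed sequences given in the comment.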
I finally got around to testing it.
This is what the scenario you provided looks like in vanilla RCT2:
And there lies the problem: RCT2 simply assumed that the scenario was in the same encoding as the language the EXE was in. It basically means that we cannot import scenarios with Korean descriptions properly without breaking support for scenarios with English/German/Dutch/etc. descriptions.
It basically means that we have to stick with one of the encodings RCT2 used, and we picked the modified Windows-1252 used for languages in the Latin alphabet. That also means we won't be able to save Korean text properly until we switch to our own save format.
I don't understand why this requires a new save format.
Earlier versions of OpenRCT2 (e.g. before 0.2.0) supported Korean scenario names/descriptions very well, and that broke at some point.
Is it impossible to go back and check which change broke Korean scenario names/descriptions?
No, because that breaks importing scenarios from vanilla with non-ASCII characters. The code wrongly assumed that RCT2 saves used UTF-8 encoding.
I know it's not nice that you cannot save Korean descriptions now, but I cannot fix that without breaking compatibility with the vanilla encoding.
This issue can only be _properly_ solved with a new save format. All other options break _something_.
As I understand it, vanilla RCT2 used a different encoding in the scenario files depending on its language. Unfortunately, it does not save which language a scenario is written in, so we have to pick the most compatible one…
Like @Gymnasiast says, if we assume it's encoded as UTF-8, it breaks custom English/Dutch/German/etc. scenarios created with vanilla RCT2 (particularly ordinal values >= 128; e.g. characters with diacritics/accents).
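A small Python illustration of that breakage, using the plain `cp1252` codec as a stand-in for RCT2's modified code page (the sample strings are made up for the example):

```python
# "Käse" as a vanilla-style single-byte string: 4B E4 73 65.
legacy = "Käse".encode("cp1252")

try:
    legacy.decode("utf-8")
    readable_as_utf8 = True
except UnicodeDecodeError:
    # 0xE4 announces a multi-byte UTF-8 sequence that never follows.
    readable_as_utf8 = False

# The reverse mistake: UTF-8 Korean read as cp1252 turns into mojibake.
mojibake = "한".encode("utf-8").decode("cp1252")

print(readable_as_utf8)   # False
print(mojibake)           # í•œ
```

Whichever encoding the loader assumes, strings written in the other one come out wrong, which is the trade-off described here.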
So we have to make a choice. I'd like to switch to UTF-8 and be done with it, but then you're breaking compatibility with vanilla RCT2 for _all_ languages. Unfortunately, that's not a trade-off everyone wants to make. Regrettably, that would mean scenario descriptions for CJK languages will be broken for a little while longer…
Then it will be very serious, since Korean users can't use most custom scenarios until the new save format is introduced (and we have no idea when that will be). I don't think it is right to wait until the new save format arrives. Are there any alternatives?
@Gymnasiast Are there no padding bytes in the SV6 format left that we could use to flag that strings are UTF-8 encoded? I'm looking at e.g. uint8_t pad_013573D6[2]; right after the park name.
Of course, scenarios whose strings are encoded in UTF-8 would still have their text mangled when opened in vanilla, but at least OpenRCT2 could then be compatible with both vanilla scenarios, and ones exported through itself.
Would you be able to detect the encoding based on the strings in the scenario file?
I don't think so. I can detect the 0xFF character, which means it's multibyte (and since I think most CJK users will be Korean, we could assume Korean if the SV6 file has them). I can also detect if the converted CP1252 contains colour codes or other stuff not expected in descriptions - meaning I can detect the broken scenarios with UTF-8 encoding. Saving is far more complicated, though.
> I can detect the 0xFF character, which means it's multibyte
Sorry, I don't follow. 0xFF is forbidden by the UTF-8 specification…?
Chris Sawyer used the 0xFF character to separate characters in multibyte encodings. So if it contains them pre-conversion, it means that the text is using a multibyte encoding.
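A rough Python sketch of that detection idea. `looks_multibyte` and `guess_codec` are hypothetical names, and mapping "multibyte" to EUC-KR follows the assumption stated earlier in the thread that most CJK users will be Korean:

```python
MULTIBYTE_MARKER = 0xFF  # separator byte described above


def looks_multibyte(raw: bytes) -> bool:
    """True if the raw, unconverted string contains the 0xFF marker."""
    return MULTIBYTE_MARKER in raw


def guess_codec(raw: bytes) -> str:
    """Hypothetical heuristic: marker present means Korean, otherwise cp1252."""
    return "euc-kr" if looks_multibyte(raw) else "cp1252"


print(guess_codec(bytes.fromhex("FF BD C3 FF B3 AA")))   # euc-kr
print(guess_codec(b"Forest Frontiers"))                  # cp1252

# 0xFF never occurs in valid UTF-8, so UTF-8 text cannot trigger the marker check.
print(0xFF in "시나리오이름".encode("utf-8"))              # False
```

In standard Windows-1252 the byte 0xFF is the letter ÿ, so this check could misfire on such text; whether RCT2's modified code page uses that byte is not confirmed in this thread.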
@telk5093 I can load the park just fine, with no crash and no pause button glitch, so I'm closing this issue. Please let us know if you can still reproduce it.