Devilutionx: Translation support

Created on 25 Mar 2019  ·  28 Comments  ·  Source: diasurgical/devilutionX

Hi,

I have seen a French (FR) patch for Diablo, but how can I use it on Linux with devilutionX?

Best Regards

enhancement

All 28 comments

One other small question: is it possible to play in 1080p full screen?

Hi @liberodark
We currently do not have the translation infrastructure in place, but if you have any suggestions we could talk about what solution would be best. I usually have used Gettext on previous software solutions. It generates a .po file for the translator to work on. There is plenty of software for editing them like PoEdit.

The game will upscale to your current resolution, so yes, it will run at 1080p with black bars on the sides and bilinear filtering (same as GOG.com's version). That said, we are currently working on the renderer to improve the visual output, which among other things means letting it change the output resolution, so you will be able to see more of the world at one time (no black bars etc.).

Great thank you for your work !

@AJenbo We could acquire gamedata from around the world and do some basic text analysis. The purpose of this exercise is to find out what encoding tables were used for the gamedata.

Analogously, OpenMW and VCMI allow the user to switch between Windows encoding tables 1250 (Eastern European), 1251 (Cyrillic) and 1252 (Western European). I pitched them an idea a while back that we could do away with manual selection. Morrowind's main data file has a header with a 256 byte comment. In the commercial releases, this comment is in the same language as the rest of the gamedata, so we can compare that 256 byte chunk against known chunks to determine the language and encoding table. This is a little resource hungry, since each recognized language would take up 256 bytes in the resource segment and the language check is O(n). But it takes the hassle away from the end user.

If Diablo doesn't use UTF-8, then this is an available solution to compare found gamedata against known gamedata - either by part or checksum and size.
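The chunk comparison described above could look roughly like the following. This is only a sketch: `KnownRelease`, `DetectRelease` and the table layout are hypothetical names made up for illustration and are not taken from OpenMW, VCMI or DevilutionX.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Size of the header comment that is compared (256 bytes, as described above).
constexpr std::size_t kHeaderSize = 256;
using Header = std::array<std::uint8_t, kHeaderSize>;

// One entry per recognized release (hypothetical structure).
struct KnownRelease {
    Header header;        // header chunk copied from a known release
    const char *language; // e.g. "fr"
    int codePage;         // e.g. 1250, 1251 or 1252
};

// Linear scan over the known releases: O(n) comparisons of 256 bytes each.
// Returns nullptr when the data is unknown, so the caller can fall back to
// manual selection.
inline const KnownRelease *DetectRelease(const Header &found,
                                         const std::vector<KnownRelease> &known)
{
    for (const KnownRelease &release : known) {
        if (std::memcmp(found.data(), release.header.data(), kHeaderSize) == 0)
            return &release;
    }
    return nullptr;
}
```

Storing a checksum plus file size per release instead of the full chunk would shrink the table, at the cost of no longer being able to inspect the stored text.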

Diablo always uses ISO_8859-1; changing this would break multiplayer and save game compatibility. The only exception is the PS1 release, but we are unable to use the data from this release as it is in a mostly unknown format (and has a lower resolution and frame rate).

There are some errors in Diablo's fonts:

005C is / instead of \
2018 includes ‘ from Windows-1252
2019 includes ’ from Windows-1252
00B7 is ? instead of ·
00B8 is ? instead of ¸
00BC is ½ instead of ¼
00BD is ¼ instead of ½

(this is based on the UI font)
Extending the fonts to cover Windows-1252 should be easy enough if someone is willing to render out the additional letters in a matching style. We could also fix the errors this way.
The in-game fonts should be the same except that they are missing some of the letters; this can be solved the same way.
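Until corrected glyphs exist, the errors listed above could be worked around at lookup time. The sketch below assumes the bitmap font stores one glyph per byte value (an assumption, not something stated in this thread), and `CorrectedGlyphSlot` is a hypothetical name.

```cpp
#include <cstdint>

// Returns the slot in the original UI font that actually draws the requested
// Latin-1 character, or -1 when the font has no correct glyph for it.
// Assumes glyph slots are indexed by byte value (hypothetical layout).
inline int CorrectedGlyphSlot(std::uint8_t character)
{
    switch (character) {
    case 0xBC: return 0xBD; // ¼ is drawn by the slot that should hold ½
    case 0xBD: return 0xBC; // ½ is drawn by the slot that should hold ¼
    case 0x5C:              // slot holds '/', so '\' is unavailable
    case 0xB7:              // slot holds '?', so '·' is unavailable
    case 0xB8:              // slot holds '?', so '¸' is unavailable
        return -1;
    default:
        return character;   // every other slot is taken as-is
    }
}
```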

This has been done at least partially previously:
https://github.com/diasurgical/devilution/issues/32

Expanding to Windows-1252 adds the following symbols:
€ ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž “ ” • – — ˜ ™ š › œ ž Ÿ

> We currently do not have the translation infrastructure in place, but if you have any suggestions we could talk about what solution would be best. I usually have used Gettext on previous software solutions. It generates a .po file for the translator to work on. There is plenty of software for editing them like PoEdit.

This sounds just lovely save for one thing: we would have to find a way to give the lines context. I would suggest some sort of annex companion document included in the package a priori. Assuming gettext always generates the lines in the same order, having a context file (like an Excel sheet) would be very easy and it would help prospective translators a lot.

Gettext supports adding comments to the messages in the code and then including them in the exported .pot file for the translators.
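As a rough illustration of how such a comment is attached: xgettext extracts a comment that directly precedes a marked string when it starts with a chosen tag. The helper name and the string below are invented for the example and are not actual DevilutionX code.

```cpp
#include <libintl.h>

// Common shorthand so that marked strings stay readable in the source.
#define _(msgid) gettext(msgid)

// Hypothetical helper; "Save Game" is an illustrative msgid only.
inline const char *SaveGameLabel()
{
    // TRANSLATORS: Main menu entry. Keep it short, the menu is narrow.
    return _("Save Game");
}
```

Running `xgettext --add-comments=TRANSLATORS: --keyword=_ -o game.pot file.cpp` should then place the comment above the `msgid "Save Game"` entry in the .pot file, prefixed with `#.`. With no catalog loaded, `gettext` simply returns the original string.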

Could you describe in detail how this would work? Would it eventually be possible to export the .po files into an online translation environment such as Transifex?

As a translator myself, there are two ways this could be done. Keeping translations in a free online translation platform, or hosting a translation kit including up-to-date translation memories, which would have to be maintained manually.

My suggestion is the following: whether translations are to be included in the package or hosted online, do set targets for priority languages first (giving priority to French, Italian, German and Spanish is an industry standard) and ensure their translations are finished and polished before making a package out of them. For Greek, Hebrew, Arabic, Slavic and Asian languages, first handle font support before allowing linguists to begin working on them. More importantly, do not let incomplete translations make their way into the package.

A few other things that would be nice to have:
1 - context field - a short description of the specific line and where it appears in game
2 - character limit field - a comment indicating the maximum number of characters (including spaces) each line can have in game without bleeding out of its bounding box.
3 - Male/female/plural tags - romance languages have special cases with male and female words. A good example would be Deckard Cain's "Hello my friend. Stay a while and listen." line. "Friend" can be translated as either "ami" or "amie", for a male or female person respectively. A translation using a genderless word would be possible too (such as "camarade", for example), but flexibility would vary from language to language and good results are not always guaranteed. OpenXCom made a wonderful implementation of this, and I would advise looking into their code for this matter.
4 - Tags for special formatting - For example [newline] for a linebreak, and so on.
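Items 2 and 4 are easy to check mechanically. The following is a minimal sketch with hypothetical helper names; it assumes translations are stored as UTF-8 and that the limit counts code points, which is only an approximation of on-screen width in a proportional font.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (those of the form 10xxxxxx).
inline std::size_t CountCodePoints(const std::string &utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Replaces every "[newline]" tag with a real line break.
inline std::string ExpandTags(std::string text)
{
    const std::string tag = "[newline]";
    std::size_t pos = 0;
    while ((pos = text.find(tag, pos)) != std::string::npos) {
        text.replace(pos, tag.size(), "\n");
        pos += 1; // continue after the inserted line break
    }
    return text;
}

// True if the translation fits the limit stated in the translator comment.
inline bool FitsLimit(const std::string &translation, std::size_t maxChars)
{
    return CountCodePoints(translation) <= maxChars;
}
```

A check like `FitsLimit` could run over the .po files before a release so that over-long strings are reported to translators instead of overflowing in game.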

Finally, if possible ensure every language has at least a qualified reviewer who is a native speaker and has at least some translation experience.

These should answer most of your questions:
https://docs.transifex.com/formats/gettext
https://www.gnu.org/software/gettext/manual/gettext.html#Names

Regarding targeting specific languages and having native-speaking translators do reviews: we are a hobby project and as such not part of the industry, so there isn't much we can do beyond piquing others' interest, the same way as is the case for the code that has been written.

It does, thank you for the info.

In regards to language targeting: I would at least say include only fully translated languages in the package milestone builds (incomplete ones could still be in the nightlies or when downloading source for compilation). This is so translators are motivated to finish their work, as incomplete/unpolished translations are a familiar sight in many open source projects.

Appreciate the input

If Transifex is chosen in the end, count me in for the German translation. I've helped with translations of a couple of Open Source projects that I'm far less familiar with than with Diablo. Also I kind of always wanted to do this; it's almost a quarter of a century since I first loaded up Diablo.exe in a hex editor and began translating the strings found there (which was actually really difficult due to length limitations and the fact that English usually is quite a bit shorter). I'd love to eventually do this properly!

I addressed parts of this issue by patching the MPQ with alternative audio files from the PlayStation release of Diablo, see https://github.com/john-tornblom/psx-tools/tree/audio-lang

I got the audio going with diabloweb. However, these audio files are encoded in a different PCM format, which devilutionX (SDL) does not recognize. I assume it's a simple fix though. Re-encoding the WAVs with sox or something might be a quick and crude way forward...

I am not 100% sure if I did the WAV mappings correctly; I used a pretty simple spectrum analysis approach to correlate audio files between the PSX and PC versions, so there could be errors here.

Also, there seem to be some kind of locale files on the PSX release, so at least extracting some text is feasible. I'm not sure how complete it is with respect to the PC version though...

@john-tornblom make sure you didn't enable audio compression when generating the MPQ. It's a feature only available in later versions of Storm (StarCraft era, included in some updates) and we do not support it in DevilutionX.

Thanks @AJenbo, that did the trick! I've now gotten it to work with DevilutionX (v1.0.1).

Extracting text from a PS1 CDROM is pretty easy if you just neglect the binary preamble of the locale files. I posted gettext msgids from MAINTXT.ENG on pastebin: https://pastebin.com/61GmL3BL

If these msgids are similar to the strings used in devilutionX, translating to French, German, Swedish and Japanese should be quick and straightforward.

Looks like the dialog texts are missing from this. But yeah, it should be easy to merge these with the devilutionX PO files. If only someone knew how to link things on Windows :(

@AJenbo I would love to help out with the Vietnamese translation. Ideally a UTF-8 font would be used, but I am aware of the breaking changes for multiplayer and save game data. Alternatively, I think the fonts could be extended to Windows 1258. Thoughts?

Diablo was built with Windows-1252, but in-game it is limited to ASCII. Moving to UTF-8 shouldn't really be an issue and is what we plan on doing. Moving to language-specific code pages like Windows 1258 is problematic as it locks the binary to a specific language, and breaks chat between various clients.

The biggest blocker atm is getting gettext building on Mac and Windows. The next step would be to create a TTF version of the Diablo font and figure out how to do font substitution in order to support languages that need glyphs not found in ASCII.

The current progress can be found here: https://github.com/diasurgical/devilutionX/pull/533
I did just discover that https://github.com/SuperTux/supertux might be the perfect fit for us too (gettext, SDL_ttf, CMake, font substitution). So if anyone is willing to work on this, it would be a good place to look for examples of how to implement things.

Supertux seems to be using tinygettext.

From a programmer's point of view, it's a simple API choice that has to be made. Using string keys with a simple macro, e.g., #define _(x) gettext(x), seems very popular. I've also seen enum/int keys, which seem to reduce readability of the source code. However, the latter would fit well with the audio lookup techniques already being leveraged in devilutionX. I guess one could just use a different macro where enum keys are preferable, e.g., #define _(x) gettext(#x), which turns the enum name into the string key, while using the same underlying translation table.
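To make the two macro styles concrete, here is a self-contained sketch. `Translate` is a stand-in for `gettext` with a made-up table, and `T_`, `TextId` and the keys are hypothetical. The enum variant relies on the preprocessor's stringification operator `#`, since an argument name written inside a string literal is not substituted.

```cpp
#include <string>
#include <unordered_map>

// Stand-in for gettext(): returns the translation, or the key itself when
// no translation exists. The table contents are invented for this example.
inline const char *Translate(const char *key)
{
    static const std::unordered_map<std::string, const char *> table = {
        { "New Game", "Nouvelle partie" },
        { "TEXT_QUIT", "Quitter" },
    };
    const auto it = table.find(key);
    return it != table.end() ? it->second : key;
}

// Hypothetical enum keys.
enum TextId { TEXT_QUIT, TEXT_OPTIONS };

// String key: the English text itself is the lookup key.
#define _(text) Translate(text)

// Enum key: the enumerator's name becomes the lookup key. The void cast
// makes the compiler reject names that are not declared.
#define T_(id) (static_cast<void>(id), Translate(#id))
```

Both macros hit the same table, so `_("New Game")` and `T_(TEXT_QUIT)` can coexist. The trade-off is that an untranslated enum key falls back to its identifier rather than to readable English.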

From a translator's point of view, we would like to have a file format with good tool support, and possibly the ability to convert translations to other file formats.

For assets (images and audio) we would use MPQs or folders with localized language files. So no changes would be needed there except loading an extra MPQ file.

> For assets (images and audio) we would use MPQs or folders with localized language files. So no changes would be needed there except loading an extra MPQ file.

Right, ofc. I'm not that well acquainted with the src, I just assumed assets in MPQs are accessed using enums, like the ones in https://github.com/diasurgical/devilutionX/blob/master/enums.h

to hell with the mpqs! :P

@AJenbo

> The next step would be to create a TTF version of the Diablo font and figure out how to do font substitution in order to support languages that need glyphs not found in ASCII.

I know a designer friend who could help with the TTF font. Feel free to let me know once you have sorted out all technical details with the implementation.

@runlevel5 I would really appreciate some sample color fonts in the various OTF formats: SVG | COLR | SBIX | CBDT. If you can help with that, it will be much clearer for us where to go from here.

What I was initially looking at was to edit the AA rendering of the font for a specific size since that would let me paint a pixel-based grayscale version that could then be recolored at render time. It looks like FontForge is capable of this, but I have been unable to figure out how to achieve this since the option appears grayed out when I try the program.

If your friend knows of any other ways to implement textured fonts that would also be helpful.

Ok, change of plans: we will continue to use bitmap (image) fonts. The way that we will support Unicode in DevilutionX is by using an image per Unicode Block. This allows us to break up the font in a way that doesn't create massive images and swallow all your RAM. You can get an overview of what each block contains here:
https://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF

The first release will probably contain fonts for Basic Latin, Latin-1 Supplement, and Cyrillic. But if people contribute more translations and fonts we will of course be happy to add them as well.
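A rough sketch of how one image per Unicode block might be looked up: decode the UTF-8 text to code points, then find the block containing each code point and the glyph's offset within it. The file names, `FontBlock`, `GlyphRef` and both functions are hypothetical; only the three block ranges are taken from the Unicode block list.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decodes UTF-8 into code points. Minimal sketch: stray or truncated bytes
// become U+FFFD, and overlong encodings are not rejected.
inline std::vector<char32_t> DecodeUtf8(const std::string &text)
{
    std::vector<char32_t> codePoints;
    for (std::size_t i = 0; i < text.size();) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        char32_t cp = 0xFFFD;
        std::size_t length = 1;
        if (lead < 0x80) {
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F;
            length = 2;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F;
            length = 3;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07;
            length = 4;
        }
        if (i + length > text.size()) { // truncated sequence at end of text
            codePoints.push_back(0xFFFD);
            break;
        }
        for (std::size_t k = 1; k < length; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(text[i + k]) & 0x3F);
        codePoints.push_back(cp);
        i += length;
    }
    return codePoints;
}

// One font image per Unicode block (image names are made up).
struct FontBlock {
    char32_t first;
    char32_t last;
    const char *image;
};

constexpr FontBlock kFontBlocks[] = {
    { 0x0000, 0x007F, "font_0000_basic_latin.png" },
    { 0x0080, 0x00FF, "font_0080_latin1_supplement.png" },
    { 0x0400, 0x04FF, "font_0400_cyrillic.png" },
};

struct GlyphRef {
    const char *image;   // which block image to draw from
    std::uint32_t index; // glyph position inside that image
};

// Finds the block image and glyph index for a code point; returns false
// when no font image covers it.
inline bool FindGlyph(char32_t cp, GlyphRef &out)
{
    for (const FontBlock &block : kFontBlocks) {
        if (cp >= block.first && cp <= block.last) {
            out = { block.image, static_cast<std::uint32_t>(cp - block.first) };
            return true;
        }
    }
    return false;
}
```

Because each image is only loaded when a code point from its block is first needed, adding a new script would mean adding one table row and one image.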

@runlevel5 would the Tai Viet block cover what you need for a Vietnamese translation?
https://en.wikipedia.org/wiki/Tai_Viet_(Unicode_block)

The text would be UTF-8 encoded. This won't actually break save game support, but any hero that is using a non-US-ASCII name won't appear correctly if loaded in the original Diablo.
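Whether a given name is affected is simple to test, since every US-ASCII character is a single byte below 0x80 in UTF-8. A minimal sketch with a hypothetical helper name:

```cpp
#include <string>

// True if the UTF-8 encoded name contains only US-ASCII characters and
// would therefore look the same when loaded in the original Diablo.
inline bool IsPlainAscii(const std::string &utf8Name)
{
    for (unsigned char byte : utf8Name) {
        if (byte >= 0x80)
            return false;
    }
    return true;
}
```

Such a check could be used to warn the player when creating a hero, rather than to forbid the name.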
