Godot-proposals: Add plurals support for CSV translation files

Created on 1 Aug 2020  ·  3 Comments  ·  Source: godotengine/godot-proposals

Describe the project you are working on:

A game (main language is Russian).

Describe the problem or limitation you are having in your project:

CSV translation files do not support plurals. godotengine/godot#40443 adds plurals support for .po files, but CSV is overlooked.
.po files are designed to use English as the primary language, while CSV also allows identifiers.


Comments

Me:

It looks like the tr_n function is not very suitable if you are using an identifier system:

tr_n("MY_ID", "", n) # Or `tr_n("MY_ID", "MY_ID", n)`?


From the docs
There are two approaches to generate multilingual language games and applications. Both are based on a key:value system. The first is to use one of the languages as the key (usually English), the second is to use a specific identifier. <...> In general, games use the second approach and a unique ID is used for each string.

@Calinou:

@dalexeev In my experience, gettext PO files are heavily centered around using English text as identifiers. On the other hand, custom formats (like Godot's CSV format) and XLIFF tend to recommend using keys as identifiers.

@pycbouh:

"In my experience, gettext PO files are heavily centered around using English text as identifiers."

This is definitely how its creators intended it to be used when translating Linux, but the file format itself does not enforce this in any way. If you use keys as identifiers, some tools may warn you that your translation language is English (POEdit does that, for one), but it's up to the user to handle this. In this case, the user is the engine.

Describe the feature / enhancement and how it helps to overcome the problem or limitation:

For CSV, we should also implement plurals support. For example like this:

KEY         | en          | ru
------------|-------------|--------------
DAYS_AGO[0] | %d day ago  | %d день назад
DAYS_AGO[1] | %d days ago | %d дня назад
DAYS_AGO[2] | -           | %d дней назад

Usage:

var s = tr_n(n, "DAYS_AGO") % n

That is, we just have to make n the first argument, and it will be compatible with both systems.
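
To make the first layout concrete, here is a minimal Python sketch of the lookup, modelled outside the engine. It assumes a comma-separated file with the same rows as the table above. `load_catalog` and `lookup` are hypothetical helpers, not Godot API, and returning the indexed key for a missing cell is an assumption that mirrors how tr() returns an untranslated key unchanged.

```python
import csv
import io

# Hypothetical CSV contents mirroring the table above.
CSV_TEXT = """KEY,en,ru
DAYS_AGO[0],%d day ago,%d день назад
DAYS_AGO[1],%d days ago,%d дня назад
DAYS_AGO[2],,%d дней назад
"""

def load_catalog(text):
    """Return {locale: {key: message}}, skipping empty cells."""
    catalog = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = row["KEY"]
        for locale, message in row.items():
            if locale != "KEY" and message:
                catalog.setdefault(locale, {})[key] = message
    return catalog

def lookup(catalog, locale, key, form):
    """Build KEY[form] and look it up; return the indexed key if it is missing."""
    indexed_key = "%s[%d]" % (key, form)
    return catalog.get(locale, {}).get(indexed_key, indexed_key)

catalog = load_catalog(CSV_TEXT)
message = lookup(catalog, "ru", "DAYS_AGO", 1) % 3  # form 1 is the Russian form for 3
```

Because each plural form is an ordinary row, the loader needs no special casing: the only new logic is building the indexed key at lookup time.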

Indeed, some cells remain empty. But there are relatively few of them. Note that strings without numeric substitution still require only one row:

KEY            | en          | ru
---------------|-------------|-------------
REGULAR_KEY    | Regular key | Обычный ключ
...            | ...         | ...
SPECIAL_KEY[0] | %d key      | %d ключ
SPECIAL_KEY[1] | %d keys     | %d ключа
SPECIAL_KEY[2] |             | %d ключей
...            | ...         | ...
ANOTHER_KEY    | Another key | Другой ключ
...            | ...         | ...

There is another option:

KEY      | en[0]      | en[1]       | ru[0]         | ru[1]        | ru[2]
---------|------------|-------------|---------------|--------------|--------------
JUST_KEY | Just a key |             | Просто ключ   |              |
DAYS_AGO | %d day ago | %d days ago | %d день назад | %d дня назад | %d дней назад

But I like the first option better, because most strings don't have numeric substitutions. Moreover, the second option requires multiple columns for each language. If we split the table into two files (one for tr() and one for tr_n()), there would be no empty cells at all, but that is not good either, because it complicates the work (2 files instead of 1). Overall, the first option is the best compromise.
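
For comparison, the second layout could be read roughly as in the Python sketch below, which parses headers such as `en[0]` into a locale and a form index. The helper names (`load_forms`, `lookup`) are hypothetical, and falling back to form 0 when a string has only one form is an assumption, chosen so that plain keys like JUST_KEY still resolve.

```python
import csv
import io
import re

# Hypothetical CSV contents mirroring the second table above.
CSV_TEXT = """KEY,en[0],en[1],ru[0],ru[1],ru[2]
JUST_KEY,Just a key,,Просто ключ,,
DAYS_AGO,%d day ago,%d days ago,%d день назад,%d дня назад,%d дней назад
"""

# Matches column headers of the form "locale[index]", e.g. "ru[2]".
HEADER = re.compile(r"^(\w+)\[(\d+)\]$")

def load_forms(text):
    """Return {locale: {key: {form_index: message}}}, skipping empty cells."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = []  # (column index, locale, form index)
    for index, name in enumerate(header):
        match = HEADER.match(name)
        if match:
            columns.append((index, match.group(1), int(match.group(2))))
    catalog = {}
    for row in reader:
        for index, locale, form in columns:
            if index < len(row) and row[index]:
                catalog.setdefault(locale, {}).setdefault(row[0], {})[form] = row[index]
    return catalog

def lookup(catalog, locale, key, form):
    """Return the requested form, else form 0, else the key itself."""
    forms = catalog.get(locale, {}).get(key, {})
    return forms.get(form, forms.get(0, key))

catalog = load_forms(CSV_TEXT)
```

The loader is slightly more involved than for the first layout, since the number of columns per language depends on how many plural forms that language has.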

Describe how your proposal will work, with code, pseudocode, mockups, and/or diagrams:

It's not hard to implement. Here's an example to help you understand how this should work:

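# Proposed helper: pick the plural form for n, then look up the indexed key.
func tr_n(n: int, key: String) -> String:
    return tr("%s[%d]" % [key, f(n)])

# Returns the plural form index of n for the current locale.
func f(n: int) -> int:
    match TranslationServer.get_locale():
        "en_US":
            if n == 1:
                return 0
            else:
                return 1
        "ru_RU":
            # 1, 21, 31, ... (but not 11) use form 0.
            if n % 10 == 1 and n % 100 != 11:
                return 0
            # 2-4, 22-24, ... (but not 12-14) use form 1.
            elif n % 10 >= 2 and n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
                return 1
            else:
                return 2
        _:
            # Other locales are omitted here. Falling back to the
            # English rule is an assumption, not part of the proposal.
            return 0 if n == 1 else 1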

The only caveat is that the first option works only with identifiers, while the second option also works with English strings as the primary key.
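
As a quick check of the plural rules above, the following Python sketch restates them as a table of per-locale functions and prints the form index for a few values of n. The dictionary layout and the English fallback for unknown locales are assumptions made for illustration; only the two rules themselves come from the example above.

```python
def english_rule(n):
    # Singular for exactly 1, plural for everything else.
    return 0 if n == 1 else 1

def russian_rule(n):
    # 1, 21, 31, ... (but not 11) use form 0.
    if n % 10 == 1 and n % 100 != 11:
        return 0
    # 2-4, 22-24, ... (but not 12-14) use form 1.
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2

# Hypothetical registry: one rule function per locale.
PLURAL_RULES = {"en_US": english_rule, "ru_RU": russian_rule}

def plural_index(locale, n):
    # Unknown locales fall back to the English rule (an assumption).
    return PLURAL_RULES.get(locale, english_rule)(n)

for n in (1, 2, 5, 11, 21, 22, 112):
    print(n, plural_index("en_US", n), plural_index("ru_RU", n))
# prints:
# 1 0 0
# 2 1 1
# 5 1 2
# 11 1 2
# 21 1 0
# 22 1 1
# 112 1 2
```

Keeping the rules in a table means supporting another language is a matter of adding one entry, rather than another branch in a match statement.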

If this enhancement will not be used often, can it be worked around with a few lines of script?:

This is a commonly used feature. In addition, there is currently no way to globally redefine the tr_n function.

Is there a reason why this should be core and not an add-on in the asset library?:

.po files are not a complete replacement for CSV (see above). Therefore, CSV should support plurals as well as .po files.


@akien-mga:

For CSV plurals, I would suggest opening a proposal indeed and doing research on how plurals are handled by other projects that support CSV translations.

From what I found, there are many different CSV translation workflows, and the few that support plurals have it hacked in along the lines suggested e.g. here, but there's no common standard. It's a simple system, so we can indeed design our own plurals logic, but if there is a somewhat "popular" way of doing plurals with CSV, used e.g. in other game engines, it would be best for us to follow that.


All 3 comments

Side note:

".po files are designed to use English as the primary language"

That might be a convention, but I don't think it's true. I use identifiers in my game with .po files and it works fine in Godot. I dunno where these convention differences come from, but they are not enforced by the formats themselves.

@Zylann The API added in godotengine/godot#40443 assumes:

# tr_n(message, plural_message, n, context = "")
var s = tr_n("%d day ago", "%d days ago", n) % n
# ru.po
msgid "%d day ago"
msgid_plural "%d days ago"
msgstr[0] "%d день назад"
msgstr[1] "%d дня назад"
msgstr[2] "%d дней назад"

If using IDs:

# tr_n(message, plural_message, n, context = "")
var s = tr_n("DAYS_AGO", "", n) % n
# en.po
msgid "DAYS_AGO"
msgid_plural ""
msgstr[0] "%d day ago"
msgstr[1] "%d days ago"

I suggested changing the order of the arguments:

"That is, we just have to make n the first argument, and it will be compatible with both systems."

# tr_n(n, message, plural_message = "", context = "")
var s = tr_n(n, "DAYS_AGO") % n
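
A rough Python model of how the reordered signature could serve both conventions is sketched below. Everything in it is hypothetical: the in-memory `CATALOG`, the `rule` parameter, and the untranslated fallback (message for n == 1, otherwise plural_message when one is given) are assumptions for illustration, not the behavior of godotengine/godot#40443.

```python
def russian_rule(n):
    # Plural form index for Russian, as in the rule earlier in this thread.
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2

RU_FORMS = ["%d день назад", "%d дня назад", "%d дней назад"]

# Hypothetical catalog keyed by (context, message); the same Russian forms
# are registered once under an identifier and once under the English text.
CATALOG = {
    ("", "DAYS_AGO"): RU_FORMS,
    ("", "%d day ago"): RU_FORMS,
}

def tr_n(n, message, plural_message="", context="", catalog=CATALOG, rule=russian_rule):
    forms = catalog.get((context, message))
    if forms:
        # Clamp the index in case fewer forms are defined than the rule expects.
        return forms[min(rule(n), len(forms) - 1)]
    # Untranslated: English-keyed calls can still fall back to plural_message.
    if n != 1 and plural_message:
        return plural_message
    return message

by_id = tr_n(5, "DAYS_AGO") % 5                      # identifier style
by_text = tr_n(2, "%d day ago", "%d days ago") % 2   # English-as-key style
```

With n first, plural_message becomes an optional argument, so identifier-based calls no longer have to pass an empty string.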

However, CSV still needs full plurals support, if only because CSV files can be opened in any spreadsheet application, whereas .po files are inconvenient to edit without special software.

I have implemented this feature. It works as the proposal describes: tr_n(n, "DAYS_AGO") fetches the correct plural translation from the CSV using an adjusted key (DAYS_AGO[0], DAYS_AGO[1], etc.), depending on the locale and n.

The PR should be coming soon.
