Gutenberg: Add keywords to core blocks for the sake of the translation

Created on 8 May 2018 · 26 Comments · Source: WordPress/gutenberg

Hi, for now core blocks don't seem to use keywords.

Many languages are not as easy as English. In French, for example, there are several possible translations for the Subhead block:

It can be:

  • _sous-titre_ (but it's more like subtitle than subhead)
  • _chapô_ (used by journalists)
  • _extrait_ (excerpt)

Since none of these translations is perfect, keywords could help people find the block without knowing its official name.

To go further with this issue:

In fact each language has specific needs (some won't need any keywords, others will need 2 or 3 according to the context).
And from one language to another the keywords are not necessarily the same.

I really don't see how it can be done technically (maybe just set some keywords in English and let other languages adapt them instead of translating them literally).

cc @audrasjb from the french translation team with whom I have discussed this issue

Internationalization (i18n) [Feature] Blocks [Type] Enhancement

All 26 comments

Thanks for bringing this up, @maximebj. I would agree with the issue, but it also seems to me, in this particular case, that the real issue is the ambiguous semantics of the Subhead block, especially when looking at _extrait_ but also to some extent at _chapô_. Perhaps the problem lies right here:

https://github.com/WordPress/gutenberg/blob/1d9288443b38d254092f38d0d818f96be22346ad/packages/block-library/src/subhead/index.js#L18

Subhead is loosely defined (and its definition based on size), which incidentally may also encourage using it for cosmetic purposes more than functional. What do you think? Could we start acting there instead?

I'm not saying that we shouldn't work on a way to interpolate alternate translations into a block type's keywords, but I'd rather only do that if and when we have compelling examples.

@mcsf I think I have a compelling example! See #8365.

@chrisvanpatten, thanks for the cross-ref! In that issue:

So I expect here the issue is actually allowing translators to provide additional, language-specific options for a block’s keywords. This would be useful as in the case in the issue report […] or [for] any languages that might have multiple words to refer to a concept that only has one word in English.

Thinking about this a bit, and intending to keep things simple, I think string context (with _n) would be enough. Thusly:

 title: __( 'Separator' ),
- keywords: [ __( 'horizontal-line' ), 'hr', __( 'divider' ) ],
+ keywords: [ __( 'horizontal-line' ), 'hr', __( 'divider' ), _n( '', 'synonyms for Separator block' ) ],

Observations from this approach:

  • Keywords are limited to three per type, so this example raises a warning.
  • By default, _n( '', 'synonyms for foo' ) will return '' (we could also choose null or undefined), which is automatically ignorable — and could even be explicitly cleaned up with lodash#compact.
  • If more than one synonym is provided for a locale, no special token needs to be used to separate synonyms: a simple space will do. This is because matching already takes word boundaries into account. So a translator could provide a string like so:

String | Context | Translation
--|--|--
(empty) | synonyms for Twitter block | sns, tweet, ツイッター

_n is for plural forms, not context. For context _x() has to be used.

If more than one synonym is provided for a locale, no special token needs to be used to separate synonyms: a simple space will do. This is because matching already takes word boundaries into account. So a translator could provide a string like so:

A translation could consist of multiple words though, so a space is not enough. A comma is safer, as you have used in your example as well.

I just want to point out that right now it's difficult for translators to translate dozens of keywords for all the blocks. It would be way easier if for each block there was just a single translatable string like _x( 'horizontal line, divider, separator', 'block keywords' ) instead of an array of words with no context or anything.

This way it's also possible for some locales to keep the English strings if necessary, which makes using the block inserter autocomplete much easier. This is similar to the list of words in wptexturize or the list of stopwords in WP_Query.

_n is for plural forms, not context. For context _x() has to be used.

A translation could consist of multiple words though, so a space is not enough. A comma is safer, as you have used in your example as well.

Must’ve been half asleep when I wrote this. :/ Thanks for making sense of it.

easier if for each block there was just a single translatable string

This sounds apt to me as well. It could mean that the block API also accepts keywords as a string that it then splits on comma. This would, as a bonus, prevent translations from bypassing Gutenberg’s imposed limit of three keywords per block type. /cc @mtias
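A minimal sketch of that idea, assuming a hypothetical helper named normalizeKeywords (it is not part of the Gutenberg block API). It accepts either the existing array form or a single comma-separated string and returns a clean list, which is roughly what the block registration would have to do:

```javascript
// Hypothetical helper, not part of Gutenberg: accept keywords either as an
// array (current format) or as one comma-separated string (proposed format).
function normalizeKeywords( keywords ) {
	const list = Array.isArray( keywords )
		? keywords
		: String( keywords ?? '' ).split( ',' );

	return list
		.map( ( keyword ) => keyword.trim() )
		.filter( ( keyword ) => keyword.length > 0 );
}

// Only the comma separates entries, so multi-word keywords survive intact.
console.log( normalizeKeywords( 'horizontal line, divider, separator' ) );
// Existing array-based blocks keep working unchanged.
console.log( normalizeKeywords( [ 'hr', 'divider' ] ) );
```

Enforcing a maximum number of keywords, if one is still wanted, could happen on the returned list so that it applies equally to translated strings.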

Not sure if the 3-keywords-limit is still appropriate for translations though.

Not sure if the 3-keywords-limit is still appropriate for translations though.

As a translator, I agree; as someone who knows how these things can easily be exploited, I don't so much. :)

#13848 has landed, which removes the limit of 3 keywords altogether. Does it solve this issue?

I'm not sure it could specifically fix this issue, but it's a start.

If a block has only 2 keywords, every translator is limited to exactly 2 translations, which doesn't fit all cases.

In my opinion, keywords should be a simple string as @swissspidy suggested before

keywords: __( 'word, another, thing, stuff' ),

So translators could use only as many keywords as each language needs.
In JS the keywords would still be searchable.

I think we still need to discuss this issue before closing it

In my opinion, keywords should be a simple string as swissspidy suggested before

I see no compelling argument _against_ adopting this.

This would just require some backward compatibility to existing blocks using arrays, but not a big deal.

Let's explore this in a PR then 🙂

To be clear: are you owning that, @swissspidy? Or would you like someone to help?

It would be great if someone else could help with this as I don't have that much time to devote to this at the moment.

Alright then. Any takers, @maximebj, @bisko?

I can take on that sometime next week! It seems like a good problem to use to dive into the Gutenberg world!

I've been playing a bit with this last week and I think I won't be able to fully solve the issue short of what @gziolo suggested here.

The main issue is that we get an already translated block name/title when we build up the autocomplete cache. The relevant portion is in the loadOptions method - # - where the data that gets passed is already translated, and I'm not sure what's a good way to get the "source" data without the translation.

If we look at the Quote block configuration for example - # - we can see the issue is how we define the data:

export const settings = {
    title: __( 'Quote' ),
    description: __( 'Maybe someone else said it better -- add some quoted text.' ),
    icon: <SVG ...(shorted)... /SVG>,
    category: 'common',
    keywords: [ __( 'blockquote' ) ],

If we didn't specify the title with translation directly, we could use it to build up a better set of keywords. Unfortunately, if we only put the non-translated version there, we have no good way of providing the translation markup that Babel uses to detect the strings.

I'm thinking of several possible solutions, none of which is ideal, but people with more experience can provide more feedback and thoughts here.

  1. Provide untranslated block title in the set of keywords. This seems the best option if we want to keep the current settings syntax. It's now possible after #11949 got merged and lifted the 3 keywords limit.
  2. Dynamically add the block id to the list of keywords after some cleanup as suggested by @gziolo here
  3. Add a title_native or title_untranslated property to the block settings and add that to the list of keywords
  4. This is just a theory, as I'm not that well versed in the JS build systems - During build time, capture all the translatable strings from title and keywords, make the list unique and then add this list to the list of keywords.

    • For example in the Quote block above - during build, the process is going to grab 'Quote' out of title and put it into the keywords list, so in the end we'll have a list like keywords: [ __( 'blockquote' ), 'blockquote', __( 'Quote' ), 'Quote' ]

Options 1 and 3 will require manual work and maintenance on all current and future blocks to keep the keywords lists up to date.

Option 2 is a bit of a hack that solves the issue in the immediate future but as mentioned it's going to cause problems with generated names.

Option 4 seems (to me) the most future-proof if the syntax stays as it is now. Blocks that didn't follow the same build process will "revert" to the current behavior and not appear in the list during search, while blocks that had the "new" build process will appear properly translated.
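To make the end result of option 4 concrete, here is a rough sketch that does the merge at runtime rather than at build time. Everything in it is an assumption for illustration: collectSearchTerms is a hypothetical name, translate is a stand-in for the real __() from @wordpress/i18n, and the German entries are made up. It also takes plain strings, so it sidesteps the string-extraction problem mentioned above rather than solving it.

```javascript
// Stand-in for a translation lookup; the entries are illustrative only.
const translations = { Quote: 'Zitat', blockquote: 'Zitat' };
const translate = ( text ) => translations[ text ] || text;

// Hypothetical: collect the title and keywords in both their translated and
// original forms, then drop duplicates while keeping first-seen order.
function collectSearchTerms( { title, keywords = [] } ) {
	const originals = [ title, ...keywords ];
	const terms = originals.flatMap( ( text ) => [ translate( text ), text ] );
	return [ ...new Set( terms ) ];
}

console.log( collectSearchTerms( { title: 'Quote', keywords: [ 'blockquote' ] } ) );
// → [ 'Zitat', 'Quote', 'blockquote' ]
```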

I'm not sure what option is the best, as I mentioned above, so let's discuss that! :)

Can we perhaps leave it to translators to add the original untranslated string if they want to? Because always including the untranslated keywords doesn't make sense in every case because it adds unnecessary noise and leads to unexpected results for non-English speaking people.

This is easily possible if the keywords are just a comma separated string.

Take the Quote block as an example:

Instead of keywords: [ __( 'blockquote' ) ], we could use comma-separated strings, e.g. keywords: __( 'blockquote,quote' ),.

For the German translation, polyglots can then translate this to quote,zitat.

I want to add a bit of a background on why I got involved in this discussion out of nowhere, to hopefully clarify what the issue I'm trying to solve is.

I'm constantly switching between English and Bulgarian layouts throughout the day and also using WordPress(.com but that doesn't matter in this case) translated to Bulgarian (so I can get a sense of problematic translations and update them) to write on our internal P2s using Gutenberg.

With that constant switching, I often start typing in the wrong layout when I go back to an app.

Slack and PHPStorm handle this very well as I can go in and start typing the "English" version of what I want to type and it gives me what I wanted:

screenshot 2019-02-26 at 15 11 28
screenshot 2019-02-26 at 15 09 21

_(note: I understand that's more of transliteration between Cyrillic and Latin scripts, but it has to do with the usability and user expectations)_

Gutenberg on the other hand doesn't give me anything that's not an exact string match:

screenshot 2019-02-26 at 15 13 37

In this case I'm trying to add a Title block, which in Bulgarian is translated to Заглавие. Since I'm writing in English for my colleagues, I have to switch to Bulgarian, type Загл, insert the block, switch back to English, and continue typing.

An ideal case (for me) would be that I would be able to insert this title block with all the following options: Title, Титле, Заглавие, Zaglavie or a substring of that word.
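As a rough illustration of the matching side, here is a sketch of a case-insensitive substring match over a list of search terms. The matchesQuery name is hypothetical and this is not the inserter's actual search code. It covers the original and translated titles but not transliteration, which would need a separate mapping between scripts.

```javascript
// Hypothetical matcher: case-insensitive substring match over search terms.
function matchesQuery( terms, query ) {
	const needle = query.trim().toLocaleLowerCase();
	if ( needle === '' ) {
		return false;
	}
	return terms.some( ( term ) => term.toLocaleLowerCase().includes( needle ) );
}

// Translated and original titles side by side.
const titleBlockTerms = [ 'Заглавие', 'Title' ];

console.log( matchesQuery( titleBlockTerms, 'Загл' ) ); // true
console.log( matchesQuery( titleBlockTerms, 'tit' ) ); // true
console.log( matchesQuery( titleBlockTerms, 'Zagl' ) ); // false: transliteration is not handled
```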

Can we perhaps leave it to translators to add the original untranslated string if they want to?

With the above said, I don't fully agree with making this optional, as it will create an annoying disparity between blocks that support it and blocks that do not, especially in multilingual setups.

Instead of keywords: [ __( 'blockquote' ) ], we could use comma-separated strings, e.g. keywords: __( 'blockquote,quote' ),.

For the German translation, polyglots can then translate this to quote,zitat.

I'm a bit worried about the manual approach here for translators. What happens if the plugin author updates the keywords? Wouldn't they need to be re-translated? It seems a bit more stable to have both translated and untranslated versions in the keywords list so if a new keyword is added only that one will be translated (if not already translated).

I'm sorry if I'm derailing the discussion off the main topic of the issue. I can open another issue to discuss the above if needed.

It seems a bit more stable to have both translated and untranslated versions in the keywords list so if a new keyword is added only that one will be translated (if not already translated).

True, it's more _stable_ as in _reliable_, but as I mentioned sometimes also not necessarily wanted. Hence my suggestion to leave it to the locale managers.

What happens if the plugin author updates the keywords? Wouldn't they need to be re-translated?

Yes, but that is also the case when it's an array and the plugin author changes the keywords… So I don't see your point here.

Please see https://github.com/WordPress/gutenberg/issues/6633#issuecomment-438218408 for my original reason for suggesting comma-separated strings.

If it's an array of 3 keywords, often times in German we have 4 or 5 keywords to describe the block. There's currently no way to support that.

That's why a comma-separated string is preferable.

I don't feel strongly about whether that ends up in comma-separated originals plus comma-separated translations being used for the autocomplete. That would solve both problems, no?

Yes, but that is also the case when it's an array and the plugin author changes the keywords… So I don't see your point here.

I think I gave a bad example here.

A single string would be as you mentioned above:

keywords: __( 'blockquote,quote' ) which the translator has to translate to blockquote,quote,zitat for German.

Then if the author wants to make the plugin accessible via the lyrics keyword (for song lyrics for example) and add lyrics to the mix, the string will become 'blockquote,quote,lyrics', which will require re-translation of the whole string in all languages as it has changed and no longer matches.

If it's kept as an array - keywords: [ __( 'blockquote' ), 'blockquote', __( 'quote' ), 'quote' ], the addition of lyrics will become just 2 more entries at the end of the array: [ __( 'blockquote' ), 'blockquote', __( 'quote' ), 'quote', __( 'lyrics' ), 'lyrics' ], meaning that the translators will have to translate only the word lyrics (which can also be already translated).

The autocomplete search already uses an array loop to find a match ( # ), so this change would be only required for the block configurations, not the search code.

If it's an array of 3 keywords, often times in German we have 4 or 5 keywords to describe the block. There's currently no way to support that.

That's a whole other set of problems for i18n :( Is there a supported way to add aliases for translations in the engine that's used by WordPress? That would be a great candidate.

Another way I'm thinking of right now would be to add a "dynamic" entry to the translation file, since it's already machine-generated (gutenberg.pot) that's something along the lines of <block-id>-aliases and translators can add all the aliases to that block for the language they're translating for?

This way we don't add too much syntax to the configuration and have the option of multiple aliases for the same block.

I had a whole tirade with wild ideas about managing breaking changes to strings, etc., :) but the reality is that this is why there is a cycle to WP and Gutenberg development. In WordPress core, there is an actual schedule that encompasses string freezes. There is no such thing in the Gutenberg plugin, though; the closest equivalent would be the narrow window between a plugin release candidate and its release (typically occurring 48 hours later). So there's room for improvement if we want to provide some localisation stability for users of the plugin who have chosen something other than English.

Going back to the issue at hand, I think that, overall, comma-separated strings would solve most issues. As Pascal points out, it lets locale managers deal with each locale's idiosyncrasies. There are even more technical precedents, such as delegating decisions on font families, so it seems more than fair that they should now decide not only what synonyms to provide for each keyword, but also whether to include fallback strings.

Then if the author wants to make the plugin accessible via the lyrics keyword (for song lyrics for example) and add lyrics to the mix, the string will become 'blockquote,quote,lyrics', which will require re-translation of the whole string in all languages as it has changed and no longer matches.

This is arguably a more tangible concern for third-party blocks than core ones, that's true. Honestly, though, given the flexibility of the comma-separated method, I think it would be fine if a block author particularly concerned with breaking translations and discoverability were to do the following:

  1. Starts with keywords: [ __('blockquote,quote') ], which has been translated to target locales.
  2. Decides to add lyrics: keywords: [ __('blockquote,quote,lyrics') ].
  3. Wants to allow time for translators to catch up, so temporarily declares keywords: [ __('blockquote,quote,lyrics'), __('blockquote,quote') ].

It's up to the block author to manage these deprecations and clean up keywords afterwards, but it would solve the problem. Naturally, keywords would be compiled by flattening the arrays and uniq'ing.
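A minimal sketch of that flatten-and-uniq step, under stated assumptions: compileKeywords is a hypothetical name, and the __ defined here is a stand-in for the real function from @wordpress/i18n, set up so that only the older string has a German translation yet.

```javascript
// Stand-in for __(): translators have only caught up with the older string.
const translations = { 'blockquote,quote': 'blockquote,quote,zitat' };
const __ = ( text ) => translations[ text ] || text;

// Hypothetical compile step: split each entry on commas, flatten the result,
// trim whitespace, and remove duplicates while keeping first-seen order.
function compileKeywords( keywords ) {
	const flat = keywords
		.flatMap( ( entry ) => entry.split( ',' ) )
		.map( ( keyword ) => keyword.trim() )
		.filter( Boolean );
	return [ ...new Set( flat ) ];
}

// Step 3 from the list above: new string first, old string as a fallback.
const keywords = [ __( 'blockquote,quote,lyrics' ), __( 'blockquote,quote' ) ];

console.log( compileKeywords( keywords ) );
// → [ 'blockquote', 'quote', 'lyrics', 'zitat' ]
```

The untranslated new string contributes lyrics, while the already translated old string still contributes zitat, so nothing is lost while translators catch up.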

That's a whole other set of problems for i18n :(

It's exactly what this issue here tries to address though. Quote:

In fact each language has specific needs (some won't need any keywords, others will need 2 or 3 according to the context).

I read this issue, and I think the proposal is better than what we have now.

Japanese users can't actually use these shortcuts.

So I tested adding some keywords to the Button block, as in 3c7ec60. In Japanese, I could find the Button block with the keywords 'link' and 'button', not only 'ボタン' or 'リンク'. I also tested the Heading block by adding the keywords __( 'title,subtitle,heading' ), and could find it with 'heading'.

It's a big change for me (and maybe for other non-English users), so if this approach looks good to you, I would like to add keywords to the other core blocks ASAP.
