Pkp-lib: Replace bespoke translation toolset with more standards-based options

Created on 18 May 2019  ·  112 Comments  ·  Source: pkp/pkp-lib

Currently, all translations are done in XML files, as mentioned in https://github.com/pkp/pkp-lib/issues/4029#issuecomment-417907420. This is very inefficient for translators, who have to translate or sync a handful of items across dozens of XML files.

Is there any chance of using a more advanced online translation platform such as https://crowdin.com/? In Crowdin, translators do all their work in a web browser and do not need to track which strings are still untranslated. Translations are deployed automatically with a new git commit. Can we consider it?

Most helpful comment

If we can set up the server, plus the code you wrote to make OJS understand XLIFF... it seems feasible to announce the translation server at PKPBCN19, doesn't it?

You probably skipped this question intentionally? ;-)

Yup, I'm planning to include some XLIFF-compatible tweaks that interested parties can experiment with for the 3.2 release.

All 112 comments

@hsluoyz, we have been hoping to replace our own translation tools with something else for some time, but unfortunately have not been able to make much progress on it. I was not aware of Crowdin, which does appear to have a free open source/academic plan:

Can I get an Open Source or a free Academic License?

Yes. If you want to use Crowdin for an Open Source project, sign up for a free account, set up your project and send us a request. Apply for an Academic License if your project has educational purposes. Each granted license will include an unlimited number of projects, strings, and members.

We're hesitant to use freemium services for necessary elements of the software, but some of the other translation options we've been considering are freemium as well.

(Tagging @mtub and @marcbria)

As you know, I'm not a fan of proprietary software, so my vote will always be no... even more so when we have free alternatives (free as in freedom; I'm OK if it's a paid service) that cover all the requirements (github/lab integration, import/export, translation memories, glossaries, multiple formats...). If somebody is interested, we made a comparison of the requirements.

So, in Heidelberg we started a Weblate instance that is still up and running.
I can keep it in production for PKP if you need hosting... but we need somebody with time to set it all up correctly. OJS native files need to be parsed and converted to XLIFF, and then mapped in the tool.

IMHO, it's a huge task at the beginning, but will make the translation task a piece of cake and facilitate the integration of non tech profiles in the translation team.

If we are not going to host our own tools (that IMHO is an error ;-)), SaaS could be an option, but...
a) it needs to be something based on free software...
b) and we need to be completely sure we can move everything outside the tool if in future we don't like the service conditions.

Weblate SaaS meets those 2 conditions while Crowdin doesn't.

You all know what happens when we trust in proprietary tools that start as "free of charge" to get enough people, and then move to a restrictive business model.

Hi @asmecher ,

I was not aware of Crowdin, which does appear to have a free open source/academic plan:

In fact, I'm using Crowdin in the docs site of my own project: https://crowdin.com/project/xxx, the site is here: https://xxx.org/. You can see there's an "English" button at the top to switch the translation. Of course there are many popular projects using it (see here), including Minecraft, Khan Academy and GitLab. Other people have also recommended it to me, and it currently seems to be the most popular online translation platform (correct me if I'm wrong).

Hi @marcbria,

As you know, I'm not a fan of proprietary software, so my vote will always be no... even more so when we have free alternatives (free as in freedom; I'm OK if it's a paid service) that cover all the requirements (github/lab integration, import/export, translation memories, glossaries, multiple formats...). If somebody is interested, we made a comparison of the requirements.

I think Crowdin has covered these requirements (GitLab integration not checked, as I'm using GitHub only).

a) it needs to be something based on free software...

Using a self-hosted translation tool indeed gives us more control. But it also takes much more effort. The main task of this project is academic journal manuscript software, not translation software. We don't need to build or host all services on our own. GitHub is actually non-free software but we are still using it for free and open-source projects, right? GitLab is still not as popular as GitHub.

We can let professionals do their professional job. Currently, this project (ojs) is already short of people, as many translations are incomplete (at least in Chinese, as I checked). We don't have the capacity to build/host a translation system.

b) and we need to be completely sure we can move everything outside the tool if in future we don't like the service conditions.

I understand your concern, but as I said above, much larger and more popular projects like Minecraft, Khan Academy and GitLab are already using Crowdin. We are not the ones who will be hit first when the sky falls. Even if Crowdin shut down one day, we would still have all the translation files (which will be stored in our repository). It's no worse than the current situation. We have nothing to lose.

Using a self-hosted translation tool indeed gives us more control.

Sounds like a good idea to me.

But it also takes much more effort.

Not necessarily. Weblate offers free hosting for free software projects.
If not, I offer my servers for free.

The main task of this project is academic journal manuscript software, not translation software.

Thanks for sharing your thoughts about the goal of OJS and PKP, but I think you are missing the whole picture.
From my perspective, the PKP project is not only about tools... it's mainly about "Public Knowledge" and, as I said, when we have free alternatives, I have no doubt that's the way to go.
We need to ensure we don't depend on proprietary initiatives and supporting free software is also a way to empower the whole community.

We don't need to build or host all services on our own.

Of course we don't, but we can if we like.
In the end, moving from our own translation tool to a community-built one is also a way to optimize our dev resources.

GitHub is actually non-free software but we are still using it for free and open-source projects, right?

And I think it is an error, but don't get me started... ;-)

I'm also OK with Weblate. I didn't know it before and found it to be excellent after some googling. I hope this platform will be ready soon so we can get started translating.

I'm also OK with Weblate. I didn't know it before and found it to be excellent after some googling.

Great. :+1:

I hope this platform will be ready soon so we can get started translating.

Me too, but there is a lack of hands. :-(

Never mind what platform we use... in all cases, we need to translate our native XML to something standard (XLIFF sounds like a good plan), then set up the tool to define translation units and configure the git-whatever export, and after this, change OJS (or every OxS tool) to read XLIFF instead of our native XML... and right now I have my hands full.

If somebody is interested in doing the job, I'm pretty sure he/she will make Marco very happy. ;-)

Till then, I'm sorry, but editing the XMLs or using the native translation tool are the only ways the community has to contribute translations.

@mtub, sorry to annoy you with this, but... you are the boss? ;-P
Is Weblate fine, or do you prefer something else?
Is this something to address in the Pittsburgh or Barcelona sprint this year?

Here are two draft PRs that alter OJS and pkp-lib to use XLIFF sources instead of the current PKP-specific XML files:

To use them...

  1. Pull in the above modifications to your installation
  2. Go into lib/pkp and update your composer dependencies (composer update)
  3. Convert your locale files from PKP XML into XLIFF:
for name in `(find locale/*/*.xml && find lib/pkp/locale/*/*.xml) | sed -e "s/xml$//" | grep -v bic21 | grep -v countries | grep -v currencies | grep -v languages | grep -v emailTemplates`; do php lib/pkp/tools/xmlToXliff.php ${name}xml ${name}xliff; done

(This is equivalent to running php lib/pkp/tools/xmlToXliff.php path/to/source-locale-file.xml path/to/target-xliff-file.xliff for all translations that are present, excepting plugins.)

  4. Flush your file cache: rm -f cache/*.php
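For anyone who finds the shell loop in step 3 hard to read, its file selection can be sketched in Python. This is only an illustration and not part of the PRs: the function name conversion_pairs is hypothetical, and the sketch just lists which source and target paths the loop would hand to xmlToXliff.php, without running the conversion itself.

```python
from pathlib import Path

# Substrings excluded by the `grep -v` filters in the shell loop.
EXCLUDED = ("bic21", "countries", "currencies", "languages", "emailTemplates")


def conversion_pairs(root):
    """Return (source_xml, target_xliff) path pairs, relative to root.

    Hypothetical helper: mirrors the two `find` patterns and the
    `grep -v` exclusions; plugin locale files are not covered.
    """
    root = Path(root)
    pairs = []
    for pattern in ("locale/*/*.xml", "lib/pkp/locale/*/*.xml"):
        for xml_path in sorted(root.glob(pattern)):
            rel = xml_path.relative_to(root).as_posix()
            # grep -v matches anywhere in the path, not just the file name.
            if any(word in rel for word in EXCLUDED):
                continue
            # Same effect as the sed step: swap the trailing "xml" for "xliff".
            pairs.append((rel, rel[: -len("xml")] + "xliff"))
    return pairs
```

Calling conversion_pairs(".") from the OJS root would give the list of files to convert, one pair per invocation of the PHP tool.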

This is a work in progress, but should allow experimentation with XLIFF-based translation tools to see how well they work with this toolset.

(@mbria, @marco: https://github.com/pkp/pkp-lib/issues/4779#issuecomment-496722877)

@asmecher that looks great.

If I read this correctly, with your changes OJS is now ready to read XLIFF and, as a bonus, we also have a PHP helper script to convert local XMLs to XLIFF, right?

You are a really fast coder!! ;-) Thanks a lot!!

BTW, Travis is complaining about something here: https://github.com/pkp/ojs/pull/2413
Do we need to worry?

@mtub, with Alec's changes, we now only need somebody to configure Weblate correctly and do some testing to see if we can integrate Weblate with GitLab.

I won't have time for this, at least, till the end of the next month. :-(
Is there anybody in PKP who can do the job, or some money to hire someone?

Cheers,
m.

Hi @marcbria,

BTW, Travis is complaining about something here: pkp/ojs#2413
Do we need to worry?

No, don't worry -- I didn't include the converted xliff files with the commits, so the tests will break because of untranslated locale keys. (It won't make sense to commit/maintain converted files until we're ready to take the plunge.)

I think the next step would be to get confirmation from someone who has worked with xliff files that the automatically-converted ones aren't totally crazy. I've attached one here for reference:
submission.xliff.txt

I forwarded your question to our CAT expert, and I hope he will answer in a couple of days.
Thanks a lot for your work Alec.

@marcbria, a few questions I'd want them to consider:

  1. Symbolic vs. English-language keys

We use symbolic locale keys in the code (e.g. navigation.journalHelp), then all locales, including English, are specified in locale files. This differs a bit from the Gettext standard in that usually English-language text would be embedded in the code, then the locale files would provide translations from English into other languages.

As a result, the XLIFF will have translations like this (for French):

     <segment>
        <source>author.submit.submissionCitations</source>
        <target>Fournir une liste structurée de références pour les travaux cités dans cette soumission.</target>
      </segment>

...instead of...

     <segment>
        <source>Provide a formatted list of references for works cited in this submission. Please separate individual references with a blank line.</source>
        <target>Fournir une liste structurée de références pour les travaux cités dans cette soumission.</target>
      </segment>

Will this work e.g. with Weblate?

  2. The distribution of locale files into various directories and repositories

The translations are split between a number of Git repositories:

Within the Application and pkp-lib repositories, there are several locale files (example: pkp-lib), divided roughly into topics. (I'm open to change on this, if it's not a good fit for standard practices.)

Tools like Pootle and Weblate appear to support Projects and Components. Will that mapping match well against our use of multiple repositories and sometimes multiple locale files within them?

Hi @asmecher

I have been out of the office a couple of days and I missed your last comment.
I will read it all in deep next Thuesday but let me advance some questions from Adrià (the CAT expert).

He needs more time, but at first sight he said he very much agrees with you on this point:

"We use symbolic locale keys in the code (e.g. navigation.journalHelp), then all locales, including English, are specified in locale files. This differs a bit from the Gettext standard in that usually English-language text would be embedded in the code, then the locale files would provide translations from English into other languages."

And he extends with:

"Of course, if the XLIFF file does not contain the original segments, the translation programs will not correctly recognize the file structure and the translators will not be able to translate.

I understand that the problem stems from the conversion process to XLIFF. If I remember correctly, when creating XLIFF you should ask the converter to leave the targets blank. If you want, send me the original file (the one behind submission.xliff) and I will try to take a look."

I sent him this one: https://github.com/pkp/pkp-lib/blob/master/locale/es_ES/submission.xml

I plan to meet him next week and look at Weblate together to see if we can make PKP a proposal that I think you won't be able to refuse. (Right now, I can't say more) ;-)

Cheers,
m.

Thanks, @marcbria, sounds very intriguing! The XLIFF conversion tool was put together fairly quickly and there are surely a lot of ways of adjusting it. The submission.xliff file linked above comes from https://github.com/pkp/pkp-lib/blob/master/locale/en_US/submission.xml.

Hi, any update on this?

Not yet. Sorry. Give us one or two more weeks.

Hey guys, any progress on this issue?

Nope. Thanks for your interest, and sorry again.
We have a meeting next week that (hopefully) will shed some light on some issues we still need to fix.

We arranged a meeting with some CAT experts for tomorrow night.
BTW, if somebody is an expert translator (good knowledge of translation formats and tools) opinions and suggestions are welcome.

Any update?

Sorry again for the silence. I'm overwhelmed and sometimes it is difficult to find time to write down what happened.

Long story short:

  • A fellow who is a chair in the OASIS XLIFF consortium expressed doubts about XLIFF being the best format for native OJS files and recommends PO instead.
  • Another fellow (also a CAT expert) wants to help us with the format decision (we would still like to do deeper research) and also with the Weblate configuration.
  • UAB offers its resources to host the PKP translation server for free.
  • If PKP says we can wait till then, we will start working on it after summer vacations (with luck, finished by the November sprint in BCN).

@asmecher and @mtub, what do you think about discussing this at the next technical meeting?
Or do you prefer a different space?

We are talking about this at the PGH Sprint, and Slack is down. Did we decide on .po files as the new standard?

Enjoy,

  • Clinton Graham
    Systems Developer
    University of Pittsburgh | University Library System
    412-383-1057

We figured XLIFF was a better match than PO, IIRC.

I can't shake a strong feeling that PO is not the right way, which is why I wanted to dig deeper to find projects working with XLIFF as a monolingual format (I read about Symfony, but no more than this).

Talking with the experts, I discovered the XLIFF format was created as a bilingual "transport" format. This means that its main goal is letting you move from your native format to something that every translation tool can read... but that didn't make much sense to us. I mean, if we are ready to move from our native XML format to something more standard, we won't need extra transformations and things will work smoothly (no real evidence for this right now... just a strong feeling).

I mean, our final goal is building a translation server that will be able to read and publish (push/pull) directly to gitHub/Lab, so keeping intermediate formats to do the job doesn't make much sense to me.

A CAT expert called Marc (yes, we are not original with names) would like to join the team and will help us (please Marc, say something if you are reading :-)). I'm in Mexico right now, so we plan to work on this after summer vacations.

The work will be:
1) Study what formats people are using and make a reasoned proposal to PKP so a final decision can be taken.
2) Specify clearly what the translation files need to look like so PKP can tune the code.
3) Set up Weblate to help us with the translation workflow (being able to push/pull to/from our gitHub/Lab).

I will be quite busy till PKPBCN19, but probably Nov. sprint will be a good moment to show advances and talk about this.

My question here is if PKP is able to wait till then or we need something before that.

We can certainly wait for the "right" solution. In the meantime, at the Pittsburgh Sprint I think we will update the Translation documentation so that it is accurate for the legacy XML files.

Thanks Clinton. So we have a plan. ;-)

Here in Mexico I completed some visibly missing strings for the es_ES translation, but (even I) am too lazy to push them to the different repos, so I can't imagine translators doing the job. I mean, we all agree on this, but I wanted to show why I think the translation server is a must.

BTW, I already knew it, but here I realized some terms need to be adapted, so I found a fellow who would like to join the translation team and create and maintain an es_MX localization (maybe based on the es_ES version, just fixing the main stuff). I hope to convince him to join us in Nov. in Barcelona and extend the OJS language list.

Best wishes for the Pittsburgh sprint. I'm sorry to miss this one.

I did a little bit of experimenting with POEdit 2.2 (which has XLIFF support and also built-in crowdin dot com support).

If we don't mind running tools over the XLIFF files, we should be able to represent our translations adequately using both English text and symbolic locale keys. This should make them usable both in OJS and by third-party translation tools.

For each locale key in a non-English language, it would look like this inside the XLIFF file:

<unit id="locale-key-goes-here">
    <segment>
        <source>English text goes here</source>
        <target>Translated text goes here</target>
    </segment>
</unit>

(Per XLIFF requirements, we would need to convert locale keys to use dashes - instead of periods . as separators.)

When OJS loads the XLIFF files, it can ignore the <source> text, and just use the unit ID to identify the text. Thus both OJS and the XLIFF editor are happy.
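To make the loading side concrete, here is a minimal Python sketch of reading such a file while ignoring <source>. It is not the code from the PRs: it assumes the XLIFF 2.0 namespace and a file/unit/segment layout, the function names load_translations and to_unit_id are hypothetical, and the sample French string is only illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: the files declare the XLIFF 2.0 namespace.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:2.0"}


def to_unit_id(locale_key):
    """Convert a symbolic locale key to the dashed unit id form."""
    return locale_key.replace(".", "-")


def load_translations(xliff_text):
    """Map each unit id to its <target> text; <source> is never read."""
    root = ET.fromstring(xliff_text)
    messages = {}
    for unit in root.iterfind(".//x:unit", XLIFF_NS):
        target = unit.find("x:segment/x:target", XLIFF_NS)
        if target is not None and target.text:
            messages[unit.get("id")] = target.text
    return messages


# Illustrative sample only; the translation text is made up.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0"
       version="2.0" srcLang="en" trgLang="fr">
  <file id="submission">
    <unit id="navigation-journalHelp">
      <segment>
        <source>Journal Help</source>
        <target>Aide de la revue</target>
      </segment>
    </unit>
  </file>
</xliff>"""

messages = load_translations(SAMPLE)
print(messages[to_unit_id("navigation.journalHelp")])
```

The sketch converts keys to dashes at lookup time rather than converting ids back to periods, since the reverse mapping would be ambiguous for any key that already contains a dash.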

In workflow terms, this will mean creating a tool that will munge non-English XLIFF files as follows:

  • Replace the contents of all <source> elements with their latest English equivalents from the en_US locale file
  • Flag all locale keys that appear in the translation, but not in English, as suspect/needing removal
  • Add empty placeholders in the translation file for untranslated English content

These are all things the translator plugin already does, albeit with our XML files rather than XLIFF.

It means we'll periodically have to run this over the translation files, e.g. working that into our translation schedule, but I can't think of any downsides beyond slight inconvenience.
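The three munging steps listed above can be sketched roughly as follows. This is a simplified Python illustration working on plain dictionaries instead of XLIFF files; the function name sync_translation and the data shapes are assumptions made for the example, not the translator plugin's actual interface.

```python
def sync_translation(english, translated):
    """Rebuild a translation's units from the current English locale.

    english:    dict of unit id -> current English text
    translated: dict of unit id -> existing translated text
    Returns (units, orphaned), where units is a list of
    (id, source, target) tuples in English-file order and orphaned
    lists the ids present in the translation but not in English.
    """
    units = []
    for unit_id, source in english.items():
        # Refresh <source> from English and keep any existing <target>;
        # an empty target is the placeholder for untranslated content.
        units.append((unit_id, source, translated.get(unit_id, "")))
    # Keys that no longer exist in English are reported as suspect.
    orphaned = sorted(uid for uid in translated if uid not in english)
    return units, orphaned
```

In this sketch the suspect keys are reported separately rather than kept in the output; a real tool might instead leave them in the file with a marker so that a translator can confirm the removal.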

This sounds good. My main concern stays that there will be translators who want to directly work on the translation files without using any other translation tool (POEdit, translation plugin…). It could still be possible, but how would we make sure that the source entries (English text) will stay unchanged? Additional comments:

  • It will be more obvious that English is the de-facto standard, something we tried to avoid saying in the past. I don't see a problem there.
  • In the past, we made it very clear that we did not want to have English text in the translated files so that missing translations can be easily spotted and as an incentive to complete the translation. We even released a plugin so that installations could revert this approach and display English text as a default when a translation is missing. Are we going to step away from this approach now? We will run into discussions then in cases where translators want to intentionally keep the English text because they think it works better than a translated string.

...How would we make sure that the source entries (English text) will stay unchanged?

We're following standard practice here -- when creating a translation template for the translator to work on, an empty XLIFF file is created by extracting English text directly from the source code using a variety of tools. No changes are ever applied back to the source code -- the <source> contents are essentially always throw-away with XLIFF, and the <target> elements are all that matters.

(Our only diversion from standard practice is that we will, in fact, have an English-language XLIFF file.)

we did not want to have English text in the translated files so that missing translations can be easily spotted and as an incentive to complete the translation. [...] Are we going to step away from this approach now?

No, we're going to keep using symbolic keys in the source code. There's no change here. XLIFF also has a nice flag for translations that need review; we can use that to hint to translators (through translation tools) that something might have changed in the English locale that could mean a change is required in the translator's locale.

Hi @asmecher,

The proposal you point in your post is one of the 3 options we have in front now.
Thanks a lot to take a look in detail and show that is feasible.

I admit it was really difficult for me to understand why we need to keep the original English string in our translation files, but this is how XLIFF works. It's a bilingual format (designed for "transportation"... that is, going from one format to something standard). In my head this is still a big overhead... and something useless that adds noise instead of making things easy.

Then I thought, "why not use XLIFF as a monolingual format?" Fewer strings to keep in sync... smaller files, but based on a more complete standard than PO.
After meeting our CAT experts (please, Marc and Adrià, say hello to the team to show that you are not my imaginary friends) and a fellow from the OASIS consortium (specialized in XLIFF), we did some preliminary research and only found a couple of projects using this approach. I'd like to talk with them further to discover why only a few are doing this, and do some testing to learn how Weblate will work with this.

What I'm still looking for is a KISS solution: I want OJS to work with a format Weblate can pull/push directly to github/Lab without any intermediate conversion script. Clean and simple. I want Weblate to manage the translation workflow and let our translation coordinator (@mtub) pull and push when he thinks the work is good enough to create a new branch that the dev team can merge.

The point is that the more I research, the more doubts I have about XLIFF because of its goal as a format, and I'm starting to understand why most people are mainly using PO, but I'd like to test Weblate and check all this with Marc and Adrià to be completely sure about what I'm saying.

...why most people are mainly using PO...

There is only one thing that XLIFF offers that I haven't found in PO files: the ability to associate IDs with strings.

In XLIFF files, for each string, we have...

  • English version
  • Translated version
  • Unit ID

In PO files, we only have

  • English version
  • Translated version

Most projects won't need to use the unit ID for anything, but for OJS, we need it for the symbolic locale key.

If there is a capacity for PO files to include an ID with each string, we could use the same whole toolchain with PO files instead.
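One candidate worth testing is gettext's optional msgctxt field, which PO files support alongside msgid and msgstr. Whether Weblate, POEdit and the PHP libraries discussed here handle it well enough for this purpose is not something this thread establishes. The Python sketch below only shows what an entry could look like if msgctxt carried the symbolic locale key; the helper names po_escape and po_entry are hypothetical and the French text is illustrative.

```python
def po_escape(text):
    """Escape backslashes, double quotes and newlines for a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def po_entry(locale_key, english, translated=""):
    """Format one PO entry, using msgctxt to hold the symbolic key.

    Assumption: msgctxt is repurposed as an identifier here; gettext
    itself defines it as a disambiguating context for msgid.
    """
    return (
        f'msgctxt "{po_escape(locale_key)}"\n'
        f'msgid "{po_escape(english)}"\n'
        f'msgstr "{po_escape(translated)}"\n'
    )


print(po_entry("navigation.journalHelp", "Journal Help", "Aide de la revue"))
```

This would keep all three pieces of information (key, English text, translation) in one entry, at the cost of using the context field for something other than its usual purpose.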

BTW, I asked Oscar... the oscarotero/gettext developer the right path and he suggests PO:
https://github.com/oscarotero/awesome-design/pull/1#issuecomment-522146338

BTW, I asked Oscar... the oscarotero/gettext developer the right path and he suggests PO

I trust his guidance in the general case but we have a few project-specific wrinkles. Using PO files would take us back into a situation where we either have to...

  • Convert all our source code so the English is hard-coded, then PO files translate from English to Whatever, or
  • Use symbolic.locale.keys instead of English text in the PO file msgid, which I suspect won't work well with any of the translation editing/creation tools, as they wouldn't have access to the English text to present to translators.

I think both of those are show-stoppers.

We can work around that problem with XLIFF as I described above, by embedding the symbolic.locale.key in the unit ID attribute of the XLIFF file so that it contains all 3 pieces of information -- locale key, English text, and translated text. But I don't think PO files have an equivalent ID field.

All of this stems from the decision back in the first days of OJS 2.0 to map from symbolic.locale.keys to localized text, instead of English language into other languages.

Convert all our source code so the English is hard-coded, then PO files translate from English to Whatever, or

You're all going to hate me but I really think this should be our long-term goal. I think it's admirable to try to avoid giving English preference over other languages, but our current approach of using symbolic keys is structured in a way that actively encourages the introduction of bugs during the developer workflow.

The great benefit of PO's approach of putting text directly into the source code is that it puts related concerns together. By separating text from the source code, we regularly introduce three problems when we write new code:

  • Typos are introduced to symbolic keys, typically when updating something that is not clearly visible during development or secondary to the focus of the work being done (eg - notifications, author/reviewer workflows).
  • Locale files are not loaded when they are supposed to be, typically when there is more than one route to a screen and the locale file is loaded during one route but not another. Or when updating something not clearly visible during development.
  • Outdated locale strings are not removed when they are no longer used. And in fact we have no way to conclusively determine that a locale string is no longer used, so they may remain forever.

Any translation solution that we aim for should also aim to reduce the number of bugs we introduce to the system and the ease of keeping the codebase clean of outdated stuff.

I've specifically said "aim for" and "long-term goal" because it is probably not feasible for us to jump directly to inlined language. If I understand Alec's XLIFF proposal to use id attributes as symbolic keys, I wonder if this gives us a route towards inlined language in the future.

Would it be possible to support two ways of using translatable strings like the following?

// In the source code
$current = __('locale-key-current', ['number' => 10]);
$total = __('{$total} items in total', ['total' => 100]);
<!-- In the locale files -->
<unit id="locale-key-current">
    <segment>
        <source>Showing {$number} items</source>
        <target>Montrant {$number} articles</target>
    </segment>
</unit>
<unit>
    <segment>
        <source>{$total} items in total</source>
        <target>{$total} articles au total</target>
    </segment>
</unit>

In the medium term, our locale script would first look up a translated string by id. If none was found, it would then match against <source>.
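A rough Python sketch of that two-step lookup, under stated assumptions: the function name translate, the two dictionaries, and the choice to fall back to the input string when nothing matches are all hypothetical, and the {$name} substitution only mimics the placeholder style used in the example above.

```python
import re


def translate(key_or_text, params, by_id, by_source):
    """Look up by unit id first, then by English source text.

    by_id:     dict of unit id -> translated text
    by_source: dict of English source text -> translated text
    Assumption: when nothing matches, the input itself is returned.
    """
    template = by_id.get(key_or_text)
    if template is None:
        template = by_source.get(key_or_text, key_or_text)
    # Substitute {$name} placeholders, leaving unknown ones untouched.
    return re.sub(
        r"\{\$(\w+)\}",
        lambda m: str(params.get(m.group(1), m.group(0))),
        template,
    )


by_id = {"locale-key-current": "Montrant {$number} articles"}
by_source = {"{$total} items in total": "{$total} articles au total"}

print(translate("locale-key-current", {"number": 10}, by_id, by_source))
print(translate("{$total} items in total", {"total": 100}, by_id, by_source))
```

Keeping both dictionaries side by side is what would allow code to migrate from symbolic keys to inlined English one call at a time.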

If this is possible, this would give us a near-term migration that would get our translators on to sensible tools and allow them to start translating now. Then, over time, we would be able to refactor our source code to use inlined language, gradually reducing the amount of code we maintain the old way and reducing the number of translation-related bugs we introduce.

I understand this would not solve the concern that English is the primary language. But I don't see a way around this. The source code is written in English. English is the only common language for all of our developers and the language in which core development is discussed.

The best thing we can do to support other languages is to make it easy to translate the software and easy to maintain those translations over time. Inlined language will help us do that.

Convert all our source code so the English is hard-coded, then PO files translate from English to Whatever...

... I really think this should be our long-term goal.

^ Second.

IIRC, the reason this is a "show stopper" is because it will be painful and initially brittle to make the change. But, if this _('English text') is the industry standard (and we are the only project I know which is using symbolic keys), then we should be moving in that direction.

I updated the PRs while in transit between different corners of South America without Internet access and am only uploading the results now -- lest everyone think I'm ignoring the discussion above :)

It requires https://github.com/oscarotero/Gettext/pull/221 (which is not IMO ready for inclusion).

I'll consider the rest of this discussion and see how I feel about it in a few days -- thanks for the input, everyone!

So based on the opinions of @ctgraham and @NateWr I can be convinced to set our long-term goal for what seems to be the industry standard of unilingual text in the code, and mappings from there to other languages. That gives us a series of major transitions...

  1. Change the translation file format from PKP's XML to something standard
  2. Change the translation tools over from our home-brew stuff to something 3rd-party
  3. Change the text in the code from symbolic locale keys over to English-language text.

All three of those are pretty major, so I propose staging it like this:

Stage 1:

Stage 2:

  • Convert code to use English text instead of symbolic locale keys
  • (If desired, convert XLIFF to PO.)

PO and XLIFF are interchangeable in as far as we would use them, so I don't think a conversion later would be a big deal, if we think it's warranted. But if we wanted to use PO files, we'd need to move the English text into the code first.

Don't forget that we'd need to push the changes out to all the plugins etc., so we might want to leave considerable time between stage 1 and stage 2 for the adaptations to take hold!

@NateWr I can't hate you, but... :-)

There is only one thing that XLIFF offers that I haven't found in PO files: the ability to associate IDs with strings.

Please take a look at this post:
https://phptherightway.com/#discussion-on-l10n-keys

Using "msgid as a unique, structured key", as @asmecher suggested in an earlier post, is quite a widespread practice.

As Alec said, I also suspected it "won't work well with any of the translation editing/creation tools, as they wouldn't have access to the English text to present to translators" so this is why I was asking for time to test weblate/POedit/etc. before taking the decision.

PO and XLIFF are interchangeable in as far as we would use them, so I don't think a conversion later would be a big deal.

I have my concerns about this. In an earlier post Oscar commented that his library had not been tested thoroughly with XLIFF files... so we had better expect surprises.

If this _('English text') is the industry standard (and we are the only project I know which is using symbolic keys), then we should be moving in that direction.

Well, the main projects I know a little (Drupal and WordPress) hardcode English text but, as pointed out before, we are not "the only project using symbolic keys" and there is no standard here.

More than this, it's a fact that, after 20 years, PO is the _de facto standard_, but the format is quite limited and some big projects (such as Symfony) are including XLIFF, YAML or JSON in their i18n libraries, so it looks like something is changing.

I mean, I understand Nate's arguments for including hardcoded English text in the code and I really don't mind if English is always the primary language (because it is a fact and nothing to hide), but I still see benefits in keeping our symbolic locale keys. Not a very thought-out list, but:

  • Typos introduced in English will mean a cascade of changes in all languages.
  • The same will happen if we want to change the writing style in English.
  • Translators usually work without context. Structured keys offer a little context that could be very useful.

Symfony people give more arguments:
https://symfony.com/doc/current/components/translation/usage.html#creating-translations

In short... if we keep symbolic keys, we have a little more flexibility in translations: we don't need to make literal translations and we can adapt them to each local need without being afraid that the English string will change a comma or a capital letter.
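The cascade concern can be illustrated with a toy example. This is only a sketch in Python with made-up keys and strings; real gettext tooling such as `msgmerge` can often carry an entry over as a fuzzy match, so the effect in practice is a review flag rather than a silent loss.

```python
# Translations keyed by the English source text (bilingual style).
# The English string contains a typo that will later be corrected.
by_source = {"Sumbit article": "Enviar artículo"}

# Translations keyed by a symbolic locale key (monolingual style).
# The key name is hypothetical.
by_key = {"submission.submit": "Enviar artículo"}


def lookup(catalogue, msgid):
    """Return the translation, or None when the msgid is unknown."""
    return catalogue.get(msgid)


# Fixing the typo in the code changes the msgid, so the old entry no longer matches.
orphaned = lookup(by_source, "Submit article")

# The symbolic key is unaffected by edits to the English wording.
stable = lookup(by_key, "submission.submit")
```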

Anyway, you are right that we need QA tools to remove locale strings that are no longer used. Hopefully, with standard formats, we can find a tool for this.

So, my personal conclusion here is:

Yes. PHP community is mainly working with PO, Oscar suggested PO, CAT experts won't recommend XLIFF and also suggested PO instead... I mean, if we need to take the decision right now I will go with PO, but I'm a chicken with big decisions and I want to be completely sure. ;-)

But: IF we decide PO is the right move... is it too crazy to move directly to PO (with "msgid as a unique, structured key")? Just asking because I imagine moving to XLIFF will be a huge task, so if we are going to do it... why not go directly to what we want?

But: IF we decide PO is the right move... is it too crazy to move directly to PO (with "msgid as a unique, structured key")?

If you can find a good, free translation toolset that supports what you're proposing (PO files that are structured using symbolic keys), I'm game. But so far I haven't found one, which means the translators would be left trying to translate some.locale.key into French rather than the actual English source text.

(Thanks for the info on Symfony -- they appear to use Loco for their translations, which presumably would support the locale key-based philosophy, but they aren't FOSS and their free account level wouldn't remotely support our needs -- so I'm hesitant to bank on them.)

Hello, this is Marc Riera (the other Marc mentioned by @marcbria).

Sorry for jumping into the conversation this late, I would have appeared earlier but I wanted to experiment with possible solutions myself before proposing them and was not able to do so until now.

Basically, given the existing format (strings in XML called by the program using IDs), the easiest solution (as stated some posts above) would be to use XLIFF. This provides three key advantages:

  1. Minimal changes (no need to embed text in the source code, very similar to current approach).
  2. Context-aware translation (identical strings with different IDs can be translated differently).
  3. Standard-compliant format.

The approach would be to use two types of XLIFF files: one for the "base" language (English) containing only the IDs and source text, and another for the rest of languages, containing IDs, source text and target text. This is used in a project I am part of, openBVE, which switched to XLIFF recently after years using key=value text files (check https://github.com/leezer3/OpenBVE, assets/Languages, en-US.xlf is the base file).

I am completely aware of potential issues already mentioned in this discussion, especially keeping all the translation files in sync between languages. Fortunately, there is a great piece of software that would do the hard work for you and allow anyone to contribute to the translation directly from their browser: Weblate (https://weblate.org). It needs to be hosted somewhere, but libre projects may be hosted free of charge under certain conditions (check https://hosted.weblate.org/hosting/ for more information).

I did some tests myself with openBVE and Weblate with excellent results. With repository access configured using an application password, the program adds new strings for translation when they are added in the repository, and pushes translations to the XLIFF files automatically as users translate. The base language file is used as a template, meaning that adding new strings to the base file is enough: they are automatically added for the other language files. There is even the possibility of editing the base file inside Weblate to add, edit and remove source strings, removing the need to directly modify the XLIFF files.
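The template behaviour described above can be sketched roughly as follows. This is an illustration in Python, not Weblate's actual implementation; the dictionary layout and the function name are assumptions, and dropping ids that are no longer in the base file is a choice made here for the example.

```python
def sync_with_base(base, translation):
    """Align a translation with the base file.

    base:        unit id -> English source text
    translation: unit id -> translated text
    Returns unit id -> translated text, or None for untranslated units.
    """
    synced = {}
    for unit_id in base:
        # Keep an existing translation; new ids start out untranslated.
        synced[unit_id] = translation.get(unit_id)
    # Ids that are absent from the base file are not copied (assumption).
    return synced


base = {"common.save": "Save", "common.cancel": "Cancel"}
french = {"common.save": "Enregistrer", "common.obsolete": "Obsolète"}
synced = sync_with_base(base, french)
```

Here `common.cancel` appears in the French file as an untranslated unit, and `common.obsolete` disappears because the base file no longer defines it.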

Taking all this into account, using XLIFF seems logical, even if it is not widely used as a final format like and is technically an interchange format. Moving to the PO format in a "second stage" after the switch to XLIFF would then look like a step backwards (having English strings embedded in the source code would not provide any advantage in my opinion).

Regards,

Marc

@MarcRiera, thank you, that is very helpful. What you're proposing is my "stage 1" but without stage 2. It's good to know Weblate operates well in this mode; I've already confirmed that poedit does as well, and since poedit has integration with Crowdin, I suspect that'll serve too. Which suggests to me that we may have a decent ecosystem of translation software to scout through for good GitHub integration.

@ctgraham, @NateWr, @marcbria, do you feel like we're getting towards a workable plan?

Yes, I think XLIFF with ids now for better translation tooling, with a goal to eliminate some of the technical maintenance issues down the line:

  • Ability to identify and remove unused strings.
  • Prevent "forgot to load" translation file errors (would be great if all locale strings were available or loaded in automatically somehow).

If we can sort those two things out in the long run I'd be happy to keep id-based locale strings in the source code, rather than English.

Ability to identify and remove unused strings.

There should be tools from the text extraction phase of the standard translation process (whatever fishes English-language text from the code for compilation into a .pot or .xliff file) -- I'll investigate these. We'll need to support both PHP and Smarty. I used to have a few homegrown scripts to help with this but they were unreliable. I think the solution will involve some consistently-applied coding standards (e.g. never concatenate locale strings) as much as anything.

Prevent "forgot to load" translation file errors (would be great if all locale strings were available or loaded in automatically somehow).

I've been thinking about this too. We originally had all translations compiled into a single XML for each language, and cached the XML to flat files using our current caching methodology. At the time we considered the cost of loading all strings into memory for each request to be prohibitive; on the one hand, memory is cheaper than it was, but on the other hand each page load now involves many requests and the system has grown more complicated (= more translations).

Glancing at my cache directory, the English text for OJS comes to about 326kb of written-out PHP arrays (which is pretty efficient, size-wise). The more I think about it, the more this pales in comparison with the overall system size -- and as an added benefit, these PHP files are going to be bytecode-compiled and cached by most PHP installations.

So technically I see benefits to going to a single locale file per language (well, per repository, since we'll still have pkp-lib and OJS and plugins to consider). But I'd like word from a few of our translators (@MarcRiera, your opinion is welcome!) on what the impact would be to translators. XLIFF and PO both have organizational tools to sort translations into categories, but I suspect using those would lead us back to the same situation as "forgot to load" errors give us -- e.g. for .po files, a translation in the wrong context is equivalent to a missing translation as far as the calling code is concerned.

Ability to identify and remove unused strings.

Needs to be tested, but this plugin is supposed to do the job:
https://docs.weblate.org/en/latest/admin/addons.html#cleanup-translation-files

Prevent "forgot to load" translation file errors (would be great if all locale strings were available or loaded in automatically somehow).

I also have been thinking about this for a while and I have doubts.

My first thought is "a project as big as OJS with a single file sounds like a bad idea".

Yes... memory is cheaper now, but thinking this way is bad programming, isn't it?
I mean, this approach will make OJS more resource-hungry (right now it is really lightweight). That won't be a problem in single installations/cached platforms, but think of virtualization or containers where you won't be able to cache.

Apart from this, during development those single files will be touched by everybody at the same time, so I see a potential collision point here. And the most important point: this change will also mean more work for "stage 1", so more things can fail... so why move this way if it won't completely fix the issue we are trying to address?

A structured approach based on folders (like we have now, with strings in common.xml if they appear in multiple folders) is much more efficient in resource usage, will give more context (to developers and translators) and will make migration easier.

And in the end (please @MarcRiera correct me) developers don't need to worry much if they repeat a few strings in different translation files because a translation server will help us look for matches and keep the strings in sync.

On the other hand, it's true that reducing the number of translation files will make developers' work easier (no need to grep to discover where to place the translation string) and adding 400k to our memory requirements doesn't look like a big deal (it will be less if we compile PO to MO)... and it will probably facilitate the migration to PO (if "stage 2" is still a requirement).

So I don't have a clear winner here, but I think a conservative approach is better: plan a "stage 1" that is as simple as possible and avoid the unification.

Cheers,
m.

Files don't necessarily have to be combined into one in order to load them automatically -- either at once or on-demand. For example, an index could link keys to files and they could be loaded when an unloaded key is requested.
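A rough sketch of that idea, in Python rather than PHP. The class name, the index layout and the `load_file` callable are hypothetical; the `##key##` fallback mirrors the marker mentioned elsewhere in this thread.

```python
class LazyLocale:
    """Load a locale file only when one of its keys is first requested."""

    def __init__(self, index, load_file):
        self.index = index          # locale key -> file name
        self.load_file = load_file  # callable: file name -> {key: text}
        self.loaded_files = set()
        self.strings = {}

    def get(self, key):
        if key not in self.strings:
            filename = self.index.get(key)
            # Load the owning file once; later keys from it are already cached.
            if filename is not None and filename not in self.loaded_files:
                self.strings.update(self.load_file(filename))
                self.loaded_files.add(filename)
        # Unknown keys are returned wrapped in the missing-key marker.
        return self.strings.get(key, f"##{key}##")
```

The calling code never names a file: it only asks for a key, which is what removes the "forgot to load" class of errors.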

I just saw this discussion, so I'll leave my two cents =]

  • Keys vs plain text
    Keyed translations is the official standard in C# and it works fine. Both have pros/cons, writing text directly in English is cool, it will leave a less enigmatic message than ##key##, but it's more verbose/not as flexible as the keys (e.g. sounds dirty to pass and keep text around functions/classes instead of short keys). So, if I had to choose I would go with keys.

  • Breaking/organizing in files
    In my opinion loading translation files manually is error prone. E.g.: One piece of code I wrote was working only because the key was loaded by another file, and double checking if every key you used is being loaded by your code isn't very friendly... So if we're going to keep them organized, it's doable to build an indexer and auto load the right file.
    About organizing, I would personally avoid it... Since that will be an extra thing to think when creating a key (like naming variables).

  • Duplicated/missing keys
    Here I think we need some tooling, before pushing code, one could run a linter, which would basically display similar entries (levenshtein?!) and detect missing keys. I think it's also doable to write such tool if it doesn't exist.
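Such a linter could look roughly like the sketch below. It is written in Python and uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure, which is not Levenshtein distance; the function names and the 0.9 threshold are assumptions.

```python
import difflib


def missing_keys(used_keys, catalogue):
    """Keys referenced in code that have no entry in the catalogue."""
    return sorted(set(used_keys) - set(catalogue))


def similar_entries(catalogue, threshold=0.9):
    """Pairs of keys whose texts are nearly identical (possible duplicates)."""
    pairs = []
    items = sorted(catalogue.items())
    for i, (key_a, text_a) in enumerate(items):
        for key_b, text_b in items[i + 1:]:
            ratio = difflib.SequenceMatcher(
                None, text_a.lower(), text_b.lower()
            ).ratio()
            if ratio >= threshold:
                pairs.append((key_a, key_b))
    return pairs
```

The pairwise comparison is quadratic in the number of strings, which is probably acceptable for an occasional pre-push check but would need care on a full catalogue.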

The remaining problems will probably be resolved by adopting crowdin et al.

@NateWr and @jonasraoni, on this idea...

Files don't necessarily have to be combined into one in order to load them automatically -- either at once or on-demand. For example, an index could link keys to files and they could be loaded when an unloaded key is requested.

Thinking this over, there are two approaches off the top of my head:

  • Automatic index maintenance, where the index is regenerated when necessary. This would require too many file modification date checks to determine whether the file is old. (I'm not sure whether relying e.g. on a directory's file modification time would be reliable cross-platform? If it is, then two would be all that's needed.)
  • Manual index maintenance, where a script needs running after a locale file modification. I'm not super enthusiastic about this -- it seems like a pain and a common point of confusion.
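For the automatic option, the staleness check might look like this sketch (Python, with hypothetical paths and function name). It makes the cost visible: one modification-time lookup per locale file on every check.

```python
import os


def index_is_stale(index_path, locale_files):
    """Return True when the index is missing or older than any locale file."""
    if not os.path.exists(index_path):
        return True
    index_mtime = os.path.getmtime(index_path)
    # One stat call per locale file: this is the overhead discussed above.
    return any(os.path.getmtime(path) > index_mtime for path in locale_files)
```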

And in the end (please @MarcRiera correct me) developers don't need to worry much if they repeat a few strings in different translation files because a translation server will help us look for matches and keep the strings in sync.

On the other hand, it's true that reducing the number of translation files will make developers' work easier (no need to grep to discover where to place the translation string) and adding 400k to our memory requirements doesn't look like a big deal (it will be less if we compile PO to MO)... and it will probably facilitate the migration to PO (if "stage 2" is still a requirement).

Yes, it is always better to have duplicate strings than trying to save resources by calling the same string from different parts of the code. There are situations where a target language may need different translations depending on the context, so if everything was reused and there was such a situation, specific action by the developers would be necessary. In addition, CAT tools and translation platforms (such as Weblate) detect repetitions and similar strings, so it would be minimal effort by the translator.

there are two approaches off the top of my head

I was expecting a third approach, which would be a script to pre-compile the index. I expected it to be a pre-commit hook, so that the index is generated and automatically committed whenever a change is required.

If we find that's too difficult to do for some reason, it could be run during packaging, with an alternate developer mode that would run without the index during development, similar to how our legacy JS files are compiled.

pre-commit hook

That's totally do-able, but would exclude anyone working with translations outside of a git environment. Translation tweaks are a very frequent modification.

My understanding is that the workflow software we're adopting will commit changes back to the project. Even if translation tweaks are modified outside of git, one of us still has to commit it, right?

What I mean is that direct modifications to the locale files are one of the most frequent tweaks made by end users. (We do have the custom locale plugin to help avoid this, but it's not universally used, nor very well polished.)

Ahh..... hmm. What about a cached file that could be cleared through the admin like template/css cache?

A PHP data cache would be fine, much as we already have for locale XML; the existing data reset tool would work for that as well. It would be a change in expectations, though. If we synchronize it with the move to XLIFF, it might be something we could tuck into a new workflow without needing a second round of disruptions...

Files don't necessarily have to be combined into one in order to load them automatically -- either at once or on-demand. For example, an index could link keys to files and they could be loaded when an unloaded key is requested.

Thanks Nate. I hadn't thought of this. :+1:

My understanding is that the workflow software we're adopting will commit changes back to the project.
Even if translation tweaks are modified outside of git, one of us still has to commit it, right?

To clarify this: Yes, "commit back/forward to/from project" is one of the main goals the translation server needs to accomplish automatically, but... what else can be "done outside git"? I mean, changes will be in the translation server (by translators) or in git (by developers), isn't it?

A PHP data cache would be fine, much as we already have for locale XML; the existing data reset tool would work for that as well. It would be a change in expectations, though. If we synchronize it with the move to XLIFF, it might be something we could tuck into a new workflow without needing a second round of disruptions...

Sorry... I'm completely lost here. @asmecher do you think you can simplify this for dummies?
If it's not possible... no worries. I can live perfectly well without knowing about this. :-)

The remaining problems will probably be resolved by adopting crowdin et al.

Thanks @jonasraoni for your "two cents". They make sense to me but let's see what dev guys say. :-)

Just one comment. About Crowdin, I pointed out license issues here, which is why I encourage using Weblate instead.

Finally... @asmecher, in parallel to the deep-dev discussion, do you think it's safe to go with "stage 1"?

If it's OK with you, @MarcRiera and I planned to start working on this during September (so I hope we can clarify those questions) and do the Weblate configuration/testing during this month.

If we can set up the server, plus the code you wrote to make OJS understand XLIFF... it seems feasible to announce the translation server at PKPBCN19, doesn't it?

do you think it's safe to go with "stage 1"?

I think it's safe to go ahead with the stage 1 proposal as Alec has described it (https://github.com/pkp/pkp-lib/issues/4779#issuecomment-524495563). All of our discussions are about how to improve things beyond stage 1.

do you think you can simplify this for dummies?

We're talking about how to automatically build a file that will tell us where to look for translations. So, for example, when the code hits __('my.locale.string'), the application will know which locale file to parse and load. That way we don't have to load every translation file every time (which is not performant), but we also don't have to manually load the correct one (which is prone to mistakes).

The solution we're discussing regarding a PHP data cache is similar to how CSS and Smarty (.tpl) files are built and cached, because rebuilding them for every page load is not performant.

If it's OK with you, @MarcRiera and I planned to start working on this during September (so I hope we can clarify those questions) and do the Weblate configuration/testing during this month.

@marcbria, the plan outlined in the 2 stages removes the need for us to check whether Weblate works with symbolic keys -- I think we're OK on that front. But if you could run a quick Weblate test with e.g. the French XLIFF samples I've generated, that would be excellent -- the only thing we need to ensure is that Weblate will preserve the <unit id="..."> attribute while editing.
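One way to run that check is to compare the unit ids in a file before and after a round trip through the editor. The sketch below is Python using only the standard library; it assumes XLIFF 2.0 files (namespace `urn:oasis:names:tc:xliff:document:2.0`), and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"


def unit_ids(xliff_text):
    """Return the sorted id attributes of every <unit> in an XLIFF 2.0 document."""
    root = ET.fromstring(xliff_text)
    # A unit without an id is recorded as "" so that it shows up as a difference.
    return sorted(unit.get("id", "") for unit in root.iter(f"{{{XLIFF_NS}}}unit"))


def ids_preserved(before, after):
    """True when both documents contain exactly the same set of unit ids."""
    return unit_ids(before) == unit_ids(after)
```

Feeding it the file contents as exported from the repository and as written back by the translation tool would flag any id that was dropped or renamed.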

We're talking about how to automatically build a file that will tell us where to look for translations. So, for example, when the code hits __('my.locale.string'), the application will know which locale file to parse and load. That way we don't have to load every translation file every time (which is not performant), but we also don't have to manually load the correct one (which is prone to mistakes).

Muuuch more clear now. Thanks @NateWr. ;-)

the plan outlined in the 2 stages removes the need for us to check whether Weblate works with symbolic keys -- I think we're OK on that front.

But are we completely sure about moving to PO (stage2)? Even nextcloud (IMHO one of the best php developments ever) is avoiding PO and is going to JSON.

And yes... XLIFF was originally intended for "transportation", but in our case it will mean a "minimal" change from our native XML format. Other projects started this way, and we can talk with the OASIS XLIFF folks to ask them to encourage people to go down this road.

If you three (@asmecher , @NateWr and @ctgraham) are sure about this I won't ask again, but I can't stop thinking we are moving in the wrong direction.

But if you could run a quick Weblate test with e.g. the French XLIFF samples I've generated, that would be excellent -- the only thing we need to ensure is that Weblate will preserve the <unit id="..."> attribute while editing.

Thank you both.
We can do this test but I think @MarcRiera did the job before and said the key is preserved.

About weblate, I was concerned about all other features we pointed as a requirement (push/pull, workflows, permissions/roles, glossaries, translation memories... that we wrote somewhere but I can't find it now. @mtub do you have a copy somewhere?)

If we can set up the server, plus the code you wrote to make OJS understand XLIFF... it seems feasible to announce the translation server at PKPBCN19, doesn't it?

You probably intentionally missed to answer this question? ;-)

I found the notes Marco took in Heidelberg:
slides.html.txt

And here we made a comparative between weblate and transifex: (that include requirements)
https://docs.google.com/spreadsheets/d/1rSp350oJEEb6PYOfjpMzQnlTNGOH_UiJDpZU2GbXWbE/edit#gid=2143124171

If we can set up the server, plus the code you wrote to make OJS understand XLIFF... it seems feasible to announce the translation server at PKPBCN19, doesn't it?

You probably intentionally missed to answer this question? ;-)

Yup, I'm planning to include some XLIFF-compatible tweaks that interested parties can experiment with for the 3.2 release.

Gettext library feature add for supporting XLIFF unit IDs is now merged: https://github.com/oscarotero/Gettext/pull/221#event-2634582256

Another blocker, unfortunately :) https://github.com/oscarotero/Gettext/issues/224

Latest update:

I'm tinkering with both using Weblate because Weblate manages both monolingual (symbolic locale keys) and bilingual (main language in source code, mapping from there to secondary languages) modes for both file formats. (See https://docs.weblate.org/en/latest/formats.html for the list.)

Command line to batch-convert XML to XLIFF:

for locale in `ls locale`; do for file in `fgrep -l locale.dtd locale/$locale/*.xml | cut -d "." -f 1`; do php lib/pkp/tools/xmlToXliff.php `echo $file.xml | sed -e "s/$locale/en_US/"` $file.xml $file.xlf; done; done
for locale in `ls lib/pkp/locale`; do for file in `fgrep -l locale.dtd lib/pkp/locale/$locale/*.xml | cut -d "." -f 1`; do php lib/pkp/tools/xmlToXliff.php `echo $file.xml | sed -e "s/$locale/en_US/"` $file.xml $file.xlf; done; done

I'm still favouring XLIFF because our XLIFF files are bog-standard, rather than stepping outside the spec, as monolingual PO files do (even if they're used in some projects in practice).

Yikes, it looks like Weblate may not support XLIFF 2.0!

Commands to batch-convert XML to PO:

for locale in `ls locale`; do for file in `fgrep -l locale.dtd locale/$locale/*.xml | cut -d "." -f 1`; do php lib/pkp/tools/xmlToPo.php $file.xml $file.po; done; done
for locale in `ls lib/pkp/locale`; do for file in `fgrep -l locale.dtd lib/pkp/locale/$locale/*.xml | cut -d "." -f 1`; do php lib/pkp/tools/xmlToPo.php $file.xml $file.po; done; done

A teaser :)

image

@asmecher this is impressive!! It's almost finished!
Is it too crazy to think that we will be able to do the OJS 3.2 es_ES and ca_ES translations on Weblate? :-)

I didn't find time to contact @MarcRiera and make the tests we promised.
I will be a little more relaxed at the end of the month...

Cheers,
m.

@marcbria wrote:

I didn't find time to contact @MarcRiera and make the tests we promised.

Because https://github.com/oscarotero/Gettext works with XLIFF 2.0, but Weblate seems to only work with XLIFF 1.2, I've chosen (at least for now) to focus on "monolingual PO" as our chosen format. So if you were planning to experiment with the sample XLIFF, I'd suggest holding off on that for now. Here are some sample PO files -- Weblate appears to work well with them in monolingual mode.

I missed this one: :-(

Yikes, it looks like Weblate may not support XLIFF 2.0!

So this other one made me think you were now focused on XLIFF and succeeding with it:

I'm still favouring XLIFF because our XLIFF files are bog-standard, rather than stepping outside the spec, as monolingual PO files do (even if they're used in some projects in practice).

I just asked on the Weblate GitHub whether they are planning to support XLIFF 2.0 anytime soon.

Otherwise, I'm unsure about the options we have here:
a) move directly to PO.
b) look for a different free software translation server.
c) find how to downgrade to 1.2.
d) ...

Looking into the differences between the two XLIFF specifications, the downgrade (c) will be complex.
We looked hard but [1] we didn't find any good alternative free software (b) to do the job... so, does it mean we need to go with (a)?

Please Alec, let us know if we can help with something.

[1] [mojito](https://www.mojito.global/docs/refs/mojito-file-formats/#xliff-example) looks promising and supports xliff 2.0, but it's still very simple compared to weblate.

I just asked on the Weblate GitHub whether they are planning to support XLIFF 2.0 anytime soon.

Here's an already-open issue for XLIFF 2.0 support in Weblate: https://github.com/WeblateOrg/weblate/issues/972

I'm OK to go with a) move directly to PO, as long as everyone understands that we're going to be using monolingual PO files rather than bilingual PO files. This is not how PO files were initially intended, but there are projects that use them this way, and Weblate includes support for it.
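To make the monolingual layout concrete: the `msgid` holds the symbolic key in every file, the en_US file's `msgstr` holds the English text, and each other locale's `msgstr` holds its translation. The Python sketch below pairs them up the way a translation tool would need to in order to show English to the translator. The parser is deliberately simplified (single-line entries, no escaped quotes, no `msgctxt` or plurals) and all names and sample strings are assumptions.

```python
import re

# Matches only single-line msgid/msgstr pairs; a real PO parser handles much more.
ENTRY = re.compile(r'msgid "(?P<id>[^"]*)"\s*\nmsgstr "(?P<str>[^"]*)"')


def parse_po(text):
    """Return {msgid: msgstr}, skipping the header entry (empty msgid)."""
    return {m["id"]: m["str"] for m in ENTRY.finditer(text) if m["id"]}


def translator_view(template_text, translation_text):
    """Rows of (key, English text, current translation) for every template key."""
    english = parse_po(template_text)
    translated = parse_po(translation_text)
    return [(key, text, translated.get(key, "")) for key, text in english.items()]


EN_US = '''msgid "common.save"
msgstr "Save"

msgid "common.cancel"
msgstr "Cancel"
'''

FR_FR = '''msgid "common.save"
msgstr "Enregistrer"
'''

rows = translator_view(EN_US, FR_FR)
```

Without the en_US template, the translator would only see `common.cancel` rather than "Cancel", which is the concern raised earlier in the thread about tools lacking monolingual support.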

Please Alec, let us know if we can help with something.

Yes, if it's possible to start putting together a production-capable Weblate install for us to use, that would be very helpful :)

We have multiple options here:

  1. We offered our journals production server (huge CPU, plenty of space and memory) to host the Weblate Docker container (with daily backups).
  2. Weblate itself offers a SaaS option somewhere.
  3. In their documentation, they talk about Bitnami and Yunohost.
  4. We can talk with other PKP partners with more resources to host the server.

Which do you like best?
If we go with a Docker approach and we decide to move from one place to another, the migration is supposed to be trivial.

Cheers,
m.

Ok... I couldn't resist the temptation. A server with the latest Weblate version is up and running at: http://revistes.uab.es:8081

I'm sending you the login credentials by mail, as well as some notes about the Docker configuration.
We still need to set up the git push/pull feature (in confidence, I have no idea how it is supposed to work), but we can worry about this later, can't we?

BTW, if everything is as advanced as you show, I offer myself and my team as guinea pigs to make the es_ES and the ca_ES OJS 3.2 translations over the brand new server.

See you soon in Barcelona,
m.

BTW, it looks like XLIFF 2.0 is not widely implemented and there is no backwards compatibility with XLIFF 1.2, so the PO solution is the more standard one.

I like XLIFF a lot (I'm even still surprised it is only used as a "transport" format and only a few use it natively), but the fact is that only a few CAT tools support XLIFF 2.0, so IMHO it won't be a good idea to work with XLIFF 2.0 if our translators are not able to work with their favourite external tools.

I missed one question you asked in an earlier post:

I'm OK to go with a) move directly to PO, as long as everyone understands that we're going to be using monolingual PO files rather than bilingual PO files. This is not how PO files were initially intended, but there are projects that use them this way, and Weblate includes support for it.

I think we are fine with this (as you said, some projects work in this way), but let me ask @MarcRiera if it's a safe road.

@NateWr and I discussed how to stage this out and roughly decided:

  1. Review/merge changes into master with select locales converted to PO (en_US, fr_FR, es_ES, de_DE).
  2. Test/document translation process using these translations.
  3. Fork a pre-conversion branch for tardy XML translations to be submitted to (but do not advertise :); these can be converted with some headache if needed
  4. Batch convert all remaining translations
  5. Translation round for 3.2 using weblate!
  6. Around the 3.3 release mark, remove backwards-compatibility tools (https://github.com/pkp/pkp-lib/issues/5090)

@NateWr, for step 1, could you look at...
https://github.com/pkp/ojs/pull/2479
https://github.com/pkp/pkp-lib/pull/5107

(Obviously I'll generate PRs for OMP and probably PPS once we're ready for a merge.)

This looks great, with a remarkably small impact on the codebase outside of the locale files. :+1:

One question I had was how editing will work during development. Will I modify the en_US po files myself, similar to how it's done now with the XML files? Or do these need to be generated from something?

Also, is there any tooling (po, gettext, weblate, etc) that will automatically identify changed/removed en_US strings, so I don't have to delete these from other locales when committing changes?

Translation round for 3.2 using weblate!
@asmecher if we manage to do it before PKPBCN19 I will pay for all the beers you can drink after the sprint (not before, because during the sprint we still need your brain). ;-)

Any update?

Server installed. Working on configuration.
Code updated. Testing soon.
Goal? Translate OJS 3.2 with weblate.
Follow this thread for detailed info: https://github.com/pkp/ojs/pull/2479

Notes to self: On using import_json to create translation components...

  1. Generate the JSON. Use:

        <?php

        /**
         * Generate JSON-formatted component list for Weblate's manage.py import_json command.
         * Usage:
         *
         * git ls-files '*.po' | grep en_US | xargs php tmp.php https://github.com/pkp/ojs
         *
         * ...where https://github.com/pkp/ojs can also be https://github.com/pkp/pkp-lib
         */
        $output = [];
        array_shift($argv); // Take PHP script filename off the top
        $repo = array_shift($argv); // First real argument is the repository URL
        foreach ($argv as $arg) {
            $pieces = explode('/', $arg);
            if (($index = array_search('plugins', $pieces)) !== false) {
                // Plugin locale file: slug is the two path segments after "plugins"
                $slug = $pieces[$index + 1] . '-' . $pieces[$index + 2];
            } else {
                // Application locale file: slug is the filename without ".po"
                $slug = str_replace('.po', '', array_pop($pieces));
            }
            $output[] = [
                'slug' => $slug,
                'name' => $slug,
                'file_format' => 'po-mono',
                'filemask' => str_replace('en_US', '*', $arg),
                'vcs' => 'git',
                'repo' => $repo,
                'branch' => 'master',
                'template' => $arg,
                'license' => 'GNU General Public License v2',
            ];
        }
        echo json_encode($output);

  2. Execute this tool, saving the results to a JSON file:

        git ls-files '*.po' | grep en_US | xargs php tmp.php https://github.com/pkp/ojs > components.json

  3. In the Weblate install, paste this into a local file and run the Weblate tool on it:

        weblate import_json /tmp/tmp.json --project "open-journal-systems" --ignore

  4. Wait.
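
Before running the import, it may be worth sanity-checking the generated JSON. Here is a minimal Python sketch, assuming the file is a list of objects with the fields the script above emits. The `check_components` helper and the sample paths are hypothetical and not part of Weblate or PKP. Since the slug is derived from a filename or plugin path, two locale files can end up with the same slug, which is the main thing this looks for.

```python
import json
from collections import Counter

# Fields emitted by the generator script that every component should carry.
REQUIRED = ("slug", "name", "file_format", "filemask", "template", "repo")


def check_components(components):
    """Return a list of human-readable problems found in the component list."""
    problems = []
    for i, comp in enumerate(components):
        missing = [field for field in REQUIRED if not comp.get(field)]
        if missing:
            problems.append(f"component {i}: missing {', '.join(missing)}")
        if "*" not in comp.get("filemask", ""):
            problems.append(f"component {i}: filemask has no '*' wildcard")
    # Slugs derived from filenames can collide; report any duplicates.
    counts = Counter(comp.get("slug") for comp in components)
    for slug, n in counts.items():
        if n > 1:
            problems.append(f"slug {slug!r} is used {n} times")
    return problems


# Hypothetical sample; in practice, load components.json from disk instead.
sample = json.loads('''[
  {"slug": "locale", "name": "locale", "file_format": "po-mono",
   "filemask": "locale/*/locale.po", "template": "locale/en_US/locale.po",
   "repo": "https://github.com/pkp/ojs"},
  {"slug": "locale", "name": "locale", "file_format": "po-mono",
   "filemask": "other/locale/*/locale.po", "template": "other/locale/en_US/locale.po",
   "repo": "https://github.com/pkp/ojs"}
]''')

for problem in check_components(sample):
    print(problem)
```

An empty result only means the list is internally consistent; it says nothing about whether Weblate will accept each component.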

Also, is there any tooling (po, gettext, weblate, etc) that will automatically identify changed/removed en_US strings, so I don't have to delete these from other locales when committing changes?

The support for monolingual Gettext in Weblate includes the "needs editing" status (in addition to "untranslated" and "translated"). See right-most column in the table of "Translation types capabilities"
https://docs.weblate.org/en/latest/formats.html#translation-types-capabilities

This is super important to keep translations consistent with changes in the base English version:
https://docs.weblate.org/en/latest/workflows.html#translation-states

The only requirement is setting en_US as monolingual base language file in Weblate:

For correct use of monolingual files, Weblate requires access to a file containing complete list of strings to translate with their source - this file is called Monolingual base language file within Weblate, though the naming might vary in your application.

https://docs.weblate.org/en/latest/formats.html#bilingual-and-monolingual-formats
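
As a rough illustration of why the base file matters, here is a Python sketch of how the three states could be derived for one locale. This is an assumption about the general idea, not Weblate's actual implementation; `flag_needs_editing` and the sample strings are made up.

```python
def flag_needs_editing(old_base, new_base, translation):
    """Classify each key of the new base (en_US) file for one locale.

    old_base / new_base: {key: English text} before and after a change.
    translation: {key: translated text} for the locale being checked.
    """
    states = {}
    for key, source in new_base.items():
        if not translation.get(key):
            # No text at all for this key in the locale.
            states[key] = "untranslated"
        elif key in old_base and old_base[key] != source:
            # Translated, but the English text changed underneath it.
            states[key] = "needs editing"
        else:
            states[key] = "translated"
    return states


old_en = {"common.save": "Save", "common.cancel": "Cancel"}
new_en = {"common.save": "Save changes", "common.cancel": "Cancel", "common.close": "Close"}
fr = {"common.save": "Enregistrer", "common.cancel": "Annuler"}

for key, state in flag_needs_editing(old_en, new_en, fr).items():
    print(key, "->", state)
```

Keys that exist only in the translation never appear in the result, so strings removed from en_US simply drop out rather than needing manual deletion from every locale.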

Any update? Could we start translating the .po files now? I assume that work will stay compatible with Weblate once it is set up?

@veotax, what translation are you interested in working on?

Chinese (zh_CN). Is it ready to be translated now? And what project (ojs or pkp-lib) and what branch (master or stable-3_1_2) should I work on and send PR?

Can I copy the .po files from another folder like en_US to zh_CN and then translate the words? Or is there another process?

@veotax, excellent! The 3.1.2-x releases will continue to be in our old .xml format, but version 3.2 and onward will use monolingual .po files (supported by Weblate). I would recommend targeting 3.2 (due for release early next year) and using .po. I have converted only selected languages in the master branch to .po, but if you're ready to begin working with Chinese in that format, I can convert it as well. Just let me know! There is already an existing zh_CN translation, it just needs to be updated.

So I should use master branch.

What do you mean by "I can convert it as well"? Do you have a tool/script to generate .po files from the existing .xml files? If so, please do. The original zh_CN .xml files are already missing some strings (some UI text is untranslated), so I assume the generated .po files will be missing them too, right? Then I'll need to translate those.

@veotax, I've just converted the .xml files over to .po for the zh_CN locale. (There's a tool for this in lib/pkp/tools/xmlToPo.php in the master branch, which will be released as OJS 3.2.)
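
For readers curious what such a conversion involves, here is a minimal Python sketch. It is not the `xmlToPo.php` tool. It assumes the old locale XML is a flat list of `<message key="...">` elements and emits monolingual PO entries; the sample content is made up.

```python
import xml.etree.ElementTree as ET


def po_escape(text):
    """Escape a string for use inside a double-quoted PO string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def xml_locale_to_po(xml_text):
    """Convert <message key="...">text</message> elements to monolingual PO entries."""
    root = ET.fromstring(xml_text)
    entries = []
    for message in root.iter("message"):
        key = message.get("key")
        # itertext() also picks up text wrapped in CDATA sections.
        text = "".join(message.itertext()).strip()
        entries.append(f'msgid "{po_escape(key)}"\nmsgstr "{po_escape(text)}"\n')
    return "\n".join(entries)


# Hypothetical sample in the assumed XML layout.
SAMPLE = '''<locale name="zh_CN">
  <message key="common.save">保存</message>
  <message key="common.quote">He said "hi"</message>
</locale>'''

print(xml_locale_to_po(SAMPLE))
```

A converter like this only carries over the messages that exist in the XML, so any key missing from the old translation will still be missing from the generated .po file.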

For the .po files to work, you'll need a full checkout of the master branch, rather than an existing OJS 3.1.2.x release.

The commits with the file conversions are here: https://github.com/pkp/ojs/commit/831f4a386ef56ec68e407bd0eef42f108af64c5f https://github.com/pkp/pkp-lib/commit/57ccd97f7c42c9e31c8061be30a3921352a8f565

@asmecher: should emailTemplates.xml be converted to PO, too?

@fgnievinski, no, that's a different XML dialect; I'm still considering what best to do with that. It probably makes the most sense to convert it to .po as well, but we would need a mechanism to link email keys (e.g. NOTIFICATION) with email body, subject, and description for each language.

@asmecher check your mail and confirm weblate server is working, please. ;-)

@marcbria, check Slack :) Too many venues!

@asmecher still about emailTemplates.xml, how about replacing everything between <email_text key="NOTIFICATION"> and </email_text> with {translate key="email_text_key_NOTIFICATION"}? The HTML tags can be left inside the localization text; we translators are used to dealing with those.

@fgnievinski, we may well end up doing something like that. Thanks for the suggestion!

@asmecher is there any way to list all un-translated words together in the PO file? I found the converted zh_CN PO files (https://github.com/pkp/pkp-lib/commit/57ccd97f7c42c9e31c8061be30a3921352a8f565) don't contain all the words. Currently, I have to copy each untranslated keyword from web UI (sometimes uncopiable) to PO and translate it. It's too slow.

@veotax, Weblate will help with that (and I suspect other translation tools capable of working with monolingual PO files as well). They'll do that by fetching the full list of locale keys from the English locale files, then comparing them with your translation to determine what's missing.
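
The comparison described here is simple enough to sketch in Python. This is an illustration under assumptions, not Weblate's code: both files are taken to be already parsed into `{key: text}` dicts, and the helper names and sample strings are made up. It also writes the missing entries out as a PO stub, with the English text as a comment, which gives a single list of everything still untranslated.

```python
def missing_keys(base, translation):
    """Keys present in the English base but absent or empty in the translation."""
    return sorted(k for k in base if not translation.get(k))


def stale_keys(base, translation):
    """Keys still in the translation that no longer exist in the English base."""
    return sorted(k for k in translation if k not in base)


def untranslated_stub(base, translation):
    """Build PO text listing every untranslated key with its English text as a comment.

    Simplistic: assumes single-line English strings with no quotes to escape.
    """
    lines = []
    for key in missing_keys(base, translation):
        lines.append(f"# English: {base[key]}")
        lines.append(f'msgid "{key}"')
        lines.append('msgstr ""')
        lines.append("")
    return "\n".join(lines)


en = {"common.save": "Save", "common.cancel": "Cancel", "common.close": "Close"}
zh = {"common.save": "保存", "common.cancel": "", "common.old": "旧"}

print(missing_keys(en, zh))  # ['common.cancel', 'common.close']
print(stale_keys(en, zh))    # ['common.old']
print(untranslated_stub(en, zh))
```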

Thanks. So before our official Weblate is online, can you recommend some tool (local or web-based) that I can use to start to translate painlessly?

@veotax, our XML-based translation toolset used to do this, and I'm sure Weblate does, but I haven't tried other tools.

If you are working over the OJS native xml format, the only tool is the OJS translation plugin.

If you are working in the new PO files, you have plenty of them. I suggest you two:

Here you have an article with a list of the most usual ones:

It's still early and some research needs to be done, but I'm planning to encourage my translators to work offline if they are working on big translations. Weblate will also be great, but when you are doing a looong piece of work, the web lag will kill your patience. Desktop tools include more features and are faster, and when you finish you will (hopefully) be able to upload the results to Weblate.

Converting email templates to the PO format!

PRs:

This preserves the old XML format (locale/en_US/emailTemplates.xml), but replaces the (localized) contents with {translate ...} calls, e.g.:

        <email_text key="NOTIFICATION">
                <subject>{translate key="emails.notification.subject"}</subject>
                <body>{translate key="emails.notification.body"}</body>
                <description>{translate key="emails.notification.description"}</description>
        </email_text>

Then the translations themselves come from a new PO file, e.g. locale/en_US/emails.po for English.
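
To illustrate the delegation, here is a small Python sketch that resolves `{translate key="..."}` calls against a dict of messages loaded from a PO file. It is only a stand-in for what the application's template layer does; the `resolve` helper, the sample message text, and the `##key##` marker for missing keys are assumptions.

```python
import re

# Matches {translate key="some.locale.key"} and captures the key.
TRANSLATE_CALL = re.compile(r'\{translate key="([^"]+)"\}')


def resolve(template_text, messages):
    """Replace each {translate key="..."} call with the matching message."""
    def lookup(match):
        key = match.group(1)
        # Leave a visible marker when the locale has no entry for the key.
        return messages.get(key, "##" + key + "##")
    return TRANSLATE_CALL.sub(lookup, template_text)


# Hypothetical entries, as they might be loaded from locale/en_US/emails.po.
messages = {
    "emails.notification.subject": "New notification from {$siteTitle}",
}

print(resolve('{translate key="emails.notification.subject"}', messages))
print(resolve('{translate key="emails.notification.body"}', messages))
```

Other template variables such as `{$siteTitle}` pass through untouched, since only `{translate ...}` calls match the pattern.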

There's a new conversion tool to help with this in lib/pkp/tools/xmlEmailsToPo.php. It generates the new PO file and changes the old files over to {translate ...} calls.
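
For illustration only, here is a Python sketch of the same two-step idea. It is not `xmlEmailsToPo.php`. It assumes each `<email_text>` holds literal `<subject>`, `<body>` and `<description>` elements, and it assumes a key naming scheme of `emails.<lowercased template key>.<part>`, which may differ from what the real tool generates. The sample XML is made up.

```python
import xml.etree.ElementTree as ET

PARTS = ("subject", "body", "description")


def convert_emails(xml_text):
    """Return (rewritten XML string, {locale key: original text})."""
    root = ET.fromstring(xml_text)
    messages = {}
    for email in root.iter("email_text"):
        # Assumed naming scheme; the real tool may build keys differently.
        prefix = "emails." + email.get("key").lower()
        for part in PARTS:
            node = email.find(part)
            if node is None:
                continue
            locale_key = f"{prefix}.{part}"
            # Move the literal text out to the message list...
            messages[locale_key] = (node.text or "").strip()
            # ...and leave a translate call behind in the XML.
            node.text = '{translate key="' + locale_key + '"}'
    return ET.tostring(root, encoding="unicode"), messages


# Hypothetical input in the old, fully localized layout.
SAMPLE = '''<email_texts locale="en_US">
  <email_text key="NOTIFICATION">
    <subject>New notification</subject>
    <body>You have a new notification.</body>
    <description>Sent when a notification is created.</description>
  </email_text>
</email_texts>'''

new_xml, messages = convert_emails(SAMPLE)
print(new_xml)
for key, text in messages.items():
    print(f'msgid "{key}"\nmsgstr "{text}"\n')
```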

@marcbria, could you take a quick look? Does this seem like a workable approach? If so, I can merge and set up a new "Emails" component in Weblate.

@asmecher I'm missing something important here.

Does this mean that we will need to convert XML to PO (1), then load it into Weblate (2), do the translation (3), export it from Weblate (4), convert from PO back to XML (5), and finally push back to GitHub (6)?

Why not move from XML to PO and work with a single format?
Then Weblate would be able to pull and push without any conversion work, wouldn't it?

Sorry in advance for the Mr. Obvious comment...
Knowing you, I've probably missed the real issue here.

@marcbria, not to worry, it's complicated :)

No, the translations can be managed inside Weblate as with everything else. There's no need to round-trip the files manually. The only reason we're keeping the old-style emailTemplates.xml is to map the three pieces of each translated template -- description, subject, and body -- to the email template key. The addition of the {translate ...} calls to that file delegates the translation to the .po file.

The reason for the conversion tool is so that we can easily migrate the translation files when we upgrade translations to 3.2.

Thanks Alec. Clear now.

BTW, I'm kind of worried about how we are going to work with Weblate.

I mean, is somebody testing the translation memory features, translation workflows, glossaries, CAT tools, the CLA agreement...?

I think it is important to review this before we open the platform to translators... and maybe write some documentation.

Are you planning to do this? Do you need help?

Cheers,
m.

For the moment we can't do anything with Weblate because it is still not able to deliver emails, and thus can't register users. I'm working to get that resolved.

I don't have experience with the translation memory features but I know those exist -- some expert feedback/testing on those would be welcome.

As for documentation, I have some provisional documentation but it needs to be co-developed with some of our translators.

I plan to manage merges of translations manually, at least for now, and intend to manage CLA agreements manually for the first while (when granting translator accounts translation privileges during the registration process). I'd like to automate this later, but baby steps :)

Committed the email text to PO format PRs and adapted and committed the same changes for OMP.

@ajnyga, this will require a conversion to the PPS email files too -- would you like me to create a PR for that?

Thanks, go ahead. There are only a couple of templates there now, and it could be that not even all of those are in use right now. But go ahead with the conversion.

@ajnyga, I opened a PR for that: https://github.com/ajnyga/ojs/pull/7

@asmecher Thanks for the amazing work! So what's the process to translate? I thought it would be:

  1. Pull pkp-lib and ojs from origin master. Or should I pull only ojs, so that the latest pkp-lib (which contains the latest translations) is pulled in automatically via Git submodules?
  2. See if there's missing/wrong words in my OJS instance.
  3. Translate the words on https://translate.pkp.sfu.ca/

Then what to do next? How to make my translated words on Weblate sync to my instance?

Hi @hsluoyz!

I periodically merge the latest translations that have been provided via Weblate into the official repos. We'll sometimes get translation contributions via other means and will merge them there as well. You can also download the latest files directly from Weblate.

Weblate itself pushes translations up to these repos: https://github.com/pkp-translations/
...so if you really want the latest content that's in weblate, you can get it there. If you're interested in working with a git master installation and round-tripping translation work with weblate, that might be the easiest way.

Thanks! I saw the translation commits in https://github.com/pkp-translations. So I think I can use the following steps:

  1. Pull master from https://github.com/pkp-translations/pkp-lib and https://github.com/pkp-translations/ojs.
  2. See if there's missing/wrong words in my OJS instance.
  3. Translate the words on https://translate.pkp.sfu.ca/. My translations will become commits in https://github.com/pkp-translations.
  4. Pull the code again, my OJS instance will have the latest translation.

Is this correct? BTW is https://github.com/pkp-translations a mirror to latest master branch or a stable release?

The pkp-translations repos are a mirror to the current master branch. Once we release OJS 3.2, then I expect we'll flip the translation toolset over to the stable branch until our next big push for a release from the master branch (that'll eventually be 3.3). Yes, your summary is correct -- we're still working out the kinks in our Weblate workflow, but my sense is that Weblate should be pushing up to pkp-translations more or less immediately.
