It should be possible to write a smarter po file merger for git that actually uses the content of the files and pushes translations to the new template. It might indeed be complex, but it should be doable.
I'd note the comment I just made on issue #98 here. I am currently working on a 3-way merge for translate toolkit. It really does not have much to do with Weblate. I only use .po, but I am writing it so that it should be extensible to any other format supported by translate toolkit (e.g. xliff).
I ran into a conflict today where the only difference was the po header timestamp.
I haven't tested it yet, but looking around I found this gist which points to git-whistles. The heavy lifting in git-whistles is done by msgmerge/msgcat/msguniq; the driver itself doesn't do any .po parsing (source code / explanation).
There is already one po merge driver in Weblate sources: https://docs.weblate.org/en/latest/admin/continuous.html#updating-repositories
Just to document current state:
Well, even msgcat alone (from gettext tools) is capable of performing the merge. But the merge itself is not sufficient. Most probably, the source of the conflict is a pot file change and a corresponding po file edit.
We want:
msgcat --use-first from_weblate.po from_upstream.po -o intermediate.po
msgmerge --previous --lang=my_language intermediate.po new_pot_file.pot -o merged.po
rm intermediate.po
Here is a problem: Weblate knows where the pot file is. Git does not. gettextize/autopoint knows it as well (because it creates the infrastructure). ⇒ It would be nice to integrate this feature into gettext tools, and additionally to add it to a tool generating a merge script (which needs access to the Weblate component description).
Also note that such a generic script requires the pot file to be merged before all po files.
Note that it would be better to replace --use-first by --use-latest (using PO-Revision-Date), but it is not implemented in gettext yet.
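The three commands above could be wrapped in a small shell function so the intermediate file is cleaned up even when a step fails. This is only a sketch: the function name `po_merge_two` and its argument order are made up for illustration, and it assumes GNU gettext's msgcat and msgmerge are on the PATH.

```shell
#!/bin/sh
# po_merge_two (hypothetical helper, not part of gettext or Weblate)
# Usage: po_merge_two WEBLATE_PO UPSTREAM_PO NEW_POT OUTPUT_PO [LANGUAGE]
# The body runs in a subshell, so its variables do not leak to the caller.
po_merge_two() (
    weblate_po=$1; upstream_po=$2; new_pot=$3; output_po=$4; lang=${5:-}
    tmp=$(mktemp) || exit 1
    status=0
    # Step 1: combine both files; for duplicate entries keep the first file's translation.
    msgcat --use-first "$weblate_po" "$upstream_po" -o "$tmp" || status=$?
    if [ "$status" -eq 0 ]; then
        # Step 2: re-align the combined catalog with the new template.
        # --lang is only passed when a language code was given.
        msgmerge --previous ${lang:+"--lang=$lang"} "$tmp" "$new_pot" -o "$output_po" || status=$?
    fi
    # Step 3: remove the intermediate file regardless of the outcome.
    rm -f "$tmp"
    exit "$status"
)
```

It would be called like `po_merge_two from_weblate.po from_upstream.po new_pot_file.pot merged.po my_language`. The caller still has to know where the pot file is, which is exactly the problem described above.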
Here is a problem: Weblate knows where the pot file is. Git does not.
And I believe it shouldn't be relevant. A merge is a merge and the best approach is to do a 3-way merge. I have prototyped a 3-way merge for PO files in https://github.com/jan-hudec/podiffutils. I used it on a project for some time (before changing jobs last summer) and it worked. It would deserve integrating into translate-toolkit, but it would need to be polished a bit first.
Also note that such a generic script requires the pot file to be merged before all po files.
Not if you do a 3-way merge. Also, the only way to merge the pot is a 3-way merge and the algorithm is the same for pot and po, because pot is just a po with no translations filled.
@stanislav-brabec This is what we were discussing yesterday.
hello, thanks to your documentation and my lack of patience, I wrote this script: https://pagure.io/fedora-docs/translations-scripts/blob/master/f/solve_weblate_merge_failures.sh
only the pot file path is specific to the way the localizations are stored: https://pagure.io/fedora-docs/translations-scripts/blob/master/f/solve_weblate_merge_failures.sh#_43
If I understand correctly, weblate should be able to provide the upstream repository to this script (and the full path of the local git repo for the weblate's one, yes, you can git clone /any/local/path)
As we already have the "new_base", exposed by the component API, I assume it should be possible to automate this?
In my experience, one of the biggest ongoing sources of git merge conflicts are the gettext fields POT-Creation-Date: and PO-Revision-Date:. So I would not like to see those fields be required by Weblate, e.g. --use-latest. IMHO, I think they should be stripped if possible.
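To see whether a given conflict is of this kind, one could compare the two files with the date header lines filtered out. A minimal sketch follows; the function names are hypothetical, and it assumes each date field sits on its own header line, as msgmerge and xgettext normally write it.

```shell
#!/bin/sh
# po_strip_dates (hypothetical): print a PO file without its two date header lines.
po_strip_dates() {
    grep -v -E '^"(POT-Creation-Date|PO-Revision-Date):' -- "$1"
}

# po_same_ignoring_dates (hypothetical): succeed (exit 0) when the two files
# are identical apart from POT-Creation-Date / PO-Revision-Date.
po_same_ignoring_dates() {
    [ "$(po_strip_dates "$1")" = "$(po_strip_dates "$2")" ]
}
```

For example, `po_same_ignoring_dates ours.po theirs.po && echo "only the timestamps differ"`. This only detects the situation; it does not remove the fields from the files.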
So I would not like to see those fields be required by Weblate, e.g.
--use-latest. IMHO, I think they should be stripped if possible.
I disagree. These fields are mandatory for the po file format. And they are among the most important fields for human review, e.g. checking maintenance status, or checking that po files are in sync with the pot.
In the context of Weblate, the .po files will always be committed to
git. That means there will always be a date of the git commit. So for
.pot files, POT-Creation-Date could be derived from the last git commit
that contains a change to the .pot file. For .po files,
PO-Revision-Date could be derived from the last git commit that changed
that .po file. But that could be complicated sometimes.
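As a rough sketch of that idea, the date of the last commit touching a file can be read from git and printed in the `YYYY-MM-DD HH:MM+ZZZZ` form used by the PO header. The function name is hypothetical, and it assumes a git version that supports `--date=format:`.

```shell
#!/bin/sh
# po_date_from_git (hypothetical): print the committer date of the last
# commit that changed FILE, formatted like a PO header date.
po_date_from_git() {
    git -C "$(dirname -- "$1")" log -1 \
        --date=format:'%Y-%m-%d %H:%M%z' --format=%cd -- "$(basename -- "$1")"
}
```

Usage would be `po_date_from_git po/cs.po`. It prints nothing if the file has never been committed.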
I guess those dates are used in the merge/update processes? Seems to me
that PO is the only format that has them. In my experience, it would be
worth it to me to disable the features that require POT-Creation-Date
and PO-Revision-Date so that there will be fewer merge conflicts.
But maybe you have something else in mind to solve this?
it would be worth it to me to disable the features that require POT-Creation-Date and PO-Revision-Date so that there will be fewer merge conflicts.
Again, these fields are mandatory parts of the file format definition (established 25 years ago). Third party tools can reject these po files or fail, and even applications and websites can fail (any runtime tool can check this header, and many really do so when they display translation credits). You would need to initiate a negotiation to change the file format definition, and then request modification of all tools that depend on it, software and even websites. It would require ~10 years to complete.
It would make a fast translation status check impossible. That is why the above will be rejected by GNU/FSF.
Example: gettext libc '' | grep ^PO-Revision-Date: allows to check glibc translation age in the _runtime_. There is no alternative for this feature.
Merge conflicts are caused by a diff/patch merge algorithm that is inferior for po files. This bug addresses it in the right way. Once we provide the correct merge algorithm, rejects will rarely occur.
If there is a standard way to manage git merge algorithms for specific
file types, and this can be included in GNU/Linux and git distros, then
I think that sounds like a good solution. Or maybe it would be enough
that the Weblate server is setup with this PO merge algorithm, but my
guess is no.
If there is a standard way to manage git merge algorithms for specific file types
Git has support for custom merge strategies, and for specific file types it supports custom merge drivers assigned through gitattributes. https://git-scm.com/docs/git-merge#_merge_strategies
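For merging by file type, the mechanism is a merge driver: a command registered in the git config and assigned to paths in .gitattributes. Git calls it with the ancestor (%O), current (%A) and other (%B) versions and expects the result to be written over the %A file. Registering one could look roughly like this; the driver name `pomerge`, the function name and the driver command are placeholders, not an existing tool.

```shell
#!/bin/sh
# install_po_merge_driver (hypothetical helper)
# Usage: install_po_merge_driver REPO_DIR DRIVER_COMMAND
install_po_merge_driver() {
    # Register the driver in the repository's local config.
    # %O = common ancestor, %A = current version (the result goes here), %B = other version.
    git -C "$1" config merge.pomerge.name "3-way merge for gettext PO files" &&
    git -C "$1" config merge.pomerge.driver "$2 %O %A %B" &&
    # Route .po and .pot files to that driver.
    printf '%s\n' '*.po merge=pomerge' '*.pot merge=pomerge' >> "$1/.gitattributes"
}
```

For example, `install_po_merge_driver . /path/to/po-merge-driver`, where the path stands for whatever script implements the merge. The config part is local to each clone, so it has to be set up on every machine that performs merges; only .gitattributes can be committed.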
right, I mean that if it is something that can be included when someone
does apt-get install weblate or something like that. I think it is
important that whatever the solution is, it is easy to deploy and
transparent to use.
right, I mean that if it is something that can be included when someone does apt-get install weblate or something like that. I think it is important that whatever the solution is, it is easy to deploy and transparent to use.
It is not related to Weblate. But yes, ideally Weblate should install it automatically to hosted git repositories.
It is easy to do such three way merge manually:
msgcat --use-first local-old.po remote.po -o temporary.po # or new --use-newest feature
msgmerge --previous temporary.po local-new.pot -o local-new.po
rm temporary.po
Writing a generic three way merge command working with any translation layout is more complicated.
While I agree it's a standard I still think that
I say, let end users make their own decision about whether they want it in the file or not.
If this doesn't break other functions inside Weblate of course.