Hi,
Arabic Tatweel (U+0640) and the related tatweel letters are decorative. They usually do not pass spell checking, are useless in a translation, and pollute TMs. Can we add a check that warns users against using them?
http://graphemica.com/%D9%80
Something like the checks listed at: https://hosted.weblate.org/checks/
Regards,
So this char should not be used at all? Can you please post some reference on that?
I've just quickly checked translations on Hosted Weblate service and it's used in 314 translations right now.
Hi,
In this document, please search for Kasheeda or Tatweel:
http://thesai.org/Downloads/Volume8No2/Paper_37-Sentiment_Analysis_Challenges_of_Informal_Arabic_Language.pdf
See also RFC 5564, _section 2.1.2. Kasheeda or Tatweel (Horizontal Character Size Extension)_, which covers domain names.
I think this section of the RFC should be applied on all translation platforms. Users can still use Tatweel or Kasheeda in Arabic texts in a word processor such as LibreOffice or in LaTeX, but if the words are extracted to be normalized, it is recommended to remove U+0640 and its variant ligatures:
{U+FCF2, U+FCF3, U+FCF4, U+FE71, U+FE77, U+FE79, U+FE7B, U+FE7D, U+FE7F};
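
As a rough sketch of what such a check could look for, here is a small Python example. The names `KASHIDA_CHARS`, `find_kashida` and `has_kashida` are made up for illustration and are not Weblate's actual check API; the code points are the ones listed above.

```python
# Hypothetical sketch: flag tatweel/kashida code points in a translation.
# The set below mirrors the list in this comment; it is not an official list.
KASHIDA_CHARS = frozenset(
    "\u0640"                                # ARABIC TATWEEL
    "\uFCF2\uFCF3\uFCF4"                    # ligature variants from the list above
    "\uFE71\uFE77\uFE79\uFE7B\uFE7D\uFE7F"  # presentation-form variants from the list above
)


def find_kashida(text):
    """Return (index, character) pairs for every listed code point in text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in KASHIDA_CHARS]


def has_kashida(text):
    """Return True if text contains at least one listed code point."""
    return any(ch in KASHIDA_CHARS for ch in text)
```

This only detects the characters. Blindly deleting the variant forms could also drop the diacritic they carry, so a check that warns the translator seems safer than automatic removal.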
Regards,
Thanks for adding more detailed information! I think such a check certainly makes sense. It should be easy to implement; there is a similar check for zero-width space which could be used as inspiration:
Thank you for your report, the issue you have reported has just been fixed.